AI Research News Feeds for January 16th, 2026

AI RESEARCH PAPERS & ACADEMIC SOURCES

GANeXt: A Fully ConvNeXt-Enhanced Generative Adversarial Network for MRI- and CBCT-to-CT Synthesis : Abstract: The synthesis of computed tomography (CT) from magnetic resonance imaging (MRI) and cone-beam CT (CBCT) plays a critical role in clinical treatment planning by enabling accurate anatomical r...
SERA-H: Beyond Native Sentinel Spatial Limits for High-Resolution Canopy Height Mapping : Abstract: High-resolution mapping of canopy height is essential for forest management and biodiversity monitoring. Although recent studies have led to the advent of deep learning methods using satelli...
RS2-SAM2: Customized SAM2 for Referring Remote Sensing Image Segmentation : Abstract: Referring Remote Sensing Image Segmentation (RRSIS) aims to segment target objects in remote sensing (RS) images based on textual descriptions. Although Segment Anything Model 2 (SAM2) has s...
AITTI: Learning Adaptive Inclusive Token for Text-to-Image Generation : Abstract: Despite the high-quality results of text-to-image generation, stereotypical biases have been spotted in their generated contents, compromising the fairness of generative models. In this work...
Jump-teaching: Combating Sample Selection Bias via Temporal Disagreement : Abstract: Sample selection is a straightforward technique to combat noisy labels, aiming to prevent mislabeled samples from degrading the robustness of neural networks. However, existing methods mitig...
Tuning-Free Adaptive Style Incorporation for Structure-Consistent Text-Driven Style Transfer : Abstract: In this work, we target the task of text-driven style transfer in the context of text-to-image (T2I) diffusion models. The main challenge is consistent structure preservation while enabling ...
Data-Driven Feature Tracking for Event Cameras With and Without Frames : Abstract: Because of their high temporal resolution, increased resilience to motion blur, and very sparse output, event cameras have been shown to be ideal for low-latency and low-bandwidth feature tr...
Spatial As Deep: Spatial CNN for Traffic Scene Understanding : Abstract: Convolutional neural networks (CNNs) are usually built by stacking convolutional operations layer-by-layer. Although CNN has shown strong capability to extract semantics from raw pixels, its...
Multi-Objective Pareto-Front Optimization for Efficient Adaptive VVC Streaming : Abstract: Adaptive video streaming has facilitated improved video streaming over the past years. A balance among coding performance objectives such as bitrate, video quality, and decoding complexity i...
Subjective evaluation of UHD video coded using VVC with LCEVC and ML-VVC : Abstract: This paper presents the results of a subjective quality assessment of a multilayer video coding configuration in which Low Complexity Enhancement Video Coding (LCEVC) is applied as an enhanc...
Cell Behavior Video Classification Challenge, a benchmark for computer vision methods in time-lapse microscopy : Abstract: The classification of microscopy videos capturing complex cellular behaviors is crucial for understanding and quantifying the dynamics of biological processes over time. However, it remains ...
EditEmoTalk: Controllable Speech-Driven 3D Facial Animation with Continuous Expression Editing : Abstract: Speech-driven 3D facial animation aims to generate realistic and expressive facial motions directly from audio. While recent methods achieve high-quality lip synchronization, they often rely...
WildRayZer: Self-supervised Large View Synthesis in Dynamic Environments : Abstract: We present WildRayZer, a self-supervised framework for novel view synthesis (NVS) in dynamic environments where both the camera and objects move. Dynamic content breaks the multi-view consis...
Alterbute: Editing Intrinsic Attributes of Objects in Images : Abstract: We introduce Alterbute, a diffusion-based method for editing an object's intrinsic attributes in an image. We allow changing color, texture, material, and even the shape of an object, while ...
From One-to-One to Many-to-Many: Dynamic Cross-Layer Injection for Deep Vision-Language Fusion : Abstract: Vision-Language Models (VLMs) create a severe visual feature bottleneck by using a crude, asymmetric connection that links only the output of the vision encoder to the input of the large lan...
A continental-scale dataset of ground beetles with high-resolution images and validated morphological trait measurements : Abstract: Despite the ecological significance of invertebrates, global trait databases remain heavily biased toward vertebrates and plants, limiting comprehensive ecological analyses of high-diversity...
CURVE: A Benchmark for Cultural and Multilingual Long Video Reasoning : Abstract: Recent advancements in video models have shown tremendous progress, particularly in long video understanding. However, current benchmarks predominantly feature western-centric data and Engli...
CoMoVi: Co-Generation of 3D Human Motions and Realistic Videos : Abstract: In this paper, we find that the generation of 3D human motions and 2D human videos is intrinsically coupled. 3D motions provide the structural prior for plausibility and consistency in video...
RSATalker: Realistic Socially-Aware Talking Head Generation for Multi-Turn Conversation : Abstract: Talking head generation is increasingly important in virtual reality (VR), especially for social scenarios involving multi-turn conversation. Existing approaches face notable limitations: me...
Action100M: A Large-scale Video Action Dataset : Abstract: Inferring physical actions from visual observations is a fundamental capability for advancing machine intelligence in the physical world. Achieving this requires large-scale, open-vocabulary...
Jordan-Segmentable Masks: A Topology-Aware definition for characterizing Binary Image Segmentation : Abstract: Image segmentation plays a central role in computer vision. However, widely used evaluation metrics, whether pixel-wise, region-based, or boundary-focused, often struggle to capture the stru...
DeepUrban: Interaction-Aware Trajectory Prediction and Planning for Automated Driving by Aerial Imagery : Abstract: The efficacy of autonomous driving systems hinges critically on robust prediction and planning capabilities. However, current benchmarks are impeded by a notable scarcity of scenarios featur...
Inference-time Physics Alignment of Video Generative Models with Latent World Models : Abstract: State-of-the-art video generative models produce promising visual content yet often violate basic physics principles, limiting their utility. While some attribute this deficiency to insuffic...
Unleashing the Capabilities of Large Vision-Language Models for Intelligent Perception of Roadside Infrastructure : Abstract: Automated perception of urban roadside infrastructure is crucial for smart city management, yet general-purpose models often struggle to capture the necessary fine-grained attributes and dom...
Enhancing the quality of gauge images captured in smoke and haze scenes through deep learning : Abstract: Images captured in hazy and smoky environments suffer from reduced visibility, posing a challenge when monitoring infrastructures and hindering emergency services during critical situations....
SVII-3D: Advancing Roadside Infrastructure Inventory with Decimeter-level 3D Localization and Comprehension from Sparse Street Imagery : Abstract: The automated creation of digital twins and precise asset inventories is a critical task in smart city construction and facility lifecycle management. However, utilizing cost-effective spars...
BikeActions: An Open Platform and Benchmark for Cyclist-Centric VRU Action Recognition : Abstract: Anticipating the intentions of Vulnerable Road Users (VRUs) is a critical challenge for safe autonomous driving (AD) and mobile robotics. While current research predominantly focuses on pede...
mergetune: Continued fine-tuning of vision-language models : Abstract: Fine-tuning vision-language models (VLMs) such as CLIP often leads to catastrophic forgetting of pretrained knowledge. Prior work primarily aims to mitigate forgetting during adaptation; how...
Lunar-G2R: Geometry-to-Reflectance Learning for High-Fidelity Lunar BRDF Estimation : Abstract: We address the problem of estimating realistic, spatially varying reflectance for complex planetary surfaces such as the lunar regolith, which is critical for high-fidelity rendering and vis...
Multi-Temporal Frames Projection for Dynamic Processes Fusion in Fluorescence Microscopy : Abstract: Fluorescence microscopy is widely employed for the analysis of living biological samples; however, the utility of the resulting recordings is frequently constrained by noise, temporal variab...
Fine-Grained Human Pose Editing Assessment via Layer-Selective MLLMs : Abstract: Text-guided human pose editing has gained significant traction in AIGC applications. However, it remains plagued by structural anomalies and generative artifacts. Existing evaluation metrics ...
Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders : Abstract: Recent progress in text-to-image (T2I) diffusion models (DMs) has enabled high-quality visual synthesis from diverse textual prompts. Yet, most existing T2I DMs, even those equipped with lar...
SRAW-Attack: Space-Reweighted Adversarial Warping Attack for SAR Target Recognition : Abstract: Synthetic aperture radar (SAR) imagery exhibits intrinsic information sparsity due to its unique electromagnetic scattering mechanism. Despite the widespread adoption of deep neural network ...
Hierarchical Refinement of Universal Multimodal Attacks on Vision-Language Models : Abstract: Existing adversarial attacks for VLP models are mostly sample-specific, resulting in substantial computational overhead when scaled to large datasets or new scenarios. To overcome this limit...
Attend to what I say: Highlighting relevant content on slides : Abstract: Imagine sitting in a presentation, trying to follow the speaker while simultaneously scanning the slides for relevant information. While the entire slide is visible, identifying the relevant...
Optimizing Multimodal LLMs for Egocentric Video Understanding: A Solution for the HD-EPIC VQA Challenge : Abstract: Multimodal Large Language Models (MLLMs) struggle with complex video QA benchmarks like HD-EPIC VQA due to ambiguous queries/options, poor long-range temporal reasoning, and non-standardized...
Beyond Inpainting: Unleash 3D Understanding for Precise Camera-Controlled Video Generation : Abstract: Camera control has been extensively studied in conditioned video generation; however, precisely altering the camera trajectories while faithfully preserving the video content rema...
ELITE: Efficient Gaussian Head Avatar from a Monocular Video via Learned Initialization and TEst-time Generative Adaptation : Abstract: We introduce ELITE, an Efficient Gaussian head avatar synthesis from a monocular video via Learned Initialization and TEst-time generative adaptation. Prior works rely either on a 3D data pr...
From Physical Degradation Models to Task-Aware All-in-One Image Restoration : Abstract: All-in-one image restoration aims to adaptively handle multiple restoration tasks with a single trained model. Although existing methods achieve promising results by introducing prompt infor...
Advancing Adaptive Multi-Stage Video Anomaly Reasoning: A Benchmark Dataset and Method : Abstract: Recent progress in reasoning capabilities of Multimodal Large Language Models (MLLMs) has highlighted their potential for performing complex video understanding tasks. However, in the domain ...
VQ-Seg: Vector-Quantized Token Perturbation for Semi-Supervised Medical Image Segmentation : Abstract: Consistency learning with feature perturbation is a widely used strategy in semi-supervised medical image segmentation. However, many existing perturbation methods rely on dropout, and thus ...
Beyond Single Prompts: Synergistic Fusion and Arrangement for VICL : Abstract: Vision In-Context Learning (VICL) enables inpainting models to quickly adapt to new visual tasks from only a few prompts. However, existing methods suffer from two key issues: (1) selecting ...
Enhancing Visual In-Context Learning by Multi-Faceted Fusion : Abstract: Visual In-Context Learning (VICL) has emerged as a powerful paradigm, enabling models to perform novel visual tasks by learning from in-context examples. The dominant "retrieve-then-prompt" ...
InfoSculpt: Sculpting the Latent Space for Generalized Category Discovery : Abstract: Generalized Category Discovery (GCD) aims to classify instances from both known and novel categories within a large-scale unlabeled dataset, a critical yet challenging task for real-world, o...
UEOF: A Benchmark Dataset for Underwater Event-Based Optical Flow : Abstract: Underwater imaging is fundamentally challenging due to wavelength-dependent light attenuation, strong scattering from suspended particles, turbidity-induced blur, and non-uniform illuminatio...
Disentangled Concept Representation for Text-to-image Person Re-identification : Abstract: Text-to-image person re-identification (TIReID) aims to retrieve person images from a large gallery given free-form textual descriptions. TIReID is challenging due to the substantial modalit...
DW-DGAT: Dynamically Weighted Dual Graph Attention Network for Neurodegenerative Disease Diagnosis : Abstract: Parkinson's disease (PD) and Alzheimer's disease (AD) are the two most prevalent and incurable neurodegenerative diseases (NDs) worldwide, for which early diagnosis is critical to delay thei...
DR$^2$Seg: Decomposed Two-Stage Rollouts for Efficient Reasoning Segmentation in Multimodal Large Language Models : Abstract: Reasoning segmentation is an emerging vision-language task that requires reasoning over intricate text queries to precisely segment objects. However, existing methods typically suffer from o...
The Spatial Blindspot of Vision-Language Models : Abstract: Vision-language models (VLMs) have advanced rapidly, but their ability to capture spatial relationships remains a blindspot. Current VLMs are typically built with contrastive language-image ...
OT-Drive: Out-of-Distribution Off-Road Traversable Area Segmentation via Optimal Transport : Abstract: Reliable traversable area segmentation in unstructured environments is critical for planning and decision-making in autonomous driving. However, existing data-driven approaches often suffer ...
UniHash: Unifying Pointwise and Pairwise Hashing Paradigms for Seen and Unseen Category Retrieval : Abstract: Effective retrieval across both seen and unseen categories is crucial for modern image retrieval systems. Retrieval on seen categories ensures precise recognition of known classes, while ret...
NanoSD: Edge Efficient Foundation Model for Real Time Image Restoration : Abstract: Latent diffusion models such as Stable Diffusion 1.5 offer strong generative priors that are highly valuable for image restoration, yet their full pipelines remain too computationally heavy ...
LCF3D: A Robust and Real-Time Late-Cascade Fusion Framework for 3D Object Detection in Autonomous Driving : Abstract: Accurately localizing 3D objects like pedestrians, cyclists, and other vehicles is essential in Autonomous Driving. To ensure high detection performance, Autonomous Vehicles complement RGB c...
Be Your Own Red Teamer: Safety Alignment via Self-Play and Reflective Experience Replay : Abstract: Large Language Models (LLMs) have achieved remarkable capabilities but remain vulnerable to adversarial "jailbreak" attacks designed to bypass safety guardrails. Current safety alignment m...
ROMA: Real-time Omni-Multimodal Assistant with Interactive Streaming Understanding : Abstract: Recent Omni-multimodal Large Language Models show promise in unified audio, vision, and text modeling. However, streaming audio-video understanding remains challenging, as existing approache...
Self-reflection in Automated Qualitative Coding: Improving Text Annotation through Secondary LLM Critique : Abstract: Large language models (LLMs) allow for sophisticated qualitative coding of large datasets, but zero- and few-shot classifiers can produce an intolerable number of errors, even with careful, ...
Multi-Level Embedding Conformer Framework for Bengali Automatic Speech Recognition : Abstract: Bengali, spoken by over 300 million people, is a morphologically rich and low-resource language, posing challenges for automatic speech recognition (ASR). This research presents an end-to-end...
Detecting Winning Arguments with Large Language Models and Persuasion Strategies : Abstract: Detecting persuasion in argumentative text is a challenging task with important implications for understanding human communication. This work investigates the role of persuasion strategies -...
Influential Training Data Retrieval for Explaining Verbalized Confidence of LLMs : Abstract: Large language models (LLMs) can increase users' perceived trust by verbalizing confidence in their outputs. However, prior work has shown that LLMs are often overconfident, making their sta...
Form and Meaning in Intrinsic Multilingual Evaluations : Abstract: Intrinsic evaluation metrics for conditional language models, such as perplexity or bits-per-character, are widely used in both mono- and multilingual settings. These metrics are rather stra...
PERM: Psychology-grounded Empathetic Reward Modeling for Large Language Models : Abstract: Large Language Models (LLMs) are increasingly deployed in human-centric applications, yet they often fail to provide substantive emotional support. While Reinforcement Learning (RL) has been...
AEQ-Bench: Measuring Empathy of Omni-Modal Large Models : Abstract: While the automatic evaluation of omni-modal large models (OLMs) is essential, assessing empathy remains a significant challenge due to its inherent affectivity. To investigate this challeng...
DR-Arena: an Automated Evaluation Framework for Deep Research Agents : Abstract: As Large Language Models (LLMs) increasingly operate as Deep Research (DR) Agents capable of autonomous investigation and information synthesis, reliable evaluation of their task performance...
SurgGoal: Rethinking Surgical Planning Evaluation via Goal-Satisfiability : Abstract: Surgical planning integrates visual perception, long-horizon reasoning, and procedural knowledge, yet it remains unclear whether current evaluation protocols reliably assess vision-language ...
TF3-RO-50M: Training Compact Romanian Language Models from Scratch on Synthetic Moral Microfiction : Abstract: Recent advances in synthetic data generation have shown that compact language models can be trained effectively when the underlying corpus is structurally controlled and linguistically coher...
INDIC DIALECT: A Multi Task Benchmark to Evaluate and Translate in Indian Language Dialects : Abstract: Recent NLP advances focus primarily on standardized languages, leaving most low-resource dialects under-served especially in Indian scenarios. In India, the issue is particularly important: ...
The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models : Abstract: Large language models can represent a variety of personas but typically default to a helpful Assistant identity cultivated during post-training. We investigate the structure of the space of ...
Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text : Abstract: Enabling Large Language Models (LLMs) to effectively utilize tools in multi-turn interactions is essential for building capable autonomous agents. However, acquiring diverse and realistic mu...
Boundary-Aware NL2SQL: Integrating Reliability through Hybrid Reward and Data Synthesis : Abstract: In this paper, we present BAR-SQL (Boundary-Aware Reliable NL2SQL), a unified training framework that embeds reliability and boundary awareness directly into the generation process. We intro...
ADVOSYNTH: A Synthetic Multi-Advocate Dataset for Speaker Identification in Courtroom Scenarios : Abstract: As large-scale speech-to-speech models achieve high fidelity, the distinction between synthetic voices in structured environments becomes a vital area of study. This paper introduces Advosyn...
Multilinguality as Sense Adaptation : Abstract: We approach multilinguality as sense adaptation: aligning latent meaning representations across languages rather than relying solely on shared parameters and scale. In this paper, we introdu...
The Straight and Narrow: Do LLMs Possess an Internal Moral Path? : Abstract: Enhancing the moral alignment of Large Language Models (LLMs) is a critical challenge in AI safety. Current alignment techniques often act as superficial guardrails, leaving the intrinsic mo...
Measuring Affinity between Attention-Head Weight Subspaces via the Projection Kernel : Abstract: Understanding relationships between attention heads is essential for interpreting the internal structure of Transformers, yet existing metrics do not capture this structure well. We focus on...
coTherapist: A Behavior-Aligned Small Language Model to Support Mental Healthcare Experts : Abstract: Access to mental healthcare is increasingly strained by workforce shortages and rising demand, motivating the development of intelligent systems that can support mental healthcare experts. W...
GeoSteer: Faithful Chain-of-Thought Steering via Latent Manifold Gradients : Abstract: Recent advances in Large Language Models (LLMs) have improved multi-step reasoning. Most approaches rely on Chain-of-Thought (CoT) rationales. Previous studies have shown that LLMs often gen...
HUMANLLM: Benchmarking and Reinforcing LLM Anthropomorphism via Human Cognitive Patterns : Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in reasoning and generation, serving as the foundation for advanced persona simulation and Role-Playing Language Agents...
Credit C-GPT: A Domain-Specialized Large Language Model for Conversational Understanding in Vietnamese Debt Collection : Abstract: Debt collection is a critical function within the banking, financial services, and insurance (BFSI) sector, relying heavily on large-scale human-to-human conversational interactions conducte...
What Gets Activated: Uncovering Domain and Driver Experts in MoE Language Models : Abstract: Most interpretability work focuses on layer- or neuron-level mechanisms in Transformers, leaving expert-level behavior in MoE LLMs underexplored. Motivated by functional specialization in th...
ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback : Abstract: While LLM-based agents can interact with environments via invoking external tools, their expanded capabilities also amplify security risks. Monitoring step-level tool invocation behaviors in...
Skill-Aware Data Selection and Fine-Tuning for Data-Efficient Reasoning Distillation : Abstract: Large reasoning models such as DeepSeek-R1 and their distilled variants achieve strong performance on complex reasoning tasks. Yet, distilling these models often demands large-scale data for...
CALM-IT: Generating Realistic Long-Form Motivational Interviewing Dialogues with Dual-Actor Conversational Dynamics Tracking : Abstract: Large Language Models (LLMs) are increasingly used in mental health-related settings, yet they struggle to sustain realistic, goal-directed dialogue over extended interactions. While LLMs ge...
Is MT Ready for the Next Crisis or Pandemic? : Abstract: Communication in times of crisis is essential. However, there is often a mismatch between the language of governments, aid providers, doctors, and those to whom they are providing aid. Comme...
Deriving Character Logic from Storyline as Codified Decision Trees : Abstract: Role-playing (RP) agents rely on behavioral profiles to act consistently across diverse narrative contexts, yet existing profiles are largely unstructured, non-executable, and weakly validat...
Long-Chain Reasoning Distillation via Adaptive Prefix Alignment : Abstract: Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities, particularly in solving complex mathematical problems. Recent studies show that distilling long reasoning tr...
EmplifAI: a Fine-grained Dataset for Japanese Empathetic Medical Dialogues in 28 Emotion Labels : Abstract: This paper introduces EmplifAI, a Japanese empathetic dialogue dataset designed to support patients coping with chronic medical conditions. They often experience a wide range of positive and...
EHRNavigator: A Multi-Agent System for Patient-Level Clinical Question Answering over Heterogeneous Electronic Health Records : Abstract: Clinical decision-making increasingly relies on timely and context-aware access to patient information within Electronic Health Records (EHRs), yet most existing natural language question-an...
SocraticKG: Knowledge Graph Construction via QA-Driven Fact Extraction : Abstract: Constructing Knowledge Graphs (KGs) from unstructured text provides a structured framework for knowledge representation and reasoning, yet current LLM-based approaches struggle with a fundam...
Take Out Your Calculators: Estimating the Real Difficulty of Question Items with LLM Student Simulations : Abstract: Standardized math assessments require expensive human pilot studies to establish the difficulty of test items. We investigate the predictive value of open-source large language models (LLMs)...
Clozing the Gap: Exploring Why Language Model Surprisal Outperforms Cloze Surprisal : Abstract: How predictable a word is can be quantified in two ways: using human responses to the cloze task or using probabilities from language models (LMs). When used as predictors of processing effor...
Patient-Similarity Cohort Reasoning in Clinical Text-to-SQL : Abstract: Real-world clinical text-to-SQL requires reasoning over heterogeneous EHR tables, temporal windows, and patient-similarity cohorts to produce executable queries. We introduce CLINSQL, a benc...
Bears, all bears, and some bears. Language Constraints on Language Models' Inductive Inferences : Abstract: Language places subtle constraints on how we make inductive inferences. Developmental evidence by Gelman et al. (2002) has shown children (4 years and older) to differentiate among generic s...
Stable and Explainable Personality Trait Evaluation in Large Language Models with Internal Activations : Abstract: Evaluating personality traits in Large Language Models (LLMs) is key to model interpretation, comparison, and responsible deployment. However, existing questionnaire-based evaluation methods...
Benchmarking Cross-Lingual Semantic Alignment in Multilingual Embeddings : Abstract: With hundreds of multilingual embedding models available, practitioners lack clear guidance on which provide genuine cross-lingual semantic alignment versus task performance through language...
Geometric Patterns of Meaning: A PHATE Manifold Analysis of Multi-lingual Embeddings : Abstract: We introduce a multi-level analysis framework for examining semantic geometry in multilingual embeddings, implemented through Semanscope (a visualization tool that applies PHATE manifold lea...
Assessing and Improving Punctuation Robustness in English-Marathi Machine Translation : Abstract: Punctuation plays a critical role in resolving semantic and structural ambiguity in written language. Machine Translation (MT) systems are now widely applied across diverse domains and langu...
Opportunities and Challenges of Natural Language Processing for Low-Resource Senegalese Languages in Social Science Research : Abstract: Natural Language Processing (NLP) is rapidly transforming research methodologies across disciplines, yet African languages remain largely underrepresented in this technological shift. This p...
LLM-Driven Preference Data Synthesis for Proactive Prediction of the Next User Utterance in Human-Machine Dialogue : Abstract: Proactively predicting a user's next utterance in human-machine dialogue can streamline interaction and improve user experience. Existing commercial API-based solutions are subject to privacy...
HERMES: Holographic Equivariant neuRal network model for Mutational Effect and Stability prediction : Abstract: Predicting the stability and fitness effects of amino acid mutations in proteins is a cornerstone of biological discovery and engineering. Various experimental techniques have been developed...
Instance-level quantitative saliency in multiple sclerosis lesion segmentation : Abstract: Explainable artificial intelligence (XAI) methods have been proposed to interpret model decisions in classification and, more recently, in semantic segmentation. However, instance-level XAI ...
Arbitrary Polynomial Separations in Trainable Quantum Machine Learning : Abstract: Recent theoretical results in quantum machine learning have demonstrated a general trade-off between the expressive power of quantum neural networks (QNNs) and their trainability; as a corol...
Learning Physics-Informed Noise Models from Dark Frames for Low-Light Raw Image Denoising : Abstract: Recently, the mainstream practice for training low-light raw image denoising methods has shifted towards employing synthetic data. Noise modeling, which focuses on characterizing the noise d...
Deep Jump Gaussian Processes for Surrogate Modeling of High-Dimensional Piecewise Continuous Functions : Abstract: We introduce Deep Jump Gaussian Processes (DJGP), a novel method for surrogate modeling of a piecewise continuous function on a high-dimensional domain. DJGP addresses the limitations of con...
See Less, Drive Better: Generalizable End-to-End Autonomous Driving via Foundation Models Stochastic Patch Selection : Abstract: Recent advances in end-to-end autonomous driving show that policies trained on patch-aligned features extracted from foundation models generalize better to Out-of-Distribution (OOD). We hypo...
PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution : Abstract: Large Language Models (LLMs) have emerged as powerful operators for evolutionary search, yet the design of efficient search scaffolds remains ad hoc. While promising, current LLM-in-the-loop...
Adjusted Similarity Measures and a Violation of Expectations : Abstract: Adjusted similarity measures, such as Cohen's kappa for inter-rater reliability and the adjusted Rand index used to compare clustering algorithms, are a vital tool for comparing discrete lab...
Classification Imbalance as Transfer Learning : Abstract: Classification imbalance arises when one class is much rarer than the other. We frame this setting as transfer learning under label (prior) shift between an imbalanced source distribution in...
Parametric RDT approach to computational gap of symmetric binary perceptron : Abstract: We study potential presence of statistical-computational gaps (SCG) in symmetric binary perceptrons (SBP) via a parametric utilization of fully lifted random duality theory (fl-RDT) [...
Searching for Quantum Effects in the Brain: A Bell-Type Test for Nonclassical Latent Representations in Autoencoders : Abstract: Whether neural information processing is entirely classical or involves quantum-mechanical elements remains an open question. Here we propose a model-agnostic, information-theoretic test of ...
Representation-Aware Unlearning via Activation Signatures: From Suppression to Knowledge-Signature Erasure : Abstract: Selective knowledge erasure from LLMs is critical for GDPR compliance and model safety, yet current unlearning methods conflate behavioral suppression with true knowledge removal, allowing l...
CoGen: Creation of Reusable UI Components in Figma via Textual Commands : Abstract: The evolution of User Interface design has emphasized the need for efficient, reusable, and editable components to ensure an efficient design process. This research introduces CoGen, a syste...
Coarsening Causal DAG Models : Abstract: Directed acyclic graphical (DAG) models are a powerful tool for representing causal relationships among jointly distributed random variables, especially concerning data from across different...
CROCS: A Two-Stage Clustering Framework for Behaviour-Centric Consumer Segmentation with Smart Meter Data : Abstract: With grid operators confronting rising uncertainty from renewable integration and a broader push toward electrification, Demand-Side Management (DSM) -- particularly Demand Response (DR) -- ...
H-EFT-VA: An Effective-Field-Theory Variational Ansatz with Provable Barren Plateau Avoidance : Abstract: Variational Quantum Algorithms (VQAs) are critically threatened by the Barren Plateau (BP) phenomenon. In this work, we introduce the H-EFT Variational Ansatz (H-EFT-VA), an architecture ins...
LangLasso: Interactive Cluster Descriptions through LLM Explanation : Abstract: Dimensionality reduction is a powerful technique for revealing structure and potential clusters in data. However, as the axes are complex, non-linear combinations of features, they often lac...
Stable Differentiable Modal Synthesis for Learning Nonlinear Dynamics : Abstract: Modal methods are a long-standing approach to physical modelling synthesis. Extensions to nonlinear problems are possible, including the case of a high-amplitude vibration of a string. A mod...
An analytic theory of convolutional neural network inverse problems solvers : Abstract: Supervised convolutional neural networks (CNNs) are widely used to solve imaging inverse problems, achieving state-of-the-art performance in numerous applications. However, despite their emp...
An Efficient Long-Context Ranking Architecture With Calibrated LLM Distillation: Application to Person-Job Fit : Abstract: Finding the most relevant person for a job proposal in real time is challenging, especially when resumes are long, structured, and multilingual. In this paper, we propose a re-ranking model ...
Sim2Real Deep Transfer for Per-Device CFO Calibration : Abstract: Carrier Frequency Offset (CFO) estimation in Orthogonal Frequency Division Multiplexing (OFDM) systems faces significant performance degradation across heterogeneous software-defined radio (...
Thinking Like Van Gogh: Structure-Aware Style Transfer via Flow-Guided 3D Gaussian Splatting : Abstract: In 1888, Vincent van Gogh wrote, "I am seeking exaggeration in the essential." This principle, amplifying structural form while suppressing photographic detail, lies at the core of Post-Impr...
Instruction Finetuning LLaMA-3-8B Model Using LoRA for Financial Named Entity Recognition : Abstract: Particularly, financial named-entity recognition (NER) is one of the many important approaches to translate unformatted reports and news into structured knowledge graphs. However, free, easy...
SoK: Privacy-aware LLM in Healthcare: Threat Model, Privacy Techniques, Challenges and Recommendations : Abstract: Large Language Models (LLMs) are increasingly adopted in healthcare to support clinical decision-making, summarize electronic health records (EHRs), and enhance patient care. However, this i...
VibrantSR: Sub-Meter Canopy Height Models from Sentinel-2 Using Generative Flow Matching : Abstract: We present VibrantSR (Vibrant Super-Resolution), a generative super-resolution framework for estimating 0.5 meter canopy height models (CHMs) from 10 meter Sentinel-2 imagery. Unlike approac...
Breaking the Limits of Open-Weight CLIP: An Optimization Framework for Self-supervised Fine-tuning of CLIP : Abstract: CLIP has become a cornerstone of multimodal representation learning, yet improving its performance typically requires a prohibitively costly process of training from scratch on billions of s...
Accelerated Regularized Wasserstein Proximal Sampling Algorithms : Abstract: We consider sampling from a Gibbs distribution by evolving a finite number of particles using a particular score estimator rather than Brownian motion. To accelerate the particles, we consid...
Detecting Batch Heterogeneity via Likelihood Clustering : Abstract: Batch effects represent a major confounder in genomic diagnostics. In copy number variant (CNV) detection from NGS, many algorithms compare read depth between test samples and a reference sa...
DInf-Grid: A Neural Differential Equation Solver with Differentiable Feature Grids : Abstract: We present a novel differentiable grid-based representation for efficiently solving differential equations (DEs). Widely used architectures for neural solvers, such as sinusoidal neural netw...
High-accuracy and dimension-free sampling with diffusions : Abstract: Diffusion models have shown remarkable empirical success in sampling from rich multi-modal distributions. Their inference relies on numerically solving a certain differential equation. This ...
Distributed Perceptron under Bounded Staleness, Partial Participation, and Noisy Communication : Abstract: We study a semi-asynchronous client-server perceptron trained via iterative parameter mixing (IPM-style averaging): clients run local perceptron updates and a server forms a global model by ...
Communication-Efficient and Privacy-Adaptable Mechanism -- a Federated Learning Scheme with Convergence Analysis : Abstract: Federated learning enables multiple parties to jointly train learning models without sharing their own underlying data, offering a practical pathway to privacy-preserving collaboration under...
Data-driven stochastic reduced-order modeling of parametrized dynamical systems : Abstract: Modeling complex dynamical systems under varying conditions is computationally intensive, often rendering high-fidelity simulations intractable. Although reduced-order models (ROMs) offer a ...
Single-Stage Huffman Encoder for ML Compression : Abstract: Training and serving Large Language Models (LLMs) require partitioning data across multiple accelerators, where collective operations are frequently bottlenecked by network bandwidth. Lossle...
STEM: Scaling Transformers with Embedding Modules : Abstract: Fine-grained sparsity promises higher parametric capacity without proportional per-token compute, but often suffers from training instability, load balancing, and communication overhead. We ...
Combinatorial Optimization Augmented Machine Learning : Abstract: Combinatorial optimization augmented machine learning (COAML) has recently emerged as a powerful paradigm for integrating predictive models with combinatorial decision-making. By embedding c...
Kolmogorov Arnold Networks and Multi-Layer Perceptrons: A Paradigm Shift in Neural Modelling : Abstract: The research undertakes a comprehensive comparative analysis of Kolmogorov-Arnold Networks (KAN) and Multi-Layer Perceptrons (MLP), highlighting their effectiveness in solving essential comp...
Mixtures of Transparent Local Models : Abstract: The predominance of machine learning models in many spheres of human activity has led to a growing demand for their transparency. The transparency of models makes it possible to discern some...
Transformer-Based Cognitive Radio: Adaptive Modulation Strategies Using Transformer Models : Abstract: Cognitive Radio (CR) systems, which dynamically adapt to changing spectrum environments, could benefit significantly from advancements in machine learning technologies. These systems can be ...
Communication-Efficient Federated Learning by Exploiting Spatio-Temporal Correlations of Gradients : Abstract: Communication overhead is a critical challenge in federated learning, particularly in bandwidth-constrained networks. Although many methods have been proposed to reduce communication overhea...
DeFlow: Decoupling Manifold Modeling and Value Maximization for Offline Policy Extraction : Abstract: We present DeFlow, a decoupled offline RL framework that leverages flow matching to faithfully capture complex behavior manifolds. Optimizing generative policies is computationally prohibiti...
Reinforcement Learning with Multi-Step Lookahead Information Via Adaptive Batching : Abstract: We study tabular reinforcement learning problems with multiple steps of lookahead information. Before acting, the learner observes $\ell$ steps of future transition and reward realizations: ...
CS-GBA: A Critical Sample-based Gradient-guided Backdoor Attack for Offline Reinforcement Learning : Abstract: Offline Reinforcement Learning (RL) enables policy optimization from static datasets but is inherently vulnerable to backdoor attacks. Existing attack strategies typically struggle against s...
Discrete Feynman-Kac Correctors : Abstract: Discrete diffusion models have recently emerged as a promising alternative to the autoregressive approach for generating discrete sequences. Sample generation via gradual denoising or demask...
PLGC: Pseudo-Labeled Graph Condensation : Abstract: Large graph datasets make training graph neural networks (GNNs) computationally costly. Graph condensation methods address this by generating small synthetic graphs that approximate the orig...
EvoMorph: Counterfactual Explanations for Continuous Time-Series Extrinsic Regression Applied to Photoplethysmography : Abstract: Wearable devices enable continuous, population-scale monitoring of physiological signals, such as photoplethysmography (PPG), creating new opportunities for data-driven clinical assessment. ...
Meta Dynamic Graph for Traffic Flow Prediction : Abstract: Traffic flow prediction is a typical spatio-temporal prediction problem and has a wide range of applications. The core challenge lies in modeling the underlying complex spatio-temporal depen...
We Need a More Robust Classifier: Dual Causal Learning Empowers Domain-Incremental Time Series Classification : Abstract: The World Wide Web thrives on intelligent services that rely on accurate time series classification, which has recently witnessed significant progress driven by advances in deep learning. Ho...
Early Fault Detection on CMAPSS with Unsupervised LSTM Autoencoders : Abstract: This paper introduces an unsupervised health-monitoring framework for turbofan engines that does not require run-to-failure labels. First, operating-condition effects in NASA CMAPSS sensor s...
In-Context Source and Channel Coding : Abstract: Separate Source-Channel Coding (SSCC) remains attractive for text transmission due to its modularity and compatibility with mature entropy coders and powerful channel codes. However, SSCC of...
Fundamental Limitations of Favorable Privacy-Utility Guarantees for DP-SGD : Abstract: Differentially Private Stochastic Gradient Descent (DP-SGD) is the dominant paradigm for private training, but its fundamental limitations under worst-case adversarial privacy definitions re...
Graph Regularized PCA : Abstract: High-dimensional data often exhibit dependencies among variables that violate the isotropic-noise assumption under which principal component analysis (PCA) is optimal. For cases where the no...
Reinforcement Learning to Discover a NorthEast Monsoon Index for Monthly Rainfall Prediction in Thailand : Abstract: Climate prediction is a challenge due to the intricate spatiotemporal patterns within Earth systems. Global climate indices, such as the El Niño Southern Oscillation, are standard input feat...
Bias in the Shadows: Explore Shortcuts in Encrypted Network Traffic Classification : Abstract: Pre-trained models operating directly on raw bytes have achieved promising performance in encrypted network traffic classification (NTC), but often suffer from shortcut learning, relying on s...
CC-OR-Net: A Unified Framework for LTV Prediction through Structural Decoupling : Abstract: Customer Lifetime Value (LTV) prediction, a central problem in modern marketing, is characterized by a unique zero-inflated and long-tail data distribution. This distribution presents two fu...
Multilingual-To-Multimodal (M2M): Unlocking New Languages with Monolingual Text : Abstract: Multimodal models excel in English, supported by abundant image-text and audio-text data, but performance drops sharply for other languages due to limited multilingual multimodal resources. ...
Bayesian Meta-Analyses Could Be More: A Case Study in Trial of Labor After a Cesarean-section Outcomes and Complications : Abstract: The meta-analysis's utility is dependent on previous studies having accurately captured the variables of interest, but in medical studies, a key decision variable that impacts a physician's ...
Adaptive Label Error Detection: A Bayesian Approach to Mislabeled Data Detection : Abstract: Machine learning classification systems are susceptible to poor performance when trained with incorrect ground truth labels, even when data is well-curated by expert annotators. As machine l...
Comparative Evaluation of Deep Learning-Based and WHO-Informed Approaches for Sperm Morphology Assessment : Abstract: Assessment of sperm morphological quality remains a critical yet subjective component of male fertility evaluation, often limited by inter-observer variability and resource constraints. This...
Efficient Content-based Recommendation Model Training via Noise-aware Coreset Selection : Abstract: Content-based recommendation systems (CRSs) utilize content features to predict user-item interactions, serving as essential tools for helping users navigate information-rich web services. H...
Unlabeled Data Can Provably Enhance In-Context Learning of Transformers : Abstract: Large language models (LLMs) exhibit impressive in-context learning (ICL) capabilities, yet the quality of their predictions is fundamentally limited by the few costly labeled demonstrations...
BPE: Behavioral Profiling Ensemble : Abstract: Ensemble learning is widely recognized as a pivotal strategy for pushing the boundaries of predictive performance. Traditional static ensemble methods, such as Stacking, typically assign wei...
Time Aggregation Features for XGBoost Models : Abstract: This paper studies time aggregation features for XGBoost models in click-through rate prediction. The setting is the Avazu click-through rate prediction dataset with strict out-of-time split...
CAFEDistill: Learning Personalized and Dynamic Models through Federated Early-Exit Network Distillation : Abstract: Personalized Federated Learning (PFL) enables collaborative model training on decentralized, heterogeneous data while tailoring models to each client's unique distribution. However, existing...
PID-Guided Partial Alignment for Multimodal Decentralized Federated Learning : Abstract: Multimodal decentralized federated learning (DFL) is challenging because agents differ in available modalities and model architectures, yet must collaborate over peer-to-peer (P2P) networks ...
Continuous-Depth Transformers with Learned Control Dynamics : Abstract: We present a hybrid transformer architecture that replaces discrete middle layers with a continuous-depth Neural Ordinary Differential Equation (ODE) block, enabling inference-time control o...
FaTRQ: Tiered Residual Quantization for LLM Vector Search in Far-Memory-Aware ANNS Systems : Abstract: Approximate Nearest-Neighbor Search (ANNS) is a key technique in retrieval-augmented generation (RAG), enabling rapid identification of the most relevant high-dimensional embeddings from mas...
In-Context Operator Learning on the Space of Probability Measures : Abstract: We introduce in-context operator learning on probability measure spaces for optimal transport (OT). The goal is to learn a single solution operator that maps a pair of distributions t...
An Exploratory Study to Repurpose LLMs to a Unified Architecture for Time Series Classification : Abstract: Time series classification (TSC) is a core machine learning problem with broad applications. Recently there has been growing interest in repurposing large language models (LLMs) for TSC, mot...
Interpolation-Based Optimization for Enforcing lp-Norm Metric Differential Privacy in Continuous and Fine-Grained Domains : Abstract: Metric Differential Privacy (mDP) generalizes Local Differential Privacy (LDP) by adapting privacy guarantees based on pairwise distances, enabling context-aware protection and improved util...
The PROPER Approach to Proactivity: Benchmarking and Advancing Knowledge Gap Navigation : Abstract: Most language-based assistants follow a reactive ask-and-respond paradigm, requiring users to explicitly state their needs. As a result, relevant but unexpressed needs often go unmet. Existi...
A New Convergence Analysis of Plug-and-Play Proximal Gradient Descent Under Prior Mismatch : Abstract: In this work, we provide a new convergence theory for plug-and-play proximal gradient descent (PnP-PGD) under prior mismatch where the denoiser is trained on a different data distribution to...
Eluder dimension: localise it! : Abstract: We establish a lower bound on the eluder dimension of generalised linear model classes, showing that standard eluder dimension-based analysis cannot lead to first-order regret bounds. To add...
TimeSAE: Sparse Decoding for Faithful Explanations of Black-Box Time Series Models : Abstract: As black box models and pretrained models gain traction in time series applications, understanding and explaining their predictions becomes increasingly vital, especially in high-stakes doma...
The Geometry of Thought: Disclosing the Transformer as a Tropical Polynomial Circuit : Abstract: We prove that the Transformer self-attention mechanism in the high-confidence regime ($β\to \infty$, where $β$ is an inverse temperature) operates in the tropical semiring (max-plus algebra)...
Social Determinants of Health Prediction for ICD-9 Code with Reasoning Models : Abstract: Social Determinants of Health correlate with patient outcomes but are rarely captured in structured data. Recent attention has been given to automatically extracting these markers from clini...
Bridging Semantic Understanding and Popularity Bias with LLMs : Abstract: Semantic understanding of popularity bias is a crucial yet underexplored challenge in recommender systems, where popular items are often favored at the expense of niche content. Most existin...
GPU-Accelerated ANNS: Quantized for Speed, Built for Change : Abstract: Approximate nearest neighbor search (ANNS) is a core problem in machine learning and information retrieval applications. GPUs offer a promising path to high-performance ANNS: they provide ma...
A reduced-order derivative-informed neural operator for subsurface fluid-flow : Abstract: Neural operators have emerged as cost-effective surrogates for expensive fluid-flow simulators, particularly in computationally intensive tasks such as permeability inversion from time-lapse...
On the Failure of Latent State Persistence in Large Language Models : Abstract: While Large Language Models (LLMs) excel in reasoning, whether they can sustain persistent latent states remains under-explored. The capacity to maintain and manipulate unexpressed, internal...
Adaptive Querying for Reward Learning from Human Feedback : Abstract: Learning from human feedback is a popular approach to train robots to adapt to user preferences and improve safety. Existing approaches typically consider a single querying (interaction) for...
Debiased Orthogonal Boundary-Driven Efficient Noise Mitigation : Abstract: Mitigating the detrimental effects of noisy labels on the training process has become increasingly critical, as obtaining entirely clean or human-annotated samples for large-scale pre-traini...
SSFL: Discovering Sparse Unified Subnetworks at Initialization for Efficient Federated Learning : Abstract: In this work, we propose Salient Sparse Federated Learning (SSFL), a streamlined approach for sparse federated learning with efficient communication. SSFL identifies a sparse subnetwork prio...
DScheLLM: Enabling Dynamic Scheduling through a Fine-Tuned Dual-System Large Language Model : Abstract: Production scheduling is highly susceptible to dynamic disruptions, such as variations in processing times, machine availability, and unexpected task insertions. Conventional approaches typi...
Compartmentalised Agentic Reasoning for Clinical NLI : Abstract: Large language models can produce fluent judgments for clinical natural language inference, yet they frequently fail when the decision requires the correct inferential schema rather than sur...
CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning : Abstract: Mathematical reasoning remains a significant challenge for large language models (LLMs), despite progress in prompting techniques such as Chain-of-Thought (CoT). We present Chain of Mathem...
Machine Learning and Theory Ladenness -- A Phenomenological Account : Abstract: We provide an analysis of theory ladenness in machine learning in science, where "theory", that we call "domain theory", refers to the domain knowledge of the scientific discipline where ML ...
Leveraging Open-Source Large Language Models for encoding Social Determinants of Health using an Intelligent Router : Abstract: Social Determinants of Health (SDOH), also known as Health-Related Social Needs (HRSN), play a significant role in patient health outcomes. The Centers for Disease Control and Prevention (CD...
MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching : Abstract: Tool-Integrated Reasoning (TIR) empowers large language models (LLMs) to tackle complex tasks by interleaving reasoning steps with external tool interactions. However, existing reinforcement...
Grounding Agent Memory in Contextual Intent : Abstract: Deploying large language models in long-horizon, goal-oriented interactions remains challenging because similar entities and facts recur under different latent goals and constraints, causing...
LIBERTy: A Causal Framework for Benchmarking Concept-Based Explanations of LLMs with Structural Counterfactuals : Abstract: Concept-based explanations quantify how high-level concepts (e.g., gender or experience) influence model behavior, which is crucial for decision-makers in high-stakes domains. Recent work ev...
On the origin of neural scaling laws: from random graphs to natural language : Abstract: Scaling laws have played a major role in the modern AI revolution, providing practitioners predictive power over how the model performance will improve with increasing data, compute, and num...
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding : Abstract: Today's strongest video-language models (VLMs) remain proprietary. The strongest open-weight models either rely on synthetic data from proprietary VLMs, effectively distilling from them, or ...
Procedural Fairness in Multi-Agent Bandits : Abstract: In the context of multi-agent multi-armed bandits (MA-MAB), fairness is often reduced to outcomes: maximizing welfare, reducing inequality, or balancing utilities. However, evidence in psych...
ProbFM: Probabilistic Time Series Foundation Model with Uncertainty Decomposition : Abstract: Time Series Foundation Models (TSFMs) have emerged as a promising approach for zero-shot financial forecasting, demonstrating strong transferability and data efficiency gains. However, their...
Adversarial Evasion Attacks on Computer Vision using SHAP Values : Abstract: The paper introduces a white-box attack on computer vision models using SHAP values. It demonstrates how adversarial evasion attacks can compromise the performance of deep learning models by...
Process-Guided Concept Bottleneck Model : Abstract: Concept Bottleneck Models (CBMs) improve the explainability of black-box Deep Learning (DL) by introducing intermediate semantic concepts. However, standard CBMs often overlook domain-specif...
Learning Latency-Aware Orchestration for Parallel Multi-Agent Systems : Abstract: Multi-agent systems (MAS) enable complex reasoning by coordinating multiple agents, but often incur high inference latency due to multi-step execution and repeated model invocations, severel...
SatMap: Revisiting Satellite Maps as Prior for Online HD Map Construction : Abstract: Online high-definition (HD) map construction is an essential part of a safe and robust end-to-end autonomous driving (AD) pipeline. Onboard camera-based approaches suffer from limited depth ...
Scalable Algorithms for Approximate DNF Model Counting : Abstract: Model counting of Disjunctive Normal Form (DNF) formulas is a critical problem in applications such as probabilistic inference and network reliability. For example, it is often used for quer...
Projected Microbatch Accumulation yields reference-free proximal policy updates for reinforcement learning : Abstract: This note introduces Projected Microbatch Accumulation (PROMA), a proximal policy update method for large language model fine-tuning. PROMA accumulates policy gradients across microbatches b...
Model See, Model Do? Exposure-Aware Evaluation of Bug-vs-Fix Preference in Code LLMs : Abstract: Large language models are increasingly used for code generation and debugging, but their outputs can still contain bugs that originate from training data. Distinguishing whether an LLM pref...
Urban Socio-Semantic Segmentation with Vision-Language Reasoning : Abstract: As hubs of human activity, urban surfaces consist of a wealth of semantic entities. Segmenting these various entities from satellite imagery is crucial for a range of downstream applications...
Contextual StereoSet: Stress-Testing Bias Alignment Robustness in Large Language Models : Abstract: A model that avoids stereotypes in a lab benchmark may not avoid them in deployment. We show that measured bias shifts dramatically when prompts mention different places, times, or audiences...
AgentGuardian: Learning Access Control Policies to Govern AI Agent Behavior : Abstract: Artificial intelligence (AI) agents are increasingly used in a variety of domains to automate tasks, interact with users, and make decisions based on data inputs. Ensuring that AI agents per...
Development of Ontological Knowledge Bases by Leveraging Large Language Models : Abstract: Ontological Knowledge Bases (OKBs) play a vital role in structuring domain-specific knowledge and serve as a foundation for effective knowledge management systems. However, their traditional...
Are Language Models Models? : Abstract: Futrell and Mahowald claim LMs "serve as model systems", but an assessment at each of Marr's three levels suggests the claim is clearly not true at the implementation level, poorly motivated...
Handling Missing Modalities in Multimodal Survival Prediction for Non-Small Cell Lung Cancer : Abstract: Accurate survival prediction in Non-Small Cell Lung Cancer (NSCLC) requires the integration of heterogeneous clinical, radiological, and histopathological information. While Multimodal Deep ...
Global Context Compression with Interleaved Vision-Text Transformation : Abstract: Recent achievements of vision-language models in end-to-end OCR point to a new avenue for low-loss compression of textual information. This motivates earlier works that render the Transforme...
Towards Efficient Low-rate Image Compression with Frequency-aware Diffusion Prior Refinement : Abstract: Recent advancements in diffusion-based generative priors have enabled visually plausible image compression at extremely low bit rates. However, existing approaches suffer from slow sampling ...
SuS: Strategy-aware Surprise for Intrinsic Exploration : Abstract: We propose Strategy-aware Surprise (SuS), a novel intrinsic motivation framework that uses pre-post prediction mismatch as a novelty signal for exploration in reinforcement learning. Unlike ...
Training-Trajectory-Aware Token Selection : Abstract: Efficient distillation is a key pathway for converting expensive reasoning capability into deployable efficiency, yet in the frontier regime where the student already has strong reasoning ab...
OctoBench: Benchmarking Scaffold-Aware Instruction Following in Repository-Grounded Agentic Coding : Abstract: Modern coding scaffolds turn LLMs into capable software agents, but their ability to follow scaffold-specified instructions remains under-examined, especially when constraints are heterogene...
Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale : Abstract: The rise of AI agent frameworks has introduced agent skills, modular packages containing instructions and executable code that dynamically extend agent capabilities. While this architecture ...
DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset : Abstract: Vision-Language Pre-training (VLP) models demonstrate strong performance across various downstream tasks by learning from large-scale image-text pairs through contrastive pretraining. The re...
SPIKE: Sparse Koopman Regularization for Physics-Informed Neural Networks : Abstract: Physics-Informed Neural Networks (PINNs) provide a mesh-free approach for solving differential equations by embedding physical constraints into neural network training. However, PINNs tend t...
Queueing-Aware Optimization of Reasoning Tokens for Accuracy-Latency Trade-offs in LLM Servers : Abstract: We consider a single large language model (LLM) server that serves a heterogeneous stream of queries belonging to $N$ distinct task types. Queries arrive according to a Poisson process, and ...
MoST: Mixing Speech and Text with Modality-Aware Mixture of Experts : Abstract: We present MoST (Mixture of Speech and Text), a novel multimodal large language model that seamlessly integrates speech and text processing through our proposed Modality-Aware Mixture of Exp...
Untangling Input Language from Reasoning Language: A Diagnostic Framework for Cross-Lingual Moral Alignment in LLMs : Abstract: When LLMs judge moral dilemmas, do they reach different conclusions in different languages, and if so, why? Two factors could drive such differences: the language of the dilemma itself, or t...
X-SAM: Boosting Sharpness-Aware Minimization with Dominant-Eigenvector Gradient Correction : Abstract: Sharpness-Aware Minimization (SAM) aims to improve generalization by minimizing a worst-case perturbed loss over a small neighborhood of model parameters. However, during training, its optim...
Loop as a Bridge: Can Looped Transformers Truly Link Representation Space and Natural Language Outputs? : Abstract: Large Language Models (LLMs) often exhibit a gap between their internal knowledge and their explicit linguistic outputs. In this report, we empirically investigate whether Looped Transformer...
Who Owns the Text? Design Patterns for Preserving Authorship in AI-Assisted Writing : Abstract: AI writing assistants can reduce effort and improve fluency, but they may also weaken writers' sense of authorship. We study this tension with an ownership-aware co-writing editor that offer...
Introduction to optimization methods for training SciML models : Abstract: Optimization is central to both modern machine learning (ML) and scientific machine learning (SciML), yet the structure of the underlying optimization problems differs substantially across t...
PADER: Paillier-based Secure Decentralized Social Recommendation : Abstract: The prevalence of recommendation systems also brings privacy concerns to both the users and the sellers, as centralized platforms collect as much data as possible from them. To keep the data...
One Instruction Does Not Fit All: How Well Do Embeddings Align Personas and Instructions in Low-Resource Indian Languages? : Abstract: Aligning multilingual assistants with culturally grounded user preferences is essential for serving India's linguistically diverse population of over one billion speakers across multiple scr...
PRL: Process Reward Learning Improves LLMs' Reasoning Ability and Broadens the Reasoning Boundary : Abstract: Improving the reasoning abilities of Large Language Models (LLMs) has been a continuous topic recently. But most relevant works are based on outcome rewards at the trajectory level, missing ...
HOMURA: Taming the Sand-Glass for Time-Constrained LLM Translation via Reinforcement Learning : Abstract: Large Language Models (LLMs) have achieved remarkable strides in multilingual translation but are hindered by a systemic cross-lingual verbosity bias, rendering them unsuitable for strict ti...
ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack : Abstract: Large Language Models (LLMs) have enabled the development of powerful agentic systems capable of automating complex workflows across various fields. However, these systems are highly vulnera...
RAG-3DSG: Enhancing 3D Scene Graphs with Re-Shot Guided Retrieval-Augmented Generation : Abstract: Open-vocabulary 3D Scene Graph (3DSG) generation can enhance various downstream tasks in robotics, such as manipulation and navigation, by leveraging structured semantic representations. A 3...
AWED-FiNER: Agents, Web applications, and Expert Detectors for Fine-grained Named Entity Recognition across 36 Languages for 6.6 Billion Speakers : Abstract: We introduce AWED-FiNER, an open-source ecosystem designed to bridge the gap in Fine-grained Named Entity Recognition (FgNER) for 36 global languages spoken by more than 6.6 billion people. ...
Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment : Abstract: Pretraining corpora contain extensive discourse about AI systems, yet the causal influence of this discourse on downstream alignment remains poorly understood. If prevailing descriptions of ...
LOOKAT: Lookup-Optimized Key-Attention for Memory-Efficient Transformers : Abstract: Compressing the KV cache is a required step to deploy large language models on edge devices. Current quantization methods compress storage but fail to reduce bandwidth as attention calculati...
Simple Network Graph Comparative Learning : Abstract: The effectiveness of contrastive learning methods has been widely recognized in the field of graph learning, especially in contexts where graph data often lack labels or are difficult to lab...
Understanding and Preserving Safety in Fine-Tuned LLMs : Abstract: Fine-tuning is an essential and pervasive functionality for applying large language models (LLMs) to downstream tasks. However, it has the potential to substantially degrade safety alignment...
Step-by-Step Causality: Transparent Causal Discovery with Multi-Agent Tree-Query and Adversarial Confidence Estimation : Abstract: Causal discovery aims to recover "what causes what", but classical constraint-based methods (e.g., PC, FCI) suffer from error propagation, and recent LLM-based causal oracles often behave ...
Redundancy-Driven Top-$k$ Functional Dependency Discovery : Abstract: Functional dependencies (FDs) are basic constraints in relational databases and are used for many data management tasks. Most FD discovery algorithms find all valid dependencies, but this ca...
LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning : Abstract: Current multimodal latent reasoning often relies on external supervision (e.g., auxiliary images), ignoring intrinsic visual attention dynamics. In this work, we identify a critical Percepti...
Role-Playing Agents Driven by Large Language Models: Current Status, Challenges, and Future Trends : Abstract: In recent years, with the rapid advancement of large language models (LLMs), role-playing language agents (RPLAs) have emerged as a prominent research focus at the intersection of natural la...
TopoDIM: One-shot Topology Generation of Diverse Interaction Modes for Multi-Agent Systems : Abstract: Optimizing communication topology in LLM-based multi-agent systems is critical for enabling collective intelligence. Existing methods mainly rely on spatio-temporal interaction paradigms, whe...
Repository Intelligence Graph: Deterministic Architectural Map for LLM Code Assistants : Abstract: Repository-aware coding agents often struggle to recover build and test structure, especially in multilingual projects where cross-language dependencies are encoded across heterogeneous buil...
SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature : Abstract: Evaluating whether multimodal large language models truly understand long-form scientific papers remains challenging: answer-only metrics and synthetic "Needle-In-A-Haystack" tests often rew...
MathDoc: Benchmarking Structured Extraction and Active Refusal on Noisy Mathematics Exam Papers : Abstract: The automated extraction of structured questions from paper-based mathematics exams is fundamental to intelligent education, yet remains challenging in real-world settings due to severe visu...
FlowAct-R1: Towards Interactive Humanoid Video Generation : Abstract: Interactive humanoid video generation aims to synthesize lifelike visual agents that can engage with humans through continuous and responsive video. Despite recent advances in video synthesi...
V-Zero: Self-Improving Multimodal Reasoning with Zero Annotation : Abstract: Recent advances in multimodal learning have significantly enhanced the reasoning capabilities of vision-language models (VLMs). However, state-of-the-art approaches rely heavily on large-sca...
LeMoF: Level-guided Multimodal Fusion for Heterogeneous Clinical Data : Abstract: Multimodal clinical prediction is widely used to integrate heterogeneous data such as Electronic Health Records (EHR) and biosignals. However, existing methods tend to rely on static modalit...
Difficulty-guided Sampling: Bridging the Target Gap between Dataset Distillation and Downstream Tasks : Abstract: In this paper, we propose difficulty-guided sampling (DGS) to bridge the target gap between the distillation objective and the downstream task, therefore improving the performance of dataset...
Sparse-RL: Breaking the Memory Wall in LLM Reinforcement Learning via Stable Sparse Rollouts : Abstract: Reinforcement Learning (RL) has become essential for eliciting complex reasoning capabilities in Large Language Models (LLMs). However, the substantial memory overhead of storing Key-Value (...
ReaMIL: Reasoning- and Evidence-Aware Multiple Instance Learning for Whole-Slide Histopathology : Abstract: We introduce ReaMIL (Reasoning- and Evidence-Aware MIL), a multiple instance learning approach for whole-slide histopathology that adds a light selection head to a strong MIL backbone. The h...
CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation : Abstract: Recent video generation models have revealed the emergence of Chain-of-Frame (CoF) reasoning, enabling frame-by-frame visual inference. With this capability, video models have been successfu...
What Understanding Means in AI-Laden Astronomy : Abstract: Artificial intelligence is rapidly transforming astronomical research, yet the scientific community has largely treated this transformation as an engineering challenge rather than an epistem...
Empowering Older Adults in Digital Technology Use with Foundation Models : Abstract: While high-quality technology support can assist older adults in using digital applications, many struggle to articulate their issues due to unfamiliarity with technical terminology and age-...
VERHallu: Evaluating and Mitigating Event Relation Hallucination in Video Large Language Models : Abstract: Video Large Language Models (VideoLLMs) exhibit various types of hallucinations. Existing research has primarily focused on hallucinations involving the presence of events, objects, and scen...
Context Volume Drives Performance: Tackling Domain Shift in Extremely Low-Resource Translation via RAG : Abstract: Neural Machine Translation (NMT) models for low-resource languages suffer significant performance degradation under domain shift. We quantify this challenge using Dhao, an indigenous languag...
Performance of AI agents based on reasoning language models on ALD process optimization tasks : Abstract: In this work we explore the performance and behavior of reasoning large language models to autonomously optimize atomic layer deposition (ALD) processes. In the ALD process optimization task...
A Sustainable AI Economy Needs Data Deals That Work for Generators : Abstract: We argue that the machine learning value chain is structurally unsustainable due to an economic data processing inequality: each state in the data cycle from inputs to model weights to synth...
Kinematic Tokenization: Optimization-Based Continuous-Time Tokens for Learnable Decision Policies in Noisy Time Series : Abstract: Transformers are designed for discrete tokens, yet many real-world signals are continuous processes observed through noisy sampling. Discrete tokenizations (raw values, patches, finite diffe...
Malware Classification using Diluted Convolutional Neural Network with Fast Gradient Sign Method : Abstract: Android malware has become an increasingly critical threat to organizations, society and individuals, posing significant risks to privacy, data security and infrastructure. As malware contin...
Learning to Decode in Parallel: Self-Coordinating Neural Network for Real-Time Quantum Error Correction : Abstract: Fast, reliable decoders are pivotal components for enabling fault-tolerant quantum computation (FTQC). Neural network decoders like AlphaQubit have demonstrated potential, achieving higher a...
A Novel Contrastive Loss for Zero-Day Network Intrusion Detection : Abstract: Machine learning has achieved state-of-the-art results in network intrusion detection; however, its performance significantly degrades when confronted by a new attack class -- a zero-day att...
The Algorithmic Gaze: An Audit and Ethnography of the LAION-Aesthetics Predictor Model : Abstract: Visual generative AI models are trained using a one-size-fits-all measure of aesthetic appeal. However, what is deemed "aesthetic" is inextricably linked to personal taste and cultural value...
Transition Matching Distillation for Fast Video Generation : Abstract: Large video diffusion and flow models have achieved remarkable success in high-quality video generation, but their use in real-time interactive applications remains limited due to their inef...
MedVL-SAM2: A unified 3D medical vision-language model for multimodal reasoning and prompt-driven segmentation : Abstract: Recent progress in medical vision-language models (VLMs) has achieved strong performance on image-level text-centric tasks such as report generation and visual question answering (VQA). Howe...
Advancing Model Refinement: Muon-Optimized Distillation and Quantization for LLM Deployment : Abstract: Large Language Models (LLMs) enable advanced natural language processing but face deployment challenges on resource-constrained edge devices due to high computational, memory, and energy dem...
OUTLINEFORGE: Hierarchical Reinforcement Learning with Explicit States for Scientific Writing : Abstract: Scientific paper generation requires document-level planning and factual grounding, but current large language models, despite their strong local fluency, often fail in global structure, inp...
MedRedFlag: Investigating how LLMs Redirect Misconceptions in Real-World Health Communication : Abstract: Real-world health questions from patients often unintentionally embed false assumptions or premises. In such cases, safe medical communication typically involves redirection: addressing the ...
ViSIL: Unified Evaluation of Information Loss in Multimodal Video Captioning : Abstract: Multimodal video captioning condenses dense footage into a structured format of keyframes and natural language. By creating a cohesive multimodal summary, this approach anchors generative AI...
A pipeline for enabling path-specific causal fairness in observational health data : Abstract: When training machine learning (ML) models for potential deployment in a healthcare setting, it is essential to ensure that they do not replicate or exacerbate existing healthcare biases. Al...
LLM-Based Agentic Systems for Software Engineering: Challenges and Opportunities : Abstract: Despite recent advancements in Large Language Models (LLMs), complex Software Engineering (SE) tasks require more collaborative and specialized approaches. This concept paper systematically ...
Explainable Deep Learning for Pediatric Pneumonia Detection in Chest X-Ray Images : Abstract: Background: Pneumonia remains a leading cause of morbidity and mortality among children worldwide, emphasizing the need for accurate and efficient diagnostic support tools. Deep learning has...
QFed: Parameter-Compact Quantum-Classical Federated Learning : Abstract: Organizations and enterprises across domains such as healthcare, finance, and scientific research are increasingly required to extract collective intelligence from distributed, siloed datase...
Diffusion-Driven Deceptive Patches: Adversarial Manipulation and Forensic Detection in Facial Identity Verification : Abstract: This work presents an end-to-end pipeline for generating, refining, and evaluating adversarial patches to compromise facial biometric systems, with applications in forensic analysis and secu...
Enhancing LUT-based Deep Neural Networks Inference through Architecture and Connectivity Optimization : Abstract: Deploying deep neural networks (DNNs) on resource-constrained edge devices such as FPGAs requires a careful balance among latency, power, and hardware resource usage, while maintaining high ...
CLiMB: A Domain-Informed Novelty Detection Clustering Framework for Scientific Discovery : Abstract: In data-driven scientific discovery, a challenge lies in classifying well-characterized phenomena while identifying novel anomalies. Current semi-supervised clustering algorithms do not alwa...
Explicating Tacit Regulatory Knowledge from LLMs to Auto-Formalize Requirements for Compliance Test Case Generation : Abstract: Compliance testing in highly regulated domains is crucial but largely manual, requiring domain experts to translate complex regulations into executable test cases. While large language model...
Investigating Tool-Memory Conflicts in Tool-Augmented LLMs : Abstract: Tool-augmented large language models (LLMs) have powered many applications. However, they are likely to suffer from knowledge conflict. In this paper, we propose a new type of knowledge conf...
Democracy and Distrust in an Era of Artificial Intelligence : Abstract: This essay examines how judicial review should adapt to address challenges posed by artificial intelligence decision-making, particularly regarding minority rights and interests. As I argue ...
Synthetic Data for Veterinary EHR De-identification: Benefits, Limits, and Safety Trade-offs Under Fixed Compute : Abstract: Veterinary electronic health records (vEHRs) contain privacy-sensitive identifiers that limit secondary use. While PetEVAL provides a benchmark for veterinary de-identification, the domain r...
Heterogeneous computing platform for real-time robotics : Abstract: After Industry 4.0 has embraced tight integration between machinery (OT), software (IT), and the Internet, creating a web of sensors, data, and algorithms in service of efficient and reliabl...
Critically Engaged Pragmatism: A Scientific Norm and Social, Pragmatist Epistemology for AI Science Evaluation Tools : Abstract: Crises in peer review capacity, study replication, and AI-fabricated science have intensified interest in automated tools for assessing scientific research. However, the scientific community...
SAGE: Tool-Augmented LLM Task Solving Strategies in Scalable Multi-Agent Environments : Abstract: Large language models (LLMs) have proven to work well in question-answering scenarios, but real-world applications often require access to tools for live information or actuation. For this, ...
R-LAM: Reproducibility-Constrained Large Action Models for Scientific Workflow Automation : Abstract: Large Action Models (LAMs) extend large language models by enabling autonomous decision-making and tool execution, making them promising for automating scientific workflows. However, scienti...
Multi-Agent Cooperative Learning for Robust Vision-Language Alignment under OOD Concepts : Abstract: This paper introduces a novel Multi-Agent Cooperative Learning (MACL) framework to address cross-modal alignment collapse in vision-language models when handling out-of-distribution (OOD) co...
Enhancing Formal Software Specification with Artificial Intelligence : Abstract: Formal software specification is known to enable early error detection and explicit invariants, yet it has seen limited industrial adoption due to its high notation overhead and the expertis...
Formal Safety Guarantees for Autonomous Vehicles using Barrier Certificates : Abstract: Modern AI technologies enable autonomous vehicles to perceive complex scenes, predict human behavior, and make real-time driving decisions. However, these data-driven components often operat...
Reinforced Linear Genetic Programming : Abstract: Linear Genetic Programming (LGP) is a powerful technique that allows for a variety of problems to be solved using a linear representation of programs. However, there still exists some limita...
From Detection to Diagnosis: Advancing Hallucination Analysis with Automated Data Synthesis : Abstract: Hallucinations in Large Language Models (LLMs), defined as the generation of content inconsistent with facts or context, represent a core obstacle to their reliable deployment in critical do...
Closing the Data Loop: Using OpenDataArena to Engineer Superior Training Datasets : Abstract: The construction of Supervised Fine-Tuning (SFT) datasets is a critical yet under-theorized stage in the post-training of Large Language Models (LLMs), as prevalent practices often rely on h...
Clinical Document Metadata Extraction: A Scoping Review : Abstract: Clinical document metadata, such as document type, structure, author role, medical specialty, and encounter setting, is essential for accurate interpretation of information captured in clini...
Enhancing Business Analytics through Hybrid Summarization of Financial Reports : Abstract: Financial reports and earnings communications contain large volumes of structured and semi structured information, making detailed manual analysis inefficient. Earnings conference calls prov...
Eliminating Agentic Workflow for Introduction Generation with Parametric Stage Tokens : Abstract: In recent years, using predefined agentic workflows to guide large language models (LLMs) for literature classification and review has become a research focus. However, writing research intr...
SciNets: Graph-Constrained Multi-Hop Reasoning for Scientific Literature Synthesis : Abstract: Cross-domain scientific synthesis requires connecting mechanistic explanations across fragmented literature, a capability that remains challenging for both retrieval-based systems and uncons...
Forgetting as a Feature: Cognitive Alignment of Large Language Models : Abstract: Large Language Models (LLMs) are often evaluated against ideals of perfect Bayesian inference, yet growing evidence suggests that their in-context reasoning exhibits systematic forgetting of...
Syntactic Framing Fragility: An Audit of Robustness in LLM Ethical Decisions : Abstract: Large language models (LLMs) are increasingly deployed in consequential decision-making settings, yet their robustness to benign prompt variation remains underexplored. In this work, we stud...
SagaScale: A Realistic, Scalable, and High-Quality Long-Context Benchmark Built from Full-Length Novels : Abstract: Large Language Models (LLMs) have shown significant progress, but understanding long and complex documents remains challenging. Many long-context benchmarks have been proposed, but they face...
ADMEDTAGGER: an annotation framework for distillation of expert knowledge for the Polish medical language : Abstract: In this work, we present an annotation framework that demonstrates how a multilingual LLM pretrained on a large corpus can be used as a teacher model to distill the expert knowledge needed f...
Cross-Platform Evaluation of Large Language Model Safety in Pediatric Consultations: Evolution of Adversarial Robustness and the Scale Paradox : Abstract: Background: Large language models (LLMs) are increasingly deployed in medical consultations, yet their safety under realistic user pressures remains understudied. Prior assessments focused on...
Uncertainty-Aware Dynamic Knowledge Graphs for Reliable Question Answering : Abstract: Question answering (QA) systems are increasingly deployed across domains. However, their reliability is undermined when retrieved evidence is incomplete, noisy, or uncertain. Existing knowle...
Bounded Hyperbolic Tangent: A Stable and Efficient Alternative to Pre-Layer Normalization in Large Language Models : Abstract: Pre-Layer Normalization (Pre-LN) is the de facto choice for large language models (LLMs) and is crucial for stable pretraining and effective transfer learning. However, Pre-LN is inefficient...
StatLLaMA: A multi-stage training framework for building a domain-optimized statistical language model : Abstract: This study investigates how to efficiently build a domain-specialized large language model (LLM) for statistics using the lightweight LLaMA-3.2-3B family as the foundation model (FM). We sys...
SALP-CG: Standard-Aligned LLM Pipeline for Classifying and Grading Large Volumes of Online Conversational Health Data : Abstract: Online medical consultations generate large volumes of conversational health data that often embed protected health information, requiring robust methods to classify data categories and assi...
Introducing Axlerod: An LLM-based Chatbot for Assisting Independent Insurance Agents : Abstract: The insurance industry is undergoing a paradigm shift through the adoption of artificial intelligence (AI) technologies, particularly in the realm of intelligent conversational agents. Chatb...
Evaluating Novelty in AI-Generated Research Plans Using Multi-Workflow LLM Pipelines : Abstract: The integration of Large Language Models (LLMs) into the scientific ecosystem raises fundamental questions about the creativity and originality of AI-generated research. Recent work has iden...
Self-Organizing Dual-Buffer Adaptive Clustering Experience Replay (SODACER) for Safe Reinforcement Learning in Optimal Control : Abstract: This paper proposes a novel reinforcement learning framework, named Self-Organizing Dual-buffer Adaptive Clustering Experience Replay (SODACER), designed to achieve safe and scalable optimal...
The Impact of Generative AI on Architectural Conceptual Design: Performance, Creative Self-Efficacy and Cognitive Load : Abstract: Our study examines how generative AI (GenAI) influences performance, creative self-efficacy, and cognitive load in architectural conceptual design tasks. Thirty-six student participants from...
Structure and Diversity Aware Context Bubble Construction for Enterprise Retrieval Augmented Systems : Abstract: Large language model (LLM) contexts are typically constructed using retrieval-augmented generation (RAG), which involves ranking and selecting the top-k passages. The approach causes fragmen...
Are Your Reasoning Models Reasoning or Guessing? A Mechanistic Analysis of Hierarchical Reasoning Models : Abstract: Hierarchical reasoning model (HRM) achieves extraordinary performance on various reasoning tasks, significantly outperforming large language model-based reasoners. To understand the strength...
Multi-Property Synthesis : Abstract: We study LTLf synthesis with multiple properties, where satisfying all properties may be impossible. Instead of enumerating subsets of properties, we compute in one fixed-point computation t...
From Single to Multi-Agent Reasoning: Advancing GeneGPT for Genomics QA : Abstract: Comprehending genomic information is essential for biomedical research, yet extracting data from complex distributed databases remains challenging. Large language models (LLMs) offer potenti...
Generative AI collective behavior needs an interactionist paradigm : Abstract: In this article, we argue that understanding the collective behavior of agents based on large language models (LLMs) is an essential area of inquiry, with important implications in terms of ...
Defending Large Language Models Against Jailbreak Attacks via In-Decoding Safety-Awareness Probing : Abstract: Large language models (LLMs) have achieved impressive performance across natural language tasks and are increasingly deployed in real-world applications. Despite extensive safety alignment e...
A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Doubao 1.8, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5 : Abstract: The rapid evolution of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has produced substantial gains in reasoning, perception, and generative capability across lan...
Diagnosing Generalization Failures in Fine-Tuned LLMs: A Cross-Architectural Study on Phishing Detection : Abstract: The practice of fine-tuning Large Language Models (LLMs) has achieved state-of-the-art performance on specialized tasks, yet diagnosing why these models become brittle and fail to generalize...
Breaking Up with Normatively Monolithic Agency with GRACE: A Reason-Based Neuro-Symbolic Architecture for Safe and Ethical AI Alignment : Abstract: As AI agents become increasingly autonomous, widely deployed in consequential contexts, and efficacious in bringing about real-world impacts, ensuring that their decisions are not only instr...
Panning for Gold: Expanding Domain-Specific Knowledge Graphs with General Knowledge : Abstract: Domain-specific knowledge graphs (DKGs) often lack coverage compared to general knowledge graphs (GKGs). To address this, we introduce Domain-specific Knowledge Graph Fusion (DKGF), a novel ...
ChartComplete: A Taxonomy-based Inclusive Chart Dataset : Abstract: With advancements in deep learning (DL) and computer vision techniques, the field of chart understanding is evolving rapidly. In particular, multimodal large language models (MLLMs) are prov...
NSR-Boost: A Neuro-Symbolic Residual Boosting Framework for Industrial Legacy Models : Abstract: Although the Gradient Boosted Decision Trees (GBDTs) dominate industrial tabular applications, upgrading legacy models in high-concurrency production environments still faces prohibitive ret...
LLMdoctor: Token-Level Flow-Guided Preference Optimization for Efficient Test-Time Alignment of Large Language Models : Abstract: Aligning Large Language Models (LLMs) with human preferences is critical, yet traditional fine-tuning methods are computationally expensive and inflexible. While test-time alignment offers a...
LADFA: A Framework of Using Large Language Models and Retrieval-Augmented Generation for Personal Data Flow Analysis in Privacy Policies : Abstract: Privacy policies help inform people about organisations' personal data processing practices, covering different aspects such as data collection, data storage, and sharing of personal data wi...
ErrEval: Error-Aware Evaluation for Question Generation through Explicit Diagnostics : Abstract: Automatic Question Generation (QG) often produces outputs with critical defects, such as factual hallucinations and answer mismatches. However, existing evaluation methods, including LLM-bas...
Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering : Abstract: The advancement of artificial intelligence toward agentic science is currently bottlenecked by the challenge of ultra-long-horizon autonomy, the ability to sustain strategic coherence and it...
LatentRefusal: Latent-Signal Refusal for Unanswerable Text-to-SQL Queries : Abstract: In LLM-based text-to-SQL systems, unanswerable and underspecified user queries may generate not only incorrect text but also executable programs that yield misleading results or violate safe...
C-GRASP: Clinically-Grounded Reasoning for Affective Signal Processing : Abstract: Heart rate variability (HRV) is a pivotal noninvasive marker for autonomic monitoring; however, applying Large Language Models (LLMs) to HRV interpretation is hindered by physiological hallu...
Evidence-Augmented Policy Optimization with Reward Co-Evolution for Long-Context Reasoning : Abstract: While Reinforcement Learning (RL) has advanced LLM reasoning, applying it to long-context scenarios is hindered by sparsity of outcome rewards. This limitation fails to penalize ungrounded "...
NoReGeo: Non-Reasoning Geometry Benchmark : Abstract: We present NoReGeo, a novel benchmark designed to evaluate the intrinsic geometric understanding of large language models (LLMs) without relying on reasoning or algebraic computation. Unlike...
TRIM: Hybrid Inference via Targeted Stepwise Routing in Multi-Step Reasoning Tasks : Abstract: Multi-step reasoning tasks like mathematical problem solving are vulnerable to cascading failures, where a single incorrect step leads to complete solution breakdown. Current LLM routing met...
Topo-RAG: Topology-aware retrieval for hybrid text-table documents : Abstract: In enterprise datasets, documents are rarely pure. They are not just text, nor just numbers; they are a complex amalgam of narrative and structure. Current Retrieval-Augmented Generation (RA...
GFM4GA: Graph Foundation Model for Group Anomaly Detection : Abstract: Group anomaly detection is crucial in many network applications, but faces challenges due to diverse anomaly patterns. Motivated by the success of large language models (LLMs) in natural lan...
How does downsampling affect needle electromyography signals? A generalisable workflow for understanding downsampling effects on high-frequency time series : Abstract: Automated analysis of needle electromyography (nEMG) signals is emerging as a tool to support the detection of neuromuscular diseases (NMDs), yet the signals' high and heterogeneous sampling...
CtD: Composition through Decomposition in Emergent Communication : Abstract: Compositionality is a cognitive mechanism that allows humans to systematically combine known concepts in novel ways. This study demonstrates how artificial neural agents acquire and utilize ...
MMPG: MoE-based Adaptive Multi-Perspective Graph Fusion for Protein Representation Learning : Abstract: Graph Neural Networks (GNNs) have been widely adopted for Protein Representation Learning (PRL), as residue interaction networks can be naturally represented as graphs. Current GNN-based PRL...
MHub.ai: A Simple, Standardized, and Reproducible Platform for AI Models in Medical Imaging : Abstract: Artificial intelligence (AI) has the potential to transform medical imaging by automating image analysis and accelerating clinical research. However, research and clinical use are limited by...
DecisionLLM: Large Language Models for Long Sequence Decision Exploration : Abstract: Long-sequence decision-making, which is usually addressed through reinforcement learning (RL), is a critical component for optimizing strategic operations in dynamic environments, such as re...
History Is Not Enough: An Adaptive Dataflow System for Financial Time-Series Synthesis : Abstract: In quantitative finance, the gap between training and real-world performance, driven by concept drift and distributional non-stationarity, remains a critical obstacle for building reliable dat...
Is More Context Always Better? Examining LLM Reasoning Capability for Time Interval Prediction : Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in reasoning and prediction across different domains. Yet, their ability to infer temporal regularities from structured...
M^4olGen: Multi-Agent, Multi-Stage Molecular Generation under Precise Multi-Property Constraints : Abstract: Generating molecules that satisfy precise numeric constraints over multiple physicochemical properties is critical and challenging. Although large language models (LLMs) are expressive, they...
Following the Teacher's Footsteps: Scheduled Checkpoint Distillation for Domain-Specific LLMs : Abstract: Large language models (LLMs) are challenging to deploy for domain-specific tasks due to their massive scale. While distilling a fine-tuned LLM into a smaller student model is a promising alt...
MATRIX AS PLAN: Structured Logical Reasoning with Feedback-Driven Replanning : Abstract: As knowledge and semantics on the web grow increasingly complex, enhancing the comprehension and reasoning capabilities of Large Language Models (LLMs) has become particularly important. Chain-of-T...
State of AI: An Empirical 100 Trillion Token Study with OpenRouter : Abstract: The past year has marked a turning point in the evolution and real-world use of large language models (LLMs). With the release of the first widely adopted reasoning model, o1, on December 5t...
FilDeep: Learning Large Deformations of Elastic-Plastic Solids with Multi-Fidelity Data : Abstract: The scientific computation of large deformations in elastic-plastic solids is crucial in various manufacturing applications. Traditional numerical methods exhibit several inherent limitation...
PaperScout: An Autonomous Agent for Academic Paper Search with Process-Aware Sequence-Level Policy Optimization : Abstract: Academic paper search is a fundamental task in scientific research, yet most existing approaches rely on rigid, predefined workflows that struggle with complex, conditional queries. To addre...
Structured Personality Control and Adaptation for LLM Agents : Abstract: Large Language Models (LLMs) are increasingly shaping human-computer interaction (HCI), from personalized assistants to social simulations. Beyond language competence, researchers are explor...
Memo-SQL: Structured Decomposition and Experience-Driven Self-Correction for Training-Free NL2SQL : Abstract: Existing NL2SQL systems face two critical limitations: (1) they rely on in-context learning with only correct examples, overlooking the rich signal in historical error-fix pairs that could g...
SPRInG: Continual LLM Personalization via Selective Parametric Adaptation and Retrieval-Interpolated Generation : Abstract: Personalizing Large Language Models typically relies on static retrieval or one-time adaptation, assuming user preferences remain invariant over time. However, real-world interactions are dy...
Chinese Labor Law Large Language Model Benchmark : Abstract: Recent advances in large language models (LLMs) have led to substantial progress in domain-specific applications, particularly within the legal domain. However, general-purpose models such a...
Hallucination Detection and Mitigation in Large Language Models : Abstract: Large Language Models (LLMs) and Large Reasoning Models (LRMs) offer transformative potential for high-stakes domains like finance and law, but their tendency to hallucinate, generating fact...
CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents : Abstract: AI agents are vulnerable to prompt injection attacks, where malicious content hijacks agent behavior to steal credentials or cause financial loss. The only known robust defense is architectu...
Continuum Memory Architectures for Long-Horizon LLM Agents : Abstract: Retrieval-augmented generation (RAG) has become the default strategy for providing large language model (LLM) agents with contextual knowledge. Yet RAG treats memory as a stateless lookup ta...
Beyond Rule-Based Workflows: An Information-Flow-Orchestrated Multi-Agents Paradigm via Agent-to-Agent Communication from CORAL : Abstract: Most existing Large Language Model (LLM)-based Multi-Agent Systems (MAS) rely on predefined workflows, where human engineers enumerate task states in advance and specify routing rules and co...
Epistemology gives a Future to Complementarity in Human-AI Interactions : Abstract: Human-AI complementarity is the claim that a human supported by an AI system can outperform either alone in a decision-making process. Since its introduction in the human-AI interaction lite...
A Scoping Review of the Ethical Perspectives on Anthropomorphising Large Language Model-Based Conversational Agents : Abstract: Anthropomorphisation -- the phenomenon whereby non-human entities are ascribed human-like qualities -- has become increasingly salient with the rise of large language model (LLM)-based conve...
Thinking Long, but Short: Stable Sequential Test-Time Scaling for Large Reasoning Models : Abstract: Sequential test-time scaling is a promising training-free method to improve large reasoning model accuracy, but as currently implemented, significant limitations have been observed. Inducing...
Improving Chain-of-Thought for Logical Reasoning via Attention-Aware Intervention : Abstract: Modern logical reasoning with LLMs primarily relies on employing complex interactive frameworks that decompose the reasoning process into subtasks solved through carefully designed prompts o...
Antisocial behavior towards large language model users: experimental evidence : Abstract: The rapid spread of large language models (LLMs) has raised concerns about the social reactions they provoke. Prior research documents negative attitudes toward AI users, but it remains uncl...
PCN-Rec: Agentic Proof-Carrying Negotiation for Reliable Governance-Constrained Recommendation : Abstract: Modern LLM-based recommenders can generate compelling ranked lists, but they struggle to reliably satisfy governance constraints such as minimum long-tail exposure or diversity requirements....
GUI-Eyes: Tool-Augmented Perception for Visual Grounding in GUI Agents : Abstract: Recent advances in vision-language models (VLMs) and reinforcement learning (RL) have driven progress in GUI automation. However, most existing methods rely on static, one-shot visual inputs...
AI Survival Stories: a Taxonomic Analysis of AI Existential Risk : Abstract: Since the release of ChatGPT, there has been a lot of debate about whether AI systems pose an existential risk to humanity. This paper develops a general framework for thinking about the exi...

Research Sources: 353 | Generated: 1/16/2026