AI Research News Feeds for February 12th, 2026

AI RESEARCH PAPERS & ACADEMIC SOURCES

Deformation-Recovery Diffusion Model (DRDM): Instance Deformation for Image Manipulation and Synthesis : Abstract: In medical imaging, diffusion models have shown great potential for synthetic image generation tasks. However, these approaches often lack the interpretable connections between the gener...
MITI: SLAM Benchmark for Laparoscopic Surgery : Abstract: We propose a new benchmark for evaluating stereoscopic visual-inertial computer vision algorithms (SLAM/SfM/3D Reconstruction/Visual-Inertial Odometry) for minimally invasive surgical (MI...
Are Dense Labels Always Necessary for 3D Object Detection from Point Cloud? : Abstract: Current state-of-the-art (SOTA) 3D object detection methods often require a large amount of 3D bounding box annotations for training. However, collecting such large-scale densely-supervised ...
Viewpoint Recommendation for Point Cloud Labeling through Interaction Cost Modeling : Abstract: Semantic segmentation of 3D point clouds is important for many applications, such as autonomous driving. To train semantic segmentation models, labeled point cloud segmentation datasets are ...
From Representational Complementarity to Dual Systems: Synergizing VLM and Vision-Only Backbones for End-to-End Driving : Abstract: Vision-Language-Action (VLA) driving augments end-to-end (E2E) planning with language-enabled backbones, yet it remains unclear what changes beyond the usual accuracy-cost trade-off. We rev...
Uncertainty-Aware Ordinal Deep Learning for cross-Dataset Diabetic Retinopathy Grading : Abstract: Diabetes mellitus is a chronic metabolic disorder characterized by persistent hyperglycemia due to insufficient insulin production or impaired insulin utilization. One of its most severe com...
A Systematic Review on Data-Driven Brain Deformation Modeling for Image-Guided Neurosurgery : Abstract: Accurate compensation of brain deformation is a critical challenge for reliable image-guided neurosurgery, as surgical manipulation and tumor resection induce tissue motion that misaligns pr...
URBAN-SPIN: A street-level bikeability index to inform design implementations in historical city centres : Abstract: Cycling is reported by an average of 35% of adults at least once per week across 28 countries, and as vulnerable road users directly exposed to their surroundings, cyclists experience the s...
SceneSmith: Agentic Generation of Simulation-Ready Indoor Scenes : Abstract: Simulation has become a key tool for training and evaluating home robots at scale, yet existing environments fail to capture the diversity and physical complexity of real indoor spaces. Curr...
SurfPhase: 3D Interfacial Dynamics in Two-Phase Flows from Sparse Videos : Abstract: Interfacial dynamics in two-phase flows govern momentum, heat, and mass transfer, yet remain difficult to measure experimentally. Classical techniques face intrinsic limitations near moving ...
PhyCritic: Multimodal Critic Models for Physical AI : Abstract: With the rapid development of large multimodal models, reliable judge and critic models have become essential for open-ended evaluation and preference alignment, providing pairwise preferenc...
HairWeaver: Few-Shot Photorealistic Hair Motion Synthesis with Sim-to-Real Guided Video Diffusion : Abstract: We present HairWeaver, a diffusion-based pipeline that animates a single human image with realistic and expressive hair dynamics. While existing methods successfully control body pose, they ...
FastFlow: Accelerating The Generative Flow Matching Models with Bandit Inference : Abstract: Flow-matching models deliver state-of-the-art fidelity in image and video generation, but the inherent sequential denoising process renders them slower. Existing acceleration methods like di...
PuriLight: A Lightweight Shuffle and Purification Framework for Monocular Depth Estimation : Abstract: We propose PuriLight, a lightweight and efficient framework for self-supervised monocular depth estimation, to address the dual challenges of computational efficiency and detail preservation...
LaSSM: Efficient Semantic-Spatial Query Decoding via Local Aggregation and State Space Models for 3D Instance Segmentation : Abstract: Query-based 3D scene instance segmentation from point clouds has attained notable performance. However, existing methods suffer from the query initialization dilemma due to the sparse nature...
Interpretable Vision Transformers in Monocular Depth Estimation via SVDA : Abstract: Monocular depth estimation is a central problem in computer vision with applications in robotics, AR, and autonomous driving, yet the self-attention mechanisms that drive modern Transformer ...
Interpretable Vision Transformers in Image Classification via SVDA : Abstract: Vision Transformers (ViTs) have achieved state-of-the-art performance in image classification, yet their attention mechanisms often remain opaque and exhibit dense, non-structured behaviors....
DFIC: Towards a balanced facial image dataset for automatic ICAO compliance verification : Abstract: Ensuring compliance with ISO/IEC and ICAO standards for facial images in machine-readable travel documents (MRTDs) is essential for reliable identity verification, but current manual inspect...
VFGS-Net: Frequency-Guided State-Space Learning for Topology-Preserving Retinal Vessel Segmentation : Abstract: Accurate retinal vessel segmentation is a critical prerequisite for quantitative analysis of retinal images and computer-aided diagnosis of vascular diseases such as diabetic retinopathy. Ho...
Towards Learning a Generalizable 3D Scene Representation from 2D Observations : Abstract: We introduce a Generalizable Neural Radiance Field approach for predicting 3D workspace occupancy from egocentric robot observations. Unlike prior methods operating in camera-centric coordin...
FastUSP: A Multi-Level Collaborative Acceleration Framework for Distributed Diffusion Model Inference : Abstract: Large-scale diffusion models such as FLUX (12B parameters) and Stable Diffusion 3 (8B parameters) require multi-GPU parallelism for efficient inference. Unified Sequence Parallelism (USP), w...
ResWorld: Temporal Residual World Model for End-to-End Autonomous Driving : Abstract: The comprehensive understanding capabilities of world models for driving scenarios have significantly improved the planning accuracy of end-to-end autonomous driving frameworks. However, the...
Chart Specification: Structural Representations for Incentivizing VLM Reasoning in Chart-to-Code Generation : Abstract: Vision-Language Models (VLMs) have shown promise in generating plotting code from chart images, yet achieving structural fidelity remains challenging. Existing approaches largely rely on sup...
Stride-Net: Fairness-Aware Disentangled Representation Learning for Chest X-Ray Diagnosis : Abstract: Deep neural networks for chest X-ray classification achieve strong average performance, yet often underperform for specific demographic subgroups, raising critical concerns about clinical sa...
Hyperspectral Smoke Segmentation via Mixture of Prototypes : Abstract: Smoke segmentation is critical for wildfire management and industrial safety applications. Traditional visible-light-based methods face limitations due to insufficient spectral information, ...
Resource-Efficient RGB-Only Action Recognition for Edge Deployment : Abstract: Action recognition on edge devices poses stringent constraints on latency, memory, storage, and power consumption. While auxiliary modalities such as skeleton and depth information can enhan...
DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories : Abstract: Existing multimodal retrieval systems excel at semantic matching but implicitly assume that query-image relevance can be measured in isolation. This paradigm overlooks the rich dependencies ...
DMP-3DAD: Cross-Category 3D Anomaly Detection via Realistic Depth Map Projection with Few Normal Samples : Abstract: Cross-category anomaly detection for 3D point clouds aims to determine whether an unseen object belongs to a target category using only a few normal examples. Most existing methods rely on c...
From Steering to Pedalling: Do Autonomous Driving VLMs Generalize to Cyclist-Assistive Spatial Perception and Planning? : Abstract: Cyclists often encounter safety-critical situations in urban traffic, highlighting the need for assistive systems that support safe and informed decision-making. Recently, vision-language mo...
Dual-End Consistency Model : Abstract: The slow iterative sampling nature remains a major bottleneck for the practical deployment of diffusion and flow-based generative models. While consistency models (CMs) represent a state-of-...
Text-to-Vector Conversion for Residential Plan Design : Abstract: Computer graphics, comprising both raster and vector components, is a fundamental part of modern science, industry, and digital communication. While raster graphics offer ease of use, its pi...
OccFace: Unified Occlusion-Aware Facial Landmark Detection with Per-Point Visibility : Abstract: Accurate facial landmark detection under occlusion remains challenging, especially for human-like faces with large appearance variation and rotation-driven self-occlusion. Existing detectors...
Ecological mapping with geospatial foundation models : Abstract: Geospatial foundation models (GFMs) are a fast-emerging paradigm for various geospatial tasks, such as ecological mapping. However, the utility of GFMs has not been fully explored for high-v...
FGAA-FPN: Foreground-Guided Angle-Aware Feature Pyramid Network for Oriented Object Detection : Abstract: With the increasing availability of high-resolution remote sensing and aerial imagery, oriented object detection has become a key capability for geographic information updating, maritime sur...
(MGS)$^2$-Net: Unifying Micro-Geometric Scale and Macro-Geometric Structure for Cross-View Geo-Localization : Abstract: Cross-view geo-localization (CVGL) is pivotal for GNSS-denied UAV navigation but remains brittle under the drastic geometric misalignment between oblique aerial views and orthographic satell...
AMAP-APP: Efficient Segmentation and Morphometry Quantification of Fluorescent Microscopy Images of Podocytes : Abstract: Background: Automated podocyte foot process quantification is vital for kidney research, but the established "Automatic Morphological Analysis of Podocytes" (AMAP) method is hindered by high...
Dynamic Frequency Modulation for Controllable Text-driven Image Generation : Abstract: The success of text-guided diffusion models has established a new image generation paradigm driven by the iterative refinement of text prompts. However, modifying the original text prompt to...
AurigaNet: A Real-Time Multi-Task Network for Enhanced Urban Driving Perception : Abstract: Self-driving cars hold significant potential to reduce traffic accidents, alleviate congestion, and enhance urban mobility. However, developing reliable AI systems for autonomous vehicles re...
Multimodal Priors-Augmented Text-Driven 3D Human-Object Interaction Generation : Abstract: We address the challenging task of text-driven 3D human-object interaction (HOI) motion generation. Existing methods primarily rely on a direct text-to-HOI mapping, which suffers from three ...
VideoSTF: Stress-Testing Output Repetition in Video Large Language Models : Abstract: Video Large Language Models (VideoLLMs) have recently achieved strong performance in video understanding tasks. However, we identify a previously underexplored generation failure: severe out...
Eliminating VAE for Fast and High-Resolution Generative Detail Restoration : Abstract: Diffusion models have attained remarkable breakthroughs in the real-world super-resolution (SR) task, albeit at slow inference and high demand on devices. To accelerate inference, recent wor...
Improving Medical Visual Reinforcement Fine-Tuning via Perception and Reasoning Augmentation : Abstract: While recent advances in Reinforcement Fine-Tuning (RFT) have shown that rule-based reward schemes can enable effective post-training for large language models, their extension to cross-moda...
Fast Person Detection Using YOLOX With AI Accelerator For Train Station Safety : Abstract: Recently, image processing has advanced faster and been applied in many fields, including health, industry, and transportation. In the transportation sector, object detection is widely used to im...
Enhancing YOLOv11n for Reliable Child Detection in Noisy Surveillance Footage : Abstract: This paper presents a practical and lightweight solution for enhancing child detection in low-quality surveillance footage, a critical component in real-world missing child alert and daycare...
Enhancing Underwater Images via Adaptive Semantic-aware Codebook Learning : Abstract: Underwater Image Enhancement (UIE) is an ill-posed problem where natural clean references are not available, and the degradation levels vary significantly across semantic regions. Existing U...
MapVerse: A Benchmark for Geospatial Question Answering on Diverse Real-World Maps : Abstract: Maps are powerful carriers of structured and contextual knowledge, encompassing geography, demographics, infrastructure, and environmental patterns. Reasoning over such knowledge requires mo...
3DXTalker: Unifying Identity, Lip Sync, Emotion, and Spatial Dynamics in Expressive 3D Talking Avatars : Abstract: Audio-driven 3D talking avatar generation is increasingly important in virtual communication, digital humans, and interactive media, where avatars must preserve identity, synchronize lip mot...
Med-SegLens: Latent-Level Model Diffing for Interpretable Medical Image Segmentation : Abstract: Modern segmentation models achieve strong predictive performance but remain largely opaque, limiting our ability to diagnose failures, understand dataset shift, or intervene in a principled ...
The Garbage Dataset (GD): A Multi-Class Image Benchmark for Automated Waste Segregation : Abstract: This study introduces the Garbage Dataset (GD), a publicly available image dataset designed to advance automated waste segregation through machine learning and computer vision. It's a divers...
Characterizing and Optimizing the Spatial Kernel of Multi Resolution Hash Encodings : Abstract: Multi-Resolution Hash Encoding (MHE), the foundational technique behind Instant Neural Graphics Primitives, provides a powerful parameterization for neural fields. However, its spatial behav...
End-to-End LiDAR optimization for 3D point cloud registration : Abstract: LiDAR sensors are a key modality for 3D perception, yet they are typically designed independently of downstream tasks such as point cloud registration. Conventional registration operates on ...
Towards Remote Sensing Change Detection with Neural Memory : Abstract: Remote sensing change detection is essential for environmental monitoring, urban planning, and related applications. However, current methods often struggle to capture long-range dependencie...
HII-DPO: Eliminate Hallucination via Accurate Hallucination-Inducing Counterfactual Images : Abstract: Large Vision-Language Models (VLMs) have achieved remarkable success across diverse multimodal tasks but remain vulnerable to hallucinations rooted in inherent language bias. Despite recent ...
Comp2Comp: Open-Source Software with FDA-Cleared Artificial Intelligence Algorithms for Computed Tomography Image Analysis : Abstract: Artificial intelligence allows automatic extraction of imaging biomarkers from already-acquired radiologic images. This paradigm of opportunistic imaging adds value to medical imaging withou...
Monte Carlo Maximum Likelihood Reconstruction for Digital Holography with Speckle : Abstract: In coherent imaging, speckle is statistically modeled as multiplicative noise, posing a fundamental challenge for image reconstruction. While maximum likelihood estimation (MLE) provides a p...
A Low-Rank Defense Method for Adversarial Attack on Diffusion Models : Abstract: Recently, adversarial attacks for diffusion models as well as their fine-tuning process have been developed rapidly. To prevent the abuse of these attack algorithms from affecting the practi...
Colorimeter-Supervised Skin Tone Estimation from Dermatoscopic Images for Fairness Auditing : Abstract: Neural-network-based diagnosis from dermatoscopic images is increasingly used for clinical decision support, yet studies report performance disparities across skin tones. Fairness auditing o...
PMMA: The Polytechnique Montreal Mobility Aids Dataset : Abstract: This study introduces a new object detection dataset of pedestrians using mobility aids, named PMMA. The dataset was collected in an outdoor environment, where volunteers used wheelchairs, c...
XSPLAIN: XAI-enabling Splat-based Prototype Learning for Attribute-aware INterpretability : Abstract: 3D Gaussian Splatting (3DGS) has rapidly become a standard for high-fidelity 3D reconstruction, yet its adoption in multiple critical domains is hindered by the lack of interpretability of t...
DEGMC: Denoising Diffusion Models Based on Riemannian Equivariant Group Morphological Convolutions : Abstract: In this work, we address two major issues in recent Denoising Diffusion Probabilistic Models (DDPM): 1) geometric key feature extraction and 2) network equivariance. Since the DD...
ArtisanGS: Interactive Tools for Gaussian Splat Selection with AI and Human in the Loop : Abstract: Representations in the family of 3D Gaussian Splats (3DGS) are growing into a viable alternative to traditional graphics for an expanding number of applications, including recent techniques th...
MPA: Multimodal Prototype Augmentation for Few-Shot Learning : Abstract: Recently, few-shot learning (FSL) has become a popular task that aims to recognize new classes from only a few labeled examples and has been widely applied in fields such as natural science,...
Training-Induced Bias Toward LLM-Generated Content in Dense Retrieval : Abstract: Dense retrieval is a promising approach for acquiring relevant context or world knowledge in open-domain natural language processing tasks and is now widely used in information retrieval app...
RE-LLM: Refining Empathetic Speech-LLM Responses by Integrating Emotion Nuance : Abstract: With generative AI advancing, empathy in human-AI interaction is essential. While prior work focuses on emotional reflection, emotional exploration, key to deeper engagement, remains overloo...
ISD-Agent-Bench: A Comprehensive Benchmark for Evaluating LLM-based Instructional Design Agents : Abstract: Large Language Model (LLM) agents have shown promising potential in automating Instructional Systems Design (ISD), a systematic approach to developing educational programs. However, evaluati...
TestExplora: Benchmarking LLMs for Proactive Bug Discovery via Repository-Level Test Generation : Abstract: Given that Large Language Models (LLMs) are increasingly applied to automate software development, comprehensive software assurance spans three distinct goals: regression prevention, reactiv...
The Landscape of Prompt Injection Threats in LLM Agents: From Taxonomy to Analysis : Abstract: The evolution of Large Language Models (LLMs) has resulted in a paradigm shift towards autonomous agents, necessitating robust security against Prompt Injection (PI) vulnerabilities where un...
MLDocRAG: Multimodal Long-Context Document Retrieval Augmented Generation : Abstract: Understanding multimodal long-context documents that comprise multimodal chunks such as paragraphs, figures, and tables is challenging due to (1) cross-modal heterogeneity to localize releva...
VERA: Identifying and Leveraging Visual Evidence Retrieval Heads in Long-Context Understanding : Abstract: While Vision-Language Models (VLMs) have shown promise in textual understanding, they face significant challenges when handling long context and complex reasoning tasks. In this paper, we di...
AntigenLM: Structure-Aware DNA Language Modeling for Influenza : Abstract: Language models have advanced sequence analysis, yet DNA foundation models often lag behind task-specific methods for unclear reasons. We present AntigenLM, a generative DNA language model p...
Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning : Abstract: Supervised fine-tuning (SFT) on chain-of-thought data is an essential post-training step for reasoning language models. Standard machine learning intuition suggests that training with more u...
TEGRA: Text Encoding With Graph and Retrieval Augmentation for Misinformation Detection : Abstract: Misinformation detection is a critical task that can benefit significantly from the integration of external knowledge, much like manual fact-checking. In this work, we propose a novel method...
Can Large Language Models Make Everyone Happy? : Abstract: Misalignment in Large Language Models (LLMs) refers to the failure to simultaneously satisfy safety, value, and cultural dimensions, leading to behaviors that diverge from human expectations...
Simultaneous Speech-to-Speech Translation Without Aligned Data : Abstract: Simultaneous speech translation requires translating source speech into a target language in real-time while handling non-monotonic word dependencies. Traditional approaches rely on supervis...
Embedding Inversion via Conditional Masked Diffusion Language Models : Abstract: We frame embedding inversion as conditional masked diffusion, recovering all tokens in parallel through iterative denoising rather than sequential autoregressive generation. A masked diffusi...
C-MOP: Integrating Momentum and Boundary-Aware Clustering for Enhanced Prompt Evolution : Abstract: Automatic prompt optimization is a promising direction to boost the performance of Large Language Models (LLMs). However, existing methods often suffer from noisy and conflicting update sign...
I can tell whether you are a Native Hawlêri Speaker! How ANN, CNN, and RNN perform in NLI-Native Language Identification : Abstract: Native Language Identification (NLI) is a task in Natural Language Processing (NLP) that typically determines the native language of an author through their writing or a speaker through thei...
Reinforced Curriculum Pre-Alignment for Domain-Adaptive VLMs : Abstract: Vision-Language Models (VLMs) demonstrate remarkable general-purpose capabilities but often fall short in specialized domains such as medical imaging or geometric problem-solving. Supervised...
Macaron: Controlled, Human-Written Benchmark for Multilingual and Multicultural Reasoning via Template-Filling : Abstract: Multilingual benchmarks rarely test reasoning over culturally grounded premises: translated datasets keep English-centric scenarios, while culture-first datasets often lack control over the ...
Targeted Syntactic Evaluation of Language Models on Georgian Case Alignment : Abstract: This paper evaluates the performance of transformer-based language models on split-ergative case alignment in Georgian, a particularly rare system for assigning grammatical cases to mark arg...
Benchmarks Are Not That Out of Distribution: Word Overlap Predicts Performance : Abstract: Understanding what constitutes high-quality pre-training data remains a central question in language model training. In this work, we investigate whether benchmark performance is primarily d...
UMEM: Unified Memory Extraction and Management Framework for Generalizable Memory : Abstract: Self-evolving memory serves as the trainable parameters for Large Language Models (LLMs)-based agents, where extraction (distilling insights from experience) and management (updating the mem...
How Do Decoder-Only LLMs Perceive Users? Rethinking Attention Masking for User Representation Learning : Abstract: Decoder-only large language models are increasingly used as behavioral encoders for user representation learning, yet the impact of attention masking on the quality of user embeddings remain...
On the Robustness of Knowledge Editing for Detoxification : Abstract: Knowledge-Editing-based (KE-based) detoxification has emerged as a promising approach for mitigating harmful behaviours in Large Language Models. Existing evaluations, however, largely rely ...
Canvas-of-Thought: Grounding Reasoning via Mutable Structured States : Abstract: While Chain-of-Thought (CoT) prompting has significantly advanced the reasoning capabilities of Multimodal Large Language Models (MLLMs), relying solely on linear text sequences remains a bo...
Neuro-Symbolic Synergy for Interactive World Modeling : Abstract: Large language models (LLMs) exhibit strong general-purpose reasoning capabilities, yet they frequently hallucinate when used as world models (WMs), where strict compliance with deterministi...
LATA: A Tool for LLM-Assisted Translation Annotation : Abstract: The construction of high-quality parallel corpora for translation research has increasingly evolved from simple sentence alignment to complex, multi-layered annotation tasks. This methodolog...
EVOKE: Emotion Vocabulary Of Korean and English : Abstract: This paper introduces EVOKE, a parallel dataset of emotion vocabulary in English and Korean. The dataset offers comprehensive coverage of emotion words in each language, in addition to many-...
When are We Worried? Temporal Trends of Anxiety and What They Reveal about Us : Abstract: In this short paper, we make use of a recently created lexicon of word-anxiety associations to analyze large amounts of US and Canadian social media data (tweets) to explore *when* we are an...
When Tables Go Crazy: Evaluating Multimodal Models on French Financial Documents : Abstract: Vision-language models (VLMs) perform well on many document understanding tasks, yet their reliability in specialized, non-English domains remains underexplored. This gap is especially criti...
Triggers Hijack Language Circuits: A Mechanistic Analysis of Backdoor Behaviors in Large Language Models : Abstract: Backdoor attacks pose significant security risks for Large Language Models (LLMs), yet the internal mechanisms by which triggers operate remain poorly understood. We present the first mechan...
Autonomous Continual Learning of Computer-Use Agents for Environment Adaptation : Abstract: Real-world digital environments are highly diverse and dynamic. These characteristics cause agents to frequently encounter unseen scenarios and distribution shifts, making continual learning...
When Less Is More? Diagnosing ASR Predictions in Sardinian via Layer-Wise Decoding : Abstract: Recent studies have shown that intermediate layers in multilingual speech models often encode more phonetically accurate representations than the final output layer. In this work, we apply a...
The Subjectivity of Respect in Police Traffic Stops: Modeling Community Perspectives in Body-Worn Camera Footage : Abstract: Traffic stops are among the most frequent police-civilian interactions, and body-worn cameras (BWCs) provide a unique record of how these encounters unfold. Respect is a central dimension of...
On Emergent Social World Models -- Evidence for Functional Integration of Theory of Mind and Pragmatic Reasoning in Language Models : Abstract: This paper investigates whether LMs recruit shared computational mechanisms for general Theory of Mind (ToM) and language-specific pragmatic reasoning in order to contribute to the general q...
Latent Thoughts Tuning: Bridging Context and Reasoning with Fused Information in Latent Tokens : Abstract: While explicit Chain-of-Thought (CoT) equips Large Language Models (LLMs) with strong reasoning capabilities, it requires models to verbalize every intermediate step in text tokens, constrai...
Reviewing the Reviewer: Elevating Peer Review Quality through LLM-Guided Feedback : Abstract: Peer review is central to scientific quality, yet reliance on simple heuristics (lazy thinking) has lowered standards. Prior work treats lazy thinking detection as a single-label task, b...
Efficient Causal Structure Learning via Modular Subgraph Integration : Abstract: Learning causal structures from observational data remains a fundamental yet computationally intensive task, particularly in high-dimensional settings where existing methods face challenges ...
Exponential time differencing for matrix-valued dynamical systems : Abstract: Matrix evolution equations occur in many applications, such as dynamical Lyapunov/Sylvester systems or Riccati equations in optimization and stochastic control, machine learning or data assi...
Diffusion posterior sampling for simulation-based inference in tall data settings : Abstract: Identifying the parameters of a non-linear model that best explain observed data is a core task across scientific fields. When such models rely on complex simulators, evaluating the likeliho...
Fine-grained Analysis of Non-parametric Estimation for Pairwise Learning : Abstract: In this paper, we are concerned with the generalization performance of non-parametric estimation for pairwise learning. Most of the existing work requires the hypothesis space to be convex o...
Airway Tree Modeling Using Dual-channel 3D UNet 3+ with Vesselness Prior : Abstract: Lung airway tree modeling is essential for the diagnosis of pulmonary diseases, especially with X-ray computed tomography (CT). The airway tree modeling on CT images can provide t...
Neural Score Matching for High-Dimensional Causal Inference : Abstract: Traditional methods for matching in causal inference are impractical for high-dimensional datasets. They suffer from the curse of dimensionality: exact matching and coarsened exact matching ...
Implicit Hypothesis Testing and Divergence Preservation in Neural Network Representations : Abstract: We study the supervised training dynamics of neural classifiers through the lens of binary hypothesis testing. We model classification as a set of binary tests between class-conditional dist...
Hypercube Policy Regularization Framework for Offline Reinforcement Learning : Abstract: Offline reinforcement learning has received extensive attention from scholars because it avoids the interaction between the agent and the environment by learning a policy through a static da...
Towards Privacy-Guaranteed Label Unlearning in Vertical Federated Learning: Few-Shot Forgetting without Disclosure : Abstract: This paper addresses the critical challenge of unlearning in Vertical Federated Learning (VFL), a setting that has received far less attention than its horizontal counterpart. Specifically, ...
IGC-Net for conditional average potential outcome estimation over time : Abstract: Estimating potential outcomes for treatments over time based on observational data is important for personalized decision-making in medicine. However, many existing methods for this task fai...
Goal-Conditioned Reinforcement Learning from Sub-Optimal Data on Metric Spaces : Abstract: We study the problem of learning optimal behavior from sub-optimal datasets for goal-conditioned offline reinforcement learning under sparse rewards, invertible actions and deterministic tra...
Learning-based agricultural management in partially observable environments subject to climate variability : Abstract: Agricultural management, with a particular focus on fertilization strategies, holds a central role in shaping crop yield, economic profitability, and environmental sustainability. While conv...
Multi-modal Gaussian Process Variational Autoencoders for Neural and Behavioral Data : Abstract: Characterizing the relationship between neural population activity and behavioral data is a central goal of neuroscience. While latent variable models (LVMs) are successful in describing hig...
YOR: Your Own Mobile Manipulator for Generalizable Robotics : Abstract: Recent advances in robot learning have generated significant interest in capable platforms that may eventually approach human-level competence. This interest, combined with the commoditizati...
SCRAPL: Scattering Transform with Random Paths for Machine Learning : Abstract: The Euclidean distance between wavelet scattering transform coefficients (known as paths) provides informative gradients for perceptual quality assessment of deep inverse problems in compute...
LCIP: Loss-Controlled Inverse Projection of High-Dimensional Image Data : Abstract: Projections (or dimensionality reduction) methods $P$ aim to map high-dimensional data to typically 2D scatterplots for visual exploration. Inverse projection methods $P^{-1}$ aim to map thi...
Renet: Principled and Efficient Relaxation for the Elastic Net via Dynamic Objective Selection : Abstract: We introduce Renet, a principled generalization of the Relaxed Lasso to the Elastic Net family of estimators. While, on the one hand, $\ell_1$-regularization is a standard tool for variable ...
First International StepUP Competition for Biometric Footstep Recognition: Methods, Results and Remaining Challenges : Abstract: Biometric footstep recognition, based on a person's unique pressure patterns under their feet during walking, is an emerging field with growing applications in security and safety. However, ...
A Gibbs posterior sampler for inverse problem based on prior diffusion model : Abstract: This paper addresses the issue of inversion in cases where (1) the observation system is modeled by a linear transformation and additive noise, (2) the problem is ill-posed and regularizatio...
Characterizing Trainability of Instantaneous Quantum Polynomial Circuit Born Machines : Abstract: Instantaneous quantum polynomial quantum circuit Born machines (IQP-QCBMs) have been proposed as quantum generative models with a classically tractable training objective based on the maximu...
The emergence of numerical representations in communicating artificial agents : Abstract: Human languages provide efficient systems for expressing numerosities, but whether the sheer pressure to communicate is enough for numerical representations to arise in artificial agents, an...
Variational Optimality of Föllmer Processes in Generative Diffusions : Abstract: We construct and analyze generative diffusions that transport a point mass to a prescribed target distribution over a finite time horizon using the stochastic interpolant framework. The drif...
Optimal Initialization in Depth: Lyapunov Initialization and Limit Theorems for Deep Leaky ReLU Networks : Abstract: The development of effective initialization methods requires an understanding of random neural networks. In this work, a rigorous probabilistic analysis of deep unbiased Leaky ReLU networks ...
SoftMatcha 2: A Fast and Soft Pattern Matcher for Trillion-Scale Corpora : Abstract: We present an ultra-fast and flexible search algorithm that enables search over trillion-scale natural language corpora in under 0.3 seconds while handling semantic variations (substitution,...
Anomaly Detection with Machine Learning Algorithms in Large-Scale Power Grids : Abstract: We apply several machine learning algorithms to the problem of anomaly detection in operational data for large-scale, high-voltage electric power grids. We observe important differences in t...
Deep Learning of Compositional Targets with Hierarchical Spectral Methods : Abstract: Why depth yields a genuine computational advantage over shallow methods remains a central open question in learning theory. We study this question in a controlled high-dimensional Gaussian s...
Self-Supervised Learning for Speaker Recognition: A study and review : Abstract: Deep learning models trained in a supervised setting have revolutionized audio and speech processing. However, their performance inherently depends on the quantity of human-annotated data, m...
Why Does RL Generalize Better Than SFT? A Data-Centric Perspective on VLM Post-Training : Abstract: The adaptation of large-scale Vision-Language Models (VLMs) through post-training reveals a pronounced generalization gap: models fine-tuned with Reinforcement Learning (RL) consistently ach...
Deep Learning-based Method for Expressing Knowledge Boundary of Black-Box LLM : Abstract: Large Language Models (LLMs) have achieved remarkable success; however, the emergence of content generation distortion (hallucination) limits their practical applications. The core cause of ...
Bayesian Signal Component Decomposition via Diffusion-within-Gibbs Sampling : Abstract: In signal processing, the data collected from sensing devices is often a noisy linear superposition of multiple components, and the estimation of components of interest constitutes a crucial...
Spectral-Spatial Contrastive Learning Framework for Regression on Hyperspectral Data : Abstract: Contrastive learning has demonstrated great success in representation learning, especially for image classification tasks. However, there is still a shortage in studies targeting regression ...
A Unified Experimental Architecture for Informative Path Planning: from Simulation to Deployment with GuadalPlanner : Abstract: The evaluation of informative path planning algorithms for autonomous vehicles is often hindered by fragmented execution pipelines and limited transferability between simulation and real-wor...
Robust Assortment Optimization from Observational Data : Abstract: Assortment optimization is a fundamental challenge in modern retail and recommendation systems, where the goal is to select a subset of products that maximizes expected revenue under complex...
Convergence Rates for Distribution Matching with Sliced Optimal Transport : Abstract: We study the slice-matching scheme, an efficient iterative method for distribution matching based on sliced optimal transport. We investigate convergence to the target distribution and deriv...
Beyond Task Performance: A Metric-Based Analysis of Sequential Cooperation in Heterogeneous Multi-Agent Destructive Foraging : Abstract: This work addresses the problem of analyzing cooperation in heterogeneous multi-agent systems which operate under partial observability and temporal role dependency, framed within a destruct...
A solvable high-dimensional model where nonlinear autoencoders learn structure invisible to PCA while test loss misaligns with generalization : Abstract: Many real-world datasets contain hidden structure that cannot be detected by simple linear correlations between input features. For example, latent factors may influence the data in a coordi...
From Diet to Free Lunch: Estimating Auxiliary Signal Properties using Dynamic Pruning Masks in Speech Enhancement Networks : Abstract: Speech Enhancement (SE) in audio devices is often supported by auxiliary modules for Voice Activity Detection (VAD), SNR estimation, or Acoustic Scene Classification to ensure robust context...
Beyond Kemeny Medians: Consensus Ranking Distributions Definition, Properties and Statistical Learning : Abstract: In this article we develop a new method for summarizing a ranking distribution, i.e., a probability distribution on the symmetric group $\mathfrak{S}_n$, beyond the classical theory ...
Highly Adaptive Principal Component Regression : Abstract: The Highly Adaptive Lasso (HAL) is a nonparametric regression method that achieves almost dimension-free convergence rates under minimal smoothness assumptions, but its implementation can be...
Bayesian Inference of Contextual Bandit Policies via Empirical Likelihood : Abstract: Policy inference plays an essential role in the contextual bandit problem. In this paper, we use empirical likelihood to develop a Bayesian inference method for the joint analysis of multipl...
Flow-Enabled Generalization to Human Demonstrations in Few-Shot Imitation Learning : Abstract: Imitation Learning (IL) enables robots to learn complex skills from demonstrations without explicit task modeling, but it typically requires large amounts of demonstrations, creating signifi...
Deep Bootstrap : Abstract: In this work, we propose a novel deep bootstrap framework for nonparametric regression based on conditional diffusion models. Specifically, we construct a conditional diffusion model to lear...
Predictive-State Communication: Innovation Coding and Reconciliation under Delay : Abstract: Shannon theory models communication as the reliable transfer of symbol sequences, with performance governed by capacity and rate-distortion limits. When both endpoints possess strong predict...
Solving PDEs in One Shot via Fourier Features with Exact Analytical Derivatives : Abstract: Recent random feature methods for solving partial differential equations (PDEs) reduce computational cost compared to physics-informed neural networks (PINNs) but still rely on iterative opt...
Why Agentic Theorem Prover Works: A Statistical Provability Theory of Mathematical Reasoning Models : Abstract: Agentic theorem provers -- pipelines that couple a mathematical reasoning model with library retrieval, subgoal-decomposition/search planner, and a proof assistant verifier -- have recently ...
Statistical Inference and Learning for Shapley Additive Explanations (SHAP) : Abstract: The SHAP (short for Shapley additive explanation) framework has become an essential tool for attributing importance to variables in predictive tasks. In model-agnostic settings, SHAP uses th...
From Collapse to Improvement: Statistical Perspectives on the Evolutionary Dynamics of Iterative Training on Contaminated Sources : Abstract: The problem of model collapse has presented new challenges in iterative training of generative models, where such training with synthetic data leads to an overall degradation of performance....
Generalized Robust Adaptive-Bandwidth Multi-View Manifold Learning in High Dimensions with Noise : Abstract: Multiview datasets are common in scientific and engineering applications, yet existing fusion methods offer limited theoretical guarantees, particularly in the presence of heterogeneous and ...
Privacy-Utility Tradeoffs in Quantum Information Processing : Abstract: When sensitive information is encoded in data, it is important to ensure the privacy of information when attempting to learn useful information from the data. There is a natural tradeoff whe...
Pricing Query Complexity of Multiplicative Revenue Approximation : Abstract: We study the pricing query complexity of revenue maximization for a single buyer whose private valuation is drawn from an unknown distribution. In this setting, the seller must learn the opt...
GPU-Fuzz: Finding Memory Errors in Deep Learning Frameworks : Abstract: GPU memory errors are a critical threat to deep learning (DL) frameworks, leading to crashes or even security issues. We introduce GPU-Fuzz, a fuzzer locating these issues efficiently by mod...
Online Generalized-mean Welfare Maximization: Achieving Near-Optimal Regret from Samples : Abstract: We study online fair allocation of $T$ sequentially arriving items among $n$ agents with heterogeneous preferences, with the objective of maximizing generalized-mean welfare, defined as the ...
Unlocked Backpropagation using Wave Scattering : Abstract: Both the backpropagation algorithm in machine learning and the maximum principle in optimal control theory are posed as a two-point boundary problem, resulting in a "forward-backward" lock. ...
Compute Only Once: UG-Separation for Efficient Large Recommendation Models : Abstract: Driven by scaling laws, recommender systems increasingly rely on large-scale models to capture complex feature interactions and user behaviors, but this trend also leads to prohibitive train...
Distributed Online Convex Optimization with Nonseparable Costs and Constraints : Abstract: This paper studies distributed online convex optimization with time-varying coupled constraints, motivated by distributed online control in network systems. Most prior work assumes a separab...
End-to-End Semantic ID Generation for Generative Advertisement Recommendation : Abstract: Generative Recommendation (GR) has excelled by framing recommendation as next-token prediction. This paradigm relies on Semantic IDs (SIDs) to tokenize large-scale items into discrete sequen...
Towards Affordable, Non-Invasive Real-Time Hypoglycemia Detection Using Wearable Sensor Signals : Abstract: Accurately detecting hypoglycemia without invasive glucose sensors remains a critical challenge in diabetes management, particularly in regions where continuous glucose monitoring (CGM) is p...
Flash-SD-KDE: Accelerating SD-KDE with Tensor Cores : Abstract: Score-debiased kernel density estimation (SD-KDE) achieves improved asymptotic convergence rates over classical KDE, but its use of an empirical score has made it significantly slower in pra...
Causal Effect Estimation with Learned Instrument Representations : Abstract: Instrumental variable (IV) methods mitigate bias from unobserved confounding in observational causal inference but rely on the availability of a valid instrument, which can often be difficul...
Physically Interpretable AlphaEarth Foundation Model Embeddings Enable LLM-Based Land Surface Intelligence : Abstract: Satellite foundation models produce dense embeddings whose physical interpretability remains poorly understood, limiting their integration into environmental decision systems. Using 12.1 mil...
Geometry-Aware Decoding with Wasserstein-Regularized Truncation and Mass Penalties for Large Language Models : Abstract: Large language models (LLMs) must balance diversity and creativity against logical coherence in open-ended generation. Existing truncation-based samplers are effective but largely heuristic,...
Conditional Uncertainty-Aware Political Deepfake Detection with Stochastic Convolutional Neural Networks : Abstract: Recent advances in generative image models have enabled the creation of highly realistic political deepfakes, posing risks to information integrity, public trust, and democratic processes. W...
Efficient reduction of stellar contamination and noise in planetary transmission spectra using neural networks : Abstract: Context: JWST has enabled transmission spectroscopy at unprecedented precision, but stellar heterogeneities (spots and faculae) remain a dominant contamination source that can bias atmospher...
Flow Matching with Uncertainty Quantification and Guidance : Abstract: Despite the remarkable success of sampling-based generative models such as flow matching, they can still produce samples of inconsistent or degraded quality. To assess sample reliability and...
Power-SMC: Low-Latency Sequence-Level Power Sampling for Training-Free LLM Reasoning : Abstract: Many recent reasoning gains in large language models can be explained as distribution sharpening: biasing generation toward high-likelihood trajectories already supported by the pretrained m...
Learning to Evict from Key-Value Cache : Abstract: The growing size of Large Language Models (LLMs) makes efficient inference challenging, primarily due to the memory demands of the autoregressive Key-Value (KV) cache. Existing eviction or c...
ACE-RTL: When Agentic Context Evolution Meets RTL-Specialized LLMs : Abstract: Recent advances in large language models (LLMs) have sparked growing interest in applying them to hardware design automation, particularly for accurate RTL code generation. Prior efforts fol...
Dissecting Performative Prediction: A Comprehensive Survey : Abstract: The field of performative prediction had its beginnings in 2020 with the seminal paper "Performative Prediction" by Perdomo et al., which established a novel machine learning setup where the...
Beyond Closed-Pool Video Retrieval: A Benchmark and Agent Framework for Real-World Video Search and Moment Localization : Abstract: Traditional video retrieval benchmarks focus on matching precise descriptions to closed video pools, failing to reflect real-world searches characterized by fuzzy, multi-dimensional memories...
STRAND: Sequence-Conditioned Transport for Single-Cell Perturbations : Abstract: Predicting how genetic perturbations change cellular state is a core problem for building controllable models of gene regulation. Perturbations targeting the same gene can produce different ...
Basic Legibility Protocols Improve Trusted Monitoring : Abstract: The AI Control research agenda aims to develop control protocols: safety techniques that prevent untrusted AI systems from taking harmful actions during deployment. Because human oversight i...
Validating Interpretability in siRNA Efficacy Prediction: A Perturbation-Based, Dataset-Aware Protocol : Abstract: Saliency maps are increasingly used as design guidance in siRNA efficacy prediction, yet attribution methods are rarely validated before motivating sequence edits. We introduce a \tex...
Diffusion-Pretrained Dense and Contextual Embeddings : Abstract: In this report, we introduce pplx-embed, a family of multilingual embedding models that employ multi-stage contrastive learning on a diffusion-pretrained language model backbone for web-scal...
TabICLv2: A better, faster, scalable, and open tabular foundation model : Abstract: Tabular foundation models, such as TabPFNv2 and TabICL, have recently dethroned gradient-boosted trees at the top of predictive benchmarks, demonstrating the value of in-context learning for...
Just on Time: Token-Level Early Stopping for Diffusion Language Models : Abstract: Diffusion language models generate text through iterative refinement, a process that is often computationally inefficient because many tokens reach stability long before the final denoising ...
From Circuits to Dynamics: Understanding and Stabilizing Failure in 3D Diffusion Transformers : Abstract: Reliable surface completion from sparse point clouds underpins many applications spanning content creation and robotics. While 3D diffusion transformers attain state-of-the-art results on th...
Asymmetric Prompt Weighting for Reinforcement Learning with Verifiable Rewards : Abstract: Reinforcement learning with verifiable rewards has driven recent advances in LLM post-training, in particular for reasoning. Policy optimization algorithms generate a number of responses for...
The Offline-Frontier Shift: Diagnosing Distributional Limits in Generative Multi-Objective Optimization : Abstract: Offline multi-objective optimization (MOO) aims to recover Pareto-optimal designs given a finite, static dataset. Recent generative approaches, including diffusion models, show strong perfor...
From Natural Language to Materials Discovery: The Materials Knowledge Navigation Agent : Abstract: Accelerating the discovery of high-performance materials remains a central challenge across energy, electronics, and aerospace technologies, where traditional workflows depend heavily on exp...
Statistical Learning Analysis of Physics-Informed Neural Networks : Abstract: We study the training and performance of physics-informed learning for initial and boundary value problems (IBVP) with physics-informed neural networks (PINNs) from a statistical learning pe...
MerLin: A Discovery Engine for Photonic and Hybrid Quantum Machine Learning : Abstract: Identifying where quantum models may offer practical benefits in near term quantum machine learning (QML) requires moving beyond isolated algorithmic proposals toward systematic and empirica...
Token-Efficient Change Detection in LLM APIs : Abstract: Remote change detection in LLMs is a difficult problem. Existing methods are either too expensive for deployment at scale, or require initial white-box access to model weights or grey-box ac...
Motion Capture is Not the Target Domain: Scaling Synthetic Data for Learning Motion Representations : Abstract: Synthetic data offers a compelling path to scalable pretraining when real-world data is scarce, but models pretrained on synthetic data often fail to transfer reliably to deployment settings...
MoToRec: Sparse-Regularized Multimodal Tokenization for Cold-Start Recommendation : Abstract: Graph neural networks (GNNs) have revolutionized recommender systems by effectively modeling complex user-item interactions, yet data sparsity and the item cold-start problem significantly i...
Divide, Harmonize, Then Conquer It: Shooting Multi-Commodity Flow Problems with Multimodal Language Models : Abstract: The multi-commodity flow (MCF) problem is a fundamental topic in network flow and combinatorial optimization, with broad applications in transportation, communication, and logistics, etc. No...
Learning Page Order in Shuffled WOO Releases : Abstract: We investigate document page ordering on 5,461 shuffled WOO documents (Dutch freedom of information releases) using page embeddings. These documents are heterogeneous collections such as ema...
When Fusion Helps and When It Breaks: View-Aligned Robustness in Same-Source Financial Imaging : Abstract: We study same-source multi-view learning and adversarial robustness for next-day direction prediction with financial image representations. On Shanghai Gold Exchange (SGE) spot gold data (20...
TVCACHE: A Stateful Tool-Value Cache for Post-Training LLM Agents : Abstract: In RL post-training of LLM agents, calls to external tools take several seconds or even minutes, leaving allocated GPUs idle and inflating post-training time and cost. While many tool invoca...
Sample Efficient Generative Molecular Optimization with Joint Self-Improvement : Abstract: Generative molecular optimization aims to design molecules with properties surpassing those of existing compounds. However, such candidates are rare and expensive to evaluate, yielding sampl...
A Jointly Efficient and Optimal Algorithm for Heteroskedastic Generalized Linear Bandits with Adversarial Corruptions : Abstract: We consider the problem of heteroskedastic generalized linear bandits (GLBs) with adversarial corruptions, which subsumes various stochastic contextual bandit settings, including heteroskeda...
MoEEdit: Efficient and Routing-Stable Knowledge Editing for Mixture-of-Experts LLMs : Abstract: Knowledge editing (KE) enables precise modifications to factual content in large language models (LLMs). Existing KE methods are largely designed for dense architectures, limiting their appl...
Stochastic Parroting in Temporal Attention -- Regulating the Diagonal Sink : Abstract: Spatio-temporal models analyze spatial structures and temporal dynamics, which makes them prone to information degeneration among space and time. Prior literature has demonstrated that over-...
CMAD: Cooperative Multi-Agent Diffusion via Stochastic Optimal Control : Abstract: Continuous-time generative models have achieved remarkable success in image restoration and synthesis. However, controlling the composition of multiple pre-trained models remains an open cha...
Spatial-Morphological Modeling for Multi-Attribute Imputation of Urban Blocks : Abstract: Accurate reconstruction of missing morphological indicators of a city is crucial for urban planning and data-driven analysis. This study presents the spatial-morphological (SM) imputer tool,...
Near-Constant Strong Violation and Last-Iterate Convergence for Online CMDPs via Decaying Safety Margins : Abstract: We study safe online reinforcement learning in Constrained Markov Decision Processes (CMDPs) under strong regret and violation metrics, which forbid error cancellation over time. Existing pr...
Tuning the burn-in phase in training recurrent neural networks improves their performance : Abstract: Training recurrent neural networks (RNNs) with standard backpropagation through time (BPTT) can be challenging, especially in the presence of long input sequences. A practical alternative to...
Natural Hypergradient Descent: Algorithm Design, Convergence Analysis, and Parallel Implementation : Abstract: In this work, we propose Natural Hypergradient Descent (NHGD), a new method for solving bilevel optimization problems. To address the computational bottleneck in hypergradient estimation--na...
The Sample Complexity of Uniform Approximation for Multi-Dimensional CDFs and Fixed-Price Mechanisms : Abstract: We study the sample complexity of learning a uniform approximation of an $n$-dimensional cumulative distribution function (CDF) within an error $\varepsilon > 0$, when observations are restricted to a ...
Automated Model Design using Gated Neuron Selection in Telecom : Abstract: The telecommunications industry is experiencing rapid growth in adopting deep learning for critical tasks such as traffic prediction, signal strength prediction, and quality of service optim...
SimuScene: Training and Benchmarking Code Generation to Simulate Physical Scenarios : Abstract: Large language models (LLMs) have been extensively studied for tasks like math competitions, complex coding, and scientific reasoning, yet their ability to accurately represent and simulate ...
Adaptive Sampling for Private Worst-Case Group Optimization : Abstract: Models trained by minimizing the average loss often fail to be accurate on small or hard-to-learn groups of the data. Various methods address this issue by optimizing a weighted objective th...
RePO: Bridging On-Policy Learning and Off-Policy Knowledge through Rephrasing Policy Optimization : Abstract: Aligning large language models (LLMs) on domain-specific data remains a fundamental challenge. Supervised fine-tuning (SFT) offers a straightforward way to inject domain knowledge but often ...
PRISM: Parallel Residual Iterative Sequence Model : Abstract: Generative sequence modeling faces a fundamental tension between the expressivity of Transformers and the efficiency of linear sequence models. Existing efficient architectures are theoretic...
Semi-Supervised Cross-Domain Imitation Learning : Abstract: Cross-domain imitation learning (CDIL) accelerates policy learning by transferring expert knowledge across domains, which is valuable in applications where the collection of expert data is c...
Collaborative Threshold Watermarking : Abstract: In federated learning (FL), $K$ clients jointly train a model without sharing raw data. Because each participant invests data and compute, clients need mechanisms to later prove the provenan...
Predicting integers from continuous parameters : Abstract: We study the problem of predicting numeric labels that are constrained to the integers or to a subrange of the integers. For example, the number of up-votes on social media posts, or the num...
Kalman Linear Attention: Parallel Bayesian Filtering For Efficient Language Modelling and State Tracking : Abstract: State-space language models such as Mamba and gated linear attention (GLA) offer efficient alternatives to transformers due to their linear complexity and parallel training, but often lack t...
Rising Multi-Armed Bandits with Known Horizons : Abstract: The Rising Multi-Armed Bandit (RMAB) framework models environments where expected rewards of arms increase with plays, which models practical scenarios where performance of each option impro...
SnapMLA: Efficient Long-Context MLA Decoding via Hardware-Aware FP8 Quantized Pipelining : Abstract: While FP8 attention has shown substantial promise in innovations like FlashAttention-3, its integration into the decoding phase of the DeepSeek Multi-head Latent Attention (MLA) architecture...
Reducing Estimation Uncertainty Using Normalizing Flows and Stratification : Abstract: Estimating the expectation of a real-valued function of a random variable from sample data is a critical aspect of statistical analysis, with far-reaching implications in various application...
Domain Knowledge Guided Bayesian Optimization For Autonomous Alignment Of Complex Scientific Instruments : Abstract: Bayesian Optimization (BO) is a powerful tool for optimizing complex non-linear systems. However, its performance degrades in high-dimensional problems with tightly coupled parameters and hi...
Evaluation metrics for temporal preservation in synthetic longitudinal patient data : Abstract: This study introduces a set of metrics for evaluating temporal preservation in synthetic longitudinal patient data, defined as artificially generated data that mimic real patients' repeated ...
Coarse-Grained Boltzmann Generators : Abstract: Sampling equilibrium molecular configurations from the Boltzmann distribution is a longstanding challenge. Boltzmann Generators (BGs) address this by combining exact-likelihood generative mo...
Generative clinical time series models trained on moderate amounts of patient data are privacy preserving : Abstract: Sharing medical data for machine learning model training purposes is often impossible due to the risk of disclosing identifying information about individual patients. Synthetic data produced...
Pupillometry and Brain Dynamics for Cognitive Load in Working Memory : Abstract: Cognitive load, the mental effort required during working memory, is central to neuroscience, psychology, and human-computer interaction. Accurate assessment is vital for adaptive learning, ...
On the Role of Consistency Between Physics and Data in Physics-Informed Neural Networks : Abstract: Physics-informed neural networks (PINNs) have gained significant attention as a surrogate modeling strategy for partial differential equations (PDEs), particularly in regimes where labeled d...
dnaHNet: A Scalable and Hierarchical Foundation Model for Genomic Sequence Learning : Abstract: Genomic foundation models have the potential to decode DNA syntax, yet face a fundamental tradeoff in their input representation. Standard fixed-vocabulary tokenizers fragment biologically m...
Learning Mixture Density via Natural Gradient Expectation Maximization : Abstract: Mixture density networks are neural networks that produce Gaussian mixtures to represent continuous multimodal conditional densities. Standard training procedures involve maximum likelihood ...
Roughness-Informed Federated Learning : Abstract: Federated Learning (FL) enables collaborative model training across distributed clients while preserving data privacy, yet faces challenges in non-independent and identically distributed (no...
TRACE: Theoretical Risk Attribution under Covariate-shift Effects : Abstract: When a source-trained model $Q$ is replaced by a model $\tilde{Q}$ trained on shifted data, its performance on the source domain can change unpredictably. To address this, we study the two-m...
When Gradient Clipping Becomes a Control Mechanism for Differential Privacy in Deep Learning : Abstract: Privacy-preserving training on sensitive data commonly relies on differentially private stochastic optimization with gradient clipping and Gaussian noise. The clipping threshold is a critica...
Gauss-Newton Unlearning for the LLM Era : Abstract: Standard large language model training can create models that produce outputs their trainer deems unacceptable in deployment. The probability of these outputs can be reduced using methods su...
Online Min-Max Optimization: From Individual Regrets to Cumulative Saddle Points : Abstract: We propose and study an online version of min-max optimization based on cumulative saddle points under a variety of performance measures beyond convex-concave settings. After first observing...
Bridging the Compression-Precision Paradox: A Hybrid Architecture for Clinical EEG Report Generation with Guaranteed Measurement Accuracy : Abstract: Automated EEG monitoring requires clinician-level precision for seizure detection and reporting. Clinical EEG recordings exceed LLM context windows, requiring extreme compression (400:1+ rat...
What Makes Value Learning Efficient in Residual Reinforcement Learning? : Abstract: Residual reinforcement learning (RL) enables stable online refinement of expressive pretrained policies by freezing the base and learning only bounded corrections. However, value learning in...
Prioritize the Process, Not Just the Outcome: Rewarding Latent Thought Trajectories Improves Reasoning in Looped Language Models : Abstract: Looped Language Models (LoopLMs) perform multi-step latent reasoning prior to token generation and outperform conventional LLMs on reasoning benchmarks at smaller parameter budgets. However,...
Don't Eliminate Cut: Exponential Separations in LLM-Based Theorem Proving : Abstract: We develop a theoretical analysis of LLM-guided formal theorem proving in interactive proof assistants (e.g., Lean) by modeling tactic proposal as a stochastic policy in a finite-horizon det...
Enhancing Ride-Hailing Forecasting at DiDi with Multi-View Geospatial Representation Learning from the Web : Abstract: The proliferation of ride-hailing services has fundamentally transformed urban mobility patterns, making accurate ride-hailing forecasting crucial for optimizing passenger experience and urb...
Analyzing Fairness of Neural Network Prediction via Counterfactual Dataset Generation : Abstract: Interpreting the inference-time behavior of deep neural networks remains a challenging problem. Existing approaches to counterfactual explanation typically ask: What is the closest alternati...
A Multimodal Conditional Mixture Model with Distribution-Level Physics Priors : Abstract: Many scientific and engineering systems exhibit intrinsically multimodal behavior arising from latent regime switching and non-unique physical mechanisms. In such settings, learning the full...
Chamfer-Linkage for Hierarchical Agglomerative Clustering : Abstract: Hierarchical Agglomerative Clustering (HAC) is a widely-used clustering method based on repeatedly merging the closest pair of clusters, where inter-cluster distances are determined by a lin...
QTALE: Quantization-Robust Token-Adaptive Layer Execution for LLMs : Abstract: Large language models (LLMs) demand substantial computational and memory resources, posing challenges for efficient deployment. Two complementary approaches have emerged to address these iss...
Binary Flow Matching: Prediction-Loss Space Alignment for Robust Learning : Abstract: Flow matching has emerged as a powerful framework for generative modeling, with recent empirical successes highlighting the effectiveness of signal-space prediction ($x$-prediction). In this...
LightGTS-Cov: Covariate-Enhanced Time Series Forecasting : Abstract: Time series foundation models are typically pre-trained on large, multi-source datasets; however, they often ignore exogenous covariates or incorporate them via simple concatenation with the...
LUCID: Attention with Preconditioned Representations : Abstract: Softmax-based dot-product attention is a cornerstone of Transformer architectures, enabling remarkable capabilities such as in-context learning. However, as context lengths increase, a funda...
Gated Removal of Normalization in Transformers Enables Stable Training and Efficient Inference : Abstract: Normalization is widely viewed as essential for stabilizing Transformer training. We revisit this assumption for pre-norm Transformers and ask to what extent sample-dependent normalization i...
Experimental Demonstration of Online Learning-Based Concept Drift Adaptation for Failure Detection in Optical Networks : Abstract: We present a novel online learning-based approach for concept drift adaptation in optical network failure detection, achieving up to a 70% improvement in performance over conventional static...
Tensor Methods: A Unified and Interpretable Approach for Material Design : Abstract: When designing new materials, it is often necessary to tailor the material design (with respect to its design parameters) to have some desired properties (e.g. Young's modulus). As the set o...
Colorful Talks with Graphs: Human-Interpretable Graph Encodings for Large Language Models : Abstract: Graph problems are fundamentally challenging for large language models (LLMs). While LLMs excel at processing unstructured text, graph tasks require reasoning over explicit structure, permut...
Deep learning outperforms traditional machine learning methods in predicting childhood malnutrition: evidence from survey data : Abstract: Childhood malnutrition remains a major public health concern in Nepal and other low-resource settings, while conventional case-finding approaches are labor-intensive and frequently unavailab...
Hardware Co-Design Scaling Laws via Roofline Modelling for On-Device LLMs : Abstract: Vision-Language-Action Models (VLAs) have emerged as a key paradigm of Physical AI and are increasingly deployed in autonomous vehicles, robots, and smart spaces. In these resource-constrain...
Simple LLM Baselines are Competitive for Model Diffing : Abstract: Standard LLM evaluations only test capabilities or dispositions that evaluators designed them for, missing unexpected differences such as behavioral shifts between model revisions or emergen...
Theoretical Analysis of Contrastive Learning under Imbalanced Data: From Training Dynamics to a Pruning Solution : Abstract: Contrastive learning has emerged as a powerful framework for learning generalizable representations, yet its theoretical understanding remains limited, particularly under imbalanced data dis...
Identifying Evidence-Based Nudges in Biomedical Literature with Large Language Models : Abstract: We present a scalable, AI-powered system that identifies and extracts evidence-based behavioral nudges from unstructured biomedical literature. Nudges are subtle, non-coercive interventions ...
Stop Training for the Worst: Progressive Unmasking Accelerates Masked Diffusion Training : Abstract: Masked Diffusion Models (MDMs) have emerged as a promising approach for generative modeling in discrete spaces. By generating sequences in any order and allowing for parallel decoding, they ...
R2RAG-Flood: A reasoning-reinforced training-free retrieval augmentation generation framework for flood damage nowcasting : Abstract: R2RAG-Flood is a reasoning-reinforced, training-free retrieval-augmented generation framework for post-storm property damage nowcasting. Building on an existing supervised tabular predictor,...
ICODEN: Ordinary Differential Equation Neural Networks for Interval-Censored Data : Abstract: Predicting time-to-event outcomes when event times are interval censored is challenging because the exact event time is unobserved. Many existing survival analysis approaches for interval-ce...
Configuration-to-Performance Scaling Law with Neural Ansatz : Abstract: Researchers build scaling laws to forecast the training performance of expensive large-scale runs with larger model size N and data size D. These laws assume that other training hyperparamet...
What Does Preference Learning Recover from Pairwise Comparison Data? : Abstract: Pairwise preference learning is central to machine learning, with recent applications in aligning language models with human preferences. A typical dataset consists of triplets $(x, y^+, y^-...
Linear-LLM-SCM: Benchmarking LLMs for Coefficient Elicitation in Linear-Gaussian Causal Models : Abstract: Large language models (LLMs) have shown potential in identifying qualitative causal relations, but their ability to perform quantitative causal reasoning -- estimating effect sizes that para...
Kernel-Based Learning of Chest X-ray Images for Predicting ICU Escalation among COVID-19 Patients : Abstract: Kernel methods have been extensively utilized in machine learning for classification and prediction tasks due to their ability to capture complex non-linear data patterns. However, single ke...
Modeling Programming Skills with Source Code Embeddings for Context-aware Exercise Recommendation : Abstract: In this paper, we propose a context-aware recommender system that models students' programming skills using embeddings of the source code they submit throughout a course. These embeddings pr...
Risk-Equalized Differentially Private Synthetic Data: Protecting Outliers by Controlling Record-Level Influence : Abstract: When synthetic data is released, some individuals are harder to protect than others. A patient with a rare disease combination or a transaction with unusual characteristics stands out from t...
Frame-Level Internal Tool Use for Temporal Grounding in Audio LMs : Abstract: Large audio language models are increasingly used for complex audio understanding tasks, but they struggle with temporal tasks that require precise temporal grounding, such as word alignment...
PRISM: Differentially Private Synthetic Data with Structure-Aware Budget Allocation for Prediction : Abstract: Differential privacy (DP) provides a mathematical guarantee limiting what an adversary can learn about any individual from released data. However, achieving this protection typically require...
Temper-Then-Tilt: Principled Unlearning for Generative Models through Tempering and Classifier Guidance : Abstract: We study machine unlearning in large generative models by framing the task as density ratio estimation to a target distribution rather than supervised fine-tuning. While classifier guidance ...
ELROND: Exploring and decomposing intrinsic capabilities of diffusion models : Abstract: A single text prompt passed to a diffusion model often yields a wide range of visual outputs determined solely by stochastic process, leaving users with no direct control over which specific...
Rank-Accuracy Trade-off for LoRA: A Gradient-Flow Analysis : Abstract: Previous empirical studies have shown that LoRA achieves accuracy comparable to full-parameter methods on downstream fine-tuning tasks, even for rank-1 updates. By contrast, the theoretical ...
How Much Reasoning Do Retrieval-Augmented Models Add beyond LLMs? A Benchmarking Framework for Multi-Hop Inference over Hybrid Knowledge : Abstract: Large language models (LLMs) continue to struggle with knowledge-intensive questions that require up-to-date information and multi-hop reasoning. Augmenting LLMs with hybrid external knowled...
Neural Network Quantum Field Theory from Transformer Architectures : Abstract: We propose a neural-network construction of Euclidean scalar quantum field theories from transformer attention heads, defining $n$-point correlators by averaging over random network paramete...
Adaptive Optimization via Momentum on Variance-Normalized Gradients : Abstract: We introduce MVN-Grad (Momentum on Variance-Normalized Gradients), an Adam-style optimizer that improves stability and performance by combining two complementary ideas: variance-based normal...
Signature-Kernel Based Evaluation Metrics for Robust Probabilistic and Tail-Event Forecasting : Abstract: Probabilistic forecasting is increasingly critical across high-stakes domains, from finance and epidemiology to climate science. However, current evaluation frameworks lack a consensus metri...
RoboSubtaskNet: Temporal Sub-task Segmentation for Human-to-Robot Skill Transfer in Real-World Environments : Abstract: Temporally locating and classifying fine-grained sub-task segments in long, untrimmed videos is crucial to safe human-robot collaboration. Unlike generic activity recognition, collaborative ...
A Multimodal Manufacturing Safety Chatbot: Knowledge Base Design, Benchmark Development, and Evaluation of Multiple RAG Approaches : Abstract: Ensuring worker safety remains a critical challenge in modern manufacturing environments. Industry 5.0 reorients the prevailing manufacturing paradigm toward more human-centric operations. U...
Designing Beyond Language: Sociotechnical Barriers in AI Health Technologies for Limited English Proficiency : Abstract: Limited English proficiency (LEP) patients in the U.S. face systemic barriers to healthcare beyond language and interpreter access, encompassing procedural and institutional constraints. AI ...
HuMam: Humanoid Motion Control via End-to-End Deep Reinforcement Learning with Mamba : Abstract: End-to-end reinforcement learning (RL) for humanoid locomotion is appealing for its compact perception-action mapping, yet practical policies often suffer from training instability, ineffici...
TableDART: Dynamic Adaptive Multi-Modal Routing for Table Understanding : Abstract: Modeling semantic and structural information from tabular data remains a core challenge for effective table understanding. Existing Table-as-Text approaches flatten tables for large language...
Self-Augmented Robot Trajectory: Efficient Imitation Learning via Safe Self-augmentation with Demonstrator-annotated Precision : Abstract: Imitation learning is a promising paradigm for training robot agents; however, standard approaches typically require substantial data acquisition -- via numerous demonstrations or random exp...
Localized Graph-Based Neural Dynamics Models for Terrain Manipulation : Abstract: Predictive models can be particularly helpful for robots to effectively manipulate terrains in construction sites and extraterrestrial surfaces. However, terrain state representations become...
ZebraPose: Zebra Detection and Pose Estimation using only Synthetic Data : Abstract: Collecting and labeling large real-world wild animal datasets is impractical, costly, error-prone, and labor-intensive. For animal monitoring tasks, such as detection, tracking, and pose estimati...
Towards Better Code Understanding in Decoder-Only Models with Contrastive Learning : Abstract: Recent advances in large-scale code generation models have led to remarkable progress in producing high-quality code. These models are trained in a self-supervised manner on extensive unlabe...
Tensor learning with orthogonal, Lorentz, and symplectic symmetries : Abstract: Tensors are a fundamental data structure for many scientific contexts, such as time series analysis, materials science, and physics, among many others. Improving our ability to produce and h...
Structured Sentiment Analysis as Transition-based Dependency Graph Parsing : Abstract: Structured sentiment analysis (SSA) aims to automatically extract people's opinions from a text in natural language and adequately represent that information in a graph structure. One of the...
CODE-SHARP: Continuous Open-ended Discovery and Evolution of Skills as Hierarchical Reward Programs : Abstract: Developing agents capable of open-endedly discovering and learning novel skills is a grand challenge in Artificial Intelligence. While reinforcement learning offers a powerful framework for ...
ClinAlign: Scaling Healthcare Alignment from Clinician Preference : Abstract: Although large language models (LLMs) demonstrate expert-level medical knowledge, aligning their open-ended outputs with fine-grained clinician preferences remains challenging. Existing meth...
SpotAgent: Grounding Visual Geo-localization in Large Vision-Language Models through Agentic Reasoning : Abstract: Large Vision-Language Models (LVLMs) have demonstrated strong reasoning capabilities in geo-localization, yet they often struggle in real-world scenarios where visual cues are sparse, long-t...
World of Workflows: A Benchmark for Bringing World Models to Enterprise Systems : Abstract: Frontier large language models (LLMs) excel as autonomous agents in many domains, yet they remain untested in complex enterprise systems where hidden workflows create cascading effects acros...
Meta Context Engineering via Agentic Skill Evolution : Abstract: The operational efficacy of large language models relies heavily on their inference-time context. This has established Context Engineering (CE) as a formal discipline for optimizing these in...
Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling : Abstract: Preference optimization for diffusion and flow-matching models relies on reward functions that are both discriminatively robust and computationally efficient. Vision-Language Models (VLMs) h...
GENIUS: Generative Fluid Intelligence Evaluation Suite : Abstract: Unified Multimodal Models (UMMs) have shown remarkable progress in visual generation. Yet, existing benchmarks predominantly assess Crystallized Intelligence, which relies on reca...
Data-Efficient Hierarchical Goal-Conditioned Reinforcement Learning via Normalizing Flows : Abstract: Hierarchical goal-conditioned reinforcement learning (H-GCRL) provides a powerful framework for tackling complex, long-horizon tasks by decomposing them into structured subgoals. However, it...
Weight Decay Improves Language Model Plasticity : Abstract: The prevailing paradigm in large language model (LLM) development is to pretrain a base model, then perform further training to improve performance and model behavior. However, hyperparamete...
Learning to Compose for Cross-domain Agentic Workflow Generation : Abstract: Automatically generating agentic workflows -- executable operator graphs or codes that orchestrate reasoning, verification, and repair -- has become a practical way to solve complex tasks be...
Safety Recovery in Reasoning Models Is Only a Few Early Steering Steps Away : Abstract: Reinforcement learning (RL) based post-training for explicit chain-of-thought (e.g., GRPO) improves the reasoning ability of multimodal large-scale reasoning models (MLRMs). But recent evide...
Direct Learning of Calibration-Aware Uncertainty for Neural PDE Surrogates : Abstract: Neural PDE surrogates are often deployed in data-limited or partially observed regimes where downstream decisions depend on calibrated uncertainty in addition to low prediction error. Existi...
DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning : Abstract: In the current landscape of Large Language Models (LLMs), the curation of large-scale, high-quality training data is a primary driver of model performance. A key lever is the data reci...
General Flexible $f$-divergence for Challenging Offline RL Datasets with Low Stochasticity and Diverse Behavior Policies : Abstract: Offline RL algorithms aim to improve upon the behavior policy that produces the collected data while constraining the learned policy to be within the support of the dataset. However, practic...
GRASP: group-Shapley feature selection for patients : Abstract: Feature selection remains a major challenge in medical prediction, where existing approaches such as LASSO often lack robustness and interpretability. We introduce GRASP, a novel framework t...
SteuerLLM: Local specialized large language model for German tax law analysis : Abstract: Large language models (LLMs) demonstrate strong general reasoning and language understanding, yet their performance degrades in domains governed by strict formal rules, precise terminology, ...
In-the-Wild Model Organisms: Mitigating Undesirable Emergent Behaviors in Production LLM Post-Training via Data Attribution : Abstract: We propose activation-based data attribution, a method that traces behavioral changes in post-trained language models to responsible training datapoints. By computing activation-difference v...
Interpretable Attention-Based Multi-Agent PPO for Latency Spike Resolution in 6G RAN Slicing : Abstract: Sixth-generation (6G) radio access networks (RANs) must enforce strict service-level agreements (SLAs) for heterogeneous slices, yet sudden latency spikes remain difficult to diagnose and re...
Chatting with Images for Introspective Visual Thinking : Abstract: Current large vision-language models (LVLMs) typically rely on text-only reasoning based on a single-pass visual encoding, which often leads to loss of fine-grained visual information. Recen...
Conversational Behavior Modeling Foundation Model With Multi-Level Perception : Abstract: Human conversation is organized by an implicit chain of thoughts that manifests as timed speech acts. Capturing this perceptual pathway is key to building natural full-duplex interactive sys...
GraphSeek: Next-Generation Graph Analytics with LLMs : Abstract: Graphs are foundational across domains but remain hard to use without deep expertise. LLMs promise accessible natural language (NL) graph analytics, yet they fail to process industry-scale p...
Language Model Inversion through End-to-End Differentiation : Abstract: Despite emerging research on Language Models (LM), few approaches analyse the invertibility of LMs. That is, given an LM and a desirable target output sequence of tokens, determining what inp...
Linguistic Indicators of Early Cognitive Decline in the DementiaBank Pitt Corpus: A Statistical and Machine Learning Study : Abstract: Background: Subtle changes in spontaneous language production are among the earliest indicators of cognitive decline. Identifying linguistically interpretable markers of dementia can support...
Chain-of-Look Spatial Reasoning for Dense Surgical Instrument Counting : Abstract: Accurate counting of surgical instruments in Operating Rooms (OR) is a critical prerequisite for ensuring patient safety during surgery. Despite recent progress of large visual-language mode...
ContactGaussian-WM: Learning Physics-Grounded World Model from Videos : Abstract: Developing world models that understand complex physical interactions is essential for advancing robotic planning and simulation. However, existing methods often struggle to accurately model ...
OSIL: Learning Offline Safe Imitation Policies with Safety Inferred from Non-preferred Trajectories : Abstract: This work addresses the problem of offline safe imitation learning (IL), where the goal is to learn safe and reward-maximizing policies from demonstrations that do not have per-timestep safe...
From Buffers to Registers: Unlocking Fine-Grained FlashAttention with Hybrid-Bonded 3D NPU Co-Design : Abstract: Transformer-based models dominate modern AI workloads but exacerbate memory bottlenecks due to their quadratic attention complexity and ever-growing model sizes. Existing accelerators, such ...
CVPL: A Geometric Framework for Post-Hoc Linkage Risk Assessment in Protected Tabular Data : Abstract: Formal privacy metrics provide compliance-oriented guarantees but often fail to quantify actual linkability in released datasets. We introduce CVPL (Cluster-Vector-Projection Linkage), a geo...
ROCKET: Rapid Optimization via Calibration-guided Knapsack Enhanced Truncation for Efficient Model Compression : Abstract: We present ROCKET, a training-free model compression method that achieves state-of-the-art performance in comparison with factorization, structured-sparsification and dynamic compression bas...
Enhancing Predictability of Multi-Tenant DNN Inference for Autonomous Vehicles' Perception : Abstract: Autonomous vehicles (AVs) rely on sensors and deep neural networks (DNNs) to perceive their surrounding environment and make maneuver decisions in real time. However, achieving real-time DNN...
Fine-Tuning GPT-5 for GPU Kernel Generation : Abstract: Developing efficient GPU kernels is essential for scaling modern AI systems, yet it remains a complex task due to intricate hardware architectures and the need for specialized optimization e...
LoRA-Squeeze: Simple and Effective Post-Tuning and In-Tuning Compression of LoRA Modules : Abstract: Despite its huge number of variants, standard Low-Rank Adaptation (LoRA) is still a dominant technique for parameter-efficient fine-tuning (PEFT). Nonetheless, it faces persistent challenges...
RiemannGL: Riemannian Geometry Changes Graph Deep Learning : Abstract: Graphs are ubiquitous, and learning on graphs has become a cornerstone in artificial intelligence and data mining communities. Unlike pixel grids in images or sequential structures in langua...
FeatureBench: Benchmarking Agentic Coding for Complex Feature Development : Abstract: Agents powered by large language models (LLMs) are increasingly adopted in the software industry, contributing code as collaborators or even autonomous developers. As their presence grows, i...
Healthy Harvests: A Comparative Look at Guava Disease Classification Using InceptionV3 : Abstract: Guava fruits often suffer from many diseases. This can harm fruit quality and fruit crop yield. Early identification is important for minimizing damage and ensuring fruit health. This study ...
Rotary Positional Embeddings as Phase Modulation: Theoretical Bounds on the RoPE Base for Long-Context Transformers : Abstract: Rotary positional embeddings (RoPE) are widely used in large language models to encode token positions through multiplicative rotations, yet their behavior at long context lengths remains po...
Search or Accelerate: Confidence-Switched Position Beam Search for Diffusion Language Models : Abstract: Diffusion Language Models (DLMs) generate text by iteratively denoising a masked sequence, repeatedly deciding which positions to commit at each step. Standard decoding follows a greedy rule...
Computational Phenomenology of Temporal Experience in Autism: Quantifying the Emotional and Narrative Characteristics of Lived Unpredictability : Abstract: Disturbances in temporality, such as desynchronization with the social environment and its unpredictability, are considered core features of autism with a deep impact on relationships. Howev...
What do people want to fact-check? : Abstract: Research on misinformation has focused almost exclusively on supply, asking what falsehoods circulate, who produces them, and whether corrections work. A basic demand-side question remains u...
Traceable, Enforceable, and Compensable Participation: A Participation Ledger for People-Centered AI Governance : Abstract: Participatory approaches are widely invoked in AI governance, yet participation rarely translates into durable influence. In public sector and civic AI systems, community contributions such ...
Blind Gods and Broken Screens: Architecting a Secure, Intent-Centric Mobile Agent Operating System : Abstract: The evolution of Large Language Models (LLMs) has shifted mobile computing from App-centric interactions to system-level autonomous agents. Current implementations predominantly rely on a "S...
Resource-Efficient Model-Free Reinforcement Learning for Board Games : Abstract: Board games have long served as complex decision-making benchmarks in artificial intelligence. In this field, search-based reinforcement learning methods such as AlphaZero have achieved rema...
Interactive LLM-assisted Curriculum Learning for Multi-Task Evolutionary Policy Search : Abstract: Multi-task policy search is a challenging problem because policies are required to generalize beyond training cases. Curriculum learning has proven to be effective in this setting, as it int...
The CLEF-2026 FinMMEval Lab: Multilingual and Multimodal Evaluation of Financial AI Systems : Abstract: We present the setup and the tasks of the FinMMEval Lab at CLEF 2026, which introduces the first multilingual and multimodal evaluation framework for financial Large Language Models (LLMs). ...
Diagnosing Structural Failures in LLM-Based Evidence Extraction for Meta-Analysis : Abstract: Systematic reviews and meta-analyses rely on converting narrative articles into structured, numerically grounded study records. Despite rapid advances in large language models (LLMs), it rem...
FedPS: Federated data Preprocessing via aggregated Statistics : Abstract: Federated Learning (FL) enables multiple parties to collaboratively train machine learning models without sharing raw data. However, before training, data must be preprocessed to address mis...
ICA: Information-Aware Credit Assignment for Visually Grounded Long-Horizon Information-Seeking Agents : Abstract: Despite the strong performance achieved by reinforcement learning-trained information-seeking agents, learning in open-ended web environments remains severely constrained by low signal-to-no...
Time Series Foundation Models for Energy Load Forecasting on Consumer Hardware: A Multi-Dimensional Zero-Shot Benchmark : Abstract: Time Series Foundation Models (TSFMs) have introduced zero-shot prediction capabilities that bypass the need for task-specific training. Whether these capabilities translate to mission-criti...
Enhancing Multivariate Time Series Forecasting with Global Temporal Retrieval : Abstract: Multivariate time series forecasting (MTSF) plays a vital role in numerous real-world applications, yet existing models remain constrained by their reliance on a limited historical context. ...
Flow caching for autoregressive video generation : Abstract: Autoregressive models, often built on Transformer architectures, represent a powerful paradigm for generating ultra-long videos by synthesizing content in sequential chunks. However, this se...
Beyond Confidence: The Rhythms of Reasoning in Generative Models : Abstract: Large Language Models (LLMs) exhibit impressive capabilities yet suffer from sensitivity to slight input context variations, hampering reliability. Conventional metrics like accuracy and per...
PELLI: Framework to effectively integrate LLMs for quality software generation : Abstract: Recent studies have revealed that when LLMs are appropriately prompted and configured, they demonstrate mixed results. Such results often meet or exceed the baseline performance. However, th...
RSHallu: Dual-Mode Hallucination Evaluation for Remote-Sensing Multimodal Large Language Models with Domain-Tailored Mitigation : Abstract: Multimodal large language models (MLLMs) are increasingly adopted in remote sensing (RS) and have shown strong performance on tasks such as RS visual grounding (RSVG), RS visual question ans...
Transport, Don't Generate: Deterministic Geometric Flows for Combinatorial Optimization : Abstract: Recent advances in Neural Combinatorial Optimization (NCO) have been dominated by diffusion models that treat the Euclidean Traveling Salesman Problem (TSP) as a stochastic $N \times N$ heat...
VulReaD: Knowledge-Graph-guided Software Vulnerability Reasoning and Detection : Abstract: Software vulnerability detection (SVD) is a critical challenge in modern systems. Large language models (LLMs) offer natural-language explanations alongside predictions, but most work focuse...
Kill it with FIRE: On Leveraging Latent Space Directions for Runtime Backdoor Mitigation in Deep Neural Networks : Abstract: Machine learning models are increasingly present in our everyday lives; as a result, they become targets of adversarial attackers seeking to manipulate the systems we interact with. A well-k...
LOREN: Low Rank-Based Code-Rate Adaptation in Neural Receivers : Abstract: Neural network based receivers have recently demonstrated superior system-level performance compared to traditional receivers. However, their practicality is limited by high memory and power...
Exploring the impact of adaptive rewiring in Graph Neural Networks : Abstract: This paper explores sparsification methods as a form of regularization in Graph Neural Networks (GNNs) to address high memory usage and computational costs in large-scale graph applications....
SecureScan: An AI-Driven Multi-Layer Framework for Malware and Phishing Detection Using Logistic Regression and Threat Intelligence Integration : Abstract: The growing sophistication of modern malware and phishing campaigns has diminished the effectiveness of traditional signature-based intrusion detection systems. This work presents SecureScan...
Self-Supervised Image Super-Resolution Quality Assessment based on Content-Free Multi-Model Oriented Representation Learning : Abstract: Super-resolution (SR) applied to real-world low-resolution (LR) images often results in complex, irregular degradations that stem from the inherent complexity of natural scene acquisition. I...
Calliope: A TTS-based Narrated E-book Creator Ensuring Exact Synchronization, Privacy, and Layout Fidelity : Abstract: A narrated e-book combines synchronized audio with digital text, highlighting the currently spoken word or sentence during playback. This format supports early literacy and assists individua...
A Diffusion-Based Generative Prior Approach to Sparse-view Computed Tomography : Abstract: The reconstruction of X-ray CT images from sparse or limited-angle geometries is a highly challenging task. The lack of data typically results in artifacts in the reconstructed image and ma...
Locomo-Plus: Beyond-Factual Cognitive Memory Evaluation Framework for LLM Agents : Abstract: Long-term conversational memory is a core capability for LLM-based dialogue systems, yet existing benchmarks and evaluation protocols primarily focus on surface-level factual recall. In real...
Cross-Sectional Asset Retrieval via Future-Aligned Soft Contrastive Learning : Abstract: Asset retrieval--finding similar assets in a financial universe--is central to quantitative investment decision-making. Existing approaches define similarity through historical price pattern...
Interpretable Graph-Level Anomaly Detection via Contrast with Normal Prototypes : Abstract: The task of graph-level anomaly detection (GLAD) is to identify anomalous graphs that deviate significantly from the majority of graphs in a dataset. While deep GLAD methods have shown promi...
AugVLA-3D: Depth-Driven Feature Augmentation for Vision-Language-Action Models : Abstract: Vision-Language-Action (VLA) models have recently achieved remarkable progress in robotic perception and control, yet most existing approaches primarily rely on VLMs trained using 2D images, ...
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training : Abstract: Training stability remains a central challenge in reinforcement learning (RL) for large language models (LLMs). Policy staleness, asynchronous training, and mismatches between training and i...
OmniVL-Guard: Towards Unified Vision-Language Forgery Detection and Grounding via Balanced RL : Abstract: Existing forgery detection methods are often limited to uni-modal or bi-modal settings, failing to handle the interleaved text, images, and videos prevalent in real-world misinformation. To ...
TwiFF (Think With Future Frames): A Large-Scale Dataset for Dynamic Visual Reasoning : Abstract: Visual Chain-of-Thought (VCoT) has emerged as a promising paradigm for enhancing multimodal reasoning by integrating visual perception into intermediate reasoning steps. However, existing VC...
The Neurosymbolic Frontier of Nonuniform Ellipticity: Formalizing Sharp Schauder Theory via Topos-Theoretic Reasoning Models : Abstract: This white paper presents a critical synthesis of the recent breakthrough in nonuniformly elliptic regularity theory and the burgeoning field of neurosymbolic large reasoning models (LRMs). ...
A Vision-Language Foundation Model for Zero-shot Clinical Collaboration and Automated Concept Discovery in Dermatology : Abstract: Medical foundation models have shown promise in controlled benchmarks, yet widespread deployment remains hindered by reliance on task-specific fine-tuning. Here, we introduce DermFM-Zero, a ...
Mitigating Reward Hacking in RLHF via Bayesian Non-negative Reward Modeling : Abstract: Reward models learned from human preferences are central to aligning large language models (LLMs) via reinforcement learning from human feedback, yet they are often vulnerable to reward hack...
Online Causal Kalman Filtering for Stable and Effective Policy Optimization : Abstract: Reinforcement learning for large language models suffers from high-variance token-level importance sampling (IS) ratios, which would destabilize policy optimization at scale. To improve stab...
Hierarchical Zero-Order Optimization for Deep Neural Networks : Abstract: Zeroth-order (ZO) optimization has long been favored for its biological plausibility and its capacity to handle non-differentiable objectives, yet its computational complexity has historical...
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters : Abstract: We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when buildin...
Neural Additive Experts: Context-Gated Experts for Controllable Model Additivity : Abstract: The trade-off between interpretability and accuracy remains a core challenge in machine learning. Standard Generalized Additive Models (GAMs) offer clear feature attributions but are often c...
LLM-Based Scientific Equation Discovery via Physics-Informed Token-Regularized Policy Optimization : Abstract: Symbolic regression aims to distill mathematical equations from observational data. Recent approaches have successfully leveraged Large Language Models (LLMs) to generate equation hypotheses...
MetaphorStar: Image Metaphor Understanding and Reasoning with End-to-End Visual Reinforcement Learning : Abstract: Metaphorical comprehension in images remains a critical challenge for today's AI systems. While Multimodal Large Language Models (MLLMs) excel at basic Visual Question Answering (VQA), they...
When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning : Abstract: While reasoning over long context is crucial for various real-world applications, it remains challenging for large language models (LLMs) as they suffer from performance degradation as the c...
LAP: Language-Action Pre-Training Enables Zero-shot Cross-Embodiment Transfer : Abstract: A long-standing goal in robotics is a generalist policy that can be deployed zero-shot on new robot embodiments without per-embodiment adaptation. Despite large-scale multi-embodiment pre-tr...
Contrastive Learning for Multi Label ECG Classification with Jaccard Score Based Sigmoid Loss : Abstract: Recent advances in large language models (LLMs) have enabled the development of multimodal medical AI. While models such as MedGemini achieve high accuracy on VQA tasks like USMLE MM, their ...
C^2ROPE: Causal Continuous Rotary Positional Encoding for 3D Large Multimodal-Models Reasoning : Abstract: Recent advances in 3D Large Multimodal Models (LMMs) built on Large Language Models (LLMs) have established the alignment of 3D visual features with LLM representations as the dominant parad...
Enhancing Weakly Supervised Multimodal Video Anomaly Detection through Text Guidance : Abstract: Weakly supervised multimodal video anomaly detection has gained significant attention, yet the potential of the text modality remains under-explored. Text provides explicit semantic informat...
RealHD: A High-Quality Dataset for Robust Detection of State-of-the-Art AI-Generated Images : Abstract: The rapid advancement of generative AI has raised concerns about the authenticity of digital images, as highly realistic fake images can now be generated at low cost, potentially increasing ...
$\mu$pscaling small models: Principled warm starts and hyperparameter transfer : Abstract: Modern large-scale neural networks are often trained and released in multiple sizes to accommodate diverse inference budgets. To improve efficiency, recent work has explored model upscaling:...
A Swap-Adversarial Framework for Improving Domain Generalization in Electroencephalography-Based Parkinson's Disease Prediction : Abstract: Electrocorticography (ECoG) offers a promising alternative to conventional electroencephalography (EEG) for the early prediction of Parkinson's disease (PD), providing higher spatial resolut...
AI-PACE: A Framework for Integrating AI into Medical Education : Abstract: The integration of artificial intelligence (AI) into healthcare is accelerating, yet medical education has not kept pace with these technological advancements. This paper synthesizes current...
LHAW: Controllable Underspecification for Long-Horizon Tasks : Abstract: Long-horizon workflow agents that operate effectively over extended periods are essential for truly autonomous systems. Their reliable execution critically depends on the ability to reason t...
Co-jump: Cooperative Jumping with Quadrupedal Robots via Multi-Agent Reinforcement Learning : Abstract: While single-agent legged locomotion has witnessed remarkable progress, individual robots remain fundamentally constrained by physical actuation limits. To transcend these boundaries, we int...
1%>100%: High-Efficiency Visual Adapter with Complex Linear Projection Optimization : Abstract: Deploying vision foundation models typically relies on efficient adaptation strategies, whereas conventional full fine-tuning suffers from prohibitive costs and low efficiency. While delta-t...
Learning Structure-Semantic Evolution Trajectories for Graph Domain Adaptation : Abstract: Graph Domain Adaptation (GDA) aims to bridge distribution shifts between domains by transferring knowledge from well-labeled source graphs to given unlabeled target graphs. One promising rec...
Low-Dimensional Execution Manifolds in Transformer Learning Dynamics: Evidence from Modular Arithmetic Tasks : Abstract: We investigate the geometric structure of learning dynamics in overparameterized transformer models through carefully controlled modular arithmetic tasks. Our primary finding is that despite...
Learning Adaptive Distribution Alignment with Neural Characteristic Function for Graph Domain Adaptation : Abstract: Graph Domain Adaptation (GDA) transfers knowledge from labeled source graphs to unlabeled target graphs but is challenged by complex, multi-faceted distributional shifts. Existing methods at...
Protecting Context and Prompts: Deterministic Security for Non-Deterministic AI : Abstract: Large Language Model (LLM) applications are vulnerable to prompt injection and context manipulation attacks that traditional security models cannot prevent. We introduce two novel primitives...
Driving Reaction Trajectories via Latent Flow Matching : Abstract: Recent advances in reaction prediction have achieved near-saturated accuracy on standard benchmarks (e.g., USPTO), yet most state-of-the-art models formulate the task as a one-shot mapping f...
Why Human Guidance Matters in Collaborative Vibe Coding : Abstract: Writing code has been one of the most transformative ways for human societies to translate abstract ideas into tangible technologies. Modern AI is transforming this process by enabling exper...
Authenticated Workflows: A Systems Approach to Protecting Agentic AI : Abstract: Agentic AI systems automate enterprise workflows but existing defenses--guardrails, semantic filters--are probabilistic and routinely bypassed. We introduce authenticated workflows, the firs...
Constructing Industrial-Scale Optimization Modeling Benchmark : Abstract: Optimization modeling underpins decision-making in logistics, manufacturing, energy, and finance, yet translating natural-language requirements into correct optimization formulations and sol...
A Unified Theory of Random Projection for Influence Functions : Abstract: Influence functions and related data attribution scores take the form of $g^{\top}F^{-1}g^{\prime}$, where $F\succeq 0$ is a curvature operator. In modern overparameterized models, forming o...
LakeMLB: Data Lake Machine Learning Benchmark : Abstract: Modern data lakes have emerged as foundational platforms for large-scale machine learning, enabling flexible storage of heterogeneous data and structured analytics through table-oriented abs...
AudioRouter: Data Efficient Audio Understanding via RL based Dual Reasoning : Abstract: Large Audio Language Models (LALMs) have demonstrated strong capabilities in audio understanding and reasoning. However, their performance on fine grained auditory perception remains unrelia...
Control Reinforcement Learning: Token-Level Mechanistic Analysis via Learned SAE Feature Steering : Abstract: Sparse autoencoders (SAEs) decompose language model activations into interpretable features, but existing methods reveal only which features activate, not which change model outputs when amp...
A Dual-Stream Physics-Augmented Unsupervised Architecture for Runtime Embedded Vehicle Health Monitoring : Abstract: Runtime quantification of vehicle operational intensity is essential for predictive maintenance and condition monitoring in commercial and heavy-duty fleets. Traditional metrics like mileage...
Breaking the Curse of Repulsion: Optimistic Distributionally Robust Policy Optimization for Off-Policy Generative Recommendation : Abstract: Policy-based Reinforcement Learning (RL) has established itself as the dominant paradigm in generative recommendation for optimizing sequential user interactions. However, when applied to of...
AIvilization v0: Toward Large-Scale Artificial Social Simulation with a Unified Agent Architecture and Adaptive Agent Profiles : Abstract: AIvilization v0 is a publicly deployed large-scale artificial society that couples a resource-constrained sandbox economy with a unified LLM-agent architecture, aiming to sustain long-horizo...
Equivariant Evidential Deep Learning for Interatomic Potentials : Abstract: Uncertainty quantification (UQ) is critical for assessing the reliability of machine learning interatomic potentials (MLIPs) in molecular dynamics (MD) simulations, identifying extrapolation...
AI-rithmetic : Abstract: Modern AI systems have been successfully deployed to win medals at international math competitions, assist with research workflows, and prove novel technical lemmas. However, despite their p...
Modular Multi-Task Learning for Chemical Reaction Prediction : Abstract: Adapting large language models (LLMs) trained on broad organic chemistry to smaller, domain-specific reaction datasets is a key challenge in chemical and pharmaceutical R&D. Effective specia...
Affordances Enable Partial World Modeling with LLMs : Abstract: Full models of the world require complex knowledge of immense detail. While pre-trained large models have been hypothesized to contain similar knowledge due to extensive pre-training on vast...
Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs : Abstract: The diversity of post-training data is critical for effective downstream performance in large language models (LLMs). Many existing approaches to constructing post-training data quantify div...
Making Databases Faster with LLM Evolutionary Sampling : Abstract: Traditional query optimization relies on cost-based optimizers that estimate execution cost (e.g., runtime, memory, and I/O) using predefined heuristics and statistical models. Improving the...
Time-to-Event Transformer to Capture Timing Attention of Events in EHR Time Series : Abstract: Automatically discovering personalized sequential events from large-scale time-series data is crucial for enabling precision medicine in clinical research, yet it remains a formidable challe...
The Alignment Bottleneck in Decomposition-Based Claim Verification : Abstract: Structured claim decomposition is often proposed as a solution for verifying complex, multi-faceted claims, yet empirical results have been inconsistent. We argue that these inconsistencies ...
ENIGMA: EEG-to-Image in 15 Minutes Using Less Than 1% of the Parameters : Abstract: To be practical for real-life applications, models for brain-computer interfaces must be easily and quickly deployable on new subjects, effective on affordable scanning hardware, and small e...
Beyond Calibration: Confounding Pathology Limits Foundation Model Specificity in Abdominal Trauma CT : Abstract: Purpose: Translating foundation models into clinical practice requires evaluating their performance under compound distribution shift, where severe class imbalance coexists with heterogeneou...
Learning Self-Interpretation from Interpretability Artifacts: Training Lightweight Adapters on Vector-Label Pairs : Abstract: Self-interpretation methods prompt language models to describe their own internal states, but remain unreliable due to hyperparameter sensitivity. We show that training lightweight adapters ...
Are More Tokens Rational? Inference-Time Scaling in Language Models as Adaptive Resource Rationality : Abstract: Human reasoning is shaped by resource rationality -- optimizing performance under constraints. Recently, inference-time scaling has emerged as a powerful paradigm to improve the reasoning pe...
Confounding Robust Continuous Control via Automatic Reward Shaping : Abstract: Reward shaping has been applied widely to accelerate Reinforcement Learning (RL) agents' training. However, a principled way of designing effective reward shaping functions, especially for c...
ECHO: An Open Research Platform for Evaluation of Chat, Human Behavior, and Outcomes : Abstract: ECHO (Evaluation of Chat, Human behavior, and Outcomes) is an open research platform designed to support reproducible, mixed-method studies of human interaction with both conversational AI s...
ERGO: Excess-Risk-Guided Optimization for High-Fidelity Monocular 3D Gaussian Splatting : Abstract: Generating 3D content from a single image remains a fundamentally challenging and ill-posed problem due to the inherent absence of geometric and textural information in occluded regions. Whi...
From Classical to Topological Neural Networks Under Uncertainty : Abstract: This chapter explores neural networks, topological data analysis, and topological deep learning techniques, alongside statistical Bayesian methods, for processing images, time series, and gr...
The Complexity of Bayesian Network Learning: Revisiting the Superstructure : Abstract: We investigate the parameterized complexity of Bayesian Network Structure Learning (BNSL), a classical problem that has received significant attention in empirical but also purely theoretica...
KORAL: Knowledge Graph Guided LLM Reasoning for SSD Operational Analysis : Abstract: Solid State Drives (SSDs) are critical to datacenters, consumer platforms, and mission-critical systems. Yet diagnosing their performance and reliability is difficult because data are fragme...
Transforming Policy-Car Swerving for Mitigating Stop-and-Go Traffic Waves: A Practice-Oriented Jam-Absorption Driving Strategy : Abstract: Stop-and-go waves, as a major form of freeway traffic congestion, cause severe and long-lasting adverse effects, including reduced traffic efficiency, increased driving risks, and higher veh...
ImprovEvolve: Ask AlphaEvolve to Improve the Input Solution and Then Improvise : Abstract: Recent advances in LLM-guided evolutionary computation, particularly AlphaEvolve, have demonstrated remarkable success in discovering novel mathematical constructions and solving challenging...
Blockwise Advantage Estimation for Multi-Objective RL with Verifiable Rewards : Abstract: Group Relative Policy Optimization (GRPO) assigns a single scalar advantage to all tokens in a completion. For structured generations with explicit segments and objectives, this couples unre...
Self-Evolving Recommendation System: End-To-End Autonomous Model Optimization With LLM Agents : Abstract: Optimizing large-scale machine learning systems, such as recommendation models for global video platforms, requires navigating a massive hyperparameter search space and, more critically, des...
Quantum Integrated Sensing and Computation with Indefinite Causal Order : Abstract: Quantum operations with indefinite causal order (ICO) represent a framework in quantum information processing where the relative order between two events can be indefinite. In this paper, we...
Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as an effective approach for enhancing the reasoning capabilities of Large Language Models (LLMs). Despite its efficacy, RLV...
Versor: A Geometric Sequence Architecture : Abstract: A novel sequence architecture design is introduced, Versor, which uses Conformal Geometric Algebra (CGA) in place of the traditional fundamental non-linear operations to achieve structural g...
When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models : Abstract: Recent advances in large image editing models have shifted the paradigm from text-driven instructions to vision-prompt editing, where user intent is inferred directly from visual inputs such...
Towards Autonomous Mathematics Research : Abstract: Recent advances in foundational models have yielded reasoning systems capable of achieving a gold-medal standard at the International Mathematical Olympiad. The transition from competition-l...
Cosmo3DFlow: Wavelet Flow Matching for Spatial-to-Spectral Compression in Reconstructing the Early Universe : Abstract: Reconstructing the early Universe from the evolved present-day Universe is a challenging and computationally demanding problem in modern astrophysics. We devise a novel generative framework,...
EvoCodeBench: A Human-Performance Benchmark for Self-Evolving LLM-Driven Coding Systems : Abstract: As large language models (LLMs) continue to advance in programming tasks, LLM-driven coding systems have evolved from one-shot code generation into complex systems capable of iterative impro...
EVA: Towards a universal model of the immune system : Abstract: The effective application of foundation models to translational research in immune-mediated diseases requires multimodal patient-level representations that can capture complex phenotypes eme...
Anatomy-Preserving Latent Diffusion for Generation of Brain Segmentation Masks with Ischemic Infarct : Abstract: The scarcity of high-quality segmentation masks remains a major bottleneck for medical image analysis, particularly in non-contrast CT (NCCT) neuroimaging, where manual annotation is costly ...
Beyond SMILES: Evaluating Agentic Systems for Drug Discovery : Abstract: Agentic systems for drug discovery have demonstrated autonomous synthesis planning, literature mining, and molecular design. We ask how well they generalize. Evaluating six frameworks agains...
Omni-Safety under Cross-Modality Conflict: Vulnerabilities, Dynamics Mechanisms and Efficient Alignment : Abstract: Omni-modal Large Language Models (OLLMs) greatly expand LLMs' multimodal capabilities but also introduce cross-modal safety risks. However, a systematic understanding of vulnerabilities in o...
AD$^2$: Analysis and Detection of Adversarial Threats in Visual Perception for End-to-End Autonomous Driving Systems : Abstract: End-to-end autonomous driving systems have achieved significant progress, yet their adversarial robustness remains largely underexplored. In this work, we conduct a closed-loop evaluation of...
NMRTrans: Structure Elucidation from Experimental NMR Spectra via Set Transformers : Abstract: Nuclear Magnetic Resonance (NMR) spectroscopy is fundamental for molecular structure elucidation, yet interpreting spectra at scale remains time-consuming and highly expertise-dependent. Whi...
MalMoE: Mixture-of-Experts Enhanced Encrypted Malicious Traffic Detection Under Graph Drift : Abstract: Encryption has been commonly used in network traffic to secure transmission, but it also brings challenges for malicious traffic detection, due to the invisibility of the packet payload. Gra...
PRISM-XR: Empowering Privacy-Aware XR Collaboration with Multimodal Large Language Models : Abstract: Multimodal Large Language Models (MLLMs) enhance collaboration in Extended Reality (XR) environments by enabling flexible object and animation creation through the combination of natural lan...
PEST: Physics-Enhanced Swin Transformer for 3D Turbulence Simulation : Abstract: Accurate simulation of turbulent flows is fundamental to scientific and engineering applications. Direct numerical simulation (DNS) offers the highest fidelity but is computationally prohibi...
Exploring Semantic Labeling Strategies for Third-Party Cybersecurity Risk Assessment Questionnaires : Abstract: Third-Party Risk Assessment (TPRA) is a core cybersecurity practice for evaluating suppliers against standards such as ISO/IEC 27001 and NIST. TPRA questionnaires are typically drawn from la...
Red-teaming the Multimodal Reasoning: Jailbreaking Vision-Language Models via Cross-modal Entanglement Attacks : Abstract: Vision-Language Models (VLMs) with multimodal reasoning capabilities are high-value attack targets, given their potential for handling complex multimodal harmful tasks. Mainstream black-box ...
On the Use of a Large Language Model to Support the Conduction of a Systematic Mapping Study: A Brief Report from a Practitioner's View : Abstract: The use of Large Language Models (LLMs) has drawn growing interest within the scientific community. LLMs can handle large volumes of textual data and support methods for evidence synthesis. ...
Silence Routing: When Not Speaking Improves Collective Judgment : Abstract: The wisdom of crowds has been shown to operate not only for factual judgments but also in matters of taste, where accuracy is defined relative to an individual's preferences. However, it rem...
When LLMs get significantly worse: A statistical approach to detect model degradations : Abstract: Minimizing the inference cost and latency of foundation models has become a crucial area of research. Optimization approaches include theoretically lossless methods and others without accura...
Can Large Language Models Implement Agent-Based Models? An ODD-based Replication Study : Abstract: Large language models (LLMs) can now synthesize non-trivial executable code from textual descriptions, raising an important question: can LLMs reliably implement agent-based models from stan...
Anonymization-Enhanced Privacy Protection for Mobile GUI Agents: Available but Invisible : Abstract: Mobile Graphical User Interface (GUI) agents have demonstrated strong capabilities in automating complex smartphone tasks by leveraging multimodal large language models (MLLMs) and system-le...
Multimodal Information Fusion for Chart Understanding: A Survey of MLLMs -- Evolution, Limitations, and Cognitive Enhancement : Abstract: Chart understanding is a quintessential information fusion task, requiring the seamless integration of graphical and textual data to extract meaning. The advent of Multimodal Large Language ...
Multi-encoder ConvNeXt Network with Smooth Attentional Feature Fusion for Multispectral Semantic Segmentation : Abstract: This work proposes MeCSAFNet, a multi-branch encoder-decoder architecture for land cover segmentation in multispectral imagery. The model separately processes visible and non-visible channel...
Reverse-Engineering Model Editing on Language Models : Abstract: Large language models (LLMs) are pretrained on corpora containing trillions of tokens and, therefore, inevitably memorize sensitive information. Locate-then-edit methods, as a mainstream par...
AgentTrace: A Structured Logging Framework for Agent System Observability : Abstract: Despite the growing capabilities of autonomous agents powered by large language models (LLMs), their adoption in high-stakes domains remains limited. A key barrier is security: the inherentl...
TokaMark: A Comprehensive Benchmark for MAST Tokamak Plasma Models : Abstract: Development and operation of commercially viable fusion energy reactors such as tokamaks require accurate predictions of plasma dynamics from sparse, noisy, and incomplete sensor readings. ...
The Anatomy of the Moltbook Social Graph : Abstract: I present a descriptive analysis of Moltbook, a social platform populated exclusively by AI agents, using data from the platform's first 3.5 days (6,159 agents; 13,875 posts; 115,031 c...
"Humans welcome to observe": A First Look at the Agent Social Network Moltbook : Abstract: The rapid advancement of artificial intelligence (AI) agents has catalyzed the transition from static language models to autonomous agents capable of tool use, long-term planning, and social...
A Practical Guide to Agentic AI Transition in Organizations : Abstract: Agentic AI represents a significant shift in how intelligence is applied within organizations, moving beyond AI-assisted tools toward autonomous systems capable of reasoning, decision-making...
Large Language Models Predict Functional Outcomes after Acute Ischemic Stroke : Abstract: Accurate prediction of functional outcomes after acute ischemic stroke can inform clinical decision-making and resource allocation. Prior work on modified Rankin Scale (mRS) prediction has r...
FormalJudge: A Neuro-Symbolic Paradigm for Agentic Oversight : Abstract: As LLM-based agents increasingly operate in high-stakes domains with real-world consequences, ensuring their behavioral safety becomes paramount. The dominant oversight paradigm, LLM-as-a-Ju...
GameDevBench: Evaluating Agentic Capabilities Through Game Development : Abstract: Despite rapid progress on coding agents, progress on their multimodal counterparts has lagged behind. A key challenge is the scarcity of evaluation testbeds that combine the complexity of so...
CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion : Abstract: Agentic coding requires agents to effectively interact with runtime environments, e.g., command line interfaces (CLI), so as to complete tasks like resolving dependency issues, fixing system...
Can LLMs Cook Jamaican Couscous? A Study of Cultural Novelty in Recipe Generation : Abstract: Large Language Models (LLMs) are increasingly used to generate and shape cultural content, ranging from narrative writing to artistic production. While these models demonstrate impressive fl...
Reinforcing Chain-of-Thought Reasoning with Self-Evolving Rubrics : Abstract: Despite chain-of-thought (CoT) playing crucial roles in LLM reasoning, directly rewarding it is difficult: training a reward model demands heavy human labeling efforts, and static RMs strugg...
SynergyKGC: Reconciling Topological Heterogeneity in Knowledge Graph Completion via Topology-Aware Synergy : Abstract: Knowledge Graph Completion (KGC) fundamentally hinges on the coherent fusion of pre-trained entity semantics with heterogeneous topological structures to facilitate robust relational reasoni...
See, Plan, Snap: Evaluating Multimodal GUI Agents in Scratch : Abstract: Block-based programming environments such as Scratch play a central role in low-code education, yet evaluating the capabilities of AI agents to construct programs through Graphical User Inte...
Integrating Generative AI-enhanced Cognitive Systems in Higher Education: From Stakeholder Perceptions to a Conceptual Framework considering the EU AI Act : Abstract: Many staff and students in higher education have adopted generative artificial intelligence (GenAI) tools in their work and study. GenAI is expected to enhance cognitive systems by enabling ...
Spend Search Where It Pays: Value-Guided Structured Sampling and Optimization for Generative Recommendation : Abstract: Generative recommendation via autoregressive models has unified retrieval and ranking into a single conditional generation framework. However, fine-tuning these models with Reinforcement Lea...
OmniSapiens: A Foundation Model for Social Behavior Processing via Heterogeneity-Aware Relative Policy Optimization : Abstract: To develop socially intelligent AI, existing approaches typically model human behavioral dimensions (e.g., affective, cognitive, or social attributes) in isolation. Although useful, task-spe...
To Think or Not To Think, That is The Question for Large Reasoning Models in Theory of Mind Tasks : Abstract: Theory of Mind (ToM) assesses whether models can infer hidden mental states such as beliefs, desires, and intentions, which is essential for natural social interaction. Although recent progr...
Neuro-symbolic Action Masking for Deep Reinforcement Learning : Abstract: Deep reinforcement learning (DRL) may explore infeasible actions during training and execution. Existing approaches assume a symbol grounding function that maps high-dimensional states to co...
Flow of Spans: Generalizing Language Models to Dynamic Span-Vocabulary via GFlowNets : Abstract: Standard autoregressive language models generate text token-by-token from a fixed vocabulary, inducing a tree-structured state space when viewing token sampling as an action, which limits fl...
Abstraction Generation for Generalized Planning with Pretrained Large Language Models : Abstract: Qualitative Numerical Planning (QNP) serves as an important abstraction model for generalized planning (GP), which aims to compute general plans that solve multiple instances at once. Recent...
MERIT Feedback Elicits Better Bargaining in LLM Negotiators : Abstract: Bargaining is often regarded as a logical arena rather than an art or a matter of intuition, yet Large Language Models (LLMs) still struggle to navigate it due to limited strategic depth and...
Found-RL: foundation model-enhanced reinforcement learning for autonomous driving : Abstract: Reinforcement Learning (RL) has emerged as a dominant paradigm for end-to-end autonomous driving (AD). However, RL suffers from sample inefficiency and a lack of semantic interpretability in...
LiveMedBench: A Contamination-Free Medical Benchmark for LLMs with Automated Rubric Evaluation : Abstract: The deployment of Large Language Models (LLMs) in high-stakes clinical settings demands rigorous and reliable evaluation. However, existing medical benchmarks remain static, suffering from t...
Discovering Differences in Strategic Behavior Between Humans and LLMs : Abstract: As Large Language Models (LLMs) are increasingly deployed in social and strategic scenarios, it becomes critical to understand where and why their behavior diverges from that of humans. Whil...

Research Sources: 447 | Generated: 2/12/2026