AI Research News Feeds for October 27th, 2025

AI RESEARCH PAPERS & ACADEMIC SOURCES

MATrack: Efficient Multiscale Adaptive Tracker for Real-Time Nighttime UAV Operations : Abstract: Nighttime UAV tracking faces significant challenges in real-world robotics operations. Low-light conditions not only limit visual perception capabilities, but cluttered backgrounds and frequ...
AGC-Drive: A Large-Scale Dataset for Real-World Aerial-Ground Collaboration in Driving Scenarios : Abstract: By sharing information across multiple agents, collaborative perception helps autonomous vehicles mitigate occlusions and improve overall perception accuracy. While most previous work focuses ...
IPFormer: Visual 3D Panoptic Scene Completion with Context-Adaptive Instance Proposals : Abstract: Semantic Scene Completion (SSC) has emerged as a pivotal approach for jointly learning scene geometry and semantics, enabling downstream applications such as navigation in mobile robotics. T...
Total Generalized Variation of the Normal Vector Field and Applications to Mesh Denoising : Abstract: We propose a novel formulation for the second-order total generalized variation (TGV) of the normal vector on an oriented, triangular mesh embedded in $\mathbb{R}^3$. The normal vector is considered...
An Evaluation of DUSt3R/MASt3R/VGGT 3D Reconstruction on Photogrammetric Aerial Blocks : Abstract: State-of-the-art 3D computer vision algorithms continue to advance in handling sparse, unordered image sets. Recently developed foundational models for 3D reconstruction, such as Dense and U...
Guided MRI Reconstruction via Schrödinger Bridge : Abstract: Magnetic Resonance Imaging (MRI) is an inherently multi-contrast modality, where cross-contrast priors can be exploited to improve image reconstruction from undersampled data. Recently, diff...
WMCopier: Forging Invisible Image Watermarks on Arbitrary Images : Abstract: Invisible Image Watermarking is crucial for ensuring content provenance and accountability in generative AI. While Gen-AI providers are increasingly integrating invisible watermarking system...
A robust and versatile deep learning model for prediction of the arterial input function in dynamic small animal [¹⁸F]FDG PET imaging : Abstract: Dynamic positron emission tomography (PET) and kinetic modeling are pivotal in advancing tracer development research in small animal studies. Accurate kinetic modeling requires precise input...
AURASeg: Attention Guided Upsampling with Residual Boundary-Assistive Refinement for Drivable-Area Segmentation : Abstract: Free space ground segmentation is essential to navigate robots and autonomous vehicles, recognize drivable zones, and traverse efficiently. Fine-grained features remain challenging for exist...
VidSplice: Towards Coherent Video Inpainting via Explicit Spaced Frame Guidance : Abstract: Recent video inpainting methods often employ image-to-video (I2V) priors to model temporal consistency across masked frames. While effective in moderate cases, these methods struggle under s...
CXR-LanIC: Language-Grounded Interpretable Classifier for Chest X-Ray Diagnosis : Abstract: Deep learning models have achieved remarkable accuracy in chest X-ray diagnosis, yet their widespread clinical adoption remains limited by the black-box nature of their predictions. Clinicia...
ITC-RWKV: Interactive Tissue-Cell Modeling with Recurrent Key-Value Aggregation for Histopathological Subtyping : Abstract: Accurate interpretation of histopathological images demands integration of information across spatial and semantic scales, from nuclear morphology and cellular textures to global tissue orga...
GRAP-MOT: Unsupervised Graph-based Position Weighted Person Multi-camera Multi-object Tracking in a Highly Congested Space : Abstract: GRAP-MOT is a new approach for solving the person MOT problem dedicated to videos of closed areas with overlapping multi-camera views, where person occlusion frequently occurs. Our novel gra...
An Automatic Detection Method for Hematoma Features in Placental Abruption Ultrasound Images Based on Few-Shot Learning : Abstract: Placental abruption is a severe complication during pregnancy, and its early accurate diagnosis is crucial for ensuring maternal and fetal safety. Traditional ultrasound diagnostic methods h...
Towards a Golden Classifier-Free Guidance Path via Foresight Fixed Point Iterations : Abstract: Classifier-Free Guidance (CFG) is an essential component of text-to-image diffusion models, and understanding and advancing its operational mechanisms remains a central focus of research. Ex...
Foley Control: Aligning a Frozen Latent Text-to-Audio Model to Video : Abstract: Foley Control is a lightweight approach to video-guided Foley that keeps pretrained single-modality models frozen and learns only a small cross-attention bridge between them. We connect V-JE...
Mamba Goes HoME: Hierarchical Soft Mixture-of-Experts for 3D Medical Image Segmentation : Abstract: In recent years, artificial intelligence has significantly advanced medical image segmentation. Nonetheless, challenges remain, including efficient 3D medical image processing across diverse...
Restore Text First, Enhance Image Later: Two-Stage Scene Text Image Super-Resolution with Glyph Structure Guidance : Abstract: Current generative super-resolution methods show strong performance on natural images but distort text, creating a fundamental trade-off between image quality and textual readability. To add...
Automated interictal epileptic spike detection from simple and noisy annotations in MEG data : Abstract: In drug-resistant epilepsy, presurgical evaluation of epilepsy can be considered. Magnetoencephalography (MEG) has been shown to be an effective exam to inform the localization of the epilep...
S3OD: Towards Generalizable Salient Object Detection with Synthetic Data : Abstract: Salient object detection exemplifies data-bounded tasks where expensive pixel-precise annotations force separate model training for related subtasks like DIS and HR-SOD. We present a method ...
Modest-Align: Data-Efficient Alignment for Vision-Language Models : Abstract: Cross-modal alignment aims to map heterogeneous modalities into a shared latent space, as exemplified by models like CLIP, which benefit from large-scale image-text pretraining for strong re...
Epipolar Geometry Improves Video Generation Models : Abstract: Video generation models have progressed tremendously through large latent diffusion transformers trained with rectified flow techniques. Yet these models still struggle with geometric incons...
DAP-MAE: Domain-Adaptive Point Cloud Masked Autoencoder for Effective Cross-Domain Learning : Abstract: Compared to 2D data, the scale of point cloud data available for training in different domains is quite limited. Researchers have been trying to combine these data of different domains for ...
Long-tailed Species Recognition in the NACTI Wildlife Dataset : Abstract: Like most "in the wild" data collections of the natural world, the North America Camera Trap Images (NACTI) dataset shows severe long-tailed class imbalance, noting that the largest 'Head' c...
Self-Supervised Learning of Synapse Types from EM Images : Abstract: Separating synapses into different classes based on their appearance in EM images has many applications in biology. Examples may include assigning a neurotransmitter to a particular class, o...
Foundation Models in Dermatopathology: Skin Tissue Classification : Abstract: The rapid generation of whole-slide images (WSIs) in dermatopathology necessitates automated methods for efficient processing and accurate classification. This study evaluates the performanc...
WorldGrow: Generating Infinite 3D World : Abstract: We tackle the challenge of generating the infinitely extendable 3D world -- large, continuous environments with coherent geometry and realistic appearance. Existing methods face key challeng...
BachVid: Training-Free Video Generation with Consistent Background and Character : Abstract: Diffusion Transformers (DiTs) have recently driven significant progress in text-to-video (T2V) generation. However, generating multiple videos with consistent characters and backgrounds rema...
Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent : Abstract: When a vision model performs image recognition, which visual attributes drive its predictions? Detecting unintended reliance on specific visual features is critical for ensuring model robust...
Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets : Abstract: Developing embodied AI agents requires scalable training environments that balance content diversity with physics accuracy. World simulators provide such environments but face distinct limit...
Lightweight Classifier for Detecting Intracranial Hemorrhage in Ultrasound Data : Abstract: Intracranial hemorrhage (ICH) secondary to Traumatic Brain Injury (TBI) represents a critical diagnostic challenge, with approximately 64,000 TBI-related deaths annually in the United States...
Eye-Tracking as a Tool to Quantify the Effects of CAD Display on Radiologists' Interpretation of Chest Radiographs : Abstract: Rationale and Objectives: Computer-aided detection systems for chest radiographs are widely used, and concurrent reader displays, such as bounding-box (BB) highlights, may influence the read...
Physics-Informed Deep Learning for Improved Input Function Estimation in Motion-Blurred Dynamic [¹⁸F]FDG PET Images : Abstract: Kinetic modeling enables in vivo quantification of tracer uptake and glucose metabolism in [¹⁸F]Fluorodeoxyglucose ([¹⁸F]FDG) dynamic positron emission tomography (dPE...
Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models : Abstract: Previous methods for image geo-localization have typically treated the task as either classification or retrieval, often relying on black-box decisions that lack interpretability. The rise o...
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models : Abstract: Multimodal Large Language Model (MLLM) relies on the powerful LLM to perform multimodal tasks, showing amazing emergent abilities in recent studies, such as writing poems based on an image. ...
WCCNet: Wavelet-context Cooperative Network for Efficient Multispectral Pedestrian Detection : Abstract: Multispectral pedestrian detection achieves better visibility in challenging conditions and thus is essential to autonomous driving, for which both the accuracy and computational cost are of...
Circle Representation for Medical Instance Object Segmentation : Abstract: Recently, circle representation has been introduced for medical imaging, designed specifically to enhance the detection of instance objects that are spherically shaped (e.g., cells, glomerul...
On the Influence of Shape, Texture and Color for Learning Semantic Segmentation : Abstract: Recent research has investigated the shape and texture biases of pre-trained deep neural networks (DNNs) in image classification. Those works test how much a trained DNN relies on specific i...
InfiniDreamer: Arbitrarily Long Human Motion Generation via Segment Score Distillation : Abstract: We present InfiniDreamer, a novel framework for arbitrarily long human motion generation. InfiniDreamer addresses the limitations of current motion generation methods, which are typically re...
Boosting Adversarial Transferability with Spatial Adversarial Alignment : Abstract: Deep neural networks are vulnerable to adversarial examples that exhibit transferability across various models. Numerous approaches are proposed to enhance the transferability of adversarial...
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction : Abstract: Recent Multimodal Large Language Models (MLLMs) have typically focused on integrating visual and textual modalities, with less emphasis placed on the role of speech in enhancing interaction....
RigAnything: Template-Free Autoregressive Rigging for Diverse 3D Assets : Abstract: We present RigAnything, a novel autoregressive transformer-based model, which makes 3D assets rig-ready by probabilistically generating joints and skeleton topologies and assigning skinning ...
AugGen: Synthetic Augmentation using Diffusion Models Can Improve Recognition : Abstract: The increasing reliance on large-scale datasets in machine learning poses significant privacy and ethical challenges, particularly in sensitive domains such as face recognition. Synthetic da...
LEGNet: A Lightweight Edge-Gaussian Network for Low-Quality Remote Sensing Image Object Detection : Abstract: Remote sensing object detection (RSOD) often suffers from degradations such as low spatial resolution, sensor noise, motion blur, and adverse illumination. These factors diminish feature dis...
RT-DATR: Real-time Unsupervised Domain Adaptive Detection Transformer with Adversarial Feature Alignment : Abstract: Although domain-adaptive object detectors based on CNNs and transformers have made significant progress in cross-domain detection tasks, it is regrettable that domain adaptation for real-time ...
HAVT-IVD: Heterogeneity-Aware Cross-Modal Network for Audio-Visual Surveillance: Idling Vehicles Detection With Multichannel Audio and Multiscale Visual Cues : Abstract: Idling vehicle detection (IVD) uses surveillance video and multichannel audio to localize and classify vehicles in the last frame as moving, idling, or engine-off in pick-up zones. IVD faces...
Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning : Abstract: Multimodal agents, which integrate a controller (e.g., a vision language model) with external tools, have demonstrated remarkable capabilities in tackling complex multimodal tasks. Existing a...
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video : Abstract: Multimodal Large Language Models (MLLMs) increasingly excel at perception, understanding, and reasoning. However, current benchmarks inadequately evaluate their ability to perform these task...
MELLM: Exploring LLM-Powered Micro-Expression Understanding Enhanced by Subtle Motion Perception : Abstract: Micro-expressions (MEs), brief and low-intensity facial movements revealing concealed emotions, are crucial for affective computing. Despite notable progress in ME recognition, existing meth...
Breaking the Batch Barrier (B3) of Contrastive Learning via Smart Batch Mining : Abstract: Contrastive learning (CL) is a prevalent technique for training embedding models, which pulls semantically similar examples (positives) closer in the representation space while pushing dissi...
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning : Abstract: Despite impressive advancements in Visual-Language Models (VLMs) for multi-modal tasks, their reliance on RGB inputs limits precise spatial understanding. Existing methods for integrating sp...
Few-Shot Learning from Gigapixel Images via Hierarchical Vision-Language Alignment and Modeling : Abstract: Vision-language models (VLMs) have recently been integrated into multiple instance learning (MIL) frameworks to address the challenge of few-shot, weakly supervised classification of whole s...
SAMA: Towards Multi-Turn Referential Grounded Video Chat with Large Language Models : Abstract: Achieving fine-grained spatio-temporal understanding in videos remains a major challenge for current Video Large Multimodal Models (Video LMMs). Addressing this challenge requires mastering ...
Frame In-N-Out: Unbounded Controllable Image-to-Video Generation : Abstract: Controllability, temporal coherence, and detail synthesis remain the most critical challenges in video generation. In this paper, we focus on a commonly used yet underexplored cinematic tech...
CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian Splatting : Abstract: Gaussian Splatting (GS) has recently emerged as an efficient representation for rendering 3D scenes from 2D images and has been extended to images, videos, and dynamic 4D content. However, a...
Seeing the Arrow of Time in Large Multimodal Models : Abstract: The Arrow of Time (AoT), time's irreversible flow shaping physical events, is fundamental to video comprehension, yet remains a significant challenge for modern large multimodal models (LMMs)....
AngleRoCL: Angle-Robust Concept Learning for Physically View-Invariant T2I Adversarial Patches : Abstract: Cutting-edge works have demonstrated that text-to-image (T2I) diffusion models can generate adversarial patches that mislead state-of-the-art object detectors in the physical world, revealin...
Metropolis-Hastings Sampling for 3D Gaussian Reconstruction : Abstract: We propose an adaptive sampling framework for 3D Gaussian Splatting (3DGS) that leverages comprehensive multi-view photometric error signals within a unified Metropolis-Hastings approach. Va...
TerraGen: A Unified Multi-Task Layout Generation Framework for Remote Sensing Data Augmentation : Abstract: Remote sensing vision tasks require extensive labeled data across multiple, interconnected domains. However, current generative data augmentation frameworks are task-isolated, i.e., each vis...
Depth-Supervised Fusion Network for Seamless-Free Image Stitching : Abstract: Image stitching synthesizes images captured from multiple perspectives into a single image with a broader field of view. The significant variations in object depth often lead to large parall...
MUVR: A Multi-Modal Untrimmed Video Retrieval Benchmark with Multi-Level Visual Correspondence : Abstract: We propose the Multi-modal Untrimmed Video Retrieval task, along with a new benchmark (MUVR) to advance video retrieval for long-video platforms. MUVR aims to retrieve untrimmed videos conta...
Bridging the gap to real-world language-grounded visual concept learning : Abstract: Human intelligence effortlessly interprets visual scenes along a rich spectrum of semantic dimensions. However, existing approaches to language-grounded visual concept learning are limited t...
ArtiLatent: Realistic Articulated 3D Object Generation via Structured Latents : Abstract: We propose ArtiLatent, a generative framework that synthesizes human-made 3D objects with fine-grained geometry, accurate articulation, and realistic appearance. Our approach jointly models ...
Anisotropic Pooling for LUT-realizable CNN Image Restoration : Abstract: Table look-up realization of image restoration CNNs has the potential of achieving competitive image quality while being much faster and resource frugal than the straightforward CNN implemen...
OpenHype: Hyperbolic Embeddings for Hierarchical Open-Vocabulary Radiance Fields : Abstract: Modeling the inherent hierarchical structure of 3D objects and 3D scenes is highly desirable, as it enables a more holistic understanding of environments for autonomous agents. Accomplishing...
MoniTor: Exploiting Large Language Models with Instruction for Online Video Anomaly Detection : Abstract: Video Anomaly Detection (VAD) aims to locate unusual activities or behaviors within videos. Recently, offline VAD has garnered substantial research attention, which has been invigorated by t...
Why Registration Quality Matters: Enhancing sCT Synthesis with IMPACT-Based Registration : Abstract: We participated in the SynthRAD2025 challenge (Tasks 1 and 2) with a unified pipeline for synthetic CT (sCT) generation from MRI and CBCT, implemented using the KonfAI framework. Our model i...
FEAT: Free energy Estimators with Adaptive Transport : Abstract: We present Free energy Estimators with Adaptive Transport (FEAT), a novel framework for free energy estimation -- a critical challenge across scientific domains. FEAT leverages learned trans...
Visualization Tasks for Unlabelled Graphs : Abstract: We investigate tasks that can be accomplished with unlabelled graphs, which are graphs with nodes that do not have attached persistent or semantically meaningful labels. New visualization te...
Approximating Signed Distance Fields of Implicit Surfaces with Sparse Ellipsoidal Radial Basis Function Networks : Abstract: Accurate and compact representation of signed distance functions (SDFs) of implicit surfaces is crucial for efficient storage, computation, and downstream processing of 3D geometry. In this ...
Register and [CLS] tokens yield a decoupling of local and global features in large ViTs : Abstract: Recent work has shown that the attention maps of the widely popular DINOv2 model exhibit artifacts, which hurt both model interpretability and performance on dense image tasks. These artifac...
Ensuring Functional Correctness of Large Code Models with Selective Generation : Abstract: The hallucination of code generation models hinders their applicability to systems requiring higher safety standards. One critical bottleneck in addressing code hallucination is the difficul...
The Computational Complexity of Counting Linear Regions in ReLU Neural Networks : Abstract: An established measure of the expressive power of a given ReLU neural network is the number of linear regions into which it partitions the input space. There exist many different, non-equiva...
Anytime-valid, Bayes-assisted, Prediction-Powered Inference : Abstract: Given a large pool of unlabelled data and a smaller amount of labels, prediction-powered inference (PPI) leverages machine learning predictions to increase the statistical efficiency of conf...
Lorentz Local Canonicalization: How to Make Any Network Lorentz-Equivariant : Abstract: Lorentz-equivariant neural networks are becoming the leading architectures for high-energy physics. Current implementations rely on specialized layers, limiting architectural choices. We int...
STACI: Spatio-Temporal Aleatoric Conformal Inference : Abstract: Fitting Gaussian Processes (GPs) provides interpretable aleatoric uncertainty quantification for estimation of spatio-temporal fields. Spatio-temporal deep learning models, while scalable, t...
Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks : Abstract: Overparameterized neural networks can interpolate a given dataset in many different ways, prompting the fundamental question: which among these solutions should we prefer, and what explicit ...
Progressive Data Dropout: An Embarrassingly Simple Approach to Faster Training : Abstract: The success of the machine learning field has reliably depended on training on large datasets. While effective, this trend comes at an extraordinary cost. This is due to two deeply intertwin...
RiverMamba: A State Space Model for Global River Discharge and Flood Forecasting : Abstract: Recent deep learning approaches for river discharge forecasting have improved the accuracy and efficiency in flood forecasting, enabling more reliable early warning systems for risk manageme...
zip2zip: Inference-Time Adaptive Tokenization via Online Compression : Abstract: Tokenization efficiency plays a critical role in the performance and cost of large language models (LLMs), yet most models rely on static tokenizers optimized on general-purpose corpora. The...
Self-Refining Language Model Anonymizers via Adversarial Distillation : Abstract: Large language models (LLMs) are increasingly used in sensitive domains, where their ability to infer personal data from seemingly benign text introduces emerging privacy risks. While recent...
Grasp2Grasp: Vision-Based Dexterous Grasp Translation via Schrödinger Bridges : Abstract: We propose a new approach to vision-based dexterous grasp translation, which aims to transfer grasp intent across robotic hands with differing morphologies. Given a visual observation of a s...
FORLA: Federated Object-centric Representation Learning with Slot Attention : Abstract: Learning efficient visual representations across heterogeneous unlabeled datasets remains a central challenge in federated learning. Effective federated representations require features that...
Understanding challenges to the interpretation of disaggregated evaluations of algorithmic fairness : Abstract: Disaggregated evaluation across subgroups is critical for assessing the fairness of machine learning models, but its uncritical use can mislead practitioners. We show that equal performance ...
BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning : Abstract: We present the B-spline Encoded Action Sequence Tokenizer (BEAST), a novel action tokenizer that encodes action sequences into compact discrete or continuous tokens using B-splines. In contr...
Sharper Convergence Rates for Nonconvex Optimisation via Reduction Mappings : Abstract: Many high-dimensional optimisation problems exhibit rich geometric structures in their set of minimisers, often forming smooth manifolds due to over-parametrisation or symmetries. When this ...
Pre-trained Language Models Learn Remarkably Accurate Representations of Numbers : Abstract: Pretrained language models (LMs) are prone to arithmetic errors. Existing work showed limited success in probing numeric values from models' representations, indicating that these errors can...
PatchGuard: Adversarially Robust Anomaly Detection and Localization through Vision Transformers and Pseudo Anomalies : Abstract: Anomaly Detection (AD) and Anomaly Localization (AL) are crucial in fields that demand high reliability, such as medical imaging and industrial monitoring. However, current AD and AL approac...
POCO: Scalable Neural Forecasting through Population Conditioning : Abstract: Predicting future neural activity is a core challenge in modeling brain dynamics, with applications ranging from scientific investigation to closed-loop neurotechnology. While recent models ...
InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding : Abstract: Modern multimodal large language models (MLLMs) can reason over hour-long video, yet their key-value (KV) cache grows linearly with time, quickly exceeding the fixed memory of phones, AR glas...
Online Learning for Dynamic Vickrey-Clarke-Groves Mechanism in Unknown Environments : Abstract: We consider the problem of online dynamic mechanism design for sequential auctions in unknown environments, where the underlying market and, thus, the bidders' values vary over time as inter...
Posterior Contraction for Sparse Neural Networks in Besov Spaces with Intrinsic Dimensionality : Abstract: This work establishes that sparse Bayesian neural networks achieve optimal posterior contraction rates over anisotropic Besov spaces and their hierarchical compositions. These structures ref...
Stabilizing PDE--ML coupled systems : Abstract: A long-standing obstacle in the use of machine-learnt surrogates with larger PDE systems is the onset of instabilities when solved numerically. Efforts towards ameliorating these have mostly...
FicSim: A Dataset for Multi-Faceted Semantic Similarity in Long-Form Fiction : Abstract: As language models become capable of processing increasingly long and complex texts, there has been growing interest in their application within computational literary studies. However, eval...
Irish-BLiMP: A Linguistic Benchmark for Evaluating Human and Language Model Performance in a Low-Resource Setting : Abstract: We present Irish-BLiMP (Irish Benchmark of Linguistic Minimal Pairs), the first dataset and framework designed for fine-grained evaluation of linguistic competence in the Irish language, an ...
Can Confidence Estimates Decide When Chain-of-thought is Necessary for LLMs? : Abstract: Chain-of-thought (CoT) prompting has emerged as a common technique for enhancing the reasoning abilities of large language models (LLMs). While extended reasoning can boost accuracy on compl...
Input Matters: Evaluating Input Structure's Impact on LLM Summaries of Sports Play-by-Play : Abstract: A major concern when deploying LLMs in accuracy-critical domains such as sports reporting is that the generated text may not faithfully reflect the input data. We quantify how input structur...
Dynamic Retriever for In-Context Knowledge Editing via Policy Optimization : Abstract: Large language models (LLMs) excel at factual recall yet still propagate stale or incorrect knowledge. In-context knowledge editing offers a gradient-free remedy suitable for black-box APIs,...
Social Simulations with Large Language Model Risk Utopian Illusion : Abstract: Reliable simulation of human behavior is essential for explaining, predicting, and intervening in our society. Recent advances in large language models (LLMs) have shown promise in emulating...
Estonian Native Large Language Model Benchmark : Abstract: The availability of LLM benchmarks for the Estonian language is limited, and a comprehensive evaluation comparing the performance of different LLMs on Estonian tasks has yet to be conducted....
The "Right" Discourse on Migration: Analysing Migration-Related Tweets in Right and Far-Right Political Movements : Abstract: The rise of right-wing populism in Europe has brought to the forefront the significance of analysing social media discourse to understand the dissemination of extremist ideologies and their ...
DispatchMAS: Fusing taxonomy and artificial intelligence agents for emergency medical services : Abstract: Objective: Emergency medical dispatch (EMD) is a high-stakes process challenged by caller distress, ambiguity, and cognitive load. Large Language Models (LLMs) and Multi-Agent Systems (MAS) ...
PARL: Prompt-based Agents for Reinforcement Learning : Abstract: Large language models (LLMs) have demonstrated high performance on tasks expressed in natural language, particularly in zero- or few-shot settings. These are typically framed as supervised (...
Typoglycemia under the Hood: Investigating Language Models' Understanding of Scrambled Words : Abstract: Research in linguistics has shown that humans can read words with internally scrambled letters, a phenomenon recently dubbed typoglycemia. Some specific NLP models have recently been propose...
A Diagnostic Benchmark for Sweden-Related Factual Knowledge : Abstract: Many Swedish benchmarks are translated US-centric benchmarks, and therefore not suitable for testing knowledge that is particularly relevant, or even specific, to Sweden. We therefore introd...
SindBERT, the Sailor: Charting the Seas of Turkish NLP : Abstract: Transformer models have revolutionized NLP, yet many morphologically rich languages remain underrepresented in large-scale pre-training efforts. With SindBERT, we set out to chart the seas o...
HalleluBERT: Let every token that has meaning bear its weight : Abstract: Transformer-based models have advanced NLP, yet Hebrew still lacks a large-scale RoBERTa encoder which is extensively trained. Existing models such as HeBERT, AlephBERT, and HeRo are limited...
Redefining Retrieval Evaluation in the Era of LLMs : Abstract: Traditional Information Retrieval (IR) metrics, such as nDCG, MAP, and MRR, assume that human users sequentially examine documents with diminishing attention to lower ranks. This assumption ...
MRO: Enhancing Reasoning in Diffusion Language Models via Multi-Reward Optimization : Abstract: Recent advances in diffusion language models (DLMs) have presented a promising alternative to traditional autoregressive large language models (LLMs). However, DLMs still lag behind LLMs in ...
Brain-tuning Improves Generalizability and Efficiency of Brain Alignment in Speech Models : Abstract: Pretrained language models are remarkably effective in aligning with human brain responses elicited by natural language stimuli, positioning them as promising model organisms for studying la...
InterpDetect: Interpretable Signals for Detecting Hallucinations in Retrieval-Augmented Generation : Abstract: Retrieval-Augmented Generation (RAG) integrates external knowledge to mitigate hallucinations, yet models often generate outputs inconsistent with retrieved content. Accurate hallucination d...
Are the LLMs Capable of Maintaining at Least the Language Genus? : Abstract: Large Language Models (LLMs) display notable variation in multilingual behavior, yet the role of genealogical language structure in shaping this variation remains underexplored. In this pape...
Automated Quality Control for Language Documentation: Detecting Phonotactic Inconsistencies in a Kokborok Wordlist : Abstract: Lexical data collection in language documentation often contains transcription errors and undocumented borrowings that can mislead linguistic analysis. We present unsupervised anomaly detect...
RETuning: Upgrading Inference-Time Scaling for Stock Movement Prediction with Large Language Models : Abstract: Recently, large language models (LLMs) have demonstrated outstanding reasoning capabilities on mathematical and coding tasks. However, their application to financial tasks, especially the mos...
Can large audio language models understand child stuttering speech? speech summarization, and source separation : Abstract: Child speech differs from adult speech in acoustics, prosody, and language development, and disfluencies (repetitions, prolongations, blocks) further challenge Automatic Speech Recognition (...
Beyond Hearing: Learning Task-agnostic ExG Representations from Earphones via Physiology-informed Tokenization : Abstract: Electrophysiological (ExG) signals offer valuable insights into human physiology, yet building foundation models that generalize across everyday tasks remains challenging due to two key limi...
Self-Jailbreaking: Language Models Can Reason Themselves Out of Safety Alignment After Benign Reasoning Training : Abstract: We discover a novel and surprising phenomenon of unintentional misalignment in reasoning language models (RLMs), which we call self-jailbreaking. Specifically, after benign reasoning trainin...
Designing and Evaluating Hint Generation Systems for Science Education : Abstract: Large language models are influencing the education landscape, with students relying on them in their learning process. Often implemented using general-purpose models, these systems are like...
KBE-DME: Dynamic Multimodal Evaluation via Knowledge Enhanced Benchmark Evolution : Abstract: The rapid progress of multimodal large language models (MLLMs) calls for more reliable evaluation protocols. Existing static benchmarks suffer from the potential risk of data contamination a...
ColorEcosystem: Powering Personalized, Standardized, and Trustworthy Agentic Service in massive-agent Ecosystem : Abstract: With the rapid development of (multimodal) large language model-based agents, the landscape of agentic service management has evolved from single-agent systems to multi-agent systems, and no...
Doc-Researcher: A Unified System for Multimodal Document Parsing and Deep Research : Abstract: Deep Research systems have revolutionized how LLMs solve complex questions through iterative reasoning and evidence gathering. However, current systems remain fundamentally constrained to te...
Supporting Online Discussions: Integrating AI Into the adhocracy+ Participation Platform To Enhance Deliberation : Abstract: Online spaces provide individuals with the opportunity to engage in discussions on important topics and make collective decisions, regardless of their geographic location or time zone. Howev...
Universal Cross-Tokenizer Distillation via Approximate Likelihood Matching : Abstract: Distillation has shown remarkable success in transferring knowledge from a Large Language Model (LLM) teacher to a student LLM. However, current distillation methods require similar tokenize...
A Hierarchical Framework for Measuring Scientific Paper Innovation via Large Language Models : Abstract: Measuring scientific paper innovation is both important and challenging. Existing content-based methods often overlook the full-paper context, fail to capture the full scope of innovation, a...
Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space : Abstract: We introduce SLED, an alternative approach to speech language modeling by encoding speech waveforms into sequences of continuous latent representations and modeling them autoregressively usi...
HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding : Abstract: Autoregressive decoding inherently limits the inference throughput of Large Language Model (LLM) due to its sequential dependency. Speculative decoding mitigates this by verifying multiple p...
Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning : Abstract: Recent reasoning-focused language models achieve high accuracy by generating lengthy intermediate reasoning paths before producing final answers. While this approach is effective in solving ...
Reverse Engineering Human Preferences with Reinforcement Learning : Abstract: The capabilities of Large Language Models (LLMs) are routinely evaluated by other LLMs trained to predict human preferences. This framework, known as LLM-as-a-judge, is highly scalable and r...
How Does Sequence Modeling Architecture Influence Base Capabilities of Pre-trained Language Models? Exploring Key Architecture Design Principles to Avoid Base Capabilities Degradation : Abstract: Pre-trained language models represented by the Transformer have been proven to possess strong base capabilities, and the representative self-attention mechanism in the Transformer has become...
Dependency Parsing is More Parameter-Efficient with Normalization : Abstract: Dependency parsing is the task of inferring natural language structure, often approached by modeling word interactions via attention through biaffine scoring. This mechanism works like self-...
Visual Cues Enhance Predictive Turn-Taking for Two-Party Human Interaction : Abstract: Turn-taking is richly multimodal. Predictive turn-taking models (PTTMs) facilitate naturalistic human-robot interaction, yet most rely solely on speech. We introduce MM-VAP, a multimodal PTT...
Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning : Abstract: Large language models (LLMs) have demonstrated significant improvements in contextual understanding. However, their ability to attend to truly critical information during long-context reason...
DiscoSG: Towards Discourse-Level Text Scene Graph Parsing through Iterative Graph Refinement : Abstract: Vision-Language Models (VLMs) generate discourse-level, multi-sentence visual descriptions, challenging text scene graph parsers built for single-sentence caption-to-graph mapping. Current a...
Knee-Deep in C-RASP: A Transformer Depth Hierarchy : Abstract: It has been observed that transformers with greater depth (that is, more layers) have more capabilities, but can we establish formally which capabilities are gained? We answer this question ...
Marcel: A Lightweight and Open-Source Conversational Agent for University Student Support : Abstract: We present Marcel, a lightweight and open-source conversational agent designed to support prospective students with admission-related inquiries. The system aims to provide fast and personali...
Robust Preference Alignment via Directional Neighborhood Consensus : Abstract: Aligning large language models with human preferences is critical for creating reliable and controllable AI systems. A human preference can be visualized as a high-dimensional vector where d...
RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation : Abstract: Tree search methods have demonstrated impressive performance in code generation. Previous methods combine tree search with reflection that summarizes past mistakes to achieve iterative impro...
Visual Cues Support Robust Turn-taking Prediction in Noise : Abstract: Accurate predictive turn-taking models (PTTMs) are essential for naturalistic human-robot interaction. However, little is known about their performance in noise. This study therefore explore...
Generative Point Tracking with Flow Matching : Abstract: Tracking a point through a video can be a challenging task due to uncertainty arising from visual obfuscations, such as appearance changes and occlusions. Although current state-of-the-art d...
Thermal Polarimetric Multi-view Stereo : Abstract: This paper introduces a novel method for detailed 3D shape reconstruction utilizing thermal polarization cues. Unlike state-of-the-art methods, the proposed approach is independent of illumi...
BioDet: Boosting Industrial Object Detection with Image Preprocessing Strategies : Abstract: Accurate 6D pose estimation is essential for robotic manipulation in industrial environments. Existing pipelines typically rely on off-the-shelf object detectors followed by cropping and pos...
ZING-3D: Zero-shot Incremental 3D Scene Graphs via Vision-Language Models : Abstract: Understanding and reasoning about complex 3D environments requires structured scene representations that capture not only objects but also their semantic and spatial relationships. While rec...
WaveSeg: Enhancing Segmentation Precision via High-Frequency Prior and Mamba-Driven Spectrum Decomposition : Abstract: While recent semantic segmentation networks heavily rely on powerful pretrained encoders, most employ simplistic decoders, leading to suboptimal trade-offs between semantic context and fine-...
Knowledge-Driven Vision-Language Model for Plexus Detection in Hirschsprung's Disease : Abstract: Hirschsprung's disease is defined as the congenital absence of ganglion cells in some segment(s) of the colon. The muscle cannot make coordinated movements to propel stool in that section, m...
HistRetinex: Optimizing Retinex model in Histogram Domain for Efficient Low-Light Image Enhancement : Abstract: Retinex-based low-light image enhancement methods are widely used due to their excellent performance. However, most of them are time-consuming for large-sized images. This paper extends the ...
PhysVLM-AVR: Active Visual Reasoning for Multimodal Large Language Models in Physical Environments : Abstract: Visual reasoning in multimodal large language models (MLLMs) has primarily been studied in static, fully observable settings, limiting their effectiveness in real-world environments where in...
Controllable-LPMoE: Adapting to Challenging Object Segmentation via Dynamic Local Priors from Mixture-of-Experts : Abstract: Large-scale foundation models provide powerful feature representations for downstream object segmentation tasks. However, when adapted to specific tasks through the full-parameter fine-tunin...
SafetyPairs: Isolating Safety Critical Image Features with Counterfactual Image Generation : Abstract: What exactly makes a particular image unsafe? Systematically differentiating between benign and problematic images is a challenging problem, as subtle changes to an image, such as an insulti...
NoisyGRPO: Incentivizing Multimodal CoT Reasoning via Noise Injection and Bayesian Estimation : Abstract: Reinforcement learning (RL) has shown promise in enhancing the general Chain-of-Thought (CoT) reasoning capabilities of multimodal large language models (MLLMs). However, when applied to imp...
Digital Contrast CT Pulmonary Angiography Synthesis from Non-contrast CT for Pulmonary Vascular Disease : Abstract: Computed Tomography Pulmonary Angiography (CTPA) is the reference standard for diagnosing pulmonary vascular diseases such as Pulmonary Embolism (PE) and Chronic Thromboembolic Pulmonary Hyp...
Towards Physics-informed Spatial Intelligence with Human Priors: An Autonomous Driving Pilot Study : Abstract: How to integrate and verify spatial intelligence in foundation models remains an open challenge. Current practice often proxies Visual-Spatial Intelligence (VSI) with purely textual prompts ...
Blockwise Flow Matching: Improving Flow Matching Models For Efficient High-Quality Generation : Abstract: Recently, Flow Matching models have pushed the boundaries of high-fidelity data generation across a wide range of domains. They typically employ a single large network to learn the entire gen...
TokenCLIP: Token-wise Prompt Learning for Zero-shot Anomaly Detection : Abstract: Adapting CLIP for anomaly detection on unseen objects has shown strong potential in a zero-shot manner. However, existing methods typically rely on a single textual space to align with visua...
3rd Place Solution to ICCV LargeFineFoodAI Retrieval : Abstract: This paper introduces the 3rd place solution to the ICCV LargeFineFoodAI Retrieval Competition on Kaggle. Four basic models are independently trained with the weighted sum of ArcFace and Cir...
3rd Place Solution to Large-scale Fine-grained Food Recognition : Abstract: Food analysis is becoming a hot topic in the health area, in which the fine-grained food recognition task plays an important role. In this paper, we describe the details of our solution to the Large...
Improved Training Technique for Shortcut Models : Abstract: Shortcut models represent a promising, non-adversarial paradigm for generative modeling, uniquely supporting one-step, few-step, and multi-step sampling from a single trained network. Howeve...
Topology Sculptor, Shape Refiner: Discrete Diffusion Model for High-Fidelity 3D Meshes Generation : Abstract: In this paper, we introduce Topology Sculptor, Shape Refiner (TSSR), a novel method for generating high-quality, artist-style 3D meshes based on Discrete Diffusion Models (DDMs). Our primary...
Towards Physically Executable 3D Gaussian for Embodied Navigation : Abstract: 3D Gaussian Splatting (3DGS), a 3D representation method with photorealistic real-time rendering capabilities, is regarded as an effective tool for narrowing the sim-to-real gap. However, it...
FineRS: Fine-grained Reasoning and Segmentation of Small Objects with Reinforcement Learning : Abstract: Multi-modal Large Language Models (MLLMs) have shown remarkable capabilities across a wide range of vision-language tasks. However, due to the restricted input resolutions, MLLMs face signif...
Morphologically Intelligent Perturbation Prediction with FORM : Abstract: Understanding how cells respond to external stimuli is a central challenge in biomedical research and drug development. Current computational frameworks for modelling cellular responses rema...
Dynamic Semantic-Aware Correlation Modeling for UAV Tracking : Abstract: UAV tracking can be widely applied in scenarios such as disaster rescue, environmental monitoring, and logistics transportation. However, existing UAV tracking methods predominantly emphasiz...
Wisdom and Delusion of LLM Ensembles for Code Generation and Repair : Abstract: Today's pursuit of a single Large Language Model (LLM) for all software engineering tasks is resource-intensive and overlooks the potential benefits of complementarity, where different model...
Head Pursuit: Probing Attention Specialization in Multimodal Transformers : Abstract: Language and vision-language models have shown impressive performance across a wide range of tasks, but their internal mechanisms remain only partly understood. In this work, we study how in...
HollowFlow: Efficient Sample Likelihood Evaluation using Hollow Message Passing : Abstract: Flow and diffusion-based models have emerged as powerful tools for scientific applications, particularly for sampling non-normalized probability distributions, as exemplified by Boltzmann Ge...
Document Understanding, Measurement, and Manipulation Using Category Theory : Abstract: We apply category theory to extract multimodal document structure which leads us to develop information theoretic measures, content summarization and extension, and self-supervised improveme...
Contribution of task-irrelevant stimuli to drift of neural representations : Abstract: Biological and artificial learners are inherently exposed to a stream of data and experience throughout their lifetimes and must constantly adapt to, learn from, or selectively ignore the on...
Fisher meets Feynman: score-based variational inference with a product of experts : Abstract: We introduce a highly expressive yet distinctly tractable family for black-box variational inference (BBVI). Each member of this family is a weighted product of experts (PoE), and each weigh...
Enhancing Tactile-based Reinforcement Learning for Robotic Control : Abstract: Achieving safe, reliable real-world robotic manipulation requires agents to evolve beyond vision and incorporate tactile sensing to overcome sensory deficits and reliance on idealised state ...
Multimodal Datasets with Controllable Mutual Information : Abstract: We introduce a framework for generating highly multimodal datasets with explicitly calculable mutual information between modalities. This enables the construction of benchmark datasets that ...
Visual Diffusion Models are Geometric Solvers : Abstract: In this paper we show that visual diffusion models can serve as effective geometric solvers: they can directly reason about geometric problems by working in pixel space. We first demonstrate...
VENI, VINDy, VICI: a generative reduced-order modeling framework with uncertainty quantification : Abstract: The simulation of many complex phenomena in engineering and science requires solving expensive, high-dimensional systems of partial differential equations (PDEs). To circumvent this, reduced...
Relative Representations: Topological and Geometric Perspectives : Abstract: Relative representations are an established approach to zero-shot model stitching, consisting of a non-trainable transformation of the latent space of a deep neural network. Based on insight...
Spatial-Aware Decision-Making with Ring Attractors in Reinforcement Learning Systems : Abstract: Ring attractors, mathematical models inspired by neural circuit dynamics, provide a biologically plausible mechanism to improve learning speed and accuracy in Reinforcement Learning (RL). Se...
How Learning Dynamics Drive Adversarially Robust Generalization? : Abstract: Despite significant progress in adversarially robust learning, the underlying mechanisms that govern robust generalization remain poorly understood. We propose a novel PAC-Bayesian framework...
Implementation and Assessment of Machine Learning Models for Forecasting Suspected Opioid Overdoses in Emergency Medical Services Data : Abstract: We present efforts in the fields of machine learning and time series forecasting to accurately predict counts of future suspected opioid overdoses recorded by Emergency Medical Services (EMS...
Adaptive Non-uniform Timestep Sampling for Accelerating Diffusion Model Training : Abstract: As highly expressive generative models, diffusion models have demonstrated exceptional success across various domains, including image generation, natural language processing, and combinato...
Probably Approximately Precision and Recall Learning : Abstract: Precision and Recall are fundamental metrics in machine learning tasks where both accurate predictions and comprehensive coverage are essential, such as in multi-label learning, language gen...
Projection-based Lyapunov method for fully heterogeneous weakly-coupled MDPs : Abstract: Heterogeneity poses a fundamental challenge for many real-world large-scale decision-making problems but remains largely understudied. In this paper, we study the fully heterogeneous setting...
Prediction-Powered Causal Inferences : Abstract: In many scientific experiments, data annotation costs constrain the pace of testing novel hypotheses. Yet, modern machine learning pipelines offer a promising solution, provided their ...
Diffusing DeBias: Synthetic Bias Amplification for Model Debiasing : Abstract: Deep learning model effectiveness in classification tasks is often challenged by the quality and quantity of training data whenever they are affected by strong spurious correlations between ...
Fundamental Limitations in Pointwise Defences of LLM Finetuning APIs : Abstract: LLM developers have imposed technical interventions to prevent fine-tuning misuse attacks, attacks where adversaries evade safeguards by fine-tuning the model using a public API. Previous wo...
Robust time series generation via Schrödinger Bridge: a comprehensive evaluation : Abstract: We investigate the generative capabilities of the Schrödinger Bridge (SB) approach for time series. The SB framework formulates time series synthesis as an entropic optimal interpolation t...
Continuous Simplicial Neural Networks : Abstract: Simplicial complexes provide a powerful framework for modeling higher-order interactions in structured data, making them particularly suitable for applications such as trajectory prediction ...
DeCaFlow: A deconfounding causal generative model : Abstract: We introduce DeCaFlow, a deconfounding causal generative model. Training once per dataset using just observational data and the underlying causal graph, DeCaFlow enables accurate causal infe...
Borsuk-Ulam and Replicable Learning of Large-Margin Halfspaces : Abstract: We prove that the list replicability number of $d$-dimensional $\gamma$-margin half-spaces satisfies $\frac{d}{2}+1 \le \mathrm{LR}(H^d_\gamma) \le d$, which grows with dimension. This r...
Planning and Learning in Average Risk-aware MDPs : Abstract: For continuing tasks, average cost Markov decision processes have well-documented value and can be solved using efficient algorithms. However, this framework explicitly assumes that the agent is risk-ne...
A QUBO Framework for Team Formation : Abstract: The team formation problem assumes a set of experts and a task, where each expert has a set of skills and the task requires some skills. The objective is to find a set of experts that maximi...
Federated Unlearning Made Practical: Seamless Integration via Negated Pseudo-Gradients : Abstract: The right to be forgotten is a fundamental principle of privacy-preserving regulations and extends to Machine Learning (ML) paradigms such as Federated Learning (FL). While FL enhances priva...
A discrete physics-informed training for projection-based reduced order models with neural networks : Abstract: This paper presents a physics-informed training framework for projection-based Reduced Order Models (ROMs). We extend the PROM-ANN architecture by complementing snapshot-based training with ...
Large Language Bayes : Abstract: Many domain experts do not have the time or expertise to write formal Bayesian models. This paper takes an informal problem description as input, and combines a large language model and a pr...
Some Optimizers are More Equal: Understanding the Role of Optimizers in Group Fairness : Abstract: We study whether and how the choice of optimization algorithm can impact group fairness in deep neural networks. Through stochastic differential equation analysis of optimization dynamics in...
Adaptive Latent-Space Constraints in Personalized Federated Learning : Abstract: Federated learning (FL) is an effective and widely used approach to training deep learning models on decentralized datasets held by distinct clients. FL also strengthens both security and pr...
SAD Neural Networks: Divergent Gradient Flows and Asymptotic Optimality via o-minimal Structures : Abstract: We study gradient flows for loss landscapes of fully connected feedforward neural networks with commonly used continuously differentiable activation functions such as the logistic, hyperboli...
ZEUS: Zero-shot Embeddings for Unsupervised Separation of Tabular Data : Abstract: Clustering tabular data remains a significant open challenge in data analysis and machine learning. Unlike for image data, similarity between tabular records often varies across datasets, ma...
Prior-Guided Diffusion Planning for Offline Reinforcement Learning : Abstract: Diffusion models have recently gained prominence in offline reinforcement learning due to their ability to effectively learn high-performing, generalizable policies from static datasets. Dif...
Neural Thermodynamics: Entropic Forces in Deep and Universal Representation Learning : Abstract: With the rapid discovery of emergent phenomena in deep learning and large language models, understanding their cause has become an urgent need. Here, we propose a rigorous entropic-force the...
Incremental Sequence Classification with Temporal Consistency : Abstract: We address the problem of incremental sequence classification, where predictions are updated as new elements in the sequence are revealed. Drawing on temporal-difference learning from reinfo...
Multivariate Latent Recalibration for Conditional Normalizing Flows : Abstract: Reliably characterizing the full conditional distribution of a multivariate response variable given a set of covariates is crucial for trustworthy decision-making. However, misspecified or m...
Stochastic Forward-Forward Learning through Representational Dimensionality Compression : Abstract: The Forward-Forward (FF) learning algorithm provides a bottom-up alternative to backpropagation (BP) for training neural networks, relying on a layer-wise "goodness" function with well-desig...
Shape it Up! Restoring LLM Safety during Finetuning : Abstract: Finetuning large language models (LLMs) enables user-specific customization but introduces critical safety risks: even a few harmful examples can compromise safety alignment. A common mitiga...
Graph Data Selection for Domain Adaptation: A Model-Free Approach : Abstract: Graph domain adaptation (GDA) is a fundamental task in graph machine learning, with techniques like shift-robust graph neural networks (GNNs) and specialized training procedures to tackle th...
Structured Linear CDEs: Maximally Expressive and Parallel-in-Time Sequence Models : Abstract: This work introduces Structured Linear Controlled Differential Equations (SLiCEs), a unifying framework for sequence models with structured, input-dependent state-transition matrices that re...
Riemannian Flow Matching for Brain Connectivity Matrices via Pullback Geometry : Abstract: Generating realistic brain connectivity matrices is key to analyzing population heterogeneity in brain organization, understanding disease, and augmenting data in challenging classification ...
Improved Regret and Contextual Linear Extension for Pandora's Box and Prophet Inequality : Abstract: We study the Pandora's Box problem in an online learning setting with semi-bandit feedback. In each round, the learner sequentially pays to open up to $n$ boxes with unknown reward distribut...
Recurrent Self-Attention Dynamics: An Energy-Agnostic Perspective from Jacobians : Abstract: The theoretical understanding of self-attention (SA) has been steadily progressing. A prominent line of work studies a class of SA layers that admit an energy function decreased by state upd...
Revisiting Bi-Linear State Transitions in Recurrent Neural Networks : Abstract: The role of hidden units in recurrent neural networks is typically seen as modeling memory, with research focusing on enhancing information retention through gating mechanisms. A less explor...
Optimal kernel regression bounds under energy-bounded noise : Abstract: Non-conservative uncertainty bounds are key for both assessing an estimation algorithm's accuracy and in view of downstream tasks, such as its deployment in safety-critical contexts. In this...
Preference Learning with Response Time: Robust Losses and Guarantees : Abstract: This paper investigates the integration of response time data into human preference learning frameworks for more effective reward model elicitation. While binary preference data has become f...
BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model : Abstract: Unlocking deep and interpretable biological reasoning from complex genomic data remains a major AI challenge limiting scientific progress. While current DNA foundation models excel at repres...
On Transferring Transferability: Towards a Theory for Size Generalization : Abstract: Many modern learning tasks require models that can take inputs of varying sizes. Consequently, dimension-independent architectures have been proposed for domains where the inputs are graphs,...
The Rich and the Simple: On the Implicit Bias of Adam and SGD : Abstract: Adam is the de facto optimization algorithm for several deep learning applications, but an understanding of its implicit bias and how it differs from other algorithms, particularly standard ...
FSNet: Feasibility-Seeking Neural Network for Constrained Optimization with Guarantees : Abstract: Efficiently solving constrained optimization problems is crucial for numerous real-world applications, yet traditional solvers are often computationally prohibitive for real-time use. Machin...
When Lower-Order Terms Dominate: Adaptive Expert Algorithms for Heavy-Tailed Losses : Abstract: We consider the problem setting of prediction with expert advice with possibly heavy-tailed losses, i.e., the only assumption on the losses is an upper bound on their second moments, denoted...
A Smooth Sea Never Made a Skilled SAILOR: Robust Imitation via Learning to Search : Abstract: The fundamental limitation of the behavioral cloning (BC) approach to imitation learning is that it only teaches an agent what the expert did at states the expert visited. This means that wh...
Learning normalized image densities via dual score matching : Abstract: Learning probability models from data is at the heart of many machine learning endeavors, but is notoriously difficult due to the curse of dimensionality. We introduce a new framework for le...
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks : Abstract: What features neural networks learn, and how, remains an open question. In this paper, we introduce Alternating Gradient Flows (AGF), an algorithmic framework that describes the dynamics of ...
A Stable Whitening Optimizer for Efficient Neural Network Training : Abstract: In this work, we take an experimentally grounded look at neural network optimization. Building on the Shampoo family of algorithms, we identify and alleviate three key issues, resulting in t...
Return of ChebNet: Understanding and Improving an Overlooked GNN on Long Range Tasks : Abstract: ChebNet, one of the earliest spectral GNNs, has largely been overshadowed by Message Passing Neural Networks (MPNNs), which gained popularity for their simplicity and effectiveness in captur...
A Gravity-informed Spatiotemporal Transformer for Human Activity Intensity Prediction : Abstract: Human activity intensity prediction is crucial to many location-based services. Despite tremendous progress in modeling dynamics of human activity, most existing methods overlook physical co...
How Many Domains Suffice for Domain Generalization? A Tight Characterization via the Domain Shattering Dimension : Abstract: We study a fundamental question of domain generalization: given a family of domains (i.e., data distributions), how many randomly sampled domains do we need to collect data from in order to ...
SharpZO: Hybrid Sharpness-Aware Vision Language Model Prompt Tuning via Forward-Only Passes : Abstract: Fine-tuning vision language models (VLMs) has achieved remarkable performance across various downstream tasks; yet, it requires access to model gradients through backpropagation (BP), making...
Risk-Averse Total-Reward Reinforcement Learning : Abstract: Risk-averse total-reward Markov Decision Processes (MDPs) offer a promising framework for modeling and solving undiscounted infinite-horizon objectives. Existing model-based algorithms for r...
Risk-Averse Best Arm Set Identification with Fixed Budget and Fixed Confidence : Abstract: Decision making under uncertain environments in the maximization of expected reward while minimizing its risk is one of the ubiquitous problems in many subjects. Here, we introduce a novel p...
Scaling can lead to compositional generalization : Abstract: Can neural networks systematically capture discrete, compositional task structure despite their continuous, distributed nature? The impressive capabilities of large-scale neural networks sug...
Efficient Parametric SVD of Koopman Operator for Stochastic Dynamical Systems : Abstract: The Koopman operator provides a principled framework for analyzing nonlinear dynamical systems through linear operator theory. Recent advances in dynamic mode decomposition (DMD) have shown ...
Non-exchangeable Conformal Prediction with Optimal Transport: Tackling Distribution Shifts with Unlabeled Data : Abstract: Conformal prediction is a distribution-free uncertainty quantification method that has gained popularity in the machine learning community due to its finite-sample guarantees and ease of use...
RockNet: Distributed Learning on Ultra-Low-Power Devices : Abstract: As Machine Learning (ML) becomes integral to Cyber-Physical Systems (CPS), there is growing interest in shifting training from traditional cloud-based to on-device processing (TinyML), for e...
Regret Distribution in Stochastic Bandits: Optimal Trade-off between Expectation and Tail Risk : Abstract: We study the optimal trade-off between expectation and tail risk for regret distribution in the stochastic multi-armed bandit model. We fully characterize the interplay among three desired p...
Factor Fitting, Rank Allocation, and Partitioning in Multilevel Low Rank Matrices : Abstract: We consider multilevel low rank (MLR) matrices, defined as a row and column permutation of a sum of matrices, each one a block diagonal refinement of the previous one, with all blocks low ra...
FlexLLM: Token-Level Co-Serving of LLM Inference and Finetuning with SLO Guarantees : Abstract: Finetuning large language models (LLMs) is essential for task adaptation, yet today's serving stacks isolate inference and finetuning on separate GPU clusters -- wasting resources and under-...
Optimal Rates of Convergence for Entropy Regularization in Discounted Markov Decision Processes : Abstract: We study the error introduced by entropy regularization in infinite-horizon, discrete, discounted Markov decision processes. We show that this error decreases exponentially in the inverse re...
Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models : Abstract: Sparsely-activated Mixture-of-Experts (MoE) architecture has increasingly been adopted to further scale large language models (LLMs). However, frequent failures still pose significant challe...
Multi-Atlas Brain Network Classification through Consistency Distillation and Complementary Information Fusion : Abstract: In the realm of neuroscience, identifying distinctive patterns associated with neurological disorders via brain networks is crucial. Resting-state functional magnetic resonance imaging (fMRI...
TopoFR: A Closer Look at Topology Alignment on Face Recognition : Abstract: The field of face recognition (FR) has undergone significant advancements with the rise of deep learning. Recently, the success of unsupervised learning and graph neural networks has demonst...
Point Cloud Synthesis Using Inner Product Transforms : Abstract: Point cloud synthesis, i.e. the generation of novel point clouds from an input distribution, remains a challenging task, for which numerous complex machine learning models have been devised....
Overcomplete Tensor Decomposition via Koszul-Young Flattenings : Abstract: Motivated by connections between algebraic complexity lower bounds and tensor decompositions, we investigate Koszul-Young flattenings, which are the main ingredient in recent lower bounds fo...
Mixture of Experts in Image Classification: What's the Sweet Spot? : Abstract: Mixture-of-Experts (MoE) models have shown promising potential for parameter-efficient scaling across domains. However, their application to image classification remains limited, often requi...
The Narrow Gate: Localized Image-Text Communication in Native Multimodal Models : Abstract: Recent advances in multimodal training have significantly improved the integration of image understanding and generation within a unified model. This study investigates how vision-language m...
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching : Abstract: Autoregressive (AR) models have achieved state-of-the-art performance in text and image generation but suffer from slow generation due to the token-by-token process. We ask an ambitious ques...
An Efficient Orlicz-Sobolev Approach for Transporting Unbalanced Measures on a Graph : Abstract: We investigate optimal transport (OT) for measures on graph metric spaces with different total masses. To mitigate the limitations of traditional $L^p$ geometry, Orlicz-Wasserstein (OW) and ...
Scaling Embedding Layers in Language Models : Abstract: We propose SCONE (Scalable, Contextualized, Offloaded, N-gram Embedding), a new method for extending input embedding layers to enhance language model performance. To avoid increa...
Smooth Sailing: Lipschitz-Driven Uncertainty Quantification for Spatial Association : Abstract: Estimating associations between spatial covariates and responses - rather than merely predicting responses - is central to environmental science, epidemiology, and economics. For instance, p...
Spectral Analysis of Representational Similarity with Limited Neurons : Abstract: Understanding representational similarity between neural recordings and computational models is essential for neuroscience, yet remains challenging to measure reliably due to the constraints...
Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing : Abstract: We propose an inference-time scaling approach for pretrained flow models. Recently, inference-time scaling has gained significant attention in LLMs and diffusion models, improving sample qua...
Inference for Deep Neural Network Estimators in Generalized Nonparametric Models : Abstract: While deep neural networks (DNNs) are used for prediction, inference on DNN-estimated subject-specific means for categorical or exponential family outcomes remains underexplored. We address ...
Cost Minimization for Space-Air-Ground Integrated Multi-Access Edge Computing Systems : Abstract: Space-air-ground integrated multi-access edge computing (SAGIN-MEC) provides a promising solution for the rapidly developing low-altitude economy (LAE) to deliver flexible and wide-area comp...
Interpretable Multimodal Zero-Shot ECG Diagnosis via Structured Clinical Knowledge Alignment : Abstract: Electrocardiogram (ECG) interpretation is essential for cardiovascular disease diagnosis, but current automated systems often struggle with transparency and generalization to unseen conditio...
Leveraging Classical Algorithms for Graph Neural Networks : Abstract: Neural networks excel at processing unstructured data but often fail to generalise out-of-distribution, whereas classical algorithms guarantee correctness but lack flexibility. We explore wh...
An unsupervised tour through the hidden pathways of deep neural networks : Abstract: The goal of this thesis is to improve our understanding of the internal mechanisms by which deep artificial neural networks create meaningful representations and are able to generalize. We f...
REVE: A Foundation Model for EEG -- Adapting to Any Setup with Large-Scale Pretraining on 25,000 Subjects : Abstract: Foundation models have transformed AI by reducing reliance on task-specific data through large-scale pretraining. While successful in language and vision, their adoption in EEG has lagged du...
Accelerating Data Generation for Nonlinear temporal PDEs via homologous perturbation in solution space : Abstract: Data-driven deep learning methods like neural operators have advanced in solving nonlinear temporal partial differential equations (PDEs). However, these methods require large quantities of ...
SHAP Meets Tensor Networks: Provably Tractable Explanations with Parallelism : Abstract: Although Shapley additive explanations (SHAP) can be computed in polynomial time for simple models like decision trees, they unfortunately become NP-hard to compute for more expressive black...
Generalised Flow Maps for Few-Step Generative Modelling on Riemannian Manifolds : Abstract: Geometric data and purpose-built generative models on them have become ubiquitous in high-impact deep learning application domains, ranging from protein backbone generation and computational...
Optimal Graph Clustering without Edge Density Signals : Abstract: This paper establishes the theoretical limits of graph clustering under the Popularity-Adjusted Block Model (PABM), addressing limitations of existing models. In contrast to the Stochastic B...
On Uncertainty Calibration for Equivariant Functions : Abstract: Data-sparse settings such as robotic manipulation, molecular physics, and galaxy morphology classification are some of the hardest domains for deep learning. For these problems, equivariant ...
Mechanistic Interpretability for Neural TSP Solvers : Abstract: Neural networks have advanced combinatorial optimization, with Transformer-based solvers achieving near-optimal solutions on the Traveling Salesman Problem (TSP) in milliseconds. However, th...
Equivariance by Contrast: Identifiable Equivariant Embeddings from Unlabeled Finite Group Actions : Abstract: We propose Equivariance by Contrast (EbC) to learn equivariant embeddings from observation pairs $(\mathbf{y}, g \cdot \mathbf{y})$, where $g$ is drawn from a finite group acting on the data...
Triangle Multiplication Is All You Need For Biomolecular Structure Representations : Abstract: AlphaFold has transformed protein structure prediction, but emerging applications such as virtual ligand screening, proteome-wide folding, and de novo binder design demand predictions at a m...
A Multiscale Approach for Enhancing Weak Signal Detection : Abstract: Stochastic resonance (SR), a phenomenon originally introduced in climate modeling, enhances signal detection by leveraging optimal noise levels within non-linear systems. Traditional SR tech...
BACE: Behavior-Adaptive Connectivity Estimation for Interpretable Graphs of Neural Dynamics : Abstract: Understanding how distributed brain regions coordinate to produce behavior requires models that are both predictive and interpretable. We introduce Behavior-Adaptive Connectivity Estimation ...
Data-Centric Lessons To Improve Speech-Language Pretraining : Abstract: Spoken Question-Answering (SQA) is a core capability for useful and interactive artificial intelligence systems. Recently, several speech-language models (SpeechLMs) have been released with ...
Exponential Convergence Guarantees for Iterative Markovian Fitting : Abstract: The Schrödinger Bridge (SB) problem has become a fundamental tool in computational optimal transport and generative modeling. To address this problem, ideal methods such as Iterative Propo...
Kernel Learning with Adversarial Features: Numerical Efficiency and Adaptive Regularization : Abstract: Adversarial training has emerged as a key technique to enhance model robustness against adversarial input perturbations. Many of the existing methods rely on computationally expensive min-ma...
ROPES: Robotic Pose Estimation via Score-Based Causal Representation Learning : Abstract: Causal representation learning (CRL) has emerged as a powerful unsupervised framework that (i) disentangles the latent generative factors underlying high-dimensional data, and (ii) learns th...
Information Theoretic Learning for Diffusion Models with Warm Start : Abstract: Generative models that maximize model likelihood have gained traction in many practical settings. Among them, perturbation based approaches underpin many strong likelihood estimation models,...
A Short Note on Upper Bounds for Graph Neural Operator Convergence Rate : Abstract: Graphons, as limits of graph sequences, provide a framework for analyzing the asymptotic behavior of graph neural operators. Spectral convergence of sampled graphs to graphons yields operato...
NeuroPilot: A Realtime Brain-Computer Interface system to enhance concentration of students in online learning : Abstract: The prevalence of online learning poses a vital challenge for real-time monitoring of students' concentration. Traditional methods such as questionnaire assessments require manual interventions a...
SutureBot: A Precision Framework & Benchmark For Autonomous End-to-End Suturing : Abstract: Robotic suturing is a prototypical long-horizon dexterous manipulation task, requiring coordinated needle grasping, precise tissue penetration, and secure knot tying. Despite numerous effort...
Robust Point Cloud Reinforcement Learning via PCA-Based Canonicalization : Abstract: Reinforcement Learning (RL) from raw visual input has achieved impressive successes in recent years, yet it remains fragile to out-of-distribution variations such as changes in lighting, col...
Can Current Detectors Catch Face-to-Voice Deepfake Attacks? : Abstract: The rapid advancement of generative models has enabled the creation of increasingly stealthy synthetic voices, commonly referred to as audio deepfakes. A recent technique, FOICE [USENIX'24],...
Graph Neural Regularizers for PDE Inverse Problems : Abstract: We present a framework for solving a broad class of ill-posed inverse problems governed by partial differential equations (PDEs), where the target coefficients of the forward operator are re...
Iso-Riemannian Optimization on Learned Data Manifolds : Abstract: High-dimensional data that exhibit an intrinsic low-dimensional structure are ubiquitous in machine learning and data science. While various approaches allow for learning the corresponding d...
Efficient Meningioma Tumor Segmentation Using Ensemble Learning : Abstract: Meningiomas represent the most prevalent form of primary brain tumors, comprising nearly one-third of all diagnosed cases. Accurate delineation of these tumors from MRI scans is crucial for ...
xMem: A CPU-Based Approach for Accurate Estimation of GPU Memory in Deep Learning Training Workloads : Abstract: The global scarcity of GPUs necessitates more sophisticated strategies for Deep Learning jobs in shared cluster environments. Accurate estimation of how much GPU memory a job will require is...
Soft Instruction De-escalation Defense : Abstract: Large Language Models (LLMs) are increasingly deployed in agentic systems that interact with an external environment; this makes them susceptible to prompt injections when dealing with untru...
Doubly-Regressing Approach for Subgroup Fairness : Abstract: Algorithmic fairness is a socially crucial topic in real-world applications of AI. Among many notions of fairness, subgroup fairness is widely studied when multiple sensitive attributes (e...
A Unified Approach to Submodular Maximization Under Noise : Abstract: We consider the problem of maximizing a submodular function with access to a noisy value oracle for the function instead of an exact value oracle. Similar to prior work, we assume that the n...
TURBOTEST: Learning When Less is Enough through Early Termination of Internet Speed Tests : Abstract: Internet speed tests are indispensable for users, ISPs, and policymakers, but their static flooding-based design imposes growing costs: a single high-speed test can transfer hundreds of mega...
Instance-Adaptive Hypothesis Tests with Heterogeneous Agents : Abstract: We study hypothesis testing over a heterogeneous population of strategic agents with private information. Any single test applied uniformly across the population yields statistical error tha...
Enforcing Calibration in Multi-Output Probabilistic Regression with Pre-rank Regularization : Abstract: Probabilistic models must be well calibrated to support reliable decision-making. While calibration in single-output regression is well studied, defining and achieving multivariate calibrati...
VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set : Abstract: The alignment of vision-language representations endows current Vision-Language Models (VLMs) with strong multi-modal reasoning capabilities. However, the interpretability of the alignment c...
Multi-turn Training with Basic Human Feedback Helps Little on LLM Reasoning : Abstract: The reasoning capabilities of Large Language Models (LLMs) are typically developed through single-turn reinforcement learning, whereas real-world applications often involve multi-turn in...
BADiff: Bandwidth Adaptive Diffusion Model : Abstract: In this work, we propose a novel framework to enable diffusion models to adapt their generation quality based on real-time network bandwidth constraints. Traditional diffusion models produce...
Efficient Exploration of Chemical Kinetics : Abstract: Estimating reaction rates and chemical stability is fundamental, yet efficient methods for large-scale simulations remain out of reach despite advances in modeling and exascale computing. Di...
On Local Limits of Sparse Random Graphs: Color Convergence and the Refined Configuration Model : Abstract: Local convergence has emerged as a fundamental tool for analyzing sparse random graph models. We introduce a new notion of local convergence, color convergence, based on the Weisfeiler-Leman...
Oracle-Efficient Combinatorial Semi-Bandits : Abstract: We study the combinatorial semi-bandit problem where an agent selects a subset of base arms and receives individual feedback. While this generalizes the classical multi-armed bandit and has ...
Scalable Neural Incentive Design with Parameterized Mean-Field Approximation : Abstract: Designing incentives for a multi-agent system to induce a desirable Nash equilibrium is both a crucial and challenging problem appearing in many decision-making domains, especially for a lar...
SBASH: a Framework for Designing and Evaluating RAG vs. Prompt-Tuned LLM Honeypots : Abstract: Honeypots are decoy systems used for gathering valuable threat intelligence or diverting attackers away from production systems. Maximising attacker engagement is essential to their utility....
Risk Management for Mitigating Benchmark Failure Modes: BenchRisk : Abstract: Large language model (LLM) benchmarks inform LLM use decisions (e.g., "is this LLM safe to deploy for my use case and context?"). However, benchmarks may be rendered unreliable by various fa...
Finite-Time Analysis of Stochastic Nonconvex Nonsmooth Optimization on the Riemannian Manifolds : Abstract: This work addresses the finite-time analysis of nonsmooth nonconvex stochastic optimization under Riemannian manifold constraints. We adapt the notion of Goldstein stationarity to the Rieman...
Misspellings in Natural Language Processing: A survey : Abstract: This survey provides an overview of the challenges of misspellings in natural language processing (NLP). While often unintentional, misspellings have become ubiquitous in digital communicati...
Robust LLM Alignment via Distributionally Robust Direct Preference Optimization : Abstract: A major challenge in aligning large language models (LLMs) with human preferences is the issue of distribution shift. LLM alignment algorithms rely on static preference datasets, assuming th...
Electronic Circuit Principles of Large Language Models : Abstract: Large language models (LLMs) such as DeepSeek-R1 have achieved remarkable performance across diverse reasoning tasks. To uncover the principles that govern their behaviour, we introduce the ...
Model Selection for Off-policy Evaluation: New Algorithms and Experimental Protocol : Abstract: Holdout validation and hyperparameter tuning from data is a long-standing problem in offline reinforcement learning (RL). A standard framework is to use off-policy evaluation (OPE) methods t...
GoRA: Gradient-driven Adaptive Low Rank Adaptation : Abstract: Low-Rank Adaptation (LoRA) is a crucial method for efficiently fine-tuning large language models (LLMs), with its effectiveness influenced by two key factors: rank selection and weight initi...
UniTok: A Unified Tokenizer for Visual Generation and Understanding : Abstract: Visual generative and understanding models typically rely on distinct tokenizers to process images, presenting a key challenge for unifying them within a single framework. Recent studies att...
L²M: Mutual Information Scaling Law for Long-Context Language Modeling : Abstract: We present a universal theoretical framework for understanding long-context language modeling based on a bipartite mutual information scaling law that we rigorously verify in natural languag...
Operational Change Detection for Geographical Information: Overview and Challenges : Abstract: Rapid evolution of territories due to climate change and human impact requires prompt and effective updates to geospatial databases maintained by the National Mapping Agency. This paper pres...
Reinforcement Learning for Reasoning in Large Language Models with One Training Example : Abstract: We show that reinforcement learning with verifiable reward using one training example (1-shot RLVR) is effective in incentivizing the math reasoning capabilities of large language models (LL...
A4L: An Architecture for AI-Augmented Learning : Abstract: AI promises personalized learning and scalable education. As AI agents increasingly permeate education in support of teaching and learning, there is a critical and urgent need for data archi...
Fréchet Power-Scenario Distance: A Metric for Evaluating Generative AI Models across Multiple Time-Scales in Smart Grids : Abstract: Generative artificial intelligence (AI) models in smart grids have advanced significantly in recent years due to their ability to generate large amounts of synthetic data, which would otherw...
BLEUBERI: BLEU is a surprisingly effective reward for instruction following : Abstract: Reward models are central to aligning LLMs with human preferences, but they are costly to train, requiring large-scale human-labeled preference data and powerful pretrained LLM backbones. Me...
Seeing Sound, Hearing Sight: Uncovering Modality Bias and Conflict of AI models in Sound Localization : Abstract: Imagine hearing a dog bark and turning toward the sound only to see a parked car, while the real, silent dog sits elsewhere. Such sensory conflicts test perception, yet humans reliably resol...
HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages : Abstract: Preference datasets are essential for training general-domain, instruction-following language models with Reinforcement Learning from Human Feedback (RLHF). Each subsequent data release rais...
BioCube: A Multimodal Dataset for Biodiversity Research : Abstract: Biodiversity research requires complete and detailed information to study ecosystem dynamics at different scales. Employing data-driven methods like Machine Learning is gaining traction in e...
CLT and Edgeworth Expansion for m-out-of-n Bootstrap Estimators of The Studentized Median : Abstract: The m-out-of-n bootstrap, originally proposed by Bickel, Götze, and Zwet (1992), approximates the distribution of a statistic by repeatedly drawing m subsamples (with m much smaller than n) ...
True Zero-Shot Inference of Dynamical Systems Preserving Long-Term Statistics : Abstract: Complex, temporally evolving phenomena, from climate to brain activity, are governed by dynamical systems (DS). DS reconstruction (DSR) seeks to infer generative surrogate models of these fr...
Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency : Abstract: We study Transformers through the perspective of optimal control theory, using tools from continuous-time formulations to derive actionable insights into training and architecture design. Th...
Let LLMs Break Free from Overthinking via Self-Braking Tuning : Abstract: Large reasoning models (LRMs), such as OpenAI o1 and DeepSeek-R1, have significantly enhanced their reasoning capabilities by generating longer chains of thought, demonstrating outstanding p...
LCDB 1.1: A Database Illustrating Learning Curves Are More Ill-Behaved Than Previously Thought : Abstract: Sample-wise learning curves plot performance versus training set size. They are useful for studying scaling laws and speeding up hyperparameter tuning and model selection. Learning curves ar...
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning : Abstract: Chain-of-thought reasoning has significantly improved the performance of Large Language Models (LLMs) across various domains. However, this reasoning process has been confined exclusively to...
Equivariant Eikonal Neural Networks: Grid-Free, Scalable Travel-Time Prediction on Homogeneous Spaces : Abstract: We introduce Equivariant Neural Eikonal Solvers, a novel framework that integrates Equivariant Neural Fields (ENFs) with Neural Eikonal Solvers. Our approach employs a single neural field wh...
T1: A Tool-Oriented Conversational Dataset for Multi-Turn Agentic Planning : Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities as intelligent agents capable of solving complex problems. However, effective planning in scenarios involving dependenc...
ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs : Abstract: Large Language Models (LLMs) have achieved remarkable performance by capturing complex interactions between input features. To identify these interactions, most existing approaches require e...
Mind the GAP! The Challenges of Scale in Pixel-based Deep Reinforcement Learning : Abstract: Scaling deep reinforcement learning in pixel-based environments presents a significant challenge, often resulting in diminished performance. While recent works have proposed algorithmic and ...
Scalable Valuation of Human Feedback through Provably Robust Model Alignment : Abstract: Despite the importance of aligning language models with human preferences, crowd-sourced human feedback is often noisy -- for example, preferring less desirable responses -- posing a fundame...
Knot So Simple: A Minimalistic Environment for Spatial Reasoning : Abstract: We propose KnotGym, an interactive environment for complex, spatial reasoning and manipulation. KnotGym includes goal-oriented rope manipulation tasks with varying levels of complexity, all ...
AcuRank: Uncertainty-Aware Adaptive Computation for Listwise Reranking : Abstract: Listwise reranking with large language models (LLMs) enhances top-ranked results in retrieval-based applications. Due to limited context size and the high inference cost of long contexts, re...
To CoT or To Loop? A Formal Comparison Between Chain-of-Thought and Looped Transformers : Abstract: Chain-of-Thought (CoT) and Looped Transformers have been shown to empirically improve performance on reasoning tasks and to theoretically enhance expressivity by recursively increasing the n...
Two Causally Related Needles in a Video Haystack : Abstract: Properly evaluating the ability of Video-Language Models (VLMs) to understand long videos remains a challenge. We propose a long-context video understanding benchmark, Causal2Needles, that a...
MESS+: Dynamically Learned Inference-Time LLM Routing in Model Zoos with Service Level Guarantees : Abstract: Open-weight large language model (LLM) zoos provide access to numerous high-quality models, but selecting the appropriate model for specific tasks remains challenging and requires technical ...
Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective : Abstract: World models have recently attracted growing interest in Multi-Agent Reinforcement Learning (MARL) due to their ability to improve sample efficiency for policy learning. However, accurately ...
Towards Comprehensive Scene Understanding: Integrating First and Third-Person Views for LVLMs : Abstract: Large vision-language models (LVLMs) are increasingly deployed in interactive applications such as virtual and augmented reality, where a first-person (egocentric) view captured by head-moun...
R3-RAG: Learning Step-by-Step Reasoning and Retrieval for LLMs via Reinforcement Learning : Abstract: Retrieval-Augmented Generation (RAG) integrates external knowledge with Large Language Models (LLMs) to enhance factual correctness and mitigate hallucination. However, dense retrievers ofte...
LayerIF: Estimating Layer Quality for Large Language Models using Influence Functions : Abstract: Pretrained Large Language Models (LLMs) achieve strong performance across a wide range of tasks, yet exhibit substantial variability in the various layers' training quality with respect to s...
Intrinsic Goals for Autonomous Agents: Model-Based Exploration in Virtual Zebrafish Predicts Ethological Behavior and Whole-Brain Dynamics : Abstract: Autonomy is a hallmark of animal intelligence, enabling adaptive and intelligent behavior in complex environments without relying on external reward or task structure. Existing reinforcement...
Principled Data Augmentation for Learning to Solve Quadratic Programming Problems : Abstract: Linear and quadratic optimization are crucial in numerous real-world applications, ranging from training machine learning models to solving integer linear programs. Recently, learning-to-opt...
CogniAlign: Word-Level Multimodal Speech Alignment with Gated Cross-Attention for Alzheimer's Detection : Abstract: Early detection of cognitive disorders such as Alzheimer's disease is critical for enabling timely clinical intervention and improving patient outcomes. In this work, we introduce CogniAlign...
FuXi-Ocean: A Global Ocean Forecasting System with Sub-Daily Resolution : Abstract: Accurate, high-resolution ocean forecasting is crucial for maritime operations and environmental monitoring. While traditional numerical models are capable of producing sub-daily, eddy-resol...
Video-Skill-CoT: Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning : Abstract: Recent advances in Chain-of-Thought (CoT) reasoning have improved complex video understanding, but existing methods often struggle to adapt to domain-specific skills (e.g., event detection, ...
Rectified Point Flow: Generic Point Cloud Pose Estimation : Abstract: We introduce Rectified Point Flow, a unified parameterization that formulates pairwise point cloud registration and multi-part shape assembly as a single conditional generative problem. Give...
Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay : Abstract: Reinforcement learning (RL) has become an effective approach for fine-tuning large language models (LLMs), particularly to enhance their reasoning capabilities. However, RL fine-tuning remai...
Distillation Robustifies Unlearning : Abstract: Current LLM unlearning methods are not robust. A few steps of finetuning can revert their effects. We begin by showing that this is true even for an idealized form of unlearning: training to...
Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning : Abstract: The rapid emergence of diverse large language models (LLMs) has spurred the development of LLM routers that assign user queries to the most suitable model. However, existing LLM routers typi...
Causal Climate Emulation with Bayesian Filtering : Abstract: Traditional models of climate change use complex systems of coupled equations to simulate physical processes across the Earth system. These simulations are highly computationally expensive, ...
ScoreMix: Synthetic Data Generation by Score Composition in Diffusion Models Improves Recognition : Abstract: Synthetic data generation is increasingly used in machine learning for training and data augmentation. Yet, current strategies often rely on external foundation models or datasets, whose usa...
Theory-Grounded Evaluation of Human-Like Fallacy Patterns in LLM Reasoning : Abstract: We study logical reasoning in language models by asking whether their errors follow established human fallacy patterns. Using the Erotetic Theory of Reasoning (ETR) and its open-source imple...
Grids Often Outperform Implicit Neural Representation at Compressing Dense Signals : Abstract: Implicit Neural Representations (INRs) have recently shown impressive results, but their fundamental capacity, implicit biases, and scaling behavior remain poorly understood. We investigate ...
DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents : Abstract: Large Language Models (LLMs) are increasingly central to agentic systems due to their strong reasoning and planning capabilities. By interacting with external environments through predefined...
PLD: A Choice-Theoretic List-Wise Knowledge Distillation : Abstract: Knowledge distillation is a model compression technique in which a compact "student" network is trained to replicate the predictive behavior of a larger "teacher" network. In logit-based kno...
What Do Latent Action Models Actually Learn? : Abstract: Latent action models (LAMs) aim to learn action-relevant changes from unlabeled videos by compressing changes between frames as latents. However, differences between video frames can be caus...
System-Embedded Diffusion Bridge Models : Abstract: Solving inverse problems -- recovering signals from incomplete or noisy measurements -- is fundamental in science and engineering. Score-based generative models (SGMs) have recently emerged ...
Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning : Abstract: Despite advances in reinforcement learning (RL)-based video reasoning with large language models (LLMs), data collection and fine-tuning remain significant challenges. These methods often re...
Reinforcement Learning with Action Chunking : Abstract: We present Q-chunking, a simple yet effective recipe for improving reinforcement learning (RL) algorithms for long-horizon, sparse-reward tasks. Our recipe is designed for the offline-to-onl...
Modeling the Economic Impacts of AI Openness Regulation : Abstract: Regulatory frameworks, such as the EU AI Act, encourage openness of general-purpose AI models by offering legal exemptions for "open-source" models. Despite this legislative attention on ope...
Retention analysis of edited knowledge after fine-tuning : Abstract: Large language models (LLMs) store vast amounts of knowledge, which often requires updates to correct factual errors, incorporate newly acquired information, or adapt model behavior. Model e...
Transformer-Gather, Fuzzy-Reconsider: A Scalable Hybrid Framework for Entity Resolution : Abstract: Entity resolution plays a significant role in enterprise systems where data integrity must be rigorously maintained. Traditional methods often struggle with handling noisy data or semantic u...
MOBO-OSD: Batch Multi-Objective Bayesian Optimization via Orthogonal Search Directions : Abstract: Bayesian Optimization (BO) is a powerful tool for optimizing expensive black-box objective functions. While extensive research has been conducted on the single-objective optimization problem...
Global Dynamics of Heavy-Tailed SGDs in Nonconvex Loss Landscape: Characterization and Control : Abstract: Stochastic gradient descent (SGD) and its variants enable modern artificial intelligence. However, theoretical understanding lags far behind their empirical success. It is widely believed th...
Learning from Interval Targets : Abstract: We study the problem of regression with interval targets, where only upper and lower bounds on target values are available in the form of intervals. This problem arises when the exact target...
LLM-Integrated Bayesian State Space Models for Multimodal Time-Series Forecasting : Abstract: Forecasting in the real world requires integrating structured time-series data with unstructured textual information, but existing methods are architecturally limited by fixed input/output h...
Safety Assessment in Reinforcement Learning via Model Predictive Control : Abstract: Model-free reinforcement learning approaches are promising for control but typically lack formal safety guarantees. Existing methods to shield or otherwise provide these guarantees often rel...
An Ensembled Penalized Federated Learning Framework for Falling People Detection : Abstract: Falls among elderly and disabled individuals remain a leading cause of injury and mortality worldwide, necessitating robust, accurate, and privacy-aware fall detection systems. Traditional f...
Towards Scalable Oversight with Collaborative Multi-Agent Debate in Error Detection : Abstract: Accurate detection of errors in large language model (LLM) responses is central to the success of scalable oversight, or providing effective supervision to superhuman intelligence. Yet, sel...
Neural Mutual Information Estimation with Vector Copulas : Abstract: Estimating mutual information (MI) is a fundamental task in data science and machine learning. Existing estimators mainly rely on either highly flexible models (e.g., neural networks), which...
On the accuracy of implicit neural representations for cardiovascular anatomies and hemodynamic fields : Abstract: Implicit neural representations (INRs, also known as neural fields) have recently emerged as a powerful framework for knowledge representation, synthesis, and compression. By encoding fields...
L^2M^3OF: A Large Language Multimodal Model for Metal-Organic Frameworks : Abstract: Large language models have demonstrated remarkable reasoning capabilities across diverse natural language tasks. However, comparable breakthroughs in scientific discovery are more limited, b...
AL-CoLe: Augmented Lagrangian for Constrained Learning : Abstract: Despite the non-convexity of most modern machine learning parameterizations, Lagrangian duality has become a popular tool for addressing constrained learning problems. We revisit Augmented L...
Distilled Decoding 2: One-step Sampling of Image Auto-regressive Models with Conditional Score Distillation : Abstract: Image Auto-regressive (AR) models have emerged as a powerful paradigm of visual generative models. Despite their promising performance, they suffer from slow generation speed due to the larg...
Fair Representation Learning with Controllable High Confidence Guarantees via Adversarial Inference : Abstract: Representation learning is increasingly applied to generate representations that generalize well across multiple downstream tasks. Ensuring fairness guarantees in representation learning is ...
More Than Memory Savings: Zeroth-Order Optimization Mitigates Forgetting in Continual Learning : Abstract: Zeroth-order (ZO) optimization has gained attention as a memory-efficient alternative to first-order (FO) methods, particularly in settings where gradient computation is expensive or even im...
From Information to Generative Exponent: Learning Rate Induces Phase Transitions in SGD : Abstract: To understand feature learning dynamics in neural networks, recent theoretical works have focused on gradient-based learning of Gaussian single-index models, where the label is a nonlinear f...
CIPHER: Scalable Time Series Analysis for Physical Sciences with Application to Solar Wind Phenomena : Abstract: Labeling or classifying time series is a persistent challenge in the physical sciences, where expert annotations are scarce, costly, and often inconsistent. Yet robust labeling is essential ...
Elementary, My Dear Watson: Non-Invasive Neural Keyword Spotting in the LibriBrain Dataset : Abstract: Non-invasive brain-computer interfaces (BCIs) are beginning to benefit from large, public benchmarks. However, current benchmarks target relatively simple, foundational tasks like Speech Det...
Amortized Active Generation of Pareto Sets : Abstract: We introduce active generation of Pareto sets (A-GPS), a new framework for online discrete black-box multi-objective optimization (MOO). A-GPS learns a generative model of the Pareto set tha...
Online Multi-Class Selection with Group Fairness Guarantee : Abstract: We study the online multi-class selection problem with group fairness guarantees, where limited resources must be allocated to sequentially arriving agents. Our work addresses two key limita...
Scalable Machine Learning Analysis of Parker Solar Probe Solar Wind Data : Abstract: We present a scalable machine learning framework for analyzing Parker Solar Probe (PSP) solar wind data using distributed processing and the quantum-inspired Kernel Density Matrices (KDM) me...
The Virtues of Brevity: Avoid Overthinking in Parallel Test-Time Reasoning : Abstract: Reasoning models represent a significant advance in LLM capabilities, particularly for complex reasoning tasks such as mathematics and coding. Previous studies confirm that parallel test-tim...
Neural Collapse under Gradient Flow on Shallow ReLU Networks for Orthogonally Separable Data : Abstract: Among many mysteries behind the success of deep networks lies the exceptional discriminative power of their learned representations as manifested by the intriguing Neural Collapse (NC) pheno...
Accelerating Mobile Inference through Fine-Grained CPU-GPU Co-Execution : Abstract: Deploying deep neural networks on mobile devices is increasingly important but remains challenging due to limited computing resources. On the other hand, their unified memory architecture an...
DictPFL: Efficient and Private Federated Learning on Encrypted Gradients : Abstract: Federated Learning (FL) enables collaborative model training across institutions without sharing raw data. However, gradient sharing still risks privacy leakage, such as gradient inversion a...
Distributionally Robust Feature Selection : Abstract: We study the problem of selecting limited features to observe such that models trained on them can perform well simultaneously across multiple subpopulations. This problem has applications i...
SolarBoost: Distributed Photovoltaic Power Forecasting Amid Time-varying Grid Capacity : Abstract: This paper presents SolarBoost, a novel approach for forecasting power output in distributed photovoltaic (DPV) systems. While existing centralized photovoltaic (CPV) methods are able to pre...
Cloud-Fog-Edge Collaborative Computing for Sequential MIoT Workflow: A Two-Tier DDPG-Based Scheduling Framework : Abstract: The Medical Internet of Things (MIoT) demands stringent end-to-end latency guarantees for sequential healthcare workflows deployed over heterogeneous cloud-fog-edge infrastructures. Scheduli...
A Unified Matrix Factorization Framework for Classical and Robust Clustering : Abstract: This paper presents a unified matrix factorization framework for classical and robust clustering. We begin by revisiting the well-known equivalence between crisp k-means clustering and matri...
A visual big data system for the prediction of weather-related variables: Jordan-Spain case study : Abstract: Meteorology is a field where huge amounts of data are generated, mainly collected by sensors at weather stations, where different variables can be measured. Those data have some particul...
Scalable Principal-Agent Contract Design via Gradient-Based Optimization : Abstract: We study a bilevel max-max optimization framework for principal-agent contract design, in which a principal chooses incentives to maximize utility while anticipating the agent's best ...
Gen-Review: A Large-scale Dataset of AI-Generated (and Human-written) Peer Reviews : Abstract: How does the progressive embracement of Large Language Models (LLMs) affect scientific peer reviewing? This multifaceted question is fundamental to the effectiveness -- as well as to the int...
Online AUC Optimization Based on Second-order Surrogate Loss : Abstract: The Area Under the Curve (AUC) is an important performance metric for classification tasks, particularly in class-imbalanced scenarios. However, minimizing the AUC presents significant chall...
Mitra: Mixed Synthetic Priors for Enhancing Tabular Foundation Models : Abstract: Since the seminal work of TabPFN, research on tabular foundation models (TFMs) based on in-context learning (ICL) has challenged long-standing paradigms in machine learning. Without seeing a...
Adaptive Graph Mixture of Residual Experts: Unsupervised Learning on Diverse Graphs with Heterogeneous Specialization : Abstract: Graph Neural Networks (GNNs) face a fundamental adaptability challenge: their fixed message-passing architectures struggle with the immense diversity of real-world graphs, where optimal comp...
On the flow matching interpretability : Abstract: Generative models based on flow matching have demonstrated remarkable success in various domains, yet they suffer from a fundamental limitation: the lack of interpretability in their interme...
Model Merging with Functional Dual Anchors : Abstract: Model merging is an efficient post-training strategy for integrating knowledge from multiple finetuned checkpoints of a shared foundation model. Existing methods operate in the parameter spa...
How Hard is it to Confuse a World Model? : Abstract: In reinforcement learning (RL) theory, the concept of most confusing instances is central to establishing regret lower bounds, that is, the minimal exploration needed to solve a problem. Giv...
Convergence of Stochastic Gradient Langevin Dynamics in the Lazy Training Regime : Abstract: Continuous-time models provide important insights into the training dynamics of optimization algorithms in deep learning. In this work, we establish a non-asymptotic convergence analysis of ...
Unified Implementations of Recurrent Neural Networks in Multiple Deep Learning Frameworks : Abstract: Recurrent neural networks (RNNs) are a cornerstone of sequence modeling across various scientific and industrial applications. Owing to their versatility, numerous RNN variants have been pro...
PINN Balls: Scaling Second-Order Methods for PINNs with Domain Decomposition and Adaptive Sampling : Abstract: Recent advances in Scientific Machine Learning have shown that second-order methods can enhance the training of Physics-Informed Neural Networks (PINNs), making them a suitable alternative t...
Relieving the Over-Aggregating Effect in Graph Transformers : Abstract: Graph attention has demonstrated superior performance in graph learning tasks. However, learning from global interactions can be challenging due to the large number of nodes. In this paper, ...
Buffer layers for Test-Time Adaptation : Abstract: In recent advancements in Test Time Adaptation (TTA), most existing methodologies focus on updating normalization layers to adapt to the test domain. However, the reliance on normalization-b...
Sensor-Specific Transformer (PatchTST) Ensembles with Test-Matched Augmentation : Abstract: We present a noise-aware, sensor-specific ensemble approach for robust human activity recognition on the 2nd WEAR Dataset Challenge. Our method leverages the PatchTST transformer architectur...
Adaptive Data Selection for Multi-Layer Perceptron Training: A Sub-linear Value-Driven Method : Abstract: Data selection is one of the fundamental problems in neural network training, particularly for multi-layer perceptrons (MLPs) where identifying the most valuable training samples from massiv...
Additive Models Explained: A Computational Complexity Approach : Abstract: Generalized Additive Models (GAMs) are commonly considered *interpretable* within the ML community, as their structure makes the relationship between inputs and outputs relatively understand...
An Evidence-Based Post-Hoc Adjustment Framework for Anomaly Detection Under Data Contamination : Abstract: Unsupervised anomaly detection (AD) methods typically assume clean training data, yet real-world datasets often contain undetected or mislabeled anomalies, leading to significant performance...
Amortized Variational Inference for Partial-Label Learning: A Probabilistic Approach to Label Disambiguation : Abstract: Real-world data is frequently noisy and ambiguous. In crowdsourcing, for example, human annotators may assign conflicting class labels to the same instances. Partial-label learning (PLL) add...
Data as a Lever: A Neighbouring Datasets Perspective on Predictive Multiplicity : Abstract: Multiplicity -- the existence of distinct models with comparable performance -- has received growing attention in recent years. While prior work has largely emphasized modelling choices, the...
Revisiting Social Welfare in Bandits: UCB is (Nearly) All You Need : Abstract: Regret in stochastic multi-armed bandits traditionally measures the difference between the highest reward and either the arithmetic mean of accumulated rewards or the final reward. These con...
Leverage Unlearning to Sanitize LLMs : Abstract: Pre-trained large language models (LLMs) are becoming useful for various tasks. To improve their performance on certain tasks, it is necessary to fine-tune them on specific data corpora (e.g...
SCORENF: Score-based Normalizing Flows for Sampling Unnormalized distributions : Abstract: Unnormalized probability distributions are central to modeling complex physical systems across various scientific domains. Traditional sampling methods, such as Markov Chain Monte Carlo (MCM...
Robust Yield Curve Estimation for Mortgage Bonds Using Neural Networks : Abstract: Robust yield curve estimation is crucial in fixed-income markets for accurate instrument pricing, effective risk management, and informed trading strategies. Traditional approaches, includin...
Compositional Monte Carlo Tree Diffusion for Extendable Planning : Abstract: Monte Carlo Tree Diffusion (MCTD) integrates diffusion models with structured tree search to enable effective trajectory exploration through stepwise reasoning. However, MCTD remains fundame...
FairImagen: Post-Processing for Bias Mitigation in Text-to-Image Models : Abstract: Text-to-image diffusion models, such as Stable Diffusion, have demonstrated remarkable capabilities in generating high-quality and diverse images from natural language prompts. However, rece...
Randomized Neural Network with Adaptive Forward Regularization for Online Task-free Class Incremental Learning : Abstract: Class incremental learning (CIL) requires an agent to learn distinct tasks consecutively with knowledge retention against forgetting. Problems impeding the practical applications of CIL meth...
Cost-Sensitive Freeze-thaw Bayesian Optimization for Efficient Hyperparameter Tuning : Abstract: In this paper, we address the problem of cost-sensitive hyperparameter optimization (HPO) built upon freeze-thaw Bayesian optimization (BO). Specifically, we assume a scenario where u...
Disentangled Representation Learning via Modular Compositional Bias : Abstract: Recent disentangled representation learning (DRL) methods heavily rely on factor-specific strategies (either learning objectives for attributes or model architectures for objects) to embed ind...
Self-diffusion for Solving Inverse Problems : Abstract: We propose self-diffusion, a novel framework for solving inverse problems without relying on pretrained generative models. Traditional diffusion-based approaches require training a model on ...
A Rapid Physics-Informed Machine Learning Framework Based on Extreme Learning Machine for Inverse Stefan Problems : Abstract: The inverse Stefan problem, as a typical phase-change problem with moving boundaries, finds extensive applications in science and engineering. Recent years have seen the applications of phys...
Causality Meets Locality: Provably Generalizable and Scalable Policy Learning for Networked Systems : Abstract: Large-scale networked systems, such as traffic, power, and wireless grids, challenge reinforcement-learning agents with both scale and environment shifts. To address these challenges, we pro...
Unified token representations for sequential decision models : Abstract: Transformers have demonstrated strong potential in offline reinforcement learning (RL) by modeling trajectories as sequences of return-to-go, states, and actions. However, existing approache...
ParaRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models : Abstract: Recurrent Neural Networks (RNNs) laid the foundation for sequence modeling, but their intrinsic sequential nature restricts parallel computation, creating a fundamental barrier to scaling. T...
Towards Explainable Personalized Recommendations by Learning from Users' Photos : Abstract: Explaining the output of a complex system, such as a Recommender System (RS), is becoming of utmost importance for both users and companies. In this paper we explore the idea that personaliz...
Estimating Treatment Effects in Networks using Domain Adversarial Training : Abstract: Estimating heterogeneous treatment effects in network settings is complicated by interference, meaning that the outcome of an instance can be influenced by the treatment status of others. Ex...
Parameter-Free Hypergraph Neural Network for Few-Shot Node Classification : Abstract: Few-shot node classification on hypergraphs requires models that generalize from scarce labels while capturing high-order structures. Existing hypergraph neural networks (HNNs) effectively e...
Benchmarking Catastrophic Forgetting Mitigation Methods in Federated Time Series Forecasting : Abstract: Catastrophic forgetting (CF) poses a persistent challenge in continual learning (CL), especially within federated learning (FL) environments characterized by non-i.i.d. time series data. Whi...
Uniform Convergence Beyond Glivenko-Cantelli : Abstract: We characterize conditions under which collections of distributions on $\{0,1\}^\mathbb{N}$ admit uniform estimation of their mean. Prior work from Vapnik and Chervonenkis (1971) has focused...
Surrogate-based quantification of policy uncertainty in generative flow networks : Abstract: Generative flow networks are able to sample, via sequential construction, high-reward, complex objects according to a reward function. However, such reward functions are often estimated appr...
A Unified Model for Multi-Task Drone Routing in Post-Disaster Road Assessment : Abstract: Post-disaster road assessment (PDRA) is essential for emergency response, enabling rapid evaluation of infrastructure conditions and efficient allocation of resources. Although drones provid...
Probe-based Fine-tuning for Reducing Toxicity : Abstract: Probes trained on model activations can detect undesirable behaviors like deception or biases that are difficult to identify from outputs alone. This makes them useful detectors to identify ...
FrameShield: Adversarially Robust Video Anomaly Detection : Abstract: Weakly Supervised Video Anomaly Detection (WSVAD) has achieved notable advancements, yet existing models remain vulnerable to adversarial attacks, limiting their reliability. Due to the inhe...
Excision Score: Evaluating Edits with Surgical Precision : Abstract: Many tasks revolve around editing a document, whether code or text. We formulate the revision similarity problem to unify a wide range of machine learning evaluation problems whose goal is t...
Action Quality Assessment via Hierarchical Pose-guided Multi-stage Contrastive Regression : Abstract: Action Quality Assessment (AQA), which aims at automatic and fair evaluation of athletic performance, has gained increasing attention in recent years. However, athletes are often in rapid mo...
Tensor Product Attention Is All You Need : Abstract: Scaling language models to handle longer input sequences typically necessitates large key-value (KV) caches, resulting in substantial memory overhead during inference. In this paper, we prop...
CMOMgen: Complex Multi-Ontology Alignment via Pattern-Guided In-Context Learning : Abstract: Constructing comprehensive knowledge graphs requires the use of multiple ontologies in order to fully contextualize data into a domain. Ontology matching finds equivalences between concepts ...
A Multimodal Benchmark for Framing of Oil & Gas Advertising and Potential Greenwashing Detection : Abstract: Companies spend large amounts of money on public relations campaigns to project a positive brand image. However, sometimes there is a mismatch between what they say and what they do. Oil & g...
A Knowledge-Graph Translation Layer for Mission-Aware Multi-Agent Path Planning in Spatiotemporal Dynamics : Abstract: The coordination of autonomous agents in dynamic environments is hampered by the semantic gap between high-level mission objectives and low-level planner inputs. To address this, we introduc...
Image and Point-cloud Classification for Jet Analysis in High-Energy Physics: A survey : Abstract: Nowadays, there has been a growing trend in the field of high-energy physics (HEP), in both its experimental and phenomenological studies, to incorporate machine learning (ML) and its specia...
Consciousness, natural and artificial: an evolutionary advantage for reasoning on reactive substrates : Abstract: Precisely defining consciousness and identifying the mechanisms that effect it is a long-standing question, particularly relevant with advances in artificial intelligence. The scientific com...
This EEG Looks Like These EEGs: Interpretable Interictal Epileptiform Discharge Detection With ProtoEEG-kNN : Abstract: The presence of interictal epileptiform discharges (IEDs) in electroencephalogram (EEG) recordings is a critical biomarker of epilepsy. Even trained neurologists find detecting IEDs difficul...
Integrated representational signatures strengthen specificity in brains and models : Abstract: The extent to which different neural or artificial neural networks (models) rely on equivalent representations to support similar tasks remains a central question in neuroscience and machine...
Incentivizing Consistent, Effective and Scalable Reasoning Capability in Audio LLMs via Reasoning Process Rewards : Abstract: The role of reasoning in Audio Large Language Models remains widely underexplored, as introducing a reasoning process often degrades rather than improves performance during inference, a phen...
Crisis-Resilient Portfolio Management via Graph-based Spatio-Temporal Learning : Abstract: Financial time series forecasting faces a fundamental challenge: predicting optimal asset allocations requires understanding regime-dependent correlation structures that transform during cri...
CC-GRMAS: A Multi-Agent Graph Neural System for Spatiotemporal Landslide Risk Assessment in High Mountain Asia : Abstract: Landslides are a growing climate-induced hazard with severe environmental and human consequences, particularly in high mountain Asia. Despite increasing access to satellite and temporal data...
Multimodal Negative Learning : Abstract: Multimodal learning systems often encounter challenges related to modality imbalance, where a dominant modality may overshadow others, thereby hindering the learning of weak modalities. Conv...
HA-RAG: Hotness-Aware RAG Acceleration via Mixed Precision and Data Placement : Abstract: Retrieval-Augmented Generation (RAG) improves model output accuracy by leveraging external knowledge bases, serving as an effective solution to address hallucination issues and knowledge-upd...
Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People : Abstract: Many high-stakes applications of AI require forming data-driven hypotheses and making targeted guesses; e.g., in scientific and diagnostic settings. Given limited resources, to what extent d...
Preventing Shortcuts in Adapter Training via Providing the Shortcuts : Abstract: Adapter-based training has emerged as a key mechanism for extending the capabilities of powerful foundation image generators, enabling personalized and stylized text-to-image synthesis. Thes...
Video-As-Prompt: Unified Semantic Control for Video Generation : Abstract: Unified, generalizable semantic control in video generation remains a critical open challenge. Existing methods either introduce artifacts by enforcing inappropriate pixel-wise priors from s...
Code-enabled language models can outperform reasoning models on diverse tasks : Abstract: Reasoning models (RMs), language models (LMs) trained with reinforcement learning to produce long-form natural language reasoning, have been remarkably successful, but they still require lar...
Aircraft Collision Avoidance Systems: Technological Challenges and Solutions on the Path to Regulatory Acceptance : Abstract: Aircraft collision avoidance systems are critical to modern aviation. These systems are designed to predict potential collisions between aircraft and recommend appropriate avoidance actions. ...
Security Logs to ATT&CK Insights: Leveraging LLMs for High-Level Threat Understanding and Cognitive Trait Inference : Abstract: Understanding adversarial behavior in cybersecurity has traditionally relied on high-level intelligence reports and manual interpretation of attack chains. However, real-time defense require...
An Experimental Study of Trojan Vulnerabilities in UAV Autonomous Landing : Abstract: This study investigates the vulnerabilities of autonomous navigation and landing systems in Urban Air Mobility (UAM) vehicles. Specifically, it focuses on Trojan attacks that target deep lea...
Focal Modulation and Bidirectional Feature Fusion Network for Medical Image Segmentation : Abstract: Medical image segmentation is essential for clinical applications such as disease diagnosis, treatment planning, and disease development monitoring because it provides precise morphological ...
Do LLMs Truly Understand When a Precedent Is Overruled? : Abstract: Large language models (LLMs) with extended context windows show promise for complex legal reasoning tasks, yet their ability to understand long legal documents remains insufficiently evaluat...
Meta-Learning for Cross-Task Generalization in Protein Mutation Property Prediction : Abstract: Protein mutations can have profound effects on biological function, making accurate prediction of property changes critical for drug discovery, protein engineering, and precision medicine. C...
3DReasonKnee: Advancing Grounded Reasoning in Medical Vision Language Models : Abstract: Current Vision-Language Models (VLMs) struggle to ground anatomical regions in 3D medical images and reason about them in a step-by-step manner, a key requirement of real-world diagnostic as...
REx86: A Local Large Language Model for Assisting in x86 Assembly Reverse Engineering : Abstract: Reverse engineering (RE) of x86 binaries is indispensable for malware and firmware analysis, but remains slow due to stripped metadata and adversarial obfuscation. Large Language Models (LLM...
Memory Constrained Dynamic Subnetwork Update for Transfer Learning : Abstract: On-device neural network training faces critical memory constraints that limit the adaptation of pre-trained models to downstream tasks. We present MeDyate, a theoretically-grounded framewor...
Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression : Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities but typically require extensive computational resources and memory for inference. Post-training quantization (PTQ) can ...
GPU Memory Requirement Prediction for Deep Learning Task Based on Bidirectional Gated Recurrent Unit Optimization Transformer : Abstract: In response to the increasingly critical demand for accurate prediction of GPU memory resources in deep learning tasks, this paper deeply analyzes the current research status and innovativel...
VESSA: Video-based objEct-centric Self-Supervised Adaptation for Visual Foundation Models : Abstract: Foundation models have advanced computer vision by enabling strong performance across diverse tasks through large-scale pretraining and supervised fine-tuning. However, they may underperform...
Exploring Spiking Neural Networks for Binary Classification in Multivariate Time Series at the Edge : Abstract: We present a general framework for training spiking neural networks (SNNs) to perform binary classification on multivariate time series, with a focus on step-wise prediction and high precisi...
Race and Gender in LLM-Generated Personas: A Large-Scale Audit of 41 Occupations : Abstract: Generative AI tools are increasingly used to create portrayals of people in occupations, raising concerns about how race and gender are represented. We conducted a large-scale audit of over ...
Physically consistent and uncertainty-aware learning of spatiotemporal dynamics : Abstract: Accurate long-term forecasting of spatiotemporal dynamics remains a fundamental challenge across scientific and engineering domains. Existing machine learning methods often neglect governing...
JSTprove: Pioneering Verifiable AI for a Trustless Future : Abstract: The integration of machine learning (ML) systems into critical industries such as healthcare, finance, and cybersecurity has transformed decision-making processes, but it also brings new cha...
AgentArcEval: An Architecture Evaluation Method for Foundation Model based Agents : Abstract: The emergence of foundation models (FMs) has enabled the development of highly capable and autonomous agents, unlocking new application opportunities across a wide range of domains. Evaluati...
Reasoning's Razor: Reasoning Improves Accuracy but Can Hurt Recall at Critical Operating Points in Safety and Hallucination Detection : Abstract: Reasoning has become a central paradigm for large language models (LLMs), consistently boosting accuracy across diverse benchmarks. Yet its suitability for precision-sensitive tasks remains ...
On the Sample Complexity of Differentially Private Policy Optimization : Abstract: Policy optimization (PO) is a cornerstone of modern reinforcement learning (RL), with diverse applications spanning robotics, healthcare, and large language model training. The increasing de...
Deep learning-based automated damage detection in concrete structures using images from earthquake events : Abstract: Timely assessment of the integrity of structures after seismic events is crucial for public safety and emergency response. This study focuses on assessing the structural damage conditions using ...
Bridging Language Gaps with Adaptive RAG: Improving Indonesian Language Question Answering : Abstract: Question Answering (QA) has seen significant improvements with the advancement of machine learning models, and further studies enhanced this question answering system by retrieving external info...
Soppia: A Structured Prompting Framework for the Proportional Assessment of Non-Pecuniary Damages in Personal Injury Cases : Abstract: Applying complex legal rules characterized by multiple, heterogeneously weighted criteria presents a fundamental challenge in judicial decision-making, often hindering the consistent realiza...
CDrugRed: A Chinese Drug Recommendation Dataset for Discharge Medications in Metabolic Diseases : Abstract: Intelligent drug recommendation based on Electronic Health Records (EHRs) is critical for improving the quality and efficiency of clinical decision-making. By leveraging large-...
M-GLC: Motif-Driven Global-Local Context Graphs for Few-shot Molecular Property Prediction : Abstract: Molecular property prediction (MPP) is a cornerstone of drug discovery and materials science, yet conventional deep learning approaches depend on large labeled datasets that are often unavai...
Self-Rewarding PPO: Aligning Large Language Models with Demonstrations Only : Abstract: Supervised fine-tuning (SFT) has emerged as a crucial method for aligning large language models (LLMs) with human-annotated demonstrations. However, SFT, being an off-policy approach similar...
ESCORT: Efficient Stein-variational and Sliced Consistency-Optimized Temporal Belief Representation for POMDPs : Abstract: In Partially Observable Markov Decision Processes (POMDPs), maintaining and updating belief distributions over possible underlying states provides a principled way to summarize action-observ...
Urban 3D Change Detection Using LiDAR Sensor for HD Map Maintenance and Smart Mobility : Abstract: High-definition 3D city maps underpin smart transportation, digital twins, and autonomous driving, where object-level change detection across bi-temporal LiDAR enables HD map maintenance, co...
The Gray Zone of Faithfulness: Taming Ambiguity in Unfaithfulness Detection : Abstract: Ensuring that Large Language Models (LLMs) generate summaries faithful to a given source document is essential for real-world applications. While prior research has explored LLM faithfulness...
Generalizable Hierarchical Skill Learning via Object-Centric Representation : Abstract: We present Generalizable Hierarchical Skill Learning (GSL), a novel framework for hierarchical policy learning that significantly improves policy generalization and sample efficiency in robo...
Enhanced Evolutionary Multi-Objective Deep Reinforcement Learning for Reliable and Efficient Wireless Rechargeable Sensor Networks : Abstract: Despite rapid advancements in sensor networks, conventional battery-powered sensor networks suffer from limited operational lifespans and frequent maintenance requirements that severely cons...
Large Language Models Meet Text-Attributed Graphs: A Survey of Integration Frameworks and Applications : Abstract: Large Language Models (LLMs) have achieved remarkable success in natural language processing through strong semantic understanding and generation. However, their black-box nature limits stru...
Quantifying CBRN Risk in Frontier Models : Abstract: Frontier Large Language Models (LLMs) pose unprecedented dual-use risks through the potential proliferation of chemical, biological, radiological, and nuclear (CBRN) weapons knowledge. We pr...
Hierarchical AI Multi-Agent Fundamental Investing: Evidence from China's A-Share Market : Abstract: We present a multi-agent, AI-driven framework for fundamental investing that integrates macro indicators, industry-level and firm-specific information to construct optimized equity portfolio...
Uncertainty-Aware Multi-Objective Reinforcement Learning-Guided Diffusion Models for 3D De Novo Molecular Design : Abstract: Designing de novo 3D molecules with desirable properties remains a fundamental challenge in drug discovery and molecular engineering. While diffusion models have demonstrated remarkable capa...
Towards Straggler-Resilient Split Federated Learning: An Unbalanced Update Approach : Abstract: Split Federated Learning (SFL) enables scalable training on edge devices by combining the parallelism of Federated Learning (FL) with the computational offloading of Split Learning (SL). Des...
Reducing the Probability of Undesirable Outputs in Language Models Using Probabilistic Inference : Abstract: Reinforcement learning (RL) has become a predominant technique to align language models (LMs) with human preferences or promote outputs which are deemed to be desirable by a given reward fun...
PLAN: Proactive Low-Rank Allocation for Continual Learning : Abstract: Continual learning (CL) requires models to continuously adapt to new tasks without forgetting past knowledge. In this work, we propose Proactive Low-rank A...
Securing AI Agent Execution : Abstract: Large Language Models (LLMs) have evolved into AI agents that interact with external tools and environments to perform complex tasks. The Model Context Protocol (MCP) has become the de facto...
Physics-Informed Neural Networks for MIMO Beam Map and Environment Reconstruction : Abstract: As communication networks evolve towards greater complexity (e.g., 6G and beyond), a deep understanding of the wireless environment becomes increasingly crucial. When explicit knowledge of t...
Correlation Dimension of Auto-Regressive Large Language Models : Abstract: Large language models (LLMs) have achieved remarkable progress in natural language generation, yet they continue to display puzzling behaviors -- such as repetition and incoherence -- even w...
Sparser Block-Sparse Attention via Token Permutation : Abstract: Scaling the context length of large language models (LLMs) offers significant benefits but is computationally expensive. This expense stems primarily from the self-attention mechanism, whose...
Pctx: Tokenizing Personalized Context for Generative Recommendation : Abstract: Generative recommendation (GR) models tokenize each action into a few discrete tokens (called semantic IDs) and autoregressively generate the next tokens as predictions, showing advantages s...
WhaleVAD-BPN: Improving Baleen Whale Call Detection with Boundary Proposal Networks and Post-processing Optimisation : Abstract: While recent sound event detection (SED) systems can identify baleen whale calls in marine audio, challenges related to false positive and minority-class detection persist. We propose the bo...
Efficient semantic uncertainty quantification in language models via diversity-steered sampling : Abstract: Accurately estimating semantic aleatoric and epistemic uncertainties in large language models (LLMs) is particularly challenging in free-form question answering (QA), where obtaining stable ...
A Convergence Analysis of Adaptive Optimizers under Floating-point Quantization : Abstract: The rapid scaling of large language models (LLMs) has made low-precision training essential for reducing memory, improving efficiency, and enabling larger models and datasets. Existing conve...
Seemingly Redundant Modules Enhance Robust Odor Learning in Fruit Flies : Abstract: Biological circuits have evolved to incorporate multiple modules that perform similar functions. In the fly olfactory circuit, both lateral inhibition (LI) and neuronal spike frequency adapt...
TripTide: A Benchmark for Adaptive Travel Planning under Disruptions : Abstract: Recent efforts like TripCraft and TravelPlanner have advanced the use of Large Language Models (LLMs) for personalized, constraint-aware travel itinerary generation. Yet, real travel often ...
Weak-to-Strong Generalization under Distribution Shifts : Abstract: As future superhuman models become increasingly complex, accurately supervising their behavior may exceed human capabilities. Recent works have demonstrated that in such scenarios, weak mode...
CausalRec: A CausalBoost Attention Model for Sequential Recommendation : Abstract: Recent advances in correlation-based sequential recommendation systems have demonstrated substantial success. Specifically, the attention-based model outperforms other RNN-based and Markov c...
World-POI: Global Point-of-Interest Data Enriched from Foursquare and OpenStreetMap as Tabular and Graph Data : Abstract: Recently, Foursquare released a global dataset with more than 100 million points of interest (POIs), each representing a real-world business on its platform. However, many entries lack compl...
$\alpha$-LoRA: Effective Fine-Tuning via Base Model Rescaling : Abstract: Fine-tuning has proven to be highly effective in adapting pre-trained models to perform better on new desired tasks with minimal data samples. Among the most widely used approaches are repar...
CT-CLIP: A Multi-modal Fusion Framework for Robust Apple Leaf Disease Recognition in Complex Environments : Abstract: In complex orchard environments, the phenotypic heterogeneity of different apple leaf diseases, characterized by significant variation among lesions, poses a challenge to traditional multi-s...
Gaze-VLM: Bridging Gaze and VLMs through Attention Regularization for Egocentric Understanding : Abstract: Eye gaze offers valuable cues about attention, short-term intent, and future actions, making it a powerful signal for modeling egocentric behavior. In this work, we propose a gaze-regularize...
Patient-specific AI for generation of 3D dosimetry imaging from two 2D-planar measurements : Abstract: In this work we explored the use of patient-specific reinforced learning to generate 3D activity maps from two 2D planar images (anterior and posterior). The solution of this problem remains...
HIKMA: Human-Inspired Knowledge by Machine Agents through a Multi-Agent Framework for Semi-Autonomous Scientific Conferences : Abstract: HIKMA Semi-Autonomous Conference is the first experiment in reimagining scholarly communication through an end-to-end integration of artificial intelligence into the academic publishing and ...
Compressing Quaternion Convolutional Neural Networks for Audio Classification : Abstract: Conventional Convolutional Neural Networks (CNNs) in the real domain have been widely used for audio classification. However, their convolution operations process multi-channel inputs indepe...
Assessing the Real-World Utility of Explainable AI for Arousal Diagnostics: An Application-Grounded User Study : Abstract: Artificial intelligence (AI) systems increasingly match or surpass human experts in biomedical signal interpretation. However, their effective integration into clinical practice requires mor...
REvolution: An Evolutionary Framework for RTL Generation driven by Large Language Models : Abstract: Large Language Models (LLMs) are used for Register-Transfer Level (RTL) code generation, but they face two main challenges: functional correctness and Power, Performance, and Area (PPA) opti...
Large Language Models as Model Organisms for Human Associative Learning : Abstract: Associative learning--forming links between co-occurring items--is fundamental to human cognition, reshaping internal representations in complex ways. Testing hypotheses on how representatio...
DreamerV3-XP: Optimizing exploration through uncertainty estimation : Abstract: We introduce DreamerV3-XP, an extension of DreamerV3 that improves exploration and learning efficiency. This includes (i) a prioritized replay buffer, scoring trajectories by return, reconst...
Vision Language Models for Dynamic Human Activity Recognition in Healthcare Settings : Abstract: As generative AI continues to evolve, Vision Language Models (VLMs) have emerged as promising tools in various healthcare applications. One area that remains relatively underexplored is thei...
Does Model Size Matter? A Comparison of Small and Large Language Models for Requirements Classification : Abstract: [Context and motivation] Large language models (LLMs) show notable results in natural language processing (NLP) tasks for requirements engineering (RE). However, their use is compromised by ...
REMONI: An Autonomous System Integrating Wearables and Multimodal Large Language Models for Enhanced Remote Health Monitoring : Abstract: With the widespread adoption of wearable devices in our daily lives, the demand and appeal for remote patient monitoring have significantly increased. Most research in this field has concent...
PhysWorld: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis : Abstract: Interactive world models that simulate object dynamics are crucial for robotics, VR, and AR. However, it remains a significant challenge to learn physics-consistent dynamics models from limi...
Enhancing Social Robots through Resilient AI : Abstract: As artificial intelligence continues to advance and becomes more integrated into sensitive areas like healthcare, education, and everyday life, it's crucial for these systems to be both resi...
GranViT: A Fine-Grained Vision Model With Autoregressive Perception For MLLMs : Abstract: Vision encoders are indispensable for allowing impressive performance of Multi-modal Large Language Models (MLLMs) in vision language tasks such as visual question answering and reasoning. H...
Human and AI Trust: Trust Attitude Measurement Instrument : Abstract: With the current progress of Artificial Intelligence (AI) technology and its increasingly broader applications, trust is seen as a required criterion for AI usage, acceptance, and deployment...
Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos : Abstract: This paper presents a novel approach for pretraining robotic manipulation Vision-Language-Action (VLA) models using a large corpus of unscripted real-life video recordings of human hand acti...
From Polyester Girlfriends to Blind Mice: Creating the First Pragmatics Understanding Benchmarks for Slovene : Abstract: Large language models are demonstrating increasing capabilities, excelling at benchmarks once considered very difficult. As their capabilities grow, there is a need for more challenging eval...
Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation : Abstract: Group Relative Policy Optimization (GRPO) has shown strong potential for flow-matching-based text-to-image (T2I) generation, but it faces two key limitations: inaccurate advantage attributio...
Generative Correlation Manifolds: Generating Synthetic Data with Preserved Higher-Order Correlations : Abstract: The increasing need for data privacy and the demand for robust machine learning models have fueled the development of synthetic data generation techniques. However, current methods often suc...
The Universal Landscape of Human Reasoning : Abstract: Understanding how information is dynamically accumulated and transformed in human reasoning has long challenged cognitive psychology, philosophy, and artificial intelligence. Existing accoun...
Few-Shot Knowledge Distillation of LLMs With Counterfactual Explanations : Abstract: Knowledge distillation is a promising approach to transfer capabilities from complex teacher models to smaller, resource-efficient student models that can be deployed easily, particularly in...
DEEDEE: Fast and Scalable Out-of-Distribution Dynamics Detection : Abstract: Deploying reinforcement learning (RL) in safety-critical settings is constrained by brittleness under distribution shift. We study out-of-distribution (OOD) detection for RL time series and ...
A Dynamic Knowledge Distillation Method Based on the Gompertz Curve : Abstract: This paper introduces a novel dynamic knowledge distillation framework, Gompertz-CNN, which integrates the Gompertz growth model into the training process to address the limitations of tradi...
Group Inertial Poser: Multi-Person Pose and Global Translation from Sparse Inertial Sensors and Ultra-Wideband Ranging : Abstract: Tracking human full-body motion using sparse wearable inertial measurement units (IMUs) overcomes the limitations of occlusion and instrumentation of the environment inherent in vision-based...
On Thin Ice: Towards Explainable Conservation Monitoring via Attribution and Perturbations : Abstract: Computer vision can accelerate ecological research and conservation monitoring, yet adoption in ecology lags in part because of a lack of trust in black-box neural-network-based models. We s...
Understanding Token-level Topological Structures in Transformer-based Time Series Forecasting : Abstract: Transformer-based methods have achieved state-of-the-art performance in time series forecasting (TSF) by capturing positional and semantic topological relationships among input tokens. Howev...
Mix Q-learning for Lane Changing: A Collaborative Decision-Making Method in Multi-Agent Deep Reinforcement Learning : Abstract: Lane-changing decisions, which are crucial for autonomous vehicle path planning, face practical challenges due to rule-based constraints and limited data. Deep reinforcement learning has bec...
Brain-like Variational Inference : Abstract: Inference in both brains and machines can be formalized by optimizing a shared objective: maximizing the evidence lower bound (ELBO) in machine learning, or minimizing variational free energ...
Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty : Abstract: User prompts for generative AI models are often underspecified, leading to a misalignment between the user intent and models' understanding. As a result, users commonly have to painstakingly...
Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search : Abstract: Recent advances demonstrate that increasing inference-time computation can significantly boost the reasoning capabilities of large language models (LLMs). Although repeated sampling (i.e., g...
Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python Code : Abstract: In recent years, large language models (LLMs) have shown remarkable capabilities in various artificial intelligence problems. However, they fail to plan reliably, even when prompted with a d...
Information-Theoretic Reward Decomposition for Generalizable RLHF : Abstract: A generalizable reward model is crucial in Reinforcement Learning from Human Feedback (RLHF) as it enables correctly evaluating unseen prompt-response pairs. However, existing reward models ...
MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges? : Abstract: We introduce MLRC-Bench, a benchmark designed to quantify how effectively language agents can tackle challenging Machine Learning (ML) Research Competitions, with a focus on open research pr...
Causal Head Gating: A Framework for Interpreting Roles of Attention Heads in Transformers : Abstract: We present causal head gating (CHG), a scalable method for interpreting the functional roles of attention heads in transformer models. CHG learns soft gates over heads and assigns them a cau...
Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations : Abstract: Large language models (LLMs) can sometimes report the strategies they actually use to solve tasks, yet at other times seem unable to recognize those strategies that govern their behavior. Th...
Reinforced Latent Reasoning for LLM-based Recommendation : Abstract: Large Language Models (LLMs) have demonstrated impressive reasoning capabilities in complex problem-solving tasks, sparking growing interest in their application to preference reasoning in r...
Mitigating Manipulation and Enhancing Persuasion: A Reflective Multi-Agent Approach for Legal Argument Generation : Abstract: Large Language Models (LLMs) are increasingly explored for legal argument generation, yet they pose significant risks of manipulation through hallucination and ungrounded persuasion, and oft...
Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning : Abstract: Reinforcement learning (RL) has become the dominant paradigm for improving the performance of language models on complex reasoning tasks. Despite the substantial empirical gains demonstrated...
RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards : Abstract: Large Language Models (LLMs) continue to exhibit vulnerabilities despite deliberate safety alignment efforts, posing significant risks to users and society. To safeguard against the risk of ...
Fast Monte Carlo Tree Diffusion: 100x Speedup via Parallel Sparse Planning : Abstract: Diffusion models have recently emerged as a powerful approach for trajectory planning. However, their inherently non-sequential nature limits their effectiveness in long-horizon reasoning ta...
Cascaded Language Models for Cost-effective Human-AI Decision-Making : Abstract: A challenge in human-AI decision-making is to balance three factors: the correctness of predictions, the cost of knowledge and reasoning complexity, and the confidence about whether to absta...
How to Train Your LLM Web Agent: A Statistical Diagnosis : Abstract: LLM-based web agents have recently made significant progress, but much of it has occurred in closed-source systems, widening the gap with open-source alternatives. Progress has been held bac...
ViTime: Foundation Model for Time Series Forecasting Powered by Vision Intelligence : Abstract: Time series forecasting (TSF) has great practical value in various fields, including power and energy, transportation, etc. TSF methods have been studied based on knowledge from class...
Teaching Transformers Causal Reasoning through Axiomatic Training : Abstract: For text-based AI systems to interact in the real world, causal reasoning is an essential skill. Since active interventions are costly, we study to what extent a system can learn causal reas...
Size and Smoothness Aware Adaptive Focal Loss for Small Tumor Segmentation : Abstract: Deep learning has achieved remarkable accuracy in medical image segmentation, particularly for larger structures with well-defined boundaries. However, its effectiveness can be challenged by...
TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models : Abstract: Large Language Models (LLMs) excel in code-related tasks like code generation, but benchmark evaluations often overlook task characteristics, such as difficulty. Moreover, benchmarks are usu...
Exploring the Limitations of Layer Synchronization in Spiking Neural Networks : Abstract: Neural-network processing in machine learning applications relies on layer synchronization. This is practiced even in artificial Spiking Neural Networks (SNNs), which are touted as consisten...
On the Global Optimality of Policy Gradient Methods in General Utility Reinforcement Learning : Abstract: Reinforcement learning with general utilities (RLGU) offers a unifying framework to capture several problems beyond standard expected returns, including imitation learning, pure exploration,...
TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees : Abstract: In the domain of complex reasoning tasks, such as mathematical reasoning, recent advancements have proposed the use of Direct Preference Optimization (DPO) to suppress output of dispreferred...
Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques : Abstract: Cognitive decline is a natural part of aging. However, under some circumstances, this decline is more pronounced than expected, typically due to disorders such as Alzheimer's disease. Early ...
Understanding Adam Requires Better Rotation Dependent Assumptions : Abstract: Despite its widespread adoption, Adam's advantage over Stochastic Gradient Descent (SGD) lacks a comprehensive theoretical explanation. This paper investigates Adam's sensitivity to rotation...
Training the Untrainable: Introducing Inductive Bias via Representational Alignment : Abstract: We demonstrate that architectures which traditionally are considered to be ill-suited for a task can be trained using inductive biases from another architecture. We call a network untrainabl...
Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback : Abstract: Automatically synthesizing dense rewards from natural language descriptions is a promising paradigm in reinforcement learning (RL), with applications to sparse reward problems, open-ended ex...
Interpretable Next-token Prediction via the Generalized Induction Head : Abstract: While large transformer models excel in predictive performance, their lack of interpretability restricts their usefulness in high-stakes domains. To remedy this, we propose the Generalized I...
Domain Adaptation-based Edge Computing for Cross-Conditions Fault Diagnosis : Abstract: Fault diagnosis of mechanical equipment provides robust support for industrial production. It is worth noting that the operation of mechanical equipment is accompanied by changes in factors...
Mean-Field Sampling for Cooperative Multi-Agent Reinforcement Learning : Abstract: Designing efficient algorithms for multi-agent reinforcement learning (MARL) is fundamentally challenging because the size of the joint state and action spaces grows exponentially in the num...
DynamicPAE: Generating Scene-Aware Physical Adversarial Examples in Real-Time : Abstract: Physical adversarial examples (PAEs) are regarded as whistle-blowers of real-world risks in deep-learning applications, thus worth further investigation. However, current PAE generation stud...
How Toxic Can You Get? Search-based Toxicity Testing for Large Language Models : Abstract: Language is a deep-rooted means of perpetration of stereotypes and discrimination. Large Language Models (LLMs), now a pervasive technology in our everyday lives, can cause extensive harm wh...
Sketch2BIM: A Multi-Agent Human-AI Collaborative Pipeline to Convert Hand-Drawn Floor Plans to 3D BIM : Abstract: This study introduces a human-in-the-loop pipeline that converts unscaled, hand-drawn floor plan sketches into semantically consistent 3D BIM models. The workflow leverages multimodal large ...
Cultural Alien Sampler: Open-ended art generation balancing originality and coherence : Abstract: In open-ended domains like art, autonomous agents must generate ideas that are both original and internally coherent, yet current Large Language Models (LLMs) either default to familiar cult...
Fuzzy numbers revisited: operations on extensional fuzzy numbers : Abstract: Fuzzy numbers are commonly represented with fuzzy sets. Their objective is to better represent imprecise data. However, operations on fuzzy numbers are not as straightforward as maths on cri...
Customizing Open Source LLMs for Quantitative Medication Attribute Extraction across Heterogeneous EHR Systems : Abstract: Harmonizing medication data across Electronic Health Record (EHR) systems is a persistent barrier to monitoring medications for opioid use disorder (MOUD). In heterogeneous EHR systems, key ...
Epistemic Deference to AI : Abstract: When should we defer to AI outputs over human expert judgment? Drawing on recent work in social epistemology, I motivate the idea that some AI systems qualify as Artificial Epistemic Authori...
From Questions to Queries: An AI-powered Multi-Agent Framework for Spatial Text-to-SQL : Abstract: The complexity of Structured Query Language (SQL) and the specialized nature of geospatial functions in tools like PostGIS present significant barriers to non-experts seeking to analyze spat...
MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning : Abstract: Recently, large models have shown significant potential for smart healthcare. However, the deployment of Large Vision-Language Models (LVLMs) for clinical services is currently hindered by t...
Confounding Robust Deep Reinforcement Learning: A Causal Approach : Abstract: A key task in Artificial Intelligence is learning effective policies for controlling agents in unknown environments to optimize performance measures. Off-policy learning methods, like Q-lear...
DAO-AI: Evaluating Collective Decision-Making through Agentic AI in Decentralized Governance : Abstract: This paper presents a first empirical study of agentic AI as autonomous decision-makers in decentralized governance. Using more than 3K proposals from major protocols, we build an agentic AI...
PanicToCalm: A Proactive Counseling Agent for Panic Attacks : Abstract: Panic attacks are acute episodes of fear and distress, in which timely, appropriate intervention can significantly help individuals regain stability. However, suitable datasets for training ...
NeuroGenPoisoning: Neuron-Guided Attacks on Retrieval-Augmented Generation of LLM via Genetic Optimization of External Knowledge : Abstract: Retrieval-Augmented Generation (RAG) empowers Large Language Models (LLMs) to dynamically integrate external knowledge during inference, improving their factual accuracy and adaptability. Ho...
How to Auto-optimize Prompts for Domain Tasks? Adaptive Prompting and Reasoning through Evolutionary Domain Knowledge Adaptation : Abstract: Designing optimal prompts and reasoning processes for large language models (LLMs) on domain-specific tasks is both necessary and challenging in real-world applications. Determining how to i...
String Seed of Thought: Prompting LLMs for Distribution-Faithful and Diverse Generation : Abstract: We introduce String Seed of Thought (SSoT), a novel prompting method for LLMs that improves Probabilistic Instruction Following (PIF). We define PIF as a task requiring an LLM to select its ...
Memory-Free Continual Learning with Null Space Adaptation for Zero-Shot Vision-Language Models : Abstract: Pre-trained vision-language models (VLMs), such as CLIP, have demonstrated remarkable zero-shot generalization, enabling deployment in a wide range of real-world tasks without additional tas...
Shylock: Causal Discovery in Multivariate Time Series based on Hybrid Constraints : Abstract: Causal relationship discovery has been drawing increasing attention due to its prevalent application. Existing methods rely on human experience, statistical methods, or graphical criteria me...
OutboundEval: A Dual-Dimensional Benchmark for Expert-Level Intelligent Outbound Evaluation of Xbench's Professional-Aligned Series : Abstract: We propose OutboundEval, a comprehensive benchmark for evaluating large language models (LLMs) in expert-level intelligent outbound calling scenarios. Unlike existing methods that suffer fro...
Out-of-Distribution Detection for Safety Assurance of AI and Autonomous Systems : Abstract: The operational capabilities and application domains of AI-enabled autonomous systems have expanded significantly in recent years due to advances in robotics and machine learning (ML). Demon...
Investigating Scale Independent UCT Exploration Factor Strategies : Abstract: The Upper Confidence Bounds for Trees (UCT) algorithm is not agnostic to the reward scale of the game it is applied to. For zero-sum games with the sparse rewards of $\{-1,0,1\}$ at the end ...
When Models Outthink Their Safety: Mitigating Self-Jailbreak in Large Reasoning Models with Chain-of-Guardrails : Abstract: Large Reasoning Models (LRMs) demonstrate remarkable capabilities on complex reasoning tasks but remain vulnerable to severe safety risks, including harmful content generation and jailbreak ...
Understanding AI Trustworthiness: A Scoping Review of AIES & FAccT Articles : Abstract: Background: Trustworthy AI serves as a foundational pillar for two major AI ethics conferences: AIES and FAccT. However, current research often adopts techno-centric approaches, focusing pri...
Towards Reliable Code-as-Policies: A Neuro-Symbolic Framework for Embodied Task Planning : Abstract: Recent advances in large language models (LLMs) have enabled the automatic generation of executable code for task planning and control in embodied agents such as robots, demonstrating the po...
CXRAgent: Director-Orchestrated Multi-Stage Reasoning for Chest X-Ray Interpretation : Abstract: Chest X-ray (CXR) plays a pivotal role in clinical diagnosis, and a variety of task-specific and foundation models have been developed for automatic CXR interpretation. However, these models...
Magellan: Guided MCTS for Latent Space Exploration and Novelty Generation : Abstract: Large Language Models (LLMs) often struggle with generating truly innovative ideas, typically defaulting to high-probability, familiar concepts within their training data's "gravity wells." ...
Boosting Accuracy and Efficiency of Budget Forcing in LLMs via Reinforcement Learning for Mathematical Reasoning : Abstract: Test-time scaling methods have seen a rapid increase in popularity for their computational efficiency and parameter-independent training to improve reasoning performance on Large Language Mode...
Advancing Symbolic Integration in Large Language Models: Beyond Conventional Neurosymbolic AI : Abstract: LLMs have demonstrated highly effective learning, human-like response generation, and decision-making capabilities in high-risk sectors. However, these models remain black boxes because they ...
AutoOpt: A Dataset and a Unified Framework for Automating Optimization Problem Solving : Abstract: This study presents AutoOpt-11k, a unique image dataset of over 11,000 handwritten and printed mathematical optimization models corresponding to single-objective, multi-objective, multi-leve...
Multi-Task Vehicle Routing Solver via Mixture of Specialized Experts under State-Decomposable MDP : Abstract: Existing neural methods for multi-task vehicle routing problems (VRPs) typically learn unified solvers to handle multiple constraints simultaneously. However, they often underutilize the com...
EU-Agent-Bench: Measuring Illegal Behavior of LLM Agents Under EU Law : Abstract: Large language models (LLMs) are increasingly deployed as agents in various contexts by providing tools at their disposal. However, LLM agents can exhibit unpredictable behaviors, including ...
Co-Sight: Enhancing LLM-Based Agents via Conflict-Aware Meta-Verification and Trustworthy Reasoning with Structured Facts : Abstract: Long-horizon reasoning in LLM-based agents often fails not from generative weakness but from insufficient verification of intermediate reasoning. Co-Sight addresses this challenge by turning...
Learning Neural Control Barrier Functions from Expert Demonstrations using Inverse Constraint Learning : Abstract: Safety is a fundamental requirement for autonomous systems operating in critical domains. Control barrier functions (CBFs) have been used to design safety filters that minimally alter nomina...
Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine : Abstract: Recent studies operationalize self-improvement through coding agents that edit their own codebases. They grow a tree of self-modifications through expansion strategies that favor higher soft...
DeepAgent: A General Reasoning Agent with Scalable Toolsets : Abstract: Large reasoning models have demonstrated strong problem-solving abilities, yet real-world tasks often require external tools and long-horizon interactions. Existing agent frameworks typicall...
AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite : Abstract: AI agents hold the potential to revolutionize scientific productivity by automating literature reviews, replicating experiments, analyzing data, and even proposing new directions of inquiry;...

Research Sources: 580 | Generated: 10/27/2025