AI RESEARCH PAPERS & ACADEMIC SOURCES
- OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models : Abstract: The development of large vision language models drives the demand for managing and applying massive amounts of multimodal data, making OCR technology, which extracts information from visual...
- WMVLM: Evaluating Diffusion Model Image Watermarking via Vision-Language Models : Abstract: Digital watermarking is essential for securing generated images from diffusion models. Accurate watermark evaluation is critical for algorithm development, yet existing methods have signific...
- MMSF: Multitask and Multimodal Supervised Framework for WSI Classification and Survival Analysis : Abstract: Multimodal evidence is critical in computational pathology: gigapixel whole slide images capture tumor morphology, while patient-level clinical descriptors preserve complementary context for...
- RAD: Region-Aware Diffusion Models for Image Inpainting : Abstract: Diffusion models have achieved remarkable success in image generation, with applications broadening across various domains. Inpainting is one such application that can benefit significantly ...
- DiffVax: Optimization-Free Image Immunization Against Diffusion-Based Editing : Abstract: Current image immunization defense techniques against diffusion-based editing embed imperceptible noise into target images to disrupt editing models. However, these methods face scalability ...
- Quasi-Medial Distance Field (Q-MDF): A Robust Method for Approximating and Discretizing Neural Medial Axes : Abstract: The medial axis, a lower-dimensional descriptor that captures the extrinsic structure of a shape, plays an important role in digital geometry processing. Despite its importance, computing th...
- Unlocking Past Information: Temporal Embeddings in Cooperative Bird's Eye View Prediction : Abstract: Accurate and comprehensive semantic segmentation of Bird's Eye View (BEV) is essential for ensuring safe and proactive navigation in autonomous driving. Although cooperative perception has e...
- PDF-HR: Pose Distance Fields for Humanoid Robots : Abstract: Pose and motion priors play a crucial role in humanoid robotics. Although such priors have been widely studied in the human motion recovery (HMR) domain with a range of models, their adoption fo...
- EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models : Abstract: Deploying humanoid robots in real-world settings is fundamentally challenging, as it demands tight integration of perception, locomotion, and manipulation under partial-information observati...
- Self-evolving Embodied AI : Abstract: Embodied Artificial Intelligence (AI) is an intelligent system formed by agents and their environment through active perception, embodied cognition, and action interaction. Existing embodied...
- Quantile Transfer for Reliable Operating Point Selection in Visual Place Recognition : Abstract: Visual Place Recognition (VPR) is a key component for localisation in GNSS-denied environments, but its performance critically depends on selecting an image matching threshold (operating poi...
- GeneralVLA: Generalizable Vision-Language-Action Models with Knowledge-Guided Trajectory Planning : Abstract: Large foundation models have shown strong open-world generalization to complex problems in vision and language, but similar levels of generalization have yet to be achieved in robotics. One ...
- Towards Next-Generation SLAM: A Survey on 3DGS-SLAM Focusing on Performance, Robustness, and Future Directions : Abstract: Traditional Simultaneous Localization and Mapping (SLAM) systems often face limitations including coarse rendering quality, insufficient recovery of scene details, and poor robustness in dyn...
- An Improved Boosted DC Algorithm for Nonsmooth Functions with Applications in Image Recovery : Abstract: We propose a new approach to perform the boosted difference of convex functions algorithm (BDCA) on non-smooth and non-convex problems involving the difference of convex (DC) functions. The ...
- MS-SCANet: A Multiscale Transformer-Based Architecture with Dual Attention for No-Reference Image Quality Assessment : Abstract: We present the Multi-Scale Spatial Channel Attention Network (MS-SCANet), a transformer-based architecture designed for no-reference image quality assessment (IQA). MS-SCANet features a dual...
- AtlasPatch: An Efficient and Scalable Tool for Whole Slide Image Preprocessing in Computational Pathology : Abstract: Whole-slide image (WSI) preprocessing, typically comprising tissue detection followed by patch extraction, is foundational to AI-driven computational pathology workflows. This remains a majo...
- Efficient Long-Horizon Vision-Language-Action Models via Static-Dynamic Disentanglement : Abstract: Vision-Language-Action (VLA) models have recently emerged as a promising paradigm for generalist robotic control. Built upon vision-language model (VLM) architectures, VLAs predict actions c...
- VLS: Steering Pretrained Robot Policies via Vision-Language Models : Abstract: Why do pretrained diffusion or flow-matching policies fail when the same task is performed near an obstacle, on a shifted support surface, or amid mild clutter? Such failures rarely reflect ...
- Beyond the Vehicle: Cooperative Localization by Fusing Point Clouds for GPS-Challenged Urban Scenarios : Abstract: Accurate vehicle localization is a critical challenge in urban environments where GPS signals are often unreliable. This paper presents a cooperative multi-sensor and multi-modal localizatio...
- To What Extent Do Token-Level Representations from Pathology Foundation Models Improve Dense Prediction? : Abstract: Pathology foundation models (PFMs) have rapidly advanced and are becoming a common backbone for downstream clinical tasks, offering strong transferability across tissues and institutions. Ho...
- DINO-AD: Unsupervised Anomaly Detection with Frozen DINO-V3 Features : Abstract: Unsupervised anomaly detection (AD) in medical images aims to identify abnormal regions without relying on pixel-level annotations, which is crucial for scalable and label-efficient diagnost...
- CoWTracker: Tracking by Warping instead of Correlation : Abstract: Dense point tracking is a fundamental problem in computer vision, with applications ranging from video analysis to robotic manipulation. State-of-the-art trackers typically rely on cost volu...
- PerpetualWonder: Long-Horizon Action-Conditioned 4D Scene Generation : Abstract: We introduce PerpetualWonder, a hybrid generative simulator that enables long-horizon, action-conditioned 4D scene generation from a single image. Current works fail at this task because the...
- Laminating Representation Autoencoders for Efficient Diffusion : Abstract: Recent work has shown that diffusion models can generate high-quality images by operating directly on SSL patch features rather than pixel-space latents. However, the dense patch grids from ...
- When LLaVA Meets Objects: Token Composition for Vision-Language-Models : Abstract: Current autoregressive Vision Language Models (VLMs) usually rely on a large number of visual tokens to represent images, resulting in a need for more compute especially at inference time. T...
- LitS: A novel Neighborhood Descriptor for Point Clouds : Abstract: With the advancement of 3D scanning technologies, point clouds have become fundamental for representing 3D spatial data, with applications that span across various scientific and technologic...
- X2HDR: HDR Image Generation in a Perceptually Uniform Space : Abstract: High-dynamic-range (HDR) formats and displays are becoming increasingly prevalent, yet state-of-the-art image generators (e.g., Stable Diffusion and FLUX) typically remain limited to low-dyn...
- VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text? : Abstract: Vision-Language Models (VLMs) have achieved impressive performance in cross-modal understanding across textual and visual inputs, yet existing benchmarks predominantly focus on pure-text que...
- Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention : Abstract: Advanced autoregressive (AR) video generation models have improved visual fidelity and interactivity, but the quadratic complexity of attention remains a primary bottleneck for efficient dep...
- Mitigating Long-Tail Bias via Prompt-Controlled Diffusion Augmentation : Abstract: Semantic segmentation of high-resolution remote-sensing imagery is critical for urban mapping and land-cover monitoring, yet training data typically exhibits severe long-tailed pixel imbalan...
- How to rewrite the stars: Mapping your orchard over time through constellations of fruits : Abstract: Following crop growth through the vegetative cycle allows farmers to predict fruit setting and yield in early stages, but it is a laborious and non-scalable task if performed by a human who ...
- Annotation Free Spacecraft Detection and Segmentation using Vision Language Models : Abstract: Vision Language Models (VLMs) have demonstrated remarkable performance in open-world zero-shot visual recognition. However, their potential in space-related applications remains largely unex...
- AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation : Abstract: Reconstructing dynamic hand-object interactions from monocular videos is critical for dexterous manipulation data collection and creating realistic digital twins for robotics and VR. However...
- PIO-FVLM: Rethinking Training-Free Visual Token Reduction for VLM Acceleration from an Inference-Objective Perspective : Abstract: Recently, reducing redundant visual tokens in vision-language models (VLMs) to accelerate VLM inference has emerged as a hot topic. However, most existing methods rely on heuristics construc...
- A labeled dataset of simulated phlebotomy procedures for medical AI: polygon annotations for object detection and human-object interaction : Abstract: This data article presents a dataset of 11,884 labeled images documenting a simulated blood extraction (phlebotomy) procedure performed on a training arm. Images were extracted from high-def...
- ImmuVis: Hyperconvolutional Foundation Model for Imaging Mass Cytometry : Abstract: We present ImmuVis, an efficient convolutional foundation model for imaging mass cytometry (IMC), a high-throughput multiplex imaging technology that handles molecular marker measurements as...
- SalFormer360: a transformer-based saliency estimation model for 360-degree videos : Abstract: Saliency estimation has received growing attention in recent years due to its importance in a wide range of applications. In the context of 360-degree video, it has been particularly valuabl...
- PEPR: Privileged Event-based Predictive Regularization for Domain Generalization : Abstract: Deep neural networks for visual perception are highly susceptible to domain shift, which poses a critical challenge for real-world deployment under conditions that differ from the training d...
- Understanding Degradation with Vision Language Model : Abstract: Understanding visual degradations is a critical yet challenging problem in computer vision. While recent Vision-Language Models (VLMs) excel at qualitative description, they often fall short...
- Nix and Fix: Targeting 1000x Compression of 3D Gaussian Splatting with Diffusion Models : Abstract: 3D Gaussian Splatting (3DGS) revolutionized novel view rendering. Instead of inferring from dense spatial points, as implicit representations do, 3DGS uses sparse Gaussians. This enables rea...
- S-MUSt3R: Sliding Multi-view 3D Reconstruction : Abstract: The recent paradigm shift in 3D vision led to the rise of foundation models with remarkable capabilities in 3D perception from uncalibrated images. However, extending these models to large-s...
- Vision-aligned Latent Reasoning for Multi-modal Large Language Model : Abstract: Despite recent advancements in Multi-modal Large Language Models (MLLMs) on diverse understanding tasks, these models struggle to solve problems which require extensive multi-step reasoning....
- SALAD-Pan: Sensor-Agnostic Latent Adaptive Diffusion for Pan-Sharpening : Abstract: Recently, diffusion models have brought novel insights for Pan-sharpening and notably boosted fusion precision. However, most existing models perform diffusion in the pixel space and train distinct m...
- Temporal Slowness in Central Vision Drives Semantic Object Learning : Abstract: Humans acquire semantic object representations from egocentric visual streams with minimal supervision. Importantly, the visual system processes with high resolution only the center of its f...
- Seg-ReSearch: Segmentation with Interleaved Reasoning and External Search : Abstract: Segmentation based on language has been a popular topic in computer vision. While recent advances in multimodal large language models (MLLMs) have endowed segmentation systems with reasoning...
- SynthVerse: A Large-Scale Diverse Synthetic Dataset for Point Tracking : Abstract: Point tracking aims to follow visual points through complex motion, occlusion, and viewpoint changes, and has advanced rapidly with modern foundation models. Yet progress toward general poin...
- TrajVG: 3D Trajectory-Coupled Visual Geometry Learning : Abstract: Feed-forward multi-frame 3D reconstruction models often degrade on videos with object motion. Global-reference becomes ambiguous under multiple motions, while the local pointmap relies heavi...
- LCUDiff: Latent Capacity Upgrade Diffusion for Faithful Human Body Restoration : Abstract: Existing methods for restoring degraded human-centric images often struggle with insufficient fidelity, particularly in human body restoration (HBR). Recent diffusion-based restoration metho...
- Interactive Spatial-Frequency Fusion Mamba for Multi-Modal Image Fusion : Abstract: Multi-Modal Image Fusion (MMIF) aims to combine images from different modalities to produce fused images, retaining texture details and preserving significant information. Recently, some MMI...
- When and Where to Attack? Stage-wise Attention-Guided Adversarial Attack on Large Vision Language Models : Abstract: Adversarial attacks against Large Vision-Language Models (LVLMs) are crucial for exposing safety vulnerabilities in modern multimodal systems. Recent attacks based on input transformations, ...
- Finding NeMO: A Geometry-Aware Representation of Template Views for Few-Shot Perception : Abstract: We present Neural Memory Object (NeMO), a novel object-centric representation that can be used to detect, segment and estimate the 6DoF pose of objects unseen during training using RGB image...
- Multiview Self-Representation Learning across Heterogeneous Views : Abstract: Features of the same sample generated by different pretrained models often exhibit inherently distinct feature distributions because of discrepancies in the model pretraining objectives or a...
- JOintGS: Joint Optimization of Cameras, Bodies and 3D Gaussians for In-the-Wild Monocular Reconstruction : Abstract: Reconstructing high-fidelity animatable 3D human avatars from monocular RGB videos remains challenging, particularly in unconstrained in-the-wild scenarios where camera parameters and human ...
- Light Up Your Face: A Physically Consistent Dataset and Diffusion Model for Face Fill-Light Enhancement : Abstract: Face fill-light enhancement (FFE) brightens underexposed faces by adding virtual fill light while keeping the original scene illumination and background unchanged. Most face relighting metho...
- KVSmooth: Mitigating Hallucination in Multi-modal Large Language Models through Key-Value Smoothing : Abstract: Despite the significant progress of Multimodal Large Language Models (MLLMs) across diverse tasks, hallucination -- corresponding to the generation of visually inconsistent objects, attribut...
- Decoupled Hierarchical Distillation for Multimodal Emotion Recognition : Abstract: Human multimodal emotion recognition (MER) seeks to infer human emotions by integrating information from language, visual, and acoustic modalities. Although existing MER approaches have achi...
- Depth-Guided Metric-Aware Temporal Consistency for Monocular Video Human Mesh Recovery : Abstract: Monocular video human mesh recovery faces fundamental challenges in maintaining metric consistency and temporal stability due to inherent depth ambiguities and scale uncertainties. While exi...
- An Intuitionistic Fuzzy Logic Driven UNet architecture: Application to Brain Image segmentation : Abstract: Accurate segmentation of MRI brain images is essential for image analysis, diagnosis of neurological disorders and medical image computing. In the deep learning approach, the convolutional ...
- Adaptive 1D Video Diffusion Autoencoder : Abstract: Recent video generation models largely rely on video autoencoders that compress pixel-space videos into latent representations. However, existing video autoencoders suffer from three major l...
- VTok: A Unified Video Tokenizer with Decoupled Spatial-Temporal Latents : Abstract: This work presents VTok, a unified video tokenization framework that can be used for both generation and understanding tasks. Unlike the leading vision-language systems that tokenize videos ...
- Continuous Degradation Modeling via Latent Flow Matching for Real-World Super-Resolution : Abstract: While deep learning-based super-resolution (SR) methods have shown impressive outcomes with synthetic degradation scenarios such as bicubic downsampling, they frequently struggle to perform ...
- DiMo: Discrete Diffusion Modeling for Motion Generation and Understanding : Abstract: Prior masked modeling motion generation methods predominantly study text-to-motion. We present DiMo, a discrete diffusion-style framework, which extends masked modeling to bidirectional text...
- Partial Ring Scan: Revisiting Scan Order in Vision State Space Models : Abstract: State Space Models (SSMs) have emerged as efficient alternatives to attention for vision tasks, offering linear-time sequence processing with competitive accuracy. Vision SSMs, however, requi...
- Point2Insert: Video Object Insertion via Sparse Point Guidance : Abstract: This paper introduces Point2Insert, a sparse-point-based framework for flexible and user-friendly object insertion in videos, motivated by the growing popularity of accurate, low-effort obje...
- Context Determines Optimal Architecture in Materials Segmentation : Abstract: Segmentation architectures are typically benchmarked on single imaging modalities, obscuring deployment-relevant performance variations: an architecture optimal for one modality may underper...
- SuperPoint-E: local features for 3D reconstruction via tracking adaptation in endoscopy : Abstract: In this work, we focus on boosting the feature extraction to improve the performance of Structure-from-Motion (SfM) in endoscopy videos. We present SuperPoint-E, a new local feature extracti...
- VideoBrain: Learning Adaptive Frame Sampling for Long Video Understanding : Abstract: Long-form video understanding remains challenging for Vision-Language Models (VLMs) due to the inherent tension between computational constraints and the need to capture information distribu...
- iSight: Towards expert-AI co-assessment for improved immunohistochemistry staining interpretation : Abstract: Immunohistochemistry (IHC) provides information on protein expression in tissue sections and is commonly used to support pathology diagnosis and disease triage. While AI models for H&E-stai...
- Seeing Through Clutter: Structured 3D Scene Reconstruction via Iterative Object Removal : Abstract: We present SeeingThroughClutter, a method for reconstructing structured 3D representations from single images by segmenting and modeling objects individually. Prior approaches rely on interm...
- Artifact Removal and Image Restoration in AFM: A Structured Mask-Guided Directional Inpainting Approach : Abstract: Atomic Force Microscopy (AFM) enables high-resolution surface imaging at the nanoscale, yet the output is often degraded by artifacts introduced by environmental noise, scanning imperfection...
- Fast, Unsupervised Framework for Registration Quality Assessment of Multi-stain Histological Whole Slide Pairs : Abstract: High-fidelity registration of histopathological whole slide images (WSIs), such as hematoxylin & eosin (H&E) and immunohistochemistry (IHC), is vital for integrated molecular analysis but ch...
- A Parameterizable Convolution Accelerator for Embedded Deep Learning Applications : Abstract: Convolutional neural network (CNN) accelerators implemented on Field-Programmable Gate Arrays (FPGAs) are typically designed with a primary focus on maximizing performance, often measured in...
- AnyStyle: Single-Pass Multimodal Stylization for 3D Gaussian Splatting : Abstract: The growing demand for rapid and scalable 3D asset creation has driven interest in feed-forward 3D reconstruction methods, with 3D Gaussian Splatting (3DGS) emerging as an effective scene re...
- TiCLS: Tightly Coupled Language Text Spotter : Abstract: Scene text spotting aims to detect and recognize text in real-world images, where instances are often short, fragmented, or visually ambiguous. Existing methods primarily rely on visual cues...
- Entropy Reveals Block Importance in Masked Self-Supervised Vision Transformers : Abstract: Masked self-supervised vision transformers have become a dominant pretraining paradigm, yet their substantial model size poses significant challenges for resource-constrained deployment and ...
- GPAIR: Gaussian-Kernel-Based Ultrafast 3D Photoacoustic Iterative Reconstruction : Abstract: Although the iterative reconstruction (IR) algorithm can substantially correct reconstruction artifacts in photoacoustic (PA) computed tomography (PACT), it suffers from long reconstruction ...
- 4DPC$^2$hat: Towards Dynamic Point Cloud Understanding with Failure-Aware Bootstrapping : Abstract: Point clouds provide a compact and expressive representation of 3D objects, and have recently been integrated into multimodal large language models (MLLMs). However, existing methods primari...
- Intellectual Property Protection for 3D Gaussian Splatting Assets: A Survey : Abstract: 3D Gaussian Splatting (3DGS) has become a mainstream representation for real-time 3D scene synthesis, enabling applications in virtual and augmented reality, robotics, and 3D content creatio...
- SpeechMapper: Speech-to-text Embedding Projector for LLMs : Abstract: Current speech LLMs bridge speech foundation models to LLMs using projection layers, training all of these components on speech instruction data. This strategy is computationally intensive a...
- PersoBench: Benchmarking Personalized Response Generation in Large Language Models : Abstract: While large language models (LLMs) have exhibited impressive conversational capabilities, their proficiency in delivering personalized responses remains unclear. Although recent benchmarks a...
- Horizon-LM: A RAM-Centric Architecture for LLM Training : Abstract: The rapid growth of large language models (LLMs) has outpaced the evolution of single-GPU hardware, making model scale increasingly constrained by memory capacity rather than computation. Wh...
- Speaker-Aware Simulation Improves Conversational Speech Recognition : Abstract: Automatic speech recognition (ASR) for conversational speech remains challenging due to the limited availability of large-scale, well-annotated multi-speaker dialogue data and the complex te...
- Inference-Time Reasoning Selectively Reduces Implicit Social Bias in Large Language Models : Abstract: Drawing on constructs from psychology, prior work has identified a distinction between explicit and implicit bias in large language models (LLMs). While many LLMs undergo post-training align...
- AIANO: Enhancing Information Retrieval with AI-Augmented Annotation : Abstract: The rise of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) has rapidly increased the need for high-quality, curated information retrieval datasets. These datasets, how...
- Unmasking Superspreaders: Data-Driven Approaches for Identifying and Comparing Key Influencers of Conspiracy Theories on X.com : Abstract: Conspiracy theories can threaten society by spreading misinformation, deepening polarization, and eroding trust in democratic institutions. Social media often fuels the spread of conspiracie...
- PersoPilot: An Adaptive AI-Copilot for Transparent Contextualized Persona Classification and Personalized Response Generation : Abstract: Understanding and classifying user personas is critical for delivering effective personalization. While persona information offers valuable insights, its full potential is realized only when...
- Frontend Token Enhancement for Token-Based Speech Recognition : Abstract: Discretized representations of speech signals are efficient alternatives to continuous features for various speech applications, including automatic speech recognition (ASR) and speech langu...
- BASS: Benchmarking Audio LMs for Musical Structure and Semantic Reasoning : Abstract: Music understanding is a complex task that often requires reasoning over both structural and semantic elements of audio. We introduce BASS, designed to evaluate music understanding and reaso...
- Chaplains' Reflections on the Design and Usage of AI for Conversational Care : Abstract: Despite growing recognition that responsible AI requires domain knowledge, current work on conversational AI primarily draws on clinical expertise that prioritises diagnosis and intervention...
- CoT is Not the Chain of Truth: An Empirical Internal Analysis of Reasoning LLMs for Fake News Generation : Abstract: From generating headlines to fabricating news, Large Language Models (LLMs) are typically assessed by their final outputs, under the safety assumption that a refusal response signifies s...
- Decomposed Prompting Does Not Fix Knowledge Gaps, But Helps Models Say "I Don't Know" : Abstract: Large language models often struggle to recognize their knowledge limits in closed-book question answering, leading to confident hallucinations. While decomposed prompting is typically used ...
- OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models : Abstract: Omni-modal Large Language Models (Omni-LLMs) have demonstrated strong capabilities in audio-video understanding tasks. However, their reliance on long multimodal token sequences leads to sub...
- Beyond Many-Shot Translation: Scaling In-Context Demonstrations For Low-Resource Machine Translation : Abstract: Building machine translation (MT) systems for low-resource languages is notably difficult due to the scarcity of high-quality data. Although Large Language Models (LLMs) have improved MT sys...
- "Be My Cheese?": Cultural Nuance Benchmarking for Machine Translation in Multilingual LLMs : Abstract: We present a large-scale human evaluation benchmark for assessing cultural localisation in machine translation produced by state-of-the-art multilingual large language models (LLMs). Existin...
- Linguistically Informed Evaluation of Multilingual ASR for African Languages : Abstract: Word Error Rate (WER) mischaracterizes ASR models' performance for African languages by combining phonological, tone, and other linguistic errors into a single lexical error. By contrast, Fe...
- LiteToken: Removing Intermediate Merge Residues From BPE Tokenizers : Abstract: Tokenization is fundamental to how language models represent and process text, yet the behavior of widely used BPE tokenizers has received far less study than model architectures and trainin...
- ERNIE 5.0 Technical Report : Abstract: In this report, we introduce ERNIE 5.0, a natively autoregressive foundation model designed for unified multimodal understanding and generation across text, image, video, and audio. All moda...
- LinGO: A Linguistic Graph Optimization Framework with LLMs for Interpreting Intents of Online Uncivil Discourse : Abstract: Detecting uncivil language is crucial for maintaining safe, inclusive, and democratic online spaces. Yet existing classifiers often misinterpret posts containing uncivil cues but expressing ...
- Investigating Disability Representations in Text-to-Image Models : Abstract: Text-to-image generative models have made remarkable progress in producing high-quality visual content from textual descriptions, yet concerns remain about how they represent social groups. ...
- Approaches to Semantic Textual Similarity in Slovak Language: From Algorithms to Transformers : Abstract: Semantic textual similarity (STS) plays a crucial role in many natural language processing tasks. While extensively studied in high-resource languages, STS remains challenging for under-reso...
- Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models : Abstract: Generative Reward Models (GenRMs) and LLM-as-a-Judge exhibit deceptive alignment by producing correct judgments for incorrect reasons, as they are trained and evaluated to prioritize Outcome...
- Mapping the Web of Science, a large-scale graph and text-based dataset with LLM embeddings : Abstract: Large text data sets, such as publications, websites, and other text-based media, inherit two distinct types of features: (1) the text itself, its information conveyed through semantics, and...
- LEAD: Layer-wise Expert-aligned Decoding for Faithful Radiology Report Generation : Abstract: Radiology Report Generation (RRG) aims to produce accurate and coherent diagnostics from medical images. Although large vision language models (LVLM) improve report fluency and accuracy, the...
- Disentangling meaning from language in LLM-based machine translation : Abstract: Mechanistic Interpretability (MI) seeks to explain how neural networks implement their capabilities, but the scale of Large Language Models (LLMs) has limited prior MI work in Machine Transl...
- Beyond Holistic Scores: Automatic Trait-Based Quality Scoring of Argumentative Essays : Abstract: Automated Essay Scoring systems have traditionally focused on holistic scores, limiting their pedagogical usefulness, especially in the case of complex essay genres such as argumentative wri...
- Semantic Self-Distillation for Language Model Uncertainty : Abstract: Large language models present challenges for principled uncertainty quantification, in part due to their complexity and the diversity of their outputs. Semantic dispersion, or the variance i...
- Can LLMs capture stable human-generated sentence entropy measures? : Abstract: Predicting upcoming words is a core mechanism of language comprehension and may be quantified using Shannon entropy. There is currently no empirical consensus on how many human responses are...
- Textual Planning with Explicit Latent Transitions : Abstract: Planning with LLMs is bottlenecked by token-by-token generation and repeated full forward passes, making multi-step lookahead and rollout-based search expensive in latency and compute. We pr...
- $C$-$\Delta\Theta$: Circuit-Restricted Weight Arithmetic for Selective Refusal : Abstract: Modern deployments require LLMs to enforce safety policies at scale, yet many controls rely on inference-time interventions that add recurring compute cost and serving complexity. Activation...
- ReFRAME or Remain: Unsupervised Lexical Semantic Change Detection with Frame Semantics : Abstract: The majority of contemporary computational methods for lexical semantic change (LSC) detection are based on neural embedding distributional representations. Although these models perform wel...
- Model-Dowser: Data-Free Importance Probing to Mitigate Catastrophic Forgetting in Multimodal Large Language Models : Abstract: Fine-tuning Multimodal Large Language Models (MLLMs) on task-specific data is an effective way to improve performance on downstream applications. However, such adaptation often leads to a de...
- PersoDPO: Scalable Preference Optimization for Instruction-Adherent, Persona-Grounded Dialogue via Multi-LLM Evaluation : Abstract: Personalization and contextual coherence are two essential components in building effective persona-grounded dialogue systems. These aspects play a crucial role in enhancing user engagement ...
- Deconstructing sentence disambiguation by joint latent modeling of reading paradigms: LLM surprisal is not enough : Abstract: Using temporarily ambiguous garden-path sentences ("While the team trained the striker wondered ...") as a test case, we present a latent-process mixture model of human reading behavior acro...
- Beyond Unimodal Shortcuts: MLLMs as Cross-Modal Reasoners for Grounded Named Entity Recognition : Abstract: Grounded Multimodal Named Entity Recognition (GMNER) aims to extract text-based entities, assign them semantic categories, and ground them to corresponding visual regions. In this work, we e...
- Fine-Grained Activation Steering: Steering Less, Achieving More : Abstract: Activation steering has emerged as a cost-effective paradigm for modifying large language model (LLM) behaviors. Existing methods typically intervene at the block level, steering the bundled...
- Swordsman: Entropy-Driven Adaptive Block Partition for Efficient Diffusion Language Models : Abstract: Block-wise decoding effectively improves the inference speed and quality in diffusion language models (DLMs) by combining inter-block sequential denoising and intra-block parallel unmasking....
- Evaluating the Presence of Sex Bias in Clinical Reasoning by Large Language Models : Abstract: Large language models (LLMs) are increasingly embedded in healthcare workflows for documentation, education, and clinical decision support. However, these systems are trained on large text c...
- Beyond Rejection Sampling: Trajectory Fusion for Scaling Mathematical Reasoning : Abstract: Large language models (LLMs) have made impressive strides in mathematical reasoning, often fine-tuned using rejection sampling that retains only correct reasoning trajectories. While effecti...
- Can Vision Replace Text in Working Memory? Evidence from Spatial n-Back in Vision-Language Models : Abstract: Working memory is a central component of intelligent behavior, providing a dynamic workspace for maintaining and updating task-relevant information. Recent work has used n-back tasks to prob...
- A Domain-Specific Curated Benchmark for Entity and Document-Level Relation Extraction : Abstract: Information Extraction (IE), encompassing Named Entity Recognition (NER), Named Entity Linking (NEL), and Relation Extraction (RE), is critical for transforming the rapidly growing volume of...
- Guided Verifier: Collaborative Multimodal Reasoning via Dynamic Process Supervision : Abstract: Reinforcement Learning (RL) has emerged as a pivotal mechanism for enhancing the complex reasoning capabilities of Multimodal Large Language Models (MLLMs). However, prevailing paradigms typ...
- ECG-R1: Protocol-Guided and Modality-Agnostic MLLM for Reliable ECG Interpretation : Abstract: Electrocardiography (ECG) serves as an indispensable diagnostic tool in clinical practice, yet existing multimodal large language models (MLLMs) remain unreliable for ECG interpretation, oft...
- Scaling Agentic Verifier for Competitive Coding : Abstract: Large language models (LLMs) have demonstrated strong coding capabilities but still struggle to solve competitive programming problems correctly in a single attempt. Execution-based re-ranki...
- DementiaBank-Emotion: A Multi-Rater Emotion Annotation Corpus for Alzheimer's Disease Speech (Version 1.0) : Abstract: We present DementiaBank-Emotion, the first multi-rater emotion annotation corpus for Alzheimer's disease (AD) speech. Annotating 1,492 utterances from 108 speakers for Ekman's six basic emot...
- CoLT: Reasoning with Chain of Latent Tool Calls : Abstract: Chain-of-Thought (CoT) is a critical technique in enhancing the reasoning ability of Large Language Models (LLMs), and latent reasoning methods have been proposed to accelerate the inefficie...
- Tokenization and Morphological Fidelity in Uralic NLP: A Cross-Lingual Evaluation : Abstract: Subword tokenization critically affects Natural Language Processing (NLP) performance, yet its behavior in morphologically rich and low-resource language families remains under-explored. Thi...
- DELTA: Deliberative Multi-Agent Reasoning with Reinforcement Learning for Multimodal Psychological Counseling : Abstract: Psychological counseling is a fundamentally multimodal cognitive process in which clinicians integrate verbal content with visual and vocal cues to infer clients' mental states and respond e...
- Expert Selections In MoE Models Reveal (Almost) As Much As Text : Abstract: We present a text-reconstruction attack on mixture-of-experts (MoE) language models that recovers tokens from expert selections alone. In MoE models, each token is routed to a subset of expe...
- Abstraction Induces the Brain Alignment of Language and Speech Models : Abstract: Research has repeatedly demonstrated that intermediate hidden states extracted from large language models and speech audio models predict measured brain response to natural language stimuli....
- Likelihood-Based Reward Designs for General LLM Reasoning : Abstract: Fine-tuning large language models (LLMs) on reasoning benchmarks via reinforcement learning requires a specific reward function, often binary, for each benchmark. This comes with two potenti...
- Automatic Classification of Pedagogical Materials against CS Curriculum Guidelines : Abstract: Professional societies often publish curriculum guidelines to help programs align their content to international standards. In Computer Science, the primary standard is published by ACM and ...
- Generative Modeling of Neural Dynamics via Latent Stochastic Differential Equations : Abstract: We propose a probabilistic framework for developing computational models of biological neural systems. In this framework, physiological recordings are viewed as discrete-time partial observa...
- Coupled Integral PINN for Discontinuity : Abstract: Physics-Informed Neural Networks (PINNs) solve forward PDEs by minimizing residual losses from the governing equations with initial and boundary conditions, but they often struggle with disc...
- Predictive Low Rank Matrix Learning under Partial Observations: Mixed-Projection ADMM : Abstract: We study the problem of learning a partially observed matrix under the low rank assumption in the presence of fully observed side information that depends linearly on the true underlying mat...
- Fast and Stable Riemannian Metrics on SPD Manifolds via Cholesky Product Geometry : Abstract: Recent advances in Symmetric Positive Definite (SPD) matrix learning show that Riemannian metrics are fundamental to effective SPD neural networks. Motivated by this, we revisit the geometry...
- ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs : Abstract: Methods for finetuning generative models for concept-driven personalization generally achieve strong results for subject-driven or style-driven generation. Recently, low-rank adaptations (Lo...
- P-Tensors: a General Formalism for Constructing Higher Order Message Passing Networks : Abstract: Several recent papers have proposed increasing the expressive power of graph neural networks by exploiting subgraphs or other topological structures. In parallel, researchers have investigat...
- Dictionary Learning under Symmetries via Group Representations : Abstract: The dictionary learning problem can be viewed as a data-driven process to learn a suitable transformation so that data is sparsely represented directly from example data. In this paper, we e...
- Vicarious Offense and Noise Audit of Offensive Speech Classifiers: Unifying Human and Machine Disagreement on What is Offensive : Abstract: Offensive speech detection is a key component of content moderation. However, what is offensive can be highly subjective. This paper investigates how machine and human moderators disagree on...
- Achieving Logarithmic Regret in KL-Regularized Zero-Sum Markov Games : Abstract: Reverse Kullback-Leibler (KL) divergence-based regularization with respect to a fixed reference policy is widely used in modern reinforcement learning to preserve the desired traits of the r...
- Information Shapes Koopman Representation : Abstract: The Koopman operator provides a powerful framework for modeling dynamical systems and has attracted growing interest from the machine learning community. However, its infinite-dimensional na...
- Learning Hidden Physics and System Parameters with Deep Operator Networks : Abstract: Discovering hidden physical laws and identifying governing system parameters from sparse observations are central challenges in computational science and engineering. Existing data-driven me...
- STAND: Self-Aware Precondition Induction for Interactive Task Learning : Abstract: In interactive task learning (ITL), AI agents learn new capabilities from limited human instruction provided during task execution. STAND is a new method of data-efficient rule precondition ...
- Scalable physical source-to-field inference with hypernetworks : Abstract: We present a generative model that amortises computation for the field and potential around e.g. gravitational or electromagnetic sources. Exact numerical calculation has either computationa...
- Reinforced Attention Learning : Abstract: Post-training with Reinforcement Learning (RL) has substantially improved reasoning in Large Language Models (LLMs) via test-time scaling. However, extending this paradigm to Multimodal LLMs...
- XtraLight-MedMamba for Classification of Neoplastic Tubular Adenomas : Abstract: Accurate risk stratification of precancerous polyps during routine colonoscopy screenings is essential for lowering the risk of developing colorectal cancer (CRC). However, assessment of low...
- Theory of Optimal Learning Rate Schedules and Scaling Laws for a Random Feature Model : Abstract: Setting the learning rate for a deep learning model is a critical part of successful training, yet choosing this hyperparameter is often done empirically with trial and error. In this work, ...
- Conditional Counterfactual Mean Embeddings: Doubly Robust Estimation and Learning Rates : Abstract: A complete understanding of heterogeneous treatment effects involves characterizing the full conditional distribution of potential outcomes. To this end, we propose the Conditional Counterfa...
- Less Finetuning, Better Retrieval: Rethinking LLM Adaptation for Biomedical Retrievers via Synthetic Data and Model Merging : Abstract: Retrieval-augmented generation (RAG) has become the backbone of grounding Large Language Models (LLMs), improving knowledge updates and reducing hallucinations. Recently, LLM-based retriever...
- Cross-Attention Transformer for Joint Multi-Receiver Uplink Neural Decoding : Abstract: We propose a cross-attention Transformer for joint decoding of uplink OFDM signals received by multiple coordinated access points. A shared per-receiver encoder learns time-frequency structu...
- Knowledge Distillation for mmWave Beam Prediction Using Sub-6 GHz Channels : Abstract: Beamforming in millimeter-wave (mmWave) high-mobility environments typically incurs substantial training overhead. While prior studies suggest that sub-6 GHz channels can be exploited to pre...
- Beyond Learning on Molecules by Weakly Supervising on Molecules : Abstract: Molecular representations are inherently task-dependent, yet most pre-trained molecular encoders are not. Task conditioning promises representations that reorganize based on task description...
- Causal explanations of outliers in systems with lagged time-dependencies : Abstract: Root-cause analysis in controlled time dependent systems poses a major challenge in applications. Especially energy systems are difficult to handle as they exhibit instantaneous as well as d...
- Inference-Time Backdoors via Hidden Instructions in LLM Chat Templates : Abstract: Open-weight language models are increasingly used in production settings, raising new security challenges. One prominent threat in this context is backdoor attacks, in which adversaries embe...
- Learning to Separate RF Signals Under Uncertainty: Detect-Then-Separate vs. Unified Joint Models : Abstract: The increasingly crowded radio frequency (RF) spectrum forces communication signals to coexist, creating heterogeneous interferers whose structure often departs from Gaussian models. Recover...
- Targeted Synthetic Control Method : Abstract: The synthetic control method (SCM) estimates causal effects in panel data with a single-treated unit by constructing a counterfactual outcome as a weighted combination of untreated control u...
- Focus-LIME: Surgical Interpretation of Long-Context Large Language Models via Proxy-Based Neighborhood Selection : Abstract: As Large Language Models (LLMs) scale to handle massive context windows, achieving surgical feature-level interpretation is essential for high-stakes tasks like legal auditing and code debug...
- A principled framework for uncertainty decomposition in TabPFN : Abstract: TabPFN is a transformer that achieves state-of-the-art performance on supervised tabular tasks by amortizing Bayesian prediction into a single forward pass. However, there is currently no me...
- Rethinking Weight Tying: Pseudo-Inverse Tying for Stable LM Training and Updates : Abstract: Weight tying is widely used in compact language models to reduce parameters by sharing the token table between the input embedding and the output projection. However, weight sharing does not...
- Universality of General Spiked Tensor Models : Abstract: We study the rank-one spiked tensor model in the high-dimensional regime, where the noise entries are independent and identically distributed with zero mean, unit variance, and finite fourth...
- Bayesian PINNs for uncertainty-aware inverse problems (BPINN-IP) : Abstract: The main contribution of this paper is to develop a hierarchical Bayesian formulation of PINNs for linear inverse problems, which is called BPINN-IP. The proposed methodology extends PINN to...
- Journey to the Centre of Cluster: Harnessing Interior Nodes for A/B Testing under Network Interference : Abstract: A/B testing on platforms often faces challenges from network interference, where a unit's outcome depends not only on its own treatment but also on the treatments of its network neighbors. T...
- Machine Learning-Driven Crystal System Prediction for Perovskites Using Augmented X-ray Diffraction Data : Abstract: Prediction of crystal system from X-ray diffraction (XRD) spectra is a critical task in materials science, particularly for perovskite materials which are known for their diverse application...
- HoRD: Robust Humanoid Control via History-Conditioned Reinforcement Learning and Online Distillation : Abstract: Humanoid robots can suffer significant performance drops under small changes in dynamics, task specifications, or environment setup. We propose HoRD, a two-stage learning framework for robus...
- Optimal Rates for Feasible Payoff Set Estimation in Games : Abstract: We study a setting in which two players play a (possibly approximate) Nash equilibrium of a bimatrix game, while a learner observes only their actions and has no knowledge of the equilibrium...
- Anytime-Valid Conformal Risk Control : Abstract: Prediction sets provide a means of quantifying the uncertainty in predictive tasks. Using held out calibration data, conformal prediction and risk control can produce prediction sets that ex...
- A Bandit-Based Approach to Educational Recommender Systems: Contextual Thompson Sampling for Learner Skill Gain Optimization : Abstract: In recent years, instructional practices in Operations Research (OR), Management Science (MS), and Analytics have increasingly shifted toward digital environments, where large and diverse gr...
- Geometry-Aware Optimal Transport: Fast Intrinsic Dimension and Wasserstein Distance Estimation : Abstract: Solving large scale Optimal Transport (OT) in machine learning typically relies on sampling measures to obtain a tractable discrete problem. While the discrete solver's accuracy is controlla...
- Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement : Abstract: Pre-trained models for automatic speech recognition (ASR) and speech enhancement (SE) have exhibited remarkable capabilities under matched noise and channel conditions. However, these models...
- Proxy Compression for Language Modeling : Abstract: Modern language models are trained almost exclusively on token sequences produced by a fixed tokenizer, an external lossless compressor often over UTF-8 byte sequences, thereby coupling the ...
- Bures-Wasserstein Importance-Weighted Evidence Lower Bound: Exposition and Applications : Abstract: The Importance-Weighted Evidence Lower Bound (IW-ELBO) has emerged as an effective objective for variational inference (VI), tightening the standard ELBO and mitigating the mode-seeking beha...
- Aortic Valve Disease Detection from PPG via Physiology-Informed Self-Supervised Learning : Abstract: Traditional diagnosis of aortic valve disease relies on echocardiography, but its cost and required expertise limit its use in large-scale early screening. Photoplethysmography (PPG) has eme...
- SPOT-Occ: Sparse Prototype-guided Transformer for Camera-based 3D Occupancy Prediction : Abstract: Achieving highly accurate and real-time 3D occupancy prediction from cameras is a critical requirement for the safe and practical deployment of autonomous vehicles. While this shift to spars...
- Provable Target Sample Complexity Improvements as Pre-Trained Models Scale : Abstract: Pre-trained models have become indispensable for efficiently building models across a broad spectrum of downstream tasks. The advantages of pre-trained models have been highlighted by empiri...
- AGMA: Adaptive Gaussian Mixture Anchors for Prior-Guided Multimodal Human Trajectory Forecasting : Abstract: Human trajectory forecasting requires capturing the multimodal nature of pedestrian behavior. However, existing approaches suffer from prior misalignment. Their learned or fixed priors often...
- The Missing Half: Unveiling Training-time Implicit Safety Risks Beyond Deployment : Abstract: Safety risks of AI models have been widely studied at deployment time, such as jailbreak attacks that elicit harmful outputs. In contrast, safety risks emerging during training remain largel...
- Piece of CAKE: Adaptive Execution Engines via Microsecond-Scale Learning : Abstract: Low-level database operators often admit multiple physical implementations ("kernels") that are semantically equivalent but have vastly different performance characteristics depending on the...
- Maximin Relative Improvement: Fair Learning as a Bargaining Problem : Abstract: When deploying a single predictor across multiple subpopulations, we propose a fundamentally different approach: interpreting group fairness as a bargaining problem among subpopulations. Thi...
- Lyapunov Constrained Soft Actor-Critic (LC-SAC) using Koopman Operator Theory for Quadrotor Trajectory Tracking : Abstract: Reinforcement Learning (RL) has achieved remarkable success in solving complex sequential decision-making problems. However, its application to safety-critical physical systems remains const...
- Attack-Resistant Uniform Fairness for Linear and Smooth Contextual Bandits : Abstract: Modern systems, such as digital platforms and service systems, increasingly rely on contextual bandits for online decision-making; however, their deployment can inadvertently create unfair e...
- ZKBoost: Zero-Knowledge Verifiable Training for XGBoost : Abstract: Gradient boosted decision trees, particularly XGBoost, are among the most effective methods for tabular data. As deployment in sensitive settings increases, cryptographic guarantees of model...
- Efficient Subgroup Analysis via Optimal Trees with Global Parameter Fusion : Abstract: Identifying and making statistical inferences on differential treatment effects (commonly known as subgroup analysis in clinical research) is central to precision health. Subgroup analysis a...
- Thermodynamic assessment of machine learning models for solid-state synthesis prediction : Abstract: Machine learning models have recently emerged to predict whether hypothetical solid-state materials can be synthesized. These models aim to circumvent direct first-principles modeling of sol...
- A Multi-Modal Foundational Model for Wireless Communication and Sensing : Abstract: Artificial intelligence is a key enabler for next-generation wireless communication and sensing. Yet, today's learning-based wireless techniques do not generalize well: most models are task-...
- Functional Stochastic Localization : Abstract: Eldan's stochastic localization is a probabilistic construction that has proved instrumental to modern breakthroughs in high-dimensional geometry and the design of sampling algorithms. Motiv...
- Statistical Guarantees for Reasoning Probes on Looped Boolean Circuits : Abstract: We study the statistical behaviour of reasoning probes in a stylized model of looped reasoning, given by Boolean circuits whose computational graph is a perfect $\nu$-ary tree ($\nu \ge 2$) and w...
- Learning Multi-type heterogeneous interacting particle systems : Abstract: We propose a framework for the joint inference of network topology, multi-type interaction kernels, and latent type assignments in heterogeneous interacting particle systems from multi-traje...
- Privacy utility trade offs for parameter estimation in degree heterogeneous higher order networks : Abstract: In sensitive applications involving relational datasets, protecting information about individual links from adversarial queries is of paramount importance. In many such settings, the availab...
- C-IDS: Solving Contextual POMDP via Information-Directed Objective : Abstract: We study the policy synthesis problem in contextual partially observable Markov decision processes (CPOMDPs), where the environment is governed by an unknown latent context that induces dist...
- SpatiaLab: Can Vision-Language Models Perform Spatial Reasoning in the Wild? : Abstract: Spatial reasoning is a fundamental aspect of human cognition, yet it remains a major challenge for contemporary vision-language models (VLMs). Prior work largely relied on synthetic or LLM-g...
- A Hitchhiker's Guide to Poisson Gradient Estimation : Abstract: Poisson-distributed latent variable models are widely used in computational neuroscience, but differentiating through discrete stochastic samples remains challenging. Two approaches address ...
- Benchmarking Bias Mitigation Toward Fairness Without Harm from Vision to LVLMs : Abstract: Machine learning models trained on real-world data often inherit and amplify biases against certain social groups, raising urgent concerns about their deployment at scale. While numerous bia...
- Transcendental Regularization of Finite Mixtures: Theoretical Guarantees and Practical Limitations : Abstract: Finite mixture models are widely used for unsupervised learning, but maximum likelihood estimation via EM suffers from degeneracy as components collapse. We introduce transcendental regulari...
- Prenatal Stress Detection from Electrocardiography Using Self-Supervised Deep Learning: Development and External Validation : Abstract: Prenatal psychological stress affects 15-25% of pregnancies and increases risks of preterm birth, low birth weight, and adverse neurodevelopmental outcomes. Current screening relies on subje...
- PENGUIN: General Vital Sign Reconstruction from PPG with Flow Matching State Space Model : Abstract: Photoplethysmography (PPG) plays a crucial role in continuous cardiovascular health monitoring as a non-invasive and cost-effective modality. However, PPG signals are susceptible to motion a...
- The Turing Synthetic Radar Dataset: A dataset for pulse deinterleaving : Abstract: We present the Turing Synthetic Radar Dataset, a comprehensive dataset to serve both as a benchmark for radar pulse deinterleaving research and as an enabler of new research methods. The dat...
- Majorization-Minimization Networks for Inverse Problems: An Application to EEG Imaging : Abstract: Inverse problems are often ill-posed and require optimization schemes with strong stability and convergence guarantees. While learning-based approaches such as deep unrolling and meta-learni...
- Online unsupervised Hebbian learning in deep photonic neuromorphic networks : Abstract: While software implementations of neural networks have driven significant advances in computation, the von Neumann architecture imposes fundamental limitations on speed and energy efficiency...
- Multi-Head LatentMoE and Head Parallel: Communication-Efficient and Deterministic MoE Parallelism : Abstract: Large language models have transformed many applications but remain expensive to train. Sparse Mixture of Experts (MoE) addresses this through conditional computation, with Expert Parallel (...
- The Key to State Reduction in Linear Attention: A Rank-based Perspective : Abstract: Linear attention offers a computationally efficient yet expressive alternative to softmax attention. However, recent empirical results indicate that the state of trained linear attention mod...
- Robust Generalizable Heterogeneous Legal Link Prediction : Abstract: Recent work has applied link prediction to large heterogeneous legal citation networks with rich meta-features. We find that this approach can be improved by including edge dropout and...
- Evolving Afferent Architectures: Biologically-inspired Models for Damage-Avoidance Learning : Abstract: We introduce Afferent Learning, a framework that produces Computational Afferent Traces (CATs) as adaptive, internal risk signals for damage-avoidance learning. Inspired by biological system...
- Maximum-Volume Nonnegative Matrix Factorization : Abstract: Nonnegative matrix factorization (NMF) is a popular data embedding technique. Given a nonnegative data matrix $X$, it aims at finding two lower dimensional matrices, $W$ and $H$, such that $...
- From independent patches to coordinated attention: Controlling information flow in vision transformers : Abstract: We make the information transmitted by attention an explicit, measurable quantity in vision transformers. By inserting variational information bottlenecks on all attention-mediated writes to...
- Legendre Memory Unit with A Multi-Slice Compensation Model for Short-Term Wind Speed Forecasting Based on Wind Farm Cluster Data : Abstract: With more wind farms clustered for integration, the short-term wind speed prediction of such wind farm clusters is critical for normal operation of power systems. This paper focuses on achie...
- Dynamical Regimes of Multimodal Diffusion Models : Abstract: Diffusion based generative models have achieved unprecedented fidelity in synthesizing high dimensional data, yet the theoretical mechanisms governing multimodal generation remain poorly und...
- Interval-Based AUC (iAUC): Extending ROC Analysis to Uncertainty-Aware Classification : Abstract: In high-stakes risk prediction, quantifying uncertainty through interval-valued predictions is essential for reliable decision-making. However, standard evaluation tools like the receiver op...
- Generative Modeling via Drifting : Abstract: Generative modeling can be formulated as learning a mapping f such that its pushforward distribution matches the data distribution. The pushforward behavior can be carried out iteratively at...
- NeuroCanvas: VLLM-Powered Robust Seizure Detection by Reformulating Multichannel EEG as Image : Abstract: Accurate and timely seizure detection from Electroencephalography (EEG) is critical for clinical intervention, yet manual review of long-term recordings is labor-intensive. Recent efforts to...
- Improved Dimension Dependence for Bandit Convex Optimization with Gradient Variations : Abstract: Gradient-variation online learning has drawn increasing attention due to its deep connections to game theory, optimization, etc. It has been studied extensively in the full-information setti...
- A Dual-TransUNet Deep Learning Framework for Multi-Source Precipitation Merging and Improving Seasonal and Extreme Estimates : Abstract: Multi-source precipitation products (MSPs) from satellite retrievals and reanalysis are widely used for hydroclimatic monitoring, yet spatially heterogeneous biases and limited skill for ext...
- Decomposing Query-Key Feature Interactions Using Contrastive Covariances : Abstract: Despite the central role of attention heads in Transformers, we lack tools to understand why a model attends to a particular token. To address this, we study the query-key (QK) space -- the ...
- Rationality Measurement and Theory for Reinforcement Learning Agents : Abstract: This paper proposes a suite of rationality measures and associated theory for reinforcement learning agents, a property increasingly critical yet rarely explored. We define an action in depl...
- DMFlow: Disordered Materials Generation by Flow Matching : Abstract: The design of materials with tailored properties is crucial for technological progress. However, most deep generative models focus exclusively on perfectly ordered crystals, neglecting the i...
- Benchmarking and Enhancing PPG-Based Cuffless Blood Pressure Estimation Methods : Abstract: Cuffless blood pressure screening based on easily acquired photoplethysmography (PPG) signals offers a practical pathway toward scalable cardiovascular health assessment. Despite rapid progr...
- Bounded-Abstention Multi-horizon Time-series Forecasting : Abstract: Multi-horizon time-series forecasting involves simultaneously making predictions for a consecutive sequence of subsequent time steps. This task arises in many application domains, such as he...
- Towards Understanding and Avoiding Limitations of Convolutions on Graphs : Abstract: While message-passing neural networks (MPNNs) have shown promising results, their real-world impact remains limited. Although various limitations have been identified, their theoretical foun...
- Static and auto-regressive neural emulation of phytoplankton biomass dynamics from physical predictors in the global ocean : Abstract: Phytoplankton is the basis of marine food webs, driving both ecological processes and global biogeochemical cycles. Despite their ecological and climatic significance, accurately simulating ...
- REDistill: Robust Estimator Distillation for Balancing Robustness and Efficiency : Abstract: Knowledge Distillation (KD) transfers knowledge from a large teacher model to a smaller student by aligning their predictive distributions. However, conventional KD formulations - typically ...
- Generalized Schrödinger Bridge on Graphs : Abstract: Transportation on graphs is a fundamental challenge across many domains, where decisions must respect topological and operational constraints. Despite the need for actionable policies, exist...
- SAFE: Stable Alignment Finetuning with Entropy-Aware Predictive Control for RLHF : Abstract: Proximal Policy Optimization (PPO) has been positioned by recent literature as the canonical method for the RL part of RLHF. PPO performs well empirically but has a heuristic motivation and handles the KL-d...
- MTS-JEPA: Multi-Resolution Joint-Embedding Predictive Architecture for Time-Series Anomaly Prediction : Abstract: Multivariate time series underpin modern critical infrastructure, making the prediction of anomalies a vital necessity for proactive risk mitigation. While Joint-Embedding Predictive Archite...
- RIGA-Fold: A General Framework for Protein Inverse Folding via Recurrent Interaction and Geometric Awareness : Abstract: Protein inverse folding, the task of predicting amino acid sequences for desired structures, is pivotal for de novo protein design. However, existing GNN-based methods typically suffer from ...
- QUATRO: Query-Adaptive Trust Region Policy Optimization for LLM Fine-tuning : Abstract: GRPO-style reinforcement learning (RL)-based LLM fine-tuning algorithms have recently gained popularity. Relying on heuristic trust-region approximations, however, they can lead to brittle o...
- Resilient Load Forecasting under Climate Change: Adaptive Conditional Neural Processes for Few-Shot Extreme Load Forecasting : Abstract: Extreme weather can substantially change electricity consumption behavior, causing load curves to exhibit sharp spikes and pronounced volatility. If forecasts are inaccurate during those per...
- Jacobian Regularization Stabilizes Long-Term Integration of Neural Differential Equations : Abstract: Hybrid models and Neural Differential Equations (NDE) are getting increasingly important for the modeling of physical systems, however they often encounter stability and accuracy issues duri...
- Stochastic Decision Horizons for Constrained Reinforcement Learning : Abstract: Constrained Markov decision processes (CMDPs) provide a principled model for handling constraints, such as safety and other auxiliary objectives, in reinforcement learning. The common approa...
- Probabilistic Label Spreading: Efficient and Consistent Estimation of Soft Labels with Epistemic Uncertainty on Graphs : Abstract: Safe artificial intelligence for perception tasks remains a major challenge, partly due to the lack of data with high-quality labels. Annotations themselves are subject to aleatoric and epis...
- Finding Structure in Continual Learning : Abstract: Learning from a stream of tasks usually pits plasticity against stability: acquiring new knowledge often causes catastrophic forgetting of past information. Most methods address this by summ...
- Gradient Flow Through Diagram Expansions: Learning Regimes and Explicit Solutions : Abstract: We develop a general mathematical framework to analyze scaling regimes and derive explicit analytic solutions for gradient flow (GF) in large learning problems. Our key innovation is a forma...
- Forget to Generalize: Iterative Adaptation for Generalization in Federated Learning : Abstract: The Web is naturally heterogeneous with user devices, geographic regions, browsing patterns, and contexts all leading to highly diverse, unique datasets. Federated Learning (FL) is an import...
- Greedy-Gnorm: A Gradient Matrix Norm-Based Alternative to Attention Entropy for Head Pruning : Abstract: Attention head pruning has emerged as an effective technique for transformer model compression, an increasingly important goal in the era of Green AI. However, existing pruning methods often...
- Hand Gesture Recognition from Doppler Radar Signals Using Echo State Networks : Abstract: Hand gesture recognition (HGR) is a fundamental technology in human computer interaction (HCI). In particular, HGR based on Doppler radar signals is suited for in-vehicle interfaces and robot...
- MaMa: A Game-Theoretic Approach for Designing Safe Agentic Systems : Abstract: LLM-based multi-agent systems have demonstrated impressive capabilities, but they also introduce significant safety risks when individual agents fail or behave adversarially. In this work, w...
- Separation-Utility Pareto Frontier: An Information-Theoretic Characterization : Abstract: We study the Pareto frontier (optimal trade-off) between utility and separation, a fairness criterion requiring predictive independence from sensitive attributes conditional on the true outc...
- Theory of Speciation Transitions in Diffusion Models with General Class Structure : Abstract: Diffusion Models generate data by reversing a stochastic diffusion process, progressively transforming noise into structured samples drawn from a target distribution. Recent theoretical work...
- On the use of LLMs to generate a dataset of Neural Networks : Abstract: Neural networks are increasingly used to support decision-making. To verify their reliability and adaptability, researchers and practitioners have proposed a variety of tools and methods for...
- Reducing the labeling burden in time-series mapping using Common Ground: a semi-automated approach to tracking changes in land cover and species over time : Abstract: Reliable classification of Earth Observation data depends on consistent, up-to-date reference labels. However, collecting new labelled data at each time step remains expensive and logistical...
- Multi-scale hypergraph meets LLMs: Aligning large language models for time series analysis : Abstract: Recently, there has been great success in leveraging pre-trained large language models (LLMs) for time series analysis. The core idea lies in effectively aligning the modality between natura...
- EXaMCaP: Subset Selection with Entropy Gain Maximization for Probing Capability Gains of Large Chart Understanding Training Sets : Abstract: Recent works focus on synthesizing Chart Understanding (ChartU) training sets to inject advanced chart knowledge into Multimodal Large Language Models (MLLMs), where the sufficiency of the k...
- Mosaic Learning: A Framework for Decentralized Learning with Model Fragmentation : Abstract: Decentralized learning (DL) enables collaborative machine learning (ML) without a central server, making it suitable for settings where training data cannot be centrally hosted. We introduce...
- MirrorLA: Reflecting Feature Map for Vision Linear Attention : Abstract: Linear attention significantly reduces the computational complexity of Transformers from quadratic to linear, yet it consistently lags behind softmax-based attention in performance. We ident...
- RISE: Interactive Visual Diagnosis of Fairness in Machine Learning Models : Abstract: Evaluating fairness under domain shift is challenging because scalar metrics often obscure exactly where and how disparities arise. We introduce RISE (Residual Inspection through So...
- Convolution Operator Network for Forward and Inverse Problems (FI-Conv): Application to Plasma Turbulence Simulations : Abstract: We propose the Convolutional Operator Network for Forward and Inverse Problems (FI-Conv), a framework capable of predicting system evolution and estimating parameters in complex spatio-tempo...
- Multi-Integration of Labels across Categories for Component Identification (MILCCI) : Abstract: Many fields collect large-scale temporal data through repeated measurements (trials), where each trial is labeled with a set of metadata variables spanning several categories. For example, a...
- From Ambiguity to Action: A POMDP Perspective on Partial Multi-Label Ambiguity and Its Horizon-One Resolution : Abstract: In partial multi-label learning (PML), the true labels are unobserved, which makes label disambiguation important but difficult. A key challenge is that ambiguous candidate labels can propag...
- Training A Foundation Model to Represent Graphs as Vectors : Abstract: This paper aims to train a graph foundation model that is able to represent any graph as a vector preserving structural and semantic information useful for downstream graph-level tasks such ...
- Cascading Robustness Verification: Toward Efficient Model-Agnostic Certification : Abstract: Certifying neural network robustness against adversarial examples is challenging, as formal guarantees often require solving non-convex problems. Hence, incomplete verifiers are widely used ...
- From Sparse Sensors to Continuous Fields: STRIDE for Spatiotemporal Reconstruction : Abstract: Reconstructing high-dimensional spatiotemporal fields from sparse point-sensor measurements is a central challenge in learning parametric PDE dynamics. Existing approaches often struggle to ...
- LORE: Jointly Learning the Intrinsic Dimensionality and Relative Similarity Structure From Ordinal Data : Abstract: Learning the intrinsic dimensionality of subjective perceptual spaces such as taste, smell, or aesthetics from ordinal data is a challenging problem. We introduce LORE (Low Rank Ordinal Embe...
- Benchmarking Uncertainty Quantification of Plug-and-Play Diffusion Priors for Inverse Problems Solving : Abstract: Plug-and-play diffusion priors (PnPDP) have become a powerful paradigm for solving inverse problems in scientific and engineering domains. Yet, current evaluations of reconstruction quality ...
- BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models : Abstract: Large language model (LLM) inference is often bounded by memory footprint and memory bandwidth in resource-constrained deployments, making quantization a fundamental technique for efficient ...
- Training Data Efficiency in Multimodal Process Reward Models : Abstract: Multimodal Process Reward Models (MPRMs) are central to step-level supervision for visual reasoning in MLLMs. Training MPRMs typically requires large-scale Monte Carlo (MC)-annotated corpora...
- Generative Neural Operators through Diffusion Last Layer : Abstract: Neural operators have emerged as a powerful paradigm for learning discretization-invariant function-to-function mappings in scientific computing. However, many practical systems are inherent...
- Decoupling Time and Risk: Risk-Sensitive Reinforcement Learning with General Discounting : Abstract: Distributional reinforcement learning (RL) is a powerful framework increasingly adopted in safety-critical domains for its ability to optimize risk-sensitive objectives. However, the role of...
- Synthesizable Molecular Generation via Soft-constrained GFlowNets with Rich Chemical Priors : Abstract: The application of generative models for experimental drug discovery campaigns is severely limited by the difficulty of designing molecules de novo that can be synthesized in practice. Previ...
- Learning to Reason in 13 Parameters : Abstract: Recent research has shown that language models can learn to reason, often via reinforcement learning. Some work even trains low-rank parameterizations for reasoning, but conventiona...
- Turning mechanistic models into forecasters by using machine learning : Abstract: The equations of complex dynamical systems may not be identified by expert knowledge, especially if the underlying mechanisms are unknown. Data-driven discovery methods address this challeng...
- Rate-Optimal Noise Annealing in Semi-Dual Neural Optimal Transport: Tangential Identifiability, Off-Manifold Ambiguity, and Guaranteed Recovery : Abstract: Semi-dual neural optimal transport learns a transport map via a max-min objective, yet training can converge to incorrect or degenerate maps. We fully characterize these spurious solutions i...
- Supervised Learning as Lossy Compression: Characterizing Generalization and Sample Complexity via Finite Blocklength Analysis : Abstract: This paper presents a novel information-theoretic perspective on generalization in machine learning by framing the learning problem within the context of lossy compression and applying finit...
- Rethinking Perplexity: Revealing the Impact of Input Length on Perplexity Evaluation in LLMs : Abstract: Perplexity is a widely adopted metric for assessing the predictive quality of large language models (LLMs) and often serves as a reference metric for downstream evaluations. However, recent ...
- CoRe: Context-Robust Remasking for Diffusion Language Models : Abstract: Standard decoding in Masked Diffusion Models (MDMs) is hindered by context rigidity: tokens are retained based on transient high confidence, often ignoring that early predictions lack full c...
- Federated Concept-Based Models: Interpretable models with distributed supervision : Abstract: Concept-based models (CMs) enhance interpretability in deep learning by grounding predictions in human-understandable concepts. However, concept annotations are expensive to obtain and rarel...
- A Probabilistic Framework for Solving High-Frequency Helmholtz Equations via Diffusion Models : Abstract: Deterministic neural operators perform well on many PDEs but can struggle with the approximation of high-frequency wave phenomena, where strong input-to-output sensitivity makes operator lea...
- Stroke Lesions as a Rosetta Stone for Language Model Interpretability : Abstract: Large language models (LLMs) have achieved remarkable capabilities, yet methods to verify which model components are truly necessary for language function remain limited. Current interpretab...
- Agentic AI-Empowered Dynamic Survey Framework : Abstract: Survey papers play a central role in synthesizing and organizing scientific knowledge, yet they are increasingly strained by the rapid growth of research output. As new work continues to app...
- An Empirical Survey and Benchmark of Learned Distance Indexes for Road Networks : Abstract: The calculation of shortest-path distances in road networks is a core operation in navigation systems, location-based services, and spatial analytics. Although classical algorithms, e.g., Di...
- SEIS: Subspace-based Equivariance and Invariance Scores for Neural Representations : Abstract: Understanding how neural representations respond to geometric transformations is essential for evaluating whether learned features preserve meaningful spatial structure. Existing approaches ...
- Partition Trees: Conditional Density Estimation over General Outcome Spaces : Abstract: We propose Partition Trees, a tree-based framework for conditional density estimation over general outcome spaces, supporting both continuous and categorical variables within a unified formu...
- DADP: Domain Adaptive Diffusion Policy : Abstract: Learning domain adaptive policies that can generalize to unseen transition dynamics remains a fundamental challenge in learning-based control. Substantial progress has been made through dom...
- The Illusion of Generalization: Re-examining Tabular Language Model Evaluation : Abstract: Tabular Language Models (TLMs) have been claimed to achieve emergent generalization for tabular prediction. We conduct a systematic re-evaluation of Tabula-8B as a representative TLM, utiliz...
- A Consensus-Bayesian Framework for Detecting Malicious Activity in Enterprise Directory Access Graphs : Abstract: This work presents a consensus-based Bayesian framework to detect malicious user behavior in enterprise directory access graphs. By modeling directories as topics and users as agents within ...
- Group Contrastive Learning for Weakly Paired Multimodal Data : Abstract: We present GROOVE, a semi-supervised multi-modal representation learning approach for high-content perturbation data where samples across modalities are weakly paired through shared perturba...
- eCP: Informative uncertainty quantification via Equivariantized Conformal Prediction with pre-trained models : Abstract: We study the effect of group symmetrization of pre-trained models on conformal prediction (CP), a post-hoc, distribution-free, finite-sample method of uncertainty quantification that offers ...
- Non-linear PCA via Evolution Strategies: a Novel Objective Function : Abstract: Principal Component Analysis (PCA) is a powerful and popular dimensionality reduction technique. However, due to its linear nature, it often fails to capture the complex underlying structure...
- Child Mortality Prediction in Bangladesh: A Decade-Long Validation Study : Abstract: The predictive machine learning models for child mortality tend to be inaccurate when applied to future populations, since they suffer from look-ahead bias due to the randomization used in c...
- Representation Geometry as a Diagnostic for Out-of-Distribution Robustness : Abstract: Robust generalization under distribution shift remains difficult to monitor and optimize in the absence of target-domain labels, as models with similar in-distribution accuracy can exhibit m...
- Grables: Tabular Learning Beyond Independent Rows : Abstract: Tabular learning is still dominated by row-wise predictors that score each row independently, which fits i.i.d. benchmarks but fails on transactional, temporal, and relational tables where l...
- Autonomous AI Agents for Real-Time Affordable Housing Site Selection: Multi-Objective Reinforcement Learning Under Regulatory Constraints : Abstract: Affordable housing shortages affect billions, while land scarcity and regulations make site selection slow. We present AURA (Autonomous Urban Resource Allocator), a hierarchical multi-agent ...
- Online Vector Quantized Attention : Abstract: Standard sequence mixing layers used in language models struggle to balance efficiency and performance. Self-attention performs well on long context tasks but has expensive quadratic compute...
- Causal Discovery for Cross-Sectional Data Based on Super-Structure and Divide-and-Conquer : Abstract: This paper tackles a critical bottleneck in Super-Structure-based divide-and-conquer causal discovery: the high computational cost of constructing accurate Super-Structures--particularly whe...
- Echo State Networks for Time Series Forecasting: Hyperparameter Sweep and Benchmarking : Abstract: This paper investigates the forecasting performance of Echo State Networks (ESNs) for univariate time series forecasting using a subset of the M4 Forecasting Competition dataset. Focusing on...
- The Role of Target Update Frequencies in Q-Learning : Abstract: The target network update frequency (TUF) is a central stabilization mechanism in (deep) Q-learning. However, its selection remains poorly understood and is often treated merely as another...
- NeuroPareto: Calibrated Acquisition for Costly Many-Goal Search in Vast Parameter Spaces : Abstract: The pursuit of optimal trade-offs in high-dimensional search spaces under stringent computational constraints poses a fundamental challenge for contemporary multi-objective optimization. We ...
- "I'm happy even though it's not real": GenAI Photo Editing as a Remembering Experience : Abstract: Generative Artificial Intelligence (GenAI) is increasingly integrated into photo applications on personal devices, making editing photographs easier than ever while potentially influencing t...
- ProphetKV: User-Query-Driven Selective Recomputation for Efficient KV Cache Reuse in Retrieval-Augmented Generation : Abstract: The prefill stage of long-context Retrieval-Augmented Generation (RAG) is severely bottlenecked by computational overhead. To mitigate this, recent methods assemble pre-calculated KV caches ...
- Zenith: Scaling up Ranking Models for Billion-scale Livestreaming Recommendation : Abstract: Accurately capturing feature interactions is essential in recommender systems, and recent trends show that scaling up model capacity could be a key driver for next-level predictive performan...
- CLEAR-Mamba: Towards Accurate, Adaptive and Trustworthy Multi-Sequence Ophthalmic Angiography Classification : Abstract: Medical image classification is a core task in computer-aided diagnosis (CAD), playing a pivotal role in early disease detection, treatment planning, and patient prognosis assessment. In oph...
- Stingy Context: 18:1 Hierarchical Code Compression for LLM Auto-Coding : Abstract: We introduce Stingy Context, a hierarchical tree-based compression scheme achieving 18:1 reduction in LLM context for auto-coding tasks. Using our TREEFRAG exploit decomposition, we reduce a...
- Beyond speculation: Measuring the growing presence of LLM-generated texts in multilingual disinformation : Abstract: Increased sophistication of large language models (LLMs) and the consequent quality of generated multilingual text raises concerns about potential disinformation misuse. While humans struggl...
- Learning to Explore with Lagrangians for Bandits under Unknown Linear Constraints : Abstract: Pure exploration in bandits formalises multiple real-world problems, such as tuning hyper-parameters or conducting user studies to test a set of items, where different safety, resource, and ...
- Deep Multimodal Learning with Missing Modality: A Survey : Abstract: During multimodal model training and testing, certain data modalities may be absent due to sensor limitations, cost constraints, privacy concerns, or data loss, negatively affecting performa...
- Multi-Excitation Projective Simulation with a Many-Body Physics Inspired Inductive Bias : Abstract: With the impressive progress of deep learning, applications relying on machine learning are increasingly being integrated into daily life. However, most deep learning models have an opaque, ...
- Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic : Abstract: Recent work has explored optimizing LLM collaboration through Multi-Agent Reinforcement Learning (MARL). However, most MARL fine-tuning approaches rely on predefined execution protocols, whi...
- MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning : Abstract: Long-horizon agentic reasoning necessitates effectively compressing growing interaction histories into a limited context window. Most existing memory systems serialize history as text, where...
- Concise Geometric Description as a Bridge: Unleashing the Potential of LLM for Plane Geometry Problem Solving : Abstract: Plane Geometry Problem Solving (PGPS) is a multimodal reasoning task that aims to solve a plane geometric problem based on a geometric diagram and problem textual descriptions. Although Larg...
- DEEPMED: Building a Medical DeepResearch Agent via Multi-hop Med-Search Data and Turn-Controlled Agentic Training & Inference : Abstract: Medical reasoning models remain constrained by parametric knowledge and are thus susceptible to forgetting and hallucinations. DeepResearch (DR) models ground outputs in verifiable evidence ...
- EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines : Abstract: While LLM-based agents have shown promise for deep research, most existing approaches rely on fixed workflows that struggle to adapt to real-world, open-ended queries. Recent work therefore ...
- CastMind: An Interaction-Driven Agentic Reasoning Framework for Cognition-Inspired Time Series Forecasting : Abstract: Time series forecasting plays a crucial role in decision-making across many real-world applications. Despite substantial progress, most existing methods still treat forecasting as a static, ...
- Protein Autoregressive Modeling via Multiscale Structure Generation : Abstract: We present protein autoregressive modeling (PAR), the first multi-scale autoregressive framework for protein backbone generation via coarse-to-fine next-scale prediction. Using the hierarchi...
- Contrastive Continual Learning for Model Adaptability in Internet of Things : Abstract: Internet of Things (IoT) deployments operate in nonstationary, dynamic environments where factors such as sensor drift, evolving user behavior, and heterogeneous user privacy requirements ca...
- Rethinking the Trust Region in LLM Reinforcement Learning : Abstract: Reinforcement learning (RL) has become a cornerstone for fine-tuning Large Language Models (LLMs), with Proximal Policy Optimization (PPO) serving as the de facto standard algorithm. Despite...
- Multi-layer Cross-Attention is Provably Optimal for Multi-modal In-context Learning : Abstract: Recent progress has rapidly advanced our understanding of the mechanisms underlying in-context learning in modern attention-based neural networks. However, existing results focus exclusively...
- CRoSS: A Continual Robotic Simulation Suite for Scalable Reinforcement Learning with High Task Diversity and Realistic Physics Simulation : Abstract: Continual reinforcement learning (CRL) requires agents to learn from a sequence of tasks without forgetting previously acquired policies. In this work, we introduce a novel benchmark suite f...
- Subliminal Effects in Your Data: A General Mechanism via Log-Linearity : Abstract: Training modern large language models (LLMs) has become a veritable smorgasbord of algorithms and datasets designed to elicit particular behaviors, making it critical to develop techniques t...
- From Evaluation to Design: Using Potential Energy Surface Smoothness Metrics to Guide Machine Learning Interatomic Potential Architectures : Abstract: Machine Learning Interatomic Potentials (MLIPs) sometimes fail to reproduce the physical smoothness of the quantum potential energy surface (PES), leading to erroneous behavior in downstream...
- El Agente Quntur: A research collaborator agent for quantum chemistry : Abstract: Quantum chemistry is a foundational enabling tool for the fields of chemistry, materials science, computational biology and others. Despite its power, the practical application of quantum...
- El Agente Estructural: An Artificially Intelligent Molecular Editor : Abstract: We present El Agente Estructural, a multimodal, natural-language-driven geometry-generation and manipulation agent for autonomous chemistry and molecular modelling. Unlike molecular generati...
- It's not a Lottery, it's a Race: Understanding How Gradient Descent Adapts the Network's Capacity to the Task : Abstract: Our theoretical understanding of neural networks is lagging behind their empirical success. One of the important unexplained phenomena is why and how, during the process of training with gra...
- Safe Urban Traffic Control via Uncertainty-Aware Conformal Prediction and World-Model Reinforcement Learning : Abstract: Urban traffic management demands systems that simultaneously predict future conditions, detect anomalies, and take safe corrective actions -- all while providing reliability guarantees. We p...
- Toward Reliable and Explainable Nail Disease Classification: Leveraging Adversarial Training and Grad-CAM Visualization : Abstract: Human nail diseases are gradually observed over all age groups, especially among older individuals, often going ignored until they become severe. Early detection and accurate diagnosis of su...
- SE-Bench: Benchmarking Self-Evolution with Knowledge Internalization : Abstract: True self-evolution requires agents to act as lifelong learners that internalize novel experiences to solve future problems. However, rigorously measuring this foundational capability is hin...
- Beyond Rewards in Reinforcement Learning for Cyber Defence : Abstract: Recent years have seen an explosion of interest in autonomous cyber defence agents trained to defend computer networks using deep reinforcement learning. These agents are typically trained i...
- Skin Tokens: A Learned Compact Representation for Unified Autoregressive Rigging : Abstract: The rapid proliferation of generative 3D models has created a critical bottleneck in animation pipelines: rigging. Existing automated methods are fundamentally limited by their approach to s...
- Team, Then Trim: An Assembly-Line LLM Framework for High-Quality Tabular Data Generation : Abstract: While tabular data is fundamental to many real-world machine learning (ML) applications, acquiring high-quality tabular data is usually labor-intensive and expensive. Limited by the scarcity...
- Billion-Scale Graph Foundation Models : Abstract: Graph-structured data underpins many critical applications. While foundation models have transformed language and vision via large-scale pretraining and lightweight adaptation, extending thi...
- Active Asymmetric Multi-Agent Multimodal Learning under Uncertainty : Abstract: Multi-agent systems are increasingly equipped with heterogeneous multimodal sensors, enabling richer perception but introducing modality-specific and agent-dependent uncertainty. Existing mu...
- When Silence Is Golden: Can LLMs Learn to Abstain in Temporal QA and Beyond? : Abstract: Large language models (LLMs) rarely admit uncertainty, often producing fluent but misleading answers, rather than abstaining (i.e., refusing to answer). This weakness is even evident in temp...
- Comparative Insights on Adversarial Machine Learning from Industry and Academia: A User-Study Approach : Abstract: An exponential growth of Machine Learning and its Generative AI applications brings with it significant security challenges, often referred to as Adversarial Machine Learning (AML). In this ...
- Exploiting contextual information to improve stance detection in informal political discourse with LLMs : Abstract: This study investigates the use of Large Language Models (LLMs) for political stance detection in informal online discourse, where language is often sarcastic, ambiguous, and context-depende...
- Alignment Drift in Multimodal LLMs: A Two-Phase, Longitudinal Evaluation of Harm Across Eight Model Releases : Abstract: Multimodal large language models (MLLMs) are increasingly deployed in real-world systems, yet their safety under adversarial prompting remains underexplored. We present a two-phase evaluatio...
- From Data to Behavior: Predicting Unintended Model Behaviors Before Training : Abstract: Large Language Models (LLMs) can acquire unintended biases from seemingly benign training data even without explicit cues or malicious content. Existing methods struggle to detect such risks...
- Supporting software engineering tasks with agentic AI: Demonstration on document retrieval and test scenario generation : Abstract: The introduction of large language models ignited great retooling and rethinking of the software development models. The ensuing response of software engineering research yielded a massive b...
- Identifying Intervenable and Interpretable Features via Orthogonality Regularization : Abstract: With recent progress on fine-tuning language models around a fixed sparse autoencoder, we disentangle the decoder matrix into almost orthogonal features. This reduces interference and superp...
- Adaptive Prompt Elicitation for Text-to-Image Generation : Abstract: Aligning text-to-image generation with user intent remains challenging for users who provide ambiguous inputs and struggle with model idiosyncrasies. We propose Adaptive Prompt Elicitation ...
- SAR-RAG: ATR Visual Question Answering by Semantic Search, Retrieval, and MLLM Generation : Abstract: We present a visual-context image retrieval-augmented generation (ImageRAG) assisted AI agent for automatic target recognition (ATR) of synthetic aperture radar (SAR). SAR is a remote sensin...
- Addressing Corpus Knowledge Poisoning Attacks on RAG Using Sparse Attention : Abstract: Retrieval Augmented Generation (RAG) is a highly effective paradigm for keeping LLM-based responses up-to-date and reducing the likelihood of hallucinations. Yet, RAG was recently shown to b...
- DRMOT: A Dataset and Framework for RGBD Referring Multi-Object Tracking : Abstract: Referring Multi-Object Tracking (RMOT) aims to track specific targets based on language descriptions and is vital for interactive AI systems such as robotics and autonomous driving. However,...
- Audio ControlNet for Fine-Grained Audio Generation and Editing : Abstract: We study the fine-grained text-to-audio (T2A) generation task. While recent models can synthesize high-quality audio from text descriptions, they often lack precise control over attributes s...
- Let Experts Feel Uncertainty: A Multi-Expert Label Distribution Approach to Probabilistic Time Series Forecasting : Abstract: Time series forecasting in real-world applications requires both high predictive accuracy and interpretable uncertainty quantification. Traditional point prediction methods often fail to cap...
- Overstating Attitudes, Ignoring Networks: LLM Biases in Simulating Misinformation Susceptibility : Abstract: Large language models (LLMs) are increasingly used as proxies for human judgment in computational social science, yet their ability to reproduce patterns of susceptibility to misinformation ...
- Delving into Muon and Beyond: Deep Analysis and Extensions : Abstract: The Muon optimizer has recently attracted considerable attention for its strong empirical performance and use of orthogonalized updates on matrix-shaped parameters, yet its underlying mechan...
- Rethinking the Design Space of Reinforcement Learning for Diffusion Models: On the Importance of Likelihood Estimation Beyond Loss Design : Abstract: Reinforcement learning has been widely applied to diffusion and flow models for visual tasks such as text-to-image generation. However, these tasks remain challenging because diffusion model...
- Towards Structured, State-Aware, and Execution-Grounded Reasoning for Software Engineering Agents : Abstract: Software Engineering (SE) agents have shown promising abilities in supporting various SE tasks. Current SE agents remain fundamentally reactive, making decisions mainly based on conversation...
- A Human-Centered Privacy Approach (HCP) to AI : Abstract: As the paradigm of Human-Centered AI (HCAI) gains prominence, its benefits to society are accompanied by significant ethical concerns, one of which is the protection of individual privacy. T...
- RexBERT: Context Specialized Bidirectional Encoders for E-commerce : Abstract: Encoder-only transformers remain indispensable in retrieval, classification, and ranking systems where latency, stability, and cost are paramount. Most general purpose encoders, however, are...
- VILLAIN at AVerImaTeC: Verifying Image-Text Claims via Multi-Agent Collaboration : Abstract: This paper describes VILLAIN, a multimodal fact-checking system that verifies image-text claims through prompt-based multi-agent collaboration. For the AVerImaTeC shared task, VILLAIN employ...
- Trust The Typical : Abstract: Current approaches to LLM safety fundamentally rely on a brittle cat-and-mouse game of identifying and blocking known threats via guardrails. We argue for a fresh approach: robust safety com...
- Dual Mind World Model Inspired Network Digital Twin for Access Scheduling : Abstract: Emerging networked systems such as industrial IoT and real-time cyber-physical infrastructures demand intelligent scheduling strategies capable of adapting to dynamic traffic, deadlines, and...
- OmniRad: A Radiological Foundation Model for Multi-Task Medical Image Analysis : Abstract: Radiological analysis increasingly benefits from pretrained visual representations that can support heterogeneous downstream tasks across imaging modalities. In this work, we introduce OmniR...
- Continual Learning through Control Minimization : Abstract: Catastrophic forgetting remains a fundamental challenge for neural networks when tasks are trained sequentially. In this work, we reformulate continual learning as a control problem where le...
- LycheeDecode: Accelerating Long-Context LLM Inference via Hybrid-Head Sparse Decoding : Abstract: The proliferation of long-context large language models (LLMs) exposes a key bottleneck: the rapidly expanding key-value cache during decoding, which imposes heavy memory and latency costs. ...
- SLUM-i: Semi-supervised Learning for Urban Mapping of Informal Settlements and Data Quality Benchmarking : Abstract: Rapid urban expansion has fueled the growth of informal settlements in major cities of low- and middle-income countries, with Lahore and Karachi in Pakistan and Mumbai in India serving as pr...
- Learning the Value Systems of Agents with Preference-based and Inverse Reinforcement Learning : Abstract: Agreement Technologies refer to open computer systems in which autonomous software agents interact with one another, typically on behalf of humans, in order to come to mutually acceptable ag...
- BrainVista: Modeling Naturalistic Brain Dynamics as Multimodal Next-Token Prediction : Abstract: Naturalistic fMRI characterizes the brain as a dynamic predictive engine driven by continuous sensory streams. However, modeling the causal forward evolution in realistic neural simulation i...
- Discovering Mechanistic Models of Neural Activity: System Identification in an in Silico Zebrafish : Abstract: Constructing mechanistic models of neural circuits is a fundamental goal of neuroscience, yet verifying such models is limited by the lack of ground truth. To rigorously test model discovery...
- LLM-Empowered Cooperative Content Caching in Vehicular Fog Caching-Assisted Platoon Networks : Abstract: This letter proposes a novel three-tier content caching architecture for Vehicular Fog Caching (VFC)-assisted platoon, where the VFC is formed by the vehicles driving near the platoon. The s...
- Is Micro Domain-Adaptive Pre-Training Effective for Real-World Operations? Multi-Step Evaluation Reveals Potential and Bottlenecks : Abstract: When applying LLMs to real-world enterprise operations, LLMs need to handle proprietary knowledge in small domains of specific operations (micro domains). A previous study shows m...
- Growth First, Care Second? Tracing the Landscape of LLM Value Preferences in Everyday Dilemmas : Abstract: People increasingly seek advice online from both human peers and large language model (LLM)-based chatbots. Such advice rarely involves identifying a single correct answer; instead, it typic...
- RASA: Routing-Aware Safety Alignment for Mixture-of-Experts Models : Abstract: Mixture-of-Experts (MoE) language models introduce unique challenges for safety alignment due to their sparse routing mechanisms, which can enable degenerate optimization behaviors under sta...
- Mixture of Masters: Sparse Chess Language Models with Player Routing : Abstract: Modern chess language models are dense transformers trained on millions of games played by thousands of high-rated individuals. However, these monolithic networks tend to collapse into mode-...
- No One-Size-Fits-All: Building Systems For Translation to Bashkir, Kazakh, Kyrgyz, Tatar and Chuvash Using Synthetic And Original Data : Abstract: We explore machine translation for five Turkic language pairs: Russian-Bashkir, Russian-Kazakh, Russian-Kyrgyz, English-Tatar, English-Chuvash. Fine-tuning nllb-200-distilled-600M with LoRA ...
- SPEAR: An Engineering Case Study of Multi-Agent Coordination for Smart Contract Auditing : Abstract: We present SPEAR, a multi-agent coordination framework for smart contract auditing that applies established MAS patterns in a realistic security analysis workflow. SPEAR models auditing as a...
- EMA Policy Gradient: Taming Reinforcement Learning for LLMs with EMA Anchor and Top-k KL : Abstract: Reinforcement Learning (RL) has enabled Large Language Models (LLMs) to acquire increasingly complex reasoning and agentic behaviors. In this work, we propose two simple techniques to improv...
- Med-MMFL: A Multimodal Federated Learning Benchmark in Healthcare : Abstract: Federated learning (FL) enables collaborative model training across decentralized medical institutions while preserving data privacy. However, medical FL benchmarks remain scarce, with exist...
- History-Guided Iterative Visual Reasoning with Self-Correction : Abstract: Self-consistency methods are the core technique for improving the reasoning reliability of multimodal large language models (MLLMs). By generating multiple reasoning results through repeated...
- Performative Learning Theory : Abstract: Performative predictions influence the very outcomes they aim to forecast. We study performative predictions that affect a sample (e.g., only existing users of an app) and/or the whole popul...
- Bi-directional Bias Attribution: Debiasing Large Language Models without Modifying Prompts : Abstract: Large language models (LLMs) have demonstrated impressive capabilities across a wide range of natural language processing tasks. However, their outputs often exhibit social biases, raising f...
- LoRDO: Distributed Low-Rank Optimization with Infrequent Communication : Abstract: Distributed training of foundation models via DDP is limited by interconnect bandwidth. While infrequent communication strategies reduce synchronization frequency, they remain bot...
- Blockchain Federated Learning for Sustainable Retail: Reducing Waste through Collaborative Demand Forecasting : Abstract: Effective demand forecasting is crucial for reducing food waste. However, data privacy concerns often hinder collaboration among retailers, limiting the potential for improved predictive acc...
- Enabling Real-Time Colonoscopic Polyp Segmentation on Commodity CPUs via Ultra-Lightweight Architecture : Abstract: Early detection of colorectal cancer hinges on real-time, accurate polyp identification and resection. Yet current high-precision segmentation models rely on GPUs, making them impractical to...
- Beyond KL Divergence: Policy Optimization with Flexible Bregman Divergences for LLM Reasoning : Abstract: Policy optimization methods like Group Relative Policy Optimization (GRPO) and its variants have achieved strong results on mathematical reasoning and code generation tasks. Despite extensiv...
- SparVAR: Exploring Sparsity in Visual AutoRegressive Modeling for Training-Free Acceleration : Abstract: Visual AutoRegressive (VAR) modeling has garnered significant attention for its innovative next-scale prediction paradigm. However, mainstream VAR paradigms attend to all tokens across histo...
- Counterfactual Explanations for Hypergraph Neural Networks : Abstract: Hypergraph neural networks (HGNNs) effectively model higher-order interactions in many real-world systems but remain difficult to interpret, limiting their deployment in high-stakes settings...
- VecSet-Edit: Unleashing Pre-trained LRM for Mesh Editing from Single Image : Abstract: 3D editing has emerged as a critical research area to provide users with flexible control over 3D assets. While current editing approaches predominantly focus on 3D Gaussian Splatting or mul...
- UnMaskFork: Test-Time Scaling for Masked Diffusion via Deterministic Action Branching : Abstract: Test-time scaling strategies have effectively leveraged inference-time compute to enhance the reasoning abilities of Autoregressive Large Language Models. In this work, we demonstrate that M...
- Explicit Uncertainty Modeling for Active CLIP Adaptation with Dual Prompt Tuning : Abstract: Pre-trained vision-language models such as CLIP exhibit strong transferability, yet adapting them to downstream image classification tasks under limited annotation budgets remains challengin...
- Fine-tuning Pre-trained Vision-Language Models in a Human-Annotation-Free Manner : Abstract: Large-scale vision-language models (VLMs) such as CLIP exhibit strong zero-shot generalization, but adapting them to downstream tasks typically requires costly labeled data. Existing unsuper...
- Efficient Equivariant High-Order Crystal Tensor Prediction via Cartesian Local-Environment Many-Body Coupling : Abstract: End-to-end prediction of high-order crystal tensor properties from atomic structures remains challenging: while spherical-harmonic equivariant models are expressive, their Clebsch-Gordan ten...
- DeFrame: Debiasing Large Language Models Against Framing Effects : Abstract: As large language models (LLMs) are increasingly deployed in real-world applications, ensuring their fair responses across demographics has become crucial. Despite many efforts, an ongoing c...
- Beyond Static Cropping: Layer-Adaptive Visual Localization and Decoding Enhancement : Abstract: Large Vision-Language Models (LVLMs) have advanced rapidly by aligning visual patches with the text embedding space, but a fixed visual-token budget forces images to be resized to a uniform ...
- Revisiting Prompt Sensitivity in Large Language Models for Text Classification: The Role of Prompt Underspecification : Abstract: Large language models (LLMs) are widely used as zero-shot and few-shot classifiers, where task behaviour is largely controlled through prompting. A growing number of works have observed that...
- ProxyWar: Dynamic Assessment of LLM Code Generation in Game Arenas : Abstract: Large language models (LLMs) have revolutionized automated code generation, yet the evaluation of their real-world effectiveness remains limited by static benchmarks and simplistic metrics. ...
- How Few-shot Demonstrations Affect Prompt-based Defenses Against LLM Jailbreak Attacks : Abstract: Large Language Models (LLMs) face increasing threats from jailbreak attacks that bypass safety alignment. While prompt-based defenses such as Role-Oriented Prompts (RoP) and Task-Oriented Pr...
- Disentangling Causal Importance from Emergent Structure in Multi-Expert Orchestration : Abstract: Multi-expert systems, where multiple Large Language Models (LLMs) collaborate to solve complex tasks, are increasingly adopted for high-performance reasoning and generation. However, the orc...
- Contextual Drag: How Errors in the Context Affect LLM Reasoning : Abstract: Central to many self-improvement pipelines for large language models (LLMs) is the assumption that models can improve by reflecting on past mistakes. We study a phenomenon termed contextual ...
- Multi Objective Design Optimization of Non Pneumatic Passenger Car Tires Using Finite Element Modeling, Machine Learning, and Particle swarm Optimization and Bayesian Optimization Algorithms : Abstract: Non Pneumatic tires offer a promising alternative to pneumatic tires. However, their discontinuous spoke structures present challenges in stiffness tuning, durability, and high speed vibrati...
- SkeletonGaussian: Editable 4D Generation through Gaussian Skeletonization : Abstract: 4D generation has made remarkable progress in synthesizing dynamic 3D objects from input text, images, or videos. However, existing methods often represent motion as an implicit deformation ...
- Thickening-to-Thinning: Reward Shaping via Human-Inspired Learning Dynamics for LLM Reasoning : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising paradigm for enhancing reasoning in Large Language Models (LLMs). However, it frequently encounters challenge...
- From Dead Neurons to Deep Approximators: Deep Bernstein Networks as a Provable Alternative to Residual Layers : Abstract: Residual connections are the de facto standard for mitigating vanishing gradients, yet they impose structural constraints and fail to address the inherent inefficiencies of piecewise linear ...
- AppleVLM: End-to-end Autonomous Driving with Advanced Perception and Planning-Enhanced Vision-Language Models : Abstract: End-to-end autonomous driving has emerged as a promising paradigm integrating perception, decision-making, and control within a unified learning framework. Recently, Vision-Language Models (...
- ACIL: Active Class Incremental Learning for Image Classification : Abstract: Continual learning (or class incremental learning) is a realistic learning scenario for computer vision systems, where deep neural networks are trained on episodic data, and the data from pr...
- RAPO: Risk-Aware Preference Optimization for Generalizable Safe Reasoning : Abstract: Large Reasoning Models (LRMs) have achieved tremendous success with their chain-of-thought (CoT) reasoning, yet also face safety issues similar to those of basic language models. In particul...
- OAT: Ordered Action Tokenization : Abstract: Autoregressive policies offer a compelling foundation for scalable robot learning by enabling discrete abstraction, token-level reasoning, and flexible inference. However, applying autoregre...
- Language Models Struggle to Use Representations Learned In-Context : Abstract: Though large language models (LLMs) have enabled great success across a wide variety of tasks, they still appear to fall short of one of the loftier goals of artificial intelligence research...
- SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for Vision-Language-Action Models : Abstract: Vision-Language-Action (VLA) models have emerged as a promising paradigm for general-purpose robotic control, with test-time scaling (TTS) gaining attention to enhance robustness beyond trai...
- Enforcing Monotonic Progress in Legal Cross-Examination: Preventing Long-Horizon Stagnation in LLM-Based Inquiry : Abstract: Large language models (LLMs) exhibit impressive linguistic fluency but struggle to reliably complete long-horizon tasks under explicit procedural constraints. In legal cross-examination, pur...
- From Helpfulness to Toxic Proactivity: Diagnosing Behavioral Misalignment in LLM Agents : Abstract: The enhanced capabilities of LLM-based agents come with an emergency for model planning and tool-use abilities. Attributing to helpful-harmless trade-off from LLM alignment, agents typically...
- Natural Language Instructions for Scene-Responsive Human-in-the-Loop Motion Planning in Autonomous Driving using Vision-Language-Action Models : Abstract: Instruction-grounded driving, where passenger language guides trajectory planning, requires vehicles to understand intent before motion. However, most prior instruction-following planners re...
- HoloEv-Net: Efficient Event-based Action Recognition via Holographic Spatial Embedding and Global Spectral Gating : Abstract: Event-based Action Recognition (EAR) has attracted significant attention due to the high temporal resolution and high dynamic range of event cameras. However, existing methods typically suff...
- Topology-Aware Revival for Efficient Sparse Training : Abstract: Static sparse training is a promising route to efficient learning by committing to a fixed mask pattern, yet the constrained structure reduces robustness. Early pruning decisions can lock th...
- Improving 2D Diffusion Models for 3D Medical Imaging with Inter-Slice Consistent Stochasticity : Abstract: 3D medical imaging is in high demand and essential for clinical diagnosis and scientific research. Currently, diffusion models (DMs) have become an effective tool for medical imaging reconst...
- Pruning for Generalization: A Transfer-Oriented Spatiotemporal Graph Framework : Abstract: Multivariate time series forecasting in graph-structured domains is critical for real-world applications, yet existing spatiotemporal models often suffer from performance degradation under d...
- MA3DSG: Multi-Agent 3D Scene Graph Generation for Large-Scale Indoor Environments : Abstract: Current 3D scene graph generation (3DSGG) approaches heavily rely on a single-agent assumption and small-scale environments, exhibiting limited scalability to real-world scenarios. In this w...
- JSynFlow: Japanese Synthesised Flowchart Visual Question Answering Dataset built with Large Language Models : Abstract: Vision and language models (VLMs) are expected to analyse complex documents, such as those containing flowcharts, through a question-answering (QA) interface. The ability to recognise and in...
- KGLAMP: Knowledge Graph-guided Language model for Adaptive Multi-robot Planning and Replanning : Abstract: Heterogeneous multi-robot systems are increasingly deployed in long-horizon missions that require coordination among robots with diverse capabilities. However, existing planning approaches s...
- From Lemmas to Dependencies: What Signals Drive Light Verbs Classification? : Abstract: Light verb constructions (LVCs) are a challenging class of verbal multiword expressions, especially in Turkish, where rich morphology and productive complex predicates create minimal contras...
- Scalable Explainability-as-a-Service (XaaS) for Edge AI Systems : Abstract: Though Explainable AI (XAI) has made significant advancements, its inclusion in edge and IoT systems is typically ad-hoc and inefficient. Most current methods are "coupled" in such a way tha...
- Toward Effective Multimodal Graph Foundation Model: A Divide-and-Conquer Based Approach : Abstract: Graph Foundation Models (GFMs) have achieved remarkable success in generalizing across diverse domains. However, they mainly focus on Text-Attributed Graphs (TAGs), leaving Multimodal-Attrib...
- Tinker Tales: Supporting Child-AI Collaboration through Co-Creative Storytelling with Educational Scaffolding : Abstract: Artificial intelligence (AI) is increasingly framed as a collaborative partner in creative activities, yet children's interactions with AI have largely been studied in AI-led instructional s...
- DMS2F-HAD: A Dual-branch Mamba-based Spatial-Spectral Fusion Network for Hyperspectral Anomaly Detection : Abstract: Hyperspectral anomaly detection (HAD) aims to identify rare and irregular targets in high-dimensional hyperspectral images (HSIs), which are often noisy and unlabelled data. Existing deep le...
- A computational account of dreaming: learning and memory consolidation : Abstract: A number of studies have concluded that dreaming is mostly caused by randomly arriving internal signals because "dream contents are random impulses", and argued that dream sleep is unlikely ...
- Structure-Informed Estimation for Pilot-Limited MIMO Channels via Tensor Decomposition : Abstract: Channel estimation in wideband multiple-input multiple-output (MIMO) systems faces fundamental pilot overhead limitations in high-dimensional beyond-5G and sixth-generation (6G) scenarios. T...
- Principles of Lipschitz continuity in neural networks : Abstract: Deep learning has achieved remarkable success across a wide range of domains, significantly expanding the frontiers of what is achievable in artificial intelligence. Yet, despite these advan...
- On the Credibility of Evaluating LLMs using Survey Questions : Abstract: Recent studies evaluate the value orientation of large language models (LLMs) using adapted social surveys, typically by prompting models with survey questions and comparing their responses ...
- PluRel: Synthetic Data unlocks Scaling Laws for Relational Foundation Models : Abstract: Relational Foundation Models (RFMs) facilitate data-driven decision-making by learning from complex multi-table databases. However, the diverse relational databases needed to train such mode...
- Understanding and Guiding Layer Placement in Parameter-Efficient Fine-Tuning of Large Language Models : Abstract: As large language models (LLMs) continue to grow, the cost of full-parameter fine-tuning has made parameter-efficient fine-tuning (PEFT) the default strategy for downstream adaptation. Const...
- PromptSplit: Revealing Prompt-Level Disagreement in Generative Models : Abstract: Prompt-guided generative AI models have rapidly expanded across vision and language domains, producing realistic and diverse outputs from textual inputs. The growing variety of such models, ...
- Rational ANOVA Networks : Abstract: Deep neural networks typically treat nonlinearities as fixed primitives (e.g., ReLU), limiting both interpretability and the granularity of control over the induced function class. While rec...
- When Chains of Thought Don't Matter: Causal Bypass in Large Language Models : Abstract: Chain-of-thought (CoT) prompting is widely assumed to expose a model's reasoning process and improve transparency. We attempted to enforce this assumption by penalizing unfaithful reasoning,...
- DeXposure-FM: A Time-series, Graph Foundation Model for Credit Exposures and Stability on Decentralized Financial Networks : Abstract: Credit exposure in Decentralized Finance (DeFi) is often implicit and token-mediated, creating a dense web of inter-protocol dependencies. Thus, a shock to one token may result in significan...
- Transformers perform adaptive partial pooling : Abstract: Because language is creative, any reasonable language model must generalize, deciding what to say in novel contexts by using information from similar contexts. But what about contexts that a...
- Fixed Budget is No Harder Than Fixed Confidence in Best-Arm Identification up to Logarithmic Factors : Abstract: The best-arm identification (BAI) problem is one of the most fundamental problems in interactive machine learning, which has two flavors: the fixed-budget setting (FB) and the fixed-confiden...
- Structural shifts in institutional participation and collaboration within the AI arXiv preprint research ecosystem : Abstract: The emergence of large language models (LLMs) represents a significant technological shift within the scientific ecosystem, particularly within the field of artificial intelligence (AI). Thi...
- Semantic Rate Distortion and Posterior Design: Compute Constraints, Multimodality, and Strategic Inference : Abstract: We study strategic Gaussian semantic compression under rate and compute constraints, where an encoder and decoder optimize distinct quadratic objectives. A latent Gaussian state generates a ...
- Linguistic Blind Spots in Clinical Decision Extraction : Abstract: Extracting medical decisions from clinical notes is a key step for clinical decision support and patient-facing care summaries. We study how the linguistic characteristics of clinical decisi...
- First-Principles AI finds crystallization of fractional quantum Hall liquids : Abstract: When does a fractional quantum Hall (FQH) liquid crystallize? Addressing this question requires a framework that treats fractionalization and crystallization on equal footing, especially in ...
- WIND: Weather Inverse Diffusion for Zero-Shot Atmospheric Modeling : Abstract: Deep learning has revolutionized weather and climate modeling, yet the current landscape remains fragmented: highly specialized models are typically trained individually for distinct tasks. ...
- SpecMD: A Comprehensive Study On Speculative Expert Prefetching : Abstract: Mixture-of-Experts (MoE) models enable sparse expert activation, meaning that only a subset of the model's parameters is used during each inference. However, to translate this sparsity into ...
- Phaedra: Learning High-Fidelity Discrete Tokenization for the Physical Science : Abstract: Tokens are discrete representations that allow modern deep learning to scale by transforming high-dimensional data into sequences that can be efficiently learned, generated, and generalized ...
- Entropy-Aware Structural Alignment for Zero-Shot Handwritten Chinese Character Recognition : Abstract: Zero-shot Handwritten Chinese Character Recognition (HCCR) aims to recognize unseen characters by leveraging radical-based semantic compositions. However, existing approaches often treat cha...
- HY3D-Bench: Generation of 3D Assets : Abstract: While recent advances in neural representations and generative models have revolutionized 3D content creation, the field remains constrained by significant data processing bottlenecks. To ad...
- GeoIB: Geometry-Aware Information Bottleneck via Statistical-Manifold Compression : Abstract: Information Bottleneck (IB) is widely used, but in deep learning, it is usually implemented through tractable surrogates, such as variational bounds or neural mutual information (MI) estimat...
- All-Atom GPCR-Ligand Simulation via Residual Isometric Latent Flow : Abstract: G-protein-coupled receptors (GPCRs), primary targets for over one-third of approved therapeutics, rely on intricate conformational transitions to transduce signals. While Molecular Dynamics ...
- Byzantine Machine Learning: MultiKrum and an optimal notion of robustness : Abstract: Aggregation rules are the cornerstone of distributed (or federated) learning in the presence of adversaries, under the so-called Byzantine threat model. They are also interesting mathematica...
- Vision Transformers for Zero-Shot Clustering of Animal Images: A Comparative Benchmarking Study : Abstract: Manual labeling of animal images remains a significant bottleneck in ecological research, limiting the scale and efficiency of biodiversity monitoring efforts. This study investigates whethe...
- Audit After Segmentation: Reference-Free Mask Quality Assessment for Language-Referred Audio-Visual Segmentation : Abstract: Language-referred audio-visual segmentation (Ref-AVS) aims to segment target objects described by natural language by jointly reasoning over video, audio, and text. Beyond generating segment...
- Sounding Highlights: Dual-Pathway Audio Encoders for Audio-Visual Video Highlight Detection : Abstract: Audio-visual video highlight detection aims to automatically identify the most salient moments in videos by leveraging both visual and auditory cues. However, existing models often underutil...
- Explainable Computer Vision Framework for Automated Pore Detection and Criticality Assessment in Additive Manufacturing : Abstract: Internal porosity remains a critical defect mode in additively manufactured components, compromising structural performance and limiting industrial adoption. Automated defect detection metho...
- PriorProbe: Recovering Individual-Level Priors for Personalizing Neural Networks in Facial Expression Recognition : Abstract: Incorporating individual-level cognitive priors offers an important route to personalizing neural networks, yet accurately eliciting such priors remains challenging: existing methods either ...
- DiGAN: Diffusion-Guided Attention Network for Early Alzheimer's Disease Detection : Abstract: Early diagnosis of Alzheimer's disease (AD) remains a major challenge due to the subtle and temporally irregular progression of structural brain changes in the prodromal stages. Existing dee...
- TruKAN: Towards More Efficient Kolmogorov-Arnold Networks Using Truncated Power Functions : Abstract: To address the trade-off between computational efficiency and adherence to Kolmogorov-Arnold Network (KAN) principles, we propose TruKAN, a new architecture based on the KAN structure and le...
- GOPO: Policy Optimization using Ranked Rewards : Abstract: Standard reinforcement learning from human feedback (RLHF) trains a reward model on pairwise preference data and then uses it for policy optimization. However, while reward models are optimi...
- Reversible Deep Learning for 13C NMR in Chemoinformatics: On Structures and Spectra : Abstract: We introduce a reversible deep learning model for 13C NMR that uses a single conditional invertible neural network for both directions between molecular structures and spectra. The network i...
- Decoding Ambiguous Emotions with Test-Time Scaling in Audio-Language Models : Abstract: Emotion recognition from human speech is a critical enabler for socially aware conversational AI. However, while most prior work frames emotion recognition as a categorical classification pr...
- Understanding the Impact of Differentially Private Training on Memorization of Long-Tailed Data : Abstract: Recent research shows that modern deep learning models achieve high predictive accuracy partly by memorizing individual training samples. Such memorization raises serious privacy concerns, m...
- Benchmarking Automatic Speech Recognition for Indian Languages in Agricultural Contexts : Abstract: The digitization of agricultural advisory services in India requires robust Automatic Speech Recognition (ASR) systems capable of accurately transcribing domain-specific terminology in multi...
- PaperX: A Unified Framework for Multimodal Academic Presentation Generation with Scholar DAG : Abstract: Transforming scientific papers into multimodal presentation content is essential for research dissemination but remains labor intensive. Existing automated solutions typically treat each for...
- Perceptions of AI-CBT: Trust and Barriers in Chinese Postgrads : Abstract: The mental well-being of graduate students is an increasing concern, yet the adoption of scalable support remains uneven. Artificial intelligence-powered cognitive behavioral therapy chatbot...
- WebAccessVL: Making an Accessible Web via Violation-Conditioned VLM : Abstract: We present a vision-language model (VLM) that automatically edits website HTML to address Web Content Accessibility Guidelines 2 (WCAG2) violations. We formulate this as a supervised image-c...
- HybridQuestion: Human-AI Collaboration for Identifying High-Impact Research Questions : Abstract: The "AI Scientist" paradigm is transforming scientific research by automating key stages of the research process, from idea generation to scholarly writing. This shift is expected to acceler...
- Fluid Representations in Reasoning Models : Abstract: Reasoning language models, which generate long chains of thought, dramatically outperform non-reasoning language models on abstract problems. However, the internal model mechanisms that allo...
- Group-Evolving Agents: Open-Ended Self-Improvement via Experience Sharing : Abstract: Open-ended self-improving agents can autonomously modify their own structural designs to advance their capabilities and overcome the limits of pre-defined architectures, thus reducing relian...
- Are AI Capabilities Increasing Exponentially? A Competing Hypothesis : Abstract: Rapidly increasing AI capabilities have substantial real-world consequences, ranging from AI safety concerns to labor market consequences. The Model Evaluation & Threat Research (METR) repor...
- Agentic AI in Healthcare & Medicine: A Seven-Dimensional Taxonomy for Empirical Evaluation of LLM-based Agents : Abstract: Large Language Model (LLM)-based agents that plan, use tools and act have begun to shape healthcare and medicine. Reported studies demonstrate competence on various tasks ranging from EHR ana...
- WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning : Abstract: Recent advancements in Large Language Models (LLMs) have largely focused on depth scaling, where a single agent solves long-horizon problems with multi-turn reasoning and tool use. However, ...
- Vibe AIGC: A New Paradigm for Content Generation via Agentic Orchestration : Abstract: For the past decade, the trajectory of generative artificial intelligence (AI) has been dominated by a model-centric paradigm driven by scaling laws. Despite significant leaps in visual fide...
- From Competition to Collaboration: Designing Sustainable Mechanisms Between LLMs and Online Forums : Abstract: While Generative AI (GenAI) systems draw users away from (Q&A) forums, they also depend on the very data those forums produce to improve their performance. Addressing this paradox, we propos...
- ReThinker: Scientific Reasoning by Rethinking with Guided Reflection and Confidence Control : Abstract: Expert-level scientific reasoning remains challenging for large language models, particularly on benchmarks such as Humanity's Last Exam (HLE), where rigid tool pipelines, brittle multi-agen...
- Digital Twins & ZeroConf AI: Structuring Automated Intelligent Pipelines for Industrial Applications : Abstract: The increasing complexity of Cyber-Physical Systems (CPS), particularly in the industrial domain, has amplified the challenges associated with the effective integration of Artificial Intelli...
- From Assumptions to Actions: Turning LLM Reasoning into Uncertainty-Aware Planning for Embodied Agents : Abstract: Embodied agents operating in multi-agent, partially observable, and decentralized environments must plan and act despite pervasive uncertainty about hidden objects and collaborators' intenti...
- Agent-Omit: Training Efficient LLM Agents for Adaptive Thought and Observation Omission via Agentic Reinforcement Learning : Abstract: Managing agent thought and observation during multi-turn agent-environment interactions is an emerging strategy to improve agent efficiency. However, existing studies treat the entire intera...
- Empirical-MCTS: Continuous Agent Evolution via Dual-Experience Monte Carlo Tree Search : Abstract: Inference-time scaling strategies, particularly Monte Carlo Tree Search (MCTS), have significantly enhanced the reasoning capabilities of Large Language Models (LLMs). However, current appro...
- InterPReT: Interactive Policy Restructuring and Training Enable Effective Imitation Learning from Laypersons : Abstract: Imitation learning has shown success in many tasks by learning from expert demonstrations. However, most existing work relies on large-scale demonstrations from technical professionals and c...
- Steering LLMs via Scalable Interactive Oversight : Abstract: As Large Language Models increasingly automate complex, long-horizon tasks such as vibe coding, a supervision gap has emerged. While models excel at execution, users often struggle to...
- OMG-Agent: Toward Robust Missing Modality Generation with Decoupled Coarse-to-Fine Agentic Workflows : Abstract: Data incompleteness severely impedes the reliability of multimodal systems. Existing reconstruction methods face distinct bottlenecks: conventional parametric/generative models are prone to ...
- Interfaze: The Future of AI is built on Task-Specific Small Models : Abstract: We present Interfaze, a system that treats modern LLM applications as a problem of building and acting over context, not just picking the right monolithic model. Instead of a single transfor...
- Scaling In-Context Online Learning Capability of LLMs via Cross-Episode Meta-RL : Abstract: Large language models (LLMs) achieve strong performance when all task-relevant information is available upfront, as in static prediction and instruction-following problems. However, many rea...
- Axiomatic Foundations of Counterfactual Explanations : Abstract: Explaining autonomous and intelligent systems is critical in order to improve trust in their decisions. Counterfactuals have emerged as one of the most compelling forms of explanation. They ...
- When AI Persuades: Adversarial Explanation Attacks on Human Trust in AI-Assisted Decision Making : Abstract: Most adversarial threats in artificial intelligence target the computational behavior of models rather than the humans who rely on them. Yet modern AI systems increasingly operate within hum...
- Monitorability as a Free Gift: How RLVR Spontaneously Aligns Reasoning : Abstract: As Large Reasoning Models (LRMs) are increasingly deployed, auditing their chain-of-thought (CoT) traces for safety becomes critical. Recent work has reported that monitorability--the degree...
- Adaptive Test-Time Compute Allocation via Learned Heuristics over Categorical Structure : Abstract: Test-time computation has become a primary driver of progress in large language model (LLM) reasoning, but it is increasingly bottlenecked by expensive verification. In many reasoning system...
- Active Epistemic Control for Query-Efficient Verified Planning : Abstract: Planning in interactive environments is challenging under partial observability: task-critical preconditions (e.g., object locations or container states) may be unknown at decision time, yet...
- AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent : Abstract: While large language model (LLM) multi-agent systems achieve superior reasoning performance through iterative debate, practical deployment is limited by their high computational cost and err...
- Enhancing Mathematical Problem Solving in LLMs through Execution-Driven Reasoning Augmentation : Abstract: Mathematical problem solving is a fundamental benchmark for assessing the reasoning capabilities of artificial intelligence and a gateway to applications in education, science, and engineeri...
- Knowledge Model Prompting Increases LLM Performance on Planning Tasks : Abstract: Large Language Models (LLMs) can struggle with reasoning ability and planning tasks. Many prompting techniques have been developed to assist with LLM reasoning, notably Chain-of-Thought (CoT)...
Research Sources: 466 | Generated: 2/5/2026
