AI RESEARCH PAPERS & ACADEMIC SOURCES
- EUGens: Efficient, Unified, and General Dense Layers : Abstract: Efficient neural networks are essential for scaling machine learning models to real-time applications and resource-constrained environments. Fully-connected feedforward layers (FFLs) introdu...
- Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches : Abstract: 3D Content Generation is at the heart of many computer graphics applications, including video gaming, film-making, virtual and augmented reality, etc. This paper proposes a novel deep-learni...
- Self-Supervised Video Representation Learning in a Heuristic Decoupled Perspective : Abstract: Video contrastive learning (V-CL) has emerged as a popular framework for unsupervised video representation learning, demonstrating strong results in tasks such as action classification and d...
- Anonymization Prompt Learning for Facial Privacy-Preserving Text-to-Image Generation : Abstract: Text-to-image diffusion models, such as Stable Diffusion, generate highly realistic images from text descriptions. However, the generation of certain content at such high quality raises conc...
- 3D Object Detection for Autonomous Driving: A Survey : Abstract: Autonomous driving is regarded as one of the most promising remedies to shield human beings from severe crashes. To this end, 3D object detection serves as the core basis of perception stack...
- Orientation-Robust Latent Motion Trajectory Learning for Annotation-free Cardiac Phase Detection in Fetal Echocardiography : Abstract: Fetal echocardiography is essential for detecting congenital heart disease (CHD), facilitating pregnancy management, optimized delivery planning, and timely postnatal interventions. Among st...
- Think Proprioceptively: Embodied Visual Reasoning for VLA Manipulation : Abstract: Vision-language-action (VLA) models typically inject proprioception only as a late conditioning signal, which prevents robot state from shaping instruction understanding and from influencing...
- MultiGraspNet: A Multitask 3D Vision Model for Multi-gripper Robotic Grasping : Abstract: Vision-based models for robotic grasping automate critical, repetitive, and draining industrial tasks. Existing approaches are typically limited in two ways: they either target a single grip...
- AS-Mamba: Asymmetric Self-Guided Mamba Decoupled Iterative Network for Metal Artifact Reduction : Abstract: Metal artifacts significantly degrade Computed Tomography (CT) image quality, impeding accurate clinical diagnosis. However, existing deep learning approaches, such as CNN and Transformer, o...
- Zero-shot Multi-Contrast Brain MRI Registration by Intensity Randomizing T1-weighted MRI (LUMIR25) : Abstract: In this paper, we summarize the methods and results of our submission to the LUMIR25 challenge in Learn2Reg 2025, which achieved 1st place overall on the test set. Extended from LUMIR24, thi...
- ALIEN: Analytic Latent Watermarking for Controllable Generation : Abstract: Watermarking is a technical alternative to safeguarding intellectual property and reducing misuse. Existing methods focus on optimizing watermarked latent variables to balance watermark robu...
- COSMOS: Coherent Supergaussian Modeling with Spatial Priors for Sparse-View 3D Splatting : Abstract: 3D Gaussian Splatting (3DGS) has recently emerged as a promising approach for 3D reconstruction, providing explicit, point-based representations and enabling high-quality real-time rendering...
- MedMO: Grounding and Understanding Multimodal Large Language Model for Medical Images : Abstract: Multimodal large language models (MLLMs) have rapidly advanced, yet their adoption in medicine remains limited by gaps in domain coverage, modality alignment, and grounded reasoning. In this...
- CineScene: Implicit 3D as Effective Scene Representation for Cinematic Video Generation : Abstract: Cinematic video production requires control over scene-subject composition and camera movement, but live-action shooting remains costly due to the need for constructing physical sets. To add...
- Seeing Beyond Redundancy: Task Complexity's Role in Vision Token Specialization in VLLMs : Abstract: Vision capabilities in vision large language models (VLLMs) have consistently lagged behind their linguistic capabilities. In particular, numerous benchmark studies have demonstrated that VL...
- Prompt Reinjection: Alleviating Prompt Forgetting in Multimodal Diffusion Transformers : Abstract: Multimodal Diffusion Transformers (MMDiTs) for text-to-image generation maintain separate text and image branches, with bidirectional information flow between text tokens and visual latents ...
- RFDM: Residual Flow Diffusion Model for Efficient Causal Video Editing : Abstract: Instructional video editing applies edits to an input video using only text prompts, enabling intuitive natural-language control. Despite rapid progress, most methods still require fixed-len...
- Parameters as Experts: Adapting Vision Models with Dynamic Parameter Routing : Abstract: Adapting pre-trained vision models using parameter-efficient fine-tuning (PEFT) remains challenging, as it aims to achieve performance comparable to full fine-tuning using a minimal number o...
- GaussianPOP: Principled Simplification Framework for Compact 3D Gaussian Splatting via Error Quantification : Abstract: Existing 3D Gaussian Splatting simplification methods commonly use importance scores, such as blending weights or sensitivity, to identify redundant Gaussians. However, these scores are not ...
- A Unified Formula for Affine Transformations between Calibrated Cameras : Abstract: In this technical note, we derive a closed-form expression for the affine transformation mapping local image patches between two calibrated views. We show that the transformation is a functi...
- Machine Learning for Detection and Severity Estimation of Sweetpotato Weevil Damage in Field and Lab Conditions : Abstract: Sweetpotato weevils (Cylas spp.) are considered among the most destructive pests impacting sweetpotato production, particularly in sub-Saharan Africa. Traditional methods for assessing weevi...
- Clinical-Prior Guided Multi-Modal Learning with Latent Attention Pooling for Gait-Based Scoliosis Screening : Abstract: Adolescent Idiopathic Scoliosis (AIS) is a prevalent spinal deformity whose progression can be mitigated through early detection. Conventional screening methods are often subjective, difficu...
- Can We Build a Monolithic Model for Fake Image Detection? SICA: Semantic-Induced Constrained Adaptation for Unified-Yet-Discriminative Artifact Feature Space Reconstruction : Abstract: Fake Image Detection (FID), aiming at unified detection across four image forensic subdomains, is critical in real-world forensic scenarios. Compared with ensemble approaches, monolithic FID...
- PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks : Abstract: Unified multimodal models (UMMs) have shown impressive capabilities in generating natural images and supporting multimodal reasoning. However, their potential in supporting computer-use plan...
- CauCLIP: Bridging the Sim-to-Real Gap in Surgical Video Understanding via Causality-Inspired Vision-Language Modeling : Abstract: Surgical phase recognition is a critical component for context-aware decision support in intelligent operating rooms, yet training robust models is hindered by limited annotated clinical vid...
- An Integer Linear Programming Approach to Geometrically Consistent Partial-Partial Shape Matching : Abstract: The task of establishing correspondences between two 3D shapes is a long-standing challenge in computer vision. While numerous studies address full-full and partial-full 3D shape matching, o...
- Universal Anti-forensics Attack against Image Forgery Detection via Multi-modal Guidance : Abstract: The rapid advancement of AI-Generated Content (AIGC) technologies poses significant challenges for authenticity assessment. However, existing evaluation protocols largely overlook anti-foren...
- AdaptOVCD: Training-Free Open-Vocabulary Remote Sensing Change Detection via Adaptive Information Fusion : Abstract: Remote sensing change detection plays a pivotal role in domains such as environmental monitoring, urban planning, and disaster assessment. However, existing methods typically rely on predefi...
- MicroBi-ConvLSTM: An Ultra-Lightweight Efficient Model for Human Activity Recognition on Resource Constrained Devices : Abstract: Human Activity Recognition (HAR) on resource-constrained wearables requires models that balance accuracy against strict memory and computational budgets. State-of-the-art lightweight architec...
- DriveWorld-VLA: Unified Latent-Space World Modeling with Vision-Language-Action for Autonomous Driving : Abstract: End-to-end (E2E) autonomous driving has recently attracted increasing interest in unifying Vision-Language-Action (VLA) with World Models to enhance decision-making and forward-looking imagi...
- FloorplanVLM: A Vision-Language Model for Floorplan Vectorization : Abstract: Converting raster floorplans into engineering-grade vector graphics is challenging due to complex topology and strict geometric constraints. To address this, we present FloorplanVLM, a unifi...
- DreamHome-Pano: Design-Aware and Conflict-Free Panoramic Interior Generation : Abstract: In modern interior design, the generation of personalized spaces frequently necessitates a delicate balance between rigid architectural structural constraints and specific stylistic preferen...
- Rebenchmarking Unsupervised Monocular 3D Occupancy Prediction : Abstract: Inferring the 3D structure from a single image, particularly in occluded regions, remains a fundamental yet unsolved challenge in vision-centric autonomous driving. Existing unsupervised app...
- Instance-Free Domain Adaptive Object Detection : Abstract: While Domain Adaptive Object Detection (DAOD) has made significant strides, most methods rely on unlabeled target data that is assumed to contain sufficient foreground instances. However, in...
- LAB-Det: Language as a Domain-Invariant Bridge for Training-Free One-Shot Domain Generalization in Object Detection : Abstract: Foundation object detectors such as GLIP and Grounding DINO excel on general-domain data but often degrade in specialized and data-scarce settings like underwater imagery or industrial defec...
- Exploring Specular Reflection Inconsistency for Generalizable Face Forgery Detection : Abstract: Detecting deepfakes has become increasingly challenging as forgery faces synthesized by AI-generated methods, particularly diffusion models, achieve unprecedented quality and resolution. Exi...
- What Is Wrong with Synthetic Data for Scene Text Recognition? A Strong Synthetic Engine with Diverse Simulations and Self-Evolution : Abstract: Large-scale, category-balanced text data is essential for training effective Scene Text Recognition (STR) models, but such data is hard to obtain by collecting real images. Synthetic data off...
- ChatUMM: Robust Context Tracking for Conversational Interleaved Generation : Abstract: Unified multimodal models (UMMs) have achieved remarkable progress yet remain constrained by a single-turn interaction paradigm, effectively functioning as solvers for independent requests r...
- Bridging the Indoor-Outdoor Gap: Vision-Centric Instruction-Guided Embodied Navigation for the Last Meters : Abstract: Embodied navigation holds significant promise for real-world applications such as last-mile delivery. However, most existing approaches are confined to either indoor or outdoor environments ...
- POPL-KF: A Pose-Only Geometric Representation-Based Kalman Filter for Point-Line-Based Visual-Inertial Odometry : Abstract: Mainstream visual-inertial odometry (VIO) systems rely on point features for motion estimation and localization. However, their performance degrades in challenging scenarios. Moreover, the ...
- Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO : Abstract: Deploying GRPO on Flow Matching models has proven effective for text-to-image generation. However, existing paradigms typically propagate an outcome-based reward to all preceding denoising s...
- Learning Human Visual Attention on 3D Surfaces through Geometry-Queried Semantic Priors : Abstract: Human visual attention on three-dimensional objects emerges from the interplay between bottom-up geometric processing and top-down semantic recognition. Existing 3D saliency methods rely on ...
- Point Virtual Transformer : Abstract: LiDAR-based 3D object detectors often struggle to detect far-field objects due to the sparsity of point clouds at long ranges, which limits the availability of reliable geometric cues. To ad...
- A neuromorphic model of the insect visual system for natural image processing : Abstract: Insect vision supports complex behaviors including associative learning, navigation, and object detection, and has long motivated computational models for understanding biological visual pro...
- MeDocVL: A Visual Language Model for Medical Document Understanding and Parsing : Abstract: Medical document OCR is challenging due to complex layouts, domain-specific terminology, and noisy annotations, while requiring strict field-level exact matching. Existing OCR systems and ge...
- POINTS-GUI-G: GUI-Grounding Journey : Abstract: The rapid advancement of vision-language models has catalyzed the emergence of GUI agents, which hold immense potential for automating complex tasks, from online shopping to flight booking, ...
- Robust Pedestrian Detection with Uncertain Modality : Abstract: Existing cross-modal pedestrian detection (CMPD) employs complementary information from RGB and thermal-infrared (TIR) modalities to detect pedestrians in 24-hour surveillance systems. RGB captur...
- FlowConsist: Make Your Flow Consistent with Real Trajectory : Abstract: Fast flow models accelerate the iterative sampling process by learning to directly predict ODE path integrals, enabling one-step or few-step generation. However, we argue that current fast-f...
- Uncertainty-Aware 4D Gaussian Splatting for Monocular Occluded Human Rendering : Abstract: High-fidelity rendering of dynamic humans from monocular videos typically degrades catastrophically under occlusions. Existing solutions incorporate external priors, either hallucinating miss...
- SPDA-SAM: A Self-prompted Depth-Aware Segment Anything Model for Instance Segmentation : Abstract: Recently, Segment Anything Model (SAM) has demonstrated strong generalizability in various instance segmentation tasks. However, its performance is severely dependent on the quality of manua...
- Taming SAM3 in the Wild: A Concept Bank for Open-Vocabulary Segmentation : Abstract: The recent introduction of SAM3 has revolutionized Open-Vocabulary Segmentation (OVS) through promptable concept segmentation, which grounds pixel predictions in flexible c...
- Halt the Hallucination: Decoupling Signal and Semantic OOD Detection Based on Cascaded Early Rejection : Abstract: Efficient and robust Out-of-Distribution (OOD) detection is paramount for safety-critical applications. However, existing methods still execute full-scale inference on low-level statistical n...
- Adaptive and Balanced Re-initialization for Long-timescale Continual Test-time Domain Adaptation : Abstract: Continual test-time domain adaptation (CTTA) aims to adjust models so that they can perform well over time across non-stationary environments. While previous methods have made considerable e...
- Unsupervised MRI-US Multimodal Image Registration with Multilevel Correlation Pyramidal Optimization : Abstract: Surgical navigation based on multimodal image registration has played a significant role in providing intraoperative guidance to surgeons by showing the relative position of the target area ...
- MMEarth-Bench: Global Model Adaptation via Multimodal Test-Time Training : Abstract: Recent research in geospatial machine learning has demonstrated that models pretrained with self-supervised learning on Earth observation data can perform well on downstream tasks with limit...
- An Interpretable Vision Transformer as a Fingerprint-Based Diagnostic Aid for Kabuki and Wiedemann-Steiner Syndromes : Abstract: Kabuki syndrome (KS) and Wiedemann-Steiner syndrome (WSS) are rare but distinct developmental disorders that share overlapping clinical features, including neurodevelopmental delay, growth r...
- ForeHOI: Feed-forward 3D Object Reconstruction from Daily Hand-Object Interaction Videos : Abstract: The ubiquity of monocular videos capturing daily hand-object interactions presents a valuable resource for embodied intelligence. While 3D hand reconstruction from in-the-wild videos has see...
- DroneKey++: A Size Prior-free Method and New Benchmark for Drone 3D Pose Estimation from Sequential Images : Abstract: Accurate 3D pose estimation of drones is essential for security and surveillance systems. However, existing methods often rely on prior drone information such as physical sizes or 3D meshes....
- DeDPO: Debiased Direct Preference Optimization for Diffusion Models : Abstract: Direct Preference Optimization (DPO) has emerged as a predominant alignment method for diffusion models, facilitating off-policy training without explicit reward modeling. However, its relia...
- Unsupervised Anomaly Detection of Diseases in the Female Pelvis for Real-Time MR Imaging : Abstract: Pelvic diseases in women of reproductive age represent a major global health burden, with diagnosis frequently delayed due to high anatomical variability, complicating MRI interpretation. Ex...
- M3: High-fidelity Text-to-Image Generation via Multi-Modal, Multi-Agent and Multi-Round Visual Reasoning : Abstract: Generative models have achieved impressive fidelity in text-to-image synthesis, yet struggle with complex compositional prompts involving multiple constraints. We introduce M3 (Multi...
- MetaSSP: Enhancing Semi-supervised Implicit 3D Reconstruction through Meta-adaptive EMA and SDF-aware Pseudo-label Evaluation : Abstract: Implicit SDF-based methods for single-view 3D reconstruction achieve high-quality surfaces but require large labeled datasets, limiting their scalability. We propose MetaSSP, a novel semi-su...
- Driving with DINO: Vision Foundation Features as a Unified Bridge for Sim-to-Real Generation in Autonomous Driving : Abstract: Driven by the emergence of Controllable Video Diffusion, existing Sim2Real methods for autonomous driving video generation typically rely on explicit intermediate representations to bridge t...
- MGP-KAD: Multimodal Geometric Priors and Kolmogorov-Arnold Decoder for Single-View 3D Reconstruction in Complex Scenes : Abstract: Single-view 3D reconstruction in complex real-world scenes is challenging due to noise, object diversity, and limited dataset availability. To address these challenges, we propose MGP-KAD, a...
- EgoAVU: Egocentric Audio-Visual Understanding : Abstract: Understanding egocentric videos plays a vital role in embodied intelligence. Recent multi-modal large language models (MLLMs) can accept both visual and audio inputs. However, due to the ch...
- From Blurry to Believable: Enhancing Low-quality Talking Heads with 3D Generative Priors : Abstract: Creating high-fidelity, animatable 3D talking heads is crucial for immersive applications, yet often hindered by the prevalence of low-quality image or video sources, which yield poor 3D rec...
- T³-S2S: Training-free Triplet Tuning for Sketch to Scene Synthesis in Controllable Concept Art Generation : Abstract: 2D concept art generation for 3D scenes is a crucial yet challenging task in computer graphics, as creating natural, intuitive environments still demands extensive manual effort in concept de...
- Designing Computational Tools for Exploring Causal Relationships in Qualitative Data : Abstract: Exploring causal relationships for qualitative data analysis in HCI and social science research enables the understanding of user needs and theory building. However, current computational to...
- PhenoLIP: Integrating Phenotype Ontology Knowledge into Medical Vision-Language Pretraining : Abstract: Recent progress in large-scale CLIP-like vision-language models (VLMs) has greatly advanced medical image analysis. However, most existing medical VLMs still rely on coarse image-text contras...
- STACodec: Semantic Token Assignment for Balancing Acoustic Fidelity and Semantic Information in Audio Codecs : Abstract: Neural audio codecs are widely used for audio compression and can be integrated into token-based language models. Traditional codecs preserve acoustic details well but lack semantic informat...
- DAWN: Dependency-Aware Fast Inference for Diffusion LLMs : Abstract: Diffusion large language models (dLLMs) have shown advantages in text generation, particularly due to their inherent ability for parallel decoding. However, constrained by the quality–speed...
- SEMA: Simple yet Effective Learning for Multi-Turn Jailbreak Attacks : Abstract: Multi-turn jailbreaks capture the real threat model for safety-aligned chatbots, where single-turn attacks are merely a special case. Yet existing approaches break under exploration complexi...
- Visual Word Sense Disambiguation with CLIP through Dual-Channel Text Prompting and Image Augmentations : Abstract: Ambiguity poses persistent challenges in natural language understanding for large language models (LLMs). To better understand how lexical ambiguity can be resolved through the visual domain...
- R-Align: Enhancing Generative Reward Models through Rationale-Centric Meta-Judging : Abstract: Reinforcement Learning from Human Feedback (RLHF) remains indispensable for aligning large language models (LLMs) in subjective domains. To enhance robustness, recent work shifts toward Gene...
- Table-as-Search: Formulate Long-Horizon Agentic Information Seeking as Table Completion : Abstract: Current Information Seeking (InfoSeeking) agents struggle to maintain focus and coherence during long-horizon exploration, as tracking search states, including planning procedure and massive...
- Evaluating Prompt Engineering Strategies for Sentiment Control in AI-Generated Texts : Abstract: The groundbreaking capabilities of Large Language Models (LLMs) offer new opportunities for enhancing human-computer interaction through emotion-adaptive Artificial Intelligence (AI). Howeve...
- Beyond Static Alignment: Hierarchical Policy Control for LLM Safety via Risk-Aware Chain-of-Thought : Abstract: Large Language Models (LLMs) face a fundamental safety-helpfulness trade-off due to static, one-size-fits-all safety policies that lack runtime controllability, making it difficult to tail...
- Reading Between the Waves: Robust Topic Segmentation Using Inter-Sentence Audio Features : Abstract: Spoken content, such as online videos and podcasts, often spans multiple topics, which makes automatic topic segmentation essential for user navigation and downstream applications. However, ...
- FairJudge: An Adaptive, Debiased, and Consistent LLM-as-a-Judge : Abstract: Existing LLM-as-a-Judge systems suffer from three fundamental limitations: limited adaptivity to task- and domain-specific evaluation criteria, systematic biases driven by non-semantic cues ...
- Do Prompts Guarantee Safety? Mitigating Toxicity from LLM Generations through Subspace Intervention : Abstract: Large Language Models (LLMs) are powerful text generators, yet they can produce toxic or harmful content even when given seemingly harmless prompts. This presents a serious safety challenge ...
- Echoes as Anchors: Probabilistic Costs and Attention Refocusing in LLM Reasoning : Abstract: Test-time compute allocation in large reasoning models (LRMs) is widely used and has applications in mathematical problem solving, code synthesis, and planning. Recent work has addressed thi...
- Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making : Abstract: We introduce Baichuan-M3, a medical-enhanced large language model engineered to shift the paradigm from passive question-answering to active, clinical-grade decision support. Addressing the ...
- RelayGen: Intra-Generation Model Switching for Efficient Reasoning : Abstract: Large reasoning models (LRMs) achieve strong performance on complex reasoning tasks by generating long, multi-step reasoning trajectories, but inference-time scaling incurs substantial deplo...
- Evaluating an evidence-guided reinforcement learning framework in aligning light-parameter large language models with decision-making cognition in psychiatric clinical reasoning : Abstract: Large language models (LLMs) hold transformative potential for medical decision support, yet their application in psychiatry remains constrained by hallucinations and superficial reasoning. T...
- On the Wings of Imagination: Conflicting Script-based Multi-role Framework for Humor Caption Generation : Abstract: Humor is a commonly used and intricate human language in daily life. Humor generation, especially in multi-modal scenarios, is a challenging task for large language models (LLMs), which is t...
- FMBench: Adaptive Large Language Model Output Formatting : Abstract: Producing outputs that satisfy both semantic intent and format constraints is essential for deploying large language models in user-facing and system-integrated workflows. In this work, we f...
- ReBeCA: Unveiling Interpretable Behavior Hierarchy behind the Iterative Self-Reflection of Language Models with Causal Analysis : Abstract: While self-reflection can enhance language model reliability, its underlying mechanisms remain opaque, with existing analyses often yielding correlation-based insights that fail to generaliz...
- Cost-Aware Model Selection for Text Classification: Multi-Objective Trade-offs Between Fine-Tuned Encoders and LLM Prompting in Production : Abstract: Large language models (LLMs) such as GPT-4o and Claude Sonnet 4.5 have demonstrated strong capabilities in open-ended reasoning and generative language tasks, leading to their widespread ado...
- Lost in Speech: Benchmarking, Evaluation, and Parsing of Spoken Code-Switching Beyond Standard UD Assumptions : Abstract: Spoken code-switching (CSW) challenges syntactic parsing in ways not observed in written text. Disfluencies, repetition, ellipsis, and discourse-driven structure routinely violate standard U...
- Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math : Abstract: Recent progress in reasoning models suggests that generating plausible attempts for research-level mathematics may be within reach, but verification remains a bottleneck, consuming scarce ex...
- RoPE-LIME: RoPE-Space Locality + Sparse-K Sampling for Efficient LLM Attribution : Abstract: Explaining closed-source LLM outputs is challenging because API access prevents gradient-based attribution, while perturbation methods are costly and noisy when they depend on regenerated te...
- VowelPrompt: Hearing Speech Emotions from Text via Vowel-level Prosodic Augmentation : Abstract: Emotion recognition in speech presents a complex multimodal challenge, requiring comprehension of both linguistic content and vocal expressivity, particularly prosodic features such as funda...
- Is my model "mind blurting"? Interpreting the dynamics of reasoning tokens with Recurrence Quantification Analysis (RQA) : Abstract: Test-time compute is central to large reasoning models, yet analysing their reasoning behaviour through generated text is increasingly impractical and unreliable. Response length is often us...
- BenchMarker: An Education-Inspired Toolkit for Highlighting Flaws in Multiple-Choice Benchmarks : Abstract: Multiple-choice question answering (MCQA) is standard in NLP, but benchmarks lack rigorous quality control. We present BenchMarker, an education-inspired toolkit using LLM judges to flag thr...
- Uncertainty Drives Social Bias Changes in Quantized Large Language Models : Abstract: Post-training quantization reduces the computational cost of large language models but fundamentally alters their social biases in ways that aggregate metrics fail to capture. We present the...
- Quantifying and Attributing Polarization to Annotator Groups : Abstract: Current annotation agreement metrics are not well-suited for inter-group analysis, are sensitive to group size imbalances and restricted to single-annotation settings. These restrictions ren...
- What Is Novel? A Knowledge-Driven Framework for Bias-Aware Literature Originality Evaluation : Abstract: Assessing research novelty is a core yet highly subjective aspect of peer review, typically based on implicit judgment and incomplete comparison to prior work. We introduce a literature-awar...
- PersonaPlex: Voice and Role Control for Full Duplex Conversational Speech Models : Abstract: Recent advances in duplex speech models have enabled natural, low-latency speech-to-speech interactions. However, existing models are restricted to a fixed role and voice, limiting their abi...
- CAST: Character-and-Scene Episodic Memory for Agents : Abstract: Episodic memory is a central component of human memory, which refers to the ability to recall coherent events grounded in who, when, and where. However, most agent memory systems only emphas...
- Relevance-aware Multi-context Contrastive Decoding for Retrieval-augmented Visual Question Answering : Abstract: Despite the remarkable capabilities of Large Vision Language Models (LVLMs), they still lack detailed knowledge about specific entities. Retrieval-augmented Generation (RAG) is a widely adop...
- Ensemble Transport Filter via Optimized Maximum Mean Discrepancy : Abstract: In this paper, we present a new ensemble-based filter method by reconstructing the analysis step of the particle filter through a transport map, which directly transports prior particles to ...
- Science-Informed Design of Deep Learning With Applications to Wireless Systems: A Tutorial : Abstract: Recent advances in computational infrastructure and large-scale data processing have accelerated the adoption of data-driven inference methods, particularly deep learning (DL), to solve prob...
- Predicting the fatigue life of asphalt concrete using neural networks : Abstract: Asphalt concrete's (AC) durability and maintenance demands are strongly influenced by its fatigue life. Traditional methods for determining this characteristic are both resource-intensive an...
- Sampling for Model Predictive Trajectory Planning in Autonomous Driving using Normalizing Flows : Abstract: Alongside optimization-based planners, sampling-based approaches are often used in trajectory planning for autonomous driving due to their simplicity. Model predictive path integral control ...
- STAG: Structural Test-time Alignment of Gradients for Online Adaptation : Abstract: Test-Time Adaptation (TTA) adapts pre-trained models using only unlabeled test streams, requiring real-time inference and update without access to source data. We propose Structural Test-time...
- Nonparametric Evaluation of Noisy ICA Solutions : Abstract: Independent Component Analysis (ICA) was introduced in the 1980s as a model for Blind Source Separation (BSS), which refers to the process of recovering the sources underlying a mixture of ...
- Forecasting with Hyper-Trees : Abstract: We introduce Hyper-Trees as a novel framework for modeling time series data using gradient boosted trees. Unlike conventional tree-based approaches that forecast time series directly, Hyper-...
- A Multi-Token Coordinate Descent Method for Semi-Decentralized Vertical Federated Learning : Abstract: Most federated learning (FL) methods use a client-server scheme, where clients communicate only with a central server. However, this scheme is prone to bandwidth bottlenecks at the server an...
- Optimal Derivative Feedback Control for an Active Magnetic Levitation System: An Experimental Study on Data-Driven Approaches : Abstract: This paper presents the design and implementation of data-driven optimal derivative feedback controllers for an active magnetic levitation system. A direct, model-free control design method ...
- Reliable Mislabel Detection for Video Capsule Endoscopy Data : Abstract: The classification performance of deep neural networks relies strongly on access to large, accurately annotated datasets. In medical imaging, however, obtaining such datasets is particularly...
- Reciprocal Latent Fields for Precomputed Sound Propagation : Abstract: Realistic sound propagation is essential for immersion in a virtual scene, yet physically accurate wave-based simulations remain computationally prohibitive for real-time applications. Wave ...
- Automatic Detection and Analysis of Singing Mistakes for Music Pedagogy : Abstract: The advancement of machine learning in audio analysis has opened new possibilities for technology-enhanced music education. This paper introduces a framework for automatic singing mistake de...
- Uncovering Cross-Objective Interference in Multi-Objective Alignment : Abstract: We study a persistent failure mode in multi-objective alignment for large language models (LLMs): training improves performance on only a subset of objectives while causing others to degrade...
- Are Deep Learning Based Hybrid PDE Solvers Reliable? Why Training Paradigms and Update Strategies Matter : Abstract: Deep learning-based hybrid iterative methods (DL-HIMs) integrate classical numerical solvers with neural operators, utilizing their complementary spectral biases to accelerate convergence. D...
- RanSOM: Second-Order Momentum with Randomized Scaling for Constrained and Unconstrained Optimization : Abstract: Momentum methods, such as Polyak's Heavy Ball, are the standard for training deep networks but suffer from curvature-induced bias in stochastic settings, limiting convergence to suboptimal $...
- RAIGen: Rare Attribute Identification in Text-to-Image Generative Models : Abstract: Text-to-image diffusion models achieve impressive generation quality but inherit and amplify training-data biases, skewing coverage of semantic attributes. Prior work addresses this in two w...
- Optimal Learning-Rate Schedules under Functional Scaling Laws: Power Decay and Warmup-Stable-Decay : Abstract: We study optimal learning-rate schedules (LRSs) under the functional scaling law (FSL) framework introduced in Li et al. (2025), which accurately models the loss dynamics of both linear regr...
- Revisiting Emotions Representation for Recognition in the Wild : Abstract: Facial emotion recognition has been typically cast as a single-label classification problem of one out of six prototypical emotions. However, that is an oversimplification that is unsuitable...
- Fair Transit Stop Placement: A Clustering Perspective and Beyond : Abstract: We study the transit stop placement (TrSP) problem in general metric spaces, where agents travel between source-destination pairs and may either walk directly or utilize a shuttle service vi...
- Missing At Random as Covariate Shift: Correcting Bias in Iterative Imputation : Abstract: Accurate imputation of missing data is critical to downstream machine learning performance. We formulate missing data imputation as a risk minimisation problem, which highlights a covariate ...
- Taipan: A Query-free Transfer-based Multiple Sensitive Attribute Inference Attack Solely from Publicly Released Graphs : Abstract: Graph-structured data underpin a wide spectrum of modern applications. However, complex graph topologies and homophilic patterns can facilitate attribute inference attacks (AIAs) by enabling...
- Quantum Attention by Overlap Interference: Predicting Sequences from Classical and Many-Body Quantum Data : Abstract: We propose a variational quantum implementation of self-attention (QSA), the core operation in transformers and large language models, which predicts future elements of a sequence by forming...
- Makespan Minimization in Split Learning: From Theory to Practice : Abstract: Split learning recently emerged as a solution for distributed machine learning with heterogeneous IoT devices, where clients can offload part of their training to computationally-powerful he...
- CytoCrowd: A Multi-Annotator Benchmark Dataset for Cytology Image Analysis : Abstract: High-quality annotated datasets are crucial for advancing machine learning in medical image analysis. However, a critical gap exists: most datasets either offer a single, clean ground truth,...
- Infinite-dimensional generative diffusions via Doob's h-transform : Abstract: This paper introduces a rigorous framework for defining generative diffusion models in infinite dimensions via Doob's h-transform. Rather than relying on time reversal of a noising process, ...
- Confundo: Learning to Generate Robust Poison for Practical RAG Systems : Abstract: Retrieval-augmented generation (RAG) is increasingly deployed in real-world applications, where its reference-grounded design makes outputs appear trustworthy. This trust has spurred researc...
- Inference-Time Rethinking with Latent Thought Vectors for Math Reasoning : Abstract: Standard chain-of-thought reasoning generates a solution in a single forward pass, committing irrevocably to each token and lacking a mechanism to recover from early errors. We introduce Inf...
- Reinforcement Learning-Based Dynamic Management of Structured Parallel Farm Skeletons on Serverless Platforms : Abstract: We present a framework for dynamic management of structured parallel processing skeletons on serverless platforms. Our goal is to bring HPC-like performance and resilience to serverless and ...
- Evolving Ranking Functions for Canonical Blow-Ups in Positive Characteristic : Abstract: Resolution of singularities in positive characteristic remains a long-standing open problem in algebraic geometry. In characteristic zero, the problem was solved by Hironaka in 1964, work fo...
- NECromancer: Breathing Life into Skeletons via BVH Animation : Abstract: Motion tokenization is a key component of generalizable motion models, yet most existing approaches are restricted to species-specific skeletons, limiting their applicability across diverse ...
- Operationalizing Stein's Method for Online Linear Optimization: CLT-Based Optimal Tradeoffs : Abstract: Adversarial online linear optimization (OLO) is essentially about making performance tradeoffs with respect to the unknown difficulty of the adversary. In the setting of one-dimensional fixe...
- AlertBERT: A noise-robust alert grouping framework for simultaneous cyber attacks : Abstract: Automated detection of cyber attacks is a critical capability to counteract the growing volume and sophistication of cyber attacks. However, the high numbers of security alerts issued by int...
- Forest canopy height estimation from satellite RGB imagery using large-scale airborne LiDAR-derived training data and monocular depth estimation : Abstract: Large-scale, high-resolution forest canopy height mapping plays a crucial role in understanding regional and global carbon and water cycles. Spaceborne LiDAR missions, including the Ice, Clo...
- Diffusion-State Policy Optimization for Masked Diffusion Language Models : Abstract: Masked diffusion language models generate by iteratively filling masked tokens over multiple denoising steps, so learning only from a terminal reward on the final completion yields coarse cr...
- Stopping Computation for Converged Tokens in Masked Diffusion-LM Decoding : Abstract: Masked Diffusion Language Models generate sequences via iterative sampling that progressively unmasks tokens. However, they still recompute the attention and feed-forward blocks for every to...
- HyQuRP: Hybrid quantum-classical neural network with rotational and permutational equivariance for 3D point clouds : Abstract: We introduce HyQuRP, a hybrid quantum-classical neural network equivariant to rotational and permutational symmetries. While existing equivariant quantum machine learning models often rely o...
- A Multiplicative Neural Network Architecture: Locality and Regularity of Approximation : Abstract: We introduce a multiplicative neural network architecture in which multiplicative interactions constitute the fundamental representation, rather than appearing as auxiliary components within...
- Advances in Battery Energy Storage Management: Control and Economic Synergies : Abstract: The existing literature on Battery Energy Storage Systems (BESS) predominantly focuses on two main areas: control system design aimed at achieving grid stability and the techno-economic anal...
- Envy-Free Allocation of Indivisible Goods via Noisy Queries : Abstract: We introduce a problem of fairly allocating indivisible goods (items) in which the agents' valuations cannot be observed directly, but instead can only be accessed via noisy queries. In the ...
- AdFL: In-Browser Federated Learning for Online Advertisement : Abstract: Since most countries are coming up with online privacy regulations, such as GDPR in the EU, online publishers need to find a balance between revenue from targeted advertisement and user priv...
- High-Dimensional Limit of Stochastic Gradient Flow via Dynamical Mean-Field Theory : Abstract: Modern machine learning models are typically trained via multi-pass stochastic gradient descent (SGD) with small batch sizes, and understanding their dynamics in high dimensions is of great ...
- Time-uniform conformal and PAC prediction : Abstract: Given that machine learning algorithms are increasingly being deployed to aid in high stakes decision-making, uncertainty quantification methods that wrap around these black box models such ...
- MPIB: A Benchmark for Medical Prompt Injection Attacks and Clinical Safety in LLMs : Abstract: Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems are increasingly integrated into clinical workflows; however, prompt injection attacks can steer these systems t...
- Inheritance Between Feedforward and Convolutional Networks via Model Projection : Abstract: Techniques for feedforward networks (FFNs) and convolutional networks (CNNs) are frequently reused across families, but the relationship between the underlying model classes is rarely made e...
- Cross-Modal Redundancy and the Geometry of Vision-Language Embeddings : Abstract: Vision-language models (VLMs) align images and text with remarkable success, yet the geometry of their shared embedding space remains poorly understood. To probe this geometry, we begin from...
- Know Your Scientist: KYC as Biosecurity Infrastructure : Abstract: Biological AI tools for protein design and structure prediction are advancing rapidly, creating dual-use risks that existing safeguards cannot adequately address. Current model-level restric...
- Warm Starts, Cold States: Exploiting Adiabaticity for Variational Ground-States : Abstract: Reliable preparation of many-body ground states is an essential task in quantum computing, with applications spanning areas from chemistry and materials modeling to quantum optimization and ...
- Algebraic Robustness Verification of Neural Networks : Abstract: We formulate formal robustness verification of neural networks as an algebraic optimization problem. We leverage the Euclidean Distance (ED) degree, which is the generic number of complex cr...
- Canzona: A Unified, Asynchronous, and Load-Balanced Framework for Distributed Matrix-based Optimizers : Abstract: The scaling of Large Language Models (LLMs) drives interest in matrix-based optimizers (e.g., Shampoo, Muon, SOAP) for their convergence efficiency; yet their requirement for holistic update...
- PackInfer: Compute- and I/O-Efficient Attention for Batched LLM Inference : Abstract: Attention efficiency is critical to large language model (LLM) inference. While prior advances optimize attention execution for individual requests (e.g., FlashAttention), production LLM ser...
- Deep networks learn to parse uniform-depth context-free languages from local statistics : Abstract: Understanding how the structure of language can be learned from sentences alone is a central question in both cognitive science and machine learning. Studies of the internal representations ...
- Deep Unfolded Fractional Optimization for Maximizing Robust Throughput in 6G Networks : Abstract: The sixth-generation (6G) of wireless communication networks aims to leverage artificial intelligence tools for efficient and robust network optimization. This is especially the case since t...
- Improving Credit Card Fraud Detection with an Optimized Explainable Boosting Machine : Abstract: Addressing class imbalance is a central challenge in credit card fraud detection, as it directly impacts predictive reliability in real-world financial systems. To overcome this, the study p...
- From Core to Detail: Unsupervised Disentanglement with Entropy-Ordered Flows : Abstract: Learning unsupervised representations that are both semantically meaningful and stable across runs remains a central challenge in modern representation learning. We introduce entropy-ordered...
- When RL Meets Adaptive Speculative Training: A Unified Training-Serving System : Abstract: Speculative decoding can significantly accelerate LLM serving, yet most deployments today disentangle speculator training from serving, treating speculator training as a standalone offline m...
- Continuous-time reinforcement learning: ellipticity enables model-free value function approximation : Abstract: We study off-policy reinforcement learning for controlling continuous-time Markov diffusion processes with discrete-time observations and actions. We consider model-free algorithms with func...
- Robustness Beyond Known Groups with Low-rank Adaptation : Abstract: Deep learning models trained to optimize average accuracy often exhibit systematic failures on particular subpopulations. In real world settings, the subpopulations most affected by such dis...
- Revisiting the Generic Transformer: Deconstructing a Strong Baseline for Time Series Foundation Models : Abstract: The recent surge in Time Series Foundation Models has rapidly advanced the field, yet the heterogeneous training setups across studies make it difficult to attribute improvements to architec...
- A first realization of reinforcement learning-based closed-loop EEG-TMS : Abstract: Background: Transcranial magnetic stimulation (TMS) is a powerful tool to investigate neurophysiology of the human brain and treat brain disorders. Traditionally, therapeutic TMS has been ap...
- Parameter-free Dynamic Regret: Time-varying Movement Costs, Delayed Feedback, and Memory : Abstract: In this paper, we study dynamic regret in unconstrained online convex optimization (OCO) with movement costs. Specifically, we generalize the standard setting by allowing the movement cost c...
- Sample Complexity of Causal Identification with Temporal Heterogeneity : Abstract: Recovering a unique causal graph from observational data is an ill-posed problem because multiple generating mechanisms can lead to the same observational distribution. This problem becomes ...
- A Cycle-Consistent Graph Surrogate for Full-Cycle Left Ventricular Myocardial Biomechanics : Abstract: Image-based patient-specific simulation of left ventricular (LV) mechanics is valuable for understanding cardiac function and supporting clinical intervention planning, but conventional fini...
- Vision Transformer Finetuning Benefits from Non-Smooth Components : Abstract: The smoothness of the transformer architecture has been extensively studied in the context of generalization, training stability, and adversarial robustness. However, its role in transfer le...
- Decoupling Variance and Scale-Invariant Updates in Adaptive Gradient Descent for Unified Vector and Matrix Optimization : Abstract: Adaptive methods like Adam have become the de facto standard for large-scale vector and Euclidean optimization due to their coordinate-wise adaptation with a second-order nature. ...
- T-STAR: A Context-Aware Transformer Framework for Short-Term Probabilistic Demand Forecasting in Dock-Based Shared Micro-Mobility : Abstract: Reliable short-term demand forecasting is essential for managing shared micro-mobility services and ensuring responsive, user-centered operations. This study introduces T-STAR (Two-stage Spa...
- Designing a Robust, Bounded, and Smooth Loss Function for Improved Supervised Learning : Abstract: The loss function is crucial to machine learning, especially in supervised learning frameworks. It is a fundamental component that controls the behavior and general efficacy of learning algo...
- Improved Sampling Schedules for Discrete Diffusion Models : Abstract: Discrete diffusion models have emerged as a powerful paradigm for generative modeling on sequence data; however, the information-theoretic principles governing their reverse processes remain...
- Learning Deep Hybrid Models with Sharpness-Aware Minimization : Abstract: Hybrid modeling, the combination of machine learning models and scientific mathematical models, enables flexible and robust data-driven prediction with partial interpretability. However, eff...
- Calibrating Tabular Anomaly Detection via Optimal Transport : Abstract: Tabular anomaly detection (TAD) remains challenging due to the heterogeneity of tabular data: features lack natural relationships, vary widely in distribution and scale, and exhibit diverse ...
- FlowDA: Accurate, Low-Latency Weather Data Assimilation via Flow Matching : Abstract: Data assimilation (DA) is a fundamental component of modern weather prediction, yet it remains a major computational bottleneck in machine learning (ML)-based forecasting pipelines due to re...
- Rare Event Analysis of Large Language Models : Abstract: Being probabilistic models, during inference large language models (LLMs) display rare events: behaviour that is far from typical but highly significant. By definition all rare events are ha...
- Displacement-Resistant Extensions of DPO with Nonconvex $f$-Divergences : Abstract: DPO and related algorithms align language models by directly optimizing the RLHF objective: find a policy that maximizes the Bradley-Terry reward while staying close to a reference policy th...
- Weisfeiler and Lehman Go Categorical : Abstract: While lifting map has significantly enhanced the expressivity of graph neural networks, extending this paradigm to hypergraphs remains fragmented. To address this, we introduce the categoric...
- Robust Online Learning : Abstract: We study the problem of learning robust classifiers where the classifier will receive a perturbed input. Unlike robust PAC learning studied in prior work, here the clean data and its label a...
- On the Convergence of Multicalibration Gradient Boosting : Abstract: Multicalibration gradient boosting has recently emerged as a scalable method that empirically produces approximately multicalibrated predictors and has been deployed at web scale. Despite th...
- Calibrating Generative AI to Produce Realistic Essays for Data Augmentation : Abstract: Data augmentation can mitigate limited training data in machine-learning automated scoring engines for constructed response items. This study seeks to determine how well three approaches to ...
- Soft Forward-Backward Representations for Zero-shot Reinforcement Learning with General Utilities : Abstract: Recent advancements in zero-shot reinforcement learning (RL) have facilitated the extraction of diverse behaviors from unlabeled, offline data sources. In particular, forward-backward algori...
- Disentanglement by means of action-induced representations : Abstract: Learning interpretable representations with variational autoencoders (VAEs) is a major goal of representation learning. The main challenge lies in obtaining disentangled representations, whe...
- Explaining Grokking in Transformers through the Lens of Inductive Bias : Abstract: We investigate grokking in transformers through the lens of inductive bias: dispositions arising from architecture or optimization that let the network prefer one solution over another. We f...
- Diffeomorphism-Equivariant Neural Networks : Abstract: Incorporating group symmetries via equivariance into neural networks has emerged as a robust approach for overcoming the efficiency and data demands of modern deep learning. While most exist...
- NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models : Abstract: Weight-only quantization has become a standard approach for efficiently serving large language models (LLMs). However, existing methods fail to efficiently compress models to binary (1-bit) ...
- Memory-Conditioned Flow-Matching for Stable Autoregressive PDE Rollouts : Abstract: Autoregressive generative PDE solvers can be accurate one step ahead yet drift over long rollouts, especially in coarse-to-fine regimes where each step must regenerate unresolved fine scales...
- Pruning at Initialisation through the lens of Graphon Limit: Convergence, Expressivity, and Generalisation : Abstract: Pruning at Initialisation methods discover sparse, trainable subnetworks before training, but their theoretical mechanisms remain elusive. Existing analyses are often limited to finite-width...
- Adaptive-CaRe: Adaptive Causal Regularization for Robust Outcome Prediction : Abstract: Accurate prediction of outcomes is crucial for clinical decision-making and personalized patient care. Supervised machine learning algorithms, which are commonly used for outcome prediction ...
- The hidden risks of temporal resampling in clinical reinforcement learning : Abstract: Offline reinforcement learning (ORL) has shown potential for improving decision-making in healthcare. However, contemporary research typically aggregates patient data into fixed time interva...
- DiTS: Multimodal Diffusion Transformers Are Time Series Forecasters : Abstract: While generative modeling on time series facilitates more capable and flexible probabilistic forecasting, existing generative time series models do not address the multi-dimensional properti...
- Degradation of Feature Space in Continual Learning : Abstract: Centralized training is the standard paradigm in deep learning, enabling models to learn from a unified dataset in a single location. In such a setup, isotropic feature distributions naturally...
- Learning to Allocate Resources with Censored Feedback : Abstract: We study the online resource allocation problem in which at each round, a budget $B$ must be allocated across $K$ arms under censored feedback. An arm yields a reward if and only if two cond...
- Fine-Grained Model Merging via Modular Expert Recombination : Abstract: Model merging constructs versatile models by integrating task-specific models without requiring labeled data or expensive joint retraining. Although recent methods improve adaptability to he...
- Refining the Information Bottleneck via Adversarial Information Separation : Abstract: Generalizing from limited data is particularly critical for models in domains such as material science, where task-relevant features in experimental datasets are often heavily confounded by ...
- Live Knowledge Tracing: Real-Time Adaptation using Tabular Foundation Models : Abstract: Deep knowledge tracing models have achieved significant breakthroughs in modeling student learning trajectories. However, these architectures require substantial training time and are prone ...
- Topography scanning as a part of process monitoring in power cable insulation process : Abstract: We present a novel topography scanning system developed to XLPE cable core monitoring. Modern measurement technology is utilized together with embedded high-performance computing to build a ...
- Evolutionary Generation of Multi-Agent Systems : Abstract: Large language model (LLM)-based multi-agent systems (MAS) show strong promise for complex reasoning, planning, and tool-augmented tasks, but designing effective MAS architectures remains la...
- Can Microcanonical Langevin Dynamics Leverage Mini-Batch Gradient Noise? : Abstract: Scaling inference methods such as Markov chain Monte Carlo to high-dimensional models remains a central challenge in Bayesian deep learning. A promising recent proposal, microcanonical Lange...
- Adaptive Uncertainty-Aware Tree Search for Robust Reasoning : Abstract: Inference-time reasoning scaling has significantly advanced the capabilities of Large Language Models (LLMs) in complex problem-solving. A prevalent approach involves external search guided ...
- Towards Generalizable Reasoning: Group Causal Counterfactual Policy Optimization for LLM Reasoning : Abstract: Large language models (LLMs) excel at complex tasks with advances in reasoning capabilities. However, existing reward mechanisms remain tightly coupled to final correctness and pay little at...
- Achieving Better Local Regret Bound for Online Non-Convex Bilevel Optimization : Abstract: Online bilevel optimization (OBO) has emerged as a powerful framework for many machine learning problems. Prior works have developed several algorithms that minimize the standard bilevel loc...
- The Window Dilemma: Why Concept Drift Detection is Ill-Posed : Abstract: Non-stationarity of an underlying data generating process that leads to distributional changes over time is a key characteristic of Data Streams. This phenomenon, commonly referred to as Con...
- On the Plasticity and Stability for Post-Training Large Language Models : Abstract: Training stability remains a critical bottleneck for Group Relative Policy Optimization (GRPO), often manifesting as a trade-off between reasoning plasticity and general capability retention...
- BrokenBind: Universal Modality Exploration beyond Dataset Boundaries : Abstract: Multi-modal learning combines various modalities to provide a comprehensive understanding of real-world problems. A common strategy is to directly bind different modalities together in a spe...
- Is Gradient Ascent Really Necessary? Memorize to Forget for Machine Unlearning : Abstract: For ethical and safe AI, machine unlearning rises as a critical topic aiming to protect sensitive, private, and copyrighted knowledge from misuse. To achieve this goal, it is common to condu...
- Reclaiming First Principles: A Differentiable Framework for Conceptual Hydrologic Models : Abstract: Conceptual hydrologic models remain the cornerstone of rainfall-runoff modeling, yet their calibration is often slow and numerically fragile. Most gradient-based parameter estimation methods...
- Beyond Code Contributions: How Network Position, Temporal Bursts, and Code Review Activities Shape Contributor Influence in Large-Scale Open Source Ecosystems : Abstract: Open source software (OSS) projects rely on complex networks of contributors whose interactions drive innovation and sustainability. This study presents a comprehensive analysis of OSS contr...
- Adaptive Protein Tokenization : Abstract: Tokenization is a promising path to multi-modal models capable of jointly understanding protein sequences, structure, and function. Existing protein structure tokenizers create tokens by poo...
- EEG Emotion Classification Using an Enhanced Transformer-CNN-BiLSTM Architecture with Dual Attention Mechanisms : Abstract: Electroencephalography (EEG)-based emotion recognition plays a critical role in affective computing and emerging decision-support systems, yet remains challenging due to high-dimensional, no...
- Near-Optimal Regret for Distributed Adversarial Bandits: A Black-Box Approach : Abstract: We study distributed adversarial bandits, where $N$ agents cooperate to minimize the global average loss while observing only their own local losses. We show that the minimax regret for this...
- Uniform Spectral Growth and Convergence of Muon in LoRA-Style Matrix Factorization : Abstract: Spectral gradient descent (SpecGD) orthogonalizes the matrix parameter updates and has inspired practical optimizers such as Muon. They often perform well in large language model (LLM) train...
- Evaluating LLM-persona Generated Distributions for Decision-making : Abstract: LLMs can generate a wealth of data, ranging from simulated personas imitating human valuations and preferences, to demand forecasts based on world knowledge. But how well do such LLM-generat...
- Enhance and Reuse: A Dual-Mechanism Approach to Boost Deep Forest for Label Distribution Learning : Abstract: Label distribution learning (LDL) requires the learner to predict the degree of correlation between each sample and each label. To achieve this, a crucial task during learning is to leverage...
- Adversarial Learning in Games with Bandit Feedback: Logarithmic Pure-Strategy Maximin Regret : Abstract: Learning to play zero-sum games is a fundamental problem in game theory and machine learning. While significant progress has been made in minimizing external regret in the self-play settings...
- Don't Break the Boundary: Continual Unlearning for OOD Detection Based on Free Energy Repulsion : Abstract: Deploying trustworthy AI in open-world environments faces a dual challenge: the necessity for robust Out-of-Distribution (OOD) detection to ensure system safety, and the demand for flexible ...
- Online Adaptive Reinforcement Learning with Echo State Networks for Non-Stationary Dynamics : Abstract: Reinforcement learning (RL) policies trained in simulation often suffer from severe performance degradation when deployed in real-world environments due to non-stationary dynamics. While Dom...
- How (Not) to Hybridize Neural and Mechanistic Models for Epidemiological Forecasting : Abstract: Epidemiological forecasting from surveillance data is a hard problem and hybridizing mechanistic compartmental models with neural models is a natural direction. The mechanistic structure hel...
- SOCKET: SOft Collision Kernel EsTimator for Sparse Attention : Abstract: Exploiting sparsity during long-context inference is central to scaling large language models, as attention dominates the cost of autoregressive decoding. Sparse attention reduces this cost ...
- Statistical Learning from Attribution Sets : Abstract: We address the problem of training conversion prediction models in advertising domains under privacy constraints, where direct links between ad clicks and conversions are unavailable. Motiva...
- PurSAMERE: Reliable Adversarial Purification via Sharpness-Aware Minimization of Expected Reconstruction Error : Abstract: We propose a novel deterministic purification method to improve adversarial robustness by mapping a potentially adversarial sample toward a nearby sample that lies close to a mode of the dat...
- Swap Regret Minimization Through Response-Based Approachability : Abstract: We consider the problem of minimizing different notions of swap regret in online optimization. These forms of regret are tightly connected to correlated equilibrium concepts in games, and ha...
- On Randomized Algorithms in Online Strategic Classification : Abstract: Online strategic classification studies settings in which agents strategically modify their features to obtain favorable predictions. For example, given a classifier that determines loan app...
- Adaptive Sparse Möbius Transforms for Learning Polynomials : Abstract: We consider the problem of exactly learning an $s$-sparse real-valued Boolean polynomial of degree $d$ of the form $f:\{ 0,1\}^n \rightarrow \mathbb{R}$. This problem corresponds to decompos...
- A Fast and Generalizable Fourier Neural Operator-Based Surrogate for Melt-Pool Prediction in Laser Processing : Abstract: High-fidelity simulations of laser welding capture complex thermo-fluid phenomena, including phase change, free-surface deformation, and keyhole dynamics, however their computational cost li...
- Provably avoiding over-optimization in Direct Preference Optimization without knowing the data distribution : Abstract: We introduce PEPO (Pessimistic Ensemble based Preference Optimization), a single-step Direct Preference Optimization (DPO)-like algorithm to mitigate the well-known over-optimization issue i...
- $f$-FUM: Federated Unlearning via min-max and $f$-divergence : Abstract: Federated Learning (FL) has emerged as a powerful paradigm for collaborative machine learning across decentralized data sources, preserving privacy by keeping data local. However, increasing...
- To 2:4 Sparsity and Beyond: Neuron-level Activation Function to Accelerate LLM Pre-Training : Abstract: Trainings of Large Language Models are generally bottlenecked by matrix multiplications. In the Transformer architecture, a large portion of these operations happens in the Feed Forward Netw...
- SCONE: A Practical, Constraint-Aware Plug-in for Latent Encoding in Learned DNA Storage : Abstract: DNA storage has matured from concept to practical stage, yet its integration with neural compression pipelines remains inefficient. Early DNA encoders applied redundancy-heavy constraint lay...
- Latent Structure Emergence in Diffusion Models via Confidence-Based Filtering : Abstract: Diffusion models rely on a high-dimensional latent space of initial noise seeds, yet it remains unclear whether this space contains sufficient structure to predict properties of the generate...
- MoSE: Mixture of Slimmable Experts for Efficient and Adaptive Language Models : Abstract: Mixture-of-Experts (MoE) models scale large language models efficiently by sparsely activating experts, but once an expert is selected, it is executed fully. Hence, the trade-off between acc...
- Optimistic Training and Convergence of Q-Learning -- Extended Version : Abstract: In recent work it is shown that Q-learning with linear function approximation is stable, in the sense of bounded parameter estimates, under the $(\varepsilon,\kappa)$-tamed Gibbs policy; $\kappa$ is i...
- Flow Matching for Offline Reinforcement Learning with Discrete Actions : Abstract: Generative policies based on diffusion models and flow matching have shown strong promise for offline reinforcement learning (RL), but their applicability remains largely confined to continu...
- Tempora: Characterising the Time-Contingent Utility of Online Test-Time Adaptation : Abstract: Test-time adaptation (TTA) offers a compelling remedy for machine learning (ML) models that degrade under domain shifts, improving generalisation on-the-fly with only unlabelled samples. Thi...
- Compressing LLMs with MoP: Mixture of Pruners : Abstract: The high computational demands of Large Language Models (LLMs) motivate methods that reduce parameter count and accelerate inference. In response, model pruning emerges as an effective strat...
- Private and interpretable clinical prediction with quantum-inspired tensor train models : Abstract: Machine learning in clinical settings must balance predictive accuracy, interpretability, and privacy. Models such as logistic regression (LR) offer transparency, while neural networks (NNs)...
- Pragmatic Curiosity: A Hybrid Learning-Optimization Paradigm via Active Inference : Abstract: Many engineering and scientific workflows depend on expensive black-box evaluations, requiring decision-making that simultaneously improves performance and reduces uncertainty. Bayesian opti...
- Toward Faithful and Complete Answer Construction from a Single Document : Abstract: Modern large language models (LLMs) are powerful generators driven by statistical next-token prediction. While effective at producing fluent text, this design biases models toward high-proba...
- Agentic Workflow Using RBA$_\theta$ for Event Prediction : Abstract: Wind power ramp events are difficult to forecast due to strong variability, multi-scale dynamics, and site-specific meteorological effects. This paper proposes an event-first, frequency-awar...
- Testing Storage-System Correctness: Challenges, Fuzzing Limitations, and AI-Augmented Opportunities : Abstract: Storage systems are fundamental to modern computing infrastructures, yet ensuring their correctness remains challenging in practice. Despite decades of research on system testing, many stora...
- Structural Enforcement of Statistical Rigor in AI-Driven Discovery: A Functional Architecture : Abstract: AI-Scientist systems that use large language models to automate research risk generating spurious discoveries through uncontrolled multiple testing. We present a functional architecture that...
- Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks : Abstract: Prompt injection attacks, where untrusted data contains an injected prompt to manipulate the system, have been listed as the top security threat to LLM-integrated applications. Model-level p...
- Learning Metal Microstructural Heterogeneity through Spatial Mapping of Diffraction Latent Space Features : Abstract: To leverage advancements in machine learning for metallic materials design and property prediction, it is crucial to develop a data-reduced representation of metal microstructures that surpa...
- ExpressivityBench: Can LLMs Communicate Implicitly? : Abstract: Human communication is often implicit, conveying tone, identity, and intent beyond literal meanings. While large language models have achieved strong performance on explicit tasks such as su...
- Hyperbolic Fine-Tuning for Large Language Models : Abstract: Large language models (LLMs) have demonstrated remarkable performance across various tasks. However, it remains an open question whether the default Euclidean space is the most suitable choi...
- EEG-MACS: Manifold Attention and Confidence Stratification for EEG-based Cross-Center Brain Disease Diagnosis under Unreliable Annotations : Abstract: Cross-center data heterogeneity and annotation unreliability significantly challenge the intelligent diagnosis of diseases using brain signals. A notable example is the EEG-based diagnosis o...
- Bayesian Matrix Decomposition and Applications : Abstract: The sole aim of this book is to give a self-contained introduction to concepts and mathematical tools in Bayesian matrix decomposition in order to seamlessly introduce matrix decomposition t...
- How does information access affect LLM monitors' ability to detect sabotage? : Abstract: Frontier language model agents can exhibit misaligned behaviors, including deception, exploiting reward hacks, and pursuing hidden objectives. To control potentially misaligned agents, we ca...
- Yunjue Agent Tech Report: A Fully Reproducible, Zero-Start In-Situ Self-Evolving Agent System for Open-Ended Tasks : Abstract: Conventional agent systems often struggle in open-ended environments where task distributions continuously drift and external supervision is scarce. Their reliance on static toolsets or offl...
- Leveraging Spreading Activation for Improved Document Retrieval in Knowledge-Graph-Based RAG Systems : Abstract: Despite initial successes and a variety of architectures, retrieval-augmented generation systems still struggle to reliably retrieve and connect the multi-step evidence required for complica...
- Human-AI Co-Embodied Intelligence for Scientific Experimentation and Manufacturing : Abstract: Scientific experimentation and manufacturing rely on prolonged protocol development and complex, multi-step implementation, which require continuous human expertise for precise execution and...
- Conversational Intent-Driven GraphRAG: Enhancing Multi-Turn Dialogue Systems through Adaptive Dual-Retrieval of Flow Patterns and Context Semantics : Abstract: We present CID-GraphRAG (Conversational Intent-Driven Graph Retrieval Augmented Generation), a novel framework that addresses the limitations of existing dialogue systems in maintaining both...
- A computational framework for human values : Abstract: In the diverse array of work investigating the nature of human values from psychology, philosophy and social sciences, there is a clear consensus that values guide behaviour. More recently, ...
- Learning a Generative Meta-Model of LLM Activations : Abstract: Existing approaches for analyzing neural network activations, such as PCA and sparse autoencoders, rely on strong structural assumptions. Generative models offer an alternative: they can unc...
- InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning : Abstract: Large reasoning models achieve strong performance by scaling inference-time chain-of-thought, but this paradigm suffers from quadratic cost, context length limits, and degraded reasoning due...
- DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos : Abstract: Being able to simulate the outcomes of actions in varied environments will revolutionize the development of generalist agents at scale. However, modeling these world dynamics, especially for...
- Optimal Turkish Subword Strategies at Scale: Systematic Evaluation of Data, Vocabulary, Morphology Interplay : Abstract: Tokenization is a pivotal design choice for neural language modeling in morphologically rich languages (MRLs) such as Turkish, where productive agglutination challenges both vocabulary effic...
- Endogenous Resistance to Activation Steering in Language Models : Abstract: Large language models can resist task-misaligned activation steering during inference, sometimes recovering mid-generation to produce improved responses even when steering remains active. We...
- Cochain Perspectives on Temporal-Difference Signals for Learning Beyond Markov Dynamics : Abstract: Non-Markovian dynamics are commonly found in real-world environments due to long-range dependencies, partial observability, and memory effects. The Bellman equation that is the central pilla...
- Implementing Grassroots Logic Programs with Multiagent Transition Systems and AI : Abstract: Grassroots Logic Programs (GLP) is a concurrent logic programming language with variables partitioned into paired \emph{readers} and \emph{writers}, conjuring both linear logic and futures/p...
- From Kepler to Newton: Inductive Biases Guide Learned World Models in Transformers : Abstract: Can general-purpose AI architectures go beyond prediction to discover the physical laws governing the universe? True intelligence relies on "world models" -- causal abstractions that allow a...
- Halluverse-M^3: A multitask multilingual benchmark for hallucination in LLMs : Abstract: Hallucinations in large language models remain a persistent challenge, particularly in multilingual and generative settings where factual consistency is difficult to maintain. While recent m...
- PANC: Prior-Aware Normalized Cut for Object Segmentation : Abstract: Fully unsupervised segmentation pipelines naively seek the most salient object, should this be present. As a result, most of the methods reported in the literature deliver non-deterministic ...
- TamperBench: Systematically Stress-Testing LLM Safety Under Fine-Tuning and Tampering : Abstract: As increasingly capable open-weight large language models (LLMs) are deployed, improving their tamper resistance against unsafe modifications, whether accidental or intentional, becomes crit...
- Supercharging Simulation-Based Inference for Bayesian Optimal Experimental Design : Abstract: Bayesian optimal experimental design (BOED) seeks to maximize the expected information gain (EIG) of experiments. This requires a likelihood estimate, which in many settings is intractable. ...
- NanoFLUX: Distillation-Driven Compression of Large Text-to-Image Generation Models for Mobile Devices : Abstract: While large-scale text-to-image diffusion models continue to improve in visual quality, their increasing scale has widened the gap between state-of-the-art models and on-device solutions. To...
- TraceCoder: A Trace-Driven Multi-Agent Framework for Automated Debugging of LLM-Generated Code : Abstract: Large Language Models (LLMs) often generate code with subtle but critical bugs, especially for complex tasks. Existing automated repair methods typically rely on superficial pass/fail signal...
- Zero-shot Generalizable Graph Anomaly Detection with Mixture of Riemannian Experts : Abstract: Graph Anomaly Detection (GAD) aims to identify irregular patterns in graph data, and recent works have explored zero-shot generalist GAD to enable generalization to unseen graph datasets. Ho...
- The Quantum Sieve Tracer: A Hybrid Framework for Layer-Wise Activation Tracing in Large Language Models : Abstract: Mechanistic interpretability aims to reverse-engineer the internal computations of Large Language Models (LLMs), yet separating sparse semantic signals from high-dimensional polysemantic noi...
- Rethinking Multi-Condition DiTs: Eliminating Redundant Attention via Position-Alignment and Keyword-Scoping : Abstract: While modern text-to-image models excel at prompt-based generation, they often lack the fine-grained control necessary for specific user requirements like spatial layouts or subject appearan...
- The Representational Geometry of Number : Abstract: A central question in cognitive science is whether conceptual representations converge onto a shared manifold to support generalization, or diverge into orthogonal subspaces to minimize task...
- AEGPO: Adaptive Entropy-Guided Policy Optimization for Diffusion Models : Abstract: Reinforcement learning from human feedback (RLHF) shows promise for aligning diffusion and flow models, yet policy optimization methods such as GRPO suffer from inefficient and static sampli...
- AI-Generated Music Detection in Broadcast Monitoring : Abstract: AI music generators have advanced to the point where their outputs are often indistinguishable from human compositions. While detection methods have emerged, they are typically designed and ...
- Bridging 6G IoT and AI: LLM-Based Efficient Approach for Physical Layer's Optimization Tasks : Abstract: This paper investigates the role of large language models (LLMs) in sixth-generation (6G) Internet of Things (IoT) networks and proposes a prompt-engineering-based real-time feedback and ver...
- SuReNav: Superpixel Graph-based Constraint Relaxation for Navigation in Over-constrained Environments : Abstract: We address the over-constrained planning problem in semi-static environments. The planning objective is to find a best-effort solution that avoids all hard constraint regions while minimally...
- On the Identifiability of Steering Vectors in Large Language Models : Abstract: Activation steering methods, such as persona vectors, are widely used to control large language model behavior and increasingly interpreted as revealing meaningful internal representations. ...
- Generating Data-Driven Reasoning Rubrics for Domain-Adaptive Reward Modeling : Abstract: An impediment to using Large Language Models (LLMs) for reasoning output verification is that LLMs struggle to reliably identify errors in thinking traces, particularly in long outputs, doma...
- Next-generation cyberattack detection with large language models: anomaly analysis across heterogeneous logs : Abstract: This project explores large language models (LLMs) for anomaly detection across heterogeneous log sources. Traditional intrusion detection systems suffer from high false positive rates, sema...
- AEGIS: Adversarial Target-Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models : Abstract: Concept erasure helps stop diffusion models (DMs) from generating harmful content; but current methods face robustness retention trade off. Robustness means the model fine-tuned by concept e...
- A Unified Framework for LLM Watermarks : Abstract: LLM watermarks allow tracing AI-generated texts by inserting a detectable signal into their generated content. Recent works have proposed a wide range of watermarking algorithms, each with d...
- Gold Exploration using Representations from a Multispectral Autoencoder : Abstract: Satellite imagery is employed for large-scale prospectivity mapping due to the high cost and typically limited availability of on-site mineral exploration data. In this work, we present a pr...
- Optimal Abstractions for Verifying Properties of Kolmogorov-Arnold Networks (KANs) : Abstract: We present a novel approach for verifying properties of Kolmogorov-Arnold Networks (KANs), a class of neural networks characterized by nonlinear, univariate activation functions typically im...
- Pairwise is Not Enough: Hypergraph Neural Networks for Multi-Agent Pathfinding : Abstract: Multi-Agent Path Finding (MAPF) is a representative multi-agent coordination problem, where multiple agents are required to navigate to their respective goals without collisions. Solving MAP...
- GhostCite: A Large-Scale Analysis of Citation Validity in the Age of Large Language Models : Abstract: Citations provide the basis for trusting scientific claims; when they are invalid or fabricated, this trust collapses. With the advent of Large Language Models (LLMs), this risk has intensif...
- F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is commonly based on group sampling to estimate advantages and stabilize policy updates. In practice, large group sizes are not feasible...
- SaDiT: Efficient Protein Backbone Design via Latent Structural Tokenization and Diffusion Transformers : Abstract: Generative models for de novo protein backbone design have achieved remarkable success in creating novel protein structures. However, these diffusion-based approaches remain computationally ...
- compar:IA: The French Government's LLM arena to collect French-language human prompts and preference data : Abstract: Large Language Models (LLMs) often show reduced performance, cultural alignment, and safety robustness in non-English languages, partly because English dominates both pre-training data and h...
- Not All Layers Need Tuning: Selective Layer Restoration Recovers Diversity : Abstract: Post-training improves instruction-following and helpfulness of large language models (LLMs) but often reduces generation diversity, which leads to repetitive outputs in open-ended settings,...
- Multimodal Generative Retrieval Model with Staged Pretraining for Food Delivery on Meituan : Abstract: Multimodal retrieval models are becoming increasingly important in scenarios such as food delivery, where rich multimodal features can meet diverse user needs and enable precise retrieval. M...
- RAPID: Reconfigurable, Adaptive Platform for Iterative Design : Abstract: Developing robotic manipulation policies is iterative and hypothesis-driven: researchers test tactile sensing, gripper geometries, and sensor placements through real-world data collection an...
- Humanoid Manipulation Interface: Humanoid Whole-Body Manipulation from Robot-Free Demonstrations : Abstract: Current approaches for humanoid whole-body manipulation, primarily relying on teleoperation or visual sim-to-real reinforcement learning, are hindered by hardware logistics and complex rewar...
- Temperature Scaling Attack Disrupting Model Confidence in Federated Learning : Abstract: Predictive confidence serves as a foundational control signal in mission-critical systems, directly governing risk-aware logic such as escalation, abstention, and conservative fallback. Whil...
- Trust Regions Sell, But Who's Buying? Overlap Geometry as an Alternative Trust Region for Policy Optimization : Abstract: Standard trust-region methods constrain policy updates via Kullback-Leibler (KL) divergence. However, KL controls only an average divergence and does not directly prevent rare, large likelih...
- DAVE: Distribution-aware Attribution via ViT Gradient Decomposition : Abstract: Vision Transformers (ViTs) have become a dominant architecture in computer vision, yet producing stable and high-resolution attribution maps for these models remains challenging. Architectur...
- The challenge of generating and evolving real-life like synthetic test data without accessing real-world raw data -- a Systematic Review : Abstract: Background: High-level system testing of applications that use data from e-Government services as input requires test data that is real-life-like but where the privacy of personal informatio...
- Scaling Speech Tokenizers with Diffusion Autoencoders : Abstract: Speech tokenizers are foundational to speech language models, yet existing approaches face two major challenges: (1) balancing trade-offs between encoding semantics for understanding and aco...
- Sample-Efficient Policy Space Response Oracles with Joint Experience Best Response : Abstract: Multi-agent reinforcement learning (MARL) offers a scalable alternative to exact game-theoretic analysis but suffers from non-stationarity and the need to maintain diverse populations of str...
- Personality as Relational Infrastructure: User Perceptions of Personality-Trait-Infused LLM Messaging : Abstract: Digital behaviour change systems increasingly rely on repeated, system-initiated messages to support users in everyday contexts. LLMs enable these messages to be personalised consistently ac...
- AgentStepper: Interactive Debugging of Software Development Agents : Abstract: Software development agents powered by large language models (LLMs) have shown great promise in automating tasks like environment setup, issue solving, and program repair. Unfortunately, und...
- ProtoQuant: Quantization of Prototypical Parts For General and Fine-Grained Image Classification : Abstract: Prototypical parts-based models offer a "this looks like that" paradigm for intrinsic interpretability, yet they typically struggle with ImageNet-scale generalization and often require compu...
- Target noise: A pre-training based neural network initialization for efficient high resolution learning : Abstract: Weight initialization plays a crucial role in the optimization behavior and convergence efficiency of neural networks. Most existing initialization methods, such as Xavier and Kaiming initia...
- Exploring Sparsity and Smoothness of Arbitrary $\ell_p$ Norms in Adversarial Attacks : Abstract: Adversarial attacks against deep neural networks are commonly constructed under $\ell_p$ norm constraints, most often using $p=1$, $p=2$ or $p=\infty$, and potentially regularized for specif...
- Perturbing the Phase: Analyzing Adversarial Robustness of Complex-Valued Neural Networks : Abstract: Complex-valued neural networks (CVNNs) are rising in popularity for all kinds of applications. To safely use CVNNs in practice, analyzing their robustness against outliers is crucial. One we...
- Transformer-based Parameter Fitting of Models derived from Bloch-McConnell Equations for CEST MRI Analysis : Abstract: Chemical exchange saturation transfer (CEST) MRI is a non-invasive imaging modality for detecting metabolites. It offers higher resolution and sensitivity compared to conventional magnetic r...
- SPARC: Separating Perception And Reasoning Circuits for Test-time Scaling of VLMs : Abstract: Despite recent successes, test-time scaling - i.e., dynamically expanding the token budget during inference as needed - remains brittle for vision-language models (VLMs): unstructured chains...
- Which Graph Shift Operator? A Spectral Answer to an Empirical Question : Abstract: Graph Neural Networks (GNNs) have established themselves as the leading models for learning on graph-structured data, generally categorized into spatial and spectral approaches. Central to t...
- LIBERO-X: Robustness Litmus for Vision-Language-Action Models : Abstract: Reliable benchmarking is critical for advancing Vision-Language-Action (VLA) models, as it reveals their generalization, robustness, and alignment of perception with language-driven manipula...
- Dynamics-Aligned Shared Hypernetworks for Zero-Shot Actuator Inversion : Abstract: Zero-shot generalization in contextual reinforcement learning remains a core challenge, particularly when the context is latent and must be inferred from data. A canonical failure mode is ac...
- Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study : Abstract: Third-party agent skills extend LLM-based agents with instruction files and executable code that run on users' machines. Skills execute with user privileges and are distributed through commu...
- MTQE.en-he: Machine Translation Quality Estimation for English-Hebrew : Abstract: We release MTQE.en-he: to our knowledge, the first publicly available English-Hebrew benchmark for Machine Translation Quality Estimation. MTQE.en-he contains 959 English segments from WMT24...
- Completing Missing Annotation: Multi-Agent Debate for Accurate and Scalable Relevant Assessment for IR Benchmarks : Abstract: Information retrieval (IR) evaluation remains challenging due to incomplete IR benchmark datasets that contain unlabeled relevant chunks. While LLMs and LLM-human hybrid strategies reduce co...
- Efficient-LVSM: Faster, Cheaper, and Better Large View Synthesis Model via Decoupled Co-Refinement Attention : Abstract: Feedforward models for novel view synthesis (NVS) have recently advanced by transformer-based methods like LVSM, using attention among all input and target views. In this work, we argue that...
- Prism: Spectral Parameter Sharing for Multi-Agent Reinforcement Learning : Abstract: Parameter sharing is a key strategy in multi-agent reinforcement learning (MARL) for improving scalability, yet conventional fully shared architectures often collapse into homogeneous behavi...
- Revisiting the Shape Convention of Transformer Language Models : Abstract: Dense Transformer language models have largely adhered to one consistent architectural shape: each layer consists of an attention module followed by a feed-forward network (FFN) with a narro...
- Improve Large Language Model Systems with User Logs : Abstract: Scaling training data and model parameters has long driven progress in large language models (LLMs), but this paradigm is increasingly constrained by the scarcity of high-quality data and di...
- Principle-Evolvable Scientific Discovery via Uncertainty Minimization : Abstract: Large Language Model (LLM)-based scientific agents have accelerated scientific discovery, yet they often suffer from significant inefficiencies due to adherence to fixed initial priors. Exis...
- CORE: Comprehensive Ontological Relation Evaluation for Large Language Models : Abstract: Large Language Models (LLMs) perform well on many reasoning benchmarks, yet existing evaluations rarely assess their ability to distinguish between meaningful semantic relations and genuine ...
- TrajAD: Trajectory Anomaly Detection for Trustworthy LLM Agents : Abstract: We address the problem of runtime trajectory anomaly detection, a critical capability for enabling trustworthy LLM agents. Current safety measures predominantly focus on static input/output ...
- TrailBlazer: History-Guided Reinforcement Learning for Black-Box LLM Jailbreaking : Abstract: Large Language Models (LLMs) have become integral to many domains, making their safety a critical priority. Prior jailbreaking research has explored diverse approaches, including prompt opti...
- A methodology for analyzing financial needs hierarchy from social discussions using LLM : Abstract: This study examines the hierarchical structure of financial needs as articulated in social media discourse, employing generative AI techniques to analyze large-scale textual data. While huma...
- Investigating the structure of emotions by analyzing similarity and association of emotion words : Abstract: In the field of natural language processing, some studies have attempted sentiment analysis on text by handling emotions as explanatory or response variables. One of the most popular emotion...
- TFusionOcc: Student's t-Distribution Based Object-Centric Multi-Sensor Fusion Framework for 3D Occupancy Prediction : Abstract: 3D semantic occupancy prediction enables autonomous vehicles (AVs) to perceive fine-grained geometric and semantic structure of their surroundings from onboard sensors, which is essential fo...
- ARIS-RSMA Enhanced ISAC System: Joint Rate Splitting and Beamforming Design : Abstract: This letter proposes an active reconfigurable intelligent surface (ARIS) assisted rate-splitting multiple access (RSMA) integrated sensing and communication (ISAC) system to overcome the fai...
- Empirical Analysis of Adversarial Robustness and Explainability Drift in Cybersecurity Classifiers : Abstract: Machine learning (ML) models are increasingly deployed in cybersecurity applications such as phishing detection and network intrusion prevention. However, these models remain vulnerable to a...
- Generating High-quality Privacy-preserving Synthetic Data : Abstract: Synthetic tabular data enables sharing and analysis of sensitive records, but its practical deployment requires balancing distributional fidelity, downstream utility, and privacy protection....
- Revisiting Salient Object Detection from an Observer-Centric Perspective : Abstract: Salient object detection is inherently a subjective problem, as observers with different priors may perceive different objects as salient. However, existing methods predominantly formulate i...
- Training Data Selection with Gradient Orthogonality for Efficient Domain Adaptation : Abstract: Fine-tuning large language models (LLMs) for specialized domains often necessitates a trade-off between acquiring domain expertise and retaining general reasoning capabilities, a phenomenon ...
- SHINE: A Scalable In-Context Hypernetwork for Mapping Context to LoRA in a Single Pass : Abstract: We propose SHINE (Scalable Hyper In-context NEtwork), a scalable hypernetwork that can map diverse meaningful contexts into high-quality LoRA adapters for large language models (LLM). By reu...
- Di3PO -- Diptych Diffusion DPO for Targeted Improvements in Image : Abstract: Existing methods for preference tuning of text-to-image (T2I) diffusion models often rely on computationally expensive generation steps to create positive and negative pairs of images. These...
- Zero-Trust Runtime Verification for Agentic Payment Protocols: Mitigating Replay and Context-Binding Failures in AP2 : Abstract: The deployment of autonomous AI agents capable of executing commercial transactions has motivated the adoption of mandate-based payment authorization protocols, including the Universal Comme...
- Action Hallucination in Generative Visual-Language-Action Models : Abstract: Robot Foundation Models such as Vision-Language-Action models are rapidly reshaping how robot policies are trained and deployed, replacing hand-designed planners with end-to-end generative a...
- Can Post-Training Transform LLMs into Causal Reasoners? : Abstract: Causal inference is essential for decision-making but remains challenging for non-experts. While large language models (LLMs) show promise in this domain, their precise causal estimation cap...
- The Condensate Theorem: Transformers are $O(n)$, Not $O(n^2)$ : Abstract: We present the Condensate Theorem: attention sparsity is a learned topological property, not an architectural constraint. Through empirical analysis of trained language models, we find that ...
- Accelerating Vision Transformers on Brain Processing Unit : Abstract: With the advancement of deep learning technologies, specialized neural processing hardware such as Brain Processing Units (BPUs) have emerged as dedicated platforms for CNN acceleration, off...
- Toward generative machine learning for boosting ensembles of climate simulations : Abstract: Accurately quantifying uncertainty in predictions and projections arising from irreducible internal climate variability is critical for informed decision making. Such uncertainty is typicall...
- Can One-sided Arguments Lead to Response Change in Large Language Models? : Abstract: Polemic questions need more than one viewpoint to express a balanced answer. Large Language Models (LLMs) can provide a balanced answer, but also take a single aligned viewpoint or refuse to...
- GRP-Obliteration: Unaligning LLMs With a Single Unlabeled Prompt : Abstract: Safety alignment is only as robust as its weakest failure mode. Despite extensive work on safety post-training, it has been shown that models can be readily unaligned through post-deployment...
- Steering Safely or Off a Cliff? Rethinking Specificity and Robustness in Inference-Time Interventions : Abstract: Model steering, which involves intervening on hidden representations at inference time, has emerged as a lightweight alternative to finetuning for precisely controlling large language models...
- ASMa: Asymmetric Spatio-temporal Masking for Skeleton Action Representation Learning : Abstract: Self-supervised learning (SSL) has shown remarkable success in skeleton-based action recognition by leveraging data augmentations to learn meaningful representations. However, existing SSL m...
- REBEL: Hidden Knowledge Recovery via Evolutionary-Based Evaluation Loop : Abstract: Machine unlearning for LLMs aims to remove sensitive or copyrighted data from trained models. However, the true efficacy of current unlearning methods remains uncertain. Standard evaluation ...
- ATEX-CF: Attack-Informed Counterfactual Explanations for Graph Neural Networks : Abstract: Counterfactual explanations offer an intuitive way to interpret graph neural networks (GNNs) by identifying minimal changes that alter a model's prediction, thereby answering "what must diff...
- RuleSmith: Multi-Agent LLMs for Automated Game Balancing : Abstract: Game balancing is a longstanding challenge requiring repeated playtesting, expert intuition, and extensive manual tuning. We introduce RuleSmith, the first framework that achieves automated ...
- SR4-Fit: An Interpretable and Informative Classification Algorithm Applied to Prediction of U.S. House of Representatives Elections : Abstract: The growth of machine learning demands interpretable models for critical applications, yet most high-performing models are "black-box" systems that obscure input-output relationships, whil...
- Coupled Local and Global World Models for Efficient First Order RL : Abstract: World models offer a promising avenue for more faithfully capturing complex dynamics, including contacts and non-rigidity, as well as complex sensory information, such as visual perception, ...
- Addressing the Waypoint-Action Gap in End-to-End Autonomous Driving via Vehicle Motion Models : Abstract: End-to-End Autonomous Driving (E2E-AD) systems are typically grouped by the nature of their outputs: (i) waypoint-based models that predict a future trajectory, and (ii) action-based models ...
- Emergent Low-Rank Training Dynamics in MLPs with Smooth Activations : Abstract: Recent empirical evidence has demonstrated that the training dynamics of large-scale deep neural networks occur within low-dimensional subspaces. While this has inspired new research into lo...
- Multi-Way Representation Alignment : Abstract: The Platonic Representation Hypothesis suggests that independently trained neural networks converge to increasingly similar latent spaces. However, current strategies for mapping these repre...
- Learning Rate Scaling across LoRA Ranks and Transfer to Full Finetuning : Abstract: Low-Rank Adaptation (LoRA) is a standard tool for parameter-efficient finetuning of large models. While it induces a small memory footprint, its training dynamics can be surprisingly complex...
- AnyThermal: Towards Learning Universal Representations for Thermal Perception : Abstract: We present AnyThermal, a thermal backbone that captures robust task-agnostic thermal features suitable for a variety of tasks such as cross-modal place recognition, thermal segmentation, and...
- Personagram: Bridging Personas and Product Design for Creative Ideation with Multimodal LLMs : Abstract: Product designers often begin their design process with handcrafted personas. While personas are intended to ground design decisions in consumer preferences, they often fall short in practic...
- Generics in science communication: Misaligned interpretations across laypeople, scientists, and large language models : Abstract: Scientists often use generics, that is, unquantified statements about whole categories of people or phenomena, when communicating research findings (e.g., "statins reduce cardiovascular even...
- Optimal rates for density and mode estimation with expand-and-sparsify representations : Abstract: Expand-and-sparsify representations are a class of theoretical models that capture sparse representation phenomena observed in the sensory systems of many animals. At a high level, these rep...
- Stop the Flip-Flop: Context-Preserving Verification for Fast Revocable Diffusion Decoding : Abstract: Parallel diffusion decoding can accelerate diffusion language model inference by unmasking multiple tokens per step, but aggressive parallelism often harms quality. Revocable decoding mitiga...
- Protean Compiler: An Agile Framework to Drive Fine-grain Phase Ordering : Abstract: The phase ordering problem has been a long-standing challenge since the late 1970s, yet it remains an open problem due to having a vast optimization space and an unbounded nature, making it ...
- Hear You in Silence: Designing for Active Listening in Human Interaction with Conversational Agents Using Context-Aware Pacing : Abstract: In human conversation, empathic dialogue requires nuanced temporal cues indicating whether the conversational partner is paying attention. This type of "active listening" is overlooked in th...
- Self-Improving World Modelling with Latent Actions : Abstract: Internal modelling of the world -- predicting transitions between previous states X and next states Y under actions Z -- is essential to reasoning and planning for LLMs and VLMs. Learn...
- Urban Spatio-Temporal Foundation Models for Climate-Resilient Housing: Scaling Diffusion Transformers for Disaster Risk Prediction : Abstract: Climate hazards increasingly disrupt urban transportation and emergency-response operations by damaging housing stock, degrading infrastructure, and reducing network accessibility. This pape...
- Coding Agents with Environment Interaction: A Theoretical Perspective : Abstract: Coding agents are increasingly utilized in test-driven software development, yet the theoretical mechanisms behind their environment-interaction strategies remain underexplored. We provide a...
- NanoNet: Parameter-Efficient Learning with Label-Scarce Supervision for Lightweight Text Mining Model : Abstract: The lightweight semi-supervised learning (LSL) strategy provides an effective approach of conserving labeled samples and minimizing model inference costs. Prior research has effectively appl...
- SVRepair: Structured Visual Reasoning for Automated Program Repair : Abstract: Large language models (LLMs) have recently shown strong potential for Automated Program Repair (APR), yet most existing approaches remain unimodal and fail to leverage the rich diagnostic si...
- Transformer-Based Reinforcement Learning for Autonomous Orbital Collision Avoidance in Partially Observable Environments : Abstract: We introduce a Transformer-based Reinforcement Learning framework for autonomous orbital collision avoidance that explicitly models the effects of partial observability and imperfect monitor...
- Communication Enhances LLMs' Stability in Strategic Thinking : Abstract: Large Language Models (LLMs) often exhibit pronounced context-dependent variability that undermines predictable multi-agent behavior in tasks requiring strategic thinking. Focusing on models...
- Allocate Marginal Reviews to Borderline Papers Using LLM Comparative Ranking : Abstract: This paper argues that large ML conferences should allocate marginal review capacity primarily to papers near the acceptance boundary, rather than spreading extra reviews via random or affin...
- HQP: Sensitivity-Aware Hybrid Quantization and Pruning for Ultra-Low-Latency Edge AI Inference : Abstract: The escalating demand for high-fidelity, real-time inference in distributed edge-cloud environments necessitates aggressive model optimization to counteract severe latency and energy constra...
- iScheduler: Reinforcement Learning-Driven Continual Optimization for Large-Scale Resource Investment Problems : Abstract: Scheduling precedence-constrained tasks under shared renewable resources is central to modern computing platforms. The Resource Investment Problem (RIP) models this setting by minimizing the...
- Analyzing Diffusion and Autoregressive Vision Language Models in Multimodal Embedding Space : Abstract: Embedding models are a fundamental component of modern AI systems such as semantic search and retrieval-augmented generation. Recent advances in large foundation models have substantially ac...
- Rethinking Memory Mechanisms of Foundation Agents in the Second Half : Abstract: The research of artificial intelligence is undergoing a paradigm shift from prioritizing model innovations over benchmark scores towards emphasizing problem definition and rigorous real-worl...
- Recontextualizing Famous Quotes for Brand Slogan Generation : Abstract: Slogans are concise and memorable catchphrases that play a crucial role in advertising by conveying brand identity and shaping public perception. However, advertising fatigue reduces the eff...
- Git for Sketches: An Intelligent Tracking System for Capturing Design Evolution : Abstract: During product conceptualization, capturing the non-linear history and cognitive intent is crucial. Traditional sketching tools often lose this context. We introduce DIMES (Design Idea Manag...
- Agentic Uncertainty Reveals Agentic Overconfidence : Abstract: Can AI agents predict whether they will succeed at a task? We study agentic uncertainty by eliciting success probability estimates before, during, and after task execution. All results exhib...
- AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents : Abstract: LLM agents hold significant promise for advancing scientific research. To accelerate this progress, we introduce AIRS-Bench (the AI Research Science Benchmark), a suite of 20 tasks sourced f...
- From Features to Actions: Explainability in Traditional and Agentic AI Systems : Abstract: Over the last decade, explainable AI has primarily focused on interpreting individual model predictions, producing post-hoc explanations that relate inputs to outputs under a fixed decision ...
- An Adaptive Differentially Private Federated Learning Framework with Bi-level Optimization : Abstract: Federated learning enables collaborative model training across distributed clients while preserving data privacy. However, in practical deployments, device heterogeneity, non-independent, an...
- LLM Active Alignment: A Nash Equilibrium Perspective : Abstract: We develop a game-theoretic framework for predicting and steering the behavior of populations of large language models (LLMs) through Nash equilibrium (NE) analysis. To avoid the intractabil...
- POP: Online Structural Pruning Enables Efficient Inference of Large Foundation Models : Abstract: Large foundation models (LFMs) achieve strong performance through scaling, yet current structural pruning methods derive fixed pruning decisions during inference, overlooking sparsity patter...
- ScaleEnv: Scaling Environment Synthesis from Scratch for Generalist Interactive Tool-Use Agent Training : Abstract: Training generalist agents capable of adapting to diverse scenarios requires interactive environments for self-exploration. However, interactive environments remain critically scarce, and ex...
- Wild Guesses and Mild Guesses in Active Concept Learning : Abstract: Human concept learning is typically active: learners choose which instances to query or test in order to reduce uncertainty about an underlying rule or category. Active concept learning must...
- Towards Understanding What State Space Models Learn About Code : Abstract: State Space Models (SSMs) have emerged as an efficient alternative to the transformer architecture. Recent studies show that SSMs can match or surpass Transformers on code understanding task...
- Semantically Labelled Automata for Multi-Task Reinforcement Learning with LTL Instructions : Abstract: We study multi-task reinforcement learning (RL), a setting in which an agent learns a single, universal policy capable of generalising to arbitrary, possibly unseen tasks. We consider tasks ...
- Autoregressive Models for Knowledge Graph Generation : Abstract: Knowledge Graph (KG) generation requires models to learn complex semantic dependencies between triples while maintaining domain validity constraints. Unlike link prediction, which scores tri...
- Same Answer, Different Representations: Hidden instability in VLMs : Abstract: The robustness of Vision Language Models (VLMs) is commonly assessed through output-level invariance, implicitly assuming that stable predictions reflect stable multimodal processing. In thi...
- SeeUPO: Sequence-Level Agentic-RL with Convergence Guarantees : Abstract: Reinforcement learning (RL) has emerged as the predominant paradigm for training large language model (LLM)-based AI agents. However, existing backbone RL algorithms lack verified convergenc...
- AgentCPM-Report: Interleaving Drafting and Deepening for Open-Ended Deep Research : Abstract: Generating deep research reports requires large-scale information acquisition and the synthesis of insight-driven analysis, posing a significant challenge for current language models. Most e...
- LogicSkills: A Structured Benchmark for Formal Reasoning in Large Language Models : Abstract: Large language models have demonstrated notable performance across various logical reasoning benchmarks. However, it remains unclear which core logical skills they truly master. To address t...
- HyPER: Bridging Exploration and Exploitation for Scalable LLM Reasoning with Hypothesis Path Expansion and Reduction : Abstract: Scaling test-time compute with multi-path chain-of-thought improves reasoning accuracy, but its effectiveness depends critically on the exploration-exploitation trade-off. Existing approache...
- Progress Constraints for Reinforcement Learning in Behavior Trees : Abstract: Behavior Trees (BTs) provide a structured and reactive framework for decision-making, commonly used to switch between sub-controllers based on environmental conditions. Reinforcement Learnin...
- JADE: Expert-Grounded Dynamic Evaluation for Open-Ended Professional Tasks : Abstract: Evaluating agentic AI on open-ended professional tasks faces a fundamental dilemma between rigor and flexibility. Static rubrics provide rigorous, reproducible assessment but fail to accommo...
- AgentCPM-Explore: Realizing Long-Horizon Deep Exploration for Edge-Scale Agents : Abstract: While Large Language Model (LLM)-based agents have shown remarkable potential for solving complex tasks, existing systems remain heavily reliant on large-scale models, leaving the capabiliti...
- Intrinsic Stability Limits of Autoregressive Reasoning: Structural Consequences for Long-Horizon Execution : Abstract: Large language models (LLMs) demonstrate remarkable reasoning capabilities, yet their performance often deteriorates sharply in long-horizon tasks, exhibiting systematic breakdown beyond cer...
- Unlocking Noisy Real-World Corpora for Foundation Model Pre-Training via Quality-Aware Tokenization : Abstract: Current tokenization methods process sequential data without accounting for signal quality, limiting their effectiveness on noisy real-world corpora. We present QA-Token (Quality-Aware Token...
- Difficulty-Estimated Policy Optimization : Abstract: Recent advancements in Large Reasoning Models (LRMs), exemplified by DeepSeek-R1, have underscored the potential of scaling inference-time compute through Group Relative Policy Optimization ...
- Trifuse: Enhancing Attention-Based GUI Grounding via Multimodal Fusion : Abstract: GUI grounding maps natural language instructions to the correct interface elements, serving as the perception foundation for GUI agents. Existing approaches predominantly rely on fine-tuning...
- Exposing Weaknesses of Large Reasoning Models through Graph Algorithm Problems : Abstract: Large Reasoning Models (LRMs) have advanced rapidly; however, existing benchmarks in mathematics, code, and common-sense reasoning remain limited. They lack long-context evaluation, offer in...
- Do LLMs Act Like Rational Agents? Measuring Belief Coherence in Probabilistic Decision Making : Abstract: Large language models (LLMs) are increasingly deployed as agents in high-stakes domains where optimal actions depend on both uncertainty about the world and consideration of utilities of dif...
- Do It for HER: First-Order Temporal Logic Reward Specification in Reinforcement Learning (Extended Version) : Abstract: In this work, we propose a novel framework for the logical specification of non-Markovian rewards in Markov Decision Processes (MDPs) with large state spaces. Our approach leverages Linear T...
- Large Language Model Reasoning Failures : Abstract: Large Language Models (LLMs) have exhibited remarkable reasoning capabilities, achieving impressive results across a wide range of tasks. Despite these advances, significant reasoning failur...
- Jackpot: Optimal Budgeted Rejection Sampling for Extreme Actor-Policy Mismatch Reinforcement Learning : Abstract: Reinforcement learning (RL) for large language models (LLMs) remains expensive, particularly because the rollout is expensive. Decoupling rollout generation from policy optimization (e.g., l...
Research Sources: 392 | Generated: 2/9/2026
