AI RESEARCH PAPERS & ACADEMIC SOURCES
- From Implicit Ambiguity to Explicit Solidity: Diagnosing Interior Geometric Degradation in Neural Radiance Fields for Dense 3D Scene Understanding : Abstract: Neural Radiance Fields (NeRFs) have emerged as a powerful paradigm for multi-view reconstruction, complementing classical photogrammetric pipelines based on Structure-from-Motion (SfM) and M...
- Dual Frequency Branch Framework with Reconstructed Sliding Windows Attention for AI-Generated Image Detection : Abstract: The rapid advancement of Generative Adversarial Networks (GANs) and diffusion models has enabled the creation of highly realistic synthetic images, presenting significant societal risks, suc...
- Scale Contrastive Learning with Selective Attentions for Blind Image Quality Assessment : Abstract: Human visual perception naturally evaluates image quality across multiple scales, a hierarchical process that existing blind image quality assessment (BIQA) algorithms struggle to replicate ...
- Learning A Physical-aware Diffusion Model Based on Transformer for Underwater Image Enhancement : Abstract: Underwater visuals undergo various complex degradations, inevitably influencing the efficiency of underwater vision tasks. Recently, diffusion models were employed to underwater image enhanc...
- UPDA: Unsupervised Progressive Domain Adaptation for No-Reference Point Cloud Quality Assessment : Abstract: While no-reference point cloud quality assessment (NR-PCQA) approaches have achieved significant progress over the past decade, their performance often degrades substantially when a distribu...
- Learning Perceptual Representations for Gaming NR-VQA with Multi-Task FR Signals : Abstract: No-reference video quality assessment (NR-VQA) for gaming videos is challenging due to limited human-rated datasets and unique content characteristics including fast motion, stylized graphic...
- U-DAVI: Uncertainty-Aware Diffusion-Prior-Based Amortized Variational Inference for Image Reconstruction : Abstract: Ill-posed imaging inverse problems remain challenging due to the ambiguity in mapping degraded observations to clean images. Diffusion-based generative priors have recently shown promise, bu...
- Mitigating Error Accumulation in Continuous Navigation via Memory-Augmented Kalman Filtering : Abstract: Continuous navigation in complex environments is critical for Unmanned Aerial Vehicles (UAVs). However, the existing Vision-Language Navigation (VLN) models follow the dead-reckoning, which it...
- Stroke of Surprise: Progressive Semantic Illusions in Vector Sketching : Abstract: Visual illusions traditionally rely on spatial manipulations such as multi-view consistency. In this work, we introduce Progressive Semantic Illusions, a novel vector sketching task where a ...
- Best of Both Worlds: Multimodal Reasoning and Generation via Unified Discrete Flow Matching : Abstract: We propose UniDFlow, a unified discrete flow-matching framework for multimodal understanding, generation, and editing. It decouples understanding and generation via task-specific low-rank ad...
- EO-VAE: Towards A Multi-sensor Tokenizer for Earth Observation Data : Abstract: State-of-the-art generative image and video models rely heavily on tokenizers that compress high-dimensional inputs into more efficient latent representations. While this paradigm has revolu...
- DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation : Abstract: Recent advancements in foundation models have revolutionized joint audio-video generation. However, existing approaches typically treat human-centric tasks including reference-based audio-vi...
- TexSpot: 3D Texture Enhancement with Spatially-uniform Point Latent Representation : Abstract: High-quality 3D texture generation remains a fundamental challenge due to the view-inconsistency inherent in current mainstream multi-view diffusion pipelines. Existing representations eithe...
- FAIL: Flow Matching Adversarial Imitation Learning for Image Generation : Abstract: Post-training of flow matching models (aligning the output distribution with a high-quality target) is mathematically equivalent to imitation learning. While Supervised Fine-Tuning mimics expe...
- PosterOmni: Generalized Artistic Poster Creation via Task Distillation and Unified Reward Feedback : Abstract: Image-to-poster generation is a high-demand task requiring not only local adjustments but also high-level design understanding. Models must generate text, layout, style, and visual elements ...
- AssetFormer: Modular 3D Assets Generation with Autoregressive Transformer : Abstract: The digital industry demands high-quality, diverse modular 3D assets, especially for user-generated content (UGC). In this work, we introduce AssetFormer, an autoregressive Transformer-based...
- GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning : Abstract: Vision-language-action (VLA) models that directly predict multi-step action chunks from current observations face inherent limitations due to constrained scene understanding and weak future ...
- A DMD-Based Adaptive Modulation Method for High Dynamic Range Imaging in High-Glare Environments : Abstract: Background: The accuracy of photomechanics measurements critically relies on image quality, particularly under extreme illumination conditions such as welding arc monitoring and polished metal...
- Projected Representation Conditioning for High-fidelity Novel View Synthesis : Abstract: We propose a novel framework for diffusion-based novel view synthesis in which we leverage external representations as conditions, harnessing their geometric and semantic correspondence prop...
- Can Local Vision-Language Models improve Activity Recognition over Vision Transformers? -- Case Study on Newborn Resuscitation : Abstract: Accurate documentation of newborn resuscitation is essential for quality improvement and adherence to clinical guidelines, yet remains underutilized in practice. Previous work using 3D-CNNs ...
- Spatial Chain-of-Thought: Bridging Understanding and Generation Models for Spatial Reasoning Generation : Abstract: While diffusion models have shown exceptional capabilities in aesthetic image synthesis, they often struggle with complex spatial understanding and reasoning. Existing approaches resort to M...
- DiffPlace: Street View Generation via Place-Controllable Diffusion Model Enhancing Place Recognition : Abstract: Generative models have advanced significantly in realistic image synthesis, with diffusion models excelling in quality and stability. Recent multi-view diffusion models improve 3D-aware stre...
- WorldTree: Towards 4D Dynamic Worlds from Monocular Video using Tree-Chains : Abstract: Dynamic reconstruction has achieved remarkable progress, but there remain challenges in monocular input for more practical applications. The prevailing works attempt to construct efficient m...
- JEPA-VLA: Video Predictive Embedding is Needed for VLA Models : Abstract: Recent vision-language-action (VLA) models built upon pretrained vision-language models (VLMs) have achieved significant improvements in robotic manipulation. However, current VLAs still suf...
- Efficient Segment Anything with Depth-Aware Fusion and Limited Training Data : Abstract: Segment Anything Models (SAM) achieve impressive universal segmentation performance but require massive datasets (e.g., 11M images) and rely solely on RGB inputs. Recent efficient variants r...
- Light4D: Training-Free Extreme Viewpoint 4D Video Relighting : Abstract: Recent advances in diffusion-based generative models have established a new paradigm for image and video relighting. However, extending these capabilities to 4D relighting remains challengin...
- Code2Worlds: Empowering Coding LLMs for 4D World Generation : Abstract: Achieving spatial intelligence requires moving beyond visual plausibility to build world simulators grounded in physical laws. While coding LLMs have advanced static 3D scene generation, ext...
- Adaptive Debiasing Tsallis Entropy for Test-Time Adaptation : Abstract: Mainstream Test-Time Adaptation (TTA) methods for adapting vision-language models, e.g., CLIP, typically rely on Shannon Entropy (SE) at test time to measure prediction uncertainty and incon...
- STVG-R1: Incentivizing Instance-Level Reasoning and Grounding in Videos via Reinforcement Learning : Abstract: In vision-language models (VLMs), misalignment between textual descriptions and visual coordinates often induces hallucinations. This issue becomes particularly severe in dense prediction ta...
- GSO-SLAM: Bidirectionally Coupled Gaussian Splatting and Direct Visual Odometry : Abstract: We propose GSO-SLAM, a real-time monocular dense SLAM system that leverages Gaussian scene representation. Unlike existing methods that couple tracking and mapping with a unified scene, incu...
- TG-Field: Geometry-Aware Radiative Gaussian Fields for Tomographic Reconstruction : Abstract: 3D Gaussian Splatting (3DGS) has revolutionized 3D scene representation with superior efficiency and quality. While recent adaptations for computed tomography (CT) show promise, they struggl...
- RI-Mamba: Rotation-Invariant Mamba for Robust Text-to-Shape Retrieval : Abstract: 3D assets have rapidly expanded in quantity and diversity due to the growing popularity of virtual reality and gaming. As a result, text-to-shape retrieval has become essential in facilitati...
- U-Net with Hadamard Transform and DCT Latent Spaces for Next-day Wildfire Spread Prediction : Abstract: We developed a lightweight and computationally efficient tool for next-day wildfire spread prediction using multimodal satellite data as input. The deep learning model, which we call Transfo...
- Egocentric Gaze Estimation via Neck-Mounted Camera : Abstract: This paper introduces neck-mounted view gaze estimation, a new task that estimates user gaze from the neck-mounted camera perspective. Prior work on egocentric gaze estimation, which predict...
- Clutt3R-Seg: Sparse-view 3D Instance Segmentation for Language-grounded Grasping in Cluttered Scenes : Abstract: Reliable 3D instance segmentation is fundamental to language-grounded robotic manipulation. Its critical application lies in cluttered environments, where occlusions, limited viewpoints, and...
- EmoSpace: Fine-Grained Emotion Prototype Learning for Immersive Affective Content Generation : Abstract: Emotion is important for creating compelling virtual reality (VR) content. Although some generative methods have been applied to lower the barrier to creating emotionally rich content, they ...
- GR-Diffusion: 3D Gaussian Representation Meets Diffusion in Whole-Body PET Reconstruction : Abstract: Positron emission tomography (PET) reconstruction is a critical challenge in molecular imaging, often hampered by noise amplification, structural blurring, and detail loss due to sparse samp...
- Electrostatics-Inspired Surface Reconstruction (EISR): Recovering 3D Shapes as a Superposition of Poisson's PDE Solutions : Abstract: Implicit shape representation, such as SDFs, is a popular approach to recover the surface of a 3D shape as the level sets of a scalar field. Several methods approximate SDFs using machine le...
- A Large Language Model for Disaster Structural Reconnaissance Summarization : Abstract: Artificial Intelligence (AI)-aided vision-based Structural Health Monitoring (SHM) has emerged as an effective approach for monitoring and assessing structural condition by analyzing image a...
- Move What Matters: Parameter-Efficient Domain Adaptation via Optimal Transport Flow for Collaborative Perception : Abstract: Fast domain adaptation remains a fundamental challenge for deploying multi-agent systems across diverse environments in Vehicle-to-Everything (V2X) collaborative perception. Despite the succ...
- LUVE: Latent-Cascaded Ultra-High-Resolution Video Generation with Dual Frequency Experts : Abstract: Recent advances in video diffusion models have significantly improved visual quality, yet ultra-high-resolution (UHR) video generation remains a formidable challenge due to the compounded di...
- Supervise-assisted Multi-modality Fusion Diffusion Model for PET Restoration : Abstract: Positron emission tomography (PET) offers powerful functional imaging but involves radiation exposure. Efforts to reduce this exposure by lowering the radiotracer dose or scan time can degra...
- Vascular anatomy-aware self-supervised pre-training for X-ray angiogram analysis : Abstract: X-ray angiography is the gold standard imaging modality for cardiovascular diseases. However, current deep learning approaches for X-ray angiogram analysis are severely constrained by the sc...
- What if Agents Could Imagine? Reinforcing Open-Vocabulary HOI Comprehension through Generation : Abstract: Multimodal Large Language Models have shown promising capabilities in bridging visual and textual reasoning, yet their reasoning capabilities in Open-Vocabulary Human-Object Interaction (OV-...
- Arbitrary Ratio Feature Compression via Next Token Prediction : Abstract: Feature compression is increasingly important for improving the efficiency of downstream tasks, especially in applications involving large-scale or multi-modal data. While existing methods t...
- A Dual-Branch Framework for Semantic Change Detection with Boundary and Temporal Awareness : Abstract: Semantic Change Detection (SCD) aims to detect and categorize land-cover changes from bi-temporal remote sensing images. Existing methods often suffer from blurred boundaries and inadequate ...
- Ctrl&Shift: High-Quality Geometry-Aware Object Manipulation in Visual Generation : Abstract: Object-level manipulation, relocating or reorienting objects in images or videos while preserving scene realism, is central to film post-production, AR, and creative editing. Yet existing me...
- ArtContext: Contextualizing Artworks with Open-Access Art History Articles and Wikidata Knowledge through a LoRA-Tuned CLIP Model : Abstract: Many Art History articles discuss artworks in general as well as specific parts of works, such as layout, iconography, or material culture. However, when viewing an artwork, it is not trivia...
- Exploring Real-Time Super-Resolution: Benchmarking and Fine-Tuning for Streaming Content : Abstract: Recent advancements in real-time super-resolution have enabled higher-quality video streaming, yet existing methods struggle with the unique challenges of compressed video content. Commonly ...
- MDE-VIO: Enhancing Visual-Inertial Odometry Using Learned Depth Priors : Abstract: Traditional monocular Visual-Inertial Odometry (VIO) systems struggle in low-texture environments where sparse visual features are insufficient for accurate pose estimation. To address this,...
- Selective Prior Synchronization via SYNC Loss : Abstract: Prediction under uncertainty is a critical requirement for the deep neural network to succeed responsibly. This paper focuses on selective prediction, which allows DNNs to make informed deci...
- Advancing Digital Twin Generation Through a Novel Simulation Framework and Quantitative Benchmarking : Abstract: The generation of 3D models from real-world objects has often been accomplished through photogrammetry, i.e., by taking 2D photos from a variety of perspectives and then triangulating matche...
- Stress Tests REVEAL Fragile Temporal and Visual Grounding in Video-Language Models : Abstract: This work investigates a fundamental question: Do Video-Language Models (VidLMs) robustly account for video content, temporal sequence, and motion? Our investigation shows that, surprisingly...
- ReTracing: An Archaeological Approach Through Body, Machine, and Generative Systems : Abstract: We present ReTracing, a multi-agent embodied performance art that adopts an archaeological approach to examine how artificial intelligence shapes, constrains, and produces bodily movement. D...
- DD-MDN: Human Trajectory Forecasting with Diffusion-Based Dual Mixture Density Networks and Uncertainty Self-Calibration : Abstract: Human Trajectory Forecasting (HTF) predicts future human movements from past trajectories and environmental context, with applications in Autonomous Driving, Smart Surveillance, and Human-Ro...
- Embodied Agents Meet Personalization: Investigating Challenges and Solutions Through the Lens of Memory Utilization : Abstract: LLM-powered embodied agents have shown success on conventional object-rearrangement tasks, but providing personalized assistance that leverages user-specific knowledge from past interactions...
- Artificial intelligence is creating a new global linguistic hierarchy : Abstract: Artificial intelligence (AI) has the potential to transform healthcare, education, governance and socioeconomic equity, but its benefits remain concentrated in a small number of languages (B...
- More Haste, Less Speed: Weaker Single-Layer Watermark Improves Distortion-Free Watermark Ensembles : Abstract: Watermarking has emerged as a crucial technique for detecting and attributing content generated by large language models. While recent advancements have utilized watermark ensembles to enhan...
- Mask What Matters: Mitigating Object Hallucinations in Multimodal Large Language Models with Object-Aligned Visual Contrastive Decoding : Abstract: We study object hallucination in Multimodal Large Language Models (MLLMs) and improve visual contrastive decoding (VCD) by constructing an object-aligned auxiliary view. We leverage object-c...
- Jailbreaking Leaves a Trace: Understanding and Detecting Jailbreak Attacks from Internal Representations of Large Language Models : Abstract: Jailbreaking large language models (LLMs) has emerged as a critical security challenge with the widespread deployment of conversational AI systems. Adversarial users exploit these models thr...
- ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning : Abstract: Building general-purpose embodied agents across diverse hardware remains a central challenge in robotics, often framed as the "one-brain, many-forms" paradigm. Progress is hindered by frag...
- Agent-Diff: Benchmarking LLM Agents on Enterprise API Tasks via Code Execution with State-Diff-Based Evaluation : Abstract: We present Agent-Diff, a novel benchmarking framework for evaluating agentic Large Language Models (LLMs) on real-world tasks that execute code via external APIs. Agentic LLM performance var...
- Althea: Human-AI Collaboration for Fact-Checking and Critical Reasoning : Abstract: The web's information ecosystem demands fact-checking systems that are both scalable and epistemically trustworthy. Automated approaches offer efficiency but often lack transparency, while h...
- On-Policy Context Distillation for Language Models : Abstract: Context distillation enables language models to internalize in-context knowledge into their parameters. In our work, we propose On-Policy Context Distillation (OPCD), a framework that bridge...
- Detecting Overflow in Compressed Token Representations for Retrieval-Augmented Generation : Abstract: Efficient long-context processing remains a crucial challenge for contemporary large language models (LLMs), especially in resource-constrained environments. Soft compression architectures p...
- ExStrucTiny: A Benchmark for Schema-Variable Structured Information Extraction from Document Images : Abstract: Enterprise documents, such as forms and reports, embed critical information for downstream applications like data archiving, automated workflows, and analytics. Although generalist Vision La...
- Query-focused and Memory-aware Reranker for Long Context Processing : Abstract: Built upon the existing analysis of retrieval heads in large language models, we propose an alternative reranking framework that trains models to estimate passage-query relevance using the a...
- CitiLink-Minutes: A Multilayer Annotated Dataset of Municipal Meeting Minutes : Abstract: City councils play a crucial role in local governance, directly influencing citizens' daily lives through decisions made during municipal meetings. These deliberations are formally documente...
- WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models : Abstract: With the rapid integration of advanced reasoning capabilities into spoken dialogue models, the field urgently demands benchmarks that transcend simple interactions to address real-world comp...
- A Rule-based Computational Model for Gaidhlig Morphology : Abstract: Language models and software tools are essential to support the continuing vitality of lesser-used languages; however, currently popular neural models require considerable data for training,...
- P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling : Abstract: Personalized alignment of large language models seeks to adapt responses to individual user preferences, typically via reinforcement learning. A key challenge is obtaining accurate, user-spe...
- Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models : Abstract: Large-scale verifiable prompts underpin the success of Reinforcement Learning with Verifiable Rewards (RLVR), but they contain many uninformative examples and are costly to expand further. R...
- Disentangling Ambiguity from Instability in Large Language Models: A Clinical Text-to-SQL Case Study : Abstract: Deploying large language models for clinical Text-to-SQL requires distinguishing two qualitatively different causes of output diversity: (i) input ambiguity that should trigger clarification...
- LaCy: What Small Language Models Can and Should Learn is Not Just a Question of Loss : Abstract: Language models have consistently grown to compress more world knowledge into their parameters, but the knowledge that can be pretrained into them is upper-bounded by their parameter size. E...
- Automatic Simplification of Common Vulnerabilities and Exposures Descriptions : Abstract: Understanding cyber security is increasingly important for individuals and organizations. However, a lot of information related to cyber security can be difficult to understand to those not ...
- DHPLT: large-scale multilingual diachronic corpora and word representations for semantic change modelling : Abstract: In this resource paper, we present DHPLT, an open collection of diachronic corpora in 41 diverse languages. DHPLT is based on the web-crawled HPLT datasets; we use web crawl timestamps as th...
- Scaling Model and Data for Multilingual Machine Translation with Open Large Language Models : Abstract: Open large language models (LLMs) have demonstrated improving multilingual capabilities in recent years. In this paper, we present a study of open LLMs for multilingual machine translation (...
- Do Large Language Models Adapt to Language Variation across Socioeconomic Status? : Abstract: Humans adjust their linguistic style to the audience they are addressing. However, the extent to which LLMs adapt to different social contexts is largely unknown. As these models increasingl...
- Who is the richest club in the championship? Detecting and Rewriting Underspecified Questions Improve QA Performance : Abstract: Large language models (LLMs) perform well on well-posed questions, yet standard question-answering (QA) benchmarks remain far from solved. We argue that this gap is partly due to underspecif...
- Cross-Modal Robustness Transfer (CMRT): Training Robust Speech Translation Models Using Adversarial Text : Abstract: End-to-End Speech Translation (E2E-ST) has seen significant advancements, yet current models are primarily benchmarked on curated, "clean" datasets. This overlooks critical real-world challe...
- Benchmark Illusion: Disagreement among LLMs and Its Scientific Consequences : Abstract: Benchmarks underpin how progress in large language models (LLMs) is measured and trusted. Yet our analyses reveal that apparent convergence in benchmark accuracy can conceal deep epistemic d...
- LLM-based Triplet Extraction from Financial Reports : Abstract: Corporate financial reports are a valuable source of structured knowledge for Knowledge Graph construction, but the lack of annotated ground truth in this domain makes evaluation difficult. ...
- A Subword Embedding Approach for Variation Detection in Luxembourgish User Comments : Abstract: This paper presents an embedding-based approach to detecting variation without relying on prior normalisation or predefined variant lists. The method trains subword embeddings on raw text an...
- Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning : Abstract: Achieving effective test-time scaling requires models to engage in In-Context Exploration: the intrinsic ability to generate, verify, and refine multiple reasoning hypotheses within a sing...
- Thinking with Drafting: Optical Decompression via Logical Reconstruction : Abstract: Existing multimodal large language models have achieved high-fidelity visual perception and exploratory visual generation. However, a precision paradox persists in complex reasoning tasks: o...
- Finding Sense in Nonsense with Generated Contexts: Perspectives from Humans and Language Models : Abstract: Nonsensical and anomalous sentences have been instrumental in the development of computational models of semantic interpretation. A core challenge is to distinguish between what is merely an...
- Which Feedback Works for Whom? Differential Effects of LLM-Generated Feedback Elements Across Learner Profiles : Abstract: Large language models (LLMs) show promise for automatically generating feedback in education settings. However, it remains unclear how specific feedback elements, such as tone and informatio...
- PACE: Prefix-Protected and Difficulty-Aware Compression for Efficient Reasoning : Abstract: Language Reasoning Models (LRMs) achieve strong performance by scaling test-time computation but often suffer from "overthinking", producing excessively long reasoning traces that increase...
- Scene-Aware Memory Discrimination: Deciding Which Personal Knowledge Stays : Abstract: Intelligent devices have become deeply integrated into everyday life, generating vast amounts of user interactions that form valuable personal knowledge. Efficient organization of this knowl...
- PRIME: A Process-Outcome Alignment Benchmark for Verifiable Reasoning in Mathematics and Engineering : Abstract: While model-based verifiers are essential for scaling Reinforcement Learning with Verifiable Rewards (RLVR), current outcome-centric verification paradigms primarily focus on the consistency...
- SIGHT: Reinforcement Learning with Self-Evidence and Information-Gain Diverse Branching for Search Agent : Abstract: Reinforcement Learning (RL) has empowered Large Language Models (LLMs) to master autonomous search for complex question answering. However, particularly within multi-turn search scenarios, t...
- Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm : Abstract: Pretraining large language models (LLMs) typically requires centralized clusters with thousands of high-memory GPUs (e.g., H100/A100). Recent decentralized training methods reduce communicat...
- When Audio-LLMs Don't Listen: A Cross-Linguistic Study of Modality Arbitration : Abstract: When audio and text conflict, speech-enabled language models follow the text 10 times more often than when arbitrating between two text sources, even when explicitly instructed to trust the ...
- ADRD-Bench: A Preliminary LLM Benchmark for Alzheimer's Disease and Related Dementias : Abstract: Large language models (LLMs) have shown great potential for healthcare applications. However, existing evaluation benchmarks provide minimal coverage of Alzheimer's Disease and Related Demen...
- LoopFormer: Elastic-Depth Looped Transformers for Latent Reasoning via Shortcut Modulation : Abstract: Looped Transformers have emerged as an efficient and powerful class of models for reasoning in the language domain. Recent studies show that these models achieve strong performance on algori...
- Advancing AI Trustworthiness Through Patient Simulation: Risk Assessment of Conversational Agents for Antidepressant Selection : Abstract: Objective: This paper introduces a patient simulator designed to enable scalable, automated evaluation of healthcare conversational agents. The simulator generates realistic, controllable pa...
- Evaluating Alignment of Behavioral Dispositions in LLMs : Abstract: As LLMs integrate into our daily lives, understanding their behavior becomes essential. In this work, we focus on behavioral dispositions, the underlying tendencies that shape responses in ...
- Are Aligned Large Language Models Still Misaligned? : Abstract: Misalignment in Large Language Models (LLMs) arises when model behavior diverges from human expectations and fails to simultaneously satisfy safety, value, and cultural dimensions, which mus...
- SurveyLens: A Research Discipline-Aware Benchmark for Automatic Survey Generation : Abstract: The exponential growth of scientific literature has driven the evolution of Automatic Survey Generation (ASG) from simple pipelines to multi-agent frameworks and commercial Deep Research age...
- The Automatic Verification of Image-Text Claims (AVerImaTeC) Shared Task : Abstract: The Automatic Verification of Image-Text Claims (AVerImaTeC) shared task aims to advance system development for retrieving evidence and verifying real-world image-text claims. Participants w...
- Mechanistic Evidence for Faithfulness Decay in Chain-of-Thought Reasoning : Abstract: Chain-of-Thought (CoT) explanations are widely used to interpret how language models solve complex problems, yet it remains unclear whether these step-by-step explanations reflect how the mo...
- MetaMem: Evolving Meta-Memory for Knowledge Utilization through Self-Reflective Symbolic Optimization : Abstract: Existing memory systems enable Large Language Models (LLMs) to support long-horizon human-LLM interactions by persisting historical interactions beyond limited context windows. However, whil...
- Code Mixologist: A Practitioner's Guide to Building Code-Mixed LLMs : Abstract: Code-mixing and code-switching (CSW) remain challenging phenomena for large language models (LLMs). Despite recent advances in multilingual modeling, LLMs often struggle in mixed-language se...
- Mechanistic Interpretability for Large Language Model Alignment: Progress, Challenges, and Future Directions : Abstract: Large language models (LLMs) have achieved remarkable capabilities across diverse tasks, yet their internal decision-making processes remain largely opaque. Mechanistic interpretability (i.e...
- Barriers to Discrete Reasoning with Transformers: A Survey Across Depth, Exactness, and Bandwidth : Abstract: Transformers have become the foundational architecture for a broad spectrum of sequence modeling applications, underpinning state-of-the-art systems in natural language processing, vision, a...
- Author-in-the-Loop Response Generation and Evaluation: Integrating Author Expertise and Intent in Responses to Peer Review : Abstract: Author response (rebuttal) writing is a critical stage of scientific peer review that demands substantial author effort. Recent work frames this task as automatic text generation, underusing...
- Synthesizing the Virtual Advocate: A Multi-Persona Speech Generation Framework for Diverse Linguistic Jurisdictions in Indic Languages : Abstract: Legal advocacy requires a unique combination of authoritative tone, rhythmic pausing for emphasis, and emotional intelligence. This study investigates the performance of the Gemini 2.5 Flash...
- PRIME: Policy-Reinforced Iterative Multi-agent Execution for Algorithmic Reasoning in Large Language Models : Abstract: Large language models have demonstrated remarkable capabilities across diverse reasoning tasks, yet their performance on algorithmic reasoning remains limited. To handle this limitation, we ...
- Retrieval Heads are Dynamic : Abstract: Recent studies have identified "retrieval heads" in Large Language Models (LLMs) responsible for extracting information from input contexts. However, prior works largely rely on static stati...
- Response-Based Knowledge Distillation for Multilingual Jailbreak Prevention Unwittingly Compromises Safety : Abstract: Large language models (LLMs) are increasingly deployed worldwide, yet their safety alignment remains predominantly English-centric. This allows for vulnerabilities in non-English contexts, e...
- Empirical Likelihood-Based Fairness Auditing: Distribution-Free Certification and Flagging : Abstract: Machine learning models in high-stakes applications, such as recidivism prediction and automated personnel selection, often exhibit systematic performance disparities across sensitive subpop...
- SonicSieve: Bringing Directional Speech Extraction to Smartphones Using Acoustic Microstructures : Abstract: Imagine placing your smartphone on a table in a noisy restaurant and clearly capturing the voices of friends seated around you, or recording a lecturer's voice with clarity in a reverberant ...
- Controlling Dynamical Systems into Unseen Target States Using Machine Learning : Abstract: We present a novel, model-free, and data-driven methodology for controlling complex dynamical systems into previously unseen target states, including those with significantly different and c...
- Feature-Based Interpretable Surrogates for Optimization : Abstract: For optimization models to be used in practice, it is crucial that users trust the results. A key factor in this aspect is the interpretability of the solution process. A previous framework ...
- Accelerating Large Language Model Inference with Self-Supervised Early Exits : Abstract: This paper presents a modular approach to accelerate inference in large language models (LLMs) by adding early exit heads at intermediate transformer layers. Each head is trained in a self-s...
- Hyperparameter Transfer with Mixture-of-Expert Layers : Abstract: Mixture-of-Experts (MoE) layers have emerged as an important tool in scaling up modern neural networks by decoupling total trainable parameters from activated parameters in the forward pass ...
- Fine-tuning Quantized Neural Networks with Zeroth-order Optimization : Abstract: As the size of large language models grows exponentially, GPU memory has become a bottleneck for adapting these models to downstream tasks. In this paper, we aim to push the limits of memory...
- SeqRisk: Transformer-augmented latent variable model for robust survival prediction with longitudinal data : Abstract: In healthcare, risk assessment of patient outcomes has been based on survival analysis for a long time, i.e. modeling time-to-event associations. However, conventional approaches rely on dat...
- Optimizing Sampling Patterns for Compressed Sensing MRI with Diffusion Generative Models : Abstract: Magnetic resonance imaging (MRI) is a powerful medical imaging modality, but long acquisition times limit throughput, patient comfort, and clinical accessibility. Diffusion-based generative ...
- Accelerating nuclear-norm regularized low-rank matrix optimization through Burer-Monteiro decomposition : Abstract: This work proposes a rapid algorithm, BM-Global, for nuclear-norm-regularized convex and low-rank matrix optimization problems. BM-Global efficiently decreases the objective value via low-co...
- Learning to Control: The iUzawa-Net for Nonsmooth Optimal Control of Linear PDEs : Abstract: We propose an optimization-informed deep neural network approach, named iUzawa-Net, aiming for the first solver that enables real-time solutions for a class of nonsmooth optimal control prob...
- MonarchRT: Efficient Attention for Real-Time Video Generation : Abstract: Real-time video generation with Diffusion Transformers is bottlenecked by the quadratic cost of 3D self-attention, especially in real-time regimes that are both few-step and autoregressive, ...
- T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative Optimization : Abstract: Diffusion large language models (DLLMs) have the potential to enable fast text generation by decoding multiple tokens in parallel. However, in practice, their inference efficiency is constra...
- Is Online Linear Optimization Sufficient for Strategic Robustness? : Abstract: We consider bidding in repeated Bayesian first-price auctions. Bidding algorithms that achieve optimal regret have been extensively studied, but their strategic robustness to the seller's ma...
- Moonshine v2: Ergodic Streaming Encoder ASR for Latency-Critical Speech Applications : Abstract: Latency-critical speech applications (e.g., live transcription, voice commands, and real-time translation) demand low time-to-first-token (TTFT) and high transcription accuracy, particularly...
- Convex Markov Games and Beyond: New Proof of Existence, Characterization and Learning Algorithms for Nash Equilibria : Abstract: Convex Markov Games (cMGs) were recently introduced as a broad class of multi-agent learning problems that generalize Markov games to settings where strategic agents optimize general utiliti...
- Towards Personalized Bangla Book Recommendation: A Large-Scale Multi-Entity Book Graph Dataset : Abstract: Personalized book recommendation in Bangla literature has been constrained by the lack of structured, large-scale, and publicly available datasets. This work introduces RokomariBG, a large-s...
- Iskra: A System for Inverse Geometry Processing : Abstract: We propose a system for differentiating through solutions to geometry processing problems. Our system differentiates a broad class of geometric algorithms, exploiting existing fast problem-s...
- Safety Beyond the Training Data: Robust Out-of-Distribution MPC via Conformalized System Level Synthesis : Abstract: We present a novel framework for robust out-of-distribution planning and control using conformal prediction (CP) and system level synthesis (SLS), addressing the challenge of ensuring safety...
- The Implicit Bias of Logit Regularization : Abstract: Logit regularization, the addition of a convex penalty directly in logit space, is widely used in modern classifiers, with label smoothing as a prominent example. While such methods often impro...
- Calibrated Bayesian Deep Learning for Explainable Decision Support Systems Based on Medical Imaging : Abstract: In critical decision support systems based on medical imaging, the reliability of AI-assisted decision-making is as relevant as predictive accuracy. Although deep learning models have demons...
- Benchmarking Vision-Language Models for French PDF-to-Markdown Conversion : Abstract: This report evaluates PDF-to-Markdown conversion using recent Vision-Language Models (VLMs) on challenging French documents. Document parsing is a critical step for Retrieval-Augmented Gener...
- Insights on Muon from Simple Quadratics : Abstract: Muon updates weight matrices along (approximate) polar factors of the gradients and has shown strong empirical performance in large-scale training. Existing attempts at explaining its perfor...
- TADA! Tuning Audio Diffusion Models through Activation Steering : Abstract: Audio diffusion models can synthesize high-fidelity music from text, yet their internal mechanisms for representing high-level concepts remain poorly understood. In this work, we use activat...
- Echo: Towards Advanced Audio Comprehension via Audio-Interleaved Reasoning : Abstract: The maturation of Large Audio Language Models (LALMs) has raised growing expectations for them to comprehend complex audio much like humans. Current efforts primarily replicate text-based re...
- DMAP: A Distribution Map for Text : Abstract: Large Language Models (LLMs) are a powerful tool for statistical text analysis, with derived sequences of next-token probability distributions offering a wealth of information. Extracting th...
- Scale-Invariant Fast Convergence in Games : Abstract: Scale-invariance in games has recently emerged as a widely valued desirable property. Yet, almost all fast convergence guarantees in learning in games require prior knowledge of the utility ...
- Free Lunch for Stabilizing Rectified Flow Inversion : Abstract: Rectified-Flow (RF)-based generative models have recently emerged as strong alternatives to traditional diffusion models, demonstrating state-of-the-art performance across various tasks. By ...
- EqDeepRx: Learning a Scalable MIMO Receiver : Abstract: While machine learning (ML)-based receiver algorithms have received a great deal of attention in the recent literature, they often suffer from poor scaling with increasing spatial multiplexi...
- A Comparative Study of MAP and LMMSE Estimators for Blind Inverse Problems : Abstract: Maximum-a-posteriori (MAP) approaches are an effective framework for inverse problems with known forward operators, particularly when combined with expressive priors and careful parameter se...
- How to Sample High Quality 3D Fractals for Action Recognition Pre-Training? : Abstract: Synthetic datasets are being recognized in the deep learning realm as a valuable alternative to exhaustively labeled real data. One such synthetic data generation method is Formula Driven Su...
- Decentralized Non-convex Stochastic Optimization with Heterogeneous Variance : Abstract: Decentralized optimization is critical for solving large-scale machine learning problems over distributed networks, where multiple nodes collaborate through local communication. In practice,...
- Aggregate Models, Not Explanations: Improving Feature Importance Estimation : Abstract: Feature-importance methods show promise in transforming machine learning models from predictive engines into tools for scientific discovery. However, due to data sampling and algorithmic sto...
- PAC-Bayesian Generalization Guarantees for Fairness on Stochastic and Deterministic Classifiers : Abstract: Classical PAC generalization bounds on the prediction risk of a classifier are insufficient to provide theoretical guarantees on fairness when the goal is to learn models balancing predictiv...
- Estimation of instrument and noise parameters for inverse problem based on prior diffusion model : Abstract: This article addresses the issue of estimating observation parameters (response and error parameters) in inverse problems. The focus is on cases where regularization is introduced in a Bayes...
- LAER-MoE: Load-Adaptive Expert Re-layout for Efficient Mixture-of-Experts Training : Abstract: Expert parallelism is vital for effectively training Mixture-of-Experts (MoE) models, enabling different devices to host distinct experts, with each device processing different input data. H...
- Enforcing Reciprocity in Operator Learning for Seismic Wave Propagation : Abstract: Accurate and efficient wavefield modeling underpins seismic structure and source studies. Traditional methods comply with physical laws but are computationally intensive. Data-driven methods...
- PLESS: Pseudo-Label Enhancement with Spreading Scribbles for Weakly Supervised Segmentation : Abstract: Weakly supervised learning with scribble annotations uses sparse user-drawn strokes to indicate segmentation labels on a small subset of pixels. This annotation reduces the cost of dense pix...
- HyperDet: 3D Object Detection with Hyper 4D Radar Point Clouds : Abstract: 4D mmWave radar provides weather-robust, velocity-aware measurements and is more cost-effective than LiDAR. However, radar-only 3D detection still trails LiDAR-based systems because radar po...
- Calibration and Evaluation of Car-Following Models for Autonomous Shuttles Using a Novel Multi-Criteria Framework : Abstract: Autonomous shuttles (AS) are fully autonomous transit vehicles with operating characteristics distinct from conventional autonomous vehicles (AV). Developing dedicated car-following models f...
- Adaptive Power Iteration Method for Differentially Private PCA : Abstract: We study $(ε,δ)$-differentially private algorithms for the problem of approximately computing the top singular vector of a matrix $A\in\mathbb{R}^{n\times d}$ where each row of $A$ is a data...
- Surface impedance inference via neural fields and sparse acoustic data obtained by a compact array : Abstract: Standardized laboratory characterizations for absorbing materials rely on idealized sound field assumptions, which deviate largely from real-life conditions. Consequently, in-situ aco...
- Optimizing Agent Planning for Security and Autonomy : Abstract: Indirect prompt injection attacks threaten AI agents that execute consequential actions, motivating deterministic system-level defenses. Such defenses can provably block unsafe actions by en...
- The Cost of Learning under Multiple Change Points : Abstract: We consider an online learning problem in environments with multiple change points. In contrast to the single change point problem that is widely studied using classical "high confidence" de...
- Latent Forcing: Reordering the Diffusion Trajectory for Pixel-Space Image Generation : Abstract: Latent diffusion models excel at generating high-quality images but lose the benefits of end-to-end modeling. They discard information during image encoding, require a separately trained dec...
- Traffic Flow Reconstruction from Limited Collected Data : Abstract: We propose an efficient method for reconstructing traffic density with low penetration rate of probe vehicles. Specifically, we rely on measuring only the initial and final positions of a sm...
- Sample-Free Safety Assessment of Neural Network Controllers via Taylor Methods : Abstract: In recent years, artificial neural networks have been increasingly studied as feedback controllers for guidance problems. While effective in complex scenarios, they lack the verification gua...
- Amortised and provably-robust simulation-based inference : Abstract: Complex simulator-based models are now routinely used to perform inference across the sciences and engineering, but existing inference methods are often unable to account for outliers and ot...
- Hierarchical Testing of a Hybrid Machine Learning-Physics Global Atmosphere Model : Abstract: Machine learning (ML)-based models have demonstrated high skill and computational efficiency, often outperforming conventional physics-based models in weather and subseasonal predictions. Wh...
- Unlearnable phases of matter : Abstract: We identify fundamental limitations in machine learning by demonstrating that non-trivial mixed-state phases of matter are computationally hard to learn. Focusing on unsupervised learning of...
- Active Zero: Self-Evolving Vision-Language Models through Active Environment Exploration : Abstract: Self-play has enabled large language models to autonomously improve through self-generated challenges. However, existing self-play methods for vision-language models rely on passive interact...
- Generative AI-Driven Phase Control for RIS-Aided Cell-Free Massive MIMO Systems : Abstract: This work investigates a generative artificial intelligence (GenAI) model to optimize the reconfigurable intelligent surface (RIS) phase shifts in RIS-aided cell-free massive multiple-input ...
- When and What to Ask: AskBench and Rubric-Guided RLVR for LLM Clarification : Abstract: Large language models (LLMs) often respond even when prompts omit critical details or include misleading information, leading to hallucinations or reinforced misconceptions. We study how to ...
- Function-Space Decoupled Diffusion for Forward and Inverse Modeling in Carbon Capture and Storage : Abstract: Accurate characterization of subsurface flow is critical for Carbon Capture and Storage (CCS) but remains challenged by the ill-posed nature of inverse problems with sparse observations. We ...
- Self-Supervised Learning via Flow-Guided Neural Operator on Time-Series Data : Abstract: Self-supervised learning (SSL) is a powerful paradigm for learning from unlabeled time-series data. However, popular methods such as masked autoencoders (MAEs) rely on reconstructing inputs ...
- Community Concealment from Unsupervised Graph Learning-Based Clustering : Abstract: Graph neural networks (GNNs) are designed to use attributed graphs to learn representations. Such representations are beneficial in the unsupervised learning of clusters and community detect...
- Categorical Flow Maps : Abstract: We introduce Categorical Flow Maps, a flow-matching method for accelerated few-step generation of categorical data via self-distillation. Building on recent variational formulations of flow ...
- Diffusion Alignment Beyond KL: Variance Minimisation as Effective Policy Optimiser : Abstract: Diffusion alignment adapts pretrained diffusion models to sample from reward-tilted distributions along the denoising trajectory. This process naturally admits a Sequential Monte Carlo (SMC)...
- Learning to Forget Attention: Memory Consolidation for Adaptive Compute Reduction : Abstract: Hybrid architectures combining state-space models with attention have achieved strong efficiency-quality tradeoffs, yet existing approaches either apply attention uniformly or learn static s...
- WaveFormer: Wavelet Embedding Transformer for Biomedical Signals : Abstract: Biomedical signal classification presents unique challenges due to long sequences, complex temporal dynamics, and multi-scale frequency patterns that are poorly captured by standard transfor...
- How Sampling Shapes LLM Alignment: From One-Shot Optima to Iterative Dynamics : Abstract: Standard methods for aligning large language models with human preferences learn from pairwise comparisons among sampled candidate responses and regularize toward a reference policy. Despite...
- Amortized Molecular Optimization via Group Relative Policy Optimization : Abstract: Molecular design encompasses tasks ranging from de-novo design to structural alteration of given molecules or fragments. For the latter, state-of-the-art methods predominantly function as "I...
- SafeNeuron: Neuron-Level Safety Alignment for Large Language Models : Abstract: Large language models (LLMs) and multimodal LLMs are typically safety-aligned before release to prevent harmful content generation. However, recent studies show that safety behaviors are con...
- It's TIME: Towards the Next Generation of Time Series Forecasting Benchmarks : Abstract: Time series foundation models (TSFMs) are revolutionizing the forecasting landscape from specific dataset modeling to generalizable task evaluation. However, we contend that existing benchma...
- Oscillators Are All You Need: Irregular Time Series Modelling via Damped Harmonic Oscillators with Closed-Form Solutions : Abstract: Transformers excel at time series modelling through attention mechanisms that capture long-term temporal patterns. However, they assume uniform time intervals and therefore struggle with irr...
- Capability-Oriented Training Induced Alignment Risk : Abstract: While most AI alignment research focuses on preventing models from generating explicitly harmful content, a more subtle risk is emerging: capability-oriented training induced exploitation. W...
- Few-Shot Design Optimization by Exploiting Auxiliary Information : Abstract: Many real-world design problems involve optimizing an expensive black-box function $f(x)$, such as hardware design or drug discovery. Bayesian Optimization has emerged as a sample-efficient ...
- Geometry of Uncertainty: Learning Metric Spaces for Multimodal State Estimation in RL : Abstract: Estimating the state of an environment from high-dimensional, multimodal, and noisy observations is a fundamental challenge in reinforcement learning (RL). Traditional approaches rely on pro...
- Empirical Gaussian Processes : Abstract: Gaussian processes (GPs) are powerful and widely used probabilistic regression models, but their effectiveness in practice is often limited by the choice of kernel function. This kernel func...
- PathCRF: Ball-Free Soccer Event Detection via Possession Path Inference from Player Trajectories : Abstract: Despite recent advances in AI, event data collection in soccer still relies heavily on labor-intensive manual annotation. Although prior work has explored automatic event detection using pla...
- Improving HPC Code Generation Capability of LLMs via Online Reinforcement Learning with Real-Machine Benchmark Rewards : Abstract: Large language models (LLMs) have demonstrated strong code generation capabilities, yet the runtime performance of generated code is not guaranteed, and there have been few attempts to train...
- PrefillShare: A Shared Prefill Module for KV Reuse in Multi-LLM Disaggregated Serving : Abstract: Multi-agent systems increasingly orchestrate multiple specialized language models to solve complex real-world problems, often invoking them over a shared context. This execution pattern repe...
- Protein Circuit Tracing via Cross-layer Transcoders : Abstract: Protein language models (pLMs) have emerged as powerful predictors of protein structure and function. However, the computational circuits underlying their predictions remain poorly understoo...
- Improved state mixing in higher-order and block diagonal linear recurrent networks : Abstract: Linear recurrent networks (LRNNs) and linear state space models (SSMs) promise computational and memory efficiency on long-sequence modeling tasks, yet their diagonal state transitions limit...
- FedGRPO: Privately Optimizing Foundation Models with Group-Relative Rewards from Domain Client : Abstract: One important direction of Federated Foundation Models (FedFMs) is leveraging data from small client models to enhance the performance of a large server-side foundation model. Existing metho...
- Momentum LMS Theory beyond Stationarity: Stability, Tracking, and Regret : Abstract: In large-scale data processing scenarios, data often arrive in sequential streams generated by complex systems that exhibit drifting distributions and time-varying system parameters. This no...
- RAM-Net: Expressive Linear Attention with Selectively Addressable Memory : Abstract: While linear attention architectures offer efficient inference, compressing unbounded history into a fixed-size memory inherently limits expressivity and causes information loss. To address ...
- Are Two LLMs Better Than One? A Student-Teacher Dual-Head LLMs Architecture for Pharmaceutical Content Optimization : Abstract: Large language models (LLMs) are increasingly used to create content in regulated domains such as pharmaceuticals, where outputs must be scientifically accurate and legally compliant. Manual...
- Using predictive multiplicity to measure individual performance within the AI Act : Abstract: When building AI systems for decision support, one often encounters the phenomenon of predictive multiplicity: a single best model does not exist; instead, one can construct many models with...
- Temporally Unified Adversarial Perturbations for Time Series Forecasting : Abstract: While deep learning models have achieved remarkable success in time series forecasting, their vulnerability to adversarial examples remains a critical security concern. However, existing att...
- Extending Puzzle for Mixture-of-Experts Reasoning Models with Application to GPT-OSS Acceleration : Abstract: Reasoning-focused LLMs improve answer quality by generating longer reasoning traces, but the additional tokens dramatically increase serving cost, motivating inference optimization. We exten...
- Learning Conditional Averages : Abstract: We introduce the problem of learning conditional averages in the PAC framework. The learner receives a sample labeled by an unknown target concept from a known concept class, as in standard ...
- Universal Diffusion-Based Probabilistic Downscaling : Abstract: We introduce a universal diffusion-based downscaling framework that lifts deterministic low-resolution weather forecasts into probabilistic high-resolution predictions without any model-spec...
- In-Context Function Learning in Large Language Models : Abstract: Large language models (LLMs) can learn from a few demonstrations provided at inference time. We study this in-context learning phenomenon through the lens of Gaussian Processes (GPs). We bui...
- A$^{2}$V-SLP: Alignment-Aware Variational Modeling for Disentangled Sign Language Production : Abstract: Building upon recent structural disentanglement frameworks for sign language production, we propose A$^{2}$V-SLP, an alignment-aware variational framework that learns articulator-wise disent...
- Robust Optimization Approach and Learning Based Hide-and-Seek Game for Resilient Network Design : Abstract: We study the design of resilient and reliable communication networks in which a signal can be transferred only up to a limited distance before its quality falls below an acceptable threshold...
- Towards Sustainable Investment Policies Informed by Opponent Shaping : Abstract: Addressing climate change requires global coordination, yet rational economic actors often prioritize immediate gains over collective welfare, resulting in social dilemmas. InvestESG is a re...
- CAAL: Confidence-Aware Active Learning for Heteroscedastic Atmospheric Regression : Abstract: Quantifying the impacts of air pollution on health and climate relies on key atmospheric particle properties such as toxicity and hygroscopicity. However, these properties typically require ...
- Deep Kernel Fusion for Transformers : Abstract: Agentic LLM inference with long contexts is increasingly limited by memory bandwidth rather than compute. In this setting, SwiGLU MLP blocks, whose large weights exceed cache capacity, becom...
- From Path Signatures to Sequential Modeling: Incremental Signature Contributions for Offline RL : Abstract: Path signatures embed trajectories into tensor algebra and constitute a universal, non-parametric representation of paths; however, in the standard form, they collapse temporal structure int...
- TopoFair: Linking Topological Bias to Fairness in Link Prediction Benchmarks : Abstract: Graph link prediction (LP) plays a critical role in socially impactful applications, such as job recommendation and friendship formation. Ensuring fairness in this task is thus essential. Wh...
- SpaTeoGL: Spatiotemporal Graph Learning for Interpretable Seizure Onset Zone Analysis from Intracranial EEG : Abstract: Accurate localization of the seizure onset zone (SOZ) from intracranial EEG (iEEG) is essential for epilepsy surgery but is challenged by complex spatiotemporal seizure dynamics. We propose ...
- Temporal Difference Learning with Constrained Initial Representations : Abstract: Recently, there have been numerous attempts to enhance the sample efficiency of off-policy reinforcement learning (RL) agents when interacting with the environment, including architecture im...
- Latent-Variable Learning of SPDEs via Wiener Chaos : Abstract: We study the problem of learning the law of linear stochastic partial differential equations (SPDEs) with additive Gaussian forcing from spatiotemporal observations. Most existing deep learn...
- Temperature as a Meta-Policy: Adaptive Temperature in LLM Reinforcement Learning : Abstract: Temperature is a crucial hyperparameter in large language models (LLMs), controlling the trade-off between exploration and exploitation during text generation. High temperatures encourage di...
- MUSE: Multi-Tenant Model Serving With Seamless Model Updates : Abstract: In binary classification systems, decision thresholds translate model scores into actions. Choosing suitable thresholds relies on the specific distribution of the underlying model scores but...
- TUBO: A Tailored ML Framework for Reliable Network Traffic Forecasting : Abstract: Traffic-forecasting-based network operation optimization and management offers enormous promise but also presents significant challenges from a traffic forecasting perspective. While deep lear...
- U-Former ODE: Fast Probabilistic Forecasting of Irregular Time Series : Abstract: Probabilistic forecasting of irregularly sampled time series is crucial in domains such as healthcare and finance, yet it remains a formidable challenge. Existing Neural Controlled Different...
- Dopamine: Brain Modes, Not Brains : Abstract: Parameter-efficient fine-tuning (PEFT) methods such as LoRA adapt large pretrained models by adding small weight-space updates. While effective, weight deltas are hard to interpret mechan...
- DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels : Abstract: Diffusion large language models (dLLMs) have emerged as a compelling alternative to autoregressive (AR) LLMs, owing to their capacity for parallel token generation. This paradigm is particul...
- Potential-energy gating for robust state estimation in bistable stochastic systems : Abstract: We introduce potential-energy gating, a method for robust state estimation in systems governed by double-well stochastic dynamics. The observation noise covariance of a Bayesian filter is mo...
- SpiralFormer: Looped Transformers Can Learn Hierarchical Dependencies via Multi-Resolution Recursion : Abstract: Recursive (looped) Transformers decouple computational depth from parameter depth by repeatedly applying shared layers, providing an explicit architectural primitive for iterative refinement...
- Explainable Machine-Learning based Detection of Knee Injuries in Runners : Abstract: Running is a widely practiced activity but shows a high incidence of knee injuries, especially Patellofemoral Pain Syndrome (PFPS) and Iliotibial Band Syndrome (ITBS). Identifying gait patte...
- Fully First-Order Algorithms for Online Bilevel Optimization : Abstract: In this work, we study non-convex-strongly-convex online bilevel optimization (OBO). Existing OBO algorithms are mainly based on hypergradient descent, which requires access to a Hessian-vec...
- UMAP Is Spectral Clustering on the Fuzzy Nearest-Neighbor Graph : Abstract: UMAP (Uniform Manifold Approximation and Projection) is among the most widely used algorithms for nonlinear dimensionality reduction and data visualisation. Despite its popularity, and desp...
- Both Topology and Text Matter: Revisiting LLM-guided Out-of-Distribution Detection on Text-attributed Graphs : Abstract: Text-attributed graphs (TAGs) associate nodes with textual attributes and graph structure, enabling GNNs to jointly model semantic and structural information. While effective on in-distribut...
- TIP: Resisting Gradient Inversion via Targeted Interpretable Perturbation in Federated Learning : Abstract: Federated Learning (FL) facilitates collaborative model training while preserving data locality; however, the exchange of gradients renders the system vulnerable to Gradient Inversion Attack...
- GP2F: Cross-Domain Graph Prompting with Adaptive Fusion of Pre-trained Graph Neural Networks : Abstract: Graph Prompt Learning (GPL) has recently emerged as a promising paradigm for downstream adaptation of pre-trained graph models, mitigating the misalignment between pre-training objectives an...
- TreeGrad-Ranker: Feature Ranking via $O(L)$-Time Gradients for Decision Trees : Abstract: We revisit the use of probabilistic values, which include the well-known Shapley and Banzhaf values, to rank features for explaining the local predicted values of decision trees. The quality...
- How Well Do Large-Scale Chemical Language Models Transfer to Downstream Tasks? : Abstract: Chemical Language Models (CLMs) pre-trained on large scale molecular data are widely used for molecular property prediction. However, the common belief that increasing training resources suc...
- SkillRater: Untangling Capabilities in Multimodal Data : Abstract: Data curation methods typically assign samples a single quality score. We argue this scalar framing is fundamentally limited: when training requires multiple distinct capabilities, a monolit...
- Learn from Your Mistakes: Self-Correcting Masked Diffusion Models : Abstract: Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models, enabling parallel token generation while achieving competitive performance. Despite these adv...
- Brain4FMs: A Benchmark of Foundation Models for Electrical Brain Signal : Abstract: Brain Foundation Models (BFMs) are transforming neuroscience by enabling scalable and transferable learning from neural signals, advancing both clinical diagnostics and cutting-edge neurosci...
- The Implicit Bias of Steepest Descent with Mini-batch Stochastic Gradient : Abstract: A variety of widely used optimization methods like SignSGD and Muon can be interpreted as instances of steepest descent under different norm-induced geometries. In this work, we study the im...
- Real-Time Proactive Anomaly Detection via Forward and Backward Forecast Modeling : Abstract: Reactive anomaly detection methods, which are commonly deployed to identify anomalies after they occur based on observed deviations, often fall short in applications that demand timely inter...
- PASCAL: A Phase-Aware Scheduling Algorithm for Serving Reasoning-based Large Language Models : Abstract: The emergence of reasoning-based LLMs leveraging Chain-of-Thought (CoT) inference introduces new serving challenges, as their extended reasoning phases delay user-visible output and inflate ...
- Unifying Stable Optimization and Reference Regularization in RLHF : Abstract: Reinforcement Learning from Human Feedback (RLHF) has advanced alignment capabilities significantly but remains hindered by two core challenges: reward hacking and stable op...
- Calibrating an Imperfect Auxiliary Predictor for Unobserved No-Purchase Choice : Abstract: Firms typically cannot observe key consumer actions: whether customers buy from a competitor, choose not to buy, or even fully consider the firm's offer. This missing outside-option informat...
- A Generic Framework for Fair Consensus Clustering in Streams : Abstract: Consensus clustering seeks to combine multiple clusterings of the same dataset, potentially derived by considering various non-sensitive attributes by different agents in a multi-agent envir...
- Partial GFlowNet: Accelerating Convergence in Large State Spaces via Strategic Partitioning : Abstract: Generative Flow Networks (GFlowNets) have shown promising potential to generate high-scoring candidates with probability proportional to their rewards. As existing GFlowNets freely explore i...
- Exploring Multiple High-Scoring Subspaces in Generative Flow Networks : Abstract: As a probabilistic sampling framework, Generative Flow Networks (GFlowNets) show strong potential for constructing complex combinatorial objects through the sequential composition of element...
- External Division of Two Bregman Proximity Operators for Poisson Inverse Problems : Abstract: This paper presents a novel method for recovering sparse vectors from linear models corrupted by Poisson noise. The contribution is twofold. First, an operator defined via the external divis...
- PRISM: A 3D Probabilistic Neural Representation for Interpretable Shape Modeling : Abstract: Understanding how anatomical shapes evolve in response to developmental covariates and quantifying their spatially varying uncertainties is critical in healthcare research. Existing approach...
- Assessing Low Back Movement with Motion Tape Sensor Data Through Deep Learning : Abstract: Back pain is a pervasive issue affecting a significant portion of the population, often worsened by certain movements of the lower back. Assessing these movements is important for helping cl...
- Hierarchical Concept Embedding & Pursuit for Interpretable Image Classification : Abstract: Interpretable-by-design models are gaining traction in computer vision because they provide faithful explanations for their predictions. In image classification, these models typically recov...
- Multi-Level Strategic Classification: Incentivizing Improvement through Promotion and Relegation Dynamics : Abstract: Strategic classification studies the problem where self-interested individuals or agents manipulate their response to obtain favorable decision outcomes made by classifiers, typically turnin...
- TimeSynth: A Framework for Uncovering Systematic Biases in Time Series Forecasting : Abstract: Time series forecasting is a fundamental tool with wide ranging applications, yet recent debates question whether complex nonlinear architectures truly outperform simple linear models. Prior...
- CADET: Context-Conditioned Ads CTR Prediction With a Decoder-Only Transformer : Abstract: Click-through rate (CTR) prediction is fundamental to online advertising systems. While Deep Learning Recommendation Models (DLRMs) with explicit feature interactions have long dominated thi...
- Sparse Semantic Dimension as a Generalization Certificate for LLMs : Abstract: Standard statistical learning theory predicts that Large Language Models (LLMs) should overfit because their parameter counts vastly exceed the number of training tokens. Yet, in practice, t...
- Provably Efficient Algorithms for S- and Non-Rectangular Robust MDPs with General Parameterization : Abstract: We study robust Markov decision processes (RMDPs) with general policy parameterization under s-rectangular and non-rectangular uncertainty sets. Prior work is largely limited to tabular poli...
- WSBD: Freezing-Based Optimizer for Quantum Neural Networks : Abstract: The training of Quantum Neural Networks (QNNs) is hindered by the high computational cost of gradient estimation and the barren plateau problem, where optimization landscapes become intracta...
- Toward Adaptive Non-Intrusive Reduced-Order Models: Design and Challenges : Abstract: Projection-based Reduced Order Models (ROMs) are often deployed as static surrogates, which limits their practical utility once a system leaves the training manifold. We formalize and study ...
- Structured Hybrid Mechanistic Models for Robust Estimation of Time-Dependent Intervention Outcomes : Abstract: Estimating intervention effects in dynamical systems is crucial for outcome optimization. In medicine, such interventions arise in physiological regulation (e.g., cardiovascular system under...
- Efficient Analysis of the Distilled Neural Tangent Kernel : Abstract: Neural tangent kernel (NTK) methods are computationally limited by the need to evaluate large Jacobians across many data points. Existing approaches reduce this cost primarily through projec...
- Evaluating Memory Structure in LLM Agents : Abstract: Modern LLM-based agents and chat assistants rely on long-term memory frameworks to store reusable knowledge, recall user preferences, and augment reasoning. As researchers create more comple...
- Learning Glioblastoma Tumor Heterogeneity Using Brain Inspired Topological Neural Networks : Abstract: Accurate prognosis for Glioblastoma (GBM) using deep learning (DL) is hindered by extreme spatial and structural heterogeneity. Moreover, inconsistent MRI acquisition protocols across instit...
- Patch the Distribution Mismatch: RL Rewriting Agent for Stable Off-Policy SFT : Abstract: Large language models (LLMs) have made rapid progress, yet adapting them to downstream scenarios still commonly relies on supervised fine-tuning (SFT). When downstream data exhibit a substan...
- The Magic Correlations: Understanding Knowledge Transfer from Pretraining to Supervised Fine-Tuning : Abstract: Understanding how language model capabilities transfer from pretraining to supervised fine-tuning (SFT) is fundamental to efficient model development and data curation. In this work, we inve...
- Protein Language Model Embeddings Improve Generalization of Implicit Transfer Operators : Abstract: Molecular dynamics (MD) is a central computational tool in physics, chemistry, and biology, enabling quantitative prediction of experimental observables as expectations over high-dimensional...
- Charting Empirical Laws for LLM Fine-Tuning in Scientific Multi-Discipline Learning : Abstract: While large language models (LLMs) have achieved strong performance through fine-tuning within individual scientific domains, their learning dynamics in multi-disciplinary contexts remains p...
- Towards Compressive and Scalable Recurrent Memory : Abstract: Transformers face a quadratic bottleneck in attention when scaling to long contexts. Recent approaches introduce recurrent memory to extend context beyond the current window, yet these often...
- Adaptive Physics Transformer with Fused Global-Local Attention for Subsurface Energy Systems : Abstract: The Earth's subsurface is a cornerstone of modern society, providing essential energy resources like hydrocarbons, geothermal, and minerals while serving as the primary reservoir for CO$_2$ ...
- AM-FM: A Foundation Model for Ambient Intelligence Through WiFi : Abstract: Ambient intelligence, continuously understanding human presence, activity, and physiology in physical spaces, is fundamental to smart environments, health monitoring, and human-computer inte...
- Predicting the post-wildfire mudflow onset using machine learning models on multi-parameter experimental data : Abstract: Post-wildfire mudflows are increasingly hazardous due to the prevalence of wildfires, including those on the wildland-urban interface. Upon burning, soil on the surface or immediately beneat...
- GAC-KAN: An Ultra-Lightweight GNSS Interference Classifier for GenAI-Powered Consumer Edge Devices : Abstract: The integration of Generative AI (GenAI) into Consumer Electronics (CE), from AI-powered assistants in wearables to generative planning in autonomous Uncrewed Aerial Vehicles (UAVs), has rev...
- STaR: Scalable Task-Conditioned Retrieval for Long-Horizon Multimodal Robot Memory : Abstract: Mobile robots are often deployed over long durations in diverse open, dynamic scenes, including indoor settings such as warehouses and manufacturing facilities, and outdoor settings such as a...
- NarraScore: Bridging Visual Narrative and Musical Dynamics via Hierarchical Affective Control : Abstract: Synthesizing coherent soundtracks for long-form videos remains a formidable challenge, currently stalled by three critical impediments: computational scalability, temporal coherence, and, mo...
- PBP: Post-training Backdoor Purification for Malware Classifiers : Abstract: In recent years, the rise of machine learning (ML) in cybersecurity has brought new challenges, including the increasing threat of backdoor poisoning attacks on ML malware classifiers. For i...
- NewsInterview: a Dataset and a Playground to Evaluate LLMs' Ground Gap via Informational Interviews : Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in generating coherent text but often struggle with grounding language and strategic dialogue. To address this gap, we ...
- Compiling High-Level Neural Network Specifications into VNN-LIB Queries : Abstract: The formal verification of traditional software has been revolutionised by verification-orientated languages such as Dafny and F* which enable developers to write high-level specifications t...
- Phase Transition for Budgeted Multi-Agent Synergy : Abstract: Multi-agent systems can improve reliability, yet under a fixed inference budget they often help, saturate, or even collapse. We develop a minimal and calibratable theory that predicts these ...
- Scaling Verification Can Be More Effective than Scaling Policy Learning for Vision-Language-Action Alignment : Abstract: The long-standing vision of general-purpose robots hinges on their ability to understand and act upon natural language instructions. Vision-Language-Action (VLA) models have made remarkable ...
- UniT: Unified Multimodal Chain-of-Thought Test-time Scaling : Abstract: Unified models can handle both multimodal understanding and generation within a single architecture, yet they typically operate in a single pass without iteratively refining their outputs. M...
- AttentionRetriever: Attention Layers are Secretly Long Document Retrievers : Abstract: Retrieval augmented generation (RAG) has been widely adopted to help Large Language Models (LLMs) to process tasks involving long documents. However, existing retrieval models are not design...
- Creative Ownership in the Age of AI : Abstract: Copyright law focuses on whether a new work is "substantially similar" to an existing one, but generative AI can closely imitate style without copying content, a capability now central to on...
- On the implicit regularization of Langevin dynamics with projected noise : Abstract: We study Langevin dynamics with noise projected onto the directions orthogonal to an isometric group action. This mathematical model is introduced to shed new light on the effects of symmetr...
- A technical curriculum on language-oriented artificial intelligence in translation and specialised communication : Abstract: This paper presents a technical curriculum on language-oriented artificial intelligence (AI) in the language and translation (L&T) industry. The curriculum aims to foster domain-specific tec...
- ExtractBench: A Benchmark and Evaluation Methodology for Complex Structured Extraction : Abstract: Unstructured documents like PDFs contain valuable structured information, but downstream systems require this data in reliable, standardized formats. LLMs are increasingly deployed to automa...
- Intrinsic-Energy Joint Embedding Predictive Architectures Induce Quasimetric Spaces : Abstract: Joint-Embedding Predictive Architectures (JEPAs) aim to learn representations by predicting target embeddings from context embeddings, inducing a scalar compatibility energy in a latent spac...
- Olmix: A Framework for Data Mixing Throughout LM Development : Abstract: Data mixing (determining the ratios of data from different domains) is a first-order concern for training language models (LMs). While existing mixing methods show promise, they fall sho...
- Energy-Aware Spike Budgeting for Continual Learning in Spiking Neural Networks for Neuromorphic Vision : Abstract: Neuromorphic vision systems based on spiking neural networks (SNNs) offer ultra-low-power perception for event-based and frame-based cameras, yet catastrophic forgetting remains a critical b...
- Bandit Learning in Matching Markets with Interviews : Abstract: Two-sided matching markets rely on preferences from both sides, yet it is often impractical to evaluate preferences. Participants, therefore, conduct a limited number of interviews, which pr...
- Towards On-Policy SFT: Distribution Discriminant Theory and its Applications in LLM Training : Abstract: Supervised fine-tuning (SFT) is computationally efficient but often yields inferior generalization compared to reinforcement learning (RL). This gap is primarily driven by RL's use of on-pol...
- The Observer Effect in World Models: Invasive Adaptation Corrupts Latent Physics : Abstract: Determining whether neural models internalize physical laws as world models, rather than exploiting statistical shortcuts, remains challenging, especially under out-of-distribution (OOD) shi...
- VIRENA: Virtual Arena for Research, Education, and Democratic Innovation : Abstract: Digital platforms shape how people communicate, deliberate, and form opinions. Studying these dynamics has become increasingly difficult due to restricted data access, ethical constraints on...
- DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing : Abstract: Current unified multimodal models for image generation and editing typically rely on massive parameter scales (e.g., >10B), entailing prohibitive training costs and deployment footprints. In...
- Visual Reasoning Benchmark: Evaluating Multimodal LLMs on Classroom-Authentic Visual Problems from Primary Education : Abstract: AI models have achieved state-of-the-art results in textual reasoning; however, their ability to reason over spatial and relational structures remains a critical bottleneck, particularly i...
- SAGEO Arena: A Realistic Environment for Evaluating Search-Augmented Generative Engine Optimization : Abstract: Search-Augmented Generative Engines (SAGE) have emerged as a new paradigm for information access, bridging web-scale retrieval with generative capabilities to deliver synthesized answers. Th...
- 3DGSNav: Enhancing Vision-Language Model Reasoning for Object Navigation via Active 3D Gaussian Splatting : Abstract: Object navigation is a core capability of embodied intelligence, enabling an agent to locate target objects in unknown environments. Recent advances in vision-language models (VLMs) have fac...
- dVoting: Fast Voting for dLLMs : Abstract: Diffusion Large Language Models (dLLMs) represent a new paradigm beyond autoregressive modeling, offering competitive performance while naturally enabling a flexible decoding process. Specif...
- On the Adoption of AI Coding Agents in Open-source Android and iOS Development : Abstract: AI coding agents are increasingly contributing to software development, yet their impact on mobile development has received little empirical attention. In this paper, we present the first ca...
- Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation : Abstract: On-policy distillation (OPD), which aligns the student with the teacher's logit distribution on student-generated trajectories, has demonstrated strong empirical gains in improving student p...
- Meta-Sel: Efficient Demonstration Selection for In-Context Learning via Supervised Meta-Learning : Abstract: Demonstration selection is a practical bottleneck in in-context learning (ICL): under a tight prompt budget, accuracy can change substantially depending on which few-shot examples are includ...
- KAN-FIF: Spline-Parameterized Lightweight Physics-based Tropical Cyclone Estimation on Meteorological Satellite : Abstract: Tropical cyclones (TC) are among the most destructive natural disasters, causing catastrophic damage to coastal regions through extreme winds, heavy rainfall, and storm surges. Timely monito...
- On the Complexity of Offline Reinforcement Learning with $Q^\star$-Approximation and Partial Coverage : Abstract: We study offline reinforcement learning under $Q^\star$-approximation and partial coverage, a setting that motivates practical algorithms such as Conservative $Q$-Learning (CQL; Kumar et al....
- Multi Graph Search for High-Dimensional Robot Motion Planning : Abstract: Efficient motion planning for high-dimensional robotic systems, such as manipulators and mobile manipulators, is critical for real-time operation and reliable deployment. Although advances i...
- DeepSight: An All-in-One LM Safety Toolkit : Abstract: As the development of Large Models (LMs) progresses rapidly, their safety is also a priority. In current Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) safety work...
- Choose Your Agent: Tradeoffs in Adopting AI Advisors, Coaches, and Delegates in Multi-Party Negotiation : Abstract: As AI usage becomes more prevalent in social contexts, understanding agent-user interaction is critical to designing systems that improve both individual and group outcomes. We present an on...
- ModelWisdom: An Integrated Toolkit for TLA+ Model Visualization, Digest and Repair : Abstract: Model checking in TLA+ provides strong correctness guarantees, yet practitioners continue to face significant challenges in interpreting counterexamples, understanding large state-transition...
- Fourier Transformers for Latent Crystallographic Diffusion and Generative Modeling : Abstract: The discovery of new crystalline materials calls for generative models that handle periodic boundary conditions, crystallographic symmetries, and physical constraints, while scaling to large...
- An Empirical Study of the Imbalance Issue in Software Vulnerability Detection : Abstract: Vulnerability detection is crucial to protect software security. Nowadays, deep learning (DL) is the most promising technique to automate this detection task, leveraging its superior ability...
- On the Sensitivity of Firing Rate-Based Federated Spiking Neural Networks to Differential Privacy : Abstract: Federated Neuromorphic Learning (FNL) enables energy-efficient and privacy-preserving learning on devices without centralizing data. However, real-world deployments require additional privac...
- Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents? : Abstract: A widespread practice in software development is to tailor coding agents to repositories using context files, such as AGENTS.md, by either manually or automatically generating them. Although...
- Accelerating Robotic Reinforcement Learning with Agent Guidance : Abstract: Reinforcement Learning (RL) offers a powerful paradigm for autonomous robots to master generalist manipulation skills through trial-and-error. However, its real-world application is stifled ...
- Manifold-Aware Temporal Domain Generalization for Large Language Models : Abstract: Temporal distribution shifts are pervasive in real-world deployments of Large Language Models (LLMs), where data evolves continuously over time. While Temporal Domain Generalization (TDG) se...
- TAVAE: A VAE with Adaptable Priors Explains Contextual Modulation in the Visual Cortex : Abstract: The brain interprets visual information through learned regularities, a computation formalized as probabilistic inference under a prior. The visual cortex establishes priors for this inferen...
- Towards Performance-Enhanced Model-Contrastive Federated Learning using Historical Information in Heterogeneous Scenarios : Abstract: Federated Learning (FL) enables multiple nodes to collaboratively train a model without sharing raw data. However, FL systems are usually deployed in heterogeneous scenarios, where nodes dif...
- Synthesis of Late Gadolinium Enhancement Images via Implicit Neural Representations for Cardiac Scar Segmentation : Abstract: Late gadolinium enhancement (LGE) imaging is the clinical standard for myocardial scar assessment, but limited annotated datasets hinder the development of automated segmentation methods. We...
- IncompeBench: A Permissively Licensed, Fine-Grained Benchmark for Music Information Retrieval : Abstract: Multimodal Information Retrieval has made significant progress in recent years, leveraging the increasingly strong multimodal abilities of deep pre-trained models to represent information ac...
- AdaptEvolve: Improving Efficiency of Evolutionary AI Agents through Adaptive Model Selection : Abstract: Evolutionary agentic systems intensify the trade-off between computational efficiency and reasoning capability by repeatedly invoking large language models (LLMs) during inference. This sett...
- Who Does What? Archetypes of Roles Assigned to LLMs During Human-AI Decision-Making : Abstract: LLMs are increasingly supporting decision-making across high-stakes domains, requiring critical reflection on the socio-technical factors that shape how humans and LLMs are assigned roles an...
- DynaHOI: Benchmarking Hand-Object Interaction for Dynamic Target : Abstract: Most existing hand motion generation benchmarks for hand-object interaction (HOI) focus on static objects, leaving dynamic scenarios with moving targets and time-critical coordination largel...
- Leveraging LLMs to support co-evolution between definitions and instances of textual DSLs: A Systematic Evaluation : Abstract: Software languages evolve over time for reasons such as feature additions. When grammars evolve, textual instances that originally conformed to them may become outdated. While model-driven e...
- Mitigating Mismatch within Reference-based Preference Optimization : Abstract: Direct Preference Optimization (DPO) has become the de facto standard for offline preference alignment of large language models, but its reliance on a reference policy introduces a critical ...
- Agentic AI for Cybersecurity: A Meta-Cognitive Architecture for Governable Autonomy : Abstract: Contemporary AI-driven cybersecurity systems are predominantly architected as model-centric detection and automation pipelines optimized for task-level performance metrics such as accuracy a...
- Where Bits Matter in World Model Planning: A Paired Mixed-Bit Study for Efficient Spatial Reasoning : Abstract: Efficient spatial reasoning requires world models that remain reliable under tight precision budgets. We study whether low-bit planning behavior is determined mostly by total bitwidth or by ...
- SynthRAR: Ring Artifacts Reduction in CT with Unrolled Network and Synthetic Data Training : Abstract: Defective and inconsistent responses in CT detectors can cause ring and streak artifacts in the reconstructed images, making them unusable for clinical purposes. In recent years, several rin...
- Towards Fair and Comprehensive Evaluation of Routers in Collaborative LLM Systems : Abstract: Large language models (LLMs) have achieved success, but cost and privacy constraints necessitate deploying smaller models locally while offloading complex queries to cloud-based models. Exis...
- Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception : Abstract: Multimodal Large Language Models (MLLMs) excel at broad visual understanding but still struggle with fine-grained perception, where decisive evidence is small and easily overwhelmed by globa...
- Resource-Aware Deployment Optimization for Collaborative Intrusion Detection in Layered Networks : Abstract: Collaborative Intrusion Detection Systems (CIDS) are increasingly adopted to counter cyberattacks, as their collaborative nature enables them to adapt to diverse scenarios across heterogeneo...
- Improving Neural Retrieval with Attribution-Guided Query Rewriting : Abstract: Neural retrievers are effective but brittle: underspecified or ambiguous queries can misdirect ranking even when relevant documents exist. Existing approaches address this brittleness only p...
- ULTRA: Urdu Language Transformer-based Recommendation Architecture : Abstract: Urdu, as a low-resource language, lacks effective semantic content recommendation systems, particularly in the domain of personalized news retrieval. Existing approaches largely rely on lexi...
- Evaluating LLM Safety Under Repeated Inference via Accelerated Prompt Stress Testing : Abstract: Traditional benchmarks for large language models (LLMs) primarily assess safety risk through breadth-oriented evaluation across diverse tasks. However, real-world deployment exposes a differ...
- Safe Fairness Guarantees Without Demographics in Classification: Spectral Uncertainty Set Perspective : Abstract: As automated classification systems become increasingly prevalent, concerns have emerged over their potential to reinforce and amplify existing societal biases. In the light of this issue, m...
- MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling : Abstract: The evolution of large language models (LLMs) towards applications with ultra-long contexts faces challenges posed by the high computational and memory costs of the Transformer architecture....
- Cooperation Breakdown in LLM Agents Under Communication Delays : Abstract: LLM-based multi-agent systems (LLM-MAS), in which autonomous AI agents cooperate to solve tasks, are gaining increasing attention. For such systems to be deployed in society, agents must be ...
- AmbiBench: Benchmarking Mobile GUI Agents Beyond One-Shot Instructions in the Wild : Abstract: Benchmarks are paramount for gauging progress in the domain of Mobile GUI Agents. In practical scenarios, users frequently fail to articulate precise directives containing full task details ...
- Adapting Vision-Language Models for E-commerce Understanding at Scale : Abstract: E-commerce product understanding demands, by nature, strong multimodal comprehension from text, images, and structured attributes. General-purpose Vision-Language Models (VLMs) enable general...
- LLM-Driven 3D Scene Generation of Agricultural Simulation Environments : Abstract: Procedural generation techniques in 3D rendering engines have revolutionized the creation of complex environments, reducing reliance on manual design. Recent approaches using Large Language ...
- Semantically Conditioned Diffusion Models for Cerebral DSA Synthesis : Abstract: Digital subtraction angiography (DSA) plays a central role in the diagnosis and treatment of cerebrovascular disease, yet its invasive nature and high acquisition cost severely limit large-s...
- TabSieve: Explicit In-Table Evidence Selection for Tabular Prediction : Abstract: Tabular prediction can benefit from in-table rows as few-shot evidence, yet existing tabular models typically perform instance-wise inference and LLM-based prompting is often brittle. Models...
- OMEGA-Avatar: One-shot Modeling of 360° Gaussian Avatars : Abstract: Creating high-fidelity, animatable 3D avatars from a single image remains a formidable challenge. We identified three desirable attributes of avatar generation: 1) the method should be feed-...
- ANML: Attribution-Native Machine Learning with Guaranteed Robustness : Abstract: Frontier AI systems increasingly train on specialized expert data, from clinical records to proprietary research to curated datasets, yet current training pipelines treat all samples identic...
- DRACO: a Cross-Domain Benchmark for Deep Research Accuracy, Completeness, and Objectivity : Abstract: We present DRACO (Deep Research Accuracy, Completeness, and Objectivity), a benchmark of complex deep research tasks. These tasks, which span 10 domains and draw on information sources from ...
- PatientHub: A Unified Framework for Patient Simulation : Abstract: As Large Language Models increasingly power role-playing applications, simulating patients has become a valuable tool for training counselors and scaling therapeutic assessment. However, pri...
- Provable Offline Reinforcement Learning for Structured Cyclic MDPs : Abstract: We introduce a novel cyclic Markov decision process (MDP) framework for multi-step decision problems with heterogeneous stage-specific dynamics, transitions, and discount factors across the ...
- SToRM: Supervised Token Reduction for Multi-modal LLMs toward efficient end-to-end autonomous driving : Abstract: In autonomous driving, end-to-end (E2E) driving systems that predict control commands directly from sensor data have achieved significant advancements. For safe driving in unexpected scenari...
- LoRA-based Parameter-Efficient LLMs for Continuous Learning in Edge-based Malware Detection : Abstract: The proliferation of edge devices has created an urgent need for security solutions capable of detecting malware in real time while operating under strict computational and memory constraint...
- DMind-3: A Sovereign Edge-Local-Cloud AI System with Controlled Deliberation and Correction-Based Tuning for Safe, Low-Latency Transaction Execution : Abstract: This paper introduces DMind-3, a sovereign Edge-Local-Cloud intelligence stack designed to secure irreversible financial execution in Web3 environments against adversarial risks and strict l...
- Brain Tumor Classifiers Under Attack: Robustness of ResNet Variants Against Transferable FGSM and PGD Attacks : Abstract: Adversarial robustness in deep learning models for brain tumor classification remains an underexplored yet critical challenge, particularly for clinical deployment scenarios involving MRI da...
- ViTaS: Visual Tactile Soft Fusion Contrastive Learning for Visuomotor Learning : Abstract: Tactile information plays a crucial role in human manipulation tasks and has recently garnered increasing attention in robotic manipulation. However, existing approaches mostly focus on the ...
- Variation-aware Flexible 3D Gaussian Editing : Abstract: Indirect editing methods for 3D Gaussian Splatting (3DGS) have recently witnessed significant advancements. These approaches operate by first applying edits in the rendered 2D space and subs...
- ScalSelect: Scalable Training-Free Multimodal Data Selection for Efficient Visual Instruction Tuning : Abstract: Large-scale Visual Instruction Tuning (VIT) has become a key paradigm for advancing the performance of vision-language models (VLMs) across various multimodal tasks. However, training on the...
- ArGEnT: Arbitrary Geometry-encoded Transformer for Operator Learning : Abstract: Learning solution operators for systems with complex, varying geometries and parametric physical settings is a central challenge in scientific machine learning. In many-query regimes such as...
- PLOT-CT: Pre-log Voronoi Decomposition Assisted Generation for Low-dose CT Reconstruction : Abstract: Low-dose computed tomography (LDCT) reconstruction is fundamentally challenged by severe noise and compromised data fidelity under reduced radiation exposure. Most existing methods operate e...
- ABot-N0: Technical Report on the VLA Foundation Model for Versatile Embodied Navigation : Abstract: Embodied navigation has long been fragmented by task-specific architectures. We introduce ABot-N0, a unified Vision-Language-Action (VLA) foundation model that achieves a "Grand Unification...
- Gradient Compression May Hurt Generalization: A Remedy by Synthetic Data Guided Sharpness Aware Minimization : Abstract: It is commonly believed that gradient compression in federated learning (FL) enjoys significant improvement in communication efficiency with negligible performance degradation. In this paper...
- Analytical Search : Abstract: Analytical information needs, such as trend analysis and causal impact assessment, are prevalent across various domains including law, finance, science, and much more. However, existing info...
- ReaDy-Go: Real-to-Sim Dynamic 3D Gaussian Splatting Simulation for Environment-Specific Visual Navigation with Moving Obstacles : Abstract: Visual navigation models often struggle in real-world dynamic environments due to limited robustness to the sim-to-real gap and the difficulty of training policies tailored to target deploym...
- Perception-based Image Denoising via Generative Compression : Abstract: Image denoising aims to remove noise while preserving structural details and perceptual realism, yet distortion-driven methods often produce over-smoothed reconstructions, especially under s...
- TS-Memory: Plug-and-Play Memory for Time Series Foundation Models : Abstract: Time Series Foundation Models (TSFMs) achieve strong zero-shot forecasting through large-scale pre-training, but adapting them to downstream domains under distribution shift remains challeng...
- Native Reasoning Models: Training Language Models to Reason on Unverifiable Data : Abstract: The prevailing paradigm for training large reasoning models, combining Supervised Fine-Tuning (SFT) with Reinforcement Learning with Verifiable Rewards (RLVR), is fundamentally constrained b...
- Krause Synchronization Transformers : Abstract: Self-attention in Transformers relies on globally normalized softmax weights, causing all tokens to compete for influence at every layer. When composed across depth, this interaction pattern...
- AltTS: A Dual-Path Framework with Alternating Optimization for Multivariate Time Series Forecasting : Abstract: Multivariate time series forecasting involves two qualitatively distinct factors: (i) stable within-series autoregressive (AR) dynamics, and (ii) intermittent cross-dimension interactions th...
- Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs : Abstract: Recent studies have shown that large language models (LLMs) can infer private user attributes (e.g., age, location, gender) from user-generated text shared online, enabling rapid and large-s...
- Adaptive Milestone Reward for GUI Agents : Abstract: Reinforcement Learning (RL) has emerged as a mainstream paradigm for training Mobile GUI Agents, yet it struggles with the temporal credit assignment problem inherent in long-horizon tasks. ...
- Locally Interpretable Individualized Treatment Rules for Black-Box Decision Models : Abstract: Individualized treatment rules (ITRs) aim to optimize healthcare by tailoring treatment decisions to patient-specific characteristics. Existing methods typically rely on either interpretable...
- How Smart Is Your GUI Agent? A Framework for the Future of Software Interaction : Abstract: GUI agents are rapidly becoming a new interaction to software, allowing people to navigate web, desktop and mobile rather than execute them click by click. Yet "agent" is described with ra...
- Differentially Private and Communication Efficient Large Language Model Split Inference via Stochastic Quantization and Soft Prompt : Abstract: Large Language Models (LLMs) have achieved remarkable performance and received significant research interest. The enormous computational demands, however, hinder the local deployment on devi...
- Multimodal Fact-Level Attribution for Verifiable Reasoning : Abstract: Multimodal large language models (MLLMs) are increasingly used for real-world tasks involving multi-step reasoning and long-form generation, where reliability requires grounding model output...
- RooflineBench: A Benchmarking Framework for On-Device LLMs via Roofline Analysis : Abstract: The transition toward localized intelligence through Small Language Models (SLMs) has intensified the need for rigorous performance characterization on resource-constrained edge hardware. Ho...
- Understanding Persuasive Interactions between Generative Social Agents and Humans: The Knowledge-based Persuasion Model (KPM) : Abstract: Generative social agents (GSAs) use artificial intelligence to autonomously communicate with human users in a natural and adaptive manner. Currently, there is a lack of theorizing regarding ...
- Compiler-Guided Inference-Time Adaptation: Improving GPT-5 Programming Performance in Idris : Abstract: GPT-5, a state-of-the-art large language model from OpenAI, demonstrates strong performance in widely used programming languages such as Python, C++, and Java; however, its ability to operat...
- EM-Aware Physical Synthesis: Neural Inductor Modeling and Intelligent Placement & Routing for RF Circuits : Abstract: This paper presents an ML-driven framework for automated RF physical synthesis that transforms circuit netlists into manufacturable GDSII layouts. While recent ML approaches demonstrate succ...
- From Noise to Order: Learning to Rank via Denoising Diffusion : Abstract: In information retrieval (IR), learning-to-rank (LTR) methods have traditionally limited themselves to discriminative machine learning approaches that model the probability of the document b...
- Enhanced Portable Ultra Low-Field Diffusion Tensor Imaging with Bayesian Artifact Correction and Deep Learning-Based Super-Resolution : Abstract: Portable, ultra-low-field (ULF) magnetic resonance imaging has the potential to expand access to neuroimaging but currently suffers from coarse spatial and angular resolutions and low signal...
- Towards Reliable Machine Translation: Scaling LLMs for Critical Error Detection and Safety : Abstract: Machine Translation (MT) plays a pivotal role in cross-lingual information access, public policy communication, and equitable knowledge dissemination. However, critical meaning errors, such ...
- Fighting MRI Anisotropy: Learning Multiple Cardiac Shapes From a Single Implicit Neural Representation : Abstract: The anisotropic nature of short-axis (SAX) cardiovascular magnetic resonance imaging (CMRI) limits cardiac shape analysis. To address this, we propose to leverage near-isotropic, higher reso...
- Gradients Must Earn Their Influence: Unifying SFT with Generalized Entropic Objectives : Abstract: Standard negative log-likelihood (NLL) for Supervised Fine-Tuning (SFT) applies uniform token-level weighting. This rigidity creates a two-fold failure mode: (i) overemphasizing low-probabil...
- When Visibility Outpaces Verification: Delayed Verification and Narrative Lock-in in Agentic AI Discourse : Abstract: Agentic AI systems, autonomous entities capable of independent planning and execution, reshape the landscape of human-AI trust. Long before direct system exposure, user expectations are mediat...
- Can We Really Learn One Representation to Optimize All Rewards? : Abstract: As machine learning has moved towards leveraging large models as priors for downstream tasks, the community has debated the right form of prior for solving reinforcement learning (RL) proble...
- General and Efficient Steering of Unconditional Diffusion : Abstract: Guiding unconditional diffusion models typically requires either retraining with conditional inputs or per-step gradient computations (e.g., classifier-based guidance), both of which incur s...
- Retrieval-Aware Distillation for Transformer-SSM Hybrids : Abstract: State-space models (SSMs) offer efficient sequence modeling but lag behind Transformers on benchmarks that require in-context retrieval. Prior work links this gap to a small set of attention...
- The Manifold of the Absolute: Religious Perennialism as Generative Inference : Abstract: This paper formalizes religious epistemology through the mathematics of Variational Autoencoders. We model religious traditions as distinct generative mappings from a shared, low-dimensional...
- The Energy of Falsehood: Detecting Hallucinations via Diffusion Model Likelihoods : Abstract: Large Language Models (LLMs) frequently hallucinate plausible but incorrect assertions, a vulnerability often missed by uncertainty metrics when models are confidently wrong. We propose Diff...
- Finding the Cracks: Improving LLMs Reasoning with Paraphrastic Probing and Consistency Verification : Abstract: Large language models have demonstrated impressive performance across a variety of reasoning tasks. However, their problem-solving ability often declines on more complex tasks due to halluci...
- Bootstrapping-based Regularisation for Reducing Individual Prediction Instability in Clinical Risk Prediction Models : Abstract: Clinical prediction models are increasingly used to support patient care, yet many deep learning-based approaches remain unstable, as their predictions can vary substantially when trained on...
- When Models Examine Themselves: Vocabulary-Activation Correspondence in Self-Referential Processing : Abstract: Large language models produce rich introspective language when prompted for self-examination, but whether this language reflects internal computation or sophisticated confabulation has remai...
- Divide and Learn: Multi-Objective Combinatorial Optimization at Scale : Abstract: Multi-objective combinatorial optimization seeks Pareto-optimal solutions over exponentially large discrete spaces, yet existing methods sacrifice generality, scalability, or theoretical gua...
- Situated, Dynamic, and Subjective: Envisioning the Design of Theory-of-Mind-Enabled Everyday AI with Industry Practitioners : Abstract: Theory of Mind (ToM) -- the ability to infer what others are thinking (e.g., intentions) from observable cues -- is traditionally considered fundamental to human social interactions. This ha...
- MolmoSpaces: A Large-Scale Open Ecosystem for Robot Navigation and Manipulation : Abstract: Deploying robots at scale demands robustness to the long tail of everyday situations. The countless variations in scene layout, object geometry, and task specifications that characterize rea...
- Security Threat Modeling for Emerging AI-Agent Protocols: A Comparative Analysis of MCP, A2A, Agora, and ANP : Abstract: The rapid development of the AI agent communication protocols, including the Model Context Protocol (MCP), Agent2Agent (A2A), Agora, and Agent Network Protocol (ANP), is reshaping how AI age...
- Predictive Associative Memory: Retrieval Beyond Similarity Through Temporal Co-occurrence : Abstract: Current approaches to memory in neural systems rely on similarity-based retrieval: given a query, find the most representationally similar stored state. This assumption -- that useful memori...
- CryptoAnalystBench: Failures in Multi-Tool Long-Form LLM Analysis : Abstract: Modern analyst agents must reason over complex, high token inputs, including dozens of retrieved documents, tool outputs, and time sensitive data. While prior work has produced tool calling ...
- HiFloat4 Format for Language Model Inference : Abstract: This paper introduces HiFloat4 (HiF4), a block floating-point data format tailored for deep learning. Each HiF4 unit packs 64 4-bit elements with 32 bits of shared scaling metadata, averagin...
- DeepRed: an architecture for redshift estimation : Abstract: Estimating redshift is a central task in astrophysics, but its measurement is costly and time-consuming. In addition, current image-based methods are often validated on homogeneous datasets....
- How Many Features Can a Language Model Store Under the Linear Representation Hypothesis? : Abstract: We introduce a mathematical framework for the linear representation hypothesis (LRH), which asserts that intermediate layers of language models store features linearly. We separate the hypot...
- Toward Reliable Tea Leaf Disease Diagnosis Using Deep Learning Model: Enhancing Robustness With Explainable AI and Adversarial Training : Abstract: Tea is a valuable asset for the economy of Bangladesh, so tea cultivation plays an important role in boosting the economy. These valuable plants are vulnerable to various kinds of leaf infecti...
- AI-Driven Clinical Decision Support System for Enhanced Diabetes Diagnosis and Management : Abstract: Identifying type 2 diabetes mellitus can be challenging, particularly for primary care physicians. Clinical decision support systems incorporating artificial intelligence (AI-CDSS) can assis...
- Credal Concept Bottleneck Models: Structural Separation of Epistemic and Aleatoric Uncertainty : Abstract: Decomposing predictive uncertainty into epistemic (model ignorance) and aleatoric (data ambiguity) components is central to reliable decision making, yet most methods estimate both from the ...
- SWE-MiniSandbox: Container-Free Reinforcement Learning for Building Software Engineering Agents : Abstract: Reinforcement learning (RL) has become a key paradigm for training software engineering (SWE) agents, but existing pipelines typically rely on per-task containers for isolation. At scale, pr...
- UltraLIF: Fully Differentiable Spiking Neural Networks via Ultradiscretization and Max-Plus Algebra : Abstract: Spiking Neural Networks (SNNs) offer energy-efficient, biologically plausible computation but suffer from non-differentiable spike generation, necessitating reliance on heuristic surrogate g...
- Zero-Sacrifice Persistent-Robustness Adversarial Defense for Pre-Trained Encoders : Abstract: The widespread use of publicly available pre-trained encoders from self-supervised learning (SSL) has exposed a critical vulnerability: their susceptibility to downstream-agnostic adversaria...
- interwhen: A Generalizable Framework for Verifiable Reasoning with Test-time Monitors : Abstract: We present a test-time verification framework, interwhen, that ensures that the output of a reasoning model is valid with respect to a given set of verifiers. Verified reasoning is an important goal in...
- DDL2PropBank Agent: Benchmarking Multi-Agent Frameworks' Developer Experience Through a Novel Relational Schema Mapping Task : Abstract: Multi-agent frameworks promise to simplify LLM-driven software development, yet there is no principled way to evaluate their developer experience in a controlled setting. We introduce DDL2Pr...
- Hybrid operator learning of wave scattering maps in high-contrast media : Abstract: Surrogate modeling of wave propagation and scattering (i.e. the wave speed and source to wave field map) in heterogeneous media has significant potential in applications such as seismic imag...
- Position-Aware Self-supervised Representation Learning for Cross-mode Radar Signal Recognition : Abstract: Radar signal recognition in open electromagnetic environments is challenging due to diverse operating modes and unseen radar types. Existing methods often overlook position relations in puls...
- MELINOE: Fine-Tuning Enables Memory-Efficient Inference for Mixture-of-Experts Models : Abstract: Mixture-of-Experts (MoE) model architectures can significantly reduce the number of activated parameters per token, enabling computationally efficient training and inference. However, their ...
- Time-TK: A Multi-Offset Temporal Interaction Framework Combining Transformer and Kolmogorov-Arnold Networks for Time Series Forecasting : Abstract: Time series forecasting is crucial for the World Wide Web and represents a core technical challenge in ensuring the stable and efficient operation of modern web services, such as intelligent...
- MuCO: Generative Peptide Cyclization Empowered by Multi-stage Conformation Optimization : Abstract: Modeling peptide cyclization is critical for the virtual screening of candidate peptides with desirable physical and pharmaceutical properties. This task is challenging because a cyclic pept...
- TDPNavigator-Placer: Thermal- and Wirelength-Aware Chiplet Placement in 2.5D Systems Through Multi-Agent Reinforcement Learning : Abstract: The rapid growth of electronics has accelerated the adoption of 2.5D integrated circuits, where effective automated chiplet placement is essential as systems scale to larger and more heterog...
- Spectra: Rethinking Optimizers for LLMs Under Spectral Anisotropy : Abstract: Gradient signals in LLM training are highly anisotropic: recurrent linguistic structure concentrates energy into a small set of dominant spectral directions, while context specific informati...
- KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models : Abstract: Mixture of Experts (MoE) models have achieved great success by significantly improving performance while maintaining computational efficiency through sparse expert activation. However, their...
- From Instruction to Output: The Role of Prompting in Modern NLG : Abstract: Prompt engineering has emerged as an integral technique for extending the strengths and abilities of Large Language Models (LLMs) to gain significant performance gains in various Natural Lan...
- What Do LLMs Know About Alzheimer's Disease? Fine-Tuning, Probing, and Data Synthesis for AD Detection : Abstract: Reliable early detection of Alzheimer's disease (AD) is challenging, particularly due to limited availability of labeled data. While large language models (LLMs) have shown strong transfer c...
- Evaluating Few-Shot Temporal Reasoning of LLMs for Human Activity Prediction in Smart Environments : Abstract: Anticipating human activities and their durations is essential in applications such as smart-home automation, simulation-based architectural and urban design, activity-based transportation s...
- The Script Tax: Measuring Tokenization-Driven Efficiency and Latency Disparities in Multilingual Language Models : Abstract: Pretrained multilingual language models are often assumed to be script-agnostic, yet their tokenizers can impose systematic costs on certain writing systems. We quantify this script tax by c...
- Efficient Hyper-Parameter Search for LoRA via Language-aided Bayesian Optimization : Abstract: Fine-tuning Large Language Models (LLMs) with Low-Rank Adaptation (LoRA) enables resource-efficient personalization or specialization, but it comes at the expense of additional hyperparamete...
- Disentangling Direction and Magnitude in Transformer Representations: A Double Dissociation Through L2-Matched Perturbation Analysis : Abstract: Transformer hidden states encode information as high-dimensional vectors, yet whether direction (orientation in representational space) and magnitude (vector norm) serve distinct functional ...
- Enhancing SDG-Text Classification with Combinatorial Fusion Analysis and Generative AI : Abstract: Natural Language Processing (NLP) techniques such as text classification and topic discovery are very useful in many application areas including information retrieval, knowledge discovery, p...
- Visualizing and Benchmarking LLM Factual Hallucination Tendencies via Internal State Analysis and Clustering : Abstract: Large Language Models (LLMs) often hallucinate, generating nonsensical or false information that can be especially harmful in sensitive fields such as medicine or law. To study this phenomen...
- Small Updates, Big Doubts: Does Parameter-Efficient Fine-tuning Enhance Hallucination Detection? : Abstract: Parameter-efficient fine-tuning (PEFT) methods are widely used to adapt large language models (LLMs) to downstream tasks and are often assumed to improve factual correctness. However, how th...
- Assessing LLM Reliability on Temporally Recent Open-Domain Questions : Abstract: Large Language Models (LLMs) are increasingly deployed for open-domain question answering, yet their alignment with human perspectives on temporally recent information remains underexplored....
- Automated Optimization Modeling via a Localizable Error-Driven Perspective : Abstract: Automated optimization modeling via Large Language Models (LLMs) has emerged as a promising approach to assist complex human decision-making. While post-training has become a pivotal techniq...
- Nested Named Entity Recognition in Plasma Physics Research Articles : Abstract: Named Entity Recognition (NER) is an important task in natural language processing that aims to identify and extract key entities from unstructured text. We present a novel application of NE...
- BIRD: A Museum Open Dataset Combining Behavior Patterns and Identity Types to Better Model Visitors' Experience : Abstract: Lack of data is a recurring problem in Artificial Intelligence, as it is essential for training and validating models. This is particularly true in the field of cultural heritage, where the ...
- Methodological Variation in Studying Staff and Student Perceptions of AI : Abstract: In this paper, we compare methodological approaches for comparing student and staff perceptions, and ask: how much do these measures vary across different approaches? We focus on the case of...
- HybridRAG: A Practical LLM-based ChatBot Framework based on Pre-Generated Q&A over Raw Unstructured Documents : Abstract: Retrieval-Augmented Generation (RAG) has emerged as a powerful approach for grounding Large Language Model (LLM)-based chatbot responses on external knowledge. However, existing RAG studies ...
- Agentic Test-Time Scaling for WebAgents : Abstract: Test-time scaling has become a standard way to improve performance and boost reliability of neural network models. However, its behavior on agentic, multi-step tasks remains less well-unders...
- CM2: Reinforcement Learning with Checklist Rewards for Multi-Turn and Multi-Step Agentic Tool Use : Abstract: AI agents are increasingly used to solve real-world tasks by reasoning over multi-turn user interactions and invoking external tools. However, applying reinforcement learning to such setting...
- Think like a Scientist: Physics-guided LLM Agent for Equation Discovery : Abstract: Explaining observed phenomena through symbolic, interpretable formulas is a fundamental goal of science. Recently, large language models (LLMs) have emerged as promising tools for symbolic e...
- "Sorry, I Didn't Catch That": How Speech Models Miss What Matters Most : Abstract: Despite speech recognition systems achieving low word error rates on standard benchmarks, they often fail on short, high-stakes utterances in real-world deployments. Here, we study this fail...
- SAM3-LiteText: An Anatomical Study of the SAM3 Text Encoder for Efficient Vision-Language Segmentation : Abstract: Vision-language segmentation models such as SAM3 enable flexible, prompt-driven visual grounding, but inherit large, general-purpose text encoders originally designed for open-ended language...
- Pedagogically-Inspired Data Synthesis for Language Model Knowledge Distillation : Abstract: Knowledge distillation from Large Language Models (LLMs) to smaller models has emerged as a critical technique for deploying efficient AI systems. However, current methods for distillation v...
- Statistical Parsing for Logical Information Retrieval : Abstract: In previous work (Coppola, 2024) we introduced the Quantified Boolean Bayesian Network (QBBN), a logical graphical model that implements the forward fragment of natural deduction (Prawitz, 1...
- Sci-CoE: Co-evolving Scientific Reasoning LLMs via Geometric Consensus with Sparse Supervision : Abstract: Large language models (LLMs) have demonstrated exceptional reasoning capabilities, and co-evolving paradigms have shown promising results in domains such as code and math. However, in scient...
- GPT-4o Lacks Core Features of Theory of Mind : Abstract: Do Large Language Models (LLMs) possess a Theory of Mind (ToM)? Research into this question has focused on evaluating LLMs against benchmarks and found success across a range of social tasks...
- Seq2Seq2Seq: Lossless Data Compression via Discrete Latent Transformers and Reinforcement Learning : Abstract: Efficient lossless compression is essential for minimizing storage costs and transmission overhead while preserving data integrity. Traditional compression techniques, such as dictionary-bas...
- STAR: Bridging Statistical and Agentic Reasoning for Large Model Performance Prediction : Abstract: As comprehensive large model evaluation becomes prohibitively expensive, predicting model performance from limited observations has become essential. However, existing statistical methods st...
- Value Alignment Tax: Measuring Value Trade-offs in LLM Alignment : Abstract: Existing work on value alignment typically characterizes value relations statically, ignoring how interventions - such as prompting, fine-tuning, or preference optimization - reshape the bro...
- Neutral Prompts, Non-Neutral People: Quantifying Gender and Skin-Tone Bias in Gemini Flash 2.5 Image and GPT Image 1.5 : Abstract: This study quantifies gender and skin-tone bias in two widely deployed commercial image generators - Gemini Flash 2.5 Image (NanoBanana) and GPT Image 1.5 - to test the assumption that neutr...
- HLA: Hadamard Linear Attention : Abstract: The attention mechanism is an important reason for the success of transformers. It relies on computing pairwise relations between tokens. To reduce the high computational cost of standard qu...
- Commencing-Student Enrolment Forecasting Under Data Sparsity with Time Series Foundation Models : Abstract: Many universities face increasing financial pressure and rely on accurate forecasts of commencing enrolments. However, enrolment forecasting in higher education is often data-sparse; annual ...
- Stop Unnecessary Reflection: Training LRMs for Efficient Reasoning with Adaptive Reflection and Length Coordinated Penalty : Abstract: Large Reasoning Models (LRMs) have demonstrated remarkable performance on complex reasoning tasks by employing test-time scaling. However, they often generate over-long chains-of-thought tha...
- The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context : Abstract: In the world of Harry Potter, when Dumbledore's mind is overburdened, he extracts memories into a Pensieve to be revisited later. In the world of AI, while we possess the Pensieve-mature dat...
- Differentiable Modal Logic for Multi-Agent Diagnosis, Orchestration and Communication : Abstract: As multi-agent AI systems evolve from simple chatbots to autonomous swarms, debugging semantic failures requires reasoning about knowledge, belief, causality, and obligation, precisely what ...
- Tiny Recursive Reasoning with Mamba-2 Attention Hybrid : Abstract: Recent work on recursive reasoning models like TRM demonstrates that tiny networks (7M parameters) can achieve strong performance on abstract reasoning tasks through latent recursion -- iter...
- LawThinker: A Deep Research Legal Agent in Dynamic Environments : Abstract: Legal reasoning requires not only correct outcomes but also procedurally compliant reasoning processes. However, existing methods lack mechanisms to verify intermediate reasoning steps, allo...
- Multi UAVs Preflight Planning in a Shared and Dynamic Airspace : Abstract: Preflight planning for large-scale Unmanned Aerial Vehicle (UAV) fleets in dynamic, shared airspace presents significant challenges, including temporal No-Fly Zones (NFZs), heterogeneous veh...
- InjectRBP: Steering Large Language Model Reasoning Behavior via Pattern Injection : Abstract: Reasoning can significantly enhance the performance of Large Language Models. While recent studies have exploited behavior-related prompts adjustment to enhance reasoning, these designs rema...
- CSEval: A Framework for Evaluating Clinical Semantics in Text-to-Image Generation : Abstract: Text-to-image generation has been increasingly applied in medical domains for various purposes such as data augmentation and education. Evaluating the quality and clinical reliability of the...
- Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments : Abstract: We introduce Gaia2, a benchmark for evaluating large language model agents in realistic, asynchronous environments. Unlike prior static or synchronous evaluations, Gaia2 introduces scenarios...
- MEME: Modeling the Evolutionary Modes of Financial Markets : Abstract: LLMs have demonstrated significant potential in quantitative finance by processing vast unstructured data to emulate human-like analytical workflows. However, current LLM-based methods prima...
- AlphaPROBE: Alpha Mining via Principled Retrieval and On-graph biased evolution : Abstract: Extracting signals through alpha factor mining is a fundamental challenge in quantitative finance. Existing automated methods primarily follow two paradigms: Decoupled Factor Generation, whi...
- When Should LLMs Be Less Specific? Selective Abstraction for Reliable Long-Form Text Generation : Abstract: LLMs are widely used, yet they remain prone to factual errors that erode user trust and limit adoption in high-risk settings. One approach to mitigate this risk is to equip models with uncer...
- From Atoms to Trees: Building a Structured Feature Forest with Hierarchical Sparse Autoencoders : Abstract: Sparse autoencoders (SAEs) have proven effective for extracting monosemantic features from large language models (LLMs), yet these features are typically identified in isolation. However, br...
- Intelligent AI Delegation : Abstract: AI agents are able to tackle increasingly complex tasks. To achieve more ambitious goals, AI agents need to be able to meaningfully decompose problems into manageable sub-components, and saf...
- Talk2DM: Enabling Natural Language Querying and Commonsense Reasoning for Vehicle-Road-Cloud Integrated Dynamic Maps with Large Language Models : Abstract: Dynamic maps (DM) serve as the fundamental information infrastructure for vehicle-road-cloud (VRC) cooperative autonomous driving in China and Japan. By providing comprehensive traffic scene...
- Prototype Transformer: Towards Language Model Architectures Interpretable by Design : Abstract: While state-of-the-art language models (LMs) surpass the vast majority of humans in certain domains, their reasoning remains largely opaque, undermining trust in their output. Furthermore, w...
- Revis: Sparse Latent Steering to Mitigate Object Hallucination in Large Vision-Language Models : Abstract: Despite the advanced capabilities of Large Vision-Language Models (LVLMs), they frequently suffer from object hallucination. One reason is that visual features and pretrained textual represe...
- Predicting LLM Output Length via Entropy-Guided Representations : Abstract: The long-tailed distribution of sequence lengths in LLM serving and reinforcement learning (RL) sampling causes significant computational waste due to excessive padding in batched inference....
- PuYun-LDM: A Latent Diffusion Model for High-Resolution Ensemble Weather Forecasts : Abstract: Latent diffusion models (LDMs) suffer from limited diffusability in high-resolution (<=0.25°) ensemble weather forecasting, where diffusability characterizes how easily a latent data distrib...
- Hi-SAM: A Hierarchical Structure-Aware Multi-modal Framework for Large-Scale Recommendation : Abstract: Multi-modal recommendation has gained traction as items possess rich attributes like text and images. Semantic ID-based approaches effectively discretize this information into compact tokens...
- Detecting RLVR Training Data via Structural Convergence of Reasoning : Abstract: Reinforcement learning with verifiable rewards (RLVR) is central to training modern reasoning models, but the undisclosed training data raises concerns about benchmark contamination. Unlike ...
- Beyond End-to-End Video Models: An LLM-Based Multi-Agent System for Educational Video Generation : Abstract: Although recent end-to-end video generation models demonstrate impressive performance in visually oriented content creation, they remain limited in scenarios that require strict logical rigo...
- FlowMind: Execute-Summarize for Structured Workflow Generation from LLM Reasoning : Abstract: LLMs can solve complex tasks through reasoning and tool use, but accurately translating these solutions into structured workflows remains challenging. We model workflows as sequences of tool...
- RELATE: A Reinforcement Learning-Enhanced LLM Framework for Advertising Text Generation : Abstract: In online advertising, advertising text plays a critical role in attracting user engagement and driving advertiser value. Existing industrial systems typically follow a two-stage paradigm, w...
- How to Optimize Multispecies Set Predictions in Presence-Absence Modeling? : Abstract: Species distribution models (SDMs) commonly produce probabilistic occurrence predictions that must be converted into binary presence-absence maps for ecological inference and conservation pl...
- TSR: Trajectory-Search Rollouts for Multi-Turn RL of LLM Agents : Abstract: Advances in large language models (LLMs) are driving a shift toward using reinforcement learning (RL) to train agents from iterative, multi-turn interactions across tasks. However, multi-tur...
- AIR: Improving Agent Safety through Incident Response : Abstract: Large Language Model (LLM) agents are increasingly deployed in practice across a wide range of autonomous applications. Yet current safety mechanisms for LLM agents focus almost exclusively ...
- Text2GQL-Bench: A Text to Graph Query Language Benchmark [Experiment, Analysis & Benchmark] : Abstract: Graph models are fundamental to data analysis in domains rich with complex relationships. Text-to-Graph-Query-Language (Text-to-GQL) systems act as a translator, converting natural language ...
- Cross-Architecture Model Diffing with Crosscoders: Unsupervised Discovery of Differences Between LLMs : Abstract: Model diffing, the process of comparing models' internal representations to identify their differences, is a promising approach for uncovering safety-critical behaviors in new models. Howeve...
- Beyond Parameter Arithmetic: Sparse Complementary Fusion for Distribution-Aware Model Merging : Abstract: Model merging has emerged as a promising paradigm for composing the capabilities of large language models by directly operating in weight space, enabling the integration of specialized model...
- ThinkRouter: Efficient Reasoning via Routing Thinking between Latent and Discrete Spaces : Abstract: Recent work explores latent reasoning to improve reasoning efficiency by replacing explicit reasoning trajectories with continuous representations in a latent space, yet its effectiveness va...
- Beyond Pixels: Vector-to-Graph Transformation for Reliable Schematic Auditing : Abstract: Multimodal Large Language Models (MLLMs) have shown remarkable progress in visual understanding, yet they suffer from a critical limitation: structural blindness. Even state-of-the-art model...
- Right for the Wrong Reasons: Epistemic Regret Minimization for Causal Rung Collapse in LLMs : Abstract: Machine learning systems that are "right for the wrong reasons" achieve high performance through shortcuts that collapse under distributional shift. We show this pathology has a precise caus...
- Benchmark Health Index: A Systematic Framework for Benchmarking the Benchmarks of LLMs : Abstract: Large Language Models (LLMs) are advancing rapidly, yet the benchmarks used to measure this progress are becoming increasingly unreliable. Score inflation and selective reporting have eroded...
- PhyNiKCE: A Neurosymbolic Agentic Framework for Autonomous Computational Fluid Dynamics : Abstract: The deployment of autonomous agents for Computational Fluid Dynamics (CFD) is critically limited by the probabilistic nature of Large Language Models (LLMs), which struggle to enforce the s...
- Quark Medical Alignment: A Holistic Multi-Dimensional Alignment and Collaborative Optimization Paradigm : Abstract: While reinforcement learning for large language model alignment has progressed rapidly in recent years, transferring these paradigms to high-stakes medical question answering reveals a funda...
- Do MLLMs Really Understand Space? A Mathematical Reasoning Evaluation : Abstract: Multimodal large language models (MLLMs) have achieved strong performance on perception-oriented tasks, yet their ability to perform mathematical spatial reasoning, defined as the capacity t...
- Neuro-Symbolic Multitasking: A Unified Framework for Discovering Generalizable Solutions to PDE Families : Abstract: Solving Partial Differential Equations (PDEs) is fundamental to numerous scientific and engineering disciplines. A common challenge arises from solving the PDE families, which are characteri...
- When Agents Disagree With Themselves: Measuring Behavioral Consistency in LLM-Based Agents : Abstract: Run the same LLM agent on the same task twice: do you get the same behavior? We find the answer is often no. In a study of 3,000 agent runs across three models (Llama 3.1 70B, GPT-4o, and Cl...
- scPilot: Large Language Model Reasoning Toward Automated Single-Cell Analysis and Discovery : Abstract: We present scPilot, the first systematic framework to practice omics-native reasoning: a large language model (LLM) converses in natural language while directly inspecting single-cell RNA-se...
- MAPLE: Modality-Aware Post-training and Learning Ecosystem : Abstract: Multimodal language models now integrate text, audio, and video for unified reasoning. Yet existing RL post-training pipelines treat all input signals as equally relevant, ignoring which mod...
- The Five Ws of Multi-Agent Communication: Who Talks to Whom, When, What, and Why -- A Survey from MARL to Emergent Language and LLMs : Abstract: Multi-agent sequential decision-making powers many real-world systems, from autonomous vehicles and robotics to collaborative AI assistants. In dynamic, partially observable environments, co...
- Learning to Configure Agentic AI Systems : Abstract: Configuring LLM-based agent systems involves choosing workflows, tools, token budgets, and prompts from a large combinatorial design space, and is typically handled today by fixed large temp...
- SemaPop: Semantic-Persona Conditioned Population Synthesis : Abstract: Population synthesis is a critical component of individual-level socio-economic simulation, yet remains challenging due to the need to jointly represent statistical structure and latent beha...
- Budget-Constrained Agentic Large Language Models: Intention-Based Planning for Costly Tool Use : Abstract: We study budget-constrained tool-augmented agents, where a large language model must solve multi-step tasks by invoking external tools under a strict monetary budget. We formalize this setti...
- CausalAgent: A Conversational Multi-Agent System for End-to-End Causal Inference : Abstract: Causal inference holds immense value in fields such as healthcare, economics, and social sciences. However, traditional causal analysis workflows impose significant technical barriers, requi...
- Human-Inspired Continuous Learning of Internal Reasoning Processes: Learning How to Think for Adaptive AI Systems : Abstract: Learning internal reasoning processes is crucial for developing AI systems capable of sustained adaptation in dynamic real-world environments. However, most existing approaches primarily emp...
- AgentLeak: A Full-Stack Benchmark for Privacy Leakage in Multi-Agent LLM Systems : Abstract: Multi-agent Large Language Model (LLM) systems create privacy risks that current benchmarks cannot measure. When agents coordinate on tasks, sensitive data passes through inter-agent message...
- Credit Where It is Due: Cross-Modality Connectivity Drives Precise Reinforcement Learning for MLLM Reasoning : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has significantly advanced the reasoning capabilities of Multimodal Large Language Models (MLLMs), yet how visual evidence is integrated...
- Distributionally Robust Cooperative Multi-Agent Reinforcement Learning via Robust Value Factorization : Abstract: Cooperative multi-agent reinforcement learning (MARL) commonly adopts centralized training with decentralized execution, where value-factorization methods enforce the individual-global-maxim...
- TRACER: Trajectory Risk Aggregation for Critical Episodes in Agentic Reasoning : Abstract: Estimating uncertainty for AI agents in real-world multi-turn tool-using interaction with humans is difficult because failures are often triggered by sparse critical episodes (e.g., looping,...
- GHOST: Unmasking Phantom States in Mamba2 via Grouped Hidden-state Output-aware Selection & Truncation : Abstract: While Mamba2's expanded state dimension enhances temporal modeling, it incurs substantial inference overhead that saturates bandwidth during autoregressive generation. Standard pruning metho...
- Causal-JEPA: Learning World Models through Object-Level Latent Interventions : Abstract: World models require robust relational understanding to support prediction, reasoning, and control. While object-centric representations provide a useful abstraction, they are not sufficient...
- ReplicatorBench: Benchmarking LLM Agents for Replicability in Social and Behavioral Sciences : Abstract: The literature has witnessed an emerging interest in AI agents for automated assessment of scientific papers. Existing benchmarks focus primarily on the computational aspect of this task, te...
- Pushing Forward Pareto Frontiers of Proactive Agents with Behavioral Agentic Optimization : Abstract: Proactive large language model (LLM) agents aim to actively plan, query, and interact over multiple turns, enabling efficient task completion beyond passive instruction following and making ...
- AgentNoiseBench: Benchmarking Robustness of Tool-Using LLM Agents Under Noisy Condition : Abstract: Recent advances in large language models have enabled LLM-based agents to achieve strong performance on a variety of benchmarks. However, their performance in real-world deployments often th...
- Bi-Level Prompt Optimization for Multimodal LLM-as-a-Judge : Abstract: Large language models (LLMs) have become widely adopted as automated judges for evaluating AI-generated content. Despite their success, aligning LLM-based evaluations with human judgments re...
- Dissecting Subjectivity and the "Ground Truth" Illusion in Data Annotation : Abstract: In machine learning, "ground truth" refers to the assumed correct labels used to train and evaluate models. However, the foundational "ground truth" paradigm rests on a positivistic fallacy ...
- The PBSAI Governance Ecosystem: A Multi-Agent AI Reference Architecture for Securing Enterprise AI Estates : Abstract: Enterprises are rapidly deploying large language models, retrieval augmented generation pipelines, and tool using agents into production, often on shared high performance computing clusters ...
- Voxtral Realtime : Abstract: We introduce Voxtral Realtime, a natively streaming automatic speech recognition model that matches offline transcription quality at sub-second latency. Unlike approaches that adapt offline ...
- On Decision-Valued Maps and Representational Dependence : Abstract: A computational engine applied to different representations of the same data can produce different discrete outcomes, with some representations preserving the result and others changing it e...
- Latent Generative Solvers for Generalizable Long-Term Physics Simulation : Abstract: We study long-horizon surrogate simulation across heterogeneous PDE systems. We introduce Latent Generative Solvers (LGS), a two-stage framework that (i) maps diverse PDE states into a share...
- Explaining AI Without Code: A User Study on Explainable AI : Abstract: The increasing use of Machine Learning (ML) in sensitive domains such as healthcare, finance, and public policy has raised concerns about the transparency of automated decisions. Explainable...
Research Sources: 486 | Generated: 2/13/2026
