AI Research News Feeds for January 7th, 2026

AI RESEARCH PAPERS & ACADEMIC SOURCES

The Journal of Prompt-Engineered Philosophy Or: How I Started to Track AI Assistance and Stopped Worrying About Slop
A Survey on Failure Analysis and Fault Injection in AI Systems
TextBO: Bayesian Optimization in Language Space for Eval-Efficient Self-Improving AI
Patient-Zero: Scaling Synthetic Patient Agents to Real-World Distributions without Real Patient Data
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation
An Uncertainty-Aware Generalization Framework for Cardiovascular Image Segmentation
Topological Perspectives on Optimal Multimodal Embedding Spaces
The Sonar Moment: Benchmarking Audio-Language Models in Audio Geo-Localization
The Fake Friend Dilemma: Trust and the Political Economy of Conversational AI
Recursive querying of neural networks via weighted structures
Validating Generalist Robots with Situation Calculus and STL Falsification
JPU: Bridging Jailbreak Defense and Unlearning via On-Policy Path Rectification
Interpretable All-Type Audio Deepfake Detection with Audio LLMs via Frequency-Time Reinforcement Learning
MoE Adapter for Large Audio Language Models: Sparsity, Disentanglement, and Gradient-Conflict-Free
The World is Not Mono: Enabling Spatial Understanding in Large Audio-Language Models
LOST-3DSG: Lightweight Open-Vocabulary 3D Scene Graphs with Semantic Tracking in Dynamic Environments
Closing the Reality Gap: Zero-Shot Sim-to-Real Deployment for Dexterous Force-Based Grasping and Manipulation
UniSRCodec: Unified and Low-Bitrate Single Codebook Codec with Sub-Band Reconstruction
Netflix Artwork Personalization via LLM Post-training
Hypothesize-Then-Verify: Speculative Root Cause Analysis for Microservices with Pathwise Parallelism
Agentic Memory Enhanced Recursive Reasoning for Root Cause Localization in Microservices
Privacy-Preserving AI-Enabled Decentralized Learning and Employment Records System
CREAM: Continual Retrieval on Dynamic Streaming Corpora with Adaptive Soft Memory
Multi-channel multi-speaker transformer for speech recognition
Effective Online 3D Bin Packing with Lookahead Parcels Using Monte Carlo Tree Search
TAAF: A Trace Abstraction and Analysis Framework Synergizing Knowledge Graphs and LLMs
LAsset: An LLM-assisted Security Asset Identification Framework for System-on-Chip (SoC) Verification
LongDA: Benchmarking LLM Agents for Long-Document Data Analysis
AI-exposed jobs deteriorated before ChatGPT
Enhancing Debugging Skills with AI-Powered Assistance: A Real-Time Tool for Debugging Support
The Rise of Agentic Testing: Multi-Agent Systems for Robust Software Quality Assurance
Focus on What Matters: Fisher-Guided Adaptive Multimodal Fusion for Vulnerability Detection
WebCoderBench: Benchmarking Web Application Generation with Comprehensive and Interpretable Evaluation Metrics
A Dynamic Retrieval-Augmented Generation System with Selective Memory and Remembrance
A large-scale nanocrystal database with aligned synthesis and properties enabling generative inverse design
Socially-Aware Recommender Systems Mitigate Opinion Clusterization
The Vibe-Check Protocol: Quantifying Cognitive Offloading in AI Programming
ProSoftArena: Benchmarking Hierarchical Capabilities of Multimodal Agents in Professional Software Environments
AI-Native Integrated Sensing and Communications for Self-Organizing Wireless Networks: Architectures, Learning Paradigms, and System-Level Design
Tree of Preferences for Diversified Recommendation
Base Station Deployment under EMF constrain by Deep Reinforcement learning
The Refutability Gap: Challenges in Validating Reasoning by Large Language Models
Movement Primitives in Robotics: A Comprehensive Survey
LeafTutor: An AI Agent for Programming Assignment Tutoring
Permission Manifests for Web Agents
Distillation-based Scenario-Adaptive Mixture-of-Experts for the Matching Stage of Multi-scenario Recommendation
TextBridgeGNN: Pre-training Graph Neural Network for Cross-Domain Recommendation via Text-Guided Transfer
Towards Trustworthy LLM-Based Recommendation via Rationale Integration
The Impact of LLM-Generated Reviews on Recommender Systems: Textual Shifts, Performance Effects, and Strategic Platform Control
MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents
InfiAgent: An Infinite-Horizon Framework for General-Purpose Autonomous Agents
A framework for assuring the accuracy and fidelity of an AI-enabled Digital Twin of en route UK airspace
Rationale-Grounded In-Context Learning for Time Series Reasoning with Multimodal Large Language Models
Batch-of-Thought: Cross-Instance Learning for Enhanced LLM Reasoning
SimRPD: Optimizing Recruitment Proactive Dialogue Agents through Simulator-Based Data Evaluation and Selection
M3MAD-Bench: Are Multi-Agent Debates Really Effective Across Domains and Modalities?
Sample-Efficient Neurosymbolic Deep Reinforcement Learning
Quantum-enhanced long short-term memory with attention for spatial permeability prediction in oilfield reservoirs
Causal-Enhanced AI Agents for Medical Research Screening
LLM Agent Framework for Intelligent Change Analysis in Urban Environment using Remote Sensing Imagery
The Path Ahead for Agentic AI: Challenges and Opportunities
Learning User Preferences Through Interaction for Long-Term Collaboration
Learning from Prompt itself: the Hierarchical Attribution Prompt Optimization
Inferring Causal Graph Temporal Logic Formulas to Expedite Reinforcement Learning in Temporally Extended Tasks
AWARE-US: Benchmark for Preference-Aware Resolution in Tool-Calling Agents
An Empirical Study of On-Device Translation for Real-Time Live-Stream Chat on Mobile Devices
Orchestral AI: A Framework for Agent Orchestration
SimpleMem: Efficient Lifelong Memory for LLM Agents
Textual Explanations and Their Evaluations for Reinforcement Learning Policy
FCC: Fully Connected Correlation for One-Shot Segmentation : Abstract: Few-shot segmentation (FSS) aims to segment the target object in a query image using only a small set of support images and masks. Therefore, having strong prior information for the target o...
How Many Images Does It Take? Estimating Imitation Thresholds in Text-to-Image Models : Abstract: Text-to-image models are trained using large datasets of image-text pairs collected from the internet. These datasets often include copyrighted and private images. Training models on such da...
HAPNet: Toward Superior RGB-Thermal Scene Parsing via Hybrid, Asymmetric, and Progressive Heterogeneous Feature Fusion : Abstract: Data-fusion networks have shown significant promise for RGB-thermal scene parsing. However, the majority of existing studies have relied on symmetric duplex encoders for heterogeneous featur...
Teeth3DS+: An Extended Benchmark for Intraoral 3D Scans Analysis : Abstract: Intraoral 3D scanning is now widely adopted in modern dentistry and plays a central role in supporting key tasks such as tooth segmentation, detection, labeling, and dental landmark identifi...
Transformers self-organize like newborn visual systems when trained in prenatal worlds : Abstract: Do transformers learn like brains? A key challenge in addressing this question is that transformers and brains are trained on fundamentally different data. Brains are initially "trained" on ...
DiT-JSCC: Rethinking Deep JSCC with Diffusion Transformers and Semantic Representations : Abstract: Generative joint source-channel coding (GJSCC) has emerged as a new Deep JSCC paradigm for achieving high-fidelity and robust image transmission under extreme wireless channel conditions, su...
Lesion Segmentation in FDG-PET/CT Using Swin Transformer U-Net 3D: A Robust Deep Learning Framework : Abstract: Accurate and automated lesion segmentation in Positron Emission Tomography / Computed Tomography (PET/CT) imaging is essential for cancer diagnosis and therapy planning. This paper presents ...
Omni2Sound: Towards Unified Video-Text-to-Audio Generation : Abstract: Training a unified model integrating video-to-audio (V2A), text-to-audio (T2A), and joint video-text-to-audio (VT2A) generation offers significant application flexibility, yet faces two unex...
Loop Closure using AnyLoc Visual Place Recognition in DPV-SLAM : Abstract: Loop closure is crucial for maintaining the accuracy and consistency of visual SLAM. We propose a method to improve loop closure performance in DPV-SLAM. Our approach integrates AnyLoc, a le...
Annealed Langevin Posterior Sampling (ALPS): A Rapid Algorithm for Image Restoration with Multiscale Energy Models : Abstract: Solving inverse problems in imaging requires models that support efficient inference, uncertainty quantification, and principled probabilistic reasoning. Energy-Based Models (EBMs), with the...
Comparative Analysis of Binarization Methods For Medical Image Hashing On Odir Dataset : Abstract: In this study, we evaluated four binarization methods: Locality-Sensitive Hashing (LSH), Iterative Quantization (ITQ), Kernel-based Supervised Hashing (KSH), and Supervised Discrete Hashing ...
A Green Solution for Breast Region Segmentation Using Deep Active Learning : Abstract: Purpose: Annotation of medical breast images is an essential step toward better diagnostic but a time consuming task. This research aims to focus on different selecting sample strategies wit...
Muses: Designing, Composing, Generating Nonexistent Fantasy 3D Creatures without Training : Abstract: We present Muses, the first training-free method for fantastic 3D creature generation in a feed-forward paradigm. Previous methods, which rely on part-aware optimization, manual assembly, or...
InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields : Abstract: Existing depth estimation methods are fundamentally limited to predicting depth on discrete image grids. Such representations restrict their scalability to arbitrary output resolutions and h...
A Versatile Multimodal Agent for Multimedia Content Generation : Abstract: With the advancement of AIGC (AI-generated content) technologies, an increasing number of generative models are revolutionizing fields such as video editing, music generation, and even film ...
LTX-2: Efficient Joint Audio-Visual Foundation Model : Abstract: Recent text-to-video diffusion models can generate compelling video sequences, yet they remain silent -- missing the semantic, emotional, and atmospheric cues that audio provides. We introdu...
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision : Abstract: While Unified Multimodal Models (UMMs) have achieved remarkable success in cross-modal comprehension, a significant gap persists in their ability to leverage such internal knowledge for high...
DiffBench Meets DiffAgent: End-to-End LLM-Driven Diffusion Acceleration Code Generation : Abstract: Diffusion models have achieved remarkable success in image and video generation. However, their inherently multiple step inference process imposes substantial computational overhead, hinderi...
LSP-DETR: Efficient and Scalable Nuclei Segmentation in Whole Slide Images : Abstract: Precise and scalable instance segmentation of cell nuclei is essential for computational pathology, yet gigapixel Whole-Slide Images pose major computational challenges. Existing approaches ...
Unified Thinker: A General Reasoning Modular Core for Image Generation : Abstract: Despite impressive progress in high-fidelity image synthesis, generative models still struggle with logic-intensive instruction following, exposing a persistent reasoning--execution gap. Mea...
Text-Guided Layer Fusion Mitigates Hallucination in Multimodal LLMs : Abstract: Multimodal large language models (MLLMs) typically rely on a single late-layer feature from a frozen vision encoder, leaving the encoder's rich hierarchy of visual cues under-utilized. MLLMs...
LesionTABE: Equitable AI for Skin Lesion Detection : Abstract: Bias remains a major barrier to the clinical adoption of AI in dermatology, as diagnostic models underperform on darker skin tones. We present LesionTABE, a fairness-centric framework that c...
Understanding Multi-Agent Reasoning with Large Language Models for Cartoon VQA : Abstract: Visual Question Answering (VQA) for stylised cartoon imagery presents challenges, such as interpreting exaggerated visual abstraction and narrative-driven context, which are not adequately a...
Fine-Grained Generalization via Structuralizing Concept and Feature Space into Commonality, Specificity and Confounding : Abstract: Fine-Grained Domain Generalization (FGDG) presents greater challenges than conventional domain generalization due to the subtle inter-class differences and relatively pronounced intra-class ...
IBISAgent: Reinforcing Pixel-Level Visual Reasoning in MLLMs for Universal Biomedical Object Referring and Segmentation : Abstract: Recent research on medical MLLMs has gradually shifted its focus from image-level understanding to fine-grained, pixel-level comprehension. Although segmentation serves as the foundation for...
On the Intrinsic Limits of Transformer Image Embeddings in Non-Solvable Spatial Reasoning : Abstract: Vision Transformers (ViTs) excel in semantic recognition but exhibit systematic failures in spatial reasoning tasks such as mental rotation. While often attributed to data scale, we propose ...
Motion Blur Robust Wheat Pest Damage Detection with Dynamic Fuzzy Feature Fusion : Abstract: Motion blur caused by camera shake produces ghosting artifacts that substantially degrade edge side object detection. Existing approaches either suppress blur as noise and lose discriminativ...
SA-ResGS: Self-Augmented Residual 3D Gaussian Splatting for Next Best View Selection : Abstract: We propose Self-Augmented Residual 3D Gaussian Splatting (SA-ResGS), a novel framework to stabilize uncertainty quantification and enhancing uncertainty-aware supervision in next-best-view (...
ReCCur: A Recursive Corner-Case Curation Framework for Robust Vision-Language Understanding in Open and Edge Scenarios : Abstract: Corner cases are rare or extreme scenarios that drive real-world failures, but they are difficult to curate at scale: web data are noisy, labels are brittle, and edge deployments preclude la...
Towards Efficient 3D Object Detection for Vehicle-Infrastructure Collaboration via Risk-Intent Selection : Abstract: Vehicle-Infrastructure Collaborative Perception (VICP) is pivotal for resolving occlusion in autonomous driving, yet the trade-off between communication bandwidth and feature redundancy rema...
Towards Faithful Reasoning in Comics for Small MLLMs : Abstract: Comic-based visual question answering (CVQA) poses distinct challenges to multimodal large language models (MLLMs) due to its reliance on symbolic abstraction, narrative logic, and humor, wh...
ULS+: Data-driven Model Adaptation Enhances Lesion Segmentation : Abstract: In this study, we present ULS+, an enhanced version of the Universal Lesion Segmentation (ULS) model. The original ULS model segments lesions across the whole body in CT scans given volumes ...
LAMS-Edit: Latent and Attention Mixing with Schedulers for Improved Content Preservation in Diffusion-Based Image and Style Editing : Abstract: Text-to-Image editing using diffusion models faces challenges in balancing content preservation with edit application and handling real-image editing. To address these, we propose LAMS-Edit,...
VTONQA: A Multi-Dimensional Quality Assessment Dataset for Virtual Try-on : Abstract: With the rapid development of e-commerce and digital fashion, image-based virtual try-on (VTON) has attracted increasing attention. However, existing VTON models often suffer from artifacts ...
HybridSolarNet: A Lightweight and Explainable EfficientNet-CBAM Architecture for Real-Time Solar Panel Fault Detection : Abstract: Manual inspections for solar panel systems are a tedious, costly, and error-prone task, making it desirable for Unmanned Aerial Vehicle (UAV) based monitoring. Though deep learning models ha...
PrismVAU: Prompt-Refined Inference System for Multimodal Video Anomaly Understanding : Abstract: Video Anomaly Understanding (VAU) extends traditional Video Anomaly Detection (VAD) by not only localizing anomalies but also describing and reasoning about their context. Existing VAU appro...
DCG ReID: Disentangling Collaboration and Guidance Fusion Representations for Multi-modal Vehicle Re-Identification : Abstract: Multi-modal vehicle Re-Identification (ReID) aims to leverage complementary information from RGB, Near Infrared (NIR), and Thermal Infrared (TIR) modalities to retrieve the same vehicle. The...
Zoom-IQA: Image Quality Assessment with Reliable Region-Aware Reasoning : Abstract: Image Quality Assessment (IQA) is a long-standing problem in computer vision. Previous methods typically focus on predicting numerical scores without explanation or provide low-level descrip...
Towards Agnostic and Holistic Universal Image Segmentation with Bit Diffusion : Abstract: This paper introduces a diffusion-based framework for universal image segmentation, making agnostic segmentation possible without depending on mask-based frameworks and instead predicting th...
Breaking Self-Attention Failure: Rethinking Query Initialization for Infrared Small Target Detection : Abstract: Infrared small target detection (IRSTD) faces significant challenges due to the low signal-to-noise ratio (SNR), small target size, and complex cluttered backgrounds. Although recent DETR-ba...
DGA-Net: Enhancing SAM with Depth Prompting and Graph-Anchor Guidance for Camouflaged Object Detection : Abstract: To fully exploit depth cues in Camouflaged Object Detection (COD), we present DGA-Net, a specialized framework that adapts the Segment Anything Model (SAM) via a novel "depth prompting" par...
SketchThinker-R1: Towards Efficient Sketch-Style Reasoning in Large Multimodal Models : Abstract: Despite the empirical success of extensive, step-by-step reasoning in large multimodal models, long reasoning processes inevitably incur substantial computational overhead, i.e., in terms of...
Topology-aware Pathological Consistency Matching for Weakly-Paired IHC Virtual Staining : Abstract: Immunohistochemical (IHC) staining provides crucial molecular characterization of tissue samples and plays an indispensable role in the clinical examination and diagnosis of cancers. However...
StableDPT: Temporal Stable Monocular Video Depth Estimation : Abstract: Applying single image Monocular Depth Estimation (MDE) models to video sequences introduces significant temporal instability and flickering artifacts. We propose a novel approach that adapts...
Textile IR: A Bidirectional Intermediate Representation for Physics-Aware Fashion CAD : Abstract: We introduce Textile IR, a bidirectional intermediate representation that connects manufacturing-valid CAD, physics-based simulation, and lifecycle assessment for fashion design. Unlike exis...
DreamStyle: A Unified Framework for Video Stylization : Abstract: Video stylization, an important downstream task of video generation models, has not yet been thoroughly explored. Its input style conditions typically include text, style image, and stylized...
EarthVL: A Progressive Earth Vision-Language Understanding and Generation Framework : Abstract: Earth vision has achieved milestones in geospatial object recognition but lacks exploration in object-relational reasoning, limiting comprehensive scene understanding. To address this, a pro...
AbductiveMLLM: Boosting Visual Abductive Reasoning Within MLLMs : Abstract: Visual abductive reasoning (VAR) is a challenging task that requires AI systems to infer the most likely explanation for incomplete visual observations. While recent MLLMs develop strong gen...
ClearAIR: A Human-Visual-Perception-Inspired All-in-One Image Restoration : Abstract: All-in-One Image Restoration (AiOIR) has advanced significantly, offering promising solutions for complex real-world degradations. However, most existing approaches rely heavily on degradati...
AnyDepth: Depth Estimation Made Easy : Abstract: Monocular depth estimation aims to recover the depth information of 3D scenes from 2D images. Recent work has made significant progress, but its reliance on large-scale datasets and complex ...
Towards Zero-Shot Point Cloud Registration Across Diverse Scales, Scenes, and Sensor Setups : Abstract: Some deep learning-based point cloud registration methods struggle with zero-shot generalization, often requiring dataset-specific hyperparameter tuning or retraining for new environments. W...
D³R-DETR: DETR with Dual-Domain Density Refinement for Tiny Object Detection in Aerial Images : Abstract: Detecting tiny objects plays a vital role in remote sensing intelligent interpretation, as these objects often carry critical information for downstream applications. However, due to the ext...
Unveiling and Bridging the Functional Perception Gap in MLLMs: Atomic Visual Alignment and Hierarchical Evaluation via PET-Bench : Abstract: While Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in tasks such as abnormality detection and report generation for anatomical modalities, their capabili...
HOLO: Homography-Guided Pose Estimator Network for Fine-Grained Visual Localization on SD Maps : Abstract: Visual localization on standard-definition (SD) maps has emerged as a promising low-cost and scalable solution for autonomous driving. However, existing regression-based approaches often ove...
Foreground-Aware Dataset Distillation via Dynamic Patch Selection : Abstract: In this paper, we propose a foreground-aware dataset distillation method that enhances patch selection in a content-adaptive manner. With the rising computational cost of training large-scal...
Robust Mesh Saliency GT Acquisition in VR via View Cone Sampling and Geometric Smoothing : Abstract: Reliable 3D mesh saliency ground truth (GT) is essential for human-centric visual modeling in virtual reality (VR). However, current 3D mesh saliency GT acquisition methods are generally con...
CAMO: Category-Agnostic 3D Motion Transfer from Monocular 2D Videos : Abstract: Motion transfer from 2D videos to 3D assets is a challenging problem, due to inherent pose ambiguities and diverse object shapes, often requiring category-specific parametric templates. We p...
GRRE: Leveraging G-Channel Removed Reconstruction Error for Robust Detection of AI-Generated Images : Abstract: The rapid progress of generative models, particularly diffusion models and GANs, has greatly increased the difficulty of distinguishing synthetic images from real ones. Although numerous det...
DreamLoop: Controllable Cinemagraph Generation from a Single Photograph : Abstract: Cinemagraphs, which combine static photographs with selective, looping motion, offer unique artistic appeal. Generating them from a single photograph in a controllable manner is particularly...
Shallow- and Deep-fake Image Manipulation Localization Using Vision Mamba and Guided Graph Neural Network : Abstract: Image manipulation localization is a critical research task, given that forged images may have a significant societal impact of various aspects. Such image manipulations can be produced usin...
MovieRecapsQA: A Multimodal Open-Ended Video Question-Answering Benchmark : Abstract: Understanding real-world videos such as movies requires integrating visual and dialogue cues to answer complex questions. Yet existing VideoQA benchmarks struggle to capture this multimodal ...
CT Scans As Video: Efficient Intracranial Hemorrhage Detection Using Multi-Object Tracking : Abstract: Automated analysis of volumetric medical imaging on edge devices is severely constrained by the high memory and computational demands of 3D Convolutional Neural Networks (CNNs). This paper d...
PatchAlign3D: Local Feature Alignment for Dense 3D Shape understanding : Abstract: Current foundation models for 3D shapes excel at global tasks (retrieval, classification) but transfer poorly to local part-level reasoning. Recent approaches leverage vision and language fo...
Don't Mind the Gaps: Implicit Neural Representations for Resolution-Agnostic Retinal OCT Analysis : Abstract: Routine clinical imaging of the retina using optical coherence tomography (OCT) is performed with large slice spacing, resulting in highly anisotropic images and a sparsely scanned retina. M...
Evaluating the Diagnostic Classification Ability of Multimodal Large Language Models: Insights from the Osteoarthritis Initiative : Abstract: Multimodal large language models (MLLMs) show promising performance on medical visual question answering (VQA) and report generation, but these generation and explanation abilities do not re...
Understanding Pure Textual Reasoning for Blind Image Quality Assessment : Abstract: Textual reasoning has recently been widely adopted in Blind Image Quality Assessment (BIQA). However, it remains unclear how textual information contributes to quality prediction and to what...
Watch Wider and Think Deeper: Collaborative Cross-modal Chain-of-Thought for Complex Visual Reasoning : Abstract: Multi-modal reasoning requires the seamless integration of visual and linguistic cues, yet existing Chain-of-Thought methods suffer from two critical limitations in cross-modal scenarios: (1...
Multimodal Sentiment Analysis based on Multi-channel and Symmetric Mutual Promotion Feature Fusion : Abstract: Multimodal sentiment analysis is a key technology in the fields of human-computer interaction and affective computing. Accurately recognizing human emotional states is crucial for facilitati...
MIAR: Modality Interaction and Alignment Representation Fusion for Multimodal Emotion : Abstract: Multimodal Emotion Recognition (MER) aims to perceive human emotions through three modes: language, vision, and audio. Previous methods primarily focused on modal fusion without adequately a...
Expert-Guided Explainable Few-Shot Learning with Active Sample Selection for Medical Image Analysis : Abstract: Medical image analysis faces two critical challenges: scarcity of labeled data and lack of model interpretability, both hindering clinical AI deployment. Few-shot learning (FSL) addresses da...
Self-Supervised Masked Autoencoders with Dense-Unet for Coronary Calcium Removal in limited CT Data : Abstract: Coronary calcification creates blooming artifacts in Computed Tomography Angiography (CTA), severely hampering the diagnosis of lumen stenosis. While Deep Convolutional Neural Networks (DCNN...
AgentArch: A Comprehensive Benchmark to Evaluate Agent Architectures in Enterprise : Abstract: While individual components of agentic architectures have been studied in isolation, there remains limited empirical understanding of how different design dimensions interact within complex ...
Emergence and Localisation of Semantic Role Circuits in LLMs : Abstract: Despite displaying semantic competence, large language models' internal mechanisms that ground abstract semantic structure remain insufficiently characterised. We propose a method integratin...
Uncovering Autoregressive LLM Knowledge of Thematic Fit in Event Representation : Abstract: We show closed models possess much thematic fit knowledge and set a new state of the art, while open models also seem to capture much relevant knowledge (in semantic filtering), but yield lo...
Fine-tuning Small Language Models as Efficient Enterprise Search Relevance Labelers : Abstract: In enterprise search, building high-quality datasets at scale remains a central challenge due to the difficulty of acquiring labeled data. To resolve this challenge, we propose an efficient ...
Multi-Modal Data-Enhanced Foundation Models for Prediction and Control in Wireless Networks: A Survey : Abstract: Foundation models (FMs) are recognized as a transformative breakthrough that has started to reshape the future of artificial intelligence (AI) across both academia and industry. The integrat...
Accurate Table Question Answering with Accessible LLMs : Abstract: Given a table T in a database and a question Q in natural language, the table question answering (TQA) task aims to return an accurate answer to Q based on the content of T. Recent state-of-...
Automatic Prompt Engineering with No Task Cues and No Tuning : Abstract: This paper presents a system for automatic prompt engineering that is much simpler in both design and application and yet as effective as the existing approaches. It requires no tuning and n...
DNACHUNKER: Learnable Tokenization for DNA Language Models : Abstract: DNA language models have emerged as powerful tools for decoding the complex language of DNA sequences. However, the performance of these models is heavily affected by their tokenization stra...
SastBench: A Benchmark for Testing Agentic SAST Triage : Abstract: SAST (Static Application Security Testing) tools are among the most widely used techniques in defensive cybersecurity, employed by commercial and non-commercial organizations to identify pot...
Logical Phase Transitions: Understanding Collapse in LLM Logical Reasoning : Abstract: Symbolic logical reasoning is a critical yet underexplored capability of large language models (LLMs), providing reliable and verifiable decision-making in high-stakes domains such as mathem...
ReTreVal: Reasoning Tree with Validation - A Hybrid Framework for Enhanced LLM Multi-Step Reasoning : Abstract: Multi-step reasoning remains a key challenge for Large Language Models (LLMs), particularly in complex domains such as mathematics and creative writing. While recent approaches including ReA...
Time-Scaling Is What Agents Need Now : Abstract: Early artificial intelligence paradigms exhibited separated cognitive functions: Neural Networks focused on "perception-representation," Reinforcement Learning on "decision-making-behavior,"...
Dynamic Quantization Error Propagation in Encoder-Decoder ASR Quantization : Abstract: Running Automatic Speech Recognition (ASR) models on memory-constrained edge devices requires efficient compression. While layer-wise post-training quantization is effective, it suffers from...
Detecting and Mitigating Treatment Leakage in Text-Based Causal Inference: Distillation and Sensitivity Analysis : Abstract: Text-based causal inference increasingly employs textual data as proxies for unobserved confounders, yet this approach introduces a previously undertheorized source of bias: treatment leakag...
LLM-as-evaluator in Strategy Research: A Normative, Variance-Aware Protocol : Abstract: Large language models (LLMs) are becoming essential tools for strategy scholars who need to evaluate text corpora at scale. This paper provides a systematic analysis of the reliability of LL...
Automated Semantic Rules Detection (ASRD) for Emergent Communication Interpretation : Abstract: The field of emergent communication within multi-agent systems examines how autonomous agents can independently develop communication strategies, without explicit programming, and adapt them...
STReasoner: Empowering LLMs for Spatio-Temporal Reasoning in Time Series via Spatial-Aware Reinforcement Learning : Abstract: Spatio-temporal reasoning in time series involves the explicit synthesis of temporal dynamics, spatial dependencies, and textual context. This capability is vital for high-stakes decision-ma...
Multi-RADS Synthetic Radiology Report Dataset and Head-to-Head Benchmarking of 41 Open-Weight and Proprietary Language Models : Abstract: Background: Reporting and Data Systems (RADS) standardize radiology risk communication but automated RADS assignment from narrative reports is challenging because of guideline complexity, ou...
MalruleLib: Large-Scale Executable Misconception Reasoning with Step Traces for Modeling Student Thinking in Mathematics : Abstract: Student mistakes in mathematics are often systematic: a learner applies a coherent but wrong procedure and repeats it across contexts. We introduce MalruleLib, a learning-science-grounded fr...
UltraLogic: Enhancing LLM Reasoning through Large-Scale Data Synthesis and Bipolar Float Reward : Abstract: While Large Language Models (LLMs) have demonstrated significant potential in natural language processing, complex general-purpose reasoning requiring multi-step logic, planning, and verifi...
DIP: Dynamic In-Context Planner For Diffusion Language Models : Abstract: Diffusion language models (DLMs) have shown strong potential for general natural language tasks with in-context examples. However, due to the bidirectional attention mechanism, DLMs incur su...
X-MuTeST: A Multilingual Benchmark for Explainable Hate Speech Detection and A Novel LLM-consulted Explanation Framework : Abstract: Hate speech detection on social media faces challenges in both accuracy and explainability, especially for underexplored Indic languages. We propose a novel explainability-guided training fr...
MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory : Abstract: The hallmark of human intelligence is the ability to master new skills through Constructive Episodic Simulation -- retrieving past experiences to synthesize solutions for novel tasks. While Lar...
Maximizing Local Entropy Where It Matters: Prefix-Aware Localized LLM Unlearning : Abstract: Machine unlearning aims to forget sensitive knowledge from Large Language Models (LLMs) while maintaining general utility. However, existing approaches typically treat all tokens in a respon...
WebAnchor: Anchoring Agent Planning to Stabilize Long-Horizon Web Reasoning : Abstract: Large Language Model (LLM)-based agents have shown strong capabilities in web information seeking, with reinforcement learning (RL) becoming a key optimization paradigm. However, planning rem...
Decoupling the Effect of Chain-of-Thought Reasoning: A Human Label Variation Perspective : Abstract: Reasoning-tuned LLMs utilizing long Chain-of-Thought (CoT) excel at single-answer tasks, yet their ability to model Human Label Variation--which requires capturing probabilistic ambiguity ra...
Self-Verification is All You Need To Pass The Japanese Bar Examination : Abstract: Despite rapid advances in large language models (LLMs), achieving reliable performance on highly professional and structured examinations remains a significant challenge. The Japanese bar ex...
Limited Linguistic Diversity in Embodied AI Datasets : Abstract: Language plays a critical role in Vision-Language-Action (VLA) models, yet the linguistic characteristics of the datasets used to train and evaluate these systems remain poorly documented. I...
Improving Indigenous Language Machine Translation with Synthetic Data and Language-Specific Preprocessing : Abstract: Low-resource indigenous languages often lack the parallel corpora required for effective neural machine translation (NMT). Synthetic data generation offers a practical strategy for mitigatin...
The Anatomy of Conversational Scams: A Topic-Based Red Teaming Analysis of Multi-Turn Interactions in LLMs : Abstract: As LLMs gain persuasive agentic capabilities through extended dialogues, they introduce novel risks in multi-turn conversational scams that single-turn safety evaluations fail to capture. We...
Discovering and Causally Validating Emotion-Sensitive Neurons in Large Audio-Language Models : Abstract: Emotion is a central dimension of spoken communication, yet we still lack a mechanistic account of how modern large audio-language models (LALMs) encode it internally. We present the first ...
Who Laughs with Whom? Disentangling Influential Factors in Humor Preferences across User Clusters and LLMs : Abstract: Humor preferences vary widely across individuals and cultures, complicating the evaluation of humor using large language models (LLMs). In this study, we model heterogeneity in humor prefere...
Learning to Diagnose and Correct Moral Errors: Towards Enhancing Moral Sensitivity in Large Language Models : Abstract: Moral sensitivity is fundamental to human moral competence, as it guides individuals in regulating everyday behavior. Although many approaches seek to align large language models (LLMs) with...
Detecting Hallucinations in Retrieval-Augmented Generation via Semantic-level Internal Reasoning Graph : Abstract: Retrieval-augmented generation (RAG) systems based on large language models (LLMs) have made significant progress. They can effectively reduce factuality hallucinations, but faithfulness hallu...
BaseCal: Unsupervised Confidence Calibration via Base Model Signals : Abstract: Reliable confidence is essential for trusting the outputs of LLMs, yet widely deployed post-trained LLMs (PoLLMs) typically compromise this trust with severe overconfidence. In contrast, we ...
NorwAI's Large Language Models: Technical Report : Abstract: Norwegian, spoken by approximately five million people, remains underrepresented in many of the most significant breakthroughs in Natural Language Processing (NLP). To address this gap, the ...
Reducing Hallucinations in LLMs via Factuality-Aware Preference Learning : Abstract: Preference alignment methods such as RLHF and Direct Preference Optimization (DPO) improve instruction following, but they can also reinforce hallucinations when preference judgments reward ...
LittiChoQA: Literary Texts in Indic Languages Chosen for Question Answering : Abstract: Long-context question answering (QA) over literary texts poses significant challenges for modern large language models, particularly in low-resource languages. We address the scarcity of lon...
MedDialogRubrics: A Comprehensive Benchmark and Evaluation Framework for Multi-turn Medical Consultations in Large Language Models : Abstract: Medical conversational AI plays a pivotal role in the development of safer and more effective medical dialogue systems. However, existing benchmarks and evaluation frameworks for assess...
MMFormalizer: Multimodal Autoformalization in the Wild : Abstract: Autoformalization, which translates natural language mathematics into formal statements to enable machine reasoning, faces fundamental challenges in the wild due to the multimodal nature of ...
SentGraph: Hierarchical Sentence Graph for Multi-hop Retrieval-Augmented Question Answering : Abstract: Traditional Retrieval-Augmented Generation (RAG) effectively supports single-hop question answering with large language models but faces significant limitations in multi-hop question answeri...
Large Reasoning Models Are (Not Yet) Multilingual Latent Reasoners : Abstract: Large reasoning models (LRMs) achieve strong performance on mathematical reasoning tasks, often attributed to their capability to generate explicit chain-of-thought (CoT) explanations. Howev...
Stable-RAG: Mitigating Retrieval-Permutation-Induced Hallucinations in Retrieval-Augmented Generation : Abstract: Retrieval-Augmented Generation (RAG) has become a key paradigm for reducing factual hallucinations in large language models (LLMs), yet little is known about how the order of retrieved docum...
Mechanistic Interpretability of Large-Scale Counting in LLMs through a System-2 Strategy : Abstract: Large language models (LLMs), despite strong performance on complex mathematical problems, exhibit systematic limitations in counting tasks. This issue arises from architectural limits of tr...
P-Check: Advancing Personalized Reward Model via Learning to Generate Dynamic Checklist : Abstract: Recent approaches in personalized reward modeling have primarily focused on leveraging user interaction history to align model judgments with individual preferences. However, existing approa...
Mechanistic Knobs in LLMs: Retrieving and Steering High-Order Semantic Features via Sparse Autoencoders : Abstract: Recent work in Mechanistic Interpretability (MI) has enabled the identification and intervention of internal features in Large Language Models (LLMs). However, a persistent challenge lies in...
Correct, Concise and Complete: Multi-stage Training For Adaptive Reasoning : Abstract: The reasoning capabilities of large language models (LLMs) have improved substantially through increased test-time computation, typically in the form of intermediate tokens known as chain-of...
LLM-Augmented Changepoint Detection: A Framework for Ensemble Detection and Automated Explanation : Abstract: This paper introduces a novel changepoint detection framework that combines ensemble statistical methods with Large Language Models (LLMs) to enhance both detection accuracy and the interpre...
Enhancing Multilingual RAG Systems with Debiased Language Preference-Guided Query Fusion : Abstract: Multilingual Retrieval-Augmented Generation (mRAG) systems often exhibit a perceived preference for high-resource languages, particularly English, resulting in the widespread adoption of Eng...
Pearmut: Human Evaluation of Translation Made Trivial : Abstract: Human evaluation is the gold standard for multilingual NLP, but is often skipped in practice and substituted with automatic metrics, because it is notoriously complex and slow to set up with...
Memorization, Emergence, and Explaining Reversal Failures: A Controlled Study of Relational Semantics in LLMs : Abstract: Autoregressive LLMs perform well on relational tasks that require linking entities via relational words (e.g., father/son, friend), but it is unclear whether they learn the logical semantics...
RAL2M: Retrieval Augmented Learning-To-Match Against Hallucination in Compliance-Guaranteed Service Systems : Abstract: Hallucination is a major concern in LLM-driven service systems, necessitating explicit knowledge grounding for compliance-guaranteed responses. In this paper, we introduce Retrieval-Augmente...
Beyond the Black Box: Theory and Mechanism of Large Language Models : Abstract: The rapid emergence of Large Language Models (LLMs) has precipitated a profound paradigm shift in Artificial Intelligence, delivering monumental engineering successes that increasingly impac...
Linear Script Representations in Speech Foundation Models Enable Zero-Shot Transliteration : Abstract: Multilingual speech foundation models such as Whisper are trained on web-scale data, where data for each language consists of a myriad of regional varieties. However, different regional vari...
Transparent Semantic Change Detection with Dependency-Based Profiles : Abstract: Most modern computational approaches to lexical semantic change detection (LSC) rely on embedding-based distributional word representations with neural networks. Despite the strong performan...
Revisiting Data Compression with Language Modeling : Abstract: In this report, we investigate the potential use of large language models (LLMs) in the task of data compression. Previous works have demonstrated promising results in applying LLMs toward...
LongBench Pro: A More Realistic and Comprehensive Bilingual Long-Context Evaluation Benchmark : Abstract: The rapid expansion of context length in large language models (LLMs) has outpaced existing evaluation benchmarks. Current long-context benchmarks often trade off scalability and realism: sy...
Training Language Models with homotokens Leads to Delayed Overfitting : Abstract: Subword tokenization introduces a computational layer in language models where many distinct token sequences decode to the same surface form and preserve meaning, yet induce different intern...
To Generate or Discriminate? Methodological Considerations for Measuring Cultural Alignment in LLMs : Abstract: Socio-demographic prompting (SDP) - prompting Large Language Models (LLMs) using demographic proxies to generate culturally aligned outputs - often shows LLM responses as stereotypical and b...
TiMem: Temporal-Hierarchical Memory Consolidation for Long-Horizon Conversational Agents : Abstract: Long-horizon conversational agents have to manage ever-growing interaction histories that quickly exceed the finite context windows of large language models (LLMs). Existing memory framework...
The performances of the Chinese and U.S. Large Language Models on the Topic of Chinese Culture : Abstract: Cultural backgrounds shape individuals' perspectives and approaches to problem-solving. Since the emergence of GPT-1 in 2018, large language models (LLMs) have undergone rapid development. T...
Punctuation-aware Hybrid Trainable Sparse Attention for Large Language Models : Abstract: Attention serves as the fundamental mechanism for long-context modeling in large language models (LLMs), yet dense attention becomes structurally prohibitive for long sequences due to its qu...
MiMo-V2-Flash Technical Report : Abstract: We present MiMo-V2-Flash, a Mixture-of-Experts (MoE) model with 309B total parameters and 15B active parameters, designed for fast, strong reasoning and agentic capabilities. MiMo-V2-Flash a...
EComStage: Stage-wise and Orientation-specific Benchmarking for Large Language Models in E-commerce : Abstract: Large Language Model (LLM)-based agents are increasingly deployed in e-commerce applications to assist customer services in tasks such as product inquiries, recommendations, and order manage...
Window-based Membership Inference Attacks Against Fine-tuned Large Language Models : Abstract: Most membership inference attacks (MIAs) against Large Language Models (LLMs) rely on global signals, like average loss, to identify training data. This approach, however, dilutes the subtle...
SYNAPSE: Empowering LLM Agents with Episodic-Semantic Memory via Spreading Activation : Abstract: While Large Language Models (LLMs) excel at generalized reasoning, standard retrieval-augmented approaches fail to address the disconnected nature of long-term agentic memory. To bridge this...
Language Hierarchization Provides the Optimal Solution to Human Working Memory Limits : Abstract: Language is a uniquely human trait, conveying information efficiently by organizing word sequences in sentences into hierarchical structures. A central question persists: Why is human langua...
Mitigating Prompt-Induced Hallucinations in Large Language Models via Structured Reasoning : Abstract: To address hallucination issues in large language models (LLMs), this paper proposes a method for mitigating prompt-induced hallucinations. Building on a knowledge distillation chain-style m...
Adversarial Question Answering Robustness: A Multi-Level Error Analysis and Mitigation Study : Abstract: Question answering (QA) systems achieve impressive performance on standard benchmarks like SQuAD, but remain vulnerable to adversarial examples. This project investigates the adversarial rob...
Boosting Accuracy and Interpretability in Multilingual Hate Speech Detection Through Layer Freezing and Explainable AI : Abstract: Sentiment analysis focuses on identifying the emotional polarity expressed in textual data, typically categorized as positive, negative, or neutral. Hate speech detection, on the other hand,...
EvoRoute: Experience-Driven Self-Routing LLM Agent Systems : Abstract: Complex agentic AI systems, powered by a coordinated ensemble of Large Language Models (LLMs), tool and memory modules, have demonstrated remarkable capabilities on intricate, multi-turn tas...
Iterative Structured Pruning for Large Language Models with Multi-Domain Calibration : Abstract: Large Language Models (LLMs) have achieved remarkable success across a wide spectrum of natural language processing tasks. However, their ever-growing scale introduces significant barriers t...
Multi-Turn Jailbreaking of Aligned LLMs via Lexical Anchor Tree Search : Abstract: Most jailbreak methods achieve high attack success rates (ASR) but require attacker LLMs to craft adversarial queries and/or demand high query budgets. These resource limitations make jailbr...
Towards Comprehensive Stage-wise Benchmarking of Large Language Models in Fact-Checking : Abstract: Large Language Models (LLMs) are increasingly deployed in real-world fact-checking systems, yet existing evaluations focus predominantly on claim verification and overlook the broader fact-c...
When Do Tools and Planning Help LLMs Think? A Cost- and Latency-Aware Benchmark : Abstract: Modern large language models (LLMs) increasingly rely on inference-time planning and external tools to improve reasoning. We benchmark this behavior on two real-world settings: event-centric...
Improved Evidence Extraction for Document Inconsistency Detection with LLMs : Abstract: Large language models (LLMs) are becoming useful in many domains due to their impressive abilities that arise from large training datasets and large model sizes. However, research on LLM-bas...
Scalable Construction of a Lung Cancer Knowledge Base: Profiling Semantic Reasoning in LLMs : Abstract: The integration of Large Language Models (LLMs) into biomedical research offers new opportunities for domain-specific reasoning and knowledge representation. However, their performance depend...
FlowPlan-G2P: A Structured Generation Framework for Transforming Scientific Papers into Patent Descriptions : Abstract: Over 3.5 million patents are filed annually, with drafting patent descriptions requiring deep technical and legal expertise. Transforming scientific papers into patent descriptions is partic...
Reconstructing Item Characteristic Curves using Fine-Tuned Large Language Models : Abstract: Traditional methods for determining assessment item parameters, such as difficulty and discrimination, rely heavily on expensive field testing to collect student performance data for Item Re...
DataParasite Enables Scalable and Repurposable Online Data Curation : Abstract: Many questions in computational social science rely on datasets assembled from heterogeneous online sources, a process that is often labor-intensive, costly, and difficult to reproduce. Rece...
Fact-Checking with Large Language Models via Probabilistic Certainty and Consistency : Abstract: Large language models (LLMs) are increasingly used in applications requiring factual accuracy, yet their outputs often contain hallucinated responses. While fact-checking can mitigate these ...
LoRA-Drop: Temporal LoRA Decoding for Efficient LLM Inference : Abstract: Autoregressive large language models (LLMs) are bottlenecked by sequential decoding, where each new token typically requires executing all transformer layers. Existing dynamic-depth and laye...
ModeX: Evaluator-Free Best-of-N Selection for Open-Ended Generation : Abstract: Selecting a single high-quality output from multiple stochastic generations remains a fundamental challenge for large language models (LLMs), particularly in open-ended tasks where no canoni...
Losses that Cook: Topological Optimal Transport for Structured Recipe Generation : Abstract: Cooking recipes are complex procedures that require not only a fluent and factual text, but also accurate timing, temperature, and procedural coherence, as well as the correct composition of...
PCEval: A Benchmark for Evaluating Physical Computing Capabilities of Large Language Models : Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains, including software development, education, and technical assistance. Among these, software deve...
WearVox: An Egocentric Multichannel Voice Assistant Benchmark for Wearables : Abstract: Wearable devices such as AI glasses are transforming voice assistants into always-available, hands-free collaborators that integrate seamlessly with daily life, but they also introduce chall...
Re3: Learning to Balance Relevance & Recency for Temporal Information Retrieval : Abstract: Temporal Information Retrieval (TIR) is a critical yet unresolved task for modern search systems, retrieving documents that not only satisfy a query's information need but also adhere to its...
RobotDiffuse: Diffusion-Based Motion Planning for Redundant Manipulators with the ROP Obstacle Avoidance Dataset : Abstract: Redundant manipulators, with their higher Degrees of Freedom (DoFs), offer enhanced kinematic performance and versatility, making them suitable for applications like manufacturing, surgical ...
Limits to Predicting Online Speech Using Large Language Models : Abstract: Our paper studies the predictability of online speech -- that is, how well language models learn to model the distribution of user generated content on X (previously Twitter). We define pred...
Learning mirror maps in policy mirror descent : Abstract: Policy Mirror Descent (PMD) is a popular framework in reinforcement learning, serving as a unifying perspective that encompasses numerous algorithms. These algorithms are derived through the...
At the Intersection of Deep Sequential Model Framework and State-space Model Framework: Study on Option Pricing : Abstract: Inference and forecast problems of the nonlinear dynamical system have arisen in a variety of contexts. Reservoir computing and deep sequential models, on the one hand, have demonstrated eff...
Block-Diagonal LoRA for Eliminating Communication Overhead in Tensor Parallel LoRA Serving : Abstract: When serving a single base LLM with several different LoRA adapters simultaneously, the adapters cannot simply be merged with the base model's weights as the adapter swapping would create ov...
Learning Optimal Defender Strategies for CAGE-2 using a POMDP Model : Abstract: CAGE-2 is an accepted benchmark for learning and evaluating defender strategies against cyberattacks. It reflects a scenario where a defender agent protects an IT infrastructure against vari...
Training Set Reconstruction from Differentially Private Forests: How Effective is DP? : Abstract: Recent research has shown that structured machine learning models such as tree ensembles are vulnerable to privacy attacks targeting their training data. To mitigate these risks, differentia...
Communication Compression for Tensor Parallel LLM Inference : Abstract: Large Language Models (LLMs) have pushed the frontier of artificial intelligence but are comprised of hundreds of billions of parameters and operations. For faster inference latency, LLMs ar...
Conformal Prediction for Dose-Response Models with Continuous Treatments : Abstract: Understanding the dose-response relation between a continuous treatment and the outcome for an individual can greatly drive decision-making, particularly in areas like personalized drug dosi...
A Large-Scale Analysis on the Use of Arrival Time Prediction for Automated Shuttle Services in the Real World : Abstract: Urban mobility is on the cusp of transformation with the emergence of shared, connected, and cooperative automated vehicles. Yet, for them to be accepted by customers, trust in their punctua...
Time-Transformer: Integrating Local and Global Features for Better Time Series Generation (Extended Version) : Abstract: Generating time series data is a promising approach to address data deficiency problems. However, it is also challenging due to the complex temporal properties of time series data, including...
MAST: Model-Agnostic Sparsified Training : Abstract: We introduce a novel optimization problem formulation that departs from the conventional way of minimizing machine learning model loss as a black-box function. Unlike traditional formulation...
Self-Supervised Learning from Noisy and Incomplete Data : Abstract: Many important problems in science and engineering involve inferring a signal from noisy and/or incomplete observations, where the observation process is known. Historically, this problem ha...
Shallow-circuit Supervised Learning on a Quantum Processor : Abstract: Quantum computing has long promised transformative advances in data analysis, yet practical quantum machine learning has remained elusive due to fundamental obstacles such as a steep quantum...
AnatomiX, an Anatomy-Aware Grounded Multimodal Large Language Model for Chest X-Ray Interpretation : Abstract: Multimodal medical large language models have shown impressive progress in chest X-ray interpretation but continue to face challenges in spatial reasoning and anatomical understanding. Altho...
Can Embedding Similarity Predict Cross-Lingual Transfer? A Systematic Study on African Languages : Abstract: Cross-lingual transfer is essential for building NLP systems for low-resource African languages, but practitioners lack reliable methods for selecting source languages. We systematically eva...
Finite Memory Belief Approximation for Optimal Control in Partially Observable Markov Decision Processes : Abstract: We study finite memory belief approximation for partially observable (PO) stochastic optimal control (SOC) problems. While belief states are sufficient for SOC in partially observable Markov...
LeafLife: An Explainable Deep Learning Framework with Robustness for Grape Leaf Disease Recognition : Abstract: Plant disease diagnosis is essential to farmers' management choices because plant diseases frequently lower crop yield and product quality. For harvests to flourish and agricultural producti...
Gradient descent reliably finds depth- and gate-optimal circuits for generic unitaries : Abstract: When the gate set has continuous parameters, synthesizing a unitary operator as a quantum circuit is always possible using exact methods, but finding minimal circuits efficiently remains a c...
ToxiGAN: Toxic Data Augmentation via LLM-Guided Directional Adversarial Generation : Abstract: Augmenting toxic language data in a controllable and class-specific manner is crucial for improving robustness in toxicity classification, yet remains challenging due to limited supervision ...
Grad-ELLM: Gradient-based Explanations for Decoder-only LLMs : Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse tasks, yet their black-box nature raises concerns about transparency and faithfulness. Input attribution...
Do LLMs Encode Functional Importance of Reasoning Tokens? : Abstract: Large language models solve complex tasks by generating long reasoning chains, achieving higher accuracy at the cost of increased computational cost and reduced ability to isolate functional...
Explainable Fuzzy GNNs for Leak Detection in Water Distribution Networks : Abstract: Timely leak detection in water distribution networks is critical for conserving resources and maintaining operational efficiency. Although Graph Neural Networks (GNNs) excel at capturing spa...
Temporal Graph Network: Hallucination Detection in Multi-Turn Conversation : Abstract: Hallucinations can be produced by conversational AI systems, particularly in multi-turn conversations where context changes and contradictions may eventually surface. By representing the ent...
Lil: Less is Less When Applying Post-Training Sparse-Attention Algorithms in Long-Decode Stage : Abstract: Large language models (LLMs) demonstrate strong capabilities across a wide range of complex tasks and are increasingly deployed at scale, placing significant demands on inference efficiency....
PiDR: Physics-Informed Inertial Dead Reckoning for Autonomous Platforms : Abstract: A fundamental requirement for full autonomy is the ability to sustain accurate navigation in the absence of external data, such as GNSS signals or visual information. In these challenging en...
Flow Matching and Diffusion Models via PointNet for Generating Fluid Fields on Irregular Geometries : Abstract: We present two novel generative geometric deep learning frameworks, termed Flow Matching PointNet and Diffusion PointNet, for predicting fluid flow variables on irregular geometries by incor...
Dementia-R1: Reinforced Pretraining and Reasoning from Unstructured Clinical Notes for Real-World Dementia Prognosis : Abstract: While Large Language Models (LLMs) have shown strong performance on clinical text understanding, they struggle with longitudinal prediction tasks such as dementia prognosis, which require re...
Learning to Act Robustly with View-Invariant Latent Actions : Abstract: Vision-based robotic policies often struggle with even minor viewpoint changes, underscoring the need for view-invariant visual representations. This challenge becomes more pronounced in rea...
Reliability-Aware Adaptive Self-Consistency for Efficient Sampling in LLM Reasoning : Abstract: Self-Consistency improves reasoning reliability through multi-sample aggregation, but incurs substantial inference cost. Adaptive self-consistency methods mitigate this issue by adjusting th...
Low-Resource Heuristics for Bahnaric Optical Character Recognition Improvement : Abstract: Bahnar, a minority language spoken across Vietnam, Cambodia, and Laos, faces significant preservation challenges due to limited research and data availability. This study addresses the criti...
Image, Word and Thought: A More Challenging Language Task for the Iterated Learning Model : Abstract: The iterated learning model simulates the transmission of language from generation to generation in order to explore how the constraints imposed by language transmission facilitate the emerg...
TA-Prompting: Enhancing Video Large Language Models for Dense Video Captioning via Temporal Anchors : Abstract: Dense video captioning aims to interpret and describe all temporally localized events throughout an input video. Recent state-of-the-art methods leverage large language models (LLMs) to prov...
Enhanced 3D Gravity Inversion Using ResU-Net with Density Logging Constraints: A Dual-Phase Training Approach : Abstract: Gravity exploration has become an important geophysical method due to its low cost and high efficiency. With the rise of artificial intelligence, data-driven gravity inversion methods based ...
STIPP: Space-time in situ postprocessing over the French Alps using proper scoring rules : Abstract: We propose Space-time in situ postprocessing (STIPP), a machine learning model that generates spatio-temporally consistent weather forecasts for a network of station locations. Gridded forec...
HAL: Inducing Human-likeness in LLMs with Alignment : Abstract: Conversational human-likeness plays a central role in human-AI interaction, yet it has remained difficult to define, measure, and optimize. As a result, improvements in human-like behavior a...
COFFEE: COdesign Framework for Feature Enriched Embeddings in Ads-Ranking Systems : Abstract: Diverse and enriched data sources are essential for commercial ads-recommendation models to accurately assess user interest both before and after engagement with content. While extended user...
Fast Conformal Prediction using Conditional Interquantile Intervals : Abstract: We introduce Conformal Interquantile Regression (CIR), a conformal regression method that efficiently constructs near-minimal prediction intervals with guaranteed coverage. CIR leverages bla...
Which Deep Learner? A Systematic Evaluation of Advanced Deep Forecasting Models Accuracy and Efficiency for Network Traffic Prediction : Abstract: Network traffic prediction is essential for automating modern network management. It is a difficult time series forecasting (TSF) problem that has been addressed by Deep Learning (DL) models...
Adversarial Contrastive Learning for LLM Quantization Attacks : Abstract: Model quantization is critical for deploying large language models (LLMs) on resource-constrained hardware, yet recent work has revealed severe security risks that benign LLMs in full precis...
Extracting books from production language models : Abstract: Many unresolved legal questions over LLMs and copyright center on memorization: whether specific training data have been encoded in the model's weights during training, and whether those mem...
Empirical Comparison of Encoder-Based Language Models and Feature-Based Supervised Machine Learning Approaches to Automated Scoring of Long Essays : Abstract: Long context may impose challenges for encoder-only language models in text processing, specifically for automated scoring of essays. This study trained several commonly used encoder-based l...
Statistical Inference for Fuzzy Clustering : Abstract: Clustering is a central tool in biomedical research for discovering heterogeneous patient subpopulations, where group boundaries are often diffuse rather than sharply separated. Traditional ...
Hierarchical temporal receptive windows and zero-shot timescale generalization in biologically constrained scale-invariant deep networks : Abstract: Human cognition integrates information across nested timescales. While the cortex exhibits hierarchical Temporal Receptive Windows (TRWs), local circuits often display heterogeneous time con...
SWaRL: Safeguard Code Watermarking via Reinforcement Learning : Abstract: We present SWaRL, a robust and fidelity-preserving watermarking framework designed to protect the intellectual property of code LLM owners by embedding unique and verifiable signatures in th...
Compressed code: the hidden effects of quantization and distillation on programming tokens : Abstract: Large Language Models (LLMs) have demonstrated exceptional code generation capabilities, yet their token-level mechanisms remain underexplored, particularly in compressed models. Through sys...
First Provably Optimal Asynchronous SGD for Homogeneous and Heterogeneous Data : Abstract: Artificial intelligence has advanced rapidly through large neural networks trained on massive datasets using thousands of GPUs or TPUs. Such training can occupy entire data centers for weeks...
Variational (Energy-Based) Spectral Learning: A Machine Learning Framework for Solving Partial Differential Equations : Abstract: We introduce variational spectral learning (VSL), a machine learning framework for solving partial differential equations (PDEs) that operates directly in the coefficient space of spectral e...
A Spatio-Temporal Deep Learning Approach For High-Resolution Gridded Monsoon Prediction : Abstract: The Indian Summer Monsoon (ISM) is a critical climate phenomenon, fundamentally impacting the agriculture, economy, and water security of over a billion people. Traditional long-range foreca...
VocalBridge: Latent Diffusion-Bridge Purification for Defeating Perturbation-Based Voiceprint Defenses : Abstract: The rapid advancement of speech synthesis technologies, including text-to-speech (TTS) and voice conversion (VC), has intensified security and privacy concerns related to voice cloning. Rece...
Mitigating Long-Tailed Anomaly Score Distributions with Importance-Weighted Loss : Abstract: Anomaly detection is crucial in industrial applications for identifying rare and unseen patterns to ensure system reliability. Traditional models, trained on a single class of normal data, s...
TAP-ViTs: Task-Adaptive Pruning for On-Device Deployment of Vision Transformers : Abstract: Vision Transformers (ViTs) have demonstrated strong performance across a wide range of vision tasks, yet their substantial computational and memory demands hinder efficient deployment on res...
Deep Learning Superresolution for 7T Knee MR Imaging: Impact on Image Quality and Diagnostic Performance : Abstract: Background: Deep learning superresolution (SR) may enhance musculoskeletal MR image quality, but its diagnostic value in knee imaging at 7T is unclear. Objectives: To compare image quality a...
Quantifying Quanvolutional Neural Networks Robustness for Speech in Healthcare Applications : Abstract: Speech-based machine learning systems are sensitive to noise, complicating reliable deployment in emotion recognition and voice pathology detection. We evaluate the robustness of a hybrid qu...
NitroGen: An Open Foundation Model for Generalist Gaming Agents : Abstract: We introduce NitroGen, a vision-action foundation model for generalist gaming agents that is trained on 40,000 hours of gameplay videos across more than 1,000 games. We incorporate three key...
SpikySpace: A Spiking State Space Model for Energy-Efficient Time Series Forecasting : Abstract: Time-series forecasting often operates under tight power and latency budgets in fields like traffic management, industrial condition monitoring, and on-device sensing. These applications fre...
Spiking Heterogeneous Graph Attention Networks : Abstract: Real-world graphs or networks are usually heterogeneous, involving multiple types of nodes and relationships. Heterogeneous graph neural networks (HGNNs) can effectively handle these diverse...
How to Discover Knowledge for FutureG: Contextual RAG and LLM Prompting for O-RAN : Abstract: We present a retrieval-augmented question answering framework for 5G/6G networks, where the Open Radio Access Network (O-RAN) has become central to disaggregated, virtualized, and AI-driven ...
Cross-Platform Digital Discourse Analysis of the Israel-Hamas Conflict: Sentiment, Topics, and Event Dynamics : Abstract: The Israeli-Palestinian conflict remains one of the most polarizing geopolitical issues, with the October 2023 escalation intensifying online debate. Social media platforms, particularly Tel...
FUSE : Failure-aware Usage of Subagent Evidence for MultiModal Search and Recommendation : Abstract: Multimodal creative assistants decompose user goals and route tasks to subagents for layout, styling, retrieval, and generation. Retrieval quality is pivotal, yet failures can arise at sever...
PET-TURTLE: Deep Unsupervised Support Vector Machines for Imbalanced Data Clusters : Abstract: Foundation vision, audio, and language models enable zero-shot performance on downstream tasks via their latent representations. Recently, unsupervised learning of data group structure with ...
From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence : Abstract: Can we learn more from data than existed in the generating process itself? Can new and useful information be constructed from merely applying deterministic transformations to existing data? ...
Critic-Guided Reinforcement Unlearning in Text-to-Image Diffusion : Abstract: Machine unlearning in text-to-image diffusion models aims to remove targeted concepts while preserving overall utility. Prior diffusion unlearning methods typically rely on supervised weight...
Counterfactual Fairness with Graph Uncertainty : Abstract: Evaluating machine learning (ML) model bias is key to building trustworthy and robust ML systems. Counterfactual Fairness (CF) audits allow the measurement of bias of ML models with a causal...
Empowering Reliable Visual-Centric Instruction Following in MLLMs : Abstract: Evaluating the instruction-following (IF) capabilities of Multimodal Large Language Models (MLLMs) is essential for rigorously assessing how faithfully model outputs adhere to user-specified...
Sparse Knowledge Distillation: A Mathematical Framework for Probability-Domain Temperature Scaling and Multi-Stage Compression : Abstract: We develop a unified theoretical framework for sparse knowledge distillation based on probability-domain softening operators. While the equivalence $p^{1/T} \propto \mathrm{softmax}(z/T)$ is...
Decentralized Autoregressive Generation : Abstract: We present a theoretical analysis of decentralization of autoregressive generation. We define the Decentralized Discrete Flow Matching objective, by expressing probability generating velocit...
Predicting Time Pressure of Powered Two-Wheeler Riders for Proactive Safety Interventions : Abstract: Time pressure critically influences risky maneuvers and crash proneness among powered two-wheeler riders, yet its prediction remains underexplored in intelligent transportation systems. We p...
Dynamic Hyperparameter Importance for Efficient Multi-Objective Optimization : Abstract: Choosing a suitable ML model is a complex task that can depend on several objectives, e.g., accuracy, model size, fairness, inference time, or energy consumption. In practice, this requires ...
On the Convergence Behavior of Preconditioned Gradient Descent Toward the Rich Learning Regime : Abstract: Spectral bias, the tendency of neural networks to learn low frequencies first, can be both a blessing and a curse. While it enhances the generalization capabilities by suppressing high-frequ...
Rapid Augmentations for Time Series (RATS): A High-Performance Library for Time Series Augmentation : Abstract: Time series augmentation is critical for training robust deep learning models, particularly in domains where labelled data is scarce and expensive to obtain. However, existing augmentation l...
Prompt-Counterfactual Explanations for Generative AI System Behavior : Abstract: As generative AI systems become integrated into real-world applications, organizations increasingly need to be able to understand and interpret their behavior. In particular, decision-makers...
PersonaLedger: Generating Realistic Financial Transactions with Persona Conditioned LLMs and Rule Grounded Feedback : Abstract: Strict privacy regulations limit access to real transaction data, slowing open research in financial AI. Synthetic data can bridge this gap, but existing generators do not jointly achieve be...
One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling : Abstract: The reasoning ability of large language models (LLMs) can be unleashed with reinforcement learning (RL) (OpenAI, 2024; DeepSeek-AI et al., 2025a; Zeng et al., 2025). The success of existing ...
Time-Aware Synthetic Control : Abstract: The synthetic control (SC) framework is widely used for observational causal inference with time-series panel data. SC has been successful in diverse applications, but existing methods typic...
From Muscle to Text with MyoText: sEMG to Text via Finger Classification and Transformer-Based Decoding : Abstract: Surface electromyography (sEMG) provides a direct neural interface for decoding muscle activity and offers a promising foundation for keyboard-free text input in wearable and mixed-reality s...
ATLAS: Adaptive Test-Time Latent Steering with External Verifiers for Enhancing LLMs Reasoning : Abstract: Recent work on activation and latent steering has demonstrated that modifying internal representations can effectively guide large language models (LLMs) toward improved reasoning and effici...
Audit Me If You Can: Query-Efficient Active Fairness Auditing of Black-Box LLMs : Abstract: Large Language Models (LLMs) exhibit systematic biases across demographic groups. Auditing is proposed as an accountability tool for black-box LLM applications, but suffers from resource-int...
Real-Time Adaptive Anomaly Detection in Industrial IoT Environments : Abstract: To ensure reliability and service availability, next-generation networks are expected to rely on automated anomaly detection systems powered by advanced machine learning methods with the cap...
Joint Encoding of KV-Cache Blocks for Scalable LLM Serving : Abstract: Modern large language models (LLMs) drive interactive AI systems but are bottlenecked by the memory-heavy growth of key-value (KV) caches, which limits real-time throughput under concurrent ...
When the Coffee Feature Activates on Coffins: An Analysis of Feature Extraction and Steering for Mechanistic Interpretability : Abstract: Recent work by Anthropic on mechanistic interpretability claims to understand and control Large Language Models by extracting human-interpretable features from their neural activation patter...
Causal Manifold Fairness: Enforcing Geometric Invariance in Representation Learning : Abstract: Fairness in machine learning is increasingly critical, yet standard approaches often treat data as static points in a high-dimensional space, ignoring the underlying generative structure. We...
In-Context Reinforcement Learning through Bayesian Fusion of Context and Value Prior : Abstract: In-context reinforcement learning (ICRL) promises fast adaptation to unseen environments without parameter updates, but current methods either cannot improve beyond the training distribution...
Multi-Distribution Robust Conformal Prediction : Abstract: In many fairness and distribution robustness problems, one has access to labeled data from multiple source distributions yet the test data may come from an arbitrary member or a mixture of t...
From Memorization to Creativity: LLM as a Designer of Novel Neural-Architectures : Abstract: Large language models (LLMs) excel in program synthesis, yet their ability to autonomously navigate neural architecture design--balancing syntactic reliability, performance, and structural n...
MixTTE: Multi-Level Mixture-of-Experts for Scalable and Adaptive Travel Time Estimation : Abstract: Accurate Travel Time Estimation (TTE) is critical for ride-hailing platforms, where errors directly impact user experience and operational efficiency. While existing production systems excel...
ChemBART: A Pre-trained BART Model Assisting Organic Chemistry Analysis : Abstract: Recent advances in large language models (LLMs) have demonstrated transformative potential across diverse fields. While LLMs have been applied to molecular simplified molecular input line en...
Bridging Mechanistic Interpretability and Prompt Engineering with Gradient Ascent for Interpretable Persona Control : Abstract: Controlling emergent behavioral personas (e.g., sycophancy, hallucination) in Large Language Models (LLMs) is critical for AI safety, yet remains a persistent challenge. Existing solutions f...
RPIQ: Residual-Projected Multi-Collaboration Closed-Loop and Single Instance Quantization for Visually Impaired Assistance : Abstract: Visually impaired users face significant challenges in daily information access and real-time environmental perception, and there is an urgent need for intelligent assistive systems with acc...
Domain Generalization for Time Series: Enhancing Drilling Regression Models for Stick-Slip Index Prediction : Abstract: This paper provides a comprehensive comparison of domain generalization techniques applied to time series data within a drilling context, focusing on the prediction of a continuous Stick-Sli...
Quantum-Enhanced Neural Contextual Bandit Algorithms : Abstract: Stochastic contextual bandits are fundamental for sequential decision-making but pose significant challenges for existing neural network-based algorithms, particularly when scaling to quantu...
Electricity Price Forecasting: Bridging Linear Models, Neural Networks and Online Learning : Abstract: Precise day-ahead forecasts for electricity prices are crucial to ensure efficient portfolio management, support strategic decision-making for power plant operations, enable efficient batter...
Stratified Hazard Sampling: Minimal-Variance Event Scheduling for CTMC/DTMC Discrete Diffusion and Flow Models : Abstract: CTMC/DTMC-based discrete generative models, including uniform-noise discrete diffusion (e.g., D3PM/CTDD) and discrete flow matching, enable non-autoregressive sequence generation by repeated...
RadioDiff-Flux: Efficient Radio Map Construction via Generative Denoise Diffusion Model Trajectory Midpoint Reuse : Abstract: Accurate radio map (RM) construction is essential to enabling environment-aware and adaptive wireless communication. However, in future 6G scenarios characterized by high-speed network entit...
Q-Regularized Generative Auto-Bidding: From Suboptimal Trajectories to Optimal Policies : Abstract: With the rapid development of e-commerce, auto-bidding has become a key asset in optimizing advertising performance under diverse advertiser environments. The current approaches focus on rei...
Scalable Tree Ensemble Proximities in Python : Abstract: Tree ensemble methods such as Random Forests naturally induce supervised similarity measures through their decision tree structure, but existing implementations of proximities derived from t...
CRoPE: Efficient Parametrization of Rotary Positional Embedding : Abstract: Rotary positional embedding has become the state-of-the-art approach to encode position information in transformer-based models. While it is often succinctly expressed in complex linear alge...
Scaling Laws of Machine Learning for Optimal Power Flow : Abstract: Optimal power flow (OPF) is one of the fundamental tasks for power system operations. While machine learning (ML) approaches such as deep neural networks (DNNs) have been widely studied to e...
Topology-Independent Robustness of the Weighted Mean under Label Poisoning Attacks in Heterogeneous Decentralized Learning : Abstract: Robustness to malicious attacks is crucial for practical decentralized signal processing and machine learning systems. A typical example of such attacks is label poisoning, meaning that some...
Uni-FinLLM: A Unified Multimodal Large Language Model with Modular Task Heads for Micro-Level Stock Prediction and Macro-Level Systemic Risk Assessment : Abstract: Financial institutions and regulators require systems that integrate heterogeneous data to assess risks from stock fluctuations to systemic vulnerabilities. Existing approaches often treat t...
MAFS: Multi-head Attention Feature Selection for High-Dimensional Data via Deep Fusion of Filter Methods : Abstract: Feature selection is essential for high-dimensional biomedical data, enabling stronger predictive performance, reduced computational cost, and improved interpretability in precision medicine...
When Prompting Meets Spiking: Graph Sparse Prompting via Spiking Graph Prompt Learning : Abstract: Graph Prompt Feature (GPF) learning has been widely used in adapting pre-trained GNN models to downstream tasks. GPFs first introduce some prompt atoms and then learn the optimal prompt v...
Prioritized Replay for RL Post-training : Abstract: We introduce a problem-level prioritization framework for RL post-training of large language models. Building on insights from prioritized replay in deep RL, as well as prior observations th...
Credit Assignment via Neural Manifold Noise Correlation : Abstract: Credit assignment--how changes in individual neurons and synapses affect a network's output--is central to learning in brains and machines. Noise correlation, which estimates gradients by co...
Chronicals: A High-Performance Framework for LLM Fine-Tuning with 3.51x Speedup over Unsloth : Abstract: Large language model fine-tuning is bottlenecked by memory: a 7B parameter model requires 84GB--14GB for weights, 14GB for gradients, and 56GB for FP32 optimizer states--exceeding even A100-...
Threat Detection in Social Media Networks Using Machine Learning Based Network Analysis : Abstract: The accelerated development of social media websites has posed intricate security issues in cyberspace, where these sites have increasingly become victims of criminal activities including at...
LendNova: Towards Automated Credit Risk Assessment with Language Models : Abstract: Credit risk assessment is essential in the financial sector, but has traditionally depended on costly feature-based models that often fail to utilize all available information in raw credit ...
CutisAI: Deep Learning Framework for Automated Dermatology and Cancer Screening : Abstract: The rapid growth of dermatological imaging and mobile diagnostic tools calls for systems that not only demonstrate empirical performance but also provide strong theoretical guarantees. Deep ...
Normalized Conditional Mutual Information Surrogate Loss for Deep Neural Classifiers : Abstract: In this paper, we propose a novel information-theoretic surrogate loss, normalized conditional mutual information (NCMI), as a drop-in alternative to the de facto cross-entropy (CE) for trai...
Multi-scale Graph Autoregressive Modeling: Molecular Property Prediction via Next Token Prediction : Abstract: We present Connection-Aware Motif Sequencing (CamS), a graph-to-sequence representation that enables decoder-only Transformers to learn molecular graphs via standard next-token prediction (N...
LLM-Enhanced Reinforcement Learning for Time Series Anomaly Detection : Abstract: Detecting anomalies in time series data is crucial for finance, healthcare, sensor networks, and industrial monitoring applications. However, time series anomaly detection often suffers from...
hdlib 2.0: Extending Machine Learning Capabilities of Vector-Symbolic Architectures : Abstract: Following the initial publication of hdlib, a Python library for designing Vector-Symbolic Architectures (VSA), we introduce a major extension that significantly enhances its machine learnin...
GEM-Style Constraints for PEFT with Dual Gradient Projection in LoRA : Abstract: Full fine-tuning of Large Language Models (LLMs) is computationally costly, motivating Continual Learning (CL) approaches that utilize parameter-efficient adapters. We revisit Gradient Episo...
Polynomial Convergence of Riemannian Diffusion Models : Abstract: Diffusion models have demonstrated remarkable empirical success in recent years and are considered one of the state-of-the-art generative models in modern AI. These models consist of a f...
mHC-GNN: Manifold-Constrained Hyper-Connections for Graph Neural Networks : Abstract: Graph Neural Networks (GNNs) suffer from over-smoothing in deep architectures and expressiveness bounded by the 1-Weisfeiler-Leman (1-WL) test. We adapt Manifold-Constrained Hyper-Connection...
WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks : Abstract: We present WebGym, the largest-to-date open-source environment for training realistic visual web agents. Real websites are non-stationary and diverse, making artificial or small-scale task s...
Physical Transformer : Abstract: Digital AI systems, spanning large language models, vision models, and generative architectures, operate primarily in symbolic, linguistic, or pixel domains. They have achieved striking p...

Research Sources: 344 | Generated: 1/7/2026