AI Research News Feeds for January 19th, 2026

AI RESEARCH PAPERS & ACADEMIC SOURCES

Controllable Video Generation: A Survey : Abstract: With the rapid development of AI-generated content (AIGC), video generation has emerged as one of its most dynamic and impactful subfields. In particular, the advancement of video generation...
ProSGNeRF: Progressive Dynamic Neural Scene Graph with Frequency Modulated Foundation Model in Urban Scenes : Abstract: Implicit neural representation has demonstrated promising results in 3D reconstruction on various scenes. However, existing approaches either struggle to model fast-moving objects or are inc...
VidLeaks: Membership Inference Attacks Against Text-to-Video Models : Abstract: The proliferation of powerful Text-to-Video (T2V) models, trained on massive web-scale datasets, raises urgent concerns about copyright and privacy violations. Membership inference attacks (...
Simple Models, Rich Representations: Visual Decoding from Primate Intracortical Neural Signals : Abstract: Understanding how neural activity gives rise to perception is a central challenge in neuroscience. We address the problem of decoding visual information from high-density intracortical recor...
Generation of Chest CT pulmonary Nodule Images by Latent Diffusion Models using the LIDC-IDRI Dataset : Abstract: Recently, computer-aided diagnosis systems have been developed to support diagnosis, but their performance depends heavily on the quality and quantity of training data. However, in clinical ...
Visual question answering-based image-finding generation for pulmonary nodules on chest CT from structured annotations : Abstract: Interpretation of imaging findings based on morphological characteristics is important for diagnosing pulmonary nodules on chest computed tomography (CT) images. In this study, we constructe...
Convolutions Need Registers Too: HVS-Inspired Dynamic Attention for Video Quality Assessment : Abstract: No-reference video quality assessment (NR-VQA) estimates perceptual quality without a reference video, which is often challenging. While recent techniques leverage saliency or transformer at...
KOCOBrain: Kuramoto-Guided Graph Network for Uncovering Structure-Function Coupling in Adolescent Prenatal Drug Exposure : Abstract: Exposure to psychoactive substances during pregnancy, such as cannabis, can disrupt neurodevelopment and alter large-scale brain networks, yet identifying their neural signatures remains cha...
Differentiating through binarized topology changes: Second-order subpixel-smoothed projection : Abstract: A key challenge in topology optimization (TopOpt) is that manufacturable structures, being inherently binary, are non-differentiable, creating a fundamental tension with gradient-based optim...
UniX: Unifying Autoregression and Diffusion for Chest X-Ray Understanding and Generation : Abstract: Despite recent progress, medical foundation models still struggle to unify visual understanding and generation, as these tasks have inherently conflicting goals: semantic abstraction versus ...
ReScene4D: Temporally Consistent Semantic Instance Segmentation of Evolving Indoor 3D Scenes : Abstract: Indoor environments evolve as objects move, appear, or disappear. Capturing these dynamics requires maintaining temporally consistent instance identities across intermittently captured 3D sc...
Generative Scenario Rollouts for End-to-End Autonomous Driving : Abstract: Vision-Language-Action (VLA) models are emerging as highly effective planning models for end-to-end autonomous driving systems. However, current works mostly rely on imitation learning from ...
SME-YOLO: A Real-Time Detector for Tiny Defect Detection on PCB Surfaces : Abstract: Surface defects on Printed Circuit Boards (PCBs) directly compromise product reliability and safety. However, achieving high-precision detection is challenging because PCB defects are typica...
SUG-Occ: An Explicit Semantics and Uncertainty Guided Sparse Learning Framework for Real-Time 3D Occupancy Prediction : Abstract: As autonomous driving moves toward full scene understanding, 3D semantic occupancy prediction has emerged as a crucial perception task, offering voxel-level semantics beyond traditional dete...
Heterogeneous Uncertainty-Guided Composed Image Retrieval with Fine-Grained Probabilistic Learning : Abstract: Composed Image Retrieval (CIR) enables image search by combining a reference image with modification text. Intrinsic noise in CIR triplets incurs intrinsic uncertainty and threatens the mode...
Assessing Building Heat Resilience Using UAV and Street-View Imagery with Coupled Global Context Vision Transformer : Abstract: Climate change is intensifying human heat exposure, particularly in densely built urban centers of the Global South. Low-cost construction materials and high thermal-mass surfaces further ex...
Enhancing Vision Language Models with Logic Reasoning for Situational Awareness : Abstract: Vision-Language Models (VLMs) offer the ability to generate high-level, interpretable descriptions of complex activities from images and videos, making them valuable for situational awarenes...
Context-Aware Semantic Segmentation via Stage-Wise Attention : Abstract: Semantic ultra high resolution image (UHR) segmentation is essential in remote sensing applications such as aerial mapping and environmental monitoring. Transformer-based models struggle in ...
SAMannot: A Memory-Efficient, Local, Open-source Framework for Interactive Video Instance Segmentation based on SAM2 : Abstract: Current research workflows for precise video segmentation are often forced into a compromise between labor-intensive manual curation, costly commercial platforms, and/or privacy-compromising...
Efficient On-Board Processing of Oblique UAV Video for Rapid Flood Extent Mapping : Abstract: Effective disaster response relies on rapid disaster response, where oblique aerial video is the primary modality for initial scouting due to its ability to maximize spatial coverage and sit...
FTDMamba: Frequency-Assisted Temporal Dilation Mamba for Unmanned Aerial Vehicle Video Anomaly Detection : Abstract: Recent advances in video anomaly detection (VAD) mainly focus on ground-based surveillance or unmanned aerial vehicle (UAV) videos with static backgrounds, whereas research on UAV videos wit...
Language-Agnostic Visual Embeddings for Cross-Script Handwriting Retrieval : Abstract: Handwritten word retrieval is vital for digital archives but remains challenging due to large handwriting variability and cross-lingual semantic gaps. While large vision-language models offe...
Image-Text Knowledge Modeling for Unsupervised Multi-Scenario Person Re-Identification : Abstract: We propose unsupervised multi-scenario (UMS) person re-identification (ReID) as a new task that expands ReID across diverse scenarios (cross-resolution, clothing change, etc.) within a singl...
Bio-inspired fine-tuning for selective transfer learning in image classification : Abstract: Deep learning has significantly advanced image analysis across diverse domains but often depends on large, annotated datasets for success. Transfer learning addresses this challenge by utili...
ATATA: One Algorithm to Align Them All : Abstract: We suggest a new multi-modal algorithm for joint inference of paired structurally aligned samples with Rectified Flow models. While some existing methods propose a codependent generation pro...
Democratizing planetary-scale analysis: An ultra-lightweight Earth embedding database for accurate and flexible global land monitoring : Abstract: The rapid evolution of satellite-borne Earth Observation (EO) systems has revolutionized terrestrial monitoring, yielding petabyte-scale archives. However, the immense computational and stor...
SoLA-Vision: Fine-grained Layer-wise Linear Softmax Hybrid Attention : Abstract: Standard softmax self-attention excels in vision tasks but incurs quadratic complexity O(N^2), limiting high-resolution deployment. Linear attention reduces the cost to O(N), yet its compres...
Graph Smoothing for Enhanced Local Geometry Learning in Point Cloud Analysis : Abstract: Graph-based methods have proven to be effective in capturing relationships among points for 3D point cloud analysis. However, these methods often suffer from suboptimal graph structures, par...
CoDance: An Unbind-Rebind Paradigm for Robust Multi-Subject Animation : Abstract: Character image animation is gaining significant importance across various domains, driven by the demand for robust and flexible multi-subject rendering. While existing methods excel in sing...
PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models : Abstract: Physical principles are fundamental to realistic visual simulation, but remain a significant oversight in transformer-based video generation. This gap highlights a critical limitation in ren...
M3DDM+: An improved video outpainting by a modified masking strategy : Abstract: M3DDM provides a computationally efficient framework for video outpainting via latent diffusion modeling. However, it exhibits significant quality degradation -- manifested as spatial blur a...
MMedExpert-R1: Strengthening Multimodal Medical Reasoning via Domain-Specific Adaptation and Clinical Guideline Reinforcement : Abstract: Medical Vision-Language Models (MedVLMs) excel at perception tasks but struggle with complex clinical reasoning required in real-world scenarios. While reinforcement learning (RL) has been e...
Classification of Chest XRay Diseases through image processing and analysis techniques : Abstract: Multi-Classification Chest X-Ray Images are one of the most prevalent forms of radiological examination used for diagnosing thoracic diseases. In this study, we offer a concise overview of s...
FrankenMotion: Part-level Human Motion Generation and Composition : Abstract: Human motion generation from text prompts has made remarkable progress in recent years. However, existing methods primarily rely on either sequence-level or action-level descriptions due to ...
Effects of Different Attention Mechanisms Applied on 3D Models in Video Classification : Abstract: Human action recognition has become an important research focus in computer vision due to the wide range of applications where it is used. 3D Resnet-based CNN models, particularly MC3, R3D, ...
One Model, Many Behaviors: Training-Induced Effects on Out-of-Distribution Detection : Abstract: Out-of-distribution (OOD) detection is crucial for deploying robust and reliable machine-learning systems in open-world settings. Despite steady advances in OOD detectors, their interplay wi...
A Unified 3D Object Perception Framework for Real-Time Outside-In Multi-Camera Systems : Abstract: Accurate 3D object perception and multi-target multi-camera (MTMC) tracking are fundamental for the digital transformation of industrial infrastructure. However, transitioning "inside-out" a...
ICONIC-444: A 3.1-Million-Image Dataset for OOD Detection Research : Abstract: Current progress in out-of-distribution (OOD) detection is limited by the lack of large, high-quality datasets with clearly defined OOD categories across varying difficulty levels (near- to ...
Future Optical Flow Prediction Improves Robot Control & Video Generation : Abstract: Future motion representations, such as optical flow, offer immense value for control and generative tasks. However, forecasting generalizable spatially dense motion representations remains a...
Isotropy-Optimized Contrastive Learning for Semantic Course Recommendation : Abstract: This paper presents a semantic course recommendation system for students using a self-supervised contrastive learning approach built upon BERT (Bidirectional Encoder Representations from Tra...
FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning : Abstract: Recent end-to-end spoken dialogue systems leverage speech tokenizers and neural audio codecs to enable LLMs to operate directly on discrete speech representations. However, these models ofte...
SonicBench: Dissecting the Physical Perception Bottleneck in Large Audio Language Models : Abstract: Large Audio Language Models (LALMs) excel at semantic and paralinguistic tasks, yet their ability to perceive the fundamental physical attributes of audio such as pitch, loudness, and spatia...
AJAR: Adaptive Jailbreak Architecture for Red-teaming : Abstract: As Large Language Models (LLMs) evolve from static chatbots into autonomous agents capable of tool execution, the landscape of AI safety is shifting from content moderation to action securit...
How Long Is a Piece of String? A Brief Empirical Analysis of Tokenizers : Abstract: Frontier LLMs are increasingly utilised across academia, society and industry. A commonly used unit for comparing models, their inputs and outputs, and estimating inference pricing is the to...
CTest-Metric: A Unified Framework to Assess Clinical Validity of Metrics for CT Report Generation : Abstract: In the generative AI era, where even critical medical tasks are increasingly automated, radiology report generation (RRG) continues to rely on suboptimal metrics for quality assessment. Deve...
Predict the Retrieval! Test time adaptation for Retrieval Augmented Generation : Abstract: Retrieval-Augmented Generation (RAG) has emerged as a powerful approach for enhancing large language models' question-answering capabilities through the integration of external knowledge. Ho...
The unreasonable effectiveness of pattern matching : Abstract: We report on an astonishing ability of large language models (LLMs) to make sense of "Jabberwocky" language in which most or all content words have been randomly replaced by nonsense strings...
Reward Modeling for Scientific Writing Evaluation : Abstract: Scientific writing is an expert-domain task that demands deep domain knowledge, task-specific requirements and reasoning capabilities that leverage the domain knowledge to satisfy the task s...
Neural Chain-of-Thought Search: Searching the Optimal Reasoning Path to Enhance Large Language Models : Abstract: Chain-of-Thought reasoning has significantly enhanced the problem-solving capabilities of Large Language Models. Unfortunately, current models generate reasoning steps sequentially without f...
Idea First, Code Later: Disentangling Problem Solving from Code Generation in Evaluating LLMs for Competitive Programming : Abstract: Large Language Models (LLMs) increasingly succeed on competitive programming problems, yet existing evaluations conflate algorithmic reasoning with code-level implementation. We argue that c...
F-Actor: Controllable Conversational Behaviour in Full-Duplex Models : Abstract: Spoken conversational systems require more than accurate speech generation to have human-like conversations: to feel natural and engaging, they must produce conversational behaviour that ada...
Membership Inference on LLMs in the Wild : Abstract: Membership Inference Attacks (MIAs) act as a crucial auditing tool for the opaque training data of Large Language Models (LLMs). However, existing techniques predominantly rely on inaccessib...
One LLM to Train Them All: Multi-Task Learning Framework for Fact-Checking : Abstract: Large language models (LLMs) are reshaping automated fact-checking (AFC) by enabling unified, end-to-end verification pipelines rather than isolated components. While large proprietary model...
Language of Thought Shapes Output Diversity in Large Language Models : Abstract: Output diversity is crucial for Large Language Models as it underpins pluralism and creativity. In this work, we reveal that controlling the language used during model thinking -- the language ...
MultiCaption: Detecting disinformation using multilingual visual claims : Abstract: Online disinformation poses an escalating threat to society, driven increasingly by the rapid spread of misleading content across both multimedia and multilingual platforms. While automated ...
T*: Progressive Block Scaling for MDM Through Trajectory Aware RL : Abstract: We present T*, a simple TraceRL-based training curriculum for progressive block-size scaling in masked diffusion language models (MDMs). Starting from an AR-initialized small...
DOREMI: Optimizing Long Tail Predictions in Document-Level Relation Extraction : Abstract: Document-Level Relation Extraction (DocRE) presents significant challenges due to its reliance on cross-sentence context and the long-tail distribution of relation types, where many relation...
The Growing Gains and Pains of Iterative Web Corpora Crawling: Insights from South Slavic CLASSLA-web 2.0 Corpora : Abstract: Crawling national top-level domains has proven to be highly effective for collecting texts in less-resourced languages. This approach has been recently used for South Slavic languages and re...
Integrity Shield A System for Ethical AI Use & Authorship Transparency in Assessments : Abstract: Large Language Models (LLMs) can now solve entire exams directly from uploaded PDF assessments, raising urgent concerns about academic integrity and the reliability of grades and credentials...
Budget-Aware Anytime Reasoning with LLM-Synthesized Preference Data : Abstract: We study the reasoning behavior of large language models (LLMs) under limited computation budgets. In such settings, producing useful partial solutions quickly is often more practical than e...
From Interpretability to Performance: Optimizing Retrieval Heads for Long-Context Language Models : Abstract: Advances in mechanistic interpretability have identified special attention heads, known as retrieval heads, that are responsible for retrieving information from the context. However, the rol...
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems : Abstract: Accurately assessing model confidence is essential for deploying large language models (LLMs) in mission-critical factual domains. While retrieval-augmented generation (RAG) is widely adopte...
Redefining Machine Simultaneous Interpretation: From Incremental Translation to Human-Like Strategies : Abstract: Simultaneous Machine Translation (SiMT) requires high-quality translations under strict real-time constraints, which traditional policies with only READ/WRITE actions cannot fully address. W...
ZPD Detector: Data Selection via Capability-Difficulty Alignment for Large Language Models : Abstract: As the cost of training large language models continues to increase and high-quality training data become increasingly scarce, selecting high-value samples or synthesizing effective training...
Massively Multilingual Joint Segmentation and Glossing : Abstract: Automated interlinear gloss prediction with neural networks is a promising approach to accelerate language documentation efforts. However, while state-of-the-art models like GlossLM achieve ...
Neural Induction of Finite-State Transducers : Abstract: Finite-State Transducers (FSTs) are effective models for string-to-string rewriting tasks, often providing the efficiency necessary for high-performance applications, but constructing transd...
DialDefer: A Framework for Detecting and Mitigating LLM Dialogic Deference : Abstract: LLMs are increasingly used as third-party judges, yet their reliability when evaluating speakers in dialogue remains poorly understood. We show that LLMs judge identical claims differently d...
EncodeRec: An Embedding Backbone for Recommendation Systems : Abstract: Recent recommender systems increasingly leverage embeddings from large pre-trained language models (PLMs). However, such embeddings exhibit two key limitations: (1) PLMs are not explicitly o...
A Concise Agent is Less Expert: Revealing Side Effects of Using Style Features on Conversational Agents : Abstract: Style features such as friendly, helpful, or concise are widely used in prompts to steer the behavior of Large Language Model (LLM) conversational agents, yet their unintended side effects r...
BYOL: Bring Your Own Language Into LLMs : Abstract: Large Language Models (LLMs) exhibit strong multilingual capabilities, yet remain fundamentally constrained by the severe imbalance in global language resources. While over 7,000 languages a...
Conditional Distribution Compression via the Kernel Conditional Mean Embedding : Abstract: Existing distribution compression methods, like Kernel Herding (KH), were originally developed for unlabelled data. However, no existing approach directly compresses the conditional distribu...
A Natural Primal-Dual Hybrid Gradient Method for Adversarial Neural Network Training on Solving Partial Differential Equations : Abstract: We propose a scalable preconditioned primal-dual hybrid gradient algorithm for solving partial differential equations (PDEs). We multiply the PDE with a dual test function to obtain an inf-s...
High-Dimensional Tail Index Regression : Abstract: Motivated by the empirical observation of power-law distributions in the credits (e.g., "likes") of viral posts in social media, we introduce a high-dimensional tail index regression model...
Detecting Toxic Flow : Abstract: This paper develops a framework to predict toxic trades that a broker receives from her clients. Toxic trades are predicted with a novel online learning Bayesian method which we call the pro...
UCB-type Algorithm for Budget-Constrained Expert Learning : Abstract: In many modern applications, a system must dynamically choose between several adaptive learning algorithms that are trained online. Examples include model selection in streaming environments...
ThinkEval: Practical Evaluation of Knowledge Leakage in LLM Editing using Thought-based Knowledge Graphs : Abstract: Robust model-editing techniques are essential for deploying large language models (LLMs) in practical applications, as they enable cost-effective ways to deal with challenges such as privacy...
ShapeR: Robust Conditional 3D Shape Generation from Casual Captures : Abstract: Recent advances in 3D shape generation have achieved impressive results, but most existing methods rely on clean, unoccluded, and well-segmented inputs. Such conditions are rarely met in rea...
On the Probability of First Success in Differential Evolution: Hazard Identities and Tail Bounds : Abstract: We study first-hitting times in Differential Evolution (DE) through a conditional hazard framework. Instead of analyzing convergence via Markov-chain transition kernels or drift arguments, ...
A Probabilistic Approach to Trajectory-Based Optimal Experimental Design : Abstract: We present a novel probabilistic approach for optimal path experimental design. In this approach a discrete path optimization problem is defined on a static navigation mesh, and trajectories...
Learning Semantic-Geometric Task Graph-Representations from Human Demonstrations : Abstract: Learning structured task representations from human demonstrations is essential for understanding long-horizon manipulation behaviors, particularly in bimanual settings where action ordering...
IMS: Intelligent Hardware Monitoring System for Secure SoCs : Abstract: In the modern Systems-on-Chip (SoC), the Advanced eXtensible Interface (AXI) protocol exhibits security vulnerabilities, enabling partial or complete denial-of-service (DoS) through protocol...
Near-Optimal Decentralized Stochastic Nonconvex Optimization with Heavy-Tailed Noise : Abstract: This paper studies decentralized stochastic nonconvex optimization problem over row-stochastic networks. We consider the heavy-tailed gradient noise which is empirically observed in many pop...
PubMed-OCR: PMC Open Access OCR Annotations : Abstract: PubMed-OCR is an OCR-centric corpus of scientific articles derived from PubMed Central Open Access PDFs. Each page image is annotated with Google Cloud Vision and released in a compact JSON ...
Statistical Robustness of Interval CVaR Based Regression Models under Perturbation and Contamination : Abstract: Robustness under perturbation and contamination is a prominent issue in statistical learning. We address the robust nonlinear regression based on the so-called interval conditional value-at-...
Zero-Shot Detection of Elastic Transient Morphology Across Physical Systems : Abstract: We test whether a representation learned from interferometric strain transients in gravitational-wave observatories can act as a frozen morphology-sensitive operator for unseen sensors, prov...
New Adaptive Mechanism for Large Neighborhood Search using Dual Actor-Critic : Abstract: Adaptive Large Neighborhood Search (ALNS) is a widely used heuristic method for solving combinatorial optimization problems. ALNS explores the solution space by iteratively using destroy and...
Beer-Lambert Autoencoder for Unsupervised Stain Representation Learning and Deconvolution in Multi-immunohistochemical Brightfield Histology Images : Abstract: Separating the contributions of individual chromogenic stains in RGB histology whole slide images (WSIs) is essential for stain normalization, quantitative assessment of marker expression, a...
Information Theoretic Perspective on Representation Learning : Abstract: An information-theoretic framework is introduced to analyze last-layer embedding, focusing on learned representations for regression tasks. We define representation-rate and derive limits on...
Scalable Music Cover Retrieval Using Lyrics-Aligned Audio Embeddings : Abstract: Music Cover Retrieval, also known as Version Identification, aims to recognize distinct renditions of the same underlying musical work, a task central to catalog management, copyright enforc...
Effects of Introducing Synaptic Scaling on Spiking Neural Network Learning : Abstract: Spiking neural networks (SNNs) employing unsupervised learning methods inspired by neural plasticity are expected to be a new framework for artificial intelligence. In this study, we investi...
Reasoning in Trees: Improving Retrieval-Augmented Generation for Multi-Hop Question Answering : Abstract: Retrieval-Augmented Generation (RAG) has demonstrated significant effectiveness in enhancing large language models (LLMs) for complex multi-hop question answering (QA). For multi-hop QA task...
How DDAIR you? Disambiguated Data Augmentation for Intent Recognition : Abstract: Large Language Models (LLMs) are effective for data augmentation in classification tasks like intent detection. In some cases, they inadvertently produce examples that are ambiguous with reg...
Model-free policy gradient for discrete-time mean-field control : Abstract: We study model-free policy learning for discrete-time mean-field control (MFC) problems with finite state space and compact action space. In contrast to the extensive literature on value-bas...
Comprehensive Robust Dynamic Mode Decomposition from Mode Extraction to Dimensional Reduction : Abstract: We propose Comprehensive Robust Dynamic Mode Decomposition (CR-DMD), a novel framework that robustifies the entire DMD process - from mode extraction to dimensional reduction - against mixed...
KANHedge: Efficient Hedging of High-Dimensional Options Using Kolmogorov-Arnold Network-Based BSDE Solver : Abstract: High-dimensional option pricing and hedging present significant challenges in quantitative finance, where traditional PDE-based methods struggle with the curse of dimensionality. The BSDE fr...
Split-and-Conquer: Distributed Factor Modeling for High-Dimensional Matrix-Variate Time Series : Abstract: In this paper, we propose a distributed framework for reducing the dimensionality of high-dimensional, large-scale, heterogeneous matrix-variate time series data using a factor model. The da...
CoG: Controllable Graph Reasoning via Relational Blueprints and Failure-Aware Refinement over Knowledge Graphs : Abstract: Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities but often grapple with reliability challenges like hallucinations. While Knowledge Graphs (KGs) offer explici...
Exact Constraint Enforcement in Physics-Informed Extreme Learning Machines using Null-Space Projection Framework : Abstract: Physics-informed extreme learning machines (PIELMs) typically impose boundary and initial conditions through penalty terms, yielding only approximate satisfaction that is sensitive to user-s...
Memorize Early, Then Query: Inlier-Memorization-Guided Active Outlier Detection : Abstract: Outlier detection (OD) aims to identify abnormal instances, known as outliers or anomalies, by learning typical patterns of normal data, or inliers. Performing OD under an unsupervised regim...
Depression Detection Based on Electroencephalography Using a Hybrid Deep Neural Network CNN-GRU and MRMR Feature Selection : Abstract: This study investigates the detection and classification of depressive and non-depressive states using deep learning approaches. Depression is a prevalent mental health disorder that substan...
A PAC-Bayesian Analysis of Channel-Induced Degradation in Edge Inference : Abstract: In the emerging paradigm of edge inference, neural networks (NNs) are partitioned across distributed edge devices that collaboratively perform inference via wireless transmission. However, s...
Learning collision operators from plasma phase space data using differentiable simulators : Abstract: We propose a methodology to infer collision operators from phase space data of plasma dynamics. Our approach combines a differentiable kinetic simulator, whose core component in this work is...
Reasoning Models Generate Societies of Thought : Abstract: Large language models have achieved remarkable capabilities across domains, yet mechanisms underlying sophisticated reasoning remain elusive. Recent reasoning models outperform comparable in...
LLMs for Game Theory: Entropy-Guided In-Context Learning and Adaptive CoT Reasoning : Abstract: We propose a novel LLM-based framework for reasoning in discrete, game-theoretic tasks, illustrated with Tic-Tac-Toe. The method integrates in-context learning with entropy-guided cha...
Physically constrained unfolded multi-dimensional OMP for large MIMO systems : Abstract: Sparse recovery methods are essential for channel estimation and localization in modern communication systems, but their reliability relies on accurate physical models, which are rarely perf...
Mass Distribution versus Density Distribution in the Context of Clustering : Abstract: This paper investigates two fundamental descriptors of data, i.e., density distribution versus mass distribution, in the context of clustering. Density distribution has been the de facto des...
Sensor Placement for Urban Traffic Interpolation: A Data-Driven Evaluation to Inform Policy : Abstract: Data on citywide street-segment traffic volumes are essential for urban planning and sustainable mobility management. Yet such data are available only for a limited subset of streets due to ...
UBiGTLoc: A Unified BiLSTM-Graph Transformer Localization Framework for IoT Sensor Networks : Abstract: Sensor nodes localization in wireless Internet of Things (IoT) sensor networks is crucial for the effective operation of diverse applications, such as smart cities and smart agriculture. Exi...
SSC-UNet: UNet with Self-Supervised Contrastive Learning for Phonocardiography Noise Reduction : Abstract: Congenital Heart Disease (CHD) remains a significant global health concern affecting approximately 1% of births worldwide. Phonocardiography has emerged as a supplementary tool to diagnose ...
QUPID: A Partitioned Quantum Neural Network for Anomaly Detection in Smart Grid : Abstract: Smart grid infrastructures have revolutionized energy distribution, but their day-to-day operations require robust anomaly detection methods to counter risks associated with cyber-physical t...
Extractive summarization on a CMOS Ising machine : Abstract: Extractive summarization (ES) aims to generate a concise summary by selecting a subset of sentences from a document while maximizing relevance and minimizing redundancy. Although modern ES s...
Low-Rank Key Value Attention : Abstract: Transformer pretraining is increasingly constrained by memory and compute requirements, with the key-value (KV) cache emerging as a dominant bottleneck during training and autoregressive dec...
When Are Two Scores Better Than One? Investigating Ensembles of Diffusion Models : Abstract: Diffusion models now generate high-quality, diverse samples, with an increasing focus on more powerful models. Although ensembling is a well-known way to improve supervised models, its appli...
Inter-patient ECG Arrhythmia Classification with LGNs and LUTNs : Abstract: Deep Differentiable Logic Gate Networks (LGNs) and Lookup Table Networks (LUTNs) are demonstrated to be suitable for the automatic classification of electrocardiograms (ECGs) using the inter...
Forcing and Diagnosing Failure Modes of Fourier Neural Operators Across Diverse PDE Families : Abstract: Fourier Neural Operators (FNOs) have shown strong performance in learning solution maps of partial differential equations (PDEs), but their robustness under distribution shifts, long-horizon...
Factored Value Functions for Graph-Based Multi-Agent Reinforcement Learning : Abstract: Credit assignment is a core challenge in multi-agent reinforcement learning (MARL), especially in large-scale systems with structured, local interactions. Graph-based Markov decision process...
Latent Space Inference via Paired Autoencoders : Abstract: This work describes a novel data-driven latent space inference framework built on paired autoencoders to handle observational inconsistencies when solving inverse problems. Our approach uses...
Offline Reinforcement-Learning-Based Power Control for Application-Agnostic Energy Efficiency : Abstract: Energy efficiency has become an integral aspect of modern computing infrastructure design, impacting the performance, cost, scalability, and durability of production systems. The incorporati...
Unlocking the Potentials of Retrieval-Augmented Generation for Diffusion Language Models : Abstract: Diffusion Language Models (DLMs) have recently demonstrated remarkable capabilities in natural language processing tasks. However, the potential of Retrieval-Augmented Generation (RAG), whic...
FORESTLLM: Large Language Models Make Random Forest Great on Few-shot Tabular Learning : Abstract: Tabular data supports high-stakes critical decision-making in domains such as finance, healthcare, and scientific discovery. Yet, learning effectively from tabular data in few-shot settings, where la...
Metabolomic Biomarker Discovery for ADHD Diagnosis Using Interpretable Machine Learning : Abstract: Attention Deficit Hyperactivity Disorder (ADHD) is a prevalent neurodevelopmental disorder with limited objective diagnostic tools, highlighting the urgent need for objective, biology-based ...
Sample-Near-Optimal Agnostic Boosting with Improved Running Time : Abstract: Boosting is a powerful method that turns weak learners, which perform only slightly better than random guessing, into strong learners with high accuracy. While boosting is well understood in...
Latent Dynamics Graph Convolutional Networks for model order reduction of parameterized time-dependent PDEs : Abstract: Graph Neural Networks (GNNs) are emerging as powerful tools for nonlinear Model Order Reduction (MOR) of time-dependent parameterized Partial Differential Equations (PDEs). However, existing...
Operator learning on domain boundary through combining fundamental solution-based artificial data and boundary integral techniques : Abstract: For linear partial differential equations with known fundamental solutions, this work introduces a novel operator learning framework that relies exclusively on domain boundary data, includin...
TimeMar: Multi-Scale Autoregressive Modeling for Unconditional Time Series Generation : Abstract: Generative modeling offers a promising solution to data scarcity and privacy challenges in time series analysis. However, the structural complexity of time series, characterized by multi-sca...
LSTM VS. Feed-Forward Autoencoders for Unsupervised Fault Detection in Hydraulic Pumps : Abstract: Unplanned failures in industrial hydraulic pumps can halt production and incur substantial costs. We explore two unsupervised autoencoder (AE) schemes for early fault detection: a feed-forwa...
GMM-COMET: Continual Source-Free Universal Domain Adaptation via a Mean Teacher and Gaussian Mixture Model-Based Pseudo-Labeling : Abstract: Unsupervised domain adaptation tackles the problem that domain shifts between training and test data impair the performance of neural networks in many real-world applications. Thereby, in re...
Theoretically and Practically Efficient Resistance Distance Computation on Large Graphs : Abstract: The computation of resistance distance is pivotal in a wide range of graph analysis applications, including graph clustering, link prediction, and graph neural networks. Despite its foundati...
Assessing the Viability of Unsupervised Learning with Autoencoders for Predictive Maintenance in Helicopter Engines : Abstract: Unplanned engine failures in helicopters can lead to severe operational disruptions, safety hazards, and costly repairs. To mitigate these risks, this study compares two predictive maintenan...
FSL-BDP: Federated Survival Learning with Bayesian Differential Privacy for Credit Risk Modeling : Abstract: Credit risk models are a critical decision-support tool for financial institutions, yet tightening data-protection rules (e.g., GDPR, CCPA) increasingly prohibit cross-border sharing of borr...
Shape-morphing programming of soft materials on complex geometries via neural operator : Abstract: Shape-morphing soft materials can enable diverse target morphologies through voxel-level material distribution design, offering significant potential for various applications. Despite progre...
Optimized Algorithms for Text Clustering with LLM-Generated Constraints : Abstract: Clustering is a fundamental tool that has garnered significant interest across a wide range of applications including text analysis. To improve clustering accuracy, many researchers have inc...
Differentially Private Subspace Fine-Tuning for Large Language Models : Abstract: Fine-tuning large language models on downstream tasks is crucial for realizing their cross-domain potential but often relies on sensitive data, raising privacy concerns. Differential privacy...
Soft Bayesian Context Tree Models for Real-Valued Time Series : Abstract: This paper proposes the soft Bayesian context tree model (Soft-BCT), which is a novel BCT model for real-valued time series. The Soft-BCT considers soft (probabilistic) splits of the context...
Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is highly effective for enhancing LLM reasoning, yet recent evidence shows models like Qwen 2.5 achieve significant gains even with spur...
OpFML: Pipeline for ML-based Operational Forecasting : Abstract: Machine learning is finding its application in a multitude of areas in science and research, and Climate and Earth Sciences is no exception to this trend. Operational forecasting systems bas...
Self-Augmented Mixture-of-Experts for QoS Prediction : Abstract: Quality of Service (QoS) prediction is one of the most fundamental problems in service computing and personalized recommendation. In the problem, there is a set of users and services, each a...
AVP-Pro: An Adaptive Multi-Modal Fusion and Contrastive Learning Approach for Comprehensive Two-Stage Antiviral Peptide Identification : Abstract: The accurate identification of antiviral peptides (AVPs) is crucial for novel drug development. However, existing methods still have limitations in capturing complex sequence dependencies an...
Matching High-Dimensional Geometric Quantiles for Test-Time Adaptation of Transformers and Convolutional Networks Alike : Abstract: Test-time adaptation (TTA) refers to adapting a classifier for the test data when the probability distribution of the test data slightly differs from that of the training data of the model. ...
Backdoor Attacks on Multi-modal Contrastive Learning : Abstract: Contrastive learning has become a leading self-supervised approach to representation learning across domains, including vision, multimodal settings, graphs, and federated learning. However,...
Constant Metric Scaling in Riemannian Computation : Abstract: Constant rescaling of a Riemannian metric appears in many computational settings, often through a global scale parameter that is introduced either explicitly or implicitly. Although this ope...
Reasoning Distillation for Lightweight Automated Program Repair : Abstract: We study whether lightweight symbolic reasoning supervision can improve fix type classification in compact automated program repair models. Small code models are attractive for resource-cons...
Toward Adaptive Grid Resilience: A Gradient-Free Meta-RL Framework for Critical Load Restoration : Abstract: Restoring critical loads after extreme events demands adaptive control to maintain distribution-grid resilience, yet uncertainty in renewable generation, limited dispatchable resources, and ...
Transient learning dynamics drive escape from sharp valleys in Stochastic Gradient Descent : Abstract: Stochastic gradient descent (SGD) is central to deep learning, yet the dynamical origin of its preference for flatter, more generalizable solutions remains unclear. Here, by analyzing SGD le...
Multivariate LSTM-Based Forecasting for Renewable Energy: Enhancing Climate Change Mitigation : Abstract: The increasing integration of renewable energy sources (RESs) into modern power systems presents significant opportunities but also notable challenges, primarily due to the inherent variabil...
HOSL: Hybrid-Order Split Learning for Memory-Constrained Edge Training : Abstract: Split learning (SL) enables collaborative training of large language models (LLMs) between resource-constrained edge devices and compute-rich servers by partitioning model computation across...
FAConvLSTM: Factorized-Attention ConvLSTM for Efficient Feature Extraction in Multivariate Climate Data : Abstract: Learning physically meaningful spatiotemporal representations from high-resolution multivariate Earth observation data is challenging due to strong local dynamics, long-range teleconnections...
Realistic Curriculum Reinforcement Learning for Autonomous and Sustainable Marine Vessel Navigation : Abstract: Sustainability is becoming increasingly critical in maritime transport, encompassing both environmental and social impacts, such as Greenhouse Gas (GHG) emissions and navigational safety...
Action Shapley: A Training Data Selection Metric for World Model in Reinforcement Learning : Abstract: Numerous offline and model-based reinforcement learning systems incorporate world models to emulate the inherent environments. A world model is particularly important in scenarios where dire...
Unit-Consistent (UC) Adjoint for GSD and Backprop in Deep Learning Applications : Abstract: Deep neural networks constructed from linear maps and positively homogeneous nonlinearities (e.g., ReLU) possess a fundamental gauge symmetry: the network function is invariant to node-wise ...
Beyond Accuracy: A Stability-Aware Metric for Multi-Horizon Forecasting : Abstract: Traditional time series forecasting methods optimize for accuracy alone. This objective neglects temporal consistency, in other words, how consistently a model predicts the same future event...
AI-Guided Human-In-the-Loop Inverse Design of High Performance Engineering Structures : Abstract: Inverse design tools such as Topology Optimization (TO) can achieve new levels of improvement for high-performance engineered structures. However, widespread use is hindered by high computat...
Mugi: Value Level Parallelism For Efficient LLMs : Abstract: Value level parallelism (VLP) has been proposed to improve the efficiency of large-batch, low-precision general matrix multiply (GEMM) between symmetric activations and weights. In transform...
Towards Tensor Network Models for Low-Latency Jet Tagging on FPGAs : Abstract: We present a systematic study of Tensor Network (TN) models -- Matrix Product States (MPS) and Tree Tensor Networks (TTN) -- for real-time jet tagging in high-e...
Analytic Bijections for Smooth and Interpretable Normalizing Flows : Abstract: A key challenge in designing normalizing flows is finding expressive scalar bijections that remain invertible with tractable Jacobians. Existing approaches face trade-offs: affine transforma...
Vendor-Aware Industrial Agents: RAG-Enhanced LLMs for Secure On-Premise PLC Code Generation : Abstract: Programmable Logic Controllers are operated by proprietary code dialects; this makes it challenging to train coding assistants. Current LLMs are trained on large code datasets and are capabl...
Generalizable Domain Adaptation for Sim-and-Real Policy Co-Training : Abstract: Behavior cloning has shown promise for robot manipulation, but real-world demonstrations are costly to acquire at scale. While simulated data offers a scalable alternative, particularly with...
Policy alone is probably not the solution: A large-scale experiment on how developers struggle to design meaningful end-user explanations : Abstract: Developers play a central role in determining how machine learning systems are explained in practice, yet they are rarely trained to design explanations for non-technical audiences. Despite ...
Balanced Edge Pruning for Graph Anomaly Detection with Noisy Labels : Abstract: Graph anomaly detection (GAD) is widely applied in many areas, such as financial fraud detection and social spammer detection. Anomalous nodes in the graph not only impact their own communit...
A Simple Unified Uncertainty-Guided Framework for Offline-to-Online Reinforcement Learning : Abstract: Offline reinforcement learning (RL) provides a promising solution to learning an agent fully relying on a data-driven paradigm. However, constrained by the limited quality of the offline dat...
Utilizing Class Separation Distance for the Evaluation of Corruption Robustness of Machine Learning Classifiers : Abstract: Robustness is a fundamental pillar of Machine Learning (ML) classifiers, substantially determining their reliability. Methods for assessing classifier robustness are therefore essential. In ...
Theorem Prover as a Judge for Synthetic Data Generation : Abstract: The demand for synthetic data in mathematical reasoning has increased due to its potential to enhance the mathematical capabilities of large language models (LLMs). However, ensuring the val...
Feature Propagation on Knowledge Graphs using Cellular Sheaves : Abstract: Many inference tasks on knowledge graphs, including relation prediction, operate on knowledge graph embeddings -- vector representations of the vertices (entities) and edges (relations) that...
Do explanations generalize across large reasoning models? : Abstract: Large reasoning models (LRMs) produce a textual chain of thought (CoT) in the process of solving a problem, which serves as a potentially powerful tool to understand the problem by surfacing...
Building Production-Ready Probes For Gemini : Abstract: Frontier language model capabilities are improving rapidly. We thus need stronger mitigations against bad actors misusing increasingly powerful systems. Prior work has shown that activation ...
MetaboNet: The Largest Publicly Available Consolidated Dataset for Type 1 Diabetes Management : Abstract: Progress in Type 1 Diabetes (T1D) algorithm development is limited by the fragmentation and lack of standardization across existing T1D management datasets. Current datasets differ substanti...
The Poisoned Apple Effect: Strategic Manipulation of Mediated Markets via Technology Expansion of AI Agents : Abstract: The integration of AI agents into economic markets fundamentally alters the landscape of strategic interaction. We investigate the economic implications of expanding the set of available tec...
MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models : Abstract: As vision-language models (VLMs) tackle increasingly complex and multimodal tasks, the rapid growth of Key-Value (KV) cache imposes significant memory and computational bottlenecks during in...
Interactive Narrative Analytics: Bridging Computational Narrative Extraction and Human Sensemaking : Abstract: Information overload and misinformation create significant challenges in extracting meaningful narratives from large news collections. This paper defines the nascent field of Interactive Nar...
PRISM-CAFO: Prior-conditioned Remote-sensing Infrastructure Segmentation and Mapping for CAFOs : Abstract: Large-scale livestock operations pose significant risks to human health and the environment, while also being vulnerable to threats such as infectious diseases and extreme weather events. As...
Map2Thought: Explicit 3D Spatial Reasoning via Metric Cognitive Maps : Abstract: We propose Map2Thought, a framework that enables explicit and interpretable spatial reasoning for 3D VLMs. The framework is grounded in two key components: Metric Cognitive Map (Metric-CogMa...
Hierarchical Orthogonal Residual Spread for Precise Massive Editing in Large Language Models : Abstract: Large language models (LLMs) exhibit exceptional performance across various domains, yet they face critical safety concerns. Model editing has emerged as an effective approach to mitigate th...
GenDA: Generative Data Assimilation on Complex Urban Areas via Classifier-Free Diffusion Guidance : Abstract: Urban wind flow reconstruction is essential for assessing air quality, heat dispersion, and pedestrian comfort, yet remains challenging when only sparse sensor data are available. We propose...
Relational Linearity is a Predictor of Hallucinations : Abstract: Hallucination is a central failure mode in large language models (LLMs). We focus on hallucinations of answers to questions like: "Which instrument did Glenn Gould play?", but we ask these q...
The Great March 100: 100 Detail-oriented Tasks for Evaluating Embodied AI Agents : Abstract: Recently, with the rapid development of robot learning and imitation learning, numerous datasets and methods have emerged. However, these datasets and their task designs often lack systemati...
Topology-Guaranteed Image Segmentation: Enforcing Connectivity, Genus, and Width Constraints : Abstract: Existing research highlights the crucial role of topological priors in image segmentation, particularly in preserving essential structures such as connectivity and genus. Accurately capturin...
Wetland mapping from sparse annotations with satellite image time series and temporal-aware segment anything model : Abstract: Accurate wetland mapping is essential for ecosystem monitoring, yet dense pixel-level annotation is prohibitively expensive and practical applications usually rely on sparse point labels, un...
Evaluating LLM Behavior in Hiring: Implicit Weights, Fairness Across Groups, and Alignment with Human Preferences : Abstract: General-purpose Large Language Models (LLMs) show significant potential in recruitment applications, where decisions require reasoning over unstructured text, balancing multiple criteria, an...
Institutional AI: Governing LLM Collusion in Multi-Agent Cournot Markets via Public Governance Graphs : Abstract: Multi-agent LLM ensembles can converge on coordinated, socially harmful equilibria. This paper advances an experimental framework for evaluating Institutional AI, our system-level approach t...
Think-Clip-Sample: Slow-Fast Frame Selection for Video Understanding : Abstract: Recent progress in multi-modal large language models (MLLMs) has significantly advanced video understanding. However, their performance on long-form videos remains limited by computational c...
FEATHer: Fourier-Efficient Adaptive Temporal Hierarchy Forecaster for Time-Series Forecasting : Abstract: Time-series forecasting is fundamental in industrial domains like manufacturing and smart factories. As systems evolve toward automation, models must operate on edge devices (e.g., PLCs, mic...
How Much Would a Clinician Edit This Draft? Evaluating LLM Alignment for Patient Message Response Drafting : Abstract: Large language models (LLMs) show promise in drafting responses to patient portal messages, yet their integration into clinical workflows raises various concerns, including whether they woul...
From SERPs to Sound: How Search Engine Result Pages and AI-generated Podcasts Interact to Influence User Attitudes on Controversial Topics : Abstract: Compared to search engine result pages (SERPs), AI-generated podcasts represent a relatively new and relatively more passive modality of information consumption, delivering narratives in a n...
X-Distill: Cross-Architecture Vision Distillation for Visuomotor Learning : Abstract: Visuomotor policies often leverage large pre-trained Vision Transformers (ViTs) for their powerful generalization capabilities. However, their significant data requirements present a major c...
Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation : Abstract: Large Language Models (LLMs) face the "knowledge cutoff" challenge, where their frozen parametric memory prevents direct internalization of new information. While Supervised Fine-Tuning (SFT...
FactCorrector: A Graph-Inspired Approach to Long-Form Factuality Correction of Large Language Models : Abstract: Large language models (LLMs) are widely used in knowledge-intensive applications but often generate factually incorrect responses. A promising approach to rectify these flaws is correcting L...
SDFLoRA: Selective Dual-Module LoRA for Federated Fine-tuning with Heterogeneous Clients : Abstract: Federated learning (FL) for large language models (LLMs) has attracted increasing attention as a way to enable privacy-preserving adaptation over distributed data. Parameter-efficient method...
LoRA as Oracle : Abstract: Backdoored and privacy-leaking deep neural networks pose a serious threat to the deployment of machine learning systems in security-critical settings. Existing defenses for backdoor detectio...
Epistemic Control and the Normativity of Machine Learning-Based Science : Abstract: The past few years have witnessed an increasing use of machine learning (ML) systems in science. Paul Humphreys has argued that, because of specific characteristics of ML systems, human scie...
FAQ: Mitigating Quantization Error via Regenerating Calibration Data with Family-Aware Quantization : Abstract: Although post-training quantization (PTQ) provides an efficient numerical compression scheme for deploying large language models (LLMs) on resource-constrained devices, the representativenes...
SD-RAG: A Prompt-Injection-Resilient Framework for Selective Disclosure in Retrieval-Augmented Generation : Abstract: Retrieval-Augmented Generation (RAG) has attracted significant attention due to its ability to combine the generative capabilities of Large Language Models (LLMs) with knowledge obtained thr...
Artificial Intelligence and the US Economy: An Accounting Perspective on Investment and Production : Abstract: Artificial intelligence (AI) has moved to the center of policy, market, and academic debates, but its macroeconomic footprint is still only partly understood. This paper provides an overview...
Clustering High-dimensional Data: Balancing Abstraction and Representation Tutorial at AAAI 2026 : Abstract: How to find a natural grouping of a large real data set? Clustering requires a balance between abstraction and representation. To identify clusters, we need to abstract from superfluous deta...
Cross-Modal Attention Network with Dual Graph Learning in Multimodal Recommendation : Abstract: Multimedia recommendation systems leverage user-item interactions and multimodal information to capture user preferences, enabling more accurate and personalized recommendations. Despite not...
Deep GraphRAG: A Balanced Approach to Hierarchical Retrieval and Adaptive Integration : Abstract: Graph-based Retrieval-Augmented Generation (GraphRAG) frameworks face a trade-off between the comprehensiveness of global search and the efficiency of local search. Existing methods are ofte...
Learning Quadrupedal Locomotion for a Heavy Hydraulic Robot Using an Actuator Model : Abstract: The simulation-to-reality (sim-to-real) transfer of large-scale hydraulic robots presents a significant challenge in robotics because of the inherent slow control response and complex fluid ...
Context-aware Graph Causality Inference for Few-Shot Molecular Property Prediction : Abstract: Molecular property prediction is becoming one of the major applications of graph learning in Web-based services, e.g., online protein structure prediction and drug discovery. A key challenge...
Learn Before Represent: Bridging Generative and Contrastive Learning for Domain-Specific LLM Embeddings : Abstract: Large Language Models (LLMs) adapted via contrastive learning excel in general representation learning but struggle in vertical domains like chemistry and law, primarily due to a lack of dom...
Vision-as-Inverse-Graphics Agent via Interleaved Multimodal Reasoning : Abstract: Vision-as-inverse-graphics, the concept of reconstructing an image as an editable graphics program, is a long-standing goal of computer vision. Yet even strong VLMs aren't able to achieve thi...
Efficient Multilingual Name Type Classification Using Convolutional Networks : Abstract: We present a convolutional neural network approach for classifying proper names by language and entity type. Our model, Onomas-CNN X, combines parallel convolution branches with depthwise-se...
Visual Marker Search for Autonomous Drone Landing in Diverse Urban Environments : Abstract: Marker-based landing is widely used in drone delivery and return-to-base systems for its simplicity and reliability. However, most approaches assume idealized landing site visibility and sen...
ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development : Abstract: The evolution of Large Language Models (LLMs) into autonomous agents has expanded the scope of AI coding from localized code generation to complex, repository-level, and execution-driven pro...
A3D: Adaptive Affordance Assembly with Dual-Arm Manipulation : Abstract: Furniture assembly is a crucial yet challenging task for robots, requiring precise dual-arm coordination where one arm manipulates parts while the other provides collaborative support and st...
Bridging Cognitive Neuroscience and Graph Intelligence: Hippocampus-Inspired Multi-View Hypergraph Learning for Web Finance Fraud : Abstract: Online financial services constitute an essential component of contemporary web ecosystems, yet their openness introduces substantial exposure to fraud that harms vulnerable users and weaken...
Fairness in Healthcare Processes: A Quantitative Analysis of Decision Making in Triage : Abstract: Fairness in automated decision-making has become a critical concern, particularly in high-pressure healthcare scenarios such as emergency triage, where fast and equitable decisions are essen...
H-AIM: Orchestrating LLMs, PDDL, and Behavior Trees for Hierarchical Multi-Robot Planning : Abstract: In embodied artificial intelligence, enabling heterogeneous robot teams to execute long-horizon tasks from high-level instructions remains a critical challenge. While large language models (...
Predicting Biased Human Decision-Making with Large Language Models in Conversational Settings : Abstract: We examine whether large language models (LLMs) can predict biased decision-making in conversational settings, and whether their predictions capture not only human cognitive biases but also ...
Spectral Characterization and Mitigation of Sequential Knowledge Editing Collapse : Abstract: Sequential knowledge editing in large language models often causes catastrophic collapse of the model's general abilities, especially for parameter-modifying methods. Existing approaches mit...
Your One-Stop Solution for AI-Generated Video Detection : Abstract: Recent advances in generative modeling can create remarkably realistic synthetic videos, making it increasingly difficult for humans to distinguish them from real ones and necessitating reli...
IDDR-NGP: Incorporating Detectors for Distractor Removal with Instant Neural Radiance Field : Abstract: This paper presents the first unified distractor removal method, named IDDR-NGP, which directly operates on Instant-NGP. The method is able to remove a wide range of distractors in 3D scenes...
Combating Spurious Correlations in Graph Interpretability via Self-Reflection : Abstract: Interpretable graph learning has recently emerged as a popular research topic in machine learning. The goal is to identify the important nodes and edges of an input graph that are crucial fo...
Finding the Translation Switch: Discovering and Exploiting the Task-Initiation Features in LLMs : Abstract: Large Language Models (LLMs) frequently exhibit strong translation abilities, even without task-specific fine-tuning. However, the internal mechanisms governing this innate capability remain...
Contextual Distributionally Robust Optimization with Causal and Continuous Structure: An Interpretable and Tractable Approach : Abstract: In this paper, we introduce a framework for contextual distributionally robust optimization (DRO) that considers the causal and continuous structure of the underlying distribution by develop...
When Personalization Misleads: Understanding and Mitigating Hallucinations in Personalized LLMs : Abstract: Personalized large language models (LLMs) adapt model behavior to individual users to enhance user satisfaction, yet personalization can inadvertently distort factual reasoning. We show that...
Steering Language Models Before They Speak: Logit-Level Interventions : Abstract: Steering LLMs is essential for specialized applications such as style-sensitive text rewriting, user-adaptive communication, and toxicity mitigation. Current steering methods, such as prompt...
Beyond Max Tokens: Stealthy Resource Amplification via Tool Calling Chains in LLM Agents : Abstract: The agent-tool communication loop is a critical attack surface in modern Large Language Model (LLM) agents. Existing Denial-of-Service (DoS) attacks, primarily triggered via user prompts or ...
Multi-Stage Patient Role-Playing Framework for Realistic Clinical Interactions : Abstract: The simulation of realistic clinical interactions plays a pivotal role in advancing clinical Large Language Models (LLMs) and supporting medical diagnostic education. Existing approaches and...
PatientVLM Meets DocVLM: Pre-Consultation Dialogue Between Vision-Language Models for Efficient Diagnosis : Abstract: Traditionally, AI research in medical diagnosis has largely centered on image analysis. While this has led to notable advancements, the absence of patient-reported symptoms continues to hind...
Sparse Data Tree Canopy Segmentation: Fine-Tuning Leading Pretrained Models on Only 150 Images : Abstract: Tree canopy detection from aerial imagery is an important task for environmental monitoring, urban planning, and ecosystem analysis. Simulating real-life data annotation scarcity, the Solafu...
Selecting Language Models for Social Science: Start Small, Start Open, and Validate : Abstract: Currently, there are thousands of large pretrained language models (LLMs) available to social scientists. How do we select among them? Using validity, reliability, reproducibility, and repli...
RobuMTL: Enhancing Multi-Task Learning Robustness Against Weather Conditions : Abstract: Robust Multi-Task Learning (MTL) is crucial for autonomous systems operating in real-world environments, where adverse weather conditions can severely degrade model performance and reliabili...
Self-learned representation-guided latent diffusion model for breast cancer classification in deep ultraviolet whole surface images : Abstract: Breast-Conserving Surgery (BCS) requires precise intraoperative margin assessment to preserve healthy tissue. Deep Ultraviolet Fluorescence Scanning Microscopy (DUV-FSM) offers rapid, high-r...
Medical SAM3: A Foundation Model for Universal Prompt-Driven Medical Image Segmentation : Abstract: Promptable segmentation foundation models such as SAM3 have demonstrated strong generalization capabilities through interactive and concept-based prompting. However, their direct applicabili...
Can Vision-Language Models Understand Construction Workers? An Exploratory Study : Abstract: As robotics become increasingly integrated into construction workflows, their ability to interpret and respond to human behavior will be essential for enabling safe and effective collaborati...
Approximately Optimal Global Planning for Contact-Rich SE(2) Manipulation on a Graph of Reachable Sets : Abstract: If we consider human manipulation, it is clear that contact-rich manipulation (CRM), the ability to use any surface of the manipulator to make contact with objects, can be far more efficient a...
Towards Reliable ML Feature Engineering via Planning in Constrained-Topology of LLM Agents : Abstract: Recent advances in code generation models have unlocked unprecedented opportunities for automating feature engineering, yet their adoption in real-world ML teams remains constrained by criti...
Digital Metabolism: Decoupling Logic from Facts via Regenerative Unlearning -- Towards a Pure Neural Logic Core : Abstract: Large language models (LLMs) currently suffer from parameter entanglement, where general reasoning capabilities (logic) and specific factual knowledge (facts) exist in a superposition state ...
Unified Optimization of Source Weights and Transfer Quantities in Multi-Source Transfer Learning: An Asymptotic Framework : Abstract: Transfer learning plays a vital role in improving model performance in data-scarce scenarios. However, naive uniform transfer from multiple source tasks may result in negative transfer, high...
LogicLens: Leveraging Semantic Code Graph to explore Multi Repository large systems : Abstract: Understanding large software systems is a challenging task, especially when code is distributed across multiple repositories and microservices. Developers often need to reason not only about...
Unifying Speech Recognition, Synthesis and Conversion with Autoregressive Transformers : Abstract: Traditional speech systems typically rely on separate, task-specific models for text-to-speech (TTS), automatic speech recognition (ASR), and voice conversion (VC), resulting in fragmented p...
AnyECG: Evolved ECG Foundation Model for Holistic Health Profiling : Abstract: Background: Artificial intelligence enabled electrocardiography (AI-ECG) has demonstrated the ability to detect diverse pathologies, but most existing models focus on single disease identifi...
Line-based Event Preprocessing: Towards Low-Energy Neuromorphic Computer Vision : Abstract: Neuromorphic vision made significant progress in recent years, thanks to the natural match between spiking neural networks and event data in terms of biological inspiration, energy savings, ...
Neuro-Symbolic Activation Discovery: Transferring Mathematical Structures from Physics to Ecology for Parameter-Efficient Neural Networks : Abstract: Modern neural networks rely on generic activation functions (ReLU, GELU, SiLU) that ignore the mathematical structure inherent in scientific data. We propose Neuro-Symbolic Activation Discov...
Millimeter-Wave Gesture Recognition in ISAC: Does Reducing Sensing Airtime Hamper Accuracy? : Abstract: Most Integrated Sensing and Communications (ISAC) systems require dividing airtime across their two modes. However, the specific impact of this decision on sensing performance remains unclea...
DSA-Tokenizer: Disentangled Semantic-Acoustic Tokenization via Flow Matching-based Hierarchical Fusion : Abstract: Speech tokenizers serve as the cornerstone of discrete Speech Large Language Models (Speech LLMs). Existing tokenizers either prioritize semantic encoding, fuse semantic content with acousti...
EvidFuse: Writing-Time Evidence Learning for Consistent Text-Chart Data Reporting : Abstract: Data-driven reports communicate decision-relevant insights by tightly interleaving narrative text with charts grounded in underlying tables. However, current LLM-based systems typically gene...
Generative AI Purpose-built for Social and Mental Health: A Real-World Pilot : Abstract: Generative artificial intelligence (GAI) chatbots built for mental health could deliver safe, personalized, and scalable mental health support. We evaluate a foundation model designed for me...
BoxMind: Closed-loop AI strategy optimization for elite boxing validated in the 2024 Olympics : Abstract: Competitive sports require sophisticated tactical analysis, yet combat disciplines like boxing remain underdeveloped in AI-driven analytics due to the complexity of action dynamics and the l...
Health Facility Location in Ethiopia: Leveraging LLMs to Integrate Expert Knowledge into Algorithmic Planning : Abstract: Ethiopia's Ministry of Health is upgrading health posts to improve access to essential services, particularly in rural areas. Limited resources, however, require careful prioritization of wh...
Exploring LLM Features in Predictive Process Monitoring for Small-Scale Event-Logs : Abstract: Predictive Process Monitoring is a branch of process mining that aims to predict the outcome of an ongoing process. Recently, it leveraged machine-and-deep learning architectures. In this pa...
Hyperparameter Optimization of Constraint Programming Solvers : Abstract: The performance of constraint programming solvers is highly sensitive to the choice of their hyperparameters. Manually finding the best solver configuration is a difficult, time-consuming ta...
AstroReason-Bench: Evaluating Unified Agentic Planning across Heterogeneous Space Planning Problems : Abstract: Recent advances in agentic Large Language Models (LLMs) have positioned them as generalist planners capable of reasoning and acting across diverse tasks. However, existing agent benchmarks l...
XChoice: Explainable Evaluation of AI-Human Alignment in LLM-based Constrained Choice Decision Making : Abstract: We present XChoice, an explainable framework for evaluating AI-human alignment in constrained decision making. Moving beyond outcome agreement such as accuracy and F1 score, XChoice fits a m...
Beyond Model Scaling: Test-Time Intervention for Efficient Deep Reasoning : Abstract: Large Reasoning Models (LRMs) excel at multi-step reasoning but often suffer from inefficient reasoning processes like overthinking and overshoot, where excessive or misdirected reasoning in...
Policy-Based Deep Reinforcement Learning Hyperheuristics for Job-Shop Scheduling Problems : Abstract: This paper proposes a policy-based deep reinforcement learning hyper-heuristic framework for solving the Job Shop Scheduling Problem. The hyper-heuristic agent learns to switch scheduling ru...
TANDEM: Temporal-Aware Neural Detection for Multimodal Hate Speech : Abstract: Social media platforms are increasingly dominated by long-form multimodal content, where harmful narratives are constructed through a complex interplay of audio, visual, and textual cues. Wh...
Do We Always Need Query-Level Workflows? Rethinking Agentic Workflow Generation for Multi-Agent Systems : Abstract: Multi-Agent Systems (MAS) built on large language models typically solve complex tasks by coordinating multiple agents through workflows. Existing approaches generate workflows either at ta...
ReCreate: Reasoning and Creating Domain Agents Driven by Experience : Abstract: Large Language Model agents are reshaping the industrial landscape. However, most practical agents remain human-designed because tasks differ widely, making them labor-intensive to build. Th...
MiCA: A Mobility-Informed Causal Adapter for Lightweight Epidemic Forecasting : Abstract: Accurate forecasting of infectious disease dynamics is critical for public health planning and intervention. Human mobility plays a central role in shaping the spatial spread of epidemics, b...
AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts : Abstract: Large Language Models (LLMs) based autonomous agents demonstrate multifaceted capabilities to contribute substantially to economic production. However, existing benchmarks remain focused on ...
BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search : Abstract: RL-based agentic search enables LLMs to solve complex questions via dynamic planning and external search. While this approach significantly enhances accuracy with agent policies optimized vi...
Efficient Protein Optimization via Structure-aware Hamiltonian Dynamics : Abstract: The ability to engineer optimized protein variants has transformative potential for biotechnology and medicine. Prior sequence-based optimization methods struggle with the high-dimensional c...
AdaMARP: An Adaptive Multi-Agent Interaction Framework for General Immersive Role-Playing : Abstract: LLM role-playing aims to portray arbitrary characters in interactive narratives, yet existing systems often suffer from limited immersion and adaptability. They typically under-model dynamic...
What Matters in Data Curation for Multimodal Reasoning? Insights from the DCVLR Challenge : Abstract: We study data curation for multimodal reasoning through the NeurIPS 2025 Data Curation for Vision-Language Reasoning (DCVLR) challenge, which isolates dataset selection by fixing the model a...
ARC Prize 2025: Technical Report : Abstract: The ARC-AGI benchmark series serves as a critical measure of few-shot generalization on novel tasks, a core aspect of intelligence. The ARC Prize 2025 global competition targeted the newly r...
Optimisation of complex product innovation processes based on trend models with three-valued logic : Abstract: This paper investigates complex product-innovation processes using models grounded in a set of heuristics. Each heuristic is expressed through simple trends -- increasing, decreasing, or con...
Explore with Long-term Memory: A Benchmark and Multimodal LLM-based Reinforcement Learning Framework for Embodied Exploration : Abstract: An ideal embodied agent should possess lifelong learning capabilities to handle long-horizon and complex tasks, enabling continuous operation in general environments. This not only requires ...
CTHA: Constrained Temporal Hierarchical Architecture for Stable Multi-Agent LLM Systems : Abstract: Recently, multi-time-scale agent architectures have extended the ubiquitous single-loop paradigm by introducing temporal hierarchies with distinct cognitive layers. While yielding substantia...
ORBITFLOW: SLO-Aware Long-Context LLM Serving with Fine-Grained KV Cache Reconfiguration : Abstract: Serving long-context LLMs is challenging because request lengths and batch composition vary during token generation, causing the memory footprint to fluctuate significantly at runtime. Offlo...
Building AI Agents to Improve Job Referral Requests to Strangers : Abstract: This paper develops AI agents that help job seekers write effective requests for job referrals in a professional online community. The basic workflow consists of an improver agent that rewri...
Do You Trust Me? Cognitive-Affective Signatures of Trustworthiness in Large Language Models : Abstract: Perceived trustworthiness underpins how users navigate online information, yet it remains unclear whether large language models (LLMs), increasingly embedded in search, recommendation, and co...
Japanese AI Agent System on Human Papillomavirus Vaccination: System Design : Abstract: Human papillomavirus (HPV) vaccine hesitancy poses significant public health challenges, particularly in Japan where proactive vaccination recommendations were suspended from 2013 to 2021. T...

Research Sources: 262 | Generated: 1/19/2026