AI RESEARCH PAPERS & ACADEMIC SOURCES
- Controllable Video Generation: A Survey : Abstract: With the rapid development of AI-generated content (AIGC), video generation has emerged as one of its most dynamic and impactful subfields. In particular, the advancement of video generation...
- ProSGNeRF: Progressive Dynamic Neural Scene Graph with Frequency Modulated Foundation Model in Urban Scenes : Abstract: Implicit neural representation has demonstrated promising results in 3D reconstruction on various scenes. However, existing approaches either struggle to model fast-moving objects or are inc...
- VidLeaks: Membership Inference Attacks Against Text-to-Video Models : Abstract: The proliferation of powerful Text-to-Video (T2V) models, trained on massive web-scale datasets, raises urgent concerns about copyright and privacy violations. Membership inference attacks (...
- Simple Models, Rich Representations: Visual Decoding from Primate Intracortical Neural Signals : Abstract: Understanding how neural activity gives rise to perception is a central challenge in neuroscience. We address the problem of decoding visual information from high-density intracortical recor...
- Generation of Chest CT pulmonary Nodule Images by Latent Diffusion Models using the LIDC-IDRI Dataset : Abstract: Recently, computer-aided diagnosis systems have been developed to support diagnosis, but their performance depends heavily on the quality and quantity of training data. However, in clinical ...
- Visual question answering-based image-finding generation for pulmonary nodules on chest CT from structured annotations : Abstract: Interpretation of imaging findings based on morphological characteristics is important for diagnosing pulmonary nodules on chest computed tomography (CT) images. In this study, we constructe...
- Convolutions Need Registers Too: HVS-Inspired Dynamic Attention for Video Quality Assessment : Abstract: No-reference video quality assessment (NR-VQA) estimates perceptual quality without a reference video, which is often challenging. While recent techniques leverage saliency or transformer at...
- KOCOBrain: Kuramoto-Guided Graph Network for Uncovering Structure-Function Coupling in Adolescent Prenatal Drug Exposure : Abstract: Exposure to psychoactive substances during pregnancy, such as cannabis, can disrupt neurodevelopment and alter large-scale brain networks, yet identifying their neural signatures remains cha...
- Differentiating through binarized topology changes: Second-order subpixel-smoothed projection : Abstract: A key challenge in topology optimization (TopOpt) is that manufacturable structures, being inherently binary, are non-differentiable, creating a fundamental tension with gradient-based optim...
- UniX: Unifying Autoregression and Diffusion for Chest X-Ray Understanding and Generation : Abstract: Despite recent progress, medical foundation models still struggle to unify visual understanding and generation, as these tasks have inherently conflicting goals: semantic abstraction versus ...
- ReScene4D: Temporally Consistent Semantic Instance Segmentation of Evolving Indoor 3D Scenes : Abstract: Indoor environments evolve as objects move, appear, or disappear. Capturing these dynamics requires maintaining temporally consistent instance identities across intermittently captured 3D sc...
- Generative Scenario Rollouts for End-to-End Autonomous Driving : Abstract: Vision-Language-Action (VLA) models are emerging as highly effective planning models for end-to-end autonomous driving systems. However, current works mostly rely on imitation learning from ...
- SME-YOLO: A Real-Time Detector for Tiny Defect Detection on PCB Surfaces : Abstract: Surface defects on Printed Circuit Boards (PCBs) directly compromise product reliability and safety. However, achieving high-precision detection is challenging because PCB defects are typica...
- SUG-Occ: An Explicit Semantics and Uncertainty Guided Sparse Learning Framework for Real-Time 3D Occupancy Prediction : Abstract: As autonomous driving moves toward full scene understanding, 3D semantic occupancy prediction has emerged as a crucial perception task, offering voxel-level semantics beyond traditional dete...
- Heterogeneous Uncertainty-Guided Composed Image Retrieval with Fine-Grained Probabilistic Learning : Abstract: Composed Image Retrieval (CIR) enables image search by combining a reference image with modification text. Intrinsic noise in CIR triplets incurs intrinsic uncertainty and threatens the mode...
- Assessing Building Heat Resilience Using UAV and Street-View Imagery with Coupled Global Context Vision Transformer : Abstract: Climate change is intensifying human heat exposure, particularly in densely built urban centers of the Global South. Low-cost construction materials and high thermal-mass surfaces further ex...
- Enhancing Vision Language Models with Logic Reasoning for Situational Awareness : Abstract: Vision-Language Models (VLMs) offer the ability to generate high-level, interpretable descriptions of complex activities from images and videos, making them valuable for situational awarenes...
- Context-Aware Semantic Segmentation via Stage-Wise Attention : Abstract: Semantic ultra high resolution image (UHR) segmentation is essential in remote sensing applications such as aerial mapping and environmental monitoring. Transformer-based models struggle in ...
- SAMannot: A Memory-Efficient, Local, Open-source Framework for Interactive Video Instance Segmentation based on SAM2 : Abstract: Current research workflows for precise video segmentation are often forced into a compromise between labor-intensive manual curation, costly commercial platforms, and/or privacy-compromising...
- Efficient On-Board Processing of Oblique UAV Video for Rapid Flood Extent Mapping : Abstract: Effective disaster response relies on rapid disaster response, where oblique aerial video is the primary modality for initial scouting due to its ability to maximize spatial coverage and sit...
- FTDMamba: Frequency-Assisted Temporal Dilation Mamba for Unmanned Aerial Vehicle Video Anomaly Detection : Abstract: Recent advances in video anomaly detection (VAD) mainly focus on ground-based surveillance or unmanned aerial vehicle (UAV) videos with static backgrounds, whereas research on UAV videos wit...
- Language-Agnostic Visual Embeddings for Cross-Script Handwriting Retrieval : Abstract: Handwritten word retrieval is vital for digital archives but remains challenging due to large handwriting variability and cross-lingual semantic gaps. While large vision-language models offe...
- Image-Text Knowledge Modeling for Unsupervised Multi-Scenario Person Re-Identification : Abstract: We propose unsupervised multi-scenario (UMS) person re-identification (ReID) as a new task that expands ReID across diverse scenarios (cross-resolution, clothing change, etc.) within a singl...
- Bio-inspired fine-tuning for selective transfer learning in image classification : Abstract: Deep learning has significantly advanced image analysis across diverse domains but often depends on large, annotated datasets for success. Transfer learning addresses this challenge by utili...
- ATATA: One Algorithm to Align Them All : Abstract: We suggest a new multi-modal algorithm for joint inference of paired structurally aligned samples with Rectified Flow models. While some existing methods propose a codependent generation pro...
- Democratizing planetary-scale analysis: An ultra-lightweight Earth embedding database for accurate and flexible global land monitoring : Abstract: The rapid evolution of satellite-borne Earth Observation (EO) systems has revolutionized terrestrial monitoring, yielding petabyte-scale archives. However, the immense computational and stor...
- SoLA-Vision: Fine-grained Layer-wise Linear Softmax Hybrid Attention : Abstract: Standard softmax self-attention excels in vision tasks but incurs quadratic complexity O(N^2), limiting high-resolution deployment. Linear attention reduces the cost to O(N), yet its compres...
- Graph Smoothing for Enhanced Local Geometry Learning in Point Cloud Analysis : Abstract: Graph-based methods have proven to be effective in capturing relationships among points for 3D point cloud analysis. However, these methods often suffer from suboptimal graph structures, par...
- CoDance: An Unbind-Rebind Paradigm for Robust Multi-Subject Animation : Abstract: Character image animation is gaining significant importance across various domains, driven by the demand for robust and flexible multi-subject rendering. While existing methods excel in sing...
- PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models : Abstract: Physical principles are fundamental to realistic visual simulation, but remain a significant oversight in transformer-based video generation. This gap highlights a critical limitation in ren...
- M3DDM+: An improved video outpainting by a modified masking strategy : Abstract: M3DDM provides a computationally efficient framework for video outpainting via latent diffusion modeling. However, it exhibits significant quality degradation -- manifested as spatial blur a...
- MMedExpert-R1: Strengthening Multimodal Medical Reasoning via Domain-Specific Adaptation and Clinical Guideline Reinforcement : Abstract: Medical Vision-Language Models (MedVLMs) excel at perception tasks but struggle with complex clinical reasoning required in real-world scenarios. While reinforcement learning (RL) has been e...
- Classification of Chest XRay Diseases through image processing and analysis techniques : Abstract: Multi-Classification Chest X-Ray Images are one of the most prevalent forms of radiological examination used for diagnosing thoracic diseases. In this study, we offer a concise overview of s...
- FrankenMotion: Part-level Human Motion Generation and Composition : Abstract: Human motion generation from text prompts has made remarkable progress in recent years. However, existing methods primarily rely on either sequence-level or action-level descriptions due to ...
- Effects of Different Attention Mechanisms Applied on 3D Models in Video Classification : Abstract: Human action recognition has become an important research focus in computer vision due to the wide range of applications where it is used. 3D ResNet-based CNN models, particularly MC3, R3D, ...
- One Model, Many Behaviors: Training-Induced Effects on Out-of-Distribution Detection : Abstract: Out-of-distribution (OOD) detection is crucial for deploying robust and reliable machine-learning systems in open-world settings. Despite steady advances in OOD detectors, their interplay wi...
- A Unified 3D Object Perception Framework for Real-Time Outside-In Multi-Camera Systems : Abstract: Accurate 3D object perception and multi-target multi-camera (MTMC) tracking are fundamental for the digital transformation of industrial infrastructure. However, transitioning "inside-out" a...
- ICONIC-444: A 3.1-Million-Image Dataset for OOD Detection Research : Abstract: Current progress in out-of-distribution (OOD) detection is limited by the lack of large, high-quality datasets with clearly defined OOD categories across varying difficulty levels (near- to ...
- Future Optical Flow Prediction Improves Robot Control & Video Generation : Abstract: Future motion representations, such as optical flow, offer immense value for control and generative tasks. However, forecasting generalizable spatially dense motion representations remains a...
- Isotropy-Optimized Contrastive Learning for Semantic Course Recommendation : Abstract: This paper presents a semantic course recommendation system for students using a self-supervised contrastive learning approach built upon BERT (Bidirectional Encoder Representations from Tra...
- FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning : Abstract: Recent end-to-end spoken dialogue systems leverage speech tokenizers and neural audio codecs to enable LLMs to operate directly on discrete speech representations. However, these models ofte...
- SonicBench: Dissecting the Physical Perception Bottleneck in Large Audio Language Models : Abstract: Large Audio Language Models (LALMs) excel at semantic and paralinguistic tasks, yet their ability to perceive the fundamental physical attributes of audio such as pitch, loudness, and spatia...
- AJAR: Adaptive Jailbreak Architecture for Red-teaming : Abstract: As Large Language Models (LLMs) evolve from static chatbots into autonomous agents capable of tool execution, the landscape of AI safety is shifting from content moderation to action securit...
- How Long Is a Piece of String? A Brief Empirical Analysis of Tokenizers : Abstract: Frontier LLMs are increasingly utilised across academia, society and industry. A commonly used unit for comparing models, their inputs and outputs, and estimating inference pricing is the to...
- CTest-Metric: A Unified Framework to Assess Clinical Validity of Metrics for CT Report Generation : Abstract: In the generative AI era, where even critical medical tasks are increasingly automated, radiology report generation (RRG) continues to rely on suboptimal metrics for quality assessment. Deve...
- Predict the Retrieval! Test time adaptation for Retrieval Augmented Generation : Abstract: Retrieval-Augmented Generation (RAG) has emerged as a powerful approach for enhancing large language models' question-answering capabilities through the integration of external knowledge. Ho...
- The unreasonable effectiveness of pattern matching : Abstract: We report on an astonishing ability of large language models (LLMs) to make sense of "Jabberwocky" language in which most or all content words have been randomly replaced by nonsense strings...
- Reward Modeling for Scientific Writing Evaluation : Abstract: Scientific writing is an expert-domain task that demands deep domain knowledge, task-specific requirements and reasoning capabilities that leverage the domain knowledge to satisfy the task s...
- Neural Chain-of-Thought Search: Searching the Optimal Reasoning Path to Enhance Large Language Models : Abstract: Chain-of-Thought reasoning has significantly enhanced the problem-solving capabilities of Large Language Models. Unfortunately, current models generate reasoning steps sequentially without f...
- Idea First, Code Later: Disentangling Problem Solving from Code Generation in Evaluating LLMs for Competitive Programming : Abstract: Large Language Models (LLMs) increasingly succeed on competitive programming problems, yet existing evaluations conflate algorithmic reasoning with code-level implementation. We argue that c...
- F-Actor: Controllable Conversational Behaviour in Full-Duplex Models : Abstract: Spoken conversational systems require more than accurate speech generation to have human-like conversations: to feel natural and engaging, they must produce conversational behaviour that ada...
- Membership Inference on LLMs in the Wild : Abstract: Membership Inference Attacks (MIAs) act as a crucial auditing tool for the opaque training data of Large Language Models (LLMs). However, existing techniques predominantly rely on inaccessib...
- One LLM to Train Them All: Multi-Task Learning Framework for Fact-Checking : Abstract: Large language models (LLMs) are reshaping automated fact-checking (AFC) by enabling unified, end-to-end verification pipelines rather than isolated components. While large proprietary model...
- Language of Thought Shapes Output Diversity in Large Language Models : Abstract: Output diversity is crucial for Large Language Models as it underpins pluralism and creativity. In this work, we reveal that controlling the language used during model thinking, the language ...
- MultiCaption: Detecting disinformation using multilingual visual claims : Abstract: Online disinformation poses an escalating threat to society, driven increasingly by the rapid spread of misleading content across both multimedia and multilingual platforms. While automated ...
- T*: Progressive Block Scaling for MDM Through Trajectory Aware RL : Abstract: We present T*, a simple TraceRL-based training curriculum for progressive block-size scaling in masked diffusion language models (MDMs). Starting from an AR-initialized small...
- DOREMI: Optimizing Long Tail Predictions in Document-Level Relation Extraction : Abstract: Document-Level Relation Extraction (DocRE) presents significant challenges due to its reliance on cross-sentence context and the long-tail distribution of relation types, where many relation...
- The Growing Gains and Pains of Iterative Web Corpora Crawling: Insights from South Slavic CLASSLA-web 2.0 Corpora : Abstract: Crawling national top-level domains has proven to be highly effective for collecting texts in less-resourced languages. This approach has been recently used for South Slavic languages and re...
- Integrity Shield: A System for Ethical AI Use & Authorship Transparency in Assessments : Abstract: Large Language Models (LLMs) can now solve entire exams directly from uploaded PDF assessments, raising urgent concerns about academic integrity and the reliability of grades and credentials...
- Budget-Aware Anytime Reasoning with LLM-Synthesized Preference Data : Abstract: We study the reasoning behavior of large language models (LLMs) under limited computation budgets. In such settings, producing useful partial solutions quickly is often more practical than e...
- From Interpretability to Performance: Optimizing Retrieval Heads for Long-Context Language Models : Abstract: Advances in mechanistic interpretability have identified special attention heads, known as retrieval heads, that are responsible for retrieving information from the context. However, the rol...
- NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems : Abstract: Accurately assessing model confidence is essential for deploying large language models (LLMs) in mission-critical factual domains. While retrieval-augmented generation (RAG) is widely adopte...
- Redefining Machine Simultaneous Interpretation: From Incremental Translation to Human-Like Strategies : Abstract: Simultaneous Machine Translation (SiMT) requires high-quality translations under strict real-time constraints, which traditional policies with only READ/WRITE actions cannot fully address. W...
- ZPD Detector: Data Selection via Capability-Difficulty Alignment for Large Language Models : Abstract: As the cost of training large language models continues to increase and high-quality training data become increasingly scarce, selecting high-value samples or synthesizing effective training...
- Massively Multilingual Joint Segmentation and Glossing : Abstract: Automated interlinear gloss prediction with neural networks is a promising approach to accelerate language documentation efforts. However, while state-of-the-art models like GlossLM achieve ...
- Neural Induction of Finite-State Transducers : Abstract: Finite-State Transducers (FSTs) are effective models for string-to-string rewriting tasks, often providing the efficiency necessary for high-performance applications, but constructing transd...
- DialDefer: A Framework for Detecting and Mitigating LLM Dialogic Deference : Abstract: LLMs are increasingly used as third-party judges, yet their reliability when evaluating speakers in dialogue remains poorly understood. We show that LLMs judge identical claims differently d...
- EncodeRec: An Embedding Backbone for Recommendation Systems : Abstract: Recent recommender systems increasingly leverage embeddings from large pre-trained language models (PLMs). However, such embeddings exhibit two key limitations: (1) PLMs are not explicitly o...
- A Concise Agent is Less Expert: Revealing Side Effects of Using Style Features on Conversational Agents : Abstract: Style features such as friendly, helpful, or concise are widely used in prompts to steer the behavior of Large Language Model (LLM) conversational agents, yet their unintended side effects r...
- BYOL: Bring Your Own Language Into LLMs : Abstract: Large Language Models (LLMs) exhibit strong multilingual capabilities, yet remain fundamentally constrained by the severe imbalance in global language resources. While over 7,000 languages a...
- Conditional Distribution Compression via the Kernel Conditional Mean Embedding : Abstract: Existing distribution compression methods, like Kernel Herding (KH), were originally developed for unlabelled data. However, no existing approach directly compresses the conditional distribu...
- A Natural Primal-Dual Hybrid Gradient Method for Adversarial Neural Network Training on Solving Partial Differential Equations : Abstract: We propose a scalable preconditioned primal-dual hybrid gradient algorithm for solving partial differential equations (PDEs). We multiply the PDE with a dual test function to obtain an inf-s...
- High-Dimensional Tail Index Regression : Abstract: Motivated by the empirical observation of power-law distributions in the credits (e.g., "likes") of viral posts in social media, we introduce a high-dimensional tail index regression model...
- Detecting Toxic Flow : Abstract: This paper develops a framework to predict toxic trades that a broker receives from her clients. Toxic trades are predicted with a novel online learning Bayesian method which we call the pro...
- UCB-type Algorithm for Budget-Constrained Expert Learning : Abstract: In many modern applications, a system must dynamically choose between several adaptive learning algorithms that are trained online. Examples include model selection in streaming environments...
- ThinkEval: Practical Evaluation of Knowledge Leakage in LLM Editing using Thought-based Knowledge Graphs : Abstract: Robust model-editing techniques are essential for deploying large language models (LLMs) in practical applications, as they enable cost-effective ways to deal with challenges such as privacy...
- ShapeR: Robust Conditional 3D Shape Generation from Casual Captures : Abstract: Recent advances in 3D shape generation have achieved impressive results, but most existing methods rely on clean, unoccluded, and well-segmented inputs. Such conditions are rarely met in rea...
- On the Probability of First Success in Differential Evolution: Hazard Identities and Tail Bounds : Abstract: We study first-hitting times in Differential Evolution (DE) through a conditional hazard framework. Instead of analyzing convergence via Markov-chain transition kernels or drift arguments, ...
- A Probabilistic Approach to Trajectory-Based Optimal Experimental Design : Abstract: We present a novel probabilistic approach for optimal path experimental design. In this approach a discrete path optimization problem is defined on a static navigation mesh, and trajectories...
- Learning Semantic-Geometric Task Graph-Representations from Human Demonstrations : Abstract: Learning structured task representations from human demonstrations is essential for understanding long-horizon manipulation behaviors, particularly in bimanual settings where action ordering...
- IMS: Intelligent Hardware Monitoring System for Secure SoCs : Abstract: In the modern Systems-on-Chip (SoC), the Advanced eXtensible Interface (AXI) protocol exhibits security vulnerabilities, enabling partial or complete denial-of-service (DoS) through protocol...
- Near-Optimal Decentralized Stochastic Nonconvex Optimization with Heavy-Tailed Noise : Abstract: This paper studies the decentralized stochastic nonconvex optimization problem over row-stochastic networks. We consider the heavy-tailed gradient noise which is empirically observed in many pop...
- PubMed-OCR: PMC Open Access OCR Annotations : Abstract: PubMed-OCR is an OCR-centric corpus of scientific articles derived from PubMed Central Open Access PDFs. Each page image is annotated with Google Cloud Vision and released in a compact JSON ...
- Statistical Robustness of Interval CVaR Based Regression Models under Perturbation and Contamination : Abstract: Robustness under perturbation and contamination is a prominent issue in statistical learning. We address the robust nonlinear regression based on the so-called interval conditional value-at-...
- Zero-Shot Detection of Elastic Transient Morphology Across Physical Systems : Abstract: We test whether a representation learned from interferometric strain transients in gravitational-wave observatories can act as a frozen morphology-sensitive operator for unseen sensors, prov...
- New Adaptive Mechanism for Large Neighborhood Search using Dual Actor-Critic : Abstract: Adaptive Large Neighborhood Search (ALNS) is a widely used heuristic method for solving combinatorial optimization problems. ALNS explores the solution space by iteratively using destroy and...
- Beer-Lambert Autoencoder for Unsupervised Stain Representation Learning and Deconvolution in Multi-immunohistochemical Brightfield Histology Images : Abstract: Separating the contributions of individual chromogenic stains in RGB histology whole slide images (WSIs) is essential for stain normalization, quantitative assessment of marker expression, a...
- Information Theoretic Perspective on Representation Learning : Abstract: An information-theoretic framework is introduced to analyze last-layer embedding, focusing on learned representations for regression tasks. We define representation-rate and derive limits on...
- Scalable Music Cover Retrieval Using Lyrics-Aligned Audio Embeddings : Abstract: Music Cover Retrieval, also known as Version Identification, aims to recognize distinct renditions of the same underlying musical work, a task central to catalog management, copyright enforc...
- Effects of Introducing Synaptic Scaling on Spiking Neural Network Learning : Abstract: Spiking neural networks (SNNs) employing unsupervised learning methods inspired by neural plasticity are expected to be a new framework for artificial intelligence. In this study, we investi...
- Reasoning in Trees: Improving Retrieval-Augmented Generation for Multi-Hop Question Answering : Abstract: Retrieval-Augmented Generation (RAG) has demonstrated significant effectiveness in enhancing large language models (LLMs) for complex multi-hop question answering (QA). For multi-hop QA task...
- How DDAIR you? Disambiguated Data Augmentation for Intent Recognition : Abstract: Large Language Models (LLMs) are effective for data augmentation in classification tasks like intent detection. In some cases, they inadvertently produce examples that are ambiguous with reg...
- Model-free policy gradient for discrete-time mean-field control : Abstract: We study model-free policy learning for discrete-time mean-field control (MFC) problems with finite state space and compact action space. In contrast to the extensive literature on value-bas...
- Comprehensive Robust Dynamic Mode Decomposition from Mode Extraction to Dimensional Reduction : Abstract: We propose Comprehensive Robust Dynamic Mode Decomposition (CR-DMD), a novel framework that robustifies the entire DMD process - from mode extraction to dimensional reduction - against mixed...
- KANHedge: Efficient Hedging of High-Dimensional Options Using Kolmogorov-Arnold Network-Based BSDE Solver : Abstract: High-dimensional option pricing and hedging present significant challenges in quantitative finance, where traditional PDE-based methods struggle with the curse of dimensionality. The BSDE fr...
- Split-and-Conquer: Distributed Factor Modeling for High-Dimensional Matrix-Variate Time Series : Abstract: In this paper, we propose a distributed framework for reducing the dimensionality of high-dimensional, large-scale, heterogeneous matrix-variate time series data using a factor model. The da...
- CoG: Controllable Graph Reasoning via Relational Blueprints and Failure-Aware Refinement over Knowledge Graphs : Abstract: Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities but often grapple with reliability challenges like hallucinations. While Knowledge Graphs (KGs) offer explici...
- Exact Constraint Enforcement in Physics-Informed Extreme Learning Machines using Null-Space Projection Framework : Abstract: Physics-informed extreme learning machines (PIELMs) typically impose boundary and initial conditions through penalty terms, yielding only approximate satisfaction that is sensitive to user-s...
- Memorize Early, Then Query: Inlier-Memorization-Guided Active Outlier Detection : Abstract: Outlier detection (OD) aims to identify abnormal instances, known as outliers or anomalies, by learning typical patterns of normal data, or inliers. Performing OD under an unsupervised regim...
- Depression Detection Based on Electroencephalography Using a Hybrid Deep Neural Network CNN-GRU and MRMR Feature Selection : Abstract: This study investigates the detection and classification of depressive and non-depressive states using deep learning approaches. Depression is a prevalent mental health disorder that substan...
- A PAC-Bayesian Analysis of Channel-Induced Degradation in Edge Inference : Abstract: In the emerging paradigm of edge inference, neural networks (NNs) are partitioned across distributed edge devices that collaboratively perform inference via wireless transmission. However, s...
- Learning collision operators from plasma phase space data using differentiable simulators : Abstract: We propose a methodology to infer collision operators from phase space data of plasma dynamics. Our approach combines a differentiable kinetic simulator, whose core component in this work is...
- Reasoning Models Generate Societies of Thought : Abstract: Large language models have achieved remarkable capabilities across domains, yet mechanisms underlying sophisticated reasoning remain elusive. Recent reasoning models outperform comparable in...
- LLMs for Game Theory: Entropy-Guided In-Context Learning and Adaptive CoT Reasoning : Abstract: We propose a novel LLM-based framework for reasoning in discrete, game-theoretic tasks, illustrated with Tic-Tac-Toe. The method integrates in-context learning with entropy-guided cha...
- Physically constrained unfolded multi-dimensional OMP for large MIMO systems : Abstract: Sparse recovery methods are essential for channel estimation and localization in modern communication systems, but their reliability relies on accurate physical models, which are rarely perf...
- Mass Distribution versus Density Distribution in the Context of Clustering : Abstract: This paper investigates two fundamental descriptors of data, i.e., density distribution versus mass distribution, in the context of clustering. Density distribution has been the de facto des...
- Sensor Placement for Urban Traffic Interpolation: A Data-Driven Evaluation to Inform Policy : Abstract: Data on citywide street-segment traffic volumes are essential for urban planning and sustainable mobility management. Yet such data are available only for a limited subset of streets due to ...
- UBiGTLoc: A Unified BiLSTM-Graph Transformer Localization Framework for IoT Sensor Networks : Abstract: Sensor nodes localization in wireless Internet of Things (IoT) sensor networks is crucial for the effective operation of diverse applications, such as smart cities and smart agriculture. Exi...
- SSC-UNet: UNet with Self-Supervised Contrastive Learning for Phonocardiography Noise Reduction : Abstract: Congenital Heart Disease (CHD) remains a significant global health concern affecting approximately 1% of births worldwide. Phonocardiography has emerged as a supplementary tool to diagnose ...
- QUPID: A Partitioned Quantum Neural Network for Anomaly Detection in Smart Grid : Abstract: Smart grid infrastructures have revolutionized energy distribution, but their day-to-day operations require robust anomaly detection methods to counter risks associated with cyber-physical t...
- Extractive summarization on a CMOS Ising machine : Abstract: Extractive summarization (ES) aims to generate a concise summary by selecting a subset of sentences from a document while maximizing relevance and minimizing redundancy. Although modern ES s...
- Low-Rank Key Value Attention : Abstract: Transformer pretraining is increasingly constrained by memory and compute requirements, with the key-value (KV) cache emerging as a dominant bottleneck during training and autoregressive dec...
- When Are Two Scores Better Than One? Investigating Ensembles of Diffusion Models : Abstract: Diffusion models now generate high-quality, diverse samples, with an increasing focus on more powerful models. Although ensembling is a well-known way to improve supervised models, its appli...
- Inter-patient ECG Arrhythmia Classification with LGNs and LUTNs : Abstract: Deep Differentiable Logic Gate Networks (LGNs) and Lookup Table Networks (LUTNs) are demonstrated to be suitable for the automatic classification of electrocardiograms (ECGs) using the inter...
- Forcing and Diagnosing Failure Modes of Fourier Neural Operators Across Diverse PDE Families : Abstract: Fourier Neural Operators (FNOs) have shown strong performance in learning solution maps of partial differential equations (PDEs), but their robustness under distribution shifts, long-horizon...
- Factored Value Functions for Graph-Based Multi-Agent Reinforcement Learning : Abstract: Credit assignment is a core challenge in multi-agent reinforcement learning (MARL), especially in large-scale systems with structured, local interactions. Graph-based Markov decision process...
- Latent Space Inference via Paired Autoencoders : Abstract: This work describes a novel data-driven latent space inference framework built on paired autoencoders to handle observational inconsistencies when solving inverse problems. Our approach uses...
- Offline Reinforcement-Learning-Based Power Control for Application-Agnostic Energy Efficiency : Abstract: Energy efficiency has become an integral aspect of modern computing infrastructure design, impacting the performance, cost, scalability, and durability of production systems. The incorporati...
- Unlocking the Potentials of Retrieval-Augmented Generation for Diffusion Language Models : Abstract: Diffusion Language Models (DLMs) have recently demonstrated remarkable capabilities in natural language processing tasks. However, the potential of Retrieval-Augmented Generation (RAG), whic...
- FORESTLLM: Large Language Models Make Random Forest Great on Few-shot Tabular Learning : Abstract: Tabular data underpins high-stakes critical decision-making in domains such as finance, healthcare, and scientific discovery. Yet, learning effectively from tabular data in few-shot settings, where la...
- Metabolomic Biomarker Discovery for ADHD Diagnosis Using Interpretable Machine Learning : Abstract: Attention Deficit Hyperactivity Disorder (ADHD) is a prevalent neurodevelopmental disorder with limited objective diagnostic tools, highlighting the urgent need for objective, biology-based ...
- Sample-Near-Optimal Agnostic Boosting with Improved Running Time : Abstract: Boosting is a powerful method that turns weak learners, which perform only slightly better than random guessing, into strong learners with high accuracy. While boosting is well understood in...
- Latent Dynamics Graph Convolutional Networks for model order reduction of parameterized time-dependent PDEs : Abstract: Graph Neural Networks (GNNs) are emerging as powerful tools for nonlinear Model Order Reduction (MOR) of time-dependent parameterized Partial Differential Equations (PDEs). However, existing...
- Operator learning on domain boundary through combining fundamental solution-based artificial data and boundary integral techniques : Abstract: For linear partial differential equations with known fundamental solutions, this work introduces a novel operator learning framework that relies exclusively on domain boundary data, includin...
- TimeMar: Multi-Scale Autoregressive Modeling for Unconditional Time Series Generation : Abstract: Generative modeling offers a promising solution to data scarcity and privacy challenges in time series analysis. However, the structural complexity of time series, characterized by multi-sca...
- LSTM VS. Feed-Forward Autoencoders for Unsupervised Fault Detection in Hydraulic Pumps : Abstract: Unplanned failures in industrial hydraulic pumps can halt production and incur substantial costs. We explore two unsupervised autoencoder (AE) schemes for early fault detection: a feed-forwa...
- GMM-COMET: Continual Source-Free Universal Domain Adaptation via a Mean Teacher and Gaussian Mixture Model-Based Pseudo-Labeling : Abstract: Unsupervised domain adaptation tackles the problem that domain shifts between training and test data impair the performance of neural networks in many real-world applications. Thereby, in re...
- Theoretically and Practically Efficient Resistance Distance Computation on Large Graphs : Abstract: The computation of resistance distance is pivotal in a wide range of graph analysis applications, including graph clustering, link prediction, and graph neural networks. Despite its foundati...
- Assessing the Viability of Unsupervised Learning with Autoencoders for Predictive Maintenance in Helicopter Engines : Abstract: Unplanned engine failures in helicopters can lead to severe operational disruptions, safety hazards, and costly repairs. To mitigate these risks, this study compares two predictive maintenan...
- FSL-BDP: Federated Survival Learning with Bayesian Differential Privacy for Credit Risk Modeling : Abstract: Credit risk models are a critical decision-support tool for financial institutions, yet tightening data-protection rules (e.g., GDPR, CCPA) increasingly prohibit cross-border sharing of borr...
- Shape-morphing programming of soft materials on complex geometries via neural operator : Abstract: Shape-morphing soft materials can enable diverse target morphologies through voxel-level material distribution design, offering significant potential for various applications. Despite progre...
- Optimized Algorithms for Text Clustering with LLM-Generated Constraints : Abstract: Clustering is a fundamental tool that has garnered significant interest across a wide range of applications including text analysis. To improve clustering accuracy, many researchers have inc...
- Differentially Private Subspace Fine-Tuning for Large Language Models : Abstract: Fine-tuning large language models on downstream tasks is crucial for realizing their cross-domain potential but often relies on sensitive data, raising privacy concerns. Differential privacy...
- Soft Bayesian Context Tree Models for Real-Valued Time Series : Abstract: This paper proposes the soft Bayesian context tree model (Soft-BCT), which is a novel BCT model for real-valued time series. The Soft-BCT considers soft (probabilistic) splits of the context...
- Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is highly effective for enhancing LLM reasoning, yet recent evidence shows models like Qwen 2.5 achieve significant gains even with spur...
- OpFML: Pipeline for ML-based Operational Forecasting : Abstract: Machine learning is finding its application in a multitude of areas in science and research, and Climate and Earth Sciences is no exception to this trend. Operational forecasting systems bas...
- Self-Augmented Mixture-of-Experts for QoS Prediction : Abstract: Quality of Service (QoS) prediction is one of the most fundamental problems in service computing and personalized recommendation. In the problem, there is a set of users and services, each a...
- AVP-Pro: An Adaptive Multi-Modal Fusion and Contrastive Learning Approach for Comprehensive Two-Stage Antiviral Peptide Identification : Abstract: The accurate identification of antiviral peptides (AVPs) is crucial for novel drug development. However, existing methods still have limitations in capturing complex sequence dependencies an...
- Matching High-Dimensional Geometric Quantiles for Test-Time Adaptation of Transformers and Convolutional Networks Alike : Abstract: Test-time adaptation (TTA) refers to adapting a classifier for the test data when the probability distribution of the test data slightly differs from that of the training data of the model. ...
- Backdoor Attacks on Multi-modal Contrastive Learning : Abstract: Contrastive learning has become a leading self-supervised approach to representation learning across domains, including vision, multimodal settings, graphs, and federated learning. However,...
- Constant Metric Scaling in Riemannian Computation : Abstract: Constant rescaling of a Riemannian metric appears in many computational settings, often through a global scale parameter that is introduced either explicitly or implicitly. Although this ope...
- Reasoning Distillation for Lightweight Automated Program Repair : Abstract: We study whether lightweight symbolic reasoning supervision can improve fix type classification in compact automated program repair models. Small code models are attractive for resource-cons...
- Toward Adaptive Grid Resilience: A Gradient-Free Meta-RL Framework for Critical Load Restoration : Abstract: Restoring critical loads after extreme events demands adaptive control to maintain distribution-grid resilience, yet uncertainty in renewable generation, limited dispatchable resources, and ...
- Transient learning dynamics drive escape from sharp valleys in Stochastic Gradient Descent : Abstract: Stochastic gradient descent (SGD) is central to deep learning, yet the dynamical origin of its preference for flatter, more generalizable solutions remains unclear. Here, by analyzing SGD le...
- Multivariate LSTM-Based Forecasting for Renewable Energy: Enhancing Climate Change Mitigation : Abstract: The increasing integration of renewable energy sources (RESs) into modern power systems presents significant opportunities but also notable challenges, primarily due to the inherent variabil...
- HOSL: Hybrid-Order Split Learning for Memory-Constrained Edge Training : Abstract: Split learning (SL) enables collaborative training of large language models (LLMs) between resource-constrained edge devices and compute-rich servers by partitioning model computation across...
- FAConvLSTM: Factorized-Attention ConvLSTM for Efficient Feature Extraction in Multivariate Climate Data : Abstract: Learning physically meaningful spatiotemporal representations from high-resolution multivariate Earth observation data is challenging due to strong local dynamics, long-range teleconnections...
- Realistic Curriculum Reinforcement Learning for Autonomous and Sustainable Marine Vessel Navigation : Abstract: Sustainability is becoming increasingly critical in maritime transport, encompassing both environmental and social impacts, such as Greenhouse Gas (GHG) emissions and navigational safety...
- Action Shapley: A Training Data Selection Metric for World Model in Reinforcement Learning : Abstract: Numerous offline and model-based reinforcement learning systems incorporate world models to emulate the inherent environments. A world model is particularly important in scenarios where dire...
- Unit-Consistent (UC) Adjoint for GSD and Backprop in Deep Learning Applications : Abstract: Deep neural networks constructed from linear maps and positively homogeneous nonlinearities (e.g., ReLU) possess a fundamental gauge symmetry: the network function is invariant to node-wise ...
- Beyond Accuracy: A Stability-Aware Metric for Multi-Horizon Forecasting : Abstract: Traditional time series forecasting methods optimize for accuracy alone. This objective neglects temporal consistency, in other words, how consistently a model predicts the same future event...
- AI-Guided Human-In-the-Loop Inverse Design of High Performance Engineering Structures : Abstract: Inverse design tools such as Topology Optimization (TO) can achieve new levels of improvement for high-performance engineered structures. However, widespread use is hindered by high computat...
- Mugi: Value Level Parallelism For Efficient LLMs : Abstract: Value level parallelism (VLP) has been proposed to improve the efficiency of large-batch, low-precision general matrix multiply (GEMM) between symmetric activations and weights. In transform...
- Towards Tensor Network Models for Low-Latency Jet Tagging on FPGAs : Abstract: We present a systematic study of Tensor Network (TN) models, namely Matrix Product States (MPS) and Tree Tensor Networks (TTN), for real-time jet tagging in high-e...
- Analytic Bijections for Smooth and Interpretable Normalizing Flows : Abstract: A key challenge in designing normalizing flows is finding expressive scalar bijections that remain invertible with tractable Jacobians. Existing approaches face trade-offs: affine transforma...
- Vendor-Aware Industrial Agents: RAG-Enhanced LLMs for Secure On-Premise PLC Code Generation : Abstract: Programmable Logic Controllers are operated by proprietary code dialects; this makes it challenging to train coding assistants. Current LLMs are trained on large code datasets and are capabl...
- Generalizable Domain Adaptation for Sim-and-Real Policy Co-Training : Abstract: Behavior cloning has shown promise for robot manipulation, but real-world demonstrations are costly to acquire at scale. While simulated data offers a scalable alternative, particularly with...
- Policy alone is probably not the solution: A large-scale experiment on how developers struggle to design meaningful end-user explanations : Abstract: Developers play a central role in determining how machine learning systems are explained in practice, yet they are rarely trained to design explanations for non-technical audiences. Despite ...
- Balanced Edge Pruning for Graph Anomaly Detection with Noisy Labels : Abstract: Graph anomaly detection (GAD) is widely applied in many areas, such as financial fraud detection and social spammer detection. Anomalous nodes in the graph not only impact their own communit...
- A Simple Unified Uncertainty-Guided Framework for Offline-to-Online Reinforcement Learning : Abstract: Offline reinforcement learning (RL) provides a promising solution to learning an agent fully relying on a data-driven paradigm. However, constrained by the limited quality of the offline dat...
- Utilizing Class Separation Distance for the Evaluation of Corruption Robustness of Machine Learning Classifiers : Abstract: Robustness is a fundamental pillar of Machine Learning (ML) classifiers, substantially determining their reliability. Methods for assessing classifier robustness are therefore essential. In ...
- Theorem Prover as a Judge for Synthetic Data Generation : Abstract: The demand for synthetic data in mathematical reasoning has increased due to its potential to enhance the mathematical capabilities of large language models (LLMs). However, ensuring the val...
- Feature Propagation on Knowledge Graphs using Cellular Sheaves : Abstract: Many inference tasks on knowledge graphs, including relation prediction, operate on knowledge graph embeddings -- vector representations of the vertices (entities) and edges (relations) that...
- Do explanations generalize across large reasoning models? : Abstract: Large reasoning models (LRMs) produce a textual chain of thought (CoT) in the process of solving a problem, which serves as a potentially powerful tool to understand the problem by surfacing...
- Building Production-Ready Probes For Gemini : Abstract: Frontier language model capabilities are improving rapidly. We thus need stronger mitigations against bad actors misusing increasingly powerful systems. Prior work has shown that activation ...
- MetaboNet: The Largest Publicly Available Consolidated Dataset for Type 1 Diabetes Management : Abstract: Progress in Type 1 Diabetes (T1D) algorithm development is limited by the fragmentation and lack of standardization across existing T1D management datasets. Current datasets differ substanti...
- The Poisoned Apple Effect: Strategic Manipulation of Mediated Markets via Technology Expansion of AI Agents : Abstract: The integration of AI agents into economic markets fundamentally alters the landscape of strategic interaction. We investigate the economic implications of expanding the set of available tec...
- MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models : Abstract: As vision-language models (VLMs) tackle increasingly complex and multimodal tasks, the rapid growth of Key-Value (KV) cache imposes significant memory and computational bottlenecks during in...
- Interactive Narrative Analytics: Bridging Computational Narrative Extraction and Human Sensemaking : Abstract: Information overload and misinformation create significant challenges in extracting meaningful narratives from large news collections. This paper defines the nascent field of Interactive Nar...
- PRISM-CAFO: Prior-conditioned Remote-sensing Infrastructure Segmentation and Mapping for CAFOs : Abstract: Large-scale livestock operations pose significant risks to human health and the environment, while also being vulnerable to threats such as infectious diseases and extreme weather events. As...
- Map2Thought: Explicit 3D Spatial Reasoning via Metric Cognitive Maps : Abstract: We propose Map2Thought, a framework that enables explicit and interpretable spatial reasoning for 3D VLMs. The framework is grounded in two key components: Metric Cognitive Map (Metric-CogMa...
- Hierarchical Orthogonal Residual Spread for Precise Massive Editing in Large Language Models : Abstract: Large language models (LLMs) exhibit exceptional performance across various domains, yet they face critical safety concerns. Model editing has emerged as an effective approach to mitigate th...
- GenDA: Generative Data Assimilation on Complex Urban Areas via Classifier-Free Diffusion Guidance : Abstract: Urban wind flow reconstruction is essential for assessing air quality, heat dispersion, and pedestrian comfort, yet remains challenging when only sparse sensor data are available. We propose...
- Relational Linearity is a Predictor of Hallucinations : Abstract: Hallucination is a central failure mode in large language models (LLMs). We focus on hallucinations of answers to questions like: "Which instrument did Glenn Gould play?", but we ask these q...
- The Great March 100: 100 Detail-oriented Tasks for Evaluating Embodied AI Agents : Abstract: Recently, with the rapid development of robot learning and imitation learning, numerous datasets and methods have emerged. However, these datasets and their task designs often lack systemati...
- Topology-Guaranteed Image Segmentation: Enforcing Connectivity, Genus, and Width Constraints : Abstract: Existing research highlights the crucial role of topological priors in image segmentation, particularly in preserving essential structures such as connectivity and genus. Accurately capturin...
- Wetland mapping from sparse annotations with satellite image time series and temporal-aware segment anything model : Abstract: Accurate wetland mapping is essential for ecosystem monitoring, yet dense pixel-level annotation is prohibitively expensive and practical applications usually rely on sparse point labels, un...
- Evaluating LLM Behavior in Hiring: Implicit Weights, Fairness Across Groups, and Alignment with Human Preferences : Abstract: General-purpose Large Language Models (LLMs) show significant potential in recruitment applications, where decisions require reasoning over unstructured text, balancing multiple criteria, an...
- Institutional AI: Governing LLM Collusion in Multi-Agent Cournot Markets via Public Governance Graphs : Abstract: Multi-agent LLM ensembles can converge on coordinated, socially harmful equilibria. This paper advances an experimental framework for evaluating Institutional AI, our system-level approach t...
- Think-Clip-Sample: Slow-Fast Frame Selection for Video Understanding : Abstract: Recent progress in multi-modal large language models (MLLMs) has significantly advanced video understanding. However, their performance on long-form videos remains limited by computational c...
- FEATHer: Fourier-Efficient Adaptive Temporal Hierarchy Forecaster for Time-Series Forecasting : Abstract: Time-series forecasting is fundamental in industrial domains like manufacturing and smart factories. As systems evolve toward automation, models must operate on edge devices (e.g., PLCs, mic...
- How Much Would a Clinician Edit This Draft? Evaluating LLM Alignment for Patient Message Response Drafting : Abstract: Large language models (LLMs) show promise in drafting responses to patient portal messages, yet their integration into clinical workflows raises various concerns, including whether they woul...
- From SERPs to Sound: How Search Engine Result Pages and AI-generated Podcasts Interact to Influence User Attitudes on Controversial Topics : Abstract: Compared to search engine result pages (SERPs), AI-generated podcasts represent a relatively new and relatively more passive modality of information consumption, delivering narratives in a n...
- X-Distill: Cross-Architecture Vision Distillation for Visuomotor Learning : Abstract: Visuomotor policies often leverage large pre-trained Vision Transformers (ViTs) for their powerful generalization capabilities. However, their significant data requirements present a major c...
- Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation : Abstract: Large Language Models (LLMs) face the "knowledge cutoff" challenge, where their frozen parametric memory prevents direct internalization of new information. While Supervised Fine-Tuning (SFT...
- FactCorrector: A Graph-Inspired Approach to Long-Form Factuality Correction of Large Language Models : Abstract: Large language models (LLMs) are widely used in knowledge-intensive applications but often generate factually incorrect responses. A promising approach to rectify these flaws is correcting L...
- SDFLoRA: Selective Dual-Module LoRA for Federated Fine-tuning with Heterogeneous Clients : Abstract: Federated learning (FL) for large language models (LLMs) has attracted increasing attention as a way to enable privacy-preserving adaptation over distributed data. Parameter-efficient method...
- LoRA as Oracle : Abstract: Backdoored and privacy-leaking deep neural networks pose a serious threat to the deployment of machine learning systems in security-critical settings. Existing defenses for backdoor detectio...
- Epistemic Control and the Normativity of Machine Learning-Based Science : Abstract: The past few years have witnessed an increasing use of machine learning (ML) systems in science. Paul Humphreys has argued that, because of specific characteristics of ML systems, human scie...
- FAQ: Mitigating Quantization Error via Regenerating Calibration Data with Family-Aware Quantization : Abstract: Although post-training quantization (PTQ) provides an efficient numerical compression scheme for deploying large language models (LLMs) on resource-constrained devices, the representativenes...
- SD-RAG: A Prompt-Injection-Resilient Framework for Selective Disclosure in Retrieval-Augmented Generation : Abstract: Retrieval-Augmented Generation (RAG) has attracted significant attention due to its ability to combine the generative capabilities of Large Language Models (LLMs) with knowledge obtained thr...
- Artificial Intelligence and the US Economy: An Accounting Perspective on Investment and Production : Abstract: Artificial intelligence (AI) has moved to the center of policy, market, and academic debates, but its macroeconomic footprint is still only partly understood. This paper provides an overview...
- Clustering High-dimensional Data: Balancing Abstraction and Representation Tutorial at AAAI 2026 : Abstract: How to find a natural grouping of a large real data set? Clustering requires a balance between abstraction and representation. To identify clusters, we need to abstract from superfluous deta...
- Cross-Modal Attention Network with Dual Graph Learning in Multimodal Recommendation : Abstract: Multimedia recommendation systems leverage user-item interactions and multimodal information to capture user preferences, enabling more accurate and personalized recommendations. Despite not...
- Deep GraphRAG: A Balanced Approach to Hierarchical Retrieval and Adaptive Integration : Abstract: Graph-based Retrieval-Augmented Generation (GraphRAG) frameworks face a trade-off between the comprehensiveness of global search and the efficiency of local search. Existing methods are ofte...
- Learning Quadrupedal Locomotion for a Heavy Hydraulic Robot Using an Actuator Model : Abstract: The simulation-to-reality (sim-to-real) transfer of large-scale hydraulic robots presents a significant challenge in robotics because of the inherent slow control response and complex fluid ...
- Context-aware Graph Causality Inference for Few-Shot Molecular Property Prediction : Abstract: Molecular property prediction is becoming one of the major applications of graph learning in Web-based services, e.g., online protein structure prediction and drug discovery. A key challenge...
- Learn Before Represent: Bridging Generative and Contrastive Learning for Domain-Specific LLM Embeddings : Abstract: Large Language Models (LLMs) adapted via contrastive learning excel in general representation learning but struggle in vertical domains like chemistry and law, primarily due to a lack of dom...
- Vision-as-Inverse-Graphics Agent via Interleaved Multimodal Reasoning : Abstract: Vision-as-inverse-graphics, the concept of reconstructing an image as an editable graphics program, is a long-standing goal of computer vision. Yet even strong VLMs aren't able to achieve thi...
- Efficient Multilingual Name Type Classification Using Convolutional Networks : Abstract: We present a convolutional neural network approach for classifying proper names by language and entity type. Our model, Onomas-CNN X, combines parallel convolution branches with depthwise-se...
- Visual Marker Search for Autonomous Drone Landing in Diverse Urban Environments : Abstract: Marker-based landing is widely used in drone delivery and return-to-base systems for its simplicity and reliability. However, most approaches assume idealized landing site visibility and sen...
- ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development : Abstract: The evolution of Large Language Models (LLMs) into autonomous agents has expanded the scope of AI coding from localized code generation to complex, repository-level, and execution-driven pro...
- A3D: Adaptive Affordance Assembly with Dual-Arm Manipulation : Abstract: Furniture assembly is a crucial yet challenging task for robots, requiring precise dual-arm coordination where one arm manipulates parts while the other provides collaborative support and st...
- Bridging Cognitive Neuroscience and Graph Intelligence: Hippocampus-Inspired Multi-View Hypergraph Learning for Web Finance Fraud : Abstract: Online financial services constitute an essential component of contemporary web ecosystems, yet their openness introduces substantial exposure to fraud that harms vulnerable users and weaken...
- Fairness in Healthcare Processes: A Quantitative Analysis of Decision Making in Triage : Abstract: Fairness in automated decision-making has become a critical concern, particularly in high-pressure healthcare scenarios such as emergency triage, where fast and equitable decisions are essen...
- H-AIM: Orchestrating LLMs, PDDL, and Behavior Trees for Hierarchical Multi-Robot Planning : Abstract: In embodied artificial intelligence, enabling heterogeneous robot teams to execute long-horizon tasks from high-level instructions remains a critical challenge. While large language models (...
- Predicting Biased Human Decision-Making with Large Language Models in Conversational Settings : Abstract: We examine whether large language models (LLMs) can predict biased decision-making in conversational settings, and whether their predictions capture not only human cognitive biases but also ...
- Spectral Characterization and Mitigation of Sequential Knowledge Editing Collapse : Abstract: Sequential knowledge editing in large language models often causes catastrophic collapse of the model's general abilities, especially for parameter-modifying methods. Existing approaches mit...
- Your One-Stop Solution for AI-Generated Video Detection : Abstract: Recent advances in generative modeling can create remarkably realistic synthetic videos, making it increasingly difficult for humans to distinguish them from real ones and necessitating reli...
- IDDR-NGP: Incorporating Detectors for Distractor Removal with Instant Neural Radiance Field : Abstract: This paper presents the first unified distractor removal method, named IDDR-NGP, which directly operates on Instant-NGP. The method is able to remove a wide range of distractors in 3D scenes...
- Combating Spurious Correlations in Graph Interpretability via Self-Reflection : Abstract: Interpretable graph learning has recently emerged as a popular research topic in machine learning. The goal is to identify the important nodes and edges of an input graph that are crucial fo...
- Finding the Translation Switch: Discovering and Exploiting the Task-Initiation Features in LLMs : Abstract: Large Language Models (LLMs) frequently exhibit strong translation abilities, even without task-specific fine-tuning. However, the internal mechanisms governing this innate capability remain...
- Contextual Distributionally Robust Optimization with Causal and Continuous Structure: An Interpretable and Tractable Approach : Abstract: In this paper, we introduce a framework for contextual distributionally robust optimization (DRO) that considers the causal and continuous structure of the underlying distribution by develop...
- When Personalization Misleads: Understanding and Mitigating Hallucinations in Personalized LLMs : Abstract: Personalized large language models (LLMs) adapt model behavior to individual users to enhance user satisfaction, yet personalization can inadvertently distort factual reasoning. We show that...
- Steering Language Models Before They Speak: Logit-Level Interventions : Abstract: Steering LLMs is essential for specialized applications such as style-sensitive text rewriting, user-adaptive communication, and toxicity mitigation. Current steering methods, such as prompt...
- Beyond Max Tokens: Stealthy Resource Amplification via Tool Calling Chains in LLM Agents : Abstract: The agent-tool communication loop is a critical attack surface in modern Large Language Model (LLM) agents. Existing Denial-of-Service (DoS) attacks, primarily triggered via user prompts or ...
- Multi-Stage Patient Role-Playing Framework for Realistic Clinical Interactions : Abstract: The simulation of realistic clinical interactions plays a pivotal role in advancing clinical Large Language Models (LLMs) and supporting medical diagnostic education. Existing approaches and...
- PatientVLM Meets DocVLM: Pre-Consultation Dialogue Between Vision-Language Models for Efficient Diagnosis : Abstract: Traditionally, AI research in medical diagnosis has largely centered on image analysis. While this has led to notable advancements, the absence of patient-reported symptoms continues to hind...
- Sparse Data Tree Canopy Segmentation: Fine-Tuning Leading Pretrained Models on Only 150 Images : Abstract: Tree canopy detection from aerial imagery is an important task for environmental monitoring, urban planning, and ecosystem analysis. Simulating real-life data annotation scarcity, the Solafu...
- Selecting Language Models for Social Science: Start Small, Start Open, and Validate : Abstract: Currently, there are thousands of large pretrained language models (LLMs) available to social scientists. How do we select among them? Using validity, reliability, reproducibility, and repli...
- RobuMTL: Enhancing Multi-Task Learning Robustness Against Weather Conditions : Abstract: Robust Multi-Task Learning (MTL) is crucial for autonomous systems operating in real-world environments, where adverse weather conditions can severely degrade model performance and reliabili...
- Self-learned representation-guided latent diffusion model for breast cancer classification in deep ultraviolet whole surface images : Abstract: Breast-Conserving Surgery (BCS) requires precise intraoperative margin assessment to preserve healthy tissue. Deep Ultraviolet Fluorescence Scanning Microscopy (DUV-FSM) offers rapid, high-r...
- Medical SAM3: A Foundation Model for Universal Prompt-Driven Medical Image Segmentation : Abstract: Promptable segmentation foundation models such as SAM3 have demonstrated strong generalization capabilities through interactive and concept-based prompting. However, their direct applicabili...
- Can Vision-Language Models Understand Construction Workers? An Exploratory Study : Abstract: As robotics become increasingly integrated into construction workflows, their ability to interpret and respond to human behavior will be essential for enabling safe and effective collaborati...
- Approximately Optimal Global Planning for Contact-Rich SE(2) Manipulation on a Graph of Reachable Sets : Abstract: If we consider human manipulation, it is clear that contact-rich manipulation (CRM), the ability to use any surface of the manipulator to make contact with objects, can be far more efficient a...
- Towards Reliable ML Feature Engineering via Planning in Constrained-Topology of LLM Agents : Abstract: Recent advances in code generation models have unlocked unprecedented opportunities for automating feature engineering, yet their adoption in real-world ML teams remains constrained by criti...
- Digital Metabolism: Decoupling Logic from Facts via Regenerative Unlearning -- Towards a Pure Neural Logic Core : Abstract: Large language models (LLMs) currently suffer from parameter entanglement, where general reasoning capabilities (logic) and specific factual knowledge (facts) exist in a superposition state ...
- Unified Optimization of Source Weights and Transfer Quantities in Multi-Source Transfer Learning: An Asymptotic Framework : Abstract: Transfer learning plays a vital role in improving model performance in data-scarce scenarios. However, naive uniform transfer from multiple source tasks may result in negative transfer, high...
- LogicLens: Leveraging Semantic Code Graph to explore Multi Repository large systems : Abstract: Understanding large software systems is a challenging task, especially when code is distributed across multiple repositories and microservices. Developers often need to reason not only about...
- Unifying Speech Recognition, Synthesis and Conversion with Autoregressive Transformers : Abstract: Traditional speech systems typically rely on separate, task-specific models for text-to-speech (TTS), automatic speech recognition (ASR), and voice conversion (VC), resulting in fragmented p...
- AnyECG: Evolved ECG Foundation Model for Holistic Health Profiling : Abstract: Background: Artificial intelligence enabled electrocardiography (AI-ECG) has demonstrated the ability to detect diverse pathologies, but most existing models focus on single disease identifi...
- Line-based Event Preprocessing: Towards Low-Energy Neuromorphic Computer Vision : Abstract: Neuromorphic vision has made significant progress in recent years, thanks to the natural match between spiking neural networks and event data in terms of biological inspiration, energy savings, ...
- Neuro-Symbolic Activation Discovery: Transferring Mathematical Structures from Physics to Ecology for Parameter-Efficient Neural Networks : Abstract: Modern neural networks rely on generic activation functions (ReLU, GELU, SiLU) that ignore the mathematical structure inherent in scientific data. We propose Neuro-Symbolic Activation Discov...
- Millimeter-Wave Gesture Recognition in ISAC: Does Reducing Sensing Airtime Hamper Accuracy? : Abstract: Most Integrated Sensing and Communications (ISAC) systems require dividing airtime across their two modes. However, the specific impact of this decision on sensing performance remains unclea...
- DSA-Tokenizer: Disentangled Semantic-Acoustic Tokenization via Flow Matching-based Hierarchical Fusion : Abstract: Speech tokenizers serve as the cornerstone of discrete Speech Large Language Models (Speech LLMs). Existing tokenizers either prioritize semantic encoding, fuse semantic content with acousti...
- EvidFuse: Writing-Time Evidence Learning for Consistent Text-Chart Data Reporting : Abstract: Data-driven reports communicate decision-relevant insights by tightly interleaving narrative text with charts grounded in underlying tables. However, current LLM-based systems typically gene...
- Generative AI Purpose-built for Social and Mental Health: A Real-World Pilot : Abstract: Generative artificial intelligence (GAI) chatbots built for mental health could deliver safe, personalized, and scalable mental health support. We evaluate a foundation model designed for me...
- BoxMind: Closed-loop AI strategy optimization for elite boxing validated in the 2024 Olympics : Abstract: Competitive sports require sophisticated tactical analysis, yet combat disciplines like boxing remain underdeveloped in AI-driven analytics due to the complexity of action dynamics and the l...
- Health Facility Location in Ethiopia: Leveraging LLMs to Integrate Expert Knowledge into Algorithmic Planning : Abstract: Ethiopia's Ministry of Health is upgrading health posts to improve access to essential services, particularly in rural areas. Limited resources, however, require careful prioritization of wh...
- Exploring LLM Features in Predictive Process Monitoring for Small-Scale Event-Logs : Abstract: Predictive Process Monitoring is a branch of process mining that aims to predict the outcome of an ongoing process. Recently, it has leveraged machine- and deep-learning architectures. In this pa...
- Hyperparameter Optimization of Constraint Programming Solvers : Abstract: The performance of constraint programming solvers is highly sensitive to the choice of their hyperparameters. Manually finding the best solver configuration is a difficult, time-consuming ta...
- AstroReason-Bench: Evaluating Unified Agentic Planning across Heterogeneous Space Planning Problems : Abstract: Recent advances in agentic Large Language Models (LLMs) have positioned them as generalist planners capable of reasoning and acting across diverse tasks. However, existing agent benchmarks l...
- XChoice: Explainable Evaluation of AI-Human Alignment in LLM-based Constrained Choice Decision Making : Abstract: We present XChoice, an explainable framework for evaluating AI-human alignment in constrained decision making. Moving beyond outcome agreement such as accuracy and F1 score, XChoice fits a m...
- Beyond Model Scaling: Test-Time Intervention for Efficient Deep Reasoning : Abstract: Large Reasoning Models (LRMs) excel at multi-step reasoning but often suffer from inefficient reasoning processes like overthinking and overshoot, where excessive or misdirected reasoning in...
- Policy-Based Deep Reinforcement Learning Hyperheuristics for Job-Shop Scheduling Problems : Abstract: This paper proposes a policy-based deep reinforcement learning hyper-heuristic framework for solving the Job Shop Scheduling Problem. The hyper-heuristic agent learns to switch scheduling ru...
- TANDEM: Temporal-Aware Neural Detection for Multimodal Hate Speech : Abstract: Social media platforms are increasingly dominated by long-form multimodal content, where harmful narratives are constructed through a complex interplay of audio, visual, and textual cues. Wh...
- Do We Always Need Query-Level Workflows? Rethinking Agentic Workflow Generation for Multi-Agent Systems : Abstract: Multi-Agent Systems (MAS) built on large language models typically solve complex tasks by coordinating multiple agents through workflows. Existing approaches generate workflows either at ta...
- ReCreate: Reasoning and Creating Domain Agents Driven by Experience : Abstract: Large Language Model agents are reshaping the industrial landscape. However, most practical agents remain human-designed because tasks differ widely, making them labor-intensive to build. Th...
- MiCA: A Mobility-Informed Causal Adapter for Lightweight Epidemic Forecasting : Abstract: Accurate forecasting of infectious disease dynamics is critical for public health planning and intervention. Human mobility plays a central role in shaping the spatial spread of epidemics, b...
- AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts : Abstract: Large Language Model (LLM)-based autonomous agents demonstrate multifaceted capabilities to contribute substantially to economic production. However, existing benchmarks remain focused on ...
- BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search : Abstract: RL-based agentic search enables LLMs to solve complex questions via dynamic planning and external search. While this approach significantly enhances accuracy with agent policies optimized vi...
- Efficient Protein Optimization via Structure-aware Hamiltonian Dynamics : Abstract: The ability to engineer optimized protein variants has transformative potential for biotechnology and medicine. Prior sequence-based optimization methods struggle with the high-dimensional c...
- AdaMARP: An Adaptive Multi-Agent Interaction Framework for General Immersive Role-Playing : Abstract: LLM role-playing aims to portray arbitrary characters in interactive narratives, yet existing systems often suffer from limited immersion and adaptability. They typically under-model dynamic...
- What Matters in Data Curation for Multimodal Reasoning? Insights from the DCVLR Challenge : Abstract: We study data curation for multimodal reasoning through the NeurIPS 2025 Data Curation for Vision-Language Reasoning (DCVLR) challenge, which isolates dataset selection by fixing the model a...
- ARC Prize 2025: Technical Report : Abstract: The ARC-AGI benchmark series serves as a critical measure of few-shot generalization on novel tasks, a core aspect of intelligence. The ARC Prize 2025 global competition targeted the newly r...
- Optimisation of complex product innovation processes based on trend models with three-valued logic : Abstract: This paper investigates complex product-innovation processes using models grounded in a set of heuristics. Each heuristic is expressed through simple trends -- increasing, decreasing, or con...
- Explore with Long-term Memory: A Benchmark and Multimodal LLM-based Reinforcement Learning Framework for Embodied Exploration : Abstract: An ideal embodied agent should possess lifelong learning capabilities to handle long-horizon and complex tasks, enabling continuous operation in general environments. This not only requires ...
- CTHA: Constrained Temporal Hierarchical Architecture for Stable Multi-Agent LLM Systems : Abstract: Recently, multi-time-scale agent architectures have extended the ubiquitous single-loop paradigm by introducing temporal hierarchies with distinct cognitive layers. While yielding substantia...
- ORBITFLOW: SLO-Aware Long-Context LLM Serving with Fine-Grained KV Cache Reconfiguration : Abstract: Serving long-context LLMs is challenging because request lengths and batch composition vary during token generation, causing the memory footprint to fluctuate significantly at runtime. Offlo...
- Building AI Agents to Improve Job Referral Requests to Strangers : Abstract: This paper develops AI agents that help job seekers write effective requests for job referrals in a professional online community. The basic workflow consists of an improver agent that rewri...
- Do You Trust Me? Cognitive-Affective Signatures of Trustworthiness in Large Language Models : Abstract: Perceived trustworthiness underpins how users navigate online information, yet it remains unclear whether large language models (LLMs), increasingly embedded in search, recommendation, and co...
- Japanese AI Agent System on Human Papillomavirus Vaccination: System Design : Abstract: Human papillomavirus (HPV) vaccine hesitancy poses significant public health challenges, particularly in Japan where proactive vaccination recommendations were suspended from 2013 to 2021. T...
Research Sources: 262 | Generated: 1/19/2026
