AI Research News Feeds for November 3rd, 2025

AI RESEARCH PAPERS & ACADEMIC SOURCES

Semantic Alignment and Reinforcement for Data-Free Quantization of Vision Transformers : Abstract: Data-free quantization (DFQ) enables model quantization without accessing real data, addressing concerns regarding data security and privacy. With the growing adoption of Vision Transformers...
ANCHOR: Integrating Adversarial Training with Hard-mined Supervised Contrastive Learning for Robust Representation Learning : Abstract: Neural networks have changed the way machines interpret the world. At their core, they learn by following gradients, adjusting their parameters step by step until they identify the most disc...
NAUTILUS: A Large Multimodal Model for Underwater Scene Understanding : Abstract: Underwater exploration offers critical insights into our planet and attracts increasing attention for its broader applications in resource exploration, national security, etc. We study the u...
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning : Abstract: Multimodal reasoning requires iterative coordination between language and vision, yet it remains unclear what constitutes a meaningful interleaved chain of thought. We posit that text and im...
Deep Neural Watermarking for Robust Copyright Protection in 3D Point Clouds : Abstract: The protection of intellectual property has become critical due to the rapid growth of three-dimensional content in digital media. Unlike traditional images or videos, 3D point clouds presen...
MapSAM2: Adapting SAM2 for Automatic Segmentation of Historical Map Images and Time Series : Abstract: Historical maps are unique and valuable archives that document geographic features across different time periods. However, automated analysis of historical map images remains a significant c...
Who Made This? Fake Detection and Source Attribution with Diffusion Features : Abstract: The rapid progress of generative diffusion models has enabled the creation of synthetic images that are increasingly difficult to distinguish from real ones, raising concerns about authentic...
Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model : Abstract: Recently, augmenting Vision-Language-Action models (VLAs) with world modeling has shown promise in improving robotic policy learning. However, it remains challenging to jointly predict next-...
NegoCollab: A Common Representation Negotiation Approach for Heterogeneous Collaborative Perception : Abstract: Collaborative perception improves task performance by expanding the perception range through information sharing among agents. Immutable heterogeneity poses a significant challenge in coll...
Gaussian Combined Distance: A Generic Metric for Object Detection : Abstract: In object detection, a well-defined similarity metric can significantly enhance model performance. Currently, the IoU-based similarity metric is the most commonly preferred choice for detect...
Deep learning denoising unlocks quantitative insights in operando materials microscopy : Abstract: Operando microscopy provides direct insight into the dynamic chemical and physical processes that govern functional materials, yet measurement noise limits the effective resolution and under...
Vision Transformer for Robust Occluded Person Reidentification in Complex Surveillance Scenes : Abstract: Person re-identification (ReID) in surveillance is challenged by occlusion, viewpoint distortion, and poor image quality. Most existing methods rely on complex modules or perform well only o...
Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals : Abstract: Distribution Matching Distillation (DMD) distills score-based generative models into efficient one-step generators, without requiring a one-to-one correspondence with the sampling trajectori...
LifWavNet: Lifting Wavelet-based Network for Non-contact ECG Reconstruction from Radar : Abstract: Non-contact electrocardiogram (ECG) reconstruction from radar signals offers a promising approach for unobtrusive cardiac monitoring. We present LifWavNet, a lifting wavelet network based on...
Audio-Visual Speech Enhancement In Complex Scenarios With Separation And Dereverberation Joint Modeling : Abstract: Audio-visual speech enhancement (AVSE) is a task that uses visual auxiliary information to extract a target speaker's speech from mixed audio. In real-world scenarios, there often exist comp...
Generative diffusion modeling protocols for improving the Kikuchi pattern indexing in electron back-scatter diffraction : Abstract: Electron back-scatter diffraction (EBSD) has traditionally relied upon methods such as the Hough transform and dictionary indexing to interpret diffraction patterns and extract crystallograp...
A fragile zero-watermarking method based on dual quaternion matrix decomposition : Abstract: Medical images play a crucial role in assisting diagnosis, remote consultation, and academic research. However, during the transmission and sharing process, they face serious risks of copyri...
SRAGAN: Saliency Regularized and Attended Generative Adversarial Network for Chinese Ink-wash Painting Style Transfer : Abstract: Recent style transfer problems are still largely dominated by Generative Adversarial Network (GAN) from the perspective of cross-domain image-to-image (I2I) translation, where the pivotal is...
GASP: Gaussian Splatting for Physic-Based Simulations : Abstract: Physics simulation is paramount for modeling and utilizing 3D scenes in various real-world applications. However, integrating with state-of-the-art 3D scene rendering techniques such as Gaus...
EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting : Abstract: Scene reconstruction from casually captured videos has wide applications in real-world scenarios. With recent advancements in differentiable rendering techniques, several methods have attemp...
PROFIT: A Specialized Optimizer for Deep Fine Tuning : Abstract: The fine-tuning of pre-trained models has become ubiquitous in generative AI, computer vision, and robotics. Although much attention has been paid to improving the efficiency of fine-tuning ...
MixedGaussianAvatar: Realistically and Geometrically Accurate Head Avatar via Mixed 2D-3D Gaussians : Abstract: Reconstructing high-fidelity 3D head avatars is crucial in various applications such as virtual reality. The pioneering methods reconstruct realistic head avatars with Neural Radiance Fields...
AMD-Hummingbird: Towards an Efficient Text-to-Video Model : Abstract: Text-to-Video (T2V) generation has attracted significant attention for its ability to synthesize realistic videos from textual descriptions. However, existing models struggle to balance comp...
D$^2$USt3R: Enhancing 3D Reconstruction for Dynamic Scenes : Abstract: In this work, we address the task of 3D reconstruction in dynamic scenes, where object motions frequently degrade the quality of previous 3D pointmap regression methods, such as DUSt3R, that...
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation : Abstract: Recent advances in reinforcement learning (RL) have strengthened the reasoning capabilities of vision-language models (VLMs). However, enhancing policy exploration to better scale test-time ...
Panoramic Out-of-Distribution Segmentation for Autonomous Driving : Abstract: Panoramic imaging enables capturing 360° images with an ultra-wide Field-of-View (FoV) for dense omnidirectional perception, which is critical to applications, such as autonomous drivin...
StateSpaceDiffuser: Bringing Long Context to Diffusion World Models : Abstract: World models have recently gained prominence for action-conditioned visual prediction in complex environments. However, relying on only a few recent observations causes them to lose long-ter...
On the Theory of Conditional Feature Alignment for Unsupervised Domain-Adaptive Counting : Abstract: Object counting models suffer when deployed across domains with differing density variety, since density shifts are inherently task-relevant and violate standard domain adaptation assumption...
LV-UNet: A Lightweight and Vanilla Model for Medical Image Segmentation : Abstract: While large models have achieved significant progress in computer vision, challenges such as optimization complexity, the intricacy of transformer architectures, computational constraints, a...
Augmented Reality-based Guidance with Deformable Registration in Head and Neck Tumor Resection : Abstract: Head and neck squamous cell carcinoma (HNSCC) has one of the highest rates of recurrence cases among solid malignancies. Recurrence rates can be reduced by improving positive margins localiz...
MeisenMeister: A Simple Two Stage Pipeline for Breast Cancer Classification on MRI : Abstract: The ODELIA Breast MRI Challenge 2025 addresses a critical issue in breast cancer screening: improving early detection through more efficient and accurate interpretation of breast MRI scans. ...
Understanding the Implicit User Intention via Reasoning with Large Language Model for Image Editing : Abstract: Existing image editing methods can handle simple editing instructions very well. To deal with complex editing instructions, they often need to jointly fine-tune the large language models (LL...
RzenEmbed: Towards Comprehensive Multimodal Retrieval : Abstract: The rapid advancement of Multimodal Large Language Models (MLLMs) has extended CLIP-based frameworks to produce powerful, universal embeddings for retrieval tasks. However, existing methods ...
A Hybrid Deep Learning and Forensic Approach for Robust Deepfake Detection : Abstract: The rapid evolution of generative adversarial networks (GANs) and diffusion models has made synthetic media increasingly realistic, raising societal concerns around misinformation, identity ...
E-MMDiT: Revisiting Multimodal Diffusion Transformer Design for Fast Image Synthesis under Limited Resources : Abstract: Diffusion models have shown strong capabilities in generating high-quality images from text prompts. However, these models often require large-scale training data and significant computation...
From Pixels to Paths: A Multi-Agent Framework for Editable Scientific Illustration : Abstract: Scientific illustrations demand both high information density and post-editability. However, current generative models have two major limitations: First, image generation models output raste...
A Multi-tiered Human-in-the-loop Approach for Interactive School Mapping Using Earth Observation and Machine Learning : Abstract: This paper presents a multi-tiered human-in-the-loop framework for interactive school mapping designed to improve the accuracy and completeness of educational facility records, particularly ...
Referee: Reference-aware Audiovisual Deepfake Detection : Abstract: Since deepfakes generated by advanced generative models have rapidly posed serious threats, existing audiovisual deepfake detection approaches struggle to generalize to unseen forgeries. We ...
DC4GS: Directional Consistency-Driven Adaptive Density Control for 3D Gaussian Splatting : Abstract: We present a Directional Consistency (DC)-driven Adaptive Density Control (ADC) for 3D Gaussian Splatting (DC4GS). Whereas the conventional ADC bases its primitive splitting on the magnitude...
SYNAPSE-Net: A Unified Framework with Lesion-Aware Hierarchical Gating for Robust Segmentation of Heterogeneous Brain Lesions : Abstract: Automated segmentation of heterogeneous brain lesions from multi-modal MRI remains a critical challenge in clinical neuroimaging. Current deep learning models are typically specialized `poin...
MoME: Mixture of Visual Language Medical Experts for Medical Imaging Segmentation : Abstract: In this study, we propose MoME, a Mixture of Visual Language Medical Experts, for Medical Image Segmentation. MoME adapts the successful Mixture of Experts (MoE) paradigm, widely used in Lar...
Incremental Human-Object Interaction Detection with Invariant Relation Representation Learning : Abstract: In open-world environments, human-object interactions (HOIs) evolve continuously, challenging conventional closed-world HOI detection models. Inspired by humans' ability to progressively acq...
DeblurSDI: Blind Image Deblurring Using Self-diffusion : Abstract: Blind image deconvolution is a challenging ill-posed inverse problem, where both the latent sharp image and the blur kernel are unknown. Traditional methods often rely on handcrafted priors,...
VitalLens 2.0: High-Fidelity rPPG for Heart Rate Variability Estimation from Face Video : Abstract: This report introduces VitalLens 2.0, a new deep learning model for estimating physiological signals from face video. This new model demonstrates a significant leap in accuracy for remote ph...
AD-SAM: Fine-Tuning the Segment Anything Vision Foundation Model for Autonomous Driving Perception : Abstract: This paper presents the Autonomous Driving Segment Anything Model (AD-SAM), a fine-tuned vision foundation model for semantic segmentation in autonomous driving (AD). AD-SAM extends the Segm...
Hierarchical Transformers for Unsupervised 3D Shape Abstraction : Abstract: We introduce HiT, a novel hierarchical neural field representation for 3D shapes that learns general hierarchies in a coarse-to-fine manner across different shape categories in an unsupervis...
WildfireX-SLAM: A Large-scale Low-altitude RGB-D Dataset for Wildfire SLAM and Beyond : Abstract: 3D Gaussian splatting (3DGS) and its subsequent variants have led to remarkable progress in simultaneous localization and mapping (SLAM). While most recent 3DGS-based SLAM works focus on sma...
Improving Cross-view Object Geo-localization: A Dual Attention Approach with Cross-view Interaction and Multi-Scale Spatial Features : Abstract: Cross-view object geo-localization has recently gained attention due to potential applications. Existing methods aim to capture spatial dependencies of query objects between different views ...
HiGS: Hierarchical Generative Scene Framework for Multi-Step Associative Semantic Spatial Composition : Abstract: Three-dimensional scene generation holds significant potential in gaming, film, and virtual reality. However, most existing methods adopt a single-step generation process, making it difficul...
AFM-Net: Advanced Fusing Hierarchical CNN Visual Priors with Global Sequence Modeling for Remote Sensing Image Scene Classification : Abstract: Remote sensing image scene classification remains a challenging task, primarily due to the complex spatial structures and multi-scale characteristics of ground objects. Existing approaches s...
How Close Are We? Limitations and Progress of AI Models in Banff Lesion Scoring : Abstract: The Banff Classification provides the global standard for evaluating renal transplant biopsies, yet its semi-quantitative nature, complex criteria, and inter-observer variability present sig...
M^3Detection: Multi-Frame Multi-Level Feature Fusion for Multi-Modal 3D Object Detection with Camera and 4D Imaging Radar : Abstract: Recent advances in 4D imaging radar have enabled robust perception in adverse weather, while camera sensors provide dense semantic information. Fusing these complementary modalities has ...
DANCER: Dance ANimation via Condition Enhancement and Rendering with diffusion model : Abstract: Recently, diffusion models have shown their impressive ability in visual generation tasks. Besides static images, more and more research attention has been drawn to the generation of reali...
SilhouetteTell: Practical Video Identification Leveraging Blurred Recordings of Video Subtitles : Abstract: Video identification attacks pose a significant privacy threat that can reveal videos that victims watch, which may disclose their hobbies, religious beliefs, political leanings, sexual orie...
SpecAware: A Spectral-Content Aware Foundation Model for Unifying Multi-Sensor Learning in Hyperspectral Remote Sensing Mapping : Abstract: Hyperspectral imaging (HSI) is a vital tool for fine-grained land-use and land-cover (LULC) mapping. However, the inherent heterogeneity of HSI data has long posed a major barrier to develop...
Mask-to-Height: A YOLOv11-Based Architecture for Joint Building Instance Segmentation and Height Classification from Satellite Imagery : Abstract: Accurate building instance segmentation and height classification are critical for urban planning, 3D city modeling, and infrastructure monitoring. This paper presents a detailed analysis of...
MoRE: 3D Visual Geometry Reconstruction Meets Mixture-of-Experts : Abstract: Recent advances in language and vision have demonstrated that scaling up model capacity consistently improves performance across diverse tasks. In 3D visual geometry reconstruction, large-sc...
Object-IR: Leveraging Object Consistency and Mesh Deformation for Self-Supervised Image Retargeting : Abstract: Eliminating geometric distortion in semantically important regions remains an intractable challenge in image retargeting. This paper presents Object-IR, a self-supervised architecture that r...
Fusion of Heterogeneous Pathology Foundation Models for Whole Slide Image Analysis : Abstract: Whole slide image (WSI) analysis has emerged as an increasingly essential technique in computational pathology. Recent advances in the pathological foundation models (FMs) have demonstrated ...
Rethinking Robust Adversarial Concept Erasure in Diffusion Models : Abstract: Concept erasure aims to selectively unlearn undesirable content in diffusion models (DMs) to reduce the risk of sensitive content generation. As a novel paradigm in concept erasure, most ...
Trans-defense: Transformer-based Denoiser for Adversarial Defense with Spatial-Frequency Domain Representation : Abstract: In recent times, deep neural networks (DNNs) have been successfully adopted for various applications. Despite their notable achievements, it has become evident that DNNs are vulnerable to so...
C-LEAD: Contrastive Learning for Enhanced Adversarial Defense : Abstract: Deep neural networks (DNNs) have achieved remarkable success in computer vision tasks such as image classification, segmentation, and object detection. However, they are vulnerable to advers...
Enhancing Spatio-Temporal Zero-shot Action Recognition with Language-driven Description Attributes : Abstract: Vision-Language Models (VLMs) have demonstrated impressive capabilities in zero-shot action recognition by learning to associate video embeddings with class embeddings. However, a significan...
RegionRAG: Region-level Retrieval-Augumented Generation for Visually-Rich Documents : Abstract: Multi-modal Retrieval-Augmented Generation (RAG) has become a critical method for empowering LLMs by leveraging candidate visual documents. However, current methods consider the entire docum...
HyperClick: Advancing Reliable GUI Grounding via Uncertainty Calibration : Abstract: Autonomous Graphical User Interface (GUI) agents rely on accurate GUI grounding, which maps language instructions to on-screen coordinates, to execute user commands. However, current models,...
Versatile and Efficient Medical Image Super-Resolution Via Frequency-Gated Mamba : Abstract: Medical image super-resolution (SR) is essential for enhancing diagnostic accuracy while reducing acquisition cost and scanning time. However, modeling both long-range anatomical structures ...
Overcoming Prompts Pool Confusion via Parameterized Prompt for Incremental Object Detection : Abstract: Recent studies have demonstrated that incorporating trainable prompts into pretrained models enables effective incremental learning. However, the application of prompts in incremental object...
SAGS: Self-Adaptive Alias-Free Gaussian Splatting for Dynamic Surgical Endoscopic Reconstruction : Abstract: Surgical reconstruction of dynamic tissues from endoscopic videos is a crucial technology in robot-assisted surgery. The development of Neural Radiance Fields (NeRFs) has greatly advanced de...
Querying functional and structural niches on spatial transcriptomics data : Abstract: Cells in multicellular organisms coordinate to form functional and structural niches. With spatial transcriptomics enabling gene expression profiling in spatial contexts, it has been reveale...
Supervised Quadratic Feature Analysis: Information Geometry Approach for Dimensionality Reduction : Abstract: Supervised dimensionality reduction maps labeled data into a low-dimensional feature space while preserving class discriminability. A common approach is to maximize a statistical measure of ...
A Regularized Newton Method for Nonconvex Optimization with Global and Local Complexity Guarantees : Abstract: Finding an $\epsilon$-stationary point of a nonconvex function with a Lipschitz continuous Hessian is a central problem in optimization. Regularized Newton methods are a classical tool and h...
Generative Adversarial Networks for High-Dimensional Item Factor Analysis: A Deep Adversarial Learning Algorithm : Abstract: Advances in deep learning and representation learning have transformed item factor analysis (IFA) in the item response theory (IRT) literature by enabling more efficient and accurate paramet...
Kernel Mean Embedding Topology: Weak and Strong Forms for Stochastic Kernels and Implications for Model Learning : Abstract: We introduce a novel topology, called Kernel Mean Embedding Topology, for stochastic kernels, in a weak and strong form. This topology, defined on the spaces of Bochner integrable functions ...
Qini Curve Estimation under Clustered Network Interference : Abstract: Qini curves are a widely used tool for assessing treatment policies under allocation constraints as they visualize the incremental gain of a new treatment policy versus the cost of its imple...
DO-IQS: Dynamics-Aware Offline Inverse Q-Learning for Optimal Stopping with Unknown Gain Functions : Abstract: We consider the Inverse Optimal Stopping (IOS) problem where, based on stopped expert trajectories, one aims to recover the optimal stopping region through the continuation and stopping gain...
Manifold Learning for Hyperspectral Images : Abstract: Traditional feature extraction and projection techniques, such as Principal Component Analysis, struggle to adequately represent X-Ray Transmission (XRT) Multi-Energy (ME) images, limiting t...
The cell as a token: high-dimensional geometry in language models and cell embeddings : Abstract: Single-cell sequencing technology maps cells to a high-dimensional space encoding their internal activity. Recently-proposed virtual cell models extend this concept, enriching cells' represe...
HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving : Abstract: Early-Exit Large Language Models (EE-LLMs) enable high throughput inference by allowing tokens to exit early at intermediate layers. However, their throughput is limited by the computational...
Token Distillation: Attention-aware Input Embeddings For New Tokens : Abstract: Current language models rely on static vocabularies determined at pretraining time, which can lead to decreased performance and increased computational cost for domains underrepresented in t...
AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy : Abstract: Large Language Models (LLMs) are being explored for applications in scientific research, including their capabilities to synthesize literature, answer research questions, generate research i...
Conformal Object Detection by Sequential Risk Control : Abstract: Recent advances in object detectors have led to their adoption for industrial uses. However, their deployment in safety-critical applications is hindered by the inherent lack of reliability ...
Game Theoretic Resilience Recommendation Framework for CyberPhysical Microgrids Using Hypergraph MetaLearning : Abstract: This paper presents a physics-aware cyberphysical resilience framework for radial microgrids under coordinated cyberattacks. The proposed approach models the attacker through a hypergraph ne...
Understanding and Enhancing Mamba-Transformer Hybrids for Memory Recall and Language Modeling : Abstract: Hybrid models that combine state space models (SSMs) with attention mechanisms have shown strong performance by leveraging the efficiency of SSMs and the high recall ability of attention. Ho...
Semantically-Aware LLM Agent to Enhance Privacy in Conversational AI Services : Abstract: With the increasing use of conversational AI systems, there is growing concern over privacy leaks, especially when users share sensitive personal data in interactions with Large Language Mod...
Kad: A Framework for Proxy-based Test-time Alignment with Knapsack Approximation Deferral : Abstract: Several previous works concluded that the largest part of generation capabilities of large language models (LLM) are learned (early) during pre-training. However, LLMs still require further ...
Quantitative Intertextuality from the Digital Humanities Perspective: A Survey : Abstract: The connection between texts is referred to as intertextuality in literary theory, which served as an important theoretical basis in many digital humanities studies. Over the past decade, ad...
Recursive numeral systems are highly regular and easy to process : Abstract: Previous work has argued that recursive numeral systems optimise the trade-off between lexicon size and average morphosyntactic complexity (Denić and Szymanik, 2024). However, showing that ...
VISTA Score: Verification In Sequential Turn-based Assessment : Abstract: Hallucination--defined here as generating statements unsupported or contradicted by available evidence or conversational context--remains a major obstacle to deploying conversational AI syst...
LLM-Centric RAG with Multi-Granular Indexing and Confidence Constraints : Abstract: This paper addresses the issues of insufficient coverage, unstable results, and limited reliability in retrieval-augmented generation under complex knowledge environments, and proposes a con...
Contrastive Knowledge Transfer and Robust Optimization for Secure Alignment of Large Language Models : Abstract: This paper addresses the limitations of large-scale language models in safety alignment and robustness by proposing a fine-tuning method that combines contrastive distillation with noise-rob...
Characterizing Selective Refusal Bias in Large Language Models : Abstract: Safety guardrails in large language models (LLMs) are developed to prevent malicious users from generating toxic content at a large scale. However, these measures can inadvertently introduce ...
Rating Roulette: Self-Inconsistency in LLM-As-A-Judge Frameworks : Abstract: As Natural Language Generation (NLG) continues to be widely adopted, properly assessing it has become quite difficult. Lately, using large language models (LLMs) for evaluating these generat...
Probability Distributions Computed by Hard-Attention Transformers : Abstract: Most expressivity results for transformers treat them as language recognizers (which accept or reject strings), and not as they are used in practice, as language models (which generate strin...
Simple Additions, Substantial Gains: Expanding Scripts, Languages, and Lineage Coverage in URIEL+ : Abstract: The URIEL+ linguistic knowledge base supports multilingual research by encoding languages through geographic, genetic, and typological vectors. However, data sparsity remains prevalent, in t...
Identifying the Periodicity of Information in Natural Language : Abstract: Recent theoretical advances on information density in natural language have raised the following question: To what degree does natural language exhibit a periodicity pattern in its ...
A Unified Representation Underlying the Judgment of Large Language Models : Abstract: A central architectural question for both biological and artificial intelligence is whether judgment relies on specialized modules or a unified, domain-general resource. While the discovery ...
TransAlign: Machine Translation Encoders are Strong Word Aligners, Too : Abstract: In the absence of sizable training data for most world languages and NLP tasks, translation-based strategies such as translate-test -- evaluating on noisy source language data translated fro...
ThoughtProbe: Classifier-Guided LLM Thought Space Exploration via Probing Representations : Abstract: This paper introduces ThoughtProbe, a novel inference time framework that leverages the hidden reasoning features of Large Language Models (LLMs) to improve their reasoning performance. Unli...
From the Rock Floor to the Cloud: A Systematic Survey of State-of-the-Art NLP in Battery Life Cycle : Abstract: We present a comprehensive systematic survey of the application of natural language processing (NLP) along the entire battery life cycle, instead of one stage or method, and introduce a nove...
Awal -- Community-Powered Language Technology for Tamazight : Abstract: This paper presents Awal, a community-powered initiative for developing language technology resources for Tamazight. We provide a comprehensive review of the NLP landscape for Tamazight, exa...
Dynamic Affective Memory Management for Personalized LLM Agents : Abstract: Advances in large language models are making personalized AI agents a new research focus. While current agent systems primarily rely on personalized external memory databases to deliver cust...
Diffuse Thinking: Exploring Diffusion Language Models as Efficient Thought Proposers for Reasoning : Abstract: In recent years, large language models (LLMs) have witnessed remarkable advancements, with the test-time scaling law consistently enhancing the reasoning capabilities. Through systematic eva...
The aftermath of compounds: Investigating Compounds and their Semantic Representations : Abstract: This study investigates how well computational embeddings align with human semantic judgments in the processing of English compound words. We compare static word vectors (GloVe) and contextu...
Effect of Domain Generalization Techniques in Low Resource Systems : Abstract: Machine learning models typically assume that training and test data follow the same distribution, an assumption that often fails in real-world scenarios due to distribution shifts. This iss...
SQLSpace: A Representation Space for Text-to-SQL to Discover and Mitigate Robustness Gaps : Abstract: We introduce SQLSpace, a human-interpretable, generalizable, compact representation for text-to-SQL examples derived with minimal human intervention. We demonstrate the utility of these repr...
Patient-Centered Summarization Framework for AI Clinical Summarization: A Mixed-Methods Design : Abstract: Large Language Models (LLMs) are increasingly demonstrating the potential to reach human-level performance in generating clinical summaries from patient-clinician conversations. However, the...
Multilingual BERT language model for medical tasks: Evaluation on domain-specific adaptation and cross-linguality : Abstract: In multilingual healthcare applications, the availability of domain-specific natural language processing (NLP) tools is limited, especially for low-resource languages. Although multilingual b...
Data-Efficient Domain Adaptation for LLM-based MT using Contrastive Preference Optimization : Abstract: LLMs often require adaptation to domain-specific requirements, a process that can be expensive when relying solely on SFT. We present an empirical study on applying CPO to simulate a post-ed...
MARAG-R1: Beyond Single Retriever via Reinforcement-Learned Multi-Tool Agentic Retrieval : Abstract: Large Language Models (LLMs) excel at reasoning and generation but are inherently limited by static pretraining data, resulting in factual inaccuracies and weak adaptability to new informati...
Culture Cartography: Mapping the Landscape of Cultural Knowledge : Abstract: To serve global users safely and productively, LLMs need culture-specific knowledge that might not be learned during pre-training. How do we find such knowledge that is (1) salient to in-gro...
Evaluating Perspectival Biases in Cross-Modal Retrieval : Abstract: Multimodal retrieval systems are expected to operate in a semantic space, agnostic to the language or cultural origin of the query. In practice, however, retrieval outcomes systematically re...
Semantic Frame Aggregation-based Transformer for Live Video Comment Generation : Abstract: Live commenting on video streams has surged in popularity on platforms like Twitch, enhancing viewer engagement through dynamic interactions. However, automatically generating contextually a...
Can MLLMs Read the Room? A Multimodal Benchmark for Verifying Truthfulness in Multi-Party Social Interactions : Abstract: As AI systems become increasingly integrated into human lives, endowing them with robust social intelligence has emerged as a critical frontier. A key aspect of this intelligence is discerni...
SMOL: Professionally translated parallel data for 115 under-represented languages : Abstract: We open-source SMOL (Set of Maximal Overall Leverage), a suite of training data to unlock machine translation for low-resource languages. SMOL has been translated into 124 (and growing) unde...
FUSE: A Ridge and Random Forest-Based Metric for Evaluating MT in Indigenous Languages : Abstract: This paper presents the winning submission of the RaaVa team to the AmericasNLP 2025 Shared Task 3 on Automatic Evaluation Metrics for Machine Translation (MT) into Indigenous Languages of A...
Minitron-SSM: Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning : Abstract: Hybrid LLM architectures that combine Attention and State Space Models (SSMs) achieve state-of-the-art accuracy and runtime performance. Recent work has demonstrated that applying compressio...
Integrating Video and Text: A Balanced Approach to Multimodal Summary Generation and Evaluation : Abstract: Vision-Language Models (VLMs) often struggle to balance visual and textual information when summarizing complex multimodal inputs, such as entire TV show episodes. In this paper, we propose ...
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models : Abstract: We introduce the Diffusion Chain of Lateral Thought (DCoLT), a reasoning framework for diffusion language models. DCoLT treats each intermediate step in the reverse diffusion process as a la...
VeriFastScore: Speeding up long-form factuality evaluation : Abstract: Metrics like FactScore and VeriScore that evaluate long-form factuality operate by decomposing an input response into atomic claims and then individually verifying each claim. While effectiv...
Mathematics Isn't Culture-Free: Probing Cultural Gaps via Entity and Scenario Perturbations : Abstract: Although mathematics is often considered culturally neutral, the way mathematical problems are presented can carry implicit cultural context. Existing benchmarks like GSM8K are predominantly...
FastLongSpeech: Enhancing Large Speech-Language Models for Efficient Long-Speech Processing : Abstract: The rapid advancement of Large Language Models (LLMs) has spurred significant progress in Large Speech-Language Models (LSLMs), enhancing their capabilities in both speech understanding and ...
RADAR: Benchmarking Language Models on Imperfect Tabular Data : Abstract: Language models (LMs) are increasingly being deployed to perform autonomous data analyses. However, their data awareness -- the ability to recognize, reason over, and appropriately handle da...
PF-DAformer: Proximal Femur Segmentation via Domain Adaptive Transformer for Dual-Center QCT : Abstract: Quantitative computed tomography (QCT) plays a crucial role in assessing bone strength and fracture risk by enabling volumetric analysis of bone density distribution in the proximal femur. H...
Enhancing Sentiment Classification with Machine Learning and Combinatorial Fusion : Abstract: This paper presents a novel approach to sentiment classification using the application of Combinatorial Fusion Analysis (CFA) to integrate an ensemble of diverse machine learning models, ach...
Quantitative Bounds for Length Generalization in Transformers : Abstract: We study the problem of length generalization (LG) in transformers: the ability of a model trained on shorter sequences to maintain performance when evaluated on much longer, previously unse...
Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning : Abstract: Mathematical reasoning is a central challenge for large language models (LLMs), requiring not only correct answers but also faithful reasoning processes. Reinforcement Learning with Verifiab...
MLPerf Automotive : Abstract: We present MLPerf Automotive, the first standardized public benchmark for evaluating Machine Learning systems that are deployed for AI acceleration in automotive systems. Developed through a...
Towards Understanding Self-play for LLM Reasoning : Abstract: Recent advances in large language model (LLM) reasoning, led by reinforcement learning with verifiable rewards (RLVR), have inspired self-play post-training, where models improve by generati...
Functional embeddings enable Aggregation of multi-area SEEG recordings over subjects and sessions : Abstract: Aggregating intracranial recordings across subjects is challenging since electrode count, placement, and covered regions vary widely. Spatial normalization methods like MNI coordinates offer...
Hierarchical Bayesian Model for Gene Deconvolution and Functional Analysis in Human Endometrium Across the Menstrual Cycle : Abstract: Bulk tissue RNA sequencing of heterogeneous samples provides averaged gene expression profiles, obscuring cell type-specific dynamics. To address this, we present a probabilistic hierarchica...
Group-Sensitive Offline Contextual Bandits : Abstract: Offline contextual bandits allow one to learn policies from historical/offline data without requiring online interaction. However, offline policy optimization that maximizes overall expected...
AI Agents in Drug Discovery : Abstract: Artificial intelligence (AI) agents are emerging as transformative tools in drug discovery, with the ability to autonomously reason, act, and learn through complicated research workflows. Bu...
Exploring the Utilities of the Rationales from Large Language Models to Enhance Automated Essay Scoring : Abstract: This study explored the utilities of rationales generated by GPT-4.1 and GPT-5 in automated scoring using Prompt 6 essays from the 2012 Kaggle ASAP data. Essay-based scoring was compared wit...
FairAD: Computationally Efficient Fair Graph Clustering via Algebraic Distance : Abstract: Due to the growing concern about unsavory behaviors of machine learning models toward certain demographic groups, the notion of 'fairness' has recently drawn much attention from the communit...
Relation-Aware Bayesian Optimization of DBMS Configurations Guided by Affinity Scores : Abstract: Database Management Systems (DBMSs) are fundamental for managing large-scale and heterogeneous data, and their performance is critically influenced by configuration parameters. Effective tun...
A Polynomial-time Algorithm for Online Sparse Linear Regression with Improved Regret Bound under Weaker Conditions : Abstract: In this paper, we study the problem of online sparse linear regression (OSLR) where the algorithms are restricted to accessing only $k$ out of $d$ attributes per instance for prediction, whi...
SERFLOW: A Cross-Service Cost Optimization Framework for SLO-Aware Dynamic ML Inference : Abstract: Dynamic offloading of Machine Learning (ML) model partitions across different resource orchestration services, such as Function-as-a-Service (FaaS) and Infrastructure-as-a-Service (IaaS), ca...
MDAS-GNN: Multi-Dimensional Spatiotemporal GNN with Spatial Diffusion for Urban Traffic Risk Forecasting : Abstract: Traffic accidents represent a critical public health challenge, claiming over 1.35 million lives annually worldwide. Traditional accident prediction models treat road segments independently,...
FedSM: Robust Semantics-Guided Feature Mixup for Bias Reduction in Federated Learning with Long-Tail Data : Abstract: Federated Learning (FL) enables collaborative model training across decentralized clients without sharing private data. However, FL suffers from biased global models due to non-IID and long-...
ECVL-ROUTER: Scenario-Aware Routing for Vision-Language Models : Abstract: Vision-Language Models (VLMs) excel in diverse multimodal tasks. However, user requirements vary across scenarios, which can be categorized into fast response, high-quality output, and low e...
ODP-Bench: Benchmarking Out-of-Distribution Performance Prediction : Abstract: Recently, there has been gradually more attention paid to Out-of-Distribution (OOD) performance prediction, whose goal is to predict the performance of trained models on unlabeled OOD test d...
Temporal Cardiovascular Dynamics for Improved PPG-Based Heart Rate Estimation : Abstract: The oscillations of the human heart rate are inherently complex and non-linear -- they are best described by mathematical chaos, and they present a challenge when applied to the practical do...
Binary Anomaly Detection in Streaming IoT Traffic under Concept Drift : Abstract: With the growing volume of Internet of Things (IoT) network traffic, machine learning (ML)-based anomaly detection is more relevant than ever. Traditional batch learning models face challeng...
MedM2T: A MultiModal Framework for Time-Aware Modeling with Electronic Health Record and Electrocardiogram Data : Abstract: The inherent multimodality and heterogeneous temporal structures of medical data pose significant challenges for modeling. We propose MedM2T, a time-aware multimodal framework designed to ad...
Reasoning Models Sometimes Output Illegible Chains of Thought : Abstract: Language models trained via outcome-based reinforcement learning (RL) to reason using chain-of-thought (CoT) have shown remarkable performance. Monitoring such a model's CoT may allow us to ...
MVeLMA: Multimodal Vegetation Loss Modeling Architecture for Predicting Post-fire Vegetation Loss : Abstract: Understanding post-wildfire vegetation loss is critical for developing effective ecological recovery strategies and is often challenging due to the extended time and effort required to captu...
Spectral Neural Graph Sparsification : Abstract: Graphs are central to modeling complex systems in domains such as social networks, molecular chemistry, and neuroscience. While Graph Neural Networks, particularly Graph Convolutional Networ...
Simplex-to-Euclidean Bijections for Categorical Flow Matching : Abstract: We propose a method for learning and sampling from probability distributions supported on the simplex. Our approach maps the open simplex to Euclidean space via smooth bijections, leveraging...
Learning Sparse Approximate Inverse Preconditioners for Conjugate Gradient Solvers on GPUs : Abstract: The conjugate gradient solver (CG) is a prevalent method for solving symmetric and positive definite linear systems Ax=b, where effective preconditioners are crucial for fast convergence. Tr...
Active transfer learning for structural health monitoring : Abstract: Data for training structural health monitoring (SHM) systems are often expensive and/or impractical to obtain, particularly for labelled data. Population-based SHM (PBSHM) aims to address th...
AstuteRAG-FQA: Task-Aware Retrieval-Augmented Generation Framework for Proprietary Data Challenges in Financial Question Answering : Abstract: Retrieval-Augmented Generation (RAG) shows significant promise in knowledge-intensive tasks by improving domain specificity, enhancing temporal relevance, and reducing hallucinations. Howeve...
ORGEval: Graph-Theoretic Evaluation of LLMs in Optimization Modeling : Abstract: Formulating optimization problems for industrial applications demands significant manual effort and domain expertise. While Large Language Models (LLMs) show promise in automating this proce...
Panprediction: Optimal Predictions for Any Downstream Task and Loss : Abstract: Supervised learning is classically formulated as training a model to minimize a fixed loss function over a fixed distribution, or task. However, an emerging paradigm instead views model trai...
Imbalanced Classification through the Lens of Spurious Correlations : Abstract: Class imbalance poses a fundamental challenge in machine learning, frequently leading to unreliable classification performance. While prior methods focus on data- or loss-reweighting schemes...
W-PCA Based Gradient-Free Proxy for Efficient Search of Lightweight Language Models : Abstract: The demand for efficient natural language processing (NLP) systems has led to the development of lightweight language models. Previous work in this area has primarily focused on manual desig...
Diabetes Lifestyle Medicine Treatment Assistance Using Reinforcement Learning : Abstract: Type 2 diabetes prevention and treatment can benefit from personalized lifestyle prescriptions. However, the delivery of personalized lifestyle medicine prescriptions is limited by the short...
A Machine Learning-Based Framework to Shorten the Questionnaire for Assessing Autism Intervention : Abstract: Caregivers of individuals with autism spectrum disorder (ASD) often find the 77-item Autism Treatment Evaluation Checklist (ATEC) burdensome, limiting its use for routine monitoring. This st...
Towards Gaussian processes modelling to study the late effects of radiotherapy in children and young adults with brain tumours : Abstract: Survivors of childhood cancer need lifelong monitoring for side effects from radiotherapy. However, longitudinal data from routine monitoring is often infrequently and irregularly sampled, a...
Toward precision soil health: A regional framework for site-specific management across Missouri : Abstract: Effective soil health management is crucial for sustaining agriculture, adopting ecosystem resilience, and preserving water quality. However, Missouri's diverse landscapes limit the effectiv...
Multi-Representation Attention Framework for Underwater Bioacoustic Denoising and Recognition : Abstract: Automated monitoring of marine mammals in the St. Lawrence Estuary faces extreme challenges: calls span low-frequency moans to ultrasonic clicks, often overlap, and are embedded in variable ...
Are Online Sports Fan Communities Becoming More Offensive? A Quantitative Review of Topics, Trends, and Toxicity of r/PremierLeague : Abstract: Online communities for sports fans have surged in popularity, with Reddit's r/PremierLeague emerging as a focal point for fans of one of the globe's most celebrated sports leagues. This boom...
Domain decomposition architectures and Gauss-Newton training for physics-informed neural networks : Abstract: Approximating the solutions of boundary value problems governed by partial differential equations with neural networks is challenging, largely due to the difficult training process. This dif...
GeoPep: A geometry-aware masked language model for protein-peptide binding site prediction : Abstract: Multimodal approaches that integrate protein structure and sequence have achieved remarkable success in protein-protein interface prediction. However, extending these methods to protein-pept...
Accelerating Radiative Transfer for Planetary Atmospheres by Orders of Magnitude with a Transformer-Based Machine Learning Model : Abstract: Radiative transfer calculations are essential for modeling planetary atmospheres. However, standard methods are computationally demanding and impose accuracy-speed trade-offs. High computati...
Overspecified Mixture Discriminant Analysis: Exponential Convergence, Statistical Guarantees, and Remote Sensing Applications : Abstract: This study explores the classification error of Mixture Discriminant Analysis (MDA) in scenarios where the number of mixture components exceeds those present in the actual data distribution,...
Learning Generalizable Visuomotor Policy through Dynamics-Alignment : Abstract: Behavior cloning methods for robot learning suffer from poor generalization due to limited data support beyond expert demonstrations. Recent approaches leveraging video prediction models hav...
SERVIMON: AI-Driven Predictive Maintenance and Real-Time Monitoring for Astronomical Observatories : Abstract: Objective: ServiMon is designed to offer a scalable and intelligent pipeline for data collection and auditing to monitor distributed astronomical systems such as the ASTRI Mini-Array. The sy...
T3: Test-Time Model Merging in VLMs for Zero-Shot Medical Imaging Analysis : Abstract: In medical imaging, vision-language models face a critical duality: pretrained networks offer broad robustness but lack subtle, modality-specific characteristics, while fine-tuned expert mod...
Traceable Drug Recommendation over Medical Knowledge Graphs : Abstract: Drug recommendation (DR) systems aim to support healthcare professionals in selecting appropriate medications based on patients' medical conditions. State-of-the-art approaches utilize deep ...
When AI Trading Agents Compete: Adverse Selection of Meta-Orders by Reinforcement Learning-Based Market Making : Abstract: We investigate the mechanisms by which medium-frequency trading agents are adversely selected by opportunistic high-frequency traders. We use reinforcement learning (RL) within a Hawkes Limi...
Pairwise and Attribute-Aware Decision Tree-Based Preference Elicitation for Cold-Start Recommendation : Abstract: Recommender systems (RSs) are intelligent filtering methods that suggest items to users based on their inferred preferences, derived from their interaction history on the platform. Collabora...
FPS: Feedforward-based Parameter Selection For Efficient Fine-Tuning : Abstract: Parameter-Efficient Fine-Tuning (PEFT) has emerged as a key strategy for adapting large-scale pre-trained models to downstream tasks, but existing approaches face notable limitations. Additi...
On the Equivalence of Optimal Transport Problem and Action Matching with Optimal Vector Fields : Abstract: Flow Matching (FM) method in generative modeling maps arbitrary probability distributions by constructing an interpolation between them and then learning the vector field that defines ODE fo...
Modality Alignment across Trees on Heterogeneous Hyperbolic Manifolds : Abstract: Modality alignment is critical for vision-language models (VLMs) to effectively integrate information across modalities. However, existing methods extract hierarchical features from text whi...
Interpretable Model-Aware Counterfactual Explanations for Random Forest : Abstract: Despite their enormous predictive power, machine learning models are often unsuitable for applications in regulated industries such as finance, due to their limited capacity to provide expla...
Estimation of aboveground biomass in a tropical dry forest: An intercomparison of airborne, unmanned, and space laser scanning : Abstract: According to the Paris Climate Change Agreement, all nations are required to submit reports on their greenhouse gas emissions and absorption every two years by 2024. Consequently, forests pl...
Minimax-Optimal Two-Sample Test with Sliced Wasserstein : Abstract: We study the problem of nonparametric two-sample testing using the sliced Wasserstein (SW) distance. While prior theoretical and empirical work indicates that the SW distance offers a promis...
pDANSE: Particle-based Data-driven Nonlinear State Estimation from Nonlinear Measurements : Abstract: We consider the problem of designing a data-driven nonlinear state estimation (DANSE) method that uses (noisy) nonlinear measurements of a process whose underlying state transition model (ST...
Asynchronous Risk-Aware Multi-Agent Packet Routing for Ultra-Dense LEO Satellite Networks : Abstract: The rise of ultra-dense LEO constellations creates a complex and asynchronous network environment, driven by their massive scale, dynamic topologies, and significant delays. This unique comp...
BiSparse-AAS: Bilinear Sparse Attention and Adaptive Spans Framework for Scalable and Efficient Text Summarization : Abstract: Transformer-based architectures have advanced text summarization, yet their quadratic complexity limits scalability on long documents. This paper introduces BiSparse-AAS (Bilinear Sparse Att...
Representing Classical Compositions through Implication-Realization Temporal-Gestalt Graphs : Abstract: Understanding the structural and cognitive underpinnings of musical compositions remains a key challenge in music theory and computational musicology. While traditional methods focus on harm...
Optimal Convergence Analysis of DDPM for General Distributions : Abstract: Score-based diffusion models have achieved remarkable empirical success in generating high-quality samples from target data distributions. Among them, the Denoising Diffusion Probabilistic M...
Image Hashing via Cross-View Code Alignment in the Age of Foundation Models : Abstract: Efficient large-scale retrieval requires representations that are both compact and discriminative. Foundation models provide powerful visual and multimodal embeddings, but nearest neighbor s...
Learned Static Function Data Structures : Abstract: We consider the task of constructing a data structure for associating a static set of keys with values, while allowing arbitrary output values for queries involving keys outside the set. Com...
Enhancing software product lines with machine learning components : Abstract: Modern software systems increasingly integrate machine learning (ML) due to its advancements and ability to enhance data-driven decision-making. However, this integration introduces signific...
SpecAttn: Speculating Sparse Attention : Abstract: Large Language Models (LLMs) face significant computational bottlenecks during inference due to the quadratic complexity of self-attention mechanisms, particularly as context lengths increas...
Bayesian Optimization on Networks : Abstract: This paper studies optimization on networks modeled as metric graphs. Motivated by applications where the objective function is expensive to evaluate or only available as a black box, we dev...
Bayesian model selection and misspecification testing in imaging inverse problems only from noisy and partial measurements : Abstract: Modern imaging techniques heavily rely on Bayesian statistical models to address difficult image reconstruction and restoration tasks. This paper addresses the objective evaluation of such m...
On Selecting Few-Shot Examples for LLM-based Code Vulnerability Detection : Abstract: Large language models (LLMs) have demonstrated impressive capabilities for many coding tasks, including summarization, translation, completion, and code generation. However, detecting code v...
Dark-Field X-Ray Imaging Significantly Improves Deep-Learning based Detection of Synthetic Early-Stage Lung Tumors in Preclinical Models : Abstract: Low-dose computed tomography (LDCT) is the current standard for lung cancer screening, yet its adoption and accessibility remain limited. Many regions lack LDCT infrastructure, and even amon...
Hybrid Decentralized Optimization: Leveraging Both First- and Zeroth-Order Optimizers for Faster Convergence : Abstract: Distributed optimization is the standard way of speeding up machine learning training, and most of the research in the area focuses on distributed first-order, gradient-based methods. Yet, t...
Accelerated Rates between Stochastic and Adversarial Online Convex Optimization : Abstract: Stochastic and adversarial data are two widely studied settings in online learning. But many optimization tasks are neither i.i.d. nor fully adversarial, which makes it of fundamental intere...
Decoding Virtual Healthcare Success through Knowledge-Aware and Multimodal Predictive Modeling : Abstract: Online healthcare consultations have transformed how patients seek medical advice, offering convenience while introducing new challenges for ensuring consultation success. Predicting whether...
Scaling Tractable Probabilistic Circuits: A Systems Perspective : Abstract: Probabilistic Circuits (PCs) are a general framework for tractable deep generative models, which support exact and efficient probabilistic inference on their learned distributions. Recent mo...
Reevaluating Theoretical Analysis Methods for Optimization in Deep Learning : Abstract: There is a significant gap between our theoretical understanding of optimization algorithms used in deep learning and their practical performance. Theoretical development usually focuses on ...
Convergence of continuous-time stochastic gradient descent with applications to deep neural networks : Abstract: We study a continuous-time approximation of the stochastic gradient descent process for minimizing the population expected loss in learning problems. The main results establish general suffi...
Swing-by Dynamics in Concept Learning and Compositional Generalization : Abstract: Prior work has shown that text-conditioned diffusion models can learn to identify and manipulate primitive concepts underlying a compositional data-generating process, enabling generalizatio...
DeepOSets: Non-Autoregressive In-Context Learning with Permutation-Invariance Inductive Bias : Abstract: In-context learning (ICL) is the remarkable ability displayed by some machine learning models to learn from examples provided in a user prompt without any model parameter updates. ICL was fi...
AERO: Entropy-Guided Framework for Private LLM Inference : Abstract: Privacy-preserving computation enables language model inference directly on encrypted data yet suffers from prohibitive latency and communication overheads, primarily due to nonlinear functi...
Transformers as Implicit State Estimators: In-Context Learning in Dynamical Systems : Abstract: Predicting the behavior of a dynamical system from noisy observations of its past outputs is a classical problem encountered across engineering and science. For linear systems with Gaussian ...
Model Inversion Attacks: A Survey of Approaches and Countermeasures : Abstract: The success of deep neural networks has driven numerous research studies and applications from Euclidean to non-Euclidean data. However, there are increasing concerns about privacy leakage, ...
Resource-Adaptive Successive Doubling for Hyperparameter Optimization with Large Datasets on High-Performance Computing Systems : Abstract: On High-Performance Computing (HPC) systems, several hyperparameter configurations can be evaluated in parallel to speed up the Hyperparameter Optimization (HPO) process. State-of-the-art HP...
Byzantine Resilient Federated Multi-Task Representation Learning : Abstract: In this paper, we propose BR-MTRL, a Byzantine-resilient multi-task representation learning framework that handles faulty or malicious agents. Our approach leverages representation learning ...
Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models : Abstract: Numerous applications of large language models (LLMs) rely on their ability to perform step-by-step reasoning. However, the reasoning behavior of LLMs remains poorly understood, posing chall...
Benchmarking Ultra-Low-Power $\mu$NPUs : Abstract: Efficient on-device neural network (NN) inference offers predictable latency, improved privacy and reliability, and lower operating costs for vendors than cloud-based inference. This has spa...
Absorb and Converge: Provable Convergence Guarantee for Absorbing Discrete Diffusion Models : Abstract: Discrete state space diffusion models have shown significant advantages in applications involving discrete data, such as text and image generation. It has also been observed that their perfo...
Kernel conditional tests from learning-theoretic bounds : Abstract: We propose a framework for hypothesis testing on conditional probability distributions, which we then use to construct statistical tests of functionals of conditional distributions. These te...
PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design : Abstract: Designing protein-binding proteins with high affinity is critical in biomedical research and biotechnology. Despite recent advancements targeting specific proteins, the ability to create hig...
Geometry-Aware Edge Pooling for Graph Neural Networks : Abstract: Graph Neural Networks (GNNs) have shown significant success for graph-based tasks. Motivated by the prevalence of large datasets in real-world applications, pooling layers are crucial compon...
Graph Semi-Supervised Learning for Point Classification on Data Manifolds : Abstract: We propose a graph semi-supervised learning framework for classification tasks on data manifolds. Motivated by the manifold hypothesis, we model data as points sampled from a low-dimensional...
RTNinja: a generalized machine learning framework for analyzing random telegraph noise signals in nanoelectronic devices : Abstract: Random telegraph noise is a prevalent variability phenomenon in nanoelectronic devices, arising from stochastic carrier exchange at defect sites and critically impacting device reliability a...
Hankel Singular Value Regularization for Highly Compressible State Space Models : Abstract: Deep neural networks using state space models as layers are well suited for long-range sequence tasks but can be challenging to compress after training. We use that regularizing the sum of H...
Towards a Generalizable AI for Materials Discovery: Validation through Immersion Coolant Screening : Abstract: Artificial intelligence (AI) has emerged as a powerful accelerator of materials discovery, yet most existing models remain problem-specific, requiring additional data collection and retraini...
Adversarially robust clustering with optimality guarantees : Abstract: We consider the problem of clustering data points coming from sub-Gaussian mixtures. Existing methods that provably achieve the optimal mislabeling error, such as the Lloyd algorithm, are us...
RObotic MAnipulation Network (ROMAN) -- Hybrid Hierarchical Learning for Solving Complex Sequential Tasks : Abstract: Solving long sequential tasks poses a significant challenge in embodied artificial intelligence. Enabling a robotic system to perform diverse sequential tasks with a broad range of manipulat...
ESTformer: Transformer utilising spatiotemporal dependencies for electroencephalogram super-resolution : Abstract: Towards practical applications of Electroencephalography (EEG), lightweight acquisition devices garner significant attention. However, EEG channel selection methods are commonly data-sensiti...
Agnostic Tomography of Stabilizer Product States : Abstract: We define a quantum learning task called agnostic tomography, where given copies of an arbitrary state $\rho$ and a class of quantum states $\mathcal{C}$, the goal is to output a succinct de...
Data-Driven Stochastic Optimal Control in Reproducing Kernel Hilbert Spaces : Abstract: This paper proposes a fully data-driven approach for optimal control of nonlinear control-affine systems represented by a stochastic diffusion. The focus is on the scenario where both the no...
Face Spoofing Detection using Deep Learning : Abstract: Digital image spoofing has emerged as a significant security threat in biometric authentication systems, particularly those relying on facial recognition. This study evaluates the performanc...
On the Mistaken Assumption of Interchangeable Deep Reinforcement Learning Implementations : Abstract: Deep Reinforcement Learning (DRL) is a paradigm of artificial intelligence where an agent uses a neural network to learn which actions to take in a given environment. DRL has recently gained...
GenSwarm: Scalable Multi-Robot Code-Policy Generation and Deployment via Language Models : Abstract: The development of control policies for multi-robot systems traditionally follows a complex and labor-intensive process, often lacking the flexibility to adapt to dynamic tasks. This has mot...
RaanA: A Fast, Flexible, and Data-Efficient Post-Training Quantization Algorithm : Abstract: Post-training Quantization (PTQ) has become a widely used technique for improving inference efficiency of large language models (LLMs). However, existing PTQ methods generally suffer from cr...
DualOptim: Enhancing Efficacy and Stability in Machine Unlearning with Dual Optimizers : Abstract: Existing machine unlearning (MU) approaches exhibit significant sensitivity to hyperparameters, requiring meticulous tuning that limits practical deployment. In this work, we first empirical...
Fair Play for Individuals, Foul Play for Groups? Auditing Anonymization's Impact on ML Fairness : Abstract: Machine learning (ML) algorithms are heavily based on the availability of training data, which, depending on the domain, often includes sensitive information about data providers. This raise...
Variational Visual Question Answering for Uncertainty-Aware Selective Prediction : Abstract: Despite remarkable progress in recent years, vision language models (VLMs) remain prone to overconfidence and hallucinations on tasks such as Visual Question Answering (VQA) and Visual Reaso...
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training : Abstract: The efficiency of attention is important due to its quadratic time complexity. We enhance the efficiency of attention through two key contributions: First, we leverage the new FP4 Tensor Cor...
GAIA: A Foundation Model for Operational Atmospheric Dynamics : Abstract: We introduce GAIA (Geospatial Artificial Intelligence for Atmospheres), a hybrid self-supervised geospatial foundation model that fuses Masked Autoencoders (MAE) with self-distillation with ...
Rethinking Metrics and Benchmarks of Video Anomaly Detection : Abstract: Video Anomaly Detection (VAD), which aims to detect anomalies that deviate from expectation, has attracted increasing attention in recent years. Existing advancements in VAD primarily focus ...
SC-LoRA: Balancing Efficient Fine-tuning and Knowledge Preservation via Subspace-Constrained LoRA : Abstract: Parameter-Efficient Fine-Tuning (PEFT) methods, particularly Low-Rank Adaptation (LoRA), are indispensable for efficiently customizing Large Language Models (LLMs). However, vanilla LoRA suf...
Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting : Abstract: Time series, typically represented as numerical sequences, can also be transformed into images and texts, offering multi-modal views (MMVs) of the same underlying signal. These MMVs can reve...
Artificial Empathy: AI based Mental Health : Abstract: Many people suffer from mental health problems but not everyone seeks professional help or has access to mental health care. AI chatbots have increasingly become a go-to for individuals who ...
Accelerating Diffusion LLMs via Adaptive Parallel Decoding : Abstract: The generation speed of LLMs is bottlenecked by autoregressive decoding, where tokens are predicted sequentially one by one. Alternatively, diffusion large language models (dLLMs) theoretic...
PoLAR: Polar-Decomposed Low-Rank Adapter Representation : Abstract: We show that low-rank adaptation of large-scale models suffers from a low stable rank that is well below the linear algebraic rank of the subspace, degrading fine-tuning performance. To miti...
UdonCare: Hierarchy Pruning for Unseen Domain Discovery in Predictive Healthcare : Abstract: Healthcare providers often divide patient populations into cohorts based on shared clinical factors, such as medical history, to deliver personalized healthcare services. This idea has also ...
Large Language Models for Combinatorial Optimization of Design Structure Matrix : Abstract: In complex engineering systems, the dependencies among components or development activities are often modeled and analyzed using Design Structure Matrix (DSM). Reorganizing elements within a...
Graph Diffusion that can Insert and Delete : Abstract: Generative models of graphs based on discrete Denoising Diffusion Probabilistic Models (DDPMs) offer a principled approach to molecular generation by systematically removing structural noise...
Layer of Truth: Probing Belief Shifts under Continual Pre-Training Poisoning : Abstract: Large language models (LLMs) continually evolve through pre-training on ever-expanding web data, but this adaptive process also exposes them to subtle forms of misinformation. While prior wo...
SmoothGuard: Defending Multimodal Large Language Models with Noise Perturbation and Clustering Aggregation : Abstract: Multimodal large language models (MLLMs) have achieved impressive performance across diverse tasks by jointly reasoning over textual and visual inputs. Despite their success, these models re...
Integrating Ontologies with Large Language Models for Enhanced Control Systems in Chemical Engineering : Abstract: This work presents an ontology-integrated large language model (LLM) framework for chemical engineering that unites structured domain knowledge with generative reasoning. The proposed pipeli...
Discovering EV Charging Site Archetypes Through Few Shot Forecasting: The First U.S.-Wide Study : Abstract: The decarbonization of transportation relies on the widespread adoption of electric vehicles (EVs), which requires an accurate understanding of charging behavior to ensure cost-effective, gr...
MM-OPERA: Benchmarking Open-ended Association Reasoning for Large Vision-Language Models : Abstract: Large Vision-Language Models (LVLMs) have exhibited remarkable progress. However, deficiencies remain compared to human intelligence, such as hallucination and shallow pattern matching. In t...
Predicting Household Water Consumption Using Satellite and Street View Images in Two Indian Cities : Abstract: Monitoring household water use in rapidly urbanizing regions is hampered by costly, time-intensive enumeration methods and surveys. We investigate whether publicly available imagery-satellit...
HADSF: Aspect Aware Semantic Control for Explainable Recommendation : Abstract: Recent advances in large language models (LLMs) promise more effective information extraction for review-based recommender systems, yet current methods still (i) mine free-form reviews witho...
Gradient Descent as Loss Landscape Navigation: a Normative Framework for Deriving Learning Rules : Abstract: Learning rules -- prescriptions for updating model parameters to improve performance -- are typically assumed rather than derived. Why do some learning rules work better than others, and und...
Mixture-of-Transformers Learn Faster: A Theoretical Study on Classification Problems : Abstract: Mixture-of-Experts (MoE) models improve transformer efficiency but lack a unified theoretical explanation, especially when both feed-forward and attention layers are allowed to specialize. T...
Thought Branches: Interpreting LLM Reasoning Requires Resampling : Abstract: Most work interpreting reasoning models studies only a single chain-of-thought (CoT), yet these models define distributions over many possible CoTs. We argue that studying a single sample is...
FedAdamW: A Communication-Efficient Optimizer with Convergence and Generalization Guarantees for Federated Large Models : Abstract: AdamW has become one of the most effective optimizers for training large-scale models. We have also observed its effectiveness in the context of federated learning (FL). However, directly ap...
InertialAR: Autoregressive 3D Molecule Generation with Inertial Frames : Abstract: Transformer-based autoregressive models have emerged as a unifying paradigm across modalities such as text and images, but their extension to 3D molecule generation remains underexplored. Th...
DP-FedPGN: Finding Global Flat Minima for Differentially Private Federated Learning via Penalizing Gradient Norm : Abstract: To prevent inference attacks in Federated Learning (FL) and reduce the leakage of sensitive information, Client-level Differentially Private Federated Learning (CL-DPFL) is widely used. Howe...
Context-Gated Cross-Modal Perception with Visual Mamba for PET-CT Lung Tumor Segmentation : Abstract: Accurate lung tumor segmentation is vital for improving diagnosis and treatment planning, and effectively combining anatomical and functional information from PET and CT remains a major chal...
Leveraging Generic Time Series Foundation Models for EEG Classification : Abstract: Foundation models for time series are emerging as powerful general-purpose backbones, yet their potential for domain-specific biomedical signals such as electroencephalography (EEG) remains ...
TetraJet-v2: Accurate NVFP4 Training for Large Language Models with Oscillation Suppression and Outlier Control : Abstract: Large Language Models (LLMs) training is prohibitively expensive, driving interest in low-precision fully-quantized training (FQT). While novel 4-bit formats like NVFP4 offer substantial eff...
DialectalArabicMMLU: Benchmarking Dialectal Capabilities in Arabic and Multilingual Language Models : Abstract: We present DialectalArabicMMLU, a new benchmark for evaluating the performance of large language models (LLMs) across Arabic dialects. While recently developed Arabic and multilingual benchm...
EBT-Policy: Energy Unlocks Emergent Physical Reasoning Capabilities : Abstract: Implicit policies parameterized by generative models, such as Diffusion Policy, have become the standard for policy learning and Vision-Language-Action (VLA) models in robotics. However, the...
Sybil-Resistant Service Discovery for Agent Economies : Abstract: x402 enables Hypertext Transfer Protocol (HTTP) services like application programming interfaces (APIs), data feeds, and inference providers to accept cryptocurrency payments for access. As ...
Toward Accurate Long-Horizon Robotic Manipulation: Language-to-Action with Foundation Models via Scene Graphs : Abstract: This paper presents a framework that leverages pre-trained foundation models for robotic manipulation without domain-specific training. The framework integrates off-the-shelf models, combini...
CodeAlignBench: Assessing Code Generation Models on Developer-Preferred Code Adjustments : Abstract: As large language models become increasingly capable of generating code, evaluating their performance remains a complex and evolving challenge. Existing benchmarks primarily focus on functio...
Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum : Abstract: The prevailing video retrieval paradigm is structurally misaligned, as narrow benchmarks incentivize correspondingly limited data and single-task training. Therefore, universal capability is...
Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning : Abstract: Spatial understanding remains a weakness of Large Vision-Language Models (LVLMs). Existing supervised fine-tuning (SFT) and recent reinforcement learning with verifiable rewards (RLVR) pipel...
Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models : Abstract: Open-weight bio-foundation models present a dual-use dilemma. While holding great promise for accelerating scientific research and drug development, they could also enable bad actors to deve...
Sketch-to-Layout: Sketch-Guided Multimodal Layout Generation : Abstract: Graphic layout generation is a growing research area focusing on generating aesthetically pleasing layouts ranging from poster designs to documents. While recent research has explored ways t...
VessShape: Few-shot 2D blood vessel segmentation by leveraging shape priors from synthetic images : Abstract: Semantic segmentation of blood vessels is an important task in medical image analysis, but its progress is often hindered by the scarcity of large annotated datasets and the poor generalizat...
Information-Theoretic Greedy Layer-wise Training for Traffic Sign Recognition : Abstract: Modern deep neural networks (DNNs) are typically trained with a global cross-entropy loss in a supervised end-to-end manner: neurons need to store their outgoing weights; training alternates...
Community Detection on Model Explanation Graphs for Explainable AI : Abstract: Feature-attribution methods (e.g., SHAP, LIME) explain individual predictions but often miss higher-order structure: sets of features that act in concert. We propose Modules of Influence (Mo...
Challenges in Credit Assignment for Multi-Agent Reinforcement Learning in Open Agent Systems : Abstract: In the rapidly evolving field of multi-agent reinforcement learning (MARL), understanding the dynamics of open systems is crucial. Openness in MARL refers to the dynamic nature of agent pop...
PETAR: Localized Findings Generation with Mask-Aware Vision-Language Modeling for PET Automated Reporting : Abstract: Recent advances in vision-language models (VLMs) have enabled impressive multimodal reasoning, yet most medical applications remain limited to 2D imaging. In this work, we extend VLMs to 3D ...
Continuous Autoregressive Language Models : Abstract: The efficiency of large language models (LLMs) is fundamentally limited by their sequential, token-by-token generation process. We argue that overcoming this bottleneck requires a new design...
A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods : Abstract: Automatic Text Summarization (ATS), utilizing Natural Language Processing (NLP) algorithms, aims to create concise and accurate summaries, thereby significantly reducing the human effort req...
Towards Automated Semantic Interpretability in Reinforcement Learning via Vision-Language Models : Abstract: Semantic interpretability in Reinforcement Learning (RL) enables transparency and verifiability of decision-making. Achieving semantic interpretability in reinforcement learning requires (1)...
A Framework for Objective-Driven Dynamical Stochastic Fields : Abstract: Fields offer a versatile approach for describing complex systems composed of interacting and dynamic components. In particular, some of these dynamical and stochastic systems may exhibit goa...
Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning : Abstract: Recent studies have shown that reinforcement learning with verifiable rewards (RLVR) enhances overall accuracy (pass@1) but often fails to improve capability (pass@k) of LLMs in reasoning ta...
Building Trustworthy AI by Addressing its 16+2 Desiderata with Goal-Directed Commonsense Reasoning : Abstract: Current advances in AI and its applicability have highlighted the need to ensure its trustworthiness for legal, ethical, and even commercial reasons. Sub-symbolic machine learning algorithms...
Don't throw the baby out with the bathwater: How and why deep learning for ARC : Abstract: The Abstraction and Reasoning Corpus (ARC-AGI) presents a formidable challenge for AI systems. Despite the typically low performance on ARC, the deep learning paradigm remains the most effec...
NaviAgent: Bilevel Planning on Tool Navigation Graph for Large-Scale Orchestration : Abstract: Large language models (LLMs) have recently demonstrated the ability to act as function call agents by invoking external tools, enabling them to solve tasks beyond their static knowledge. How...
HiRA: A Hierarchical Reasoning Framework for Decoupled Planning and Execution in Deep Search : Abstract: Complex information needs in real-world search scenarios demand deep reasoning and knowledge synthesis across diverse sources, which traditional retrieval-augmented generation (RAG) pipeline...
Red Teaming AI Red Teaming : Abstract: Red teaming has evolved from its origins in military applications to become a widely adopted methodology in cybersecurity and AI. In this paper, we take a critical look at the practice of AI...
Emergent Cognitive Convergence via Implementation: A Structured Loop Reflecting Four Theories of Mind : Abstract: We report a structural convergence among four influential theories of mind: Kahneman's dual-system theory, Friston's predictive processing, Minsky's society of mind, and Clark's extended min...
Understanding the Application of Utility Theory in Robotics and Artificial Intelligence: A Survey : Abstract: As a unifying concept in economics, game theory, and operations research, even in the Robotics and AI field, the utility is used to evaluate the level of individual needs, preferences, and i...
Continual Vision-and-Language Navigation : Abstract: Developing Vision-and-Language Navigation (VLN) agents typically assumes a train-once-deploy-once strategy, which is unrealistic as deployed agents continually encounter novel envir...
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher : Abstract: Information seeking and integration is a complex cognitive task that consumes enormous time and effort. Inspired by the remarkable progress of Large Language Models, recent works attempt to ...
RepoMasterEval: Evaluating Code Completion via Real-World Repositories : Abstract: With the growing reliance on automated code completion tools in software development, the need for comprehensive evaluation benchmarks has become critical. Existing benchmarks focus more on ...
Towards User-Focused Research in Training Data Attribution for Human-Centered Explainable AI : Abstract: Explainable AI (XAI) aims to make AI systems more transparent, yet many practices emphasise mathematical rigour over practical user needs. We propose an alternative to this model-centric app...
SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks : Abstract: Direct alignment algorithms have proven an effective step for aligning language models to human-desired behaviors. Current variants of the Direct Preference Optimization objective have focus...
A Systematic Literature Review of Spatio-Temporal Graph Neural Network Models for Time Series Forecasting and Classification : Abstract: In recent years, spatio-temporal graph neural networks (GNNs) have attracted considerable interest in the field of time series analysis, due to their ability to capture, at once, dependencie...
Representative Social Choice: From Learning Theory to AI Alignment : Abstract: Social choice theory is the study of preference aggregation across a population, used both in mechanism design for human agents and in the democratic alignment of language models. In this st...
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models : Abstract: Mixture of experts (MoE) architectures have become a cornerstone for scaling up and are a key component in most large language models such as GPT-OSS, DeepSeek-V3, Llama-4, and Gemini-2.5. H...
Robust Offline Reinforcement Learning with Linearly Structured f-Divergence Regularization : Abstract: The Robust Regularized Markov Decision Process (RRMDP) is proposed to learn policies robust to dynamics shifts by adding regularization to the transition dynamics in the value function. Exis...
SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents : Abstract: With the integration of large language models (LLMs), embodied agents have strong capabilities to understand and plan complicated natural language instructions. However, a foreseeable issue ...
Multilingual State Space Models for Structured Question Answering in Indic Languages : Abstract: The diversity and complexity of Indic languages present unique challenges for natural language processing (NLP) tasks, particularly in the domain of question answering (QA). To address these ...
On-device Computation of Single-lead ECG Parameters for Real-time Remote Cardiac Health Assessment: A Real-world Validation Study : Abstract: Accurate, continuous out-of-hospital electrocardiogram (ECG) parameter measurement is vital for real-time cardiac health monitoring and telemedicine. On-device computation of single-lead ECG...
Training a Generally Curious Agent : Abstract: Efficient exploration is essential for intelligent systems interacting with their environment, but existing language models often fall short in scenarios that require strategic information g...
Scalable Best-of-N Selection for Large Language Models via Self-Certainty : Abstract: Best-of-N selection is a key technique for improving the reasoning performance of Large Language Models (LLMs) through increased test-time computation. Current state-of-the-art methods often...
Fast Adversarial Training against Sparse Attacks Requires Loss Smoothing : Abstract: This paper studies fast adversarial training against sparse adversarial perturbations bounded by $l_0$ norm. We demonstrate the challenges of employing $1$-step attacks on $l_0$ bounded pert...
(How) Do Language Models Track State? : Abstract: Transformer language models (LMs) exhibit behaviors -- from storytelling to code generation -- that seem to require tracking the unobserved state of an evolving world. How do they do this? W...
A Multi-Stage Framework with Taxonomy-Guided Reasoning for Occupation Classification Using Large Language Models : Abstract: Automatically annotating job data with standardized occupations from taxonomies, known as occupation classification, is crucial for labor market analysis. However, this task is often hindere...
Modelling Emotions in Face-to-Face Setting: The Interplay of Eye-Tracking, Personality, and Temporal Dynamics : Abstract: Accurate emotion recognition is pivotal for nuanced and engaging human-computer interactions, yet remains difficult to achieve, especially in dynamic, conversation-like settings. In this stu...
Reinforcement Learning for Long-Horizon Unordered Tasks: From Boolean to Coupled Reward Machines : Abstract: Reward machines (RMs) inform reinforcement learning agents about the reward structure of the environment. This is particularly advantageous for complex non-Markovian tasks because agents wit...
Discriminative Rule Learning for Outcome-Guided Process Model Discovery : Abstract: Event logs extracted from information systems offer a rich foundation for understanding and improving business processes. In many real-world applications, it is possible to distinguish betwe...
An In-depth Study of LLM Contributions to the Bin Packing Problem : Abstract: Recent studies have suggested that Large Language Models (LLMs) could provide interesting ideas contributing to mathematical discovery. This claim was motivated by reports that LLM-based gen...
ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use : Abstract: Recently, large language models (LLMs) have demonstrated remarkable problem-solving capabilities by autonomously integrating with external tools for collaborative reasoning. However, due to ...
Realistic pedestrian-driver interaction modelling using multi-agent RL with human perceptual-motor constraints : Abstract: Modelling pedestrian-driver interactions is critical for understanding human road user behaviour and developing safe autonomous vehicle systems. Existing approaches often rely on rule-based ...
Dialogue as Discovery: Navigating Human Intent Through Principled Inquiry : Abstract: A fundamental bottleneck in human-AI collaboration is the "intention expression gap," the difficulty for humans to effectively convey complex, high-dimensional thoughts to AI. This challenge...
DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains : Abstract: Large Reasoning Models (LRMs) have demonstrated impressive capabilities but suffer from cognitive inefficiencies like "overthinking" simple problems and "underthinking" complex ones. Whi...
GeoFM: Enhancing Geometric Reasoning of MLLMs via Synthetic Data Generation through Formal Language : Abstract: Multi-modal Large Language Models (MLLMs) have gained significant attention in both academia and industry for their capabilities in handling multi-modal tasks. However, these models face cha...
Mechanics of Learned Reasoning 1: TempoBench, A Benchmark for Interpretable Deconstruction of Reasoning System Performance : Abstract: Large Language Models (LLMs) are increasingly excelling and outpacing human performance on many tasks. However, to improve LLM reasoning, researchers either rely on ad-hoc generated datasets...
SIGMA: Search-Augmented On-Demand Knowledge Integration for Agentic Mathematical Reasoning : Abstract: Solving mathematical reasoning problems requires not only accurate access to relevant knowledge but also careful, multi-step thinking. However, current retrieval-augmented models often rely ...
InnovatorBench: Evaluating Agents' Ability to Conduct Innovative LLM Research : Abstract: AI agents could accelerate scientific discovery by automating hypothesis formation, experiment design, coding, execution, and analysis, yet existing benchmarks probe narrow skills in simplif...
VeriMoA: A Mixture-of-Agents Framework for Spec-to-HDL Generation : Abstract: Automation of Register Transfer Level (RTL) design can help developers meet increasing computational demands. Large Language Models (LLMs) show promise for Hardware Description Language (HDL...
Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning : Abstract: Multimodal large language models (MLLMs) have advanced embodied agents by enabling direct perception, reasoning, and planning task-oriented actions from visual inputs. However, such vision d...
Validity Is What You Need : Abstract: While AI agents have long been discussed and studied in computer science, today's Agentic AI systems are something new. We consider other definitions of Agentic AI and propose a new realist ...
Interaction as Intelligence Part II: Asynchronous Human-Agent Rollout for Long-Horizon Task Training : Abstract: Large Language Model (LLM) agents have recently shown strong potential in domains such as automated coding, deep research, and graphical user interface manipulation. However, training them t...
MolChord: Structure-Sequence Alignment for Protein-Guided Drug Design : Abstract: Structure-based drug design (SBDD), which maps target proteins to candidate molecular ligands, is a fundamental task in drug discovery. Effectively aligning protein structural representation...
A Neural Architecture Search Method using Auxiliary Evaluation Metric based on ResNet Architecture : Abstract: This paper proposes a neural architecture search space using ResNet as a framework, with search objectives including parameters for convolution, pooling, fully connected layers, and connecti...
A Transformer-based Neural Architecture Search Method : Abstract: This paper presents a neural architecture search method based on Transformer architecture, searching cross multihead attention computation ways for different number of encoder and decoder co...
Detecting Prefix Bias in LLM-based Reward Models : Abstract: Reinforcement Learning with Human Feedback (RLHF) has emerged as a key paradigm for task-specific fine-tuning of language models using human preference data. While numerous publicly availabl...
VeriStruct: AI-assisted Automated Verification of Data-Structure Modules in Verus : Abstract: We introduce VeriStruct, a novel framework that extends AI-assisted automated verification from single functions to more complex data structure modules in Verus. VeriStruct employs a planner...
EARS-UDE: Evaluating Auditory Response in Sensory Overload with Universal Differential Equations : Abstract: Auditory sensory overload affects 50-70% of individuals with Autism Spectrum Disorder (ASD), yet existing approaches, such as mechanistic models (Hodgkin Huxley type, Wilson Cowan, excitatio...
Reinforcement Learning for Accelerator Beamline Control: a simulation-based approach : Abstract: Particle accelerators play a pivotal role in advancing scientific research, yet optimizing beamline configurations to maximize particle transmission remains a labor-intensive task requiring ...
Impact of clinical decision support systems (CDSS) on clinical outcomes and healthcare delivery in low- and middle-income countries: protocol for a systematic review and meta-analysis : Abstract: Clinical decision support systems (CDSS) are used to improve clinical and service outcomes, yet evidence from low- and middle-income countries (LMICs) is dispersed. This protocol outlines me...
Systematic Absence of Low-Confidence Nighttime Fire Detections in VIIRS Active Fire Product: Evidence of Undocumented Algorithmic Filtering : Abstract: The Visible Infrared Imaging Radiometer Suite (VIIRS) active fire product is widely used for global fire monitoring, yet its confidence classification scheme exhibits an undocumented systema...
GACA-DiT: Diffusion-based Dance-to-Music Generation with Genre-Adaptive Rhythm and Context-Aware Alignment : Abstract: Dance-to-music (D2M) generation aims to automatically compose music that is rhythmically and temporally aligned with dance movements. Existing methods typically rely on coarse rhythm embeddi...
See the Speaker: Crafting High-Resolution Talking Faces from Speech with Prior Guidance and Region Refinement : Abstract: Unlike existing methods that rely on source images as appearance references and use source speech to generate motion, this work proposes a novel approach that directly extracts information f...
Cross-Corpus Validation of Speech Emotion Recognition in Urdu using Domain-Knowledge Acoustic Features : Abstract: Speech Emotion Recognition (SER) is a key affective computing technology that enables emotionally intelligent artificial intelligence. While SER is challenging in general, it is particularly...
LeMat-Synth: a multi-modal toolbox to curate broad synthesis procedure databases from scientific literature : Abstract: The development of synthesis procedures remains a fundamental challenge in materials discovery, with procedural knowledge scattered across decades of scientific literature in unstructured fo...
R3GAN-based Optimal Strategy for Augmenting Small Medical Dataset : Abstract: Medical image analysis often suffers from data scarcity and class imbalance, limiting the effectiveness of deep learning models in clinical applications. Using human embryo time-lapse imagin...
VISAT: Benchmarking Adversarial and Distribution Shift Robustness in Traffic Sign Recognition with Visual Attributes : Abstract: We present VISAT, a novel open dataset and benchmarking suite for evaluating model robustness in the task of traffic sign recognition with the presence of visual attributes. Built upon the M...
Diffusion-Driven Generation of Minimally Preprocessed Brain MRI : Abstract: The purpose of this study is to present and compare three denoising diffusion probabilistic models (DDPMs) that generate 3D $T_1$-weighted MRI human brain images. Three DDPMs were trained us...
Category-Aware Semantic Caching for Heterogeneous LLM Workloads : Abstract: LLM serving systems process heterogeneous query workloads where different categories exhibit different characteristics. Code queries cluster densely in embedding space while conversational q...
SpotIt: Evaluating Text-to-SQL Evaluation with Formal Verification : Abstract: Community-driven Text-to-SQL evaluation platforms play a pivotal role in tracking the state of the art of Text-to-SQL performance. The reliability of the evaluation process is critical for d...
Accurate Target Privacy Preserving Federated Learning Balancing Fairness and Utility : Abstract: Federated Learning (FL) enables collaborative model training without data sharing, yet participants face a fundamental challenge, e.g., simultaneously ensuring fairness across demographic gr...
CAS-Spec: Cascade Adaptive Self-Speculative Decoding for On-the-Fly Lossless Inference Acceleration of LLMs : Abstract: Speculative decoding has become widely adopted as an effective technique for lossless inference acceleration when deploying large language models (LLMs). While on-the-fly self-speculative ...
Broken-Token: Filtering Obfuscated Prompts by Counting Characters-Per-Token : Abstract: Large Language Models (LLMs) are susceptible to jailbreak attacks where malicious prompts are disguised using ciphers and character-level encodings to bypass safety guardrails. While these g...
Leveraging Foundation Models for Enhancing Robot Perception and Action : Abstract: This thesis investigates how foundation models can be systematically leveraged to enhance robotic capabilities, enabling more effective localization, interaction, and manipulation in unstruc...
Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench : Abstract: Reading measurement instruments is effortless for humans and requires relatively little domain expertise, yet it remains surprisingly challenging for current vision-language models (VLMs) as...
BI-DCGAN: A Theoretically Grounded Bayesian Framework for Efficient and Diverse GANs : Abstract: Generative Adversarial Networks (GANs) are proficient at generating synthetic data but continue to suffer from mode collapse, where the generator produces a narrow range of outputs that fool...
How Similar Are Grokipedia and Wikipedia? A Multi-Dimensional Textual and Structural Comparison : Abstract: The launch of Grokipedia, an AI-generated encyclopedia developed by Elon Musk's xAI, was presented as a response to perceived ideological and structural biases in Wikipedia, aiming to produc...
Heterogeneous Robot Collaboration in Unstructured Environments with Grounded Generative Intelligence : Abstract: Heterogeneous robot teams operating in realistic settings often must accomplish complex missions requiring collaboration and adaptation to information acquired online. Because robot teams fr...
Scale-Aware Curriculum Learning for Data-Efficient Lung Nodule Detection with YOLOv11 : Abstract: Lung nodule detection in chest CT is crucial for early lung cancer diagnosis, yet existing deep learning approaches face challenges when deployed in clinical settings with limited annotated ...
RepV: Safety-Separable Latent Spaces for Scalable Neurosymbolic Plan Verification : Abstract: As AI systems migrate to safety-critical domains, verifying that their actions comply with well-defined rules remains a challenge. Formal methods provide provable guarantees but demand hand-...
Mind the Gaps: Auditing and Reducing Group Inequity in Large-Scale Mobility Prediction : Abstract: Next location prediction underpins a growing number of mobility, retail, and public-health applications, yet its societal impacts remain largely unexplored. In this paper, we audit state-of-...
LLM-based Multi-class Attack Analysis and Mitigation Framework in IoT/IIoT Networks : Abstract: The Internet of Things has expanded rapidly, transforming communication and operations across industries but also increasing the attack surface and security breaches. Artificial Intelligence...
Can machines think efficiently? : Abstract: The Turing Test is no longer adequate for distinguishing human and machine intelligence. With advanced artificial intelligence systems already passing the original Turing Test and contributi...
Using Salient Object Detection to Identify Manipulative Cookie Banners that Circumvent GDPR : Abstract: The main goal of this paper is to study how often cookie banners that comply with the General Data Protection Regulation (GDPR) contain aesthetic manipulation, a design tactic to draw users'...
Frame Semantic Patterns for Identifying Underreporting of Notifiable Events in Healthcare: The Case of Gender-Based Violence : Abstract: We introduce a methodology for the identification of notifiable events in the domain of healthcare. The methodology harnesses semantic frames to define fine-grained patterns and search them ...
Overview of the MEDIQA-OE 2025 Shared Task on Medical Order Extraction from Doctor-Patient Consultations : Abstract: Clinical documentation increasingly uses automatic speech recognition and summarization, yet converting conversations into actionable medical orders for Electronic Health Records remains une...
Fine-Grained Iterative Adversarial Attacks with Limited Computation Budget : Abstract: This work tackles a critical challenge in AI safety research under limited compute: given a fixed computation budget, how can one maximize the strength of iterative adversarial attacks? Coar...
LLMs are Overconfident: Evaluating Confidence Interval Calibration with FermiEval : Abstract: Large language models (LLMs) excel at numerical estimation but struggle to correctly quantify uncertainty. We study how well LLMs construct confidence intervals around their own answers and ...
AIOT based Smart Education System: A Dual Layer Authentication and Context-Aware Tutoring Framework for Learning Environments : Abstract: The AIoT-Based Smart Education System integrates Artificial Intelligence and IoT to address persistent challenges in contemporary classrooms: attendance fraud, lack of personalization, stude...
A Framework for Fair Evaluation of Variance-Aware Bandit Algorithms : Abstract: Multi-armed bandit (MAB) problems serve as a fundamental building block for more complex reinforcement learning algorithms. However, evaluating and comparing MAB algorithms remains challengi...
Jasmine: A Simple, Performant and Scalable JAX-based World Modeling Codebase : Abstract: While world models are increasingly positioned as a pathway to overcoming data scarcity in domains such as robotics, open training infrastructure for world modeling remains nascent. We intro...
A Multi-Modal Neuro-Symbolic Approach for Spatial Reasoning-Based Visual Grounding in Robotics : Abstract: Visual reasoning, particularly spatial reasoning, is a challenging cognitive task that requires understanding object relationships and their interactions within complex environments, especia...
Elastic Architecture Search for Efficient Language Models : Abstract: As large pre-trained language models become increasingly critical to natural language understanding (NLU) tasks, their substantial computational and memory requirements have raised significa...
Dataset Creation and Baseline Models for Sexism Detection in Hausa : Abstract: Sexism reinforces gender inequality and social exclusion by perpetuating stereotypes, bias, and discriminatory norms. Noting how online platforms enable various forms of sexism to thrive, th...
Detecting Data Contamination in LLMs via In-Context Learning : Abstract: We present Contamination Detection via Context (CoDeC), a practical and accurate method to detect and quantify training data contamination in large language models. CoDeC distinguishes betwe...
Consistency Training Helps Stop Sycophancy and Jailbreaks : Abstract: An LLM's factuality and refusal training can be compromised by simple changes to a prompt. Models often adopt user beliefs (sycophancy) or satisfy inappropriate requests which are wrapped wi...
Towards a Measure of Algorithm Similarity : Abstract: Given two algorithms for the same problem, can we determine whether they are meaningfully different? In full generality, the question is uncomputable, and empirically it is muddied by compet...
Adapting Large Language Models to Emerging Cybersecurity using Retrieval Augmented Generation : Abstract: Security applications are increasingly relying on large language models (LLMs) for cyber threat detection; however, their opaque reasoning often limits trust, particularly in decisions that ...
QiNN-QJ: A Quantum-inspired Neural Network with Quantum Jump for Multimodal Sentiment Analysis : Abstract: Quantum theory provides non-classical principles, such as superposition and entanglement, that inspire promising paradigms in machine learning. However, most existing quantum-inspired fusio...
Expressive Range Characterization of Open Text-to-Audio Models : Abstract: Text-to-audio models are a type of generative model that produces audio output in response to a given textual prompt. Although level generators and the properties of the functional content t...
AURA: A Reinforcement Learning Framework for AI-Driven Adaptive Conversational Surveys : Abstract: Conventional online surveys provide limited personalization, often resulting in low engagement and superficial responses. Although AI survey chatbots improve convenience, most are still reac...
ZEBRA: Towards Zero-Shot Cross-Subject Generalization for Universal Brain Visual Decoding : Abstract: Recent advances in neural decoding have enabled the reconstruction of visual experiences from brain activity, positioning fMRI-to-image reconstruction as a promising bridge between neuroscie...
Exploring Landscapes for Better Minima along Valleys : Abstract: Finding lower and better-generalizing minima is crucial for deep learning. However, most existing optimizers stop searching the parameter space once they reach a local minimum. Given the com...
MARIA: A Framework for Marginal Risk Assessment without Ground Truth in AI Systems : Abstract: Before deploying an AI system to replace an existing process, it must be compared with the incumbent to ensure improvement without added risk. Traditional evaluation relies on ground truth f...
Generating Accurate and Detailed Captions for High-Resolution Images : Abstract: Vision-language models (VLMs) often struggle to generate accurate and detailed captions for high-resolution images since they are typically pre-trained on low-resolution inputs (e.g., 224x22...
H2-Cache: A Novel Hierarchical Dual-Stage Cache for High-Performance Acceleration of Generative Diffusion Models : Abstract: Diffusion models have emerged as state-of-the-art in image generation, but their practical deployment is hindered by the significant computational cost of their iterative denoising process. ...
Adaptive Defense against Harmful Fine-Tuning for Large Language Models via Bayesian Data Scheduler : Abstract: Harmful fine-tuning poses critical safety risks to fine-tuning-as-a-service for large language models. Existing defense strategies preemptively build robustness via attack simulation but suf...
FMint-SDE: A Multimodal Foundation Model for Accelerating Numerical Simulation of SDEs via Error Correction : Abstract: Fast and accurate simulation of dynamical systems is a fundamental challenge across scientific and engineering domains. Traditional numerical integrators often face a trade-off between accur...
Dual-level Progressive Hardness-Aware Reweighting for Cross-View Geo-Localization : Abstract: Cross-view geo-localization (CVGL) between drone and satellite imagery remains challenging due to severe viewpoint gaps and the presence of hard negatives, which are visually similar but geo...
Sparse Model Inversion: Efficient Inversion of Vision Transformers for Data-Free Applications : Abstract: Model inversion, which aims to reconstruct the original training data from pre-trained discriminative models, is especially useful when the original training data is unavailable due to priva...
Unvalidated Trust: Cross-Stage Vulnerabilities in Large Language Model Architectures : Abstract: As Large Language Models (LLMs) are increasingly integrated into automated, multi-stage pipelines, risk patterns that arise from unvalidated trust between processing stages become a practica...
Vectorized Online POMDP Planning : Abstract: Planning under partial observability is an essential capability of autonomous robots. The Partially Observable Markov Decision Process (POMDP) provides a powerful framework for planning unde...
MemeArena: Automating Context-Aware Unbiased Evaluation of Harmfulness Understanding for Multimodal Large Language Models : Abstract: The proliferation of memes on social media necessitates the capabilities of multimodal Large Language Models (mLLMs) to effectively understand multimodal harmfulness. Existing evaluation app...
Feature-Function Curvature Analysis: A Geometric Framework for Explaining Differentiable Models : Abstract: Explainable AI (XAI) is critical for building trust in complex machine learning models, yet mainstream attribution methods often provide an incomplete, static picture of a model's final stat...
Multi-Modal Feature Fusion for Spatial Morphology Analysis of Traditional Villages via Hierarchical Graph Neural Networks : Abstract: Village areas hold significant importance in the study of human-land relationships. However, with the advancement of urbanization, the gradual disappearance of spatial characteristics and t...
Privacy-Aware Continual Self-Supervised Learning on Multi-Window Chest Computed Tomography for Domain-Shift Robustness : Abstract: We propose a novel continual self-supervised learning (CSSL) framework for simultaneously learning diverse features from multi-window-obtained chest computed tomography (CT) images and ensur...
Soft Task-Aware Routing of Experts for Equivariant Representation Learning : Abstract: Equivariant representation learning aims to capture variations induced by input transformations in the representation space, whereas invariant representation learning encodes semantic inform...
DRAMA: Unifying Data Retrieval and Analysis for Open-Domain Analytic Queries : Abstract: Manually conducting real-world data analyses is labor-intensive and inefficient. Despite numerous attempts to automate data science workflows, none of the existing paradigms or systems fully...
Vintage Code, Modern Judges: Meta-Validation in Low Data Regimes : Abstract: Application modernization in legacy languages such as COBOL, PL/I, and REXX faces an acute shortage of resources, both in expert availability and in high-quality human evaluation data. While...
Beyond a Million Tokens: Benchmarking and Enhancing Long-Term Memory in LLMs : Abstract: Evaluating the abilities of large language models (LLMs) for tasks that require long-term memory and thus long-context reasoning, for example in conversational settings, is hampered by the e...
Reconstructing Unseen Sentences from Speech-related Biosignals for Open-vocabulary Neural Communication : Abstract: Brain-to-speech (BTS) systems represent a groundbreaking approach to human communication by enabling the direct transformation of neural activity into linguistic expressions. While recent no...
Not All Instances Are Equally Valuable: Towards Influence-Weighted Dataset Distillation : Abstract: Dataset distillation condenses large datasets into synthetic subsets, achieving performance comparable to training on the full dataset while substantially reducing storage and computation co...
Languages are Modalities: Cross-Lingual Alignment via Encoder Injection : Abstract: Instruction-tuned Large Language Models (LLMs) underperform on low resource, non-Latin scripts due to tokenizer fragmentation and weak cross-lingual coupling. We present LLINK (Latent Langua...
Higher-order Linear Attention : Abstract: The quadratic cost of scaled dot-product attention is a central obstacle to scaling autoregressive language models to long contexts. Linear-time attention and State Space Models (SSMs) provi...
MedCalc-Eval and MedCalc-Env: Advancing Medical Calculation Capabilities of Large Language Models : Abstract: As large language models (LLMs) enter the medical domain, most benchmarks evaluate them on question answering or descriptive reasoning, overlooking quantitative reasoning critical to clinica...
Why Do Multilingual Reasoning Gaps Emerge in Reasoning Language Models? : Abstract: Reasoning language models (RLMs) achieve strong performance on complex reasoning tasks, yet they still suffer from a multilingual reasoning gap, performing better in high-resource languages ...
FOCUS: Efficient Keyframe Selection for Long Video Understanding : Abstract: Multimodal large language models (MLLMs) represent images and video frames as visual tokens. Scaling from single images to hour-long videos, however, inflates the token budget far beyond pra...
HiF-DTA: Hierarchical Feature Learning Network for Drug-Target Affinity Prediction : Abstract: Accurate prediction of Drug-Target Affinity (DTA) is crucial for reducing experimental costs and accelerating early screening in computational drug discovery. While sequence-based deep learn...
Can LLMs Help You at Work? A Sandbox for Evaluating LLM Agents in Enterprise Environments : Abstract: Enterprise systems are crucial for enhancing productivity and decision-making among employees and customers. Integrating LLM based systems into enterprise systems enables intelligent automat...
Un-Attributability: Computing Novelty From Retrieval & Semantic Similarity : Abstract: Understanding how language-model outputs relate to the pretraining corpus is central to studying model behavior. Most training data attribution (TDA) methods ask which training examples caus...
CASR-Net: An Image Processing-focused Deep Learning-based Coronary Artery Segmentation and Refinement Network for X-ray Coronary Angiogram : Abstract: Early detection of coronary artery disease (CAD) is critical for reducing mortality and improving patient treatment planning. While angiographic image analysis from X-rays is a common and co...
Generative Semantic Coding for Ultra-Low Bitrate Visual Communication and Analysis : Abstract: We consider the problem of ultra-low bit rate visual communication for remote vision analysis, human interactions and control in challenging scenarios with very low communication bandwidth, ...
Fine-Tuning Open Video Generators for Cinematic Scene Synthesis: A Small-Data Pipeline with LoRA and Wan2.1 I2V : Abstract: We present a practical pipeline for fine-tuning open-source video diffusion transformers to synthesize cinematic scenes for television and film production from small datasets. The proposed t...
Measuring Chain-of-Thought Monitorability Through Faithfulness and Verbosity : Abstract: Chain-of-thought (CoT) outputs let us read a model's step-by-step reasoning. Since any long, serial reasoning process must pass through this textual trace, the quality of the CoT is a direct...
Spiking Neural Networks: The Future of Brain-Inspired Computing : Abstract: Spiking Neural Networks (SNNs) represent the latest generation of neural computation, offering a brain-inspired alternative to conventional Artificial Neural Networks (ANNs). Unlike ANNs, wh...
Balancing Knowledge Updates: Toward Unified Modular Editing in LLMs : Abstract: Knowledge editing has emerged as an efficient approach for updating factual knowledge in large language models (LLMs). It typically locates knowledge storage modules and then modifies their ...
FedMuon: Accelerating Federated Learning with Matrix Orthogonalization : Abstract: The core bottleneck of Federated Learning (FL) lies in the communication rounds. That is, how to achieve more effective local updates is crucial for reducing communication rounds. Existing F...
Atlas-Alignment: Making Interpretability Transferable Across Language Models : Abstract: Interpretability is crucial for building safe, reliable, and controllable language models, yet existing interpretability pipelines remain costly and difficult to scale. Interpreting a new mo...
Who Does Your Algorithm Fail? Investigating Age and Ethnic Bias in the MAMA-MIA Dataset : Abstract: Deep learning models aim to improve diagnostic workflows, but fairness evaluation remains underexplored beyond classification, e.g., in image segmentation. Unaddressed segmentation bias can ...
Learning Soft Robotic Dynamics with Active Exploration : Abstract: Soft robots offer unmatched adaptability and safety in unstructured environments, yet their compliant, high-dimensional, and nonlinear dynamics make modeling for control notoriously difficul...
Mitigating Semantic Collapse in Partially Relevant Video Retrieval : Abstract: Partially Relevant Video Retrieval (PRVR) seeks videos where only part of the content matches a text query. Existing methods treat every annotated text-video pair as a positive and all other...
CoMViT: An Efficient Vision Backbone for Supervised Classification in Medical Imaging : Abstract: Vision Transformers (ViTs) have demonstrated strong potential in medical imaging; however, their high computational demands and tendency to overfit on small datasets limit their applicabilit...
VCORE: Variance-Controlled Optimization-based Reweighting for Chain-of-Thought Supervision : Abstract: Supervised fine-tuning (SFT) on long chain-of-thought (CoT) trajectories has emerged as a crucial technique for enhancing the reasoning abilities of large language models (LLMs). However, th...
CATArena: Evaluation of LLM Agents through Iterative Tournament Competitions : Abstract: Large Language Model (LLM) agents have evolved from basic text generation to autonomously completing complex tasks through interaction with external tools. However, current benchmarks mainly...
Inverse Knowledge Search over Verifiable Reasoning: Synthesizing a Scientific Encyclopedia from a Long Chains-of-Thought Knowledge Base : Abstract: Most scientific materials compress reasoning, presenting conclusions while omitting the derivational chains that justify them. This compression hinders verification by lacking explicit, step...
The Denario project: Deep knowledge AI agents for scientific discovery : Abstract: We present Denario, an AI multi-agent system designed to serve as a scientific research assistant. Denario can perform many different tasks, such as generating ideas, checking the literature...
Cognition Envelopes for Bounded AI Reasoning in Autonomous UAS Operations : Abstract: Cyber-physical systems increasingly rely on Foundational Models such as Large Language Models (LLMs) and Vision-Language Models (VLMs) to increase autonomy through enhanced perception, infer...
SUSTAINABLE Platform: Seamless Smart Farming Integration Towards Agronomy Automation : Abstract: The global agricultural sector is undergoing a transformative shift, driven by increasing food demands, climate variability and the need for sustainable practices. SUSTAINABLE is a smart far...
Causal Masking on Spatial Data: An Information-Theoretic Case for Learning Spatial Datasets with Unimodal Language Models : Abstract: Language models are traditionally designed around causal masking. In domains with spatial or relational structure, causal masking is often viewed as inappropriate, and sequential linearizati...
e1: Learning Adaptive Control of Reasoning Effort : Abstract: Increasing the thinking budget of AI models can significantly improve accuracy, but not all questions warrant the same amount of reasoning. Users may prefer to allocate different amounts of ...
Adaptive Data Flywheel: Applying MAPE Control Loops to AI Agent Improvement : Abstract: Enterprise AI agents must continuously adapt to maintain accuracy, reduce latency, and remain aligned with user needs. We present a practical implementation of a data flywheel in NVInfo AI, ...
CombiGraph-Vis: A Curated Multimodal Olympiad Benchmark for Discrete Mathematical Reasoning : Abstract: State-of-the-art (SOTA) LLMs have progressed from struggling on proof-based Olympiad problems to solving most of the IMO 2025 problems, with leading systems reportedly handling 5 of 6 proble...
Glia: A Human-Inspired AI for Automated Systems Design and Optimization : Abstract: Can an AI autonomously design mechanisms for computer systems on par with the creativity and reasoning of human experts? We present Glia, an AI architecture for networked systems design that...
From product to system network challenges in system of systems lifecycle management : Abstract: Today, products are no longer isolated artifacts, but nodes in networked systems. This means that traditional, linearly conceived life cycle models are reaching their limits: Interoperabilit...
Fints: Efficient Inference-Time Personalization for LLMs with Fine-Grained Instance-Tailored Steering : Abstract: The rapid evolution of large language models (LLMs) has intensified the demand for effective personalization techniques that can adapt model behavior to individual user preferences. Despite ...
GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation : Abstract: While Multimodal Large Language Models (MLLMs) have advanced GUI navigation agents, current approaches face limitations in cross-domain generalization and effective history utilization. We p...

Research Sources: 415 | Generated: 11/3/2025