AI RESEARCH PAPERS & ACADEMIC SOURCES
- Semantic Alignment and Reinforcement for Data-Free Quantization of Vision Transformers : Abstract: Data-free quantization (DFQ) enables model quantization without accessing real data, addressing concerns regarding data security and privacy. With the growing adoption of Vision Transformers...
- ANCHOR: Integrating Adversarial Training with Hard-mined Supervised Contrastive Learning for Robust Representation Learning : Abstract: Neural networks have changed the way machines interpret the world. At their core, they learn by following gradients, adjusting their parameters step by step until they identify the most disc...
- NAUTILUS: A Large Multimodal Model for Underwater Scene Understanding : Abstract: Underwater exploration offers critical insights into our planet and attracts increasing attention for its broader applications in resource exploration, national security, etc. We study the u...
- ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning : Abstract: Multimodal reasoning requires iterative coordination between language and vision, yet it remains unclear what constitutes a meaningful interleaved chain of thought. We posit that text and im...
- Deep Neural Watermarking for Robust Copyright Protection in 3D Point Clouds : Abstract: The protection of intellectual property has become critical due to the rapid growth of three-dimensional content in digital media. Unlike traditional images or videos, 3D point clouds presen...
- MapSAM2: Adapting SAM2 for Automatic Segmentation of Historical Map Images and Time Series : Abstract: Historical maps are unique and valuable archives that document geographic features across different time periods. However, automated analysis of historical map images remains a significant c...
- Who Made This? Fake Detection and Source Attribution with Diffusion Features : Abstract: The rapid progress of generative diffusion models has enabled the creation of synthetic images that are increasingly difficult to distinguish from real ones, raising concerns about authentic...
- Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model : Abstract: Recently, augmenting Vision-Language-Action models (VLAs) with world modeling has shown promise in improving robotic policy learning. However, it remains challenging to jointly predict next-...
- NegoCollab: A Common Representation Negotiation Approach for Heterogeneous Collaborative Perception : Abstract: Collaborative perception improves task performance by expanding the perception range through information sharing among agents. Immutable heterogeneity poses a significant challenge in coll...
- Gaussian Combined Distance: A Generic Metric for Object Detection : Abstract: In object detection, a well-defined similarity metric can significantly enhance model performance. Currently, the IoU-based similarity metric is the most commonly preferred choice for detect...
- Deep learning denoising unlocks quantitative insights in operando materials microscopy : Abstract: Operando microscopy provides direct insight into the dynamic chemical and physical processes that govern functional materials, yet measurement noise limits the effective resolution and under...
- Vision Transformer for Robust Occluded Person Reidentification in Complex Surveillance Scenes : Abstract: Person re-identification (ReID) in surveillance is challenged by occlusion, viewpoint distortion, and poor image quality. Most existing methods rely on complex modules or perform well only o...
- Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals : Abstract: Distribution Matching Distillation (DMD) distills score-based generative models into efficient one-step generators, without requiring a one-to-one correspondence with the sampling trajectori...
- LifWavNet: Lifting Wavelet-based Network for Non-contact ECG Reconstruction from Radar : Abstract: Non-contact electrocardiogram (ECG) reconstruction from radar signals offers a promising approach for unobtrusive cardiac monitoring. We present LifWavNet, a lifting wavelet network based on...
- Audio-Visual Speech Enhancement In Complex Scenarios With Separation And Dereverberation Joint Modeling : Abstract: Audio-visual speech enhancement (AVSE) is a task that uses visual auxiliary information to extract a target speaker's speech from mixed audio. In real-world scenarios, there often exist comp...
- Generative diffusion modeling protocols for improving the Kikuchi pattern indexing in electron back-scatter diffraction : Abstract: Electron back-scatter diffraction (EBSD) has traditionally relied upon methods such as the Hough transform and dictionary indexing to interpret diffraction patterns and extract crystallograp...
- A fragile zero-watermarking method based on dual quaternion matrix decomposition : Abstract: Medical images play a crucial role in assisting diagnosis, remote consultation, and academic research. However, during the transmission and sharing process, they face serious risks of copyri...
- SRAGAN: Saliency Regularized and Attended Generative Adversarial Network for Chinese Ink-wash Painting Style Transfer : Abstract: Recent style transfer problems are still largely dominated by Generative Adversarial Network (GAN) from the perspective of cross-domain image-to-image (I2I) translation, where the pivotal is...
- GASP: Gaussian Splatting for Physic-Based Simulations : Abstract: Physics simulation is paramount for modeling and utilizing 3D scenes in various real-world applications. However, integrating with state-of-the-art 3D scene rendering techniques such as Gaus...
- EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting : Abstract: Scene reconstruction from casually captured videos has wide applications in real-world scenarios. With recent advancements in differentiable rendering techniques, several methods have attemp...
- PROFIT: A Specialized Optimizer for Deep Fine Tuning : Abstract: The fine-tuning of pre-trained models has become ubiquitous in generative AI, computer vision, and robotics. Although much attention has been paid to improving the efficiency of fine-tuning ...
- MixedGaussianAvatar: Realistically and Geometrically Accurate Head Avatar via Mixed 2D-3D Gaussians : Abstract: Reconstructing high-fidelity 3D head avatars is crucial in various applications such as virtual reality. The pioneering methods reconstruct realistic head avatars with Neural Radiance Fields...
- AMD-Hummingbird: Towards an Efficient Text-to-Video Model : Abstract: Text-to-Video (T2V) generation has attracted significant attention for its ability to synthesize realistic videos from textual descriptions. However, existing models struggle to balance comp...
- D$^2$USt3R: Enhancing 3D Reconstruction for Dynamic Scenes : Abstract: In this work, we address the task of 3D reconstruction in dynamic scenes, where object motions frequently degrade the quality of previous 3D pointmap regression methods, such as DUSt3R, that...
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation : Abstract: Recent advances in reinforcement learning (RL) have strengthened the reasoning capabilities of vision-language models (VLMs). However, enhancing policy exploration to better scale test-time ...
- Panoramic Out-of-Distribution Segmentation for Autonomous Driving : Abstract: Panoramic imaging enables capturing 360° images with an ultra-wide Field-of-View (FoV) for dense omnidirectional perception, which is critical to applications, such as autonomous drivin...
- StateSpaceDiffuser: Bringing Long Context to Diffusion World Models : Abstract: World models have recently gained prominence for action-conditioned visual prediction in complex environments. However, relying on only a few recent observations causes them to lose long-ter...
- On the Theory of Conditional Feature Alignment for Unsupervised Domain-Adaptive Counting : Abstract: Object counting models suffer when deployed across domains with differing density variety, since density shifts are inherently task-relevant and violate standard domain adaptation assumption...
- LV-UNet: A Lightweight and Vanilla Model for Medical Image Segmentation : Abstract: While large models have achieved significant progress in computer vision, challenges such as optimization complexity, the intricacy of transformer architectures, computational constraints, a...
- Augmented Reality-based Guidance with Deformable Registration in Head and Neck Tumor Resection : Abstract: Head and neck squamous cell carcinoma (HNSCC) has one of the highest rates of recurrence cases among solid malignancies. Recurrence rates can be reduced by improving positive margins localiz...
- MeisenMeister: A Simple Two Stage Pipeline for Breast Cancer Classification on MRI : Abstract: The ODELIA Breast MRI Challenge 2025 addresses a critical issue in breast cancer screening: improving early detection through more efficient and accurate interpretation of breast MRI scans. ...
- Understanding the Implicit User Intention via Reasoning with Large Language Model for Image Editing : Abstract: Existing image editing methods can handle simple editing instructions very well. To deal with complex editing instructions, they often need to jointly fine-tune the large language models (LL...
- RzenEmbed: Towards Comprehensive Multimodal Retrieval : Abstract: The rapid advancement of Multimodal Large Language Models (MLLMs) has extended CLIP-based frameworks to produce powerful, universal embeddings for retrieval tasks. However, existing methods ...
- A Hybrid Deep Learning and Forensic Approach for Robust Deepfake Detection : Abstract: The rapid evolution of generative adversarial networks (GANs) and diffusion models has made synthetic media increasingly realistic, raising societal concerns around misinformation, identity ...
- E-MMDiT: Revisiting Multimodal Diffusion Transformer Design for Fast Image Synthesis under Limited Resources : Abstract: Diffusion models have shown strong capabilities in generating high-quality images from text prompts. However, these models often require large-scale training data and significant computation...
- From Pixels to Paths: A Multi-Agent Framework for Editable Scientific Illustration : Abstract: Scientific illustrations demand both high information density and post-editability. However, current generative models have two major limitations: First, image generation models output raste...
- A Multi-tiered Human-in-the-loop Approach for Interactive School Mapping Using Earth Observation and Machine Learning : Abstract: This paper presents a multi-tiered human-in-the-loop framework for interactive school mapping designed to improve the accuracy and completeness of educational facility records, particularly ...
- Referee: Reference-aware Audiovisual Deepfake Detection : Abstract: Since deepfakes generated by advanced generative models have rapidly posed serious threats, existing audiovisual deepfake detection approaches struggle to generalize to unseen forgeries. We ...
- DC4GS: Directional Consistency-Driven Adaptive Density Control for 3D Gaussian Splatting : Abstract: We present a Directional Consistency (DC)-driven Adaptive Density Control (ADC) for 3D Gaussian Splatting (DC4GS). Whereas the conventional ADC bases its primitive splitting on the magnitude...
- SYNAPSE-Net: A Unified Framework with Lesion-Aware Hierarchical Gating for Robust Segmentation of Heterogeneous Brain Lesions : Abstract: Automated segmentation of heterogeneous brain lesions from multi-modal MRI remains a critical challenge in clinical neuroimaging. Current deep learning models are typically specialized `poin...
- MoME: Mixture of Visual Language Medical Experts for Medical Imaging Segmentation : Abstract: In this study, we propose MoME, a Mixture of Visual Language Medical Experts, for Medical Image Segmentation. MoME adapts the successful Mixture of Experts (MoE) paradigm, widely used in Lar...
- Incremental Human-Object Interaction Detection with Invariant Relation Representation Learning : Abstract: In open-world environments, human-object interactions (HOIs) evolve continuously, challenging conventional closed-world HOI detection models. Inspired by humans' ability to progressively acq...
- DeblurSDI: Blind Image Deblurring Using Self-diffusion : Abstract: Blind image deconvolution is a challenging ill-posed inverse problem, where both the latent sharp image and the blur kernel are unknown. Traditional methods often rely on handcrafted priors,...
- VitalLens 2.0: High-Fidelity rPPG for Heart Rate Variability Estimation from Face Video : Abstract: This report introduces VitalLens 2.0, a new deep learning model for estimating physiological signals from face video. This new model demonstrates a significant leap in accuracy for remote ph...
- AD-SAM: Fine-Tuning the Segment Anything Vision Foundation Model for Autonomous Driving Perception : Abstract: This paper presents the Autonomous Driving Segment Anything Model (AD-SAM), a fine-tuned vision foundation model for semantic segmentation in autonomous driving (AD). AD-SAM extends the Segm...
- Hierarchical Transformers for Unsupervised 3D Shape Abstraction : Abstract: We introduce HiT, a novel hierarchical neural field representation for 3D shapes that learns general hierarchies in a coarse-to-fine manner across different shape categories in an unsupervis...
- WildfireX-SLAM: A Large-scale Low-altitude RGB-D Dataset for Wildfire SLAM and Beyond : Abstract: 3D Gaussian splatting (3DGS) and its subsequent variants have led to remarkable progress in simultaneous localization and mapping (SLAM). While most recent 3DGS-based SLAM works focus on sma...
- Improving Cross-view Object Geo-localization: A Dual Attention Approach with Cross-view Interaction and Multi-Scale Spatial Features : Abstract: Cross-view object geo-localization has recently gained attention due to potential applications. Existing methods aim to capture spatial dependencies of query objects between different views ...
- HiGS: Hierarchical Generative Scene Framework for Multi-Step Associative Semantic Spatial Composition : Abstract: Three-dimensional scene generation holds significant potential in gaming, film, and virtual reality. However, most existing methods adopt a single-step generation process, making it difficul...
- AFM-Net: Advanced Fusing Hierarchical CNN Visual Priors with Global Sequence Modeling for Remote Sensing Image Scene Classification : Abstract: Remote sensing image scene classification remains a challenging task, primarily due to the complex spatial structures and multi-scale characteristics of ground objects. Existing approaches s...
- How Close Are We? Limitations and Progress of AI Models in Banff Lesion Scoring : Abstract: The Banff Classification provides the global standard for evaluating renal transplant biopsies, yet its semi-quantitative nature, complex criteria, and inter-observer variability present sig...
- M^3Detection: Multi-Frame Multi-Level Feature Fusion for Multi-Modal 3D Object Detection with Camera and 4D Imaging Radar : Abstract: Recent advances in 4D imaging radar have enabled robust perception in adverse weather, while camera sensors provide dense semantic information. Fusing these complementary modalities has ...
- DANCER: Dance ANimation via Condition Enhancement and Rendering with diffusion model : Abstract: Recently, diffusion models have shown their impressive ability in visual generation tasks. Besides static images, increasing research attention has been drawn to the generation of reali...
- SilhouetteTell: Practical Video Identification Leveraging Blurred Recordings of Video Subtitles : Abstract: Video identification attacks pose a significant privacy threat that can reveal videos that victims watch, which may disclose their hobbies, religious beliefs, political leanings, sexual orie...
- SpecAware: A Spectral-Content Aware Foundation Model for Unifying Multi-Sensor Learning in Hyperspectral Remote Sensing Mapping : Abstract: Hyperspectral imaging (HSI) is a vital tool for fine-grained land-use and land-cover (LULC) mapping. However, the inherent heterogeneity of HSI data has long posed a major barrier to develop...
- Mask-to-Height: A YOLOv11-Based Architecture for Joint Building Instance Segmentation and Height Classification from Satellite Imagery : Abstract: Accurate building instance segmentation and height classification are critical for urban planning, 3D city modeling, and infrastructure monitoring. This paper presents a detailed analysis of...
- MoRE: 3D Visual Geometry Reconstruction Meets Mixture-of-Experts : Abstract: Recent advances in language and vision have demonstrated that scaling up model capacity consistently improves performance across diverse tasks. In 3D visual geometry reconstruction, large-sc...
- Object-IR: Leveraging Object Consistency and Mesh Deformation for Self-Supervised Image Retargeting : Abstract: Eliminating geometric distortion in semantically important regions remains an intractable challenge in image retargeting. This paper presents Object-IR, a self-supervised architecture that r...
- Fusion of Heterogeneous Pathology Foundation Models for Whole Slide Image Analysis : Abstract: Whole slide image (WSI) analysis has emerged as an increasingly essential technique in computational pathology. Recent advances in the pathological foundation models (FMs) have demonstrated ...
- Rethinking Robust Adversarial Concept Erasure in Diffusion Models : Abstract: Concept erasure aims to selectively unlearn undesirable content in diffusion models (DMs) to reduce the risk of sensitive content generation. As a novel paradigm in concept erasure, most ...
- Trans-defense: Transformer-based Denoiser for Adversarial Defense with Spatial-Frequency Domain Representation : Abstract: In recent times, deep neural networks (DNNs) have been successfully adopted for various applications. Despite their notable achievements, it has become evident that DNNs are vulnerable to so...
- C-LEAD: Contrastive Learning for Enhanced Adversarial Defense : Abstract: Deep neural networks (DNNs) have achieved remarkable success in computer vision tasks such as image classification, segmentation, and object detection. However, they are vulnerable to advers...
- Enhancing Spatio-Temporal Zero-shot Action Recognition with Language-driven Description Attributes : Abstract: Vision-Language Models (VLMs) have demonstrated impressive capabilities in zero-shot action recognition by learning to associate video embeddings with class embeddings. However, a significan...
- RegionRAG: Region-level Retrieval-Augumented Generation for Visually-Rich Documents : Abstract: Multi-modal Retrieval-Augmented Generation (RAG) has become a critical method for empowering LLMs by leveraging candidate visual documents. However, current methods consider the entire docum...
- HyperClick: Advancing Reliable GUI Grounding via Uncertainty Calibration : Abstract: Autonomous Graphical User Interface (GUI) agents rely on accurate GUI grounding, which maps language instructions to on-screen coordinates, to execute user commands. However, current models,...
- Versatile and Efficient Medical Image Super-Resolution Via Frequency-Gated Mamba : Abstract: Medical image super-resolution (SR) is essential for enhancing diagnostic accuracy while reducing acquisition cost and scanning time. However, modeling both long-range anatomical structures ...
- Overcoming Prompts Pool Confusion via Parameterized Prompt for Incremental Object Detection : Abstract: Recent studies have demonstrated that incorporating trainable prompts into pretrained models enables effective incremental learning. However, the application of prompts in incremental object...
- SAGS: Self-Adaptive Alias-Free Gaussian Splatting for Dynamic Surgical Endoscopic Reconstruction : Abstract: Surgical reconstruction of dynamic tissues from endoscopic videos is a crucial technology in robot-assisted surgery. The development of Neural Radiance Fields (NeRFs) has greatly advanced de...
- Querying functional and structural niches on spatial transcriptomics data : Abstract: Cells in multicellular organisms coordinate to form functional and structural niches. With spatial transcriptomics enabling gene expression profiling in spatial contexts, it has been reveale...
- Supervised Quadratic Feature Analysis: Information Geometry Approach for Dimensionality Reduction : Abstract: Supervised dimensionality reduction maps labeled data into a low-dimensional feature space while preserving class discriminability. A common approach is to maximize a statistical measure of ...
- A Regularized Newton Method for Nonconvex Optimization with Global and Local Complexity Guarantees : Abstract: Finding an $\epsilon$-stationary point of a nonconvex function with a Lipschitz continuous Hessian is a central problem in optimization. Regularized Newton methods are a classical tool and h...
- Generative Adversarial Networks for High-Dimensional Item Factor Analysis: A Deep Adversarial Learning Algorithm : Abstract: Advances in deep learning and representation learning have transformed item factor analysis (IFA) in the item response theory (IRT) literature by enabling more efficient and accurate paramet...
- Kernel Mean Embedding Topology: Weak and Strong Forms for Stochastic Kernels and Implications for Model Learning : Abstract: We introduce a novel topology, called Kernel Mean Embedding Topology, for stochastic kernels, in a weak and strong form. This topology, defined on the spaces of Bochner integrable functions ...
- Qini Curve Estimation under Clustered Network Interference : Abstract: Qini curves are a widely used tool for assessing treatment policies under allocation constraints as they visualize the incremental gain of a new treatment policy versus the cost of its imple...
- DO-IQS: Dynamics-Aware Offline Inverse Q-Learning for Optimal Stopping with Unknown Gain Functions : Abstract: We consider the Inverse Optimal Stopping (IOS) problem where, based on stopped expert trajectories, one aims to recover the optimal stopping region through the continuation and stopping gain...
- Manifold Learning for Hyperspectral Images : Abstract: Traditional feature extraction and projection techniques, such as Principal Component Analysis, struggle to adequately represent X-Ray Transmission (XRT) Multi-Energy (ME) images, limiting t...
- The cell as a token: high-dimensional geometry in language models and cell embeddings : Abstract: Single-cell sequencing technology maps cells to a high-dimensional space encoding their internal activity. Recently-proposed virtual cell models extend this concept, enriching cells' represe...
- HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving : Abstract: Early-Exit Large Language Models (EE-LLMs) enable high throughput inference by allowing tokens to exit early at intermediate layers. However, their throughput is limited by the computational...
- Token Distillation: Attention-aware Input Embeddings For New Tokens : Abstract: Current language models rely on static vocabularies determined at pretraining time, which can lead to decreased performance and increased computational cost for domains underrepresented in t...
- AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy : Abstract: Large Language Models (LLMs) are being explored for applications in scientific research, including their capabilities to synthesize literature, answer research questions, generate research i...
- Conformal Object Detection by Sequential Risk Control : Abstract: Recent advances in object detectors have led to their adoption for industrial uses. However, their deployment in safety-critical applications is hindered by the inherent lack of reliability ...
- Game Theoretic Resilience Recommendation Framework for CyberPhysical Microgrids Using Hypergraph MetaLearning : Abstract: This paper presents a physics-aware cyberphysical resilience framework for radial microgrids under coordinated cyberattacks. The proposed approach models the attacker through a hypergraph ne...
- Understanding and Enhancing Mamba-Transformer Hybrids for Memory Recall and Language Modeling : Abstract: Hybrid models that combine state space models (SSMs) with attention mechanisms have shown strong performance by leveraging the efficiency of SSMs and the high recall ability of attention. Ho...
- Semantically-Aware LLM Agent to Enhance Privacy in Conversational AI Services : Abstract: With the increasing use of conversational AI systems, there is growing concern over privacy leaks, especially when users share sensitive personal data in interactions with Large Language Mod...
- Kad: A Framework for Proxy-based Test-time Alignment with Knapsack Approximation Deferral : Abstract: Several previous works concluded that the largest part of generation capabilities of large language models (LLM) are learned (early) during pre-training. However, LLMs still require further ...
- Quantitative Intertextuality from the Digital Humanities Perspective: A Survey : Abstract: The connection between texts is referred to as intertextuality in literary theory, which served as an important theoretical basis in many digital humanities studies. Over the past decade, ad...
- Recursive numeral systems are highly regular and easy to process : Abstract: Previous work has argued that recursive numeral systems optimise the trade-off between lexicon size and average morphosyntactic complexity (Denić and Szymanik, 2024). However, showing that ...
- VISTA Score: Verification In Sequential Turn-based Assessment : Abstract: Hallucination--defined here as generating statements unsupported or contradicted by available evidence or conversational context--remains a major obstacle to deploying conversational AI syst...
- LLM-Centric RAG with Multi-Granular Indexing and Confidence Constraints : Abstract: This paper addresses the issues of insufficient coverage, unstable results, and limited reliability in retrieval-augmented generation under complex knowledge environments, and proposes a con...
- Contrastive Knowledge Transfer and Robust Optimization for Secure Alignment of Large Language Models : Abstract: This paper addresses the limitations of large-scale language models in safety alignment and robustness by proposing a fine-tuning method that combines contrastive distillation with noise-rob...
- Characterizing Selective Refusal Bias in Large Language Models : Abstract: Safety guardrails in large language models (LLMs) are developed to prevent malicious users from generating toxic content at a large scale. However, these measures can inadvertently introduce ...
- Rating Roulette: Self-Inconsistency in LLM-As-A-Judge Frameworks : Abstract: As Natural Language Generation (NLG) continues to be widely adopted, properly assessing it has become quite difficult. Lately, using large language models (LLMs) for evaluating these generat...
- Probability Distributions Computed by Hard-Attention Transformers : Abstract: Most expressivity results for transformers treat them as language recognizers (which accept or reject strings), and not as they are used in practice, as language models (which generate strin...
- Simple Additions, Substantial Gains: Expanding Scripts, Languages, and Lineage Coverage in URIEL+ : Abstract: The URIEL+ linguistic knowledge base supports multilingual research by encoding languages through geographic, genetic, and typological vectors. However, data sparsity remains prevalent, in t...
- Identifying the Periodicity of Information in Natural Language : Abstract: Recent theoretical advances on information density in natural language have raised the following question: To what degree does natural language exhibit periodicity pattern in its ...
- A Unified Representation Underlying the Judgment of Large Language Models : Abstract: A central architectural question for both biological and artificial intelligence is whether judgment relies on specialized modules or a unified, domain-general resource. While the discovery ...
- TransAlign: Machine Translation Encoders are Strong Word Aligners, Too : Abstract: In the absence of sizable training data for most world languages and NLP tasks, translation-based strategies such as translate-test -- evaluating on noisy source language data translated fro...
- ThoughtProbe: Classifier-Guided LLM Thought Space Exploration via Probing Representations : Abstract: This paper introduces ThoughtProbe, a novel inference time framework that leverages the hidden reasoning features of Large Language Models (LLMs) to improve their reasoning performance. Unli...
- From the Rock Floor to the Cloud: A Systematic Survey of State-of-the-Art NLP in Battery Life Cycle : Abstract: We present a comprehensive systematic survey of the application of natural language processing (NLP) along the entire battery life cycle, instead of one stage or method, and introduce a nove...
- Awal -- Community-Powered Language Technology for Tamazight : Abstract: This paper presents Awal, a community-powered initiative for developing language technology resources for Tamazight. We provide a comprehensive review of the NLP landscape for Tamazight, exa...
- Dynamic Affective Memory Management for Personalized LLM Agents : Abstract: Advances in large language models are making personalized AI agents a new research focus. While current agent systems primarily rely on personalized external memory databases to deliver cust...
- Diffuse Thinking: Exploring Diffusion Language Models as Efficient Thought Proposers for Reasoning : Abstract: In recent years, large language models (LLMs) have witnessed remarkable advancements, with the test-time scaling law consistently enhancing the reasoning capabilities. Through systematic eva...
- The aftermath of compounds: Investigating Compounds and their Semantic Representations : Abstract: This study investigates how well computational embeddings align with human semantic judgments in the processing of English compound words. We compare static word vectors (GloVe) and contextu...
- Effect of Domain Generalization Techniques in Low Resource Systems : Abstract: Machine learning models typically assume that training and test data follow the same distribution, an assumption that often fails in real-world scenarios due to distribution shifts. This iss...
- SQLSpace: A Representation Space for Text-to-SQL to Discover and Mitigate Robustness Gaps : Abstract: We introduce SQLSpace, a human-interpretable, generalizable, compact representation for text-to-SQL examples derived with minimal human intervention. We demonstrate the utility of these repr...
- Patient-Centered Summarization Framework for AI Clinical Summarization: A Mixed-Methods Design : Abstract: Large Language Models (LLMs) are increasingly demonstrating the potential to reach human-level performance in generating clinical summaries from patient-clinician conversations. However, the...
- Multilingual BERT language model for medical tasks: Evaluation on domain-specific adaptation and cross-linguality : Abstract: In multilingual healthcare applications, the availability of domain-specific natural language processing (NLP) tools is limited, especially for low-resource languages. Although multilingual b...
- Data-Efficient Domain Adaptation for LLM-based MT using Contrastive Preference Optimization : Abstract: LLMs often require adaptation to domain-specific requirements, a process that can be expensive when relying solely on SFT. We present an empirical study on applying CPO to simulate a post-ed...
- MARAG-R1: Beyond Single Retriever via Reinforcement-Learned Multi-Tool Agentic Retrieval : Abstract: Large Language Models (LLMs) excel at reasoning and generation but are inherently limited by static pretraining data, resulting in factual inaccuracies and weak adaptability to new informati...
- Culture Cartography: Mapping the Landscape of Cultural Knowledge : Abstract: To serve global users safely and productively, LLMs need culture-specific knowledge that might not be learned during pre-training. How do we find such knowledge that is (1) salient to in-gro...
- Evaluating Perspectival Biases in Cross-Modal Retrieval : Abstract: Multimodal retrieval systems are expected to operate in a semantic space, agnostic to the language or cultural origin of the query. In practice, however, retrieval outcomes systematically re...
- Semantic Frame Aggregation-based Transformer for Live Video Comment Generation : Abstract: Live commenting on video streams has surged in popularity on platforms like Twitch, enhancing viewer engagement through dynamic interactions. However, automatically generating contextually a...
- Can MLLMs Read the Room? A Multimodal Benchmark for Verifying Truthfulness in Multi-Party Social Interactions : Abstract: As AI systems become increasingly integrated into human lives, endowing them with robust social intelligence has emerged as a critical frontier. A key aspect of this intelligence is discerni...
- SMOL: Professionally translated parallel data for 115 under-represented languages : Abstract: We open-source SMOL (Set of Maximal Overall Leverage), a suite of training data to unlock machine translation for low-resource languages. SMOL has been translated into 124 (and growing) unde...
- FUSE: A Ridge and Random Forest-Based Metric for Evaluating MT in Indigenous Languages : Abstract: This paper presents the winning submission of the RaaVa team to the AmericasNLP 2025 Shared Task 3 on Automatic Evaluation Metrics for Machine Translation (MT) into Indigenous Languages of A...
- Minitron-SSM: Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning : Abstract: Hybrid LLM architectures that combine Attention and State Space Models (SSMs) achieve state-of-the-art accuracy and runtime performance. Recent work has demonstrated that applying compressio...
- Integrating Video and Text: A Balanced Approach to Multimodal Summary Generation and Evaluation : Abstract: Vision-Language Models (VLMs) often struggle to balance visual and textual information when summarizing complex multimodal inputs, such as entire TV show episodes. In this paper, we propose ...
- Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models : Abstract: We introduce the Diffusion Chain of Lateral Thought (DCoLT), a reasoning framework for diffusion language models. DCoLT treats each intermediate step in the reverse diffusion process as a la...
- VeriFastScore: Speeding up long-form factuality evaluation : Abstract: Metrics like FactScore and VeriScore that evaluate long-form factuality operate by decomposing an input response into atomic claims and then individually verifying each claim. While effectiv...
- Mathematics Isn't Culture-Free: Probing Cultural Gaps via Entity and Scenario Perturbations : Abstract: Although mathematics is often considered culturally neutral, the way mathematical problems are presented can carry implicit cultural context. Existing benchmarks like GSM8K are predominantly...
- FastLongSpeech: Enhancing Large Speech-Language Models for Efficient Long-Speech Processing : Abstract: The rapid advancement of Large Language Models (LLMs) has spurred significant progress in Large Speech-Language Models (LSLMs), enhancing their capabilities in both speech understanding and ...
- RADAR: Benchmarking Language Models on Imperfect Tabular Data : Abstract: Language models (LMs) are increasingly being deployed to perform autonomous data analyses. However, their data awareness -- the ability to recognize, reason over, and appropriately handle da...
- PF-DAformer: Proximal Femur Segmentation via Domain Adaptive Transformer for Dual-Center QCT : Abstract: Quantitative computed tomography (QCT) plays a crucial role in assessing bone strength and fracture risk by enabling volumetric analysis of bone density distribution in the proximal femur. H...
- Enhancing Sentiment Classification with Machine Learning and Combinatorial Fusion : Abstract: This paper presents a novel approach to sentiment classification using the application of Combinatorial Fusion Analysis (CFA) to integrate an ensemble of diverse machine learning models, ach...
- Quantitative Bounds for Length Generalization in Transformers : Abstract: We study the problem of length generalization (LG) in transformers: the ability of a model trained on shorter sequences to maintain performance when evaluated on much longer, previously unse...
- Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning : Abstract: Mathematical reasoning is a central challenge for large language models (LLMs), requiring not only correct answers but also faithful reasoning processes. Reinforcement Learning with Verifiab...
- MLPerf Automotive : Abstract: We present MLPerf Automotive, the first standardized public benchmark for evaluating Machine Learning systems that are deployed for AI acceleration in automotive systems. Developed through a...
- Towards Understanding Self-play for LLM Reasoning : Abstract: Recent advances in large language model (LLM) reasoning, led by reinforcement learning with verifiable rewards (RLVR), have inspired self-play post-training, where models improve by generati...
- Functional embeddings enable Aggregation of multi-area SEEG recordings over subjects and sessions : Abstract: Aggregating intracranial recordings across subjects is challenging since electrode count, placement, and covered regions vary widely. Spatial normalization methods like MNI coordinates offer...
- Hierarchical Bayesian Model for Gene Deconvolution and Functional Analysis in Human Endometrium Across the Menstrual Cycle : Abstract: Bulk tissue RNA sequencing of heterogeneous samples provides averaged gene expression profiles, obscuring cell type-specific dynamics. To address this, we present a probabilistic hierarchica...
- Group-Sensitive Offline Contextual Bandits : Abstract: Offline contextual bandits allow one to learn policies from historical/offline data without requiring online interaction. However, offline policy optimization that maximizes overall expected...
- AI Agents in Drug Discovery : Abstract: Artificial intelligence (AI) agents are emerging as transformative tools in drug discovery, with the ability to autonomously reason, act, and learn through complicated research workflows. Bu...
- Exploring the Utilities of the Rationales from Large Language Models to Enhance Automated Essay Scoring : Abstract: This study explored the utilities of rationales generated by GPT-4.1 and GPT-5 in automated scoring using Prompt 6 essays from the 2012 Kaggle ASAP data. Essay-based scoring was compared wit...
- FairAD: Computationally Efficient Fair Graph Clustering via Algebraic Distance : Abstract: Due to the growing concern about unsavory behaviors of machine learning models toward certain demographic groups, the notion of 'fairness' has recently drawn much attention from the communit...
- Relation-Aware Bayesian Optimization of DBMS Configurations Guided by Affinity Scores : Abstract: Database Management Systems (DBMSs) are fundamental for managing large-scale and heterogeneous data, and their performance is critically influenced by configuration parameters. Effective tun...
- A Polynomial-time Algorithm for Online Sparse Linear Regression with Improved Regret Bound under Weaker Conditions : Abstract: In this paper, we study the problem of online sparse linear regression (OSLR) where the algorithms are restricted to accessing only $k$ out of $d$ attributes per instance for prediction, whi...
- SERFLOW: A Cross-Service Cost Optimization Framework for SLO-Aware Dynamic ML Inference : Abstract: Dynamic offloading of Machine Learning (ML) model partitions across different resource orchestration services, such as Function-as-a-Service (FaaS) and Infrastructure-as-a-Service (IaaS), ca...
- MDAS-GNN: Multi-Dimensional Spatiotemporal GNN with Spatial Diffusion for Urban Traffic Risk Forecasting : Abstract: Traffic accidents represent a critical public health challenge, claiming over 1.35 million lives annually worldwide. Traditional accident prediction models treat road segments independently,...
- FedSM: Robust Semantics-Guided Feature Mixup for Bias Reduction in Federated Learning with Long-Tail Data : Abstract: Federated Learning (FL) enables collaborative model training across decentralized clients without sharing private data. However, FL suffers from biased global models due to non-IID and long-...
- ECVL-ROUTER: Scenario-Aware Routing for Vision-Language Models : Abstract: Vision-Language Models (VLMs) excel in diverse multimodal tasks. However, user requirements vary across scenarios, which can be categorized into fast response, high-quality output, and low e...
- ODP-Bench: Benchmarking Out-of-Distribution Performance Prediction : Abstract: Recently, there has been gradually more attention paid to Out-of-Distribution (OOD) performance prediction, whose goal is to predict the performance of trained models on unlabeled OOD test d...
- Temporal Cardiovascular Dynamics for Improved PPG-Based Heart Rate Estimation : Abstract: The oscillations of the human heart rate are inherently complex and non-linear -- they are best described by mathematical chaos, and they present a challenge when applied to the practical do...
- Binary Anomaly Detection in Streaming IoT Traffic under Concept Drift : Abstract: With the growing volume of Internet of Things (IoT) network traffic, machine learning (ML)-based anomaly detection is more relevant than ever. Traditional batch learning models face challeng...
- MedM2T: A MultiModal Framework for Time-Aware Modeling with Electronic Health Record and Electrocardiogram Data : Abstract: The inherent multimodality and heterogeneous temporal structures of medical data pose significant challenges for modeling. We propose MedM2T, a time-aware multimodal framework designed to ad...
- Reasoning Models Sometimes Output Illegible Chains of Thought : Abstract: Language models trained via outcome-based reinforcement learning (RL) to reason using chain-of-thought (CoT) have shown remarkable performance. Monitoring such a model's CoT may allow us to ...
- MVeLMA: Multimodal Vegetation Loss Modeling Architecture for Predicting Post-fire Vegetation Loss : Abstract: Understanding post-wildfire vegetation loss is critical for developing effective ecological recovery strategies and is often challenging due to the extended time and effort required to captu...
- Spectral Neural Graph Sparsification : Abstract: Graphs are central to modeling complex systems in domains such as social networks, molecular chemistry, and neuroscience. While Graph Neural Networks, particularly Graph Convolutional Networ...
- Simplex-to-Euclidean Bijections for Categorical Flow Matching : Abstract: We propose a method for learning and sampling from probability distributions supported on the simplex. Our approach maps the open simplex to Euclidean space via smooth bijections, leveraging...
- Learning Sparse Approximate Inverse Preconditioners for Conjugate Gradient Solvers on GPUs : Abstract: The conjugate gradient solver (CG) is a prevalent method for solving symmetric and positive definite linear systems Ax=b, where effective preconditioners are crucial for fast convergence. Tr...
- Active transfer learning for structural health monitoring : Abstract: Data for training structural health monitoring (SHM) systems are often expensive and/or impractical to obtain, particularly for labelled data. Population-based SHM (PBSHM) aims to address th...
- AstuteRAG-FQA: Task-Aware Retrieval-Augmented Generation Framework for Proprietary Data Challenges in Financial Question Answering : Abstract: Retrieval-Augmented Generation (RAG) shows significant promise in knowledge-intensive tasks by improving domain specificity, enhancing temporal relevance, and reducing hallucinations. Howeve...
- ORGEval: Graph-Theoretic Evaluation of LLMs in Optimization Modeling : Abstract: Formulating optimization problems for industrial applications demands significant manual effort and domain expertise. While Large Language Models (LLMs) show promise in automating this proce...
- Panprediction: Optimal Predictions for Any Downstream Task and Loss : Abstract: Supervised learning is classically formulated as training a model to minimize a fixed loss function over a fixed distribution, or task. However, an emerging paradigm instead views model trai...
- Imbalanced Classification through the Lens of Spurious Correlations : Abstract: Class imbalance poses a fundamental challenge in machine learning, frequently leading to unreliable classification performance. While prior methods focus on data- or loss-reweighting schemes...
- W-PCA Based Gradient-Free Proxy for Efficient Search of Lightweight Language Models : Abstract: The demand for efficient natural language processing (NLP) systems has led to the development of lightweight language models. Previous work in this area has primarily focused on manual desig...
- Diabetes Lifestyle Medicine Treatment Assistance Using Reinforcement Learning : Abstract: Type 2 diabetes prevention and treatment can benefit from personalized lifestyle prescriptions. However, the delivery of personalized lifestyle medicine prescriptions is limited by the short...
- A Machine Learning-Based Framework to Shorten the Questionnaire for Assessing Autism Intervention : Abstract: Caregivers of individuals with autism spectrum disorder (ASD) often find the 77-item Autism Treatment Evaluation Checklist (ATEC) burdensome, limiting its use for routine monitoring. This st...
- Towards Gaussian processes modelling to study the late effects of radiotherapy in children and young adults with brain tumours : Abstract: Survivors of childhood cancer need lifelong monitoring for side effects from radiotherapy. However, longitudinal data from routine monitoring is often infrequently and irregularly sampled, a...
- Toward precision soil health: A regional framework for site-specific management across Missouri : Abstract: Effective soil health management is crucial for sustaining agriculture, adopting ecosystem resilience, and preserving water quality. However, Missouri's diverse landscapes limit the effectiv...
- Multi-Representation Attention Framework for Underwater Bioacoustic Denoising and Recognition : Abstract: Automated monitoring of marine mammals in the St. Lawrence Estuary faces extreme challenges: calls span low-frequency moans to ultrasonic clicks, often overlap, and are embedded in variable ...
- Are Online Sports Fan Communities Becoming More Offensive? A Quantitative Review of Topics, Trends, and Toxicity of r/PremierLeague : Abstract: Online communities for sports fans have surged in popularity, with Reddit's r/PremierLeague emerging as a focal point for fans of one of the globe's most celebrated sports leagues. This boom...
- Domain decomposition architectures and Gauss-Newton training for physics-informed neural networks : Abstract: Approximating the solutions of boundary value problems governed by partial differential equations with neural networks is challenging, largely due to the difficult training process. This dif...
- GeoPep: A geometry-aware masked language model for protein-peptide binding site prediction : Abstract: Multimodal approaches that integrate protein structure and sequence have achieved remarkable success in protein-protein interface prediction. However, extending these methods to protein-pept...
- Accelerating Radiative Transfer for Planetary Atmospheres by Orders of Magnitude with a Transformer-Based Machine Learning Model : Abstract: Radiative transfer calculations are essential for modeling planetary atmospheres. However, standard methods are computationally demanding and impose accuracy-speed trade-offs. High computati...
- Overspecified Mixture Discriminant Analysis: Exponential Convergence, Statistical Guarantees, and Remote Sensing Applications : Abstract: This study explores the classification error of Mixture Discriminant Analysis (MDA) in scenarios where the number of mixture components exceeds those present in the actual data distribution,...
- Learning Generalizable Visuomotor Policy through Dynamics-Alignment : Abstract: Behavior cloning methods for robot learning suffer from poor generalization due to limited data support beyond expert demonstrations. Recent approaches leveraging video prediction models hav...
- SERVIMON: AI-Driven Predictive Maintenance and Real-Time Monitoring for Astronomical Observatories : Abstract: Objective: ServiMon is designed to offer a scalable and intelligent pipeline for data collection and auditing to monitor distributed astronomical systems such as the ASTRI Mini-Array. The sy...
- T3: Test-Time Model Merging in VLMs for Zero-Shot Medical Imaging Analysis : Abstract: In medical imaging, vision-language models face a critical duality: pretrained networks offer broad robustness but lack subtle, modality-specific characteristics, while fine-tuned expert mod...
- Traceable Drug Recommendation over Medical Knowledge Graphs : Abstract: Drug recommendation (DR) systems aim to support healthcare professionals in selecting appropriate medications based on patients' medical conditions. State-of-the-art approaches utilize deep ...
- When AI Trading Agents Compete: Adverse Selection of Meta-Orders by Reinforcement Learning-Based Market Making : Abstract: We investigate the mechanisms by which medium-frequency trading agents are adversely selected by opportunistic high-frequency traders. We use reinforcement learning (RL) within a Hawkes Limi...
- Pairwise and Attribute-Aware Decision Tree-Based Preference Elicitation for Cold-Start Recommendation : Abstract: Recommender systems (RSs) are intelligent filtering methods that suggest items to users based on their inferred preferences, derived from their interaction history on the platform. Collabora...
- FPS: Feedforward-based Parameter Selection For Efficient Fine-Tuning : Abstract: Parameter-Efficient Fine-Tuning (PEFT) has emerged as a key strategy for adapting large-scale pre-trained models to downstream tasks, but existing approaches face notable limitations. Additi...
- On the Equivalence of Optimal Transport Problem and Action Matching with Optimal Vector Fields : Abstract: Flow Matching (FM) method in generative modeling maps arbitrary probability distributions by constructing an interpolation between them and then learning the vector field that defines ODE fo...
- Modality Alignment across Trees on Heterogeneous Hyperbolic Manifolds : Abstract: Modality alignment is critical for vision-language models (VLMs) to effectively integrate information across modalities. However, existing methods extract hierarchical features from text whi...
- Interpretable Model-Aware Counterfactual Explanations for Random Forest : Abstract: Despite their enormous predictive power, machine learning models are often unsuitable for applications in regulated industries such as finance, due to their limited capacity to provide expla...
- Estimation of aboveground biomass in a tropical dry forest: An intercomparison of airborne, unmanned, and space laser scanning : Abstract: According to the Paris Climate Change Agreement, all nations are required to submit reports on their greenhouse gas emissions and absorption every two years by 2024. Consequently, forests pl...
- Minimax-Optimal Two-Sample Test with Sliced Wasserstein : Abstract: We study the problem of nonparametric two-sample testing using the sliced Wasserstein (SW) distance. While prior theoretical and empirical work indicates that the SW distance offers a promis...
- pDANSE: Particle-based Data-driven Nonlinear State Estimation from Nonlinear Measurements : Abstract: We consider the problem of designing a data-driven nonlinear state estimation (DANSE) method that uses (noisy) nonlinear measurements of a process whose underlying state transition model (ST...
- Asynchronous Risk-Aware Multi-Agent Packet Routing for Ultra-Dense LEO Satellite Networks : Abstract: The rise of ultra-dense LEO constellations creates a complex and asynchronous network environment, driven by their massive scale, dynamic topologies, and significant delays. This unique comp...
- BiSparse-AAS: Bilinear Sparse Attention and Adaptive Spans Framework for Scalable and Efficient Text Summarization : Abstract: Transformer-based architectures have advanced text summarization, yet their quadratic complexity limits scalability on long documents. This paper introduces BiSparse-AAS (Bilinear Sparse Att...
- Representing Classical Compositions through Implication-Realization Temporal-Gestalt Graphs : Abstract: Understanding the structural and cognitive underpinnings of musical compositions remains a key challenge in music theory and computational musicology. While traditional methods focus on harm...
- Optimal Convergence Analysis of DDPM for General Distributions : Abstract: Score-based diffusion models have achieved remarkable empirical success in generating high-quality samples from target data distributions. Among them, the Denoising Diffusion Probabilistic M...
- Image Hashing via Cross-View Code Alignment in the Age of Foundation Models : Abstract: Efficient large-scale retrieval requires representations that are both compact and discriminative. Foundation models provide powerful visual and multimodal embeddings, but nearest neighbor s...
- Learned Static Function Data Structures : Abstract: We consider the task of constructing a data structure for associating a static set of keys with values, while allowing arbitrary output values for queries involving keys outside the set. Com...
- Enhancing software product lines with machine learning components : Abstract: Modern software systems increasingly integrate machine learning (ML) due to its advancements and ability to enhance data-driven decision-making. However, this integration introduces signific...
- SpecAttn: Speculating Sparse Attention : Abstract: Large Language Models (LLMs) face significant computational bottlenecks during inference due to the quadratic complexity of self-attention mechanisms, particularly as context lengths increas...
- Bayesian Optimization on Networks : Abstract: This paper studies optimization on networks modeled as metric graphs. Motivated by applications where the objective function is expensive to evaluate or only available as a black box, we dev...
- Bayesian model selection and misspecification testing in imaging inverse problems only from noisy and partial measurements : Abstract: Modern imaging techniques heavily rely on Bayesian statistical models to address difficult image reconstruction and restoration tasks. This paper addresses the objective evaluation of such m...
- On Selecting Few-Shot Examples for LLM-based Code Vulnerability Detection : Abstract: Large language models (LLMs) have demonstrated impressive capabilities for many coding tasks, including summarization, translation, completion, and code generation. However, detecting code v...
- Dark-Field X-Ray Imaging Significantly Improves Deep-Learning based Detection of Synthetic Early-Stage Lung Tumors in Preclinical Models : Abstract: Low-dose computed tomography (LDCT) is the current standard for lung cancer screening, yet its adoption and accessibility remain limited. Many regions lack LDCT infrastructure, and even amon...
- Hybrid Decentralized Optimization: Leveraging Both First- and Zeroth-Order Optimizers for Faster Convergence : Abstract: Distributed optimization is the standard way of speeding up machine learning training, and most of the research in the area focuses on distributed first-order, gradient-based methods. Yet, t...
- Accelerated Rates between Stochastic and Adversarial Online Convex Optimization : Abstract: Stochastic and adversarial data are two widely studied settings in online learning. But many optimization tasks are neither i.i.d. nor fully adversarial, which makes it of fundamental intere...
- Decoding Virtual Healthcare Success through Knowledge-Aware and Multimodal Predictive Modeling : Abstract: Online healthcare consultations have transformed how patients seek medical advice, offering convenience while introducing new challenges for ensuring consultation success. Predicting whether...
- Scaling Tractable Probabilistic Circuits: A Systems Perspective : Abstract: Probabilistic Circuits (PCs) are a general framework for tractable deep generative models, which support exact and efficient probabilistic inference on their learned distributions. Recent mo...
- Reevaluating Theoretical Analysis Methods for Optimization in Deep Learning : Abstract: There is a significant gap between our theoretical understanding of optimization algorithms used in deep learning and their practical performance. Theoretical development usually focuses on ...
- Convergence of continuous-time stochastic gradient descent with applications to deep neural networks : Abstract: We study a continuous-time approximation of the stochastic gradient descent process for minimizing the population expected loss in learning problems. The main results establish general suffi...
- Swing-by Dynamics in Concept Learning and Compositional Generalization : Abstract: Prior work has shown that text-conditioned diffusion models can learn to identify and manipulate primitive concepts underlying a compositional data-generating process, enabling generalizatio...
- DeepOSets: Non-Autoregressive In-Context Learning with Permutation-Invariance Inductive Bias : Abstract: In-context learning (ICL) is the remarkable ability displayed by some machine learning models to learn from examples provided in a user prompt without any model parameter updates. ICL was fi...
- AERO: Entropy-Guided Framework for Private LLM Inference : Abstract: Privacy-preserving computation enables language model inference directly on encrypted data yet suffers from prohibitive latency and communication overheads, primarily due to nonlinear functi...
- Transformers as Implicit State Estimators: In-Context Learning in Dynamical Systems : Abstract: Predicting the behavior of a dynamical system from noisy observations of its past outputs is a classical problem encountered across engineering and science. For linear systems with Gaussian ...
- Model Inversion Attacks: A Survey of Approaches and Countermeasures : Abstract: The success of deep neural networks has driven numerous research studies and applications from Euclidean to non-Euclidean data. However, there are increasing concerns about privacy leakage, ...
- Resource-Adaptive Successive Doubling for Hyperparameter Optimization with Large Datasets on High-Performance Computing Systems : Abstract: On High-Performance Computing (HPC) systems, several hyperparameter configurations can be evaluated in parallel to speed up the Hyperparameter Optimization (HPO) process. State-of-the-art HP...
- Byzantine Resilient Federated Multi-Task Representation Learning : Abstract: In this paper, we propose BR-MTRL, a Byzantine-resilient multi-task representation learning framework that handles faulty or malicious agents. Our approach leverages representation learning ...
- Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models : Abstract: Numerous applications of large language models (LLMs) rely on their ability to perform step-by-step reasoning. However, the reasoning behavior of LLMs remains poorly understood, posing chall...
- Benchmarking Ultra-Low-Power $\mu$NPUs : Abstract: Efficient on-device neural network (NN) inference offers predictable latency, improved privacy and reliability, and lower operating costs for vendors than cloud-based inference. This has spa...
- Absorb and Converge: Provable Convergence Guarantee for Absorbing Discrete Diffusion Models : Abstract: Discrete state space diffusion models have shown significant advantages in applications involving discrete data, such as text and image generation. It has also been observed that their perfo...
- Kernel conditional tests from learning-theoretic bounds : Abstract: We propose a framework for hypothesis testing on conditional probability distributions, which we then use to construct statistical tests of functionals of conditional distributions. These te...
- PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design : Abstract: Designing protein-binding proteins with high affinity is critical in biomedical research and biotechnology. Despite recent advancements targeting specific proteins, the ability to create hig...
- Geometry-Aware Edge Pooling for Graph Neural Networks : Abstract: Graph Neural Networks (GNNs) have shown significant success for graph-based tasks. Motivated by the prevalence of large datasets in real-world applications, pooling layers are crucial compon...
- Graph Semi-Supervised Learning for Point Classification on Data Manifolds : Abstract: We propose a graph semi-supervised learning framework for classification tasks on data manifolds. Motivated by the manifold hypothesis, we model data as points sampled from a low-dimensional...
- RTNinja: a generalized machine learning framework for analyzing random telegraph noise signals in nanoelectronic devices : Abstract: Random telegraph noise is a prevalent variability phenomenon in nanoelectronic devices, arising from stochastic carrier exchange at defect sites and critically impacting device reliability a...
- Hankel Singular Value Regularization for Highly Compressible State Space Models : Abstract: Deep neural networks using state space models as layers are well suited for long-range sequence tasks but can be challenging to compress after training. We use that regularizing the sum of H...
- Towards a Generalizable AI for Materials Discovery: Validation through Immersion Coolant Screening : Abstract: Artificial intelligence (AI) has emerged as a powerful accelerator of materials discovery, yet most existing models remain problem-specific, requiring additional data collection and retraini...
- Adversarially robust clustering with optimality guarantees : Abstract: We consider the problem of clustering data points coming from sub-Gaussian mixtures. Existing methods that provably achieve the optimal mislabeling error, such as the Lloyd algorithm, are us...
- RObotic MAnipulation Network (ROMAN) -- Hybrid Hierarchical Learning for Solving Complex Sequential Tasks : Abstract: Solving long sequential tasks poses a significant challenge in embodied artificial intelligence. Enabling a robotic system to perform diverse sequential tasks with a broad range of manipulat...
- ESTformer: Transformer utilising spatiotemporal dependencies for electroencephalogram super-resolution : Abstract: Towards practical applications of Electroencephalography (EEG), lightweight acquisition devices garner significant attention. However, EEG channel selection methods are commonly data-sensiti...
- Agnostic Tomography of Stabilizer Product States : Abstract: We define a quantum learning task called agnostic tomography, where given copies of an arbitrary state $\rho$ and a class of quantum states $\mathcal{C}$, the goal is to output a succinct de...
- Data-Driven Stochastic Optimal Control in Reproducing Kernel Hilbert Spaces : Abstract: This paper proposes a fully data-driven approach for optimal control of nonlinear control-affine systems represented by a stochastic diffusion. The focus is on the scenario where both the no...
- Face Spoofing Detection using Deep Learning : Abstract: Digital image spoofing has emerged as a significant security threat in biometric authentication systems, particularly those relying on facial recognition. This study evaluates the performanc...
- On the Mistaken Assumption of Interchangeable Deep Reinforcement Learning Implementations : Abstract: Deep Reinforcement Learning (DRL) is a paradigm of artificial intelligence where an agent uses a neural network to learn which actions to take in a given environment. DRL has recently gained...
- GenSwarm: Scalable Multi-Robot Code-Policy Generation and Deployment via Language Models : Abstract: The development of control policies for multi-robot systems traditionally follows a complex and labor-intensive process, often lacking the flexibility to adapt to dynamic tasks. This has mot...
- RaanA: A Fast, Flexible, and Data-Efficient Post-Training Quantization Algorithm : Abstract: Post-training Quantization (PTQ) has become a widely used technique for improving inference efficiency of large language models (LLMs). However, existing PTQ methods generally suffer from cr...
- DualOptim: Enhancing Efficacy and Stability in Machine Unlearning with Dual Optimizers : Abstract: Existing machine unlearning (MU) approaches exhibit significant sensitivity to hyperparameters, requiring meticulous tuning that limits practical deployment. In this work, we first empirical...
- Fair Play for Individuals, Foul Play for Groups? Auditing Anonymization's Impact on ML Fairness : Abstract: Machine learning (ML) algorithms are heavily based on the availability of training data, which, depending on the domain, often includes sensitive information about data providers. This raise...
- Variational Visual Question Answering for Uncertainty-Aware Selective Prediction : Abstract: Despite remarkable progress in recent years, vision language models (VLMs) remain prone to overconfidence and hallucinations on tasks such as Visual Question Answering (VQA) and Visual Reaso...
- SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training : Abstract: The efficiency of attention is important due to its quadratic time complexity. We enhance the efficiency of attention through two key contributions: First, we leverage the new FP4 Tensor Cor...
- GAIA: A Foundation Model for Operational Atmospheric Dynamics : Abstract: We introduce GAIA (Geospatial Artificial Intelligence for Atmospheres), a hybrid self-supervised geospatial foundation model that fuses Masked Autoencoders (MAE) with self-distillation with ...
- Rethinking Metrics and Benchmarks of Video Anomaly Detection : Abstract: Video Anomaly Detection (VAD), which aims to detect anomalies that deviate from expectation, has attracted increasing attention in recent years. Existing advancements in VAD primarily focus ...
- SC-LoRA: Balancing Efficient Fine-tuning and Knowledge Preservation via Subspace-Constrained LoRA : Abstract: Parameter-Efficient Fine-Tuning (PEFT) methods, particularly Low-Rank Adaptation (LoRA), are indispensable for efficiently customizing Large Language Models (LLMs). However, vanilla LoRA suf...
- Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting : Abstract: Time series, typically represented as numerical sequences, can also be transformed into images and texts, offering multi-modal views (MMVs) of the same underlying signal. These MMVs can reve...
- Artificial Empathy: AI based Mental Health : Abstract: Many people suffer from mental health problems but not everyone seeks professional help or has access to mental health care. AI chatbots have increasingly become a go-to for individuals who ...
- Accelerating Diffusion LLMs via Adaptive Parallel Decoding : Abstract: The generation speed of LLMs is bottlenecked by autoregressive decoding, where tokens are predicted sequentially one by one. Alternatively, diffusion large language models (dLLMs) theoretic...
- PoLAR: Polar-Decomposed Low-Rank Adapter Representation : Abstract: We show that low-rank adaptation of large-scale models suffers from a low stable rank that is well below the linear algebraic rank of the subspace, degrading fine-tuning performance. To miti...
- UdonCare: Hierarchy Pruning for Unseen Domain Discovery in Predictive Healthcare : Abstract: Healthcare providers often divide patient populations into cohorts based on shared clinical factors, such as medical history, to deliver personalized healthcare services. This idea has also ...
- Large Language Models for Combinatorial Optimization of Design Structure Matrix : Abstract: In complex engineering systems, the dependencies among components or development activities are often modeled and analyzed using Design Structure Matrix (DSM). Reorganizing elements within a...
- Graph Diffusion that can Insert and Delete : Abstract: Generative models of graphs based on discrete Denoising Diffusion Probabilistic Models (DDPMs) offer a principled approach to molecular generation by systematically removing structural noise...
- Layer of Truth: Probing Belief Shifts under Continual Pre-Training Poisoning : Abstract: Large language models (LLMs) continually evolve through pre-training on ever-expanding web data, but this adaptive process also exposes them to subtle forms of misinformation. While prior wo...
- SmoothGuard: Defending Multimodal Large Language Models with Noise Perturbation and Clustering Aggregation : Abstract: Multimodal large language models (MLLMs) have achieved impressive performance across diverse tasks by jointly reasoning over textual and visual inputs. Despite their success, these models re...
- Integrating Ontologies with Large Language Models for Enhanced Control Systems in Chemical Engineering : Abstract: This work presents an ontology-integrated large language model (LLM) framework for chemical engineering that unites structured domain knowledge with generative reasoning. The proposed pipeli...
- Discovering EV Charging Site Archetypes Through Few Shot Forecasting: The First U.S.-Wide Study : Abstract: The decarbonization of transportation relies on the widespread adoption of electric vehicles (EVs), which requires an accurate understanding of charging behavior to ensure cost-effective, gr...
- MM-OPERA: Benchmarking Open-ended Association Reasoning for Large Vision-Language Models : Abstract: Large Vision-Language Models (LVLMs) have exhibited remarkable progress. However, deficiencies remain compared to human intelligence, such as hallucination and shallow pattern matching. In t...
- Predicting Household Water Consumption Using Satellite and Street View Images in Two Indian Cities : Abstract: Monitoring household water use in rapidly urbanizing regions is hampered by costly, time-intensive enumeration methods and surveys. We investigate whether publicly available imagery-satellit...
- HADSF: Aspect Aware Semantic Control for Explainable Recommendation : Abstract: Recent advances in large language models (LLMs) promise more effective information extraction for review-based recommender systems, yet current methods still (i) mine free-form reviews witho...
- Gradient Descent as Loss Landscape Navigation: a Normative Framework for Deriving Learning Rules : Abstract: Learning rules -- prescriptions for updating model parameters to improve performance -- are typically assumed rather than derived. Why do some learning rules work better than others, and und...
- Mixture-of-Transformers Learn Faster: A Theoretical Study on Classification Problems : Abstract: Mixture-of-Experts (MoE) models improve transformer efficiency but lack a unified theoretical explanation, especially when both feed-forward and attention layers are allowed to specialize. T...
- Thought Branches: Interpreting LLM Reasoning Requires Resampling : Abstract: Most work interpreting reasoning models studies only a single chain-of-thought (CoT), yet these models define distributions over many possible CoTs. We argue that studying a single sample is...
- FedAdamW: A Communication-Efficient Optimizer with Convergence and Generalization Guarantees for Federated Large Models : Abstract: AdamW has become one of the most effective optimizers for training large-scale models. We have also observed its effectiveness in the context of federated learning (FL). However, directly ap...
- InertialAR: Autoregressive 3D Molecule Generation with Inertial Frames : Abstract: Transformer-based autoregressive models have emerged as a unifying paradigm across modalities such as text and images, but their extension to 3D molecule generation remains underexplored. Th...
- DP-FedPGN: Finding Global Flat Minima for Differentially Private Federated Learning via Penalizing Gradient Norm : Abstract: To prevent inference attacks in Federated Learning (FL) and reduce the leakage of sensitive information, Client-level Differentially Private Federated Learning (CL-DPFL) is widely used. Howe...
- Context-Gated Cross-Modal Perception with Visual Mamba for PET-CT Lung Tumor Segmentation : Abstract: Accurate lung tumor segmentation is vital for improving diagnosis and treatment planning, and effectively combining anatomical and functional information from PET and CT remains a major chal...
- Leveraging Generic Time Series Foundation Models for EEG Classification : Abstract: Foundation models for time series are emerging as powerful general-purpose backbones, yet their potential for domain-specific biomedical signals such as electroencephalography (EEG) remains ...
- TetraJet-v2: Accurate NVFP4 Training for Large Language Models with Oscillation Suppression and Outlier Control : Abstract: Large Language Model (LLM) training is prohibitively expensive, driving interest in low-precision fully-quantized training (FQT). While novel 4-bit formats like NVFP4 offer substantial eff...
- DialectalArabicMMLU: Benchmarking Dialectal Capabilities in Arabic and Multilingual Language Models : Abstract: We present DialectalArabicMMLU, a new benchmark for evaluating the performance of large language models (LLMs) across Arabic dialects. While recently developed Arabic and multilingual benchm...
- EBT-Policy: Energy Unlocks Emergent Physical Reasoning Capabilities : Abstract: Implicit policies parameterized by generative models, such as Diffusion Policy, have become the standard for policy learning and Vision-Language-Action (VLA) models in robotics. However, the...
- Sybil-Resistant Service Discovery for Agent Economies : Abstract: x402 enables Hypertext Transfer Protocol (HTTP) services like application programming interfaces (APIs), data feeds, and inference providers to accept cryptocurrency payments for access. As ...
- Toward Accurate Long-Horizon Robotic Manipulation: Language-to-Action with Foundation Models via Scene Graphs : Abstract: This paper presents a framework that leverages pre-trained foundation models for robotic manipulation without domain-specific training. The framework integrates off-the-shelf models, combini...
- CodeAlignBench: Assessing Code Generation Models on Developer-Preferred Code Adjustments : Abstract: As large language models become increasingly capable of generating code, evaluating their performance remains a complex and evolving challenge. Existing benchmarks primarily focus on functio...
- Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum : Abstract: The prevailing video retrieval paradigm is structurally misaligned, as narrow benchmarks incentivize correspondingly limited data and single-task training. Therefore, universal capability is...
- Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning : Abstract: Spatial understanding remains a weakness of Large Vision-Language Models (LVLMs). Existing supervised fine-tuning (SFT) and recent reinforcement learning with verifiable rewards (RLVR) pipel...
- Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models : Abstract: Open-weight bio-foundation models present a dual-use dilemma. While holding great promise for accelerating scientific research and drug development, they could also enable bad actors to deve...
- Sketch-to-Layout: Sketch-Guided Multimodal Layout Generation : Abstract: Graphic layout generation is a growing research area focusing on generating aesthetically pleasing layouts ranging from poster designs to documents. While recent research has explored ways t...
- VessShape: Few-shot 2D blood vessel segmentation by leveraging shape priors from synthetic images : Abstract: Semantic segmentation of blood vessels is an important task in medical image analysis, but its progress is often hindered by the scarcity of large annotated datasets and the poor generalizat...
- Information-Theoretic Greedy Layer-wise Training for Traffic Sign Recognition : Abstract: Modern deep neural networks (DNNs) are typically trained with a global cross-entropy loss in a supervised end-to-end manner: neurons need to store their outgoing weights; training alternates...
- Community Detection on Model Explanation Graphs for Explainable AI : Abstract: Feature-attribution methods (e.g., SHAP, LIME) explain individual predictions but often miss higher-order structure: sets of features that act in concert. We propose Modules of Influence (Mo...
- Challenges in Credit Assignment for Multi-Agent Reinforcement Learning in Open Agent Systems : Abstract: In the rapidly evolving field of multi-agent reinforcement learning (MARL), understanding the dynamics of open systems is crucial. Openness in MARL refers to the dynamic nature of agent pop...
- PETAR: Localized Findings Generation with Mask-Aware Vision-Language Modeling for PET Automated Reporting : Abstract: Recent advances in vision-language models (VLMs) have enabled impressive multimodal reasoning, yet most medical applications remain limited to 2D imaging. In this work, we extend VLMs to 3D ...
- Continuous Autoregressive Language Models : Abstract: The efficiency of large language models (LLMs) is fundamentally limited by their sequential, token-by-token generation process. We argue that overcoming this bottleneck requires a new design...
- A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods : Abstract: Automatic Text Summarization (ATS), utilizing Natural Language Processing (NLP) algorithms, aims to create concise and accurate summaries, thereby significantly reducing the human effort req...
- Towards Automated Semantic Interpretability in Reinforcement Learning via Vision-Language Models : Abstract: Semantic interpretability in Reinforcement Learning (RL) enables transparency and verifiability of decision-making. Achieving semantic interpretability in reinforcement learning requires (1)...
- A Framework for Objective-Driven Dynamical Stochastic Fields : Abstract: Fields offer a versatile approach for describing complex systems composed of interacting and dynamic components. In particular, some of these dynamical and stochastic systems may exhibit goa...
- Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning : Abstract: Recent studies have shown that reinforcement learning with verifiable rewards (RLVR) enhances overall accuracy (pass@1) but often fails to improve capability (pass@k) of LLMs in reasoning ta...
- Building Trustworthy AI by Addressing its 16+2 Desiderata with Goal-Directed Commonsense Reasoning : Abstract: Current advances in AI and its applicability have highlighted the need to ensure its trustworthiness for legal, ethical, and even commercial reasons. Sub-symbolic machine learning algorithms...
- Don't throw the baby out with the bathwater: How and why deep learning for ARC : Abstract: The Abstraction and Reasoning Corpus (ARC-AGI) presents a formidable challenge for AI systems. Despite the typically low performance on ARC, the deep learning paradigm remains the most effec...
- NaviAgent: Bilevel Planning on Tool Navigation Graph for Large-Scale Orchestration : Abstract: Large language models (LLMs) have recently demonstrated the ability to act as function call agents by invoking external tools, enabling them to solve tasks beyond their static knowledge. How...
- HiRA: A Hierarchical Reasoning Framework for Decoupled Planning and Execution in Deep Search : Abstract: Complex information needs in real-world search scenarios demand deep reasoning and knowledge synthesis across diverse sources, which traditional retrieval-augmented generation (RAG) pipeline...
- Red Teaming AI Red Teaming : Abstract: Red teaming has evolved from its origins in military applications to become a widely adopted methodology in cybersecurity and AI. In this paper, we take a critical look at the practice of AI...
- Emergent Cognitive Convergence via Implementation: A Structured Loop Reflecting Four Theories of Mind : Abstract: We report a structural convergence among four influential theories of mind: Kahneman's dual-system theory, Friston's predictive processing, Minsky's society of mind, and Clark's extended min...
- Understanding the Application of Utility Theory in Robotics and Artificial Intelligence: A Survey : Abstract: As a unifying concept in economics, game theory, and operations research, even in the Robotics and AI field, the utility is used to evaluate the level of individual needs, preferences, and i...
- Continual Vision-and-Language Navigation : Abstract: Developing Vision-and-Language Navigation (VLN) agents typically assumes a train-once-deploy-once strategy, which is unrealistic as deployed agents continually encounter novel envir...
- MindSearch: Mimicking Human Minds Elicits Deep AI Searcher : Abstract: Information seeking and integration is a complex cognitive task that consumes enormous time and effort. Inspired by the remarkable progress of Large Language Models, recent works attempt to ...
- RepoMasterEval: Evaluating Code Completion via Real-World Repositories : Abstract: With the growing reliance on automated code completion tools in software development, the need for comprehensive evaluation benchmarks has become critical. Existing benchmarks focus more on ...
- Towards User-Focused Research in Training Data Attribution for Human-Centered Explainable AI : Abstract: Explainable AI (XAI) aims to make AI systems more transparent, yet many practices emphasise mathematical rigour over practical user needs. We propose an alternative to this model-centric app...
- SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks : Abstract: Direct alignment algorithms have proven an effective step for aligning language models to human-desired behaviors. Current variants of the Direct Preference Optimization objective have focus...
- A Systematic Literature Review of Spatio-Temporal Graph Neural Network Models for Time Series Forecasting and Classification : Abstract: In recent years, spatio-temporal graph neural networks (GNNs) have attracted considerable interest in the field of time series analysis, due to their ability to capture, at once, dependencie...
- Representative Social Choice: From Learning Theory to AI Alignment : Abstract: Social choice theory is the study of preference aggregation across a population, used both in mechanism design for human agents and in the democratic alignment of language models. In this st...
- LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models : Abstract: Mixture of experts (MoE) architectures have become a cornerstone for scaling up and are a key component in most large language models such as GPT-OSS, DeepSeek-V3, Llama-4, and Gemini-2.5. H...
- Robust Offline Reinforcement Learning with Linearly Structured f-Divergence Regularization : Abstract: The Robust Regularized Markov Decision Process (RRMDP) is proposed to learn policies robust to dynamics shifts by adding regularization to the transition dynamics in the value function. Exis...
- SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents : Abstract: With the integration of large language models (LLMs), embodied agents have strong capabilities to understand and plan complicated natural language instructions. However, a foreseeable issue ...
- Multilingual State Space Models for Structured Question Answering in Indic Languages : Abstract: The diversity and complexity of Indic languages present unique challenges for natural language processing (NLP) tasks, particularly in the domain of question answering (QA). To address these ...
- On-device Computation of Single-lead ECG Parameters for Real-time Remote Cardiac Health Assessment: A Real-world Validation Study : Abstract: Accurate, continuous out-of-hospital electrocardiogram (ECG) parameter measurement is vital for real-time cardiac health monitoring and telemedicine. On-device computation of single-lead ECG...
- Training a Generally Curious Agent : Abstract: Efficient exploration is essential for intelligent systems interacting with their environment, but existing language models often fall short in scenarios that require strategic information g...
- Scalable Best-of-N Selection for Large Language Models via Self-Certainty : Abstract: Best-of-N selection is a key technique for improving the reasoning performance of Large Language Models (LLMs) through increased test-time computation. Current state-of-the-art methods often...
- Fast Adversarial Training against Sparse Attacks Requires Loss Smoothing : Abstract: This paper studies fast adversarial training against sparse adversarial perturbations bounded by $l_0$ norm. We demonstrate the challenges of employing $1$-step attacks on $l_0$ bounded pert...
- (How) Do Language Models Track State? : Abstract: Transformer language models (LMs) exhibit behaviors -- from storytelling to code generation -- that seem to require tracking the unobserved state of an evolving world. How do they do this? W...
- A Multi-Stage Framework with Taxonomy-Guided Reasoning for Occupation Classification Using Large Language Models : Abstract: Automatically annotating job data with standardized occupations from taxonomies, known as occupation classification, is crucial for labor market analysis. However, this task is often hindere...
- Modelling Emotions in Face-to-Face Setting: The Interplay of Eye-Tracking, Personality, and Temporal Dynamics : Abstract: Accurate emotion recognition is pivotal for nuanced and engaging human-computer interactions, yet remains difficult to achieve, especially in dynamic, conversation-like settings. In this stu...
- Reinforcement Learning for Long-Horizon Unordered Tasks: From Boolean to Coupled Reward Machines : Abstract: Reward machines (RMs) inform reinforcement learning agents about the reward structure of the environment. This is particularly advantageous for complex non-Markovian tasks because agents wit...
- Discriminative Rule Learning for Outcome-Guided Process Model Discovery : Abstract: Event logs extracted from information systems offer a rich foundation for understanding and improving business processes. In many real-world applications, it is possible to distinguish betwe...
- An In-depth Study of LLM Contributions to the Bin Packing Problem : Abstract: Recent studies have suggested that Large Language Models (LLMs) could provide interesting ideas contributing to mathematical discovery. This claim was motivated by reports that LLM-based gen...
- ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use : Abstract: Recently, large language models (LLMs) have demonstrated remarkable problem-solving capabilities by autonomously integrating with external tools for collaborative reasoning. However, due to ...
- Realistic pedestrian-driver interaction modelling using multi-agent RL with human perceptual-motor constraints : Abstract: Modelling pedestrian-driver interactions is critical for understanding human road user behaviour and developing safe autonomous vehicle systems. Existing approaches often rely on rule-based ...
- Dialogue as Discovery: Navigating Human Intent Through Principled Inquiry : Abstract: A fundamental bottleneck in human-AI collaboration is the "intention expression gap," the difficulty for humans to effectively convey complex, high-dimensional thoughts to AI. This challenge...
- DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains : Abstract: Large Reasoning Models (LRMs) have demonstrated impressive capabilities but suffer from cognitive inefficiencies like "overthinking" simple problems and "underthinking" complex ones. Whi...
- GeoFM: Enhancing Geometric Reasoning of MLLMs via Synthetic Data Generation through Formal Language : Abstract: Multi-modal Large Language Models (MLLMs) have gained significant attention in both academia and industry for their capabilities in handling multi-modal tasks. However, these models face cha...
- Mechanics of Learned Reasoning 1: TempoBench, A Benchmark for Interpretable Deconstruction of Reasoning System Performance : Abstract: Large Language Models (LLMs) are increasingly excelling and outpacing human performance on many tasks. However, to improve LLM reasoning, researchers either rely on ad-hoc generated datasets...
- SIGMA: Search-Augmented On-Demand Knowledge Integration for Agentic Mathematical Reasoning : Abstract: Solving mathematical reasoning problems requires not only accurate access to relevant knowledge but also careful, multi-step thinking. However, current retrieval-augmented models often rely ...
- InnovatorBench: Evaluating Agents' Ability to Conduct Innovative LLM Research : Abstract: AI agents could accelerate scientific discovery by automating hypothesis formation, experiment design, coding, execution, and analysis, yet existing benchmarks probe narrow skills in simplif...
- VeriMoA: A Mixture-of-Agents Framework for Spec-to-HDL Generation : Abstract: Automation of Register Transfer Level (RTL) design can help developers meet increasing computational demands. Large Language Models (LLMs) show promise for Hardware Description Language (HDL...
- Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning : Abstract: Multimodal large language models (MLLMs) have advanced embodied agents by enabling direct perception, reasoning, and planning task-oriented actions from visual inputs. However, such vision d...
- Validity Is What You Need : Abstract: While AI agents have long been discussed and studied in computer science, today's Agentic AI systems are something new. We consider other definitions of Agentic AI and propose a new realist ...
- Interaction as Intelligence Part II: Asynchronous Human-Agent Rollout for Long-Horizon Task Training : Abstract: Large Language Model (LLM) agents have recently shown strong potential in domains such as automated coding, deep research, and graphical user interface manipulation. However, training them t...
- MolChord: Structure-Sequence Alignment for Protein-Guided Drug Design : Abstract: Structure-based drug design (SBDD), which maps target proteins to candidate molecular ligands, is a fundamental task in drug discovery. Effectively aligning protein structural representation...
- A Neural Architecture Search Method using Auxiliary Evaluation Metric based on ResNet Architecture : Abstract: This paper proposes a neural architecture search space using ResNet as a framework, with search objectives including parameters for convolution, pooling, fully connected layers, and connecti...
- A Transformer-based Neural Architecture Search Method : Abstract: This paper presents a neural architecture search method based on Transformer architecture, searching cross multihead attention computation ways for different number of encoder and decoder co...
- Detecting Prefix Bias in LLM-based Reward Models : Abstract: Reinforcement Learning with Human Feedback (RLHF) has emerged as a key paradigm for task-specific fine-tuning of language models using human preference data. While numerous publicly availabl...
- VeriStruct: AI-assisted Automated Verification of Data-Structure Modules in Verus : Abstract: We introduce VeriStruct, a novel framework that extends AI-assisted automated verification from single functions to more complex data structure modules in Verus. VeriStruct employs a planner...
- EARS-UDE: Evaluating Auditory Response in Sensory Overload with Universal Differential Equations : Abstract: Auditory sensory overload affects 50-70% of individuals with Autism Spectrum Disorder (ASD), yet existing approaches, such as mechanistic models (Hodgkin Huxley type, Wilson Cowan, excitatio...
- Reinforcement Learning for Accelerator Beamline Control: a simulation-based approach : Abstract: Particle accelerators play a pivotal role in advancing scientific research, yet optimizing beamline configurations to maximize particle transmission remains a labor-intensive task requiring ...
- Impact of clinical decision support systems (cdss) on clinical outcomes and healthcare delivery in low- and middle-income countries: protocol for a systematic review and meta-analysis : Abstract: Clinical decision support systems (CDSS) are used to improve clinical and service outcomes, yet evidence from low- and middle-income countries (LMICs) is dispersed. This protocol outlines me...
- Systematic Absence of Low-Confidence Nighttime Fire Detections in VIIRS Active Fire Product: Evidence of Undocumented Algorithmic Filtering : Abstract: The Visible Infrared Imaging Radiometer Suite (VIIRS) active fire product is widely used for global fire monitoring, yet its confidence classification scheme exhibits an undocumented systema...
- GACA-DiT: Diffusion-based Dance-to-Music Generation with Genre-Adaptive Rhythm and Context-Aware Alignment : Abstract: Dance-to-music (D2M) generation aims to automatically compose music that is rhythmically and temporally aligned with dance movements. Existing methods typically rely on coarse rhythm embeddi...
- See the Speaker: Crafting High-Resolution Talking Faces from Speech with Prior Guidance and Region Refinement : Abstract: Unlike existing methods that rely on source images as appearance references and use source speech to generate motion, this work proposes a novel approach that directly extracts information f...
- Cross-Corpus Validation of Speech Emotion Recognition in Urdu using Domain-Knowledge Acoustic Features : Abstract: Speech Emotion Recognition (SER) is a key affective computing technology that enables emotionally intelligent artificial intelligence. While SER is challenging in general, it is particularly...
- LeMat-Synth: a multi-modal toolbox to curate broad synthesis procedure databases from scientific literature : Abstract: The development of synthesis procedures remains a fundamental challenge in materials discovery, with procedural knowledge scattered across decades of scientific literature in unstructured fo...
- R3GAN-based Optimal Strategy for Augmenting Small Medical Dataset : Abstract: Medical image analysis often suffers from data scarcity and class imbalance, limiting the effectiveness of deep learning models in clinical applications. Using human embryo time-lapse imagin...
- VISAT: Benchmarking Adversarial and Distribution Shift Robustness in Traffic Sign Recognition with Visual Attributes : Abstract: We present VISAT, a novel open dataset and benchmarking suite for evaluating model robustness in the task of traffic sign recognition with the presence of visual attributes. Built upon the M...
- Diffusion-Driven Generation of Minimally Preprocessed Brain MRI : Abstract: The purpose of this study is to present and compare three denoising diffusion probabilistic models (DDPMs) that generate 3D $T_1$-weighted MRI human brain images. Three DDPMs were trained us...
- Category-Aware Semantic Caching for Heterogeneous LLM Workloads : Abstract: LLM serving systems process heterogeneous query workloads where different categories exhibit different characteristics. Code queries cluster densely in embedding space while conversational q...
- SpotIt: Evaluating Text-to-SQL Evaluation with Formal Verification : Abstract: Community-driven Text-to-SQL evaluation platforms play a pivotal role in tracking the state of the art of Text-to-SQL performance. The reliability of the evaluation process is critical for d...
- Accurate Target Privacy Preserving Federated Learning Balancing Fairness and Utility : Abstract: Federated Learning (FL) enables collaborative model training without data sharing, yet participants face a fundamental challenge, e.g., simultaneously ensuring fairness across demographic gr...
- CAS-Spec: Cascade Adaptive Self-Speculative Decoding for On-the-Fly Lossless Inference Acceleration of LLMs : Abstract: Speculative decoding has become widely adopted as an effective technique for lossless inference acceleration when deploying large language models (LLMs). While on-the-fly self-speculative ...
- Broken-Token: Filtering Obfuscated Prompts by Counting Characters-Per-Token : Abstract: Large Language Models (LLMs) are susceptible to jailbreak attacks where malicious prompts are disguised using ciphers and character-level encodings to bypass safety guardrails. While these g...
- Leveraging Foundation Models for Enhancing Robot Perception and Action : Abstract: This thesis investigates how foundation models can be systematically leveraged to enhance robotic capabilities, enabling more effective localization, interaction, and manipulation in unstruc...
- Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench : Abstract: Reading measurement instruments is effortless for humans and requires relatively little domain expertise, yet it remains surprisingly challenging for current vision-language models (VLMs) as...
- BI-DCGAN: A Theoretically Grounded Bayesian Framework for Efficient and Diverse GANs : Abstract: Generative Adversarial Networks (GANs) are proficient at generating synthetic data but continue to suffer from mode collapse, where the generator produces a narrow range of outputs that fool...
- How Similar Are Grokipedia and Wikipedia? A Multi-Dimensional Textual and Structural Comparison : Abstract: The launch of Grokipedia, an AI-generated encyclopedia developed by Elon Musk's xAI, was presented as a response to perceived ideological and structural biases in Wikipedia, aiming to produc...
- Heterogeneous Robot Collaboration in Unstructured Environments with Grounded Generative Intelligence : Abstract: Heterogeneous robot teams operating in realistic settings often must accomplish complex missions requiring collaboration and adaptation to information acquired online. Because robot teams fr...
- Scale-Aware Curriculum Learning for Data-Efficient Lung Nodule Detection with YOLOv11 : Abstract: Lung nodule detection in chest CT is crucial for early lung cancer diagnosis, yet existing deep learning approaches face challenges when deployed in clinical settings with limited annotated ...
- RepV: Safety-Separable Latent Spaces for Scalable Neurosymbolic Plan Verification : Abstract: As AI systems migrate to safety-critical domains, verifying that their actions comply with well-defined rules remains a challenge. Formal methods provide provable guarantees but demand hand-...
- Mind the Gaps: Auditing and Reducing Group Inequity in Large-Scale Mobility Prediction : Abstract: Next location prediction underpins a growing number of mobility, retail, and public-health applications, yet its societal impacts remain largely unexplored. In this paper, we audit state-of-...
- LLM-based Multi-class Attack Analysis and Mitigation Framework in IoT/IIoT Networks : Abstract: The Internet of Things has expanded rapidly, transforming communication and operations across industries but also increasing the attack surface and security breaches. Artificial Intelligence...
- Can machines think efficiently? : Abstract: The Turing Test is no longer adequate for distinguishing human and machine intelligence. With advanced artificial intelligence systems already passing the original Turing Test and contributi...
- Using Salient Object Detection to Identify Manipulative Cookie Banners that Circumvent GDPR : Abstract: The main goal of this paper is to study how often cookie banners that comply with the General Data Protection Regulation (GDPR) contain aesthetic manipulation, a design tactic to draw users'...
- Frame Semantic Patterns for Identifying Underreporting of Notifiable Events in Healthcare: The Case of Gender-Based Violence : Abstract: We introduce a methodology for the identification of notifiable events in the domain of healthcare. The methodology harnesses semantic frames to define fine-grained patterns and search them ...
- Overview of the MEDIQA-OE 2025 Shared Task on Medical Order Extraction from Doctor-Patient Consultations : Abstract: Clinical documentation increasingly uses automatic speech recognition and summarization, yet converting conversations into actionable medical orders for Electronic Health Records remains une...
- Fine-Grained Iterative Adversarial Attacks with Limited Computation Budget : Abstract: This work tackles a critical challenge in AI safety research under limited compute: given a fixed computation budget, how can one maximize the strength of iterative adversarial attacks? Coar...
- LLMs are Overconfident: Evaluating Confidence Interval Calibration with FermiEval : Abstract: Large language models (LLMs) excel at numerical estimation but struggle to correctly quantify uncertainty. We study how well LLMs construct confidence intervals around their own answers and ...
- AIOT based Smart Education System: A Dual Layer Authentication and Context-Aware Tutoring Framework for Learning Environments : Abstract: The AIoT-Based Smart Education System integrates Artificial Intelligence and IoT to address persistent challenges in contemporary classrooms: attendance fraud, lack of personalization, stude...
- A Framework for Fair Evaluation of Variance-Aware Bandit Algorithms : Abstract: Multi-armed bandit (MAB) problems serve as a fundamental building block for more complex reinforcement learning algorithms. However, evaluating and comparing MAB algorithms remains challengi...
- Jasmine: A Simple, Performant and Scalable JAX-based World Modeling Codebase : Abstract: While world models are increasingly positioned as a pathway to overcoming data scarcity in domains such as robotics, open training infrastructure for world modeling remains nascent. We intro...
- A Multi-Modal Neuro-Symbolic Approach for Spatial Reasoning-Based Visual Grounding in Robotics : Abstract: Visual reasoning, particularly spatial reasoning, is a challenging cognitive task that requires understanding object relationships and their interactions within complex environments, especia...
- Elastic Architecture Search for Efficient Language Models : Abstract: As large pre-trained language models become increasingly critical to natural language understanding (NLU) tasks, their substantial computational and memory requirements have raised significa...
- Dataset Creation and Baseline Models for Sexism Detection in Hausa : Abstract: Sexism reinforces gender inequality and social exclusion by perpetuating stereotypes, bias, and discriminatory norms. Noting how online platforms enable various forms of sexism to thrive, th...
- Detecting Data Contamination in LLMs via In-Context Learning : Abstract: We present Contamination Detection via Context (CoDeC), a practical and accurate method to detect and quantify training data contamination in large language models. CoDeC distinguishes betwe...
- Consistency Training Helps Stop Sycophancy and Jailbreaks : Abstract: An LLM's factuality and refusal training can be compromised by simple changes to a prompt. Models often adopt user beliefs (sycophancy) or satisfy inappropriate requests which are wrapped wi...
- Towards a Measure of Algorithm Similarity : Abstract: Given two algorithms for the same problem, can we determine whether they are meaningfully different? In full generality, the question is uncomputable, and empirically it is muddied by compet...
- Adapting Large Language Models to Emerging Cybersecurity using Retrieval Augmented Generation : Abstract: Security applications are increasingly relying on large language models (LLMs) for cyber threat detection; however, their opaque reasoning often limits trust, particularly in decisions that ...
- QiNN-QJ: A Quantum-inspired Neural Network with Quantum Jump for Multimodal Sentiment Analysis : Abstract: Quantum theory provides non-classical principles, such as superposition and entanglement, that inspire promising paradigms in machine learning. However, most existing quantum-inspired fusio...
- Expressive Range Characterization of Open Text-to-Audio Models : Abstract: Text-to-audio models are a type of generative model that produces audio output in response to a given textual prompt. Although level generators and the properties of the functional content t...
- AURA: A Reinforcement Learning Framework for AI-Driven Adaptive Conversational Surveys : Abstract: Conventional online surveys provide limited personalization, often resulting in low engagement and superficial responses. Although AI survey chatbots improve convenience, most are still reac...
- ZEBRA: Towards Zero-Shot Cross-Subject Generalization for Universal Brain Visual Decoding : Abstract: Recent advances in neural decoding have enabled the reconstruction of visual experiences from brain activity, positioning fMRI-to-image reconstruction as a promising bridge between neuroscie...
- Exploring Landscapes for Better Minima along Valleys : Abstract: Finding lower and better-generalizing minima is crucial for deep learning. However, most existing optimizers stop searching the parameter space once they reach a local minimum. Given the com...
- MARIA: A Framework for Marginal Risk Assessment without Ground Truth in AI Systems : Abstract: Before deploying an AI system to replace an existing process, it must be compared with the incumbent to ensure improvement without added risk. Traditional evaluation relies on ground truth f...
- Generating Accurate and Detailed Captions for High-Resolution Images : Abstract: Vision-language models (VLMs) often struggle to generate accurate and detailed captions for high-resolution images since they are typically pre-trained on low-resolution inputs (e.g., 224x22...
- H2-Cache: A Novel Hierarchical Dual-Stage Cache for High-Performance Acceleration of Generative Diffusion Models : Abstract: Diffusion models have emerged as state-of-the-art in image generation, but their practical deployment is hindered by the significant computational cost of their iterative denoising process. ...
- Adaptive Defense against Harmful Fine-Tuning for Large Language Models via Bayesian Data Scheduler : Abstract: Harmful fine-tuning poses critical safety risks to fine-tuning-as-a-service for large language models. Existing defense strategies preemptively build robustness via attack simulation but suf...
- FMint-SDE: A Multimodal Foundation Model for Accelerating Numerical Simulation of SDEs via Error Correction : Abstract: Fast and accurate simulation of dynamical systems is a fundamental challenge across scientific and engineering domains. Traditional numerical integrators often face a trade-off between accur...
- Dual-level Progressive Hardness-Aware Reweighting for Cross-View Geo-Localization : Abstract: Cross-view geo-localization (CVGL) between drone and satellite imagery remains challenging due to severe viewpoint gaps and the presence of hard negatives, which are visually similar but geo...
- Sparse Model Inversion: Efficient Inversion of Vision Transformers for Data-Free Applications : Abstract: Model inversion, which aims to reconstruct the original training data from pre-trained discriminative models, is especially useful when the original training data is unavailable due to priva...
- Unvalidated Trust: Cross-Stage Vulnerabilities in Large Language Model Architectures : Abstract: As Large Language Models (LLMs) are increasingly integrated into automated, multi-stage pipelines, risk patterns that arise from unvalidated trust between processing stages become a practica...
- Vectorized Online POMDP Planning : Abstract: Planning under partial observability is an essential capability of autonomous robots. The Partially Observable Markov Decision Process (POMDP) provides a powerful framework for planning unde...
- MemeArena: Automating Context-Aware Unbiased Evaluation of Harmfulness Understanding for Multimodal Large Language Models : Abstract: The proliferation of memes on social media necessitates the capabilities of multimodal Large Language Models (mLLMs) to effectively understand multimodal harmfulness. Existing evaluation app...
- Feature-Function Curvature Analysis: A Geometric Framework for Explaining Differentiable Models : Abstract: Explainable AI (XAI) is critical for building trust in complex machine learning models, yet mainstream attribution methods often provide an incomplete, static picture of a model's final stat...
- Multi-Modal Feature Fusion for Spatial Morphology Analysis of Traditional Villages via Hierarchical Graph Neural Networks : Abstract: Village areas hold significant importance in the study of human-land relationships. However, with the advancement of urbanization, the gradual disappearance of spatial characteristics and t...
- Privacy-Aware Continual Self-Supervised Learning on Multi-Window Chest Computed Tomography for Domain-Shift Robustness : Abstract: We propose a novel continual self-supervised learning (CSSL) framework for simultaneously learning diverse features from multi-window-obtained chest computed tomography (CT) images and ensur...
- Soft Task-Aware Routing of Experts for Equivariant Representation Learning : Abstract: Equivariant representation learning aims to capture variations induced by input transformations in the representation space, whereas invariant representation learning encodes semantic inform...
- DRAMA: Unifying Data Retrieval and Analysis for Open-Domain Analytic Queries : Abstract: Manually conducting real-world data analyses is labor-intensive and inefficient. Despite numerous attempts to automate data science workflows, none of the existing paradigms or systems fully...
- Vintage Code, Modern Judges: Meta-Validation in Low Data Regimes : Abstract: Application modernization in legacy languages such as COBOL, PL/I, and REXX faces an acute shortage of resources, both in expert availability and in high-quality human evaluation data. While...
- Beyond a Million Tokens: Benchmarking and Enhancing Long-Term Memory in LLMs : Abstract: Evaluating the abilities of large language models (LLMs) for tasks that require long-term memory and thus long-context reasoning, for example in conversational settings, is hampered by the e...
- Reconstructing Unseen Sentences from Speech-related Biosignals for Open-vocabulary Neural Communication : Abstract: Brain-to-speech (BTS) systems represent a groundbreaking approach to human communication by enabling the direct transformation of neural activity into linguistic expressions. While recent no...
- Not All Instances Are Equally Valuable: Towards Influence-Weighted Dataset Distillation : Abstract: Dataset distillation condenses large datasets into synthetic subsets, achieving performance comparable to training on the full dataset while substantially reducing storage and computation co...
- Languages are Modalities: Cross-Lingual Alignment via Encoder Injection : Abstract: Instruction-tuned Large Language Models (LLMs) underperform on low resource, non-Latin scripts due to tokenizer fragmentation and weak cross-lingual coupling. We present LLINK (Latent Langua...
- Higher-order Linear Attention : Abstract: The quadratic cost of scaled dot-product attention is a central obstacle to scaling autoregressive language models to long contexts. Linear-time attention and State Space Models (SSMs) provi...
- MedCalc-Eval and MedCalc-Env: Advancing Medical Calculation Capabilities of Large Language Models : Abstract: As large language models (LLMs) enter the medical domain, most benchmarks evaluate them on question answering or descriptive reasoning, overlooking quantitative reasoning critical to clinica...
- Why Do Multilingual Reasoning Gaps Emerge in Reasoning Language Models? : Abstract: Reasoning language models (RLMs) achieve strong performance on complex reasoning tasks, yet they still suffer from a multilingual reasoning gap, performing better in high-resource languages ...
- FOCUS: Efficient Keyframe Selection for Long Video Understanding : Abstract: Multimodal large language models (MLLMs) represent images and video frames as visual tokens. Scaling from single images to hour-long videos, however, inflates the token budget far beyond pra...
- HiF-DTA: Hierarchical Feature Learning Network for Drug-Target Affinity Prediction : Abstract: Accurate prediction of Drug-Target Affinity (DTA) is crucial for reducing experimental costs and accelerating early screening in computational drug discovery. While sequence-based deep learn...
- Can LLMs Help You at Work? A Sandbox for Evaluating LLM Agents in Enterprise Environments : Abstract: Enterprise systems are crucial for enhancing productivity and decision-making among employees and customers. Integrating LLM based systems into enterprise systems enables intelligent automat...
- Un-Attributability: Computing Novelty From Retrieval & Semantic Similarity : Abstract: Understanding how language-model outputs relate to the pretraining corpus is central to studying model behavior. Most training data attribution (TDA) methods ask which training examples caus...
- CASR-Net: An Image Processing-focused Deep Learning-based Coronary Artery Segmentation and Refinement Network for X-ray Coronary Angiogram : Abstract: Early detection of coronary artery disease (CAD) is critical for reducing mortality and improving patient treatment planning. While angiographic image analysis from X-rays is a common and co...
- Generative Semantic Coding for Ultra-Low Bitrate Visual Communication and Analysis : Abstract: We consider the problem of ultra-low bit rate visual communication for remote vision analysis, human interactions and control in challenging scenarios with very low communication bandwidth, ...
- Fine-Tuning Open Video Generators for Cinematic Scene Synthesis: A Small-Data Pipeline with LoRA and Wan2.1 I2V : Abstract: We present a practical pipeline for fine-tuning open-source video diffusion transformers to synthesize cinematic scenes for television and film production from small datasets. The proposed t...
- Measuring Chain-of-Thought Monitorability Through Faithfulness and Verbosity : Abstract: Chain-of-thought (CoT) outputs let us read a model's step-by-step reasoning. Since any long, serial reasoning process must pass through this textual trace, the quality of the CoT is a direct...
- Spiking Neural Networks: The Future of Brain-Inspired Computing : Abstract: Spiking Neural Networks (SNNs) represent the latest generation of neural computation, offering a brain-inspired alternative to conventional Artificial Neural Networks (ANNs). Unlike ANNs, wh...
- Balancing Knowledge Updates: Toward Unified Modular Editing in LLMs : Abstract: Knowledge editing has emerged as an efficient approach for updating factual knowledge in large language models (LLMs). It typically locates knowledge storage modules and then modifies their ...
- FedMuon: Accelerating Federated Learning with Matrix Orthogonalization : Abstract: The core bottleneck of Federated Learning (FL) lies in the communication rounds. That is, how to achieve more effective local updates is crucial for reducing communication rounds. Existing F...
- Atlas-Alignment: Making Interpretability Transferable Across Language Models : Abstract: Interpretability is crucial for building safe, reliable, and controllable language models, yet existing interpretability pipelines remain costly and difficult to scale. Interpreting a new mo...
- Who Does Your Algorithm Fail? Investigating Age and Ethnic Bias in the MAMA-MIA Dataset : Abstract: Deep learning models aim to improve diagnostic workflows, but fairness evaluation remains underexplored beyond classification, e.g., in image segmentation. Unaddressed segmentation bias can ...
- Learning Soft Robotic Dynamics with Active Exploration : Abstract: Soft robots offer unmatched adaptability and safety in unstructured environments, yet their compliant, high-dimensional, and nonlinear dynamics make modeling for control notoriously difficul...
- Mitigating Semantic Collapse in Partially Relevant Video Retrieval : Abstract: Partially Relevant Video Retrieval (PRVR) seeks videos where only part of the content matches a text query. Existing methods treat every annotated text-video pair as a positive and all other...
- CoMViT: An Efficient Vision Backbone for Supervised Classification in Medical Imaging : Abstract: Vision Transformers (ViTs) have demonstrated strong potential in medical imaging; however, their high computational demands and tendency to overfit on small datasets limit their applicabilit...
- VCORE: Variance-Controlled Optimization-based Reweighting for Chain-of-Thought Supervision : Abstract: Supervised fine-tuning (SFT) on long chain-of-thought (CoT) trajectories has emerged as a crucial technique for enhancing the reasoning abilities of large language models (LLMs). However, th...
- CATArena: Evaluation of LLM Agents through Iterative Tournament Competitions : Abstract: Large Language Model (LLM) agents have evolved from basic text generation to autonomously completing complex tasks through interaction with external tools. However, current benchmarks mainly...
- Inverse Knowledge Search over Verifiable Reasoning: Synthesizing a Scientific Encyclopedia from a Long Chains-of-Thought Knowledge Base : Abstract: Most scientific materials compress reasoning, presenting conclusions while omitting the derivational chains that justify them. This compression hinders verification by lacking explicit, step...
- The Denario project: Deep knowledge AI agents for scientific discovery : Abstract: We present Denario, an AI multi-agent system designed to serve as a scientific research assistant. Denario can perform many different tasks, such as generating ideas, checking the literature...
- Cognition Envelopes for Bounded AI Reasoning in Autonomous UAS Operations : Abstract: Cyber-physical systems increasingly rely on Foundational Models such as Large Language Models (LLMs) and Vision-Language Models (VLMs) to increase autonomy through enhanced perception, infer...
- SUSTAINABLE Platform: Seamless Smart Farming Integration Towards Agronomy Automation : Abstract: The global agricultural sector is undergoing a transformative shift, driven by increasing food demands, climate variability and the need for sustainable practices. SUSTAINABLE is a smart far...
- Causal Masking on Spatial Data: An Information-Theoretic Case for Learning Spatial Datasets with Unimodal Language Models : Abstract: Language models are traditionally designed around causal masking. In domains with spatial or relational structure, causal masking is often viewed as inappropriate, and sequential linearizati...
- e1: Learning Adaptive Control of Reasoning Effort : Abstract: Increasing the thinking budget of AI models can significantly improve accuracy, but not all questions warrant the same amount of reasoning. Users may prefer to allocate different amounts of ...
- Adaptive Data Flywheel: Applying MAPE Control Loops to AI Agent Improvement : Abstract: Enterprise AI agents must continuously adapt to maintain accuracy, reduce latency, and remain aligned with user needs. We present a practical implementation of a data flywheel in NVInfo AI, ...
- CombiGraph-Vis: A Curated Multimodal Olympiad Benchmark for Discrete Mathematical Reasoning : Abstract: State-of-the-art (SOTA) LLMs have progressed from struggling on proof-based Olympiad problems to solving most of the IMO 2025 problems, with leading systems reportedly handling 5 of 6 proble...
- Glia: A Human-Inspired AI for Automated Systems Design and Optimization : Abstract: Can an AI autonomously design mechanisms for computer systems on par with the creativity and reasoning of human experts? We present Glia, an AI architecture for networked systems design that...
- From product to system network challenges in system of systems lifecycle management : Abstract: Today, products are no longer isolated artifacts, but nodes in networked systems. This means that traditional, linearly conceived life cycle models are reaching their limits: Interoperabilit...
- Fints: Efficient Inference-Time Personalization for LLMs with Fine-Grained Instance-Tailored Steering : Abstract: The rapid evolution of large language models (LLMs) has intensified the demand for effective personalization techniques that can adapt model behavior to individual user preferences. Despite ...
- GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation : Abstract: While Multimodal Large Language Models (MLLMs) have advanced GUI navigation agents, current approaches face limitations in cross-domain generalization and effective history utilization. We p...
Research Sources: 415 | Generated: 11/3/2025
