AI Research News Feeds for November 17th, 2025

AI RESEARCH PAPERS & ACADEMIC SOURCES

Viper-F1: Fast and Fine-Grained Multimodal Understanding with Cross-Modal State-Space Modulation : Abstract: Recent advances in multimodal large language models (MLLMs) have enabled impressive progress in vision-language understanding, yet their high computational cost limits deployment in resource...
A Comparison of Lightweight Deep Learning Models for Particulate-Matter Nowcasting in the Indian Subcontinent & Surrounding Regions : Abstract: This paper is a submission for the Weather4Cast~2025 complementary Pollution Task and presents an efficient framework for 6-hour lead-time nowcasting of PM$_1$, PM$_{2.5}$, and PM$_{10}$ acr...
Computationally-efficient deep learning models for nowcasting of precipitation: A solution for the Weather4cast 2025 challenge : Abstract: This study presents a transfer-learning framework based on Convolutional Gated Recurrent Units (ConvGRU) for short-term rainfall prediction in the Weather4Cast 2025 competition. A single SEV...
Geospatial Chain of Thought Reasoning for Enhanced Visual Question Answering on Satellite Imagery : Abstract: Geospatial chain of thought (CoT) reasoning is essential for advancing Visual Question Answering (VQA) on satellite imagery, particularly in climate related applications such as disaster mon...
One-to-N Backdoor Attack in 3D Point Cloud via Spherical Trigger : Abstract: Backdoor attacks represent a critical threat to deep learning systems, particularly in safety-sensitive 3D domains such as autonomous driving and robotics. However, existing backdoor attacks...
MAFM^3: Modular Adaptation of Foundation Models for Multi-Modal Medical AI : Abstract: Foundational models are trained on extensive datasets to capture the general trends of a domain. However, in medical imaging, the scarcity of data makes pre-training for every domain, modali...
RealisticDreamer: Guidance Score Distillation for Few-shot Gaussian Splatting : Abstract: 3D Gaussian Splatting (3DGS) has recently gained great attention in the 3D scene representation for its high-quality real-time rendering capabilities. However, when the input comprises spars...
Positional Bias in Multimodal Embedding Models: Do They Favor the Beginning, the Middle, or the End? : Abstract: Positional bias - where models overemphasize certain positions regardless of content - has been shown to negatively impact model performance across various tasks. While recent research has e...
DoReMi: A Domain-Representation Mixture Framework for Generalizable 3D Understanding : Abstract: The generalization of 3D deep learning across multiple domains remains limited by the limited scale of existing datasets and the high heterogeneity of multi-source point clouds. Point clouds...
Parameter-Efficient MoE LoRA for Few-Shot Multi-Style Editing : Abstract: In recent years, image editing has garnered growing attention. However, general image editing models often fail to produce satisfactory results when confronted with new styles. The challenge...
Beyond Flatlands: Unlocking Spatial Intelligence by Decoupling 3D Reasoning from Numerical Regression : Abstract: Existing Vision Language Models (VLMs) architecturally rooted in "flatland" perception, fundamentally struggle to comprehend real-world 3D spatial intelligence. This failure stems from a dua...
Arcee: Differentiable Recurrent State Chain for Generative Vision Modeling with Mamba SSMs : Abstract: State-space models (SSMs), Mamba in particular, are increasingly adopted for long-context sequence modeling, providing linear-time aggregation via an input-dependent, causal selective-scan o...
CountSteer: Steering Attention for Object Counting in Diffusion Models : Abstract: Text-to-image diffusion models generate realistic and coherent images but often fail to follow numerical instructions in text, revealing a gap between language and visual representation. Int...
GraphPilot: Grounded Scene Graph Conditioning for Language-Based Autonomous Driving : Abstract: Vision-language models have recently emerged as promising planners for autonomous driving, where success hinges on topology-aware reasoning over spatial structure and dynamic interactions fr...
{\Phi}eat: Physically-Grounded Feature Representation : Abstract: Foundation models have emerged as effective backbones for many vision tasks. However, current self-supervised features entangle high-level semantics with low-level physical factors, such as ...
Coordinative Learning with Ordinal and Relational Priors for Volumetric Medical Image Segmentation : Abstract: Volumetric medical image segmentation presents unique challenges due to the inherent anatomical structure and limited availability of annotations. While recent methods have shown promise by ...
RTGaze: Real-Time 3D-Aware Gaze Redirection from a Single Image : Abstract: Gaze redirection methods aim to generate realistic human face images with controllable eye movement. However, recent methods often struggle with 3D consistency, efficiency, or quality, limit...
SimuFreeMark: A Noise-Simulation-Free Robust Watermarking Against Image Editing : Abstract: The advancement of artificial intelligence generated content (AIGC) has created a pressing need for robust image watermarking that can withstand both conventional signal processing and novel...
6D Strawberry Pose Estimation: Real-time and Edge AI Solutions Using Purely Synthetic Training Data : Abstract: Automated and selective harvesting of fruits has become an important area of research, particularly due to challenges such as high costs and a shortage of seasonal labor in advanced economie...
DocSLM: A Small Vision-Language Model for Long Multimodal Document Understanding : Abstract: Large Vision-Language Models (LVLMs) have demonstrated strong multimodal reasoning capabilities on long and complex documents. However, their high memory footprint makes them impractical for...
YCB-Ev SD: Synthetic event-vision dataset for 6DoF object pose estimation : Abstract: We introduce YCB-Ev SD, a synthetic dataset of event-camera data at standard definition (SD) resolution for 6DoF object pose estimation. While synthetic data has become fundamental in frame-...
Free3D: 3D Human Motion Emerges from Single-View 2D Supervision : Abstract: Recent 3D human motion generation models demonstrate remarkable reconstruction accuracy yet struggle to generalize beyond training distributions. This limitation arises partly from the use o...
Unsupervised Segmentation of Micro-CT Scans of Polyurethane Structures By Combining Hidden-Markov-Random Fields and a U-Net : Abstract: Extracting digital material representations from images is a necessary prerequisite for a quantitative analysis of material properties. Different segmentation approaches have been extensivel...
Disentangling Emotional Bases and Transient Fluctuations: A Low-Rank Sparse Decomposition Approach for Video Affective Analysis : Abstract: Video-based Affective Computing (VAC), vital for emotion analysis and human-computer interaction, suffers from model instability and representational degradation due to complex emotional dyn...
MicroVQA++: High-Quality Microscopy Reasoning Dataset with Weakly Supervised Graphs for Multimodal Large Language Model : Abstract: Multimodal Large Language Models are increasingly applied to biomedical imaging, yet scientific reasoning for microscopy remains limited by the scarcity of large-scale, high-quality training...
Q-Doc: Benchmarking Document Image Quality Assessment Capabilities in Multi-modal Large Language Models : Abstract: The rapid advancement of Multi-modal Large Language Models (MLLMs) has expanded their capabilities beyond high-level vision tasks. Nevertheless, their potential for Document Image Quality As...
Shrinking the Teacher: An Adaptive Teaching Paradigm for Asymmetric EEG-Vision Alignment : Abstract: Decoding visual features from EEG signals is a central challenge in neuroscience, with cross-modal alignment as the dominant approach. We argue that the relationship between visual and brain...
Comprehension of Multilingual Expressions Referring to Target Objects in Visual Inputs : Abstract: Referring Expression Comprehension (REC) requires models to localize objects in images based on natural language descriptions. Research on the area remains predominantly English-centric, des...
WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation : Abstract: Recent advances in unified multimodal models (UMMs) have enabled impressive progress in visual comprehension and generation. However, existing datasets and benchmarks focus primarily on sing...
Hi-DREAM: Brain Inspired Hierarchical Diffusion for fMRI Reconstruction via ROI Encoder and visuAl Mapping : Abstract: Mapping human brain activity to natural images offers a new window into vision and cognition, yet current diffusion-based decoders face a core difficulty: most condition directly on fMRI fea...
VP-Bench: A Comprehensive Benchmark for Visual Prompting in Multimodal Large Language Models : Abstract: Multimodal large language models (MLLMs) have enabled a wide range of advanced vision-language applications, including fine-grained object recognition and contextual understanding. When quer...
Rethinking Efficient Mixture-of-Experts for Remote Sensing Modality-Missing Classification : Abstract: Multimodal classification in remote sensing often suffers from missing modalities caused by environmental interference, sensor failures, or atmospheric effects, which severely degrade classi...
Sat2RealCity: Geometry-Aware and Appearance-Controllable 3D Urban Generation from Satellite Imagery : Abstract: Recent advances in generative modeling have substantially enhanced 3D urban generation, enabling applications in digital twins, virtual cities, and large-scale simulations. However, existing...
Multimodal Posterior Sampling-based Uncertainty in PD-L1 Segmentation from H&E Images : Abstract: Accurate assessment of PD-L1 expression is critical for guiding immunotherapy, yet current immunohistochemistry (IHC) based methods are resource-intensive. We present nnUNet-B: a Bayesian se...
OpenUS: A Fully Open-Source Foundation Model for Ultrasound Image Analysis via Self-Adaptive Masked Contrastive Learning : Abstract: Ultrasound (US) is one of the most widely used medical imaging modalities, thanks to its low cost, portability, real-time feedback, and absence of ionizing radiation. However, US image inter...
Bridging Hidden States in Vision-Language Models : Abstract: Vision-Language Models (VLMs) are a new family of models that align image content with natural language. Existing approaches typically fuse either (a) early: by mixing tokens/features inside...
LARM: A Large Articulated-Object Reconstruction Model : Abstract: Modeling 3D articulated objects with realistic geometry, textures, and kinematics is essential for a wide range of applications. However, existing optimization-based reconstruction methods o...
DualVision ArthroNav: Investigating Opportunities to Enhance Localization and Reconstruction in Image-based Arthroscopy Navigation via External Cameras : Abstract: Arthroscopic procedures can greatly benefit from navigation systems that enhance spatial awareness, depth perception, and field of view. However, existing optical tracking solutions impose s...
Attentive Feature Aggregation or: How Policies Learn to Stop Worrying about Robustness and Attend to Task-Relevant Visual Cues : Abstract: The adoption of pre-trained visual representations (PVRs), leveraging features from large-scale vision models, has become a popular paradigm for training visuomotor policies. However, these ...
From Attention to Frequency: Integration of Vision Transformer and FFT-ReLU for Enhanced Image Deblurring : Abstract: Image deblurring is vital in computer vision, aiming to recover sharp images from blurry ones caused by motion or camera shake. While deep learning approaches such as CNNs and Vision Transfo...
Boosting Neural Video Representation via Online Structural Reparameterization : Abstract: Neural Video Representation~(NVR) is a promising paradigm for video compression, showing great potential in improving video storage and transmission efficiency. While recent advances have ma...
AccKV: Towards Efficient Audio-Video LLMs Inference via Adaptive-Focusing and Cross-Calibration KV Cache Optimization : Abstract: Recent advancements in Audio-Video Large Language Models (AV-LLMs) have enhanced their capabilities in tasks like audio-visual question answering and multimodal dialog systems. Video and aud...
Deep Learning-Enhanced Analysis for Delineating Anticoagulant Essay Efficacy Using Phase Microscopy : Abstract: The coagulation of blood after it is drawn from the body poses a significant challenge for hematological analysis, potentially leading to inaccurate test results and altered cellular charact...
Unsupervised Motion-Compensated Decomposition for Cardiac MRI Reconstruction via Neural Representation : Abstract: Cardiac magnetic resonance (CMR) imaging is widely used to characterize cardiac morphology and function. To accelerate CMR imaging, various methods have been proposed to recover high-quality...
Rethinking Progression of Memory State in Robotic Manipulation: An Object-Centric Perspective : Abstract: As embodied agents operate in increasingly complex environments, the ability to perceive, track, and reason about individual object instances over time becomes essential, especially in tasks...
Collaborative Representation Learning for Alignment of Tactile, Language, and Vision Modalities : Abstract: Tactile sensing offers rich and complementary information to vision and language, enabling robots to perceive fine-grained object properties. However, existing tactile sensors lack standardi...
DENTEX: Dental Enumeration and Tooth Pathosis Detection Benchmark for Panoramic X-ray : Abstract: Panoramic X-rays are frequently used in dentistry for treatment planning, but their interpretation can be both time-consuming and prone to error. Artificial intelligence (AI) has the potenti...
Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications : Abstract: Understanding how the surrounding environment changes is crucial for performing downstream tasks safely and reliably in autonomous driving applications. Recent occupancy estimation technique...
Diff-IP2D: Diffusion-Based Hand-Object Interaction Prediction on Egocentric Videos : Abstract: Understanding how humans would behave during hand-object interaction is vital for applications in service robot manipulation and extended reality. To achieve this, some recent works have bee...
MADiff: Motion-Aware Mamba Diffusion Models for Hand Trajectory Prediction on Egocentric Videos : Abstract: Understanding human intentions and actions through egocentric videos is important on the path to embodied artificial intelligence. As a branch of egocentric vision techniques, hand trajector...
RiverScope: High-Resolution River Masking Dataset : Abstract: Surface water dynamics play a critical role in Earth's climate system, influencing ecosystems, agriculture, disaster resilience, and sustainable development. Yet monitoring rivers and surfac...
NeuS-QA: Grounding Long-Form Video Understanding in Temporal Logic and Neuro-Symbolic Reasoning : Abstract: While vision-language models (VLMs) excel at tasks involving single images or short videos, they still struggle with Long Video Question Answering (LVQA) due to its demand for complex multi-...
TEyeD: Over 20 million real-world eye images with Pupil, Eyelid, and Iris 2D and 3D Segmentations, 2D and 3D Landmarks, 3D Eyeball, Gaze Vector, and Eye Movement Types : Abstract: We present TEyeD, the world's largest unified public data set of eye images taken with head-mounted devices. TEyeD was acquired with seven different head-mounted eye trackers. Among them, tw...
ERMoE: Eigen-Reparameterized Mixture-of-Experts for Stable Routing and Interpretable Specialization : Abstract: Mixture-of-Experts (MoE) architectures expand model capacity by sparsely activating experts but face two core challenges: misalignment between router logits and each expert's internal struct...
Preserving Cross-Modal Consistency for CLIP-based Class-Incremental Learning : Abstract: Class-incremental learning (CIL) enables models to continuously learn new categories from sequential tasks without forgetting previously acquired knowledge. While recent advances in vision-l...
Rethinking Autoregressive Models for Lossless Image Compression via Hierarchical Parallelism and Progressive Adaptation : Abstract: Autoregressive (AR) models, the theoretical performance benchmark for learned lossless image compression, are often dismissed as impractical due to prohibitive computational cost. This work ...
CLUE: Controllable Latent space of Unprompted Embeddings for Diversity Management in Text-to-Image Synthesis : Abstract: Text-to-image synthesis models require the ability to generate diverse images while maintaining stability. To overcome this challenge, a number of methods have been proposed, including the c...
EmoVid: A Multimodal Emotion Video Dataset for Emotion-Centric Video Understanding and Generation : Abstract: Emotion plays a pivotal role in video-based expression, but existing video generation systems predominantly focus on low-level visual metrics while neglecting affective dimensions. Although ...
MeCaMIL: Causality-Aware Multiple Instance Learning for Fair and Interpretable Whole Slide Image Diagnosis : Abstract: Multiple instance learning (MIL) has emerged as the dominant paradigm for whole slide image (WSI) analysis in computational pathology, achieving strong diagnostic performance through patch-l...
Draft and Refine with Visual Experts : Abstract: While recent Large Vision-Language Models (LVLMs) exhibit strong multimodal reasoning abilities, they often produce ungrounded or hallucinated responses because they rely too heavily on ling...
SP-Guard: Selective Prompt-adaptive Guidance for Safe Text-to-Image Generation : Abstract: While diffusion-based T2I models have achieved remarkable image generation quality, they also enable easy creation of harmful content, raising social concerns and highlighting the need for s...
SUPER Decoder Block for Reconstruction-Aware U-Net Variants : Abstract: Skip-connected encoder-decoder architectures (U-Net variants) are widely adopted for inverse problems but still suffer from information loss, limiting recovery of fine high-frequency details...
EmbryoDiff: A Conditional Diffusion Framework with Multi-Focal Feature Fusion for Fine-Grained Embryo Developmental Stage Recognition : Abstract: Identification of fine-grained embryo developmental stages during In Vitro Fertilization (IVF) is crucial for assessing embryo viability. Although recent deep learning methods have achieved ...
Accelerating Controllable Generation via Hybrid-grained Cache : Abstract: Controllable generative models have been widely used to improve the realism of synthetic visual content. However, such models must handle control conditions and content generation computatio...
MPCGNet: A Multiscale Feature Extraction and Progressive Feature Aggregation Network Using Coupling Gates for Polyp Segmentation : Abstract: Automatic segmentation methods of polyps is crucial for assisting doctors in colorectal polyp screening and cancer diagnosis. Despite the progress made by existing methods, polyp segmentatio...
Hyperbolic Hierarchical Alignment Reasoning Network for Text-3D Retrieval : Abstract: With the daily influx of 3D data on the internet, text-3D retrieval has gained increasing attention. However, current methods face two major challenges: Hierarchy Representation Collapse (HR...
NP-LoRA: Null Space Projection Unifies Subject and Style in LoRA Fusion : Abstract: Low-Rank Adaptation (LoRA) fusion has emerged as a key technique for reusing and composing learned subject and style representations for controllable generation without costly retraining. Ho...
CareCom: Generative Image Composition with Calibrated Reference Features : Abstract: Image composition aims to seamlessly insert foreground object into background. Despite the huge progress in generative image composition, the existing methods are still struggling with simul...
Evaluating Latent Generative Paradigms for High-Fidelity 3D Shape Completion from a Single Depth Image : Abstract: While generative models have seen significant adoption across a wide range of data modalities, including 3D data, a consensus on which model is best suited for which task has yet to be reach...
Phys-Liquid: A Physics-Informed Dataset for Estimating 3D Geometry and Volume of Transparent Deformable Liquids : Abstract: Estimating the geometric and volumetric properties of transparent deformable liquids is challenging due to optical complexities and dynamic surface deformations induced by container movement...
SplineSplat: 3D Ray Tracing for Higher-Quality Tomography : Abstract: We propose a method to efficiently compute tomographic projections of a 3D volume represented by a linear combination of shifted B-splines. To do so, we propose a ray-tracing algorithm that ...
A Space-Time Transformer for Precipitation Forecasting : Abstract: Meteorological agencies around the world rely on real-time flood guidance to issue live-saving advisories and warnings. For decades traditional numerical weather prediction (NWP) models have...
Machine-Learning Based Detection of Coronary Artery Calcification Using Synthetic Chest X-Rays : Abstract: Coronary artery calcification (CAC) is a strong predictor of cardiovascular events, with CT-based Agatston scoring widely regarded as the clinical gold standard. However, CT is costly and im...
Detection of Bark Beetle Attacks using Hyperspectral PRISMA Data and Few-Shot Learning : Abstract: Bark beetle infestations represent a serious challenge for maintaining the health of coniferous forests. This paper proposes a few-shot learning approach leveraging contrastive learning to d...
Toward Generalized Detection of Synthetic Media: Limitations, Challenges, and the Path to Multimodal Solutions : Abstract: Artificial intelligence (AI) in media has advanced rapidly over the last decade. The introduction of Generative Adversarial Networks (GANs) improved the quality of photorealistic image gener...
Stroke Modeling Enables Vectorized Character Generation with Large Vectorized Glyph Model : Abstract: Vectorized glyphs are widely used in poster design, network animation, art display, and various other fields due to their scalability and flexibility. In typography, they are often seen as s...
Hindsight Distillation Reasoning with Knowledge Encouragement Preference for Knowledge-based Visual Question Answering : Abstract: Knowledge-based Visual Question Answering (KBVQA) necessitates external knowledge incorporation beyond cross-modal understanding. Existing KBVQA methods either utilize implicit knowledge in ...
Reverberation: Learning the Latencies Before Forecasting Trajectories : Abstract: Bridging the past to the future, connecting agents both spatially and temporally, lies at the core of the trajectory prediction task. Despite great efforts, it remains challenging to explici...
Explainable Deep Convolutional Multi-Type Anomaly Detection : Abstract: Most explainable anomaly detection methods often identify anomalies but lack the capability to differentiate the type of anomaly. Furthermore, they often require the costly training and main...
CATS-V2V: A Real-World Vehicle-to-Vehicle Cooperative Perception Dataset with Complex Adverse Traffic Scenarios : Abstract: Vehicle-to-Vehicle (V2V) cooperative perception has great potential to enhance autonomous driving performance by overcoming perception limitations in complex adverse traffic scenarios (CATS)...
Dynamic Gaussian Scene Reconstruction from Unsynchronized Videos : Abstract: Multi-view video reconstruction plays a vital role in computer vision, enabling applications in film production, virtual reality, and motion analysis. While recent advances such as 4D Gaussi...
Neuro-Spectral Architectures for Causal Physics-Informed Networks : Abstract: Physics-Informed Neural Networks (PINNs) have emerged as a powerful framework for solving partial differential equations (PDEs). However, standard MLP-based PINNs often fail to converge when...
NervePool: A Simplicial Pooling Layer : Abstract: For deep learning problems on graph-structured data, pooling layers are important for down sampling, reducing computational cost, and to minimize overfitting. We define a pooling layer, nerv...
CHNNet: An Artificial Neural Network With Connected Hidden Neurons : Abstract: In contrast to biological neural circuits, conventional artificial neural networks are commonly organized as strictly hierarchical architectures that exclude direct connections among neurons...
DiAReL: Reinforcement Learning with Disturbance Awareness for Robust Sim2Real Policy Transfer in Robot Control : Abstract: Delayed Markov decision processes (DMDPs) fulfill the Markov property by augmenting the state space of agents with a finite time window of recently committed actions. In reliance on these st...
Mirror Descent Algorithms with Nearly Dimension-Independent Rates for Differentially-Private Stochastic Saddle-Point Problems : Abstract: We study the problem of differentially-private (DP) stochastic (convex-concave) saddle-points in the $\ell_1$ setting. We propose $(\varepsilon, δ)$-DP algorithms based on stochastic mirror ...
Bayesian ICA with super-Gaussian Source Priors : Abstract: Independent Component Analysis (ICA) plays a central role in modern machine learning as a flexible framework for feature extraction. We introduce a horseshoe-type prior with a latent Polya-G...
Adaptive Parametric Activation: Unifying and Generalising Activation Functions Across Tasks : Abstract: The activation function plays a crucial role in model optimisation, yet the optimal choice remains unclear. For example, the Sigmoid activation is the de-facto activation in balanced classif...
Optimizing importance weighting in the presence of sub-population shifts : Abstract: A distribution shift between the training and test data can severely harm performance of machine learning models. Importance weighting addresses this issue by assigning different weights to ...
The Computational Advantage of Depth: Learning High-Dimensional Hierarchical Functions with Gradient Descent : Abstract: Understanding the advantages of deep neural networks trained by gradient descent (GD) compared to shallow models remains an open theoretical challenge. In this paper, we introduce a class of...
CNN-Enabled Scheduling for Probabilistic Real-Time Guarantees in Industrial URLLC : Abstract: Ensuring packet-level communication quality is vital for ultra-reliable, low-latency communications (URLLC) in large-scale industrial wireless networks. We enhance the Local Deadline Partiti...
Information Extraction From Fiscal Documents Using LLMs : Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in text comprehension, but their ability to process complex, hierarchical tabular data remains underexplored. We presen...
Grounded Visual Factualization: Factual Anchor-Based Finetuning for Enhancing MLLM Factual Consistency : Abstract: Visual hallucination, where Multimodal Large Language Models fabricate details inconsistent with image content, critically undermines their reliability. Existing fine-tuning methods offer li...
Large language models in materials science and the need for open-source approaches : Abstract: Large language models (LLMs) are rapidly transforming materials science. This review examines recent LLM applications across the materials discovery pipeline, focusing on three key areas: mi...
SpiderGen: Towards Procedure Generation For Carbon Life Cycle Assessments with Generative AI : Abstract: Investigating the effects of climate change and global warming caused by GHG emissions have been a primary concern worldwide. These emissions are largely contributed to by the production, us...
A methodological analysis of prompt perturbations and their effect on attack success rates : Abstract: This work aims to investigate how different Large Language Models (LLMs) alignment methods affect the models' responses to prompt attacks. We selected open source models based on the most co...
Modeling and Predicting Multi-Turn Answer Instability in Large Language Models : Abstract: As large language models (LLMs) are adopted in an increasingly wide range of applications, user-model interactions have grown in both frequency and scale. Consequently, research has focused ...
Where does an LLM begin computing an instruction? : Abstract: Following an instruction involves distinct sub-processes, such as reading content, reading the instruction, executing it, and producing an answer. We ask where, along the layer stack, instru...
"As Eastern Powers, I will veto." : An Investigation of Nation-level Bias of Large Language Models in International Relations : Abstract: This paper systematically examines nation-level biases exhibited by Large Language Models (LLMs) within the domain of International Relations (IR). Leveraging historical records from the Uni...
Faithful Summarization of Consumer Health Queries: A Cross-Lingual Framework with LLMs : Abstract: Summarizing consumer health questions (CHQs) can ease communication in healthcare, but unfaithful summaries that misrepresent medical details pose serious risks. We propose a framework that ...
Sabi\'a: Um Chatbot de Intelig\^encia Artificial Generativa para Suporte no Dia a Dia do Ensino Superior : Abstract: Students often report difficulties in accessing day-to-day academic information, which is usually spread across numerous institutional documents and websites. This fragmentation results in a...
LLM-as-a-Grader: Practical Insights from Large Language Model for Short-Answer and Report Evaluation : Abstract: Large Language Models (LLMs) are increasingly explored for educational tasks such as grading, yet their alignment with human evaluation in real classrooms remains underexamined. In this stud...
Tracing Multilingual Representations in LLMs with Cross-Layer Transcoders : Abstract: Multilingual Large Language Models (LLMs) can process many languages, yet how they internally represent this diversity remains unclear. Do they form shared multilingual representations with ...
From Fact to Judgment: Investigating the Impact of Task Framing on LLM Conviction in Dialogue Systems : Abstract: LLMs are increasingly employed as judges across a variety of tasks, including those involving everyday social interactions. Yet, it remains unclear whether such LLM-judges can reliably asses...
MedPath: Multi-Domain Cross-Vocabulary Hierarchical Paths for Biomedical Entity Linking : Abstract: Progress in biomedical Named Entity Recognition (NER) and Entity Linking (EL) is currently hindered by a fragmented data landscape, a lack of resources for building explainable models, and t...
From Proof to Program: Characterizing Tool-Induced Reasoning Hallucinations in Large Language Models : Abstract: Tool-augmented Language Models (TaLMs) can invoke external tools to solve problems beyond their parametric capacity. However, it remains unclear whether these tool-enabled gains reflect trus...
Multimodal Peer Review Simulation with Actionable To-Do Recommendations for Community-Aware Manuscript Revisions : Abstract: While large language models (LLMs) offer promising capabilities for automating academic workflows, existing systems for academic peer review remain constrained by text-only inputs, limited c...
Can LLMs Detect Their Own Hallucinations? : Abstract: Large language models (LLMs) can generate fluent responses, but sometimes hallucinate facts. In this paper, we investigate whether LLMs can detect their own hallucinations. We formulate hall...
Analysing Personal Attacks in U.S. Presidential Debates : Abstract: Personal attacks have become a notable feature of U.S. presidential debates and play an important role in shaping public perception during elections. Detecting such attacks can improve trans...
Enhancing Meme Emotion Understanding with Multi-Level Modality Enhancement and Dual-Stage Modal Fusion : Abstract: With the rapid rise of social media and Internet culture, memes have become a popular medium for expressing emotional tendencies. This has sparked growing interest in Meme Emotion Understand...
Speech-Aware Long Context Pruning and Integration for Contextualized Automatic Speech Recognition : Abstract: Automatic speech recognition (ASR) systems have achieved remarkable performance in common conditions but often struggle to leverage long-context information in contextualized scenarios that ...
Adverbs Revisited: Enhancing WordNet Coverage of Adverbs with a Supersense Taxonomy : Abstract: WordNet offers rich supersense hierarchies for nouns and verbs, yet adverbs remain underdeveloped, lacking a systematic semantic classification. We introduce a linguistically grounded supers...
LANE: Lexical Adversarial Negative Examples for Word Sense Disambiguation : Abstract: Fine-grained word meaning resolution remains a critical challenge for neural language models (NLMs) as they often overfit to global sentence representations, failing to capture local semanti...
destroR: Attacking Transfer Models with Obfuscous Examples to Discard Perplexity : Abstract: Advancements in Machine Learning & Neural Networks in recent years have led to widespread implementations of Natural Language Processing across a variety of fields with remarkable success, s...
LaoBench: A Large-Scale Multidimensional Lao Benchmark for Large Language Models : Abstract: The rapid advancement of large language models (LLMs) has not been matched by their evaluation in low-resource languages, especially Southeast Asian languages like Lao. To fill this gap, we ...
Studies with impossible languages falsify LMs as models of human language : Abstract: According to Futrell and Mahowald [arXiv:2501.17047], both infants and language models (LMs) find attested languages easier to learn than impossible languages that have unnatural structures....
MajinBook: An open catalogue of digital world literature with likes : Abstract: This data paper introduces MajinBook, an open catalogue designed to facilitate the use of shadow libraries--such as Library Genesis and Z-Library--for computational social science and cultur...
Proactive Hearing Assistants that Isolate Egocentric Conversations : Abstract: We introduce proactive hearing assistants that automatically identify and separate the wearer's conversation partners, without requiring explicit prompts. Our system operates on egocentric b...
W2S-AlignTree: Weak-to-Strong Inference-Time Alignment for Large Language Models via Monte Carlo Tree Search : Abstract: Large Language Models (LLMs) demonstrate impressive capabilities, yet their outputs often suffer from misalignment with human preferences due to the inadequacy of weak supervision and a lack...
PRBench: Large-Scale Expert Rubrics for Evaluating High-Stakes Professional Reasoning : Abstract: Frontier model progress is often measured by academic benchmarks, which offer a limited view of performance in real-world professional contexts. Existing evaluations often fail to assess ope...
CLARITY: Contextual Linguistic Adaptation and Accent Retrieval for Dual-Bias Mitigation in Text-to-Speech Generation : Abstract: Instruction-guided text-to-speech (TTS) research has reached a maturity level where excellent speech generation quality is possible on demand, yet two coupled biases persist: accent bias, wh...
Discovering Meaningful Units with Visually Grounded Semantics from Image Captions : Abstract: Fine-grained knowledge is crucial for vision-language models to obtain a better understanding of the real world. While there has been work trying to acquire this kind of knowledge in the spa...
Language-Aided State Estimation : Abstract: Natural language data, such as text and speech, have become readily available through social networking services and chat platforms. By leveraging human observations expressed in natural lan...
From Synthetic Scenes to Real Performance: Enhancing Spatial Reasoning in VLMs : Abstract: Fine-tuning Vision-Language Models (VLMs) is a common strategy to improve performance following an ad-hoc data collection and annotation of real-world scenes. However, this process is often ...
DocLens : A Tool-Augmented Multi-Agent Framework for Long Visual Document Understanding : Abstract: Comprehending long visual documents, where information is distributed across extensive pages of text and visual elements, is a critical but challenging task for modern Vision-Language Models...
Metric Learning Encoding Models: A Multivariate Framework for Interpreting Neural Representations : Abstract: Understanding how explicit theoretical features are encoded in opaque neural systems is a central challenge now common to neuroscience and AI. We introduce Metric Learning Encoding Models (M...
Improving the Downstream Performance of Mixture-of-Experts Transformers via Weak Vanilla Transformers : Abstract: Recently, Mixture of Experts (MoE) Transformers have garnered increasing attention due to their advantages in model capacity and computational efficiency. However, studies have indicated tha...
Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness : Abstract: Large language models (LLM) have demonstrated remarkable capabilities in various biomedical natural language processing (NLP) tasks, leveraging the demonstration within the input context to ...
RASTeR: Robust, Agentic, and Structured Temporal Reasoning : Abstract: Temporal question answering (TQA) remains a challenge for large language models (LLMs), particularly when retrieved content may be irrelevant, outdated, or temporally inconsistent. This is e...
Computational Analysis of Gender Depiction in the Comedias of Calder\'on de la Barca : Abstract: In theatre, playwrights use the portrayal of characters to explore culturally based gender norms. In this paper, we develop quantitative methods to study gender depiction in the non-religiou...
A Mathematical Framework for AI Singularity: Conditions, Bounds, and Control of Recursive Improvement : Abstract: AI systems improve by drawing on more compute, data, energy, and better training methods. This paper asks a precise, testable version of the "runaway growth" question: under what measurable ...
Semantic VLM Dataset for Safe Autonomous Driving : Abstract: CAR-Scenes is a frame-level dataset for autonomous driving that enables training and evaluation of vision-language models (VLMs) for interpretable, scene-level understanding. We annotate 5,1...
Expert Consensus-based Video-Based Assessment Tool for Workflow Analysis in Minimally Invasive Colorectal Surgery: Development and Validation of ColoWorkflow : Abstract: Minimally invasive colorectal surgery is characterized by procedural variability, a difficult learning curve, and complications that impact quality and outcomes. Video-based assessment (VBA)...
Frequency-Aware Vision-Language Multimodality Generalization Network for Remote Sensing Image Classification : Abstract: The booming remote sensing (RS) technology is giving rise to a novel multimodality generalization task, which requires the model to overcome data heterogeneity while possessing powerful cros...
GFT: Graph Feature Tuning for Efficient Point Cloud Analysis : Abstract: Parameter-efficient fine-tuning (PEFT) significantly reduces computational and memory costs by updating only a small subset of the model's parameters, enabling faster adaptation to new tasks...
YOLO-Drone: An Efficient Object Detection Approach Using the GhostHead Network for Drone Images : Abstract: Object detection using images or videos captured by drones is a promising technology with significant potential across various industries. However, a major challenge is that drone images are...
PhaseWin Search Framework Enable Efficient Object-Level Interpretation : Abstract: Attribution is essential for interpreting object-level foundation models. Recent methods based on submodular subset selection have achieved high faithfulness, but their efficiency limitation...
Out-of-Distribution Detection with Positive and Negative Prompt Supervision Using Large Language Models : Abstract: Out-of-distribution (OOD) detection is committed to delineating the classification boundaries between in-distribution (ID) and OOD images. Recent advances in vision-language models (VLMs) ha...
Facial Expression Recognition with YOLOv11 and YOLOv12: A Comparative Study : Abstract: Facial Expression Recognition remains a challenging task, especially in unconstrained, real-world environments. This study investigates the performance of two lightweight models, YOLOv11n an...
Heterogeneous Complementary Distillation : Abstract: Knowledge distillation (KD)transfers the dark knowledge from a complex teacher to a compact student. However, heterogeneous architecture distillation, such as Vision Transformer (ViT) to Res...
Divide, Conquer and Unite: Hierarchical Style-Recalibrated Prototype Alignment for Federated Medical Image Segmentation : Abstract: Federated learning enables multiple medical institutions to train a global model without sharing data, yet feature heterogeneity from diverse scanners or protocols remains a major challenge....
Abstract 3D Perception for Spatial Intelligence in Vision-Language Models : Abstract: Vision-language models (VLMs) struggle with 3D-related tasks such as spatial cognition and physical understanding, which are crucial for real-world applications like robotics and embodied ag...
DEFT-LLM: Disentangled Expert Feature Tuning for Micro-Expression Recognition : Abstract: Micro expression recognition (MER) is crucial for inferring genuine emotion. Applying a multimodal large language model (MLLM) to this task enables spatio-temporal analysis of facial motion ...
Language-Guided Graph Representation Learning for Video Summarization : Abstract: With the rapid growth of video content on social media, video summarization has become a crucial task in multimedia processing. However, existing methods face challenges in capturing global ...
Towards Universal Neural Operators through Multiphysics Pretraining : Abstract: Although neural operators are widely used in data-driven physical simulations, their training remains computationally expensive. Recent advances address this issue via downstream learning, w...
Benchmarking Quantum Kernels Across Diverse and Complex Data : Abstract: Quantum kernel methods are a promising branch of quantum machine learning, yet their practical advantage on diverse, high-dimensional, real-world data remains unverified. Current research ha...
SURFACEBENCH: Can Self-Evolving LLMs Find the Equations of 3D Scientific Surfaces? : Abstract: Equation discovery from data is a core challenge in machine learning for science, requiring the recovery of concise symbolic expressions that govern complex physical and geometric phenomena....
EarthSight: A Distributed Framework for Low-Latency Satellite Intelligence : Abstract: Low-latency delivery of satellite imagery is essential for time-critical applications such as disaster response, intelligence, and infrastructure monitoring. However, traditional pipelines r...
ExPairT-LLM: Exact Learning for LLM Code Selection by Pairwise Queries : Abstract: Despite recent advances in LLMs, the task of code generation is still challenging. To cope, code selection algorithms select the best program from multiple programs generated by an LLM. Howe...
Private Zeroth-Order Optimization with Public Data : Abstract: One of the major bottlenecks for deploying popular first-order differentially private (DP) machine learning algorithms (e.g., DP-SGD) lies in their high computation and memory cost, despite ...
Go-UT-Bench: A Fine-Tuning Dataset for LLM-Based Unit Test Generation in Go : Abstract: Training data imbalance poses a major challenge for code LLMs. Most available data heavily over represents raw opensource code while underrepresenting broader software engineering tasks, esp...
Multi-Joint Physics-Informed Deep Learning Framework for Time-Efficient Inverse Dynamics : Abstract: Time-efficient estimation of muscle activations and forces across multi-joint systems is critical for clinical assessment and assistive device control. However, conventional approaches are c...
Multi-View Polymer Representations for the Open Polymer Prediction : Abstract: We address polymer property prediction with a multi-view design that exploits complementary representations. Our system integrates four families: (i) tabular RDKit/Morgan descriptors, (ii) g...
Graph Attention Network for Predicting Duration of Large-Scale Power Outages Induced by Natural Disasters : Abstract: Natural disasters such as hurricanes, wildfires, and winter storms have induced large-scale power outages in the U.S., resulting in tremendous economic and societal impacts. Accurately predi...
Towards Federated Clustering: A Client-wise Private Graph Aggregation Framework : Abstract: Federated clustering addresses the critical challenge of extracting patterns from decentralized, unlabeled data. However, it is hampered by the flaw that current approaches are forced to acc...
Cascading Bandits With Feedback : Abstract: Motivated by the challenges of edge inference, we study a variant of the cascade bandit model in which each arm corresponds to an inference model with an associated accuracy and error probab...
Flow matching-based generative models for MIMO channel estimation : Abstract: Diffusion model (DM)-based channel estimation, which generates channel samples via a posteriori sampling stepwise with denoising process, has shown potential in high-precision channel state ...
From Parameter to Representation: A Closed-Form Approach for Controllable Model Merging : Abstract: Model merging combines expert models for multitask performance but faces challenges from parameter interference. This has sparked recent interest in controllable model merging, giving users ...
Unsupervised Robust Domain Adaptation: Paradigm, Theory and Algorithm : Abstract: Unsupervised domain adaptation (UDA) aims to transfer knowledge from a label-rich source domain to an unlabeled target domain by addressing domain shifts. Most UDA approaches emphasize trans...
Echoless Label-Based Pre-computation for Memory-Efficient Heterogeneous Graph Learning : Abstract: Heterogeneous Graph Neural Networks (HGNNs) are widely used for deep learning on heterogeneous graphs. Typical end-to-end HGNNs require repetitive message passing during training, limiting e...
Sheaf Cohomology of Linear Predictive Coding Networks : Abstract: Predictive coding (PC) replaces global backpropagation with local optimization over weights and activations. We show that linear PC networks admit a natural formulation as cellular sheaves: ...
SMART: A Surrogate Model for Predicting Application Runtime in Dragonfly Systems : Abstract: The Dragonfly network, with its high-radix and low-diameter structure, is a leading interconnect in high-performance computing. A major challenge is workload interference on shared network l...
Improving Continual Learning of Knowledge Graph Embeddings via Informed Initialization : Abstract: Many Knowledege Graphs (KGs) are frequently updated, forcing their Knowledge Graph Embeddings (KGEs) to adapt to these changes. To address this problem, continual learning techniques for KGE...
Anomaly Detection in High-Dimensional Bank Account Balances via Robust Methods : Abstract: Detecting point anomalies in bank account balances is essential for financial institutions, as it enables the identification of potential fraud, operational issues, or other irregularities. ...
Deep Learning for Short-Term Precipitation Prediction in Four Major Indian Cities: A ConvLSTM Approach with Explainable AI : Abstract: Deep learning models for precipitation forecasting often function as black boxes, limiting their adoption in real-world weather prediction. To enhance transparency while maintaining accuracy...
Adaptive Symmetrization of the KL Divergence : Abstract: Many tasks in machine learning can be described as or reduced to learning a probability distribution given a finite set of samples. A common approach is to minimize a statistical divergence ...
Training Neural Networks at Any Scale : Abstract: This article reviews modern optimization methods for training neural networks with an emphasis on efficiency and scale. We present state-of-the-art optimization algorithms under a unified al...
Power Ensemble Aggregation for Improved Extreme Event AI Prediction : Abstract: This paper addresses the critical challenge of improving predictions of climate extreme events, specifically heat waves, using machine learning methods. Our work is framed as a classificatio...
On-line learning of dynamic systems: sparse regression meets Kalman filtering : Abstract: Learning governing equations from data is central to understanding the behavior of physical systems across diverse scientific disciplines, including physics, biology, and engineering. The Si...
Dynamic Deep Graph Learning for Incomplete Multi-View Clustering with Masked Graph Reconstruction Loss : Abstract: The prevalence of real-world multi-view data makes incomplete multi-view clustering (IMVC) a crucial research. The rapid development of Graph Neural Networks (GNNs) has established them as o...
LoRaCompass: Robust Reinforcement Learning to Efficiently Search for a LoRa Tag : Abstract: The Long-Range (LoRa) protocol, known for its extensive range and low power, has increasingly been adopted in tags worn by mentally incapacitated persons (MIPs) and others at risk of going m...
When to Stop Federated Learning: Zero-Shot Generation of Synthetic Validation Data with Generative AI for Early Stopping : Abstract: Federated Learning (FL) enables collaborative model training across decentralized devices while preserving data privacy. However, FL methods typically run for a predefined number of global r...
A Best-of-Both-Worlds Proof for Tsallis-INF without Fenchel Conjugates : Abstract: In this short note, we present a simple derivation of the best-of-both-world guarantee for the Tsallis-INF multi-armed bandit algorithm from J. Zimmert and Y. Seldin. Tsallis-INF: An optimal...
Sparse Methods for Vector Embeddings of TPC Data : Abstract: Time Projection Chambers (TPCs) are versatile detectors that reconstruct charged-particle tracks in an ionizing medium, enabling sensitive measurements across a wide range of nuclear physics...
Neural Network-Powered Finger-Drawn Biometric Authentication : Abstract: This paper investigates neural network-based biometric authentication using finger-drawn digits on touchscreen devices. We evaluated CNN and autoencoder architectures for user authentication...
Heterogeneous Attributed Graph Learning via Neighborhood-Aware Star Kernels : Abstract: Attributed graphs, typically characterized by irregular topologies and a mix of numerical and categorical attributes, are ubiquitous in diverse domains such as social networks, bioinformatic...
Toward Scalable Early Cancer Detection: Evaluating EHR-Based Predictive Models Against Traditional Screening Criteria : Abstract: Current cancer screening guidelines cover only a few cancer types and rely on narrowly defined criteria such as age or a single risk factor like smoking history, to identify high-risk indivi...
Fast and Expressive Multi-Token Prediction with Probabilistic Circuits : Abstract: Multi-token prediction (MTP) is a prominent strategy to significantly speed up generation in large language models (LLMs), including byte-level LLMs, which are tokeniser-free but prohibitive...
Toward Multi-Fidelity Machine Learning Force Field for Cathode Materials : Abstract: Machine learning force fields (MLFFs), which employ neural networks to map atomic structures to system energies, effectively combine the high accuracy of first-principles calculation with th...
On-Device Fine-Tuning via Backprop-Free Zeroth-Order Optimization : Abstract: On-device fine-tuning is a critical capability for edge AI systems, which must support adaptation to different agentic tasks under stringent memory constraints. Conventional backpropagation ...
When Genes Speak: A Semantic-Guided Framework for Spatially Resolved Transcriptomics Data Clustering : Abstract: Spatial transcriptomics enables gene expression profiling with spatial context, offering unprecedented insights into the tissue microenvironment. However, most computational models treat gen...
Robust inverse material design with physical guarantees using the Voigt-Reuss Net : Abstract: We propose a spectrally normalized surrogate for forward and inverse mechanical homogenization with hard physical guarantees. Leveraging the Voigt-Reuss bounds, we factor their difference vi...
SPOT: Single-Shot Positioning via Trainable Near-Field Rainbow Beamforming : Abstract: Phase-time arrays, which integrate phase shifters (PSs) and true-time delays (TTDs), have emerged as a cost-effective architecture for generating frequency-dependent rainbow beams in wideban...
Multi-Phase Spacecraft Trajectory Optimization via Transformer-Based Reinforcement Learning : Abstract: Autonomous spacecraft control for mission phases such as launch, ascent, stage separation, and orbit insertion remains a critical challenge due to the need for adaptive policies that general...
Multicalibration yields better matchings : Abstract: Consider the problem of finding the best matching in a weighted graph where we only have access to predictions of the actual stochastic weights, based on an underlying context. If the predic...
Differentiation Strategies for Acoustic Inverse Problems: Admittance Estimation and Shape Optimization : Abstract: We demonstrate a practical differentiable programming approach for acoustic inverse problems through two applications: admittance estimation and shape optimization for resonance damping. Fir...
Low-Bit, High-Fidelity: Optimal Transport Quantization for Flow Matching : Abstract: Flow Matching (FM) generative models offer efficient simulation-free training and deterministic sampling, but their practical deployment is challenged by high-precision parameter requirement...
DiffPro: Joint Timestep and Layer-Wise Precision Optimization for Efficient Diffusion Inference : Abstract: Diffusion models produce high quality images but inference is costly due to many denoising steps and heavy matrix operations. We present DiffPro, a post-training, hardware-faithful framework...
FairReweighing: Density Estimation-Based Reweighing Framework for Improving Separation in Fair Regression : Abstract: There has been a prevalence of applying AI software in both high-stakes public-sector and industrial contexts. However, the lack of transparency has raised concerns about whether these data-...
MoCap2Radar: A Spatiotemporal Transformer for Synthesizing Micro-Doppler Radar Signatures from Motion Capture : Abstract: We present a pure machine learning process for synthesizing radar spectrograms from Motion-Capture (MoCap) data. We formulate MoCap-to-spectrogram translation as a windowed sequence-to-seque...
Quantifying and Improving Adaptivity in Conformal Prediction through Input Transformations : Abstract: Conformal prediction constructs a set of labels instead of a single point prediction, while providing a probabilistic coverage guarantee. Beyond the coverage guarantee, adaptiveness to examp...
Data-efficient U-Net for Segmentation of Carbide Microstructures in SEM Images of Steel Alloys : Abstract: Understanding reactor-pressure-vessel steel microstructure is crucial for predicting mechanical properties, as carbide precipitates both strengthen the alloy and can initiate cracks. In scan...
Honesty over Accuracy: Trustworthy Language Models through Reinforced Hesitation : Abstract: Modern language models fail a fundamental requirement of trustworthy intelligence: knowing when not to answer. Despite achieving impressive accuracy on benchmarks, these models produce confi...
FarSkip-Collective: Unhobbling Blocking Communication in Mixture of Experts Models : Abstract: Blocking communication presents a major hurdle in running MoEs efficiently in distributed settings. To address this, we present FarSkip-Collective which modifies the architecture of modern m...
Generalizing Fair Clustering to Multiple Groups: Algorithms and Applications : Abstract: Clustering is a fundamental task in machine learning and data analysis, but it frequently fails to provide fair representation for various marginalized communities defined by multiple protec...
Multistability of Self-Attention Dynamics in Transformers : Abstract: In machine learning, a self-attention dynamics is a continuous-time multiagent-like model of the attention mechanisms of transformers. In this paper we show that such dynamics is related to ...
Optimizing Mixture of Block Attention : Abstract: Mixture of Block Attention (MoBA) (Lu et al., 2025) is a promising building block for efficiently processing long contexts in LLMs by enabling queries to sparsely attend to a small subset of...
Patent Representation Learning via Self-supervision : Abstract: This paper presents a simple yet effective contrastive learning framework for learning patent embeddings by leveraging multiple views from within the same document. We first identify a paten...
Bayesian Evaluation of Large Language Model Behavior : Abstract: It is increasingly important to evaluate how text generation systems based on large language models (LLMs) behave, such as their tendency to produce harmful output or their sensitivity to ad...
Forecasting Spoken Language Development in Children with Cochlear Implants Using Preimplantation MRI : Abstract: Cochlear implants (CI) significantly improve spoken language in children with severe-to-profound sensorineural hearing loss (SNHL), yet outcomes remain more variable than in children with no...
Fast Data Attribution for Text-to-Image Models : Abstract: Data attribution for text-to-image models aims to identify the training images that most significantly influenced a generated output. Existing attribution methods involve considerable comput...
Neural Local Wasserstein Regression : Abstract: We study the estimation problem of distribution-on-distribution regression, where both predictors and responses are probability measures. Existing approaches typically rely on a global optim...
Architecting software monitors for control-flow anomaly detection through large language models and conformance checking : Abstract: Context: Ensuring high levels of dependability in modern computer-based systems has become increasingly challenging due to their complexity. Although systems are validated at design time, th...
ICX360: In-Context eXplainability 360 Toolkit : Abstract: Large Language Models (LLMs) have become ubiquitous in everyday life and are entering higher-stakes applications ranging from summarizing meeting transcripts to answering doctors' questions....
MMA-Sim: Bit-Accurate Reference Model of Tensor Cores and Matrix Cores : Abstract: The rapidly growing computation demands of deep neural networks (DNNs) have driven hardware vendors to integrate matrix multiplication accelerators (MMAs), such as NVIDIA Tensor Cores and AM...
Heterogeneous Multisource Transfer Learning via Model Averaging for Positive-Unlabeled Data : Abstract: Positive-Unlabeled (PU) learning presents unique challenges due to the lack of explicitly labeled negative samples, particularly in high-stakes domains such as fraud detection and medical di...
CardioEmbed: Domain-Specialized Text Embeddings for Clinical Cardiology : Abstract: Biomedical text embeddings have primarily been developed using research literature from PubMed, yet clinical cardiology practice relies heavily on procedural knowledge and specialized termin...
CAT-Net: A Cross-Attention Tone Network for Cross-Subject EEG-EMG Fusion Tone Decoding : Abstract: Brain-computer interface (BCI) speech decoding has emerged as a promising tool for assisting individuals with speech impairments. In this context, the integration of electroencephalography (...
PROMISE: Prompt-Attentive Hierarchical Contrastive Learning for Robust Cross-Modal Representation with Missing Modalities : Abstract: Multimodal models integrating natural language and visual information have substantially improved generalization of representation models. However, their effectiveness significantly declines...
One-Shot Transfer Learning for Nonlinear PDEs with Perturbative PINNs : Abstract: We propose a framework for solving nonlinear partial differential equations (PDEs) by combining perturbation theory with one-shot transfer learning in Physics-Informed Neural Networks (PINNs...
PRSM: A Measure to Evaluate CLIP's Robustness Against Paraphrases : Abstract: Contrastive Language-Image Pre-training (CLIP) is a widely used multimodal model that aligns text and image representations through large-scale training. While it performs strongly on zero-s...
Drift Estimation for Diffusion Processes Using Neural Networks Based on Discretely Observed Independent Paths : Abstract: This paper addresses the nonparametric estimation of the drift function over a compact domain for a time-homogeneous diffusion process, based on high-frequency discrete observations from $N$...
Questioning the Stability of Visual Question Answering : Abstract: Visual Language Models (VLMs) have achieved remarkable progress, yet their reliability under small, meaning-preserving input changes remains poorly understood. We present the first large-sca...
Decomposing Direct and Indirect Biases in Linear Models under Demographic Parity Constraint : Abstract: Linear models are widely used in high-stakes decision-making due to their simplicity and interpretability. Yet when fairness constraints such as demographic parity are introduced, their effe...
StochEP: Stochastic Equilibrium Propagation for Spiking Convergent Recurrent Neural Networks : Abstract: Spiking Neural Networks (SNNs) promise energy-efficient, sparse, biologically inspired computation. Training them with Backpropagation Through Time (BPTT) and surrogate gradients achieves st...
SoK: Security Evaluation of Wi-Fi CSI Biometrics: Attacks, Metrics, and Systemic Weaknesses : Abstract: Wi-Fi Channel State Information (CSI) has been repeatedly proposed as a biometric modality, often with reports of high accuracy and operational feasibility. However, the field lacks a consol...
BOFA: Bridge-Layer Orthogonal Low-Rank Fusion for CLIP-Based Class-Incremental Learning : Abstract: Class-Incremental Learning (CIL) aims to continually learn new categories without forgetting previously acquired knowledge. Vision-language models such as CLIP offer strong transferable repr...
VoxTell: Free-Text Promptable Universal 3D Medical Image Segmentation : Abstract: We introduce VoxTell, a vision-language model for text-prompted volumetric medical image segmentation. It maps free-form descriptions, from single words to full clinical sentences, to 3D mas...
Synergy vs. Noise: Performance-Guided Multimodal Fusion For Biochemical Recurrence-Free Survival in Prostate Cancer : Abstract: Multimodal deep learning (MDL) has emerged as a transformative approach in computational pathology. By integrating complementary information from multiple data sources, MDL models have demon...
Adaptive Intrusion Detection for Evolving RPL IoT Attacks Using Incremental Learning : Abstract: The routing protocol for low-power and lossy networks (RPL) has become the de facto routing standard for resource-constrained IoT systems, but its lightweight design exposes critical vulnera...
Non-Euclidean SGD for Structured Optimization: Unified Analysis and Improved Rates : Abstract: Recently, several instances of non-Euclidean SGD, including SignSGD, Lion, and Muon, have attracted significant interest from the optimization community due to their practical success in tra...
Learning and Testing Convex Functions : Abstract: We consider the problems of \emph{learning} and \emph{testing} real-valued convex functions over Gaussian space. Despite the extensive study of function convexity across mathematics, statist...
CVChess: A Deep Learning Framework for Converting Chessboard Images to Forsyth-Edwards Notation : Abstract: Chess has experienced a large increase in viewership since the pandemic, driven largely by the accessibility of online learning platforms. However, no equivalent assistance exists for physic...
Estimating Total Effects in Bipartite Experiments with Spillovers and Partial Eligibility : Abstract: We study randomized experiments in bipartite systems where only a subset of treatment-side units are eligible for assignment while all units continue to interact, generating interference. We...
On bounds for norms of reparameterized ReLU artificial neural network parameters: sums of fractional powers of the Lipschitz norm control the network parameter vector : Abstract: It is an elementary fact in the scientific literature that the Lipschitz norm of the realization function of a feedforward fully-connected rectified linear unit (ReLU) artificial neural netw...
Higher-order Neural Additive Models: An Interpretable Machine Learning Model with Feature Interactions : Abstract: Neural Additive Models (NAMs) have recently demonstrated promising predictive performance while maintaining interpretability. However, their capacity is limited to capturing only first-order...
A Global Geometric Analysis of Maximal Coding Rate Reduction : Abstract: The maximal coding rate reduction (MCR$^2$) objective for learning structured and compact deep representations is drawing increasing attention, especially after its recent usage in the deriv...
An Empirical Study on Improving SimCLR's Nonlinear Projection Head using Pretrained Autoencoder Embeddings : Abstract: This paper focuses on improving the effectiveness of the standard 2-layer MLP projection head featured in the SimCLR framework through the use of pretrained autoencoder embeddings. Given a c...
Self-Supervised Learning of Iterative Solvers for Constrained Optimization : Abstract: The real-time solution of parametric optimization problems is critical for applications that demand high accuracy under tight real-time constraints, such as model predictive control. To this...
Predictive Control and Regret Analysis of Non-Stationary MDP with Look-ahead Information : Abstract: Policy design in non-stationary Markov Decision Processes (MDPs) is inherently challenging due to the complexities introduced by time-varying system transition and reward, which make it diff...
Evolutionary Retrofitting : Abstract: AfterLearnER (After Learning Evolutionary Retrofitting) consists in applying evolutionary optimization to refine fully trained machine learning models by optimizing a set of carefully chosen...
SGLP: A Similarity Guided Fast Layer Partition Pruning for Compressing Large Deep Models : Abstract: Layer pruning has emerged as a potent approach to remove redundant layers in the pre-trained network on the purpose of reducing network size and improve computational efficiency. However, ex...
Differentiable Sparse Identification of Lagrangian Dynamics : Abstract: Data-driven discovery of governing equations from data remains a fundamental challenge in nonlinear dynamics. Although sparse regression techniques have advanced system identification, they ...
Movement-Specific Analysis for FIM Score Classification Using Spatio-Temporal Deep Learning : Abstract: The functional independence measure (FIM) is widely used to evaluate patients' physical independence in activities of daily living. However, traditional FIM assessment imposes a significant ...
Near-optimal Linear Predictive Clustering in Non-separable Spaces via Mixed Integer Programming and Quadratic Pseudo-Boolean Reductions : Abstract: Linear Predictive Clustering (LPC) partitions samples based on shared linear relationships between feature and target variables, with numerous applications including marketing, medicine, and...
Transformers know more than they can tell -- Learning the Collatz sequence : Abstract: We investigate transformer prediction of long Collatz steps, a complex arithmetic function that maps odd integers to their distant successors in the Collatz sequence ( $u_{n+1}=u_n/2$ if $u_...
Scalable Population Training for Zero-Shot Coordination : Abstract: Zero-shot coordination(ZSC) has become a hot topic in reinforcement learning research recently. It focuses on the generalization ability of agents, requiring them to coordinate well with col...
VIDEOP2R: Video Understanding from Perception to Reasoning : Abstract: Reinforcement fine-tuning (RFT), a two-stage framework consisting of supervised fine-tuning (SFT) and reinforcement learning (RL) has shown promising results on improving reasoning ability o...
AV-Dialog: Spoken Dialogue Models with Audio-Visual Input : Abstract: Dialogue models falter in noisy, multi-speaker environments, often producing irrelevant responses and awkward turn-taking. We present AV-Dialog, the first multimodal dialog framework that us...
Utilizing LLMs for Industrial Process Automation: A Case Study on Modifying RAPID Programs : Abstract: How to best use Large Language Models (LLMs) for software engineering is covered in many publications in recent years. However, most of this work focuses on widely-used general purpose progr...
Specification, Application, and Operationalization of a Metamodel of Fairness : Abstract: This paper presents the AR fairness metamodel, aimed at formally representing, analyzing, and comparing fairness scenarios. The metamodel provides an abstract representation of fairness, ena...
OT-ALD: Aligning Latent Distributions with Optimal Transport for Accelerated Image-to-Image Translation : Abstract: The Dual Diffusion Implicit Bridge (DDIB) is an emerging image-to-image (I2I) translation method that preserves cycle consistency while achieving strong flexibility. It links two independent...
Refine and Align: Confidence Calibration through Multi-Agent Interaction in VQA : Abstract: In the context of Visual Question Answering (VQA) and Agentic AI, calibration refers to how closely an AI system's confidence in its answers reflects their actual correctness. This aspect be...
Enhancing Group Recommendation using Soft Impute Singular Value Decomposition : Abstract: The growing popularity of group activities increased the need to develop methods for providing recommendations to a group of users based on the collective preferences of the group members. S...
3D Gaussian and Diffusion-Based Gaze Redirection : Abstract: High-fidelity gaze redirection is critical for generating augmented data to improve the generalization of gaze estimators. 3D Gaussian Splatting (3DGS) models like GazeGaussian represent the...
Virtual Width Networks : Abstract: We introduce Virtual Width Networks (VWN), a framework that delivers the benefits of wider representations without incurring the quadratic cost of increasing the hidden size. VWN decouples r...
HealSplit: Towards Self-Healing through Adversarial Distillation in Split Federated Learning : Abstract: Split Federated Learning (SFL) is an emerging paradigm for privacy-preserving distributed learning. However, it remains vulnerable to sophisticated data poisoning attacks targeting local fea...
Toward Gaze Target Detection of Young Autistic Children : Abstract: The automatic detection of gaze targets in autistic children through artificial intelligence can be impactful, especially for those who lack access to a sufficient number of professionals to...
KGQuest: Template-Driven QA Generation from Knowledge Graphs with LLM-Based Refinement : Abstract: The generation of questions and answers (QA) from knowledge graphs (KG) plays a crucial role in the development and testing of educational platforms, dissemination tools, and large language ...
SQuaD: The Software Quality Dataset : Abstract: Software quality research increasingly relies on large-scale datasets that measure both the product and process aspects of software systems. However, existing resources often focus on limite...
D-GAP: Improving Out-of-Domain Robustness via Dataset-Agnostic and Gradient-Guided Augmentation in Amplitude and Pixel Spaces : Abstract: Out-of-domain (OOD) robustness is challenging to achieve in real-world computer vision applications, where shifts in image background, style, and acquisition instruments always degrade model...
Building the Web for Agents: A Declarative Framework for Agent-Web Interaction : Abstract: The increasing deployment of autonomous AI agents on the web is hampered by a fundamental misalignment: agents must infer affordances from human-oriented user interfaces, leading to brittle,...
Experiences from Benchmarking Vision-Language-Action Models for Robotic Manipulation : Abstract: Foundation models applied in robotics, particularly \textbf{Vision--Language--Action (VLA)} models, hold great promise for achieving general-purpose manipulation. Yet, systematic real-world ...
AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal Large Language Models : Abstract: Multimodal Large Language Models (MLLMs) achieve impressive performance once optimized on massive datasets. Such datasets often contain sensitive or copyrighted content, raising significant ...
MOON Embedding: Multimodal Representation Learning for E-commerce Search Advertising : Abstract: We introduce MOON, our comprehensive set of sustainable iterative practices for multimodal representation learning for e-commerce applications. MOON has already been fully deployed across al...
iMAD: Intelligent Multi-Agent Debate for Efficient and Accurate LLM Inference : Abstract: Large Language Model (LLM) agent systems have advanced rapidly, driven by their strong generalization in zero-shot settings. To further enhance reasoning and accuracy on complex tasks, Multi...
Large-scale modality-invariant foundation models for brain MRI analysis: Application to lesion segmentation : Abstract: The field of computer vision is undergoing a paradigm shift toward large-scale foundation model pre-training via self-supervised learning (SSL). Leveraging large volumes of unlabeled brain M...
LAET: A Layer-wise Adaptive Ensemble Tuning Framework for Pretrained Language Models : Abstract: Natural Language Processing (NLP) has transformed the financial industry, enabling advancements in areas such as textual analysis, risk management, and forecasting. Large language models (LL...
NOVA: An Agentic Framework for Automated Histopathology Analysis and Discovery : Abstract: Digitized histopathology analysis involves complex, time-intensive workflows and specialized expertise, limiting its accessibility. We introduce NOVA, an agentic framework that translates sc...
M-DAIGT: A Shared Task on Multi-Domain Detection of AI-Generated Text : Abstract: The generation of highly fluent text by Large Language Models (LLMs) poses a significant challenge to information integrity and academic research. In this paper, we introduce the Multi-Domai...
Privacy Challenges and Solutions in Retrieval-Augmented Generation-Enhanced LLMs for Healthcare Chatbots: A Review of Applications, Risks, and Future Directions : Abstract: Retrieval-augmented generation (RAG) has rapidly emerged as a transformative approach for integrating large language models into clinical and biomedical workflows. However, privacy risks, su...
Variational Quantum Algorithms for Particle Track Reconstruction : Abstract: Quantum Computing is a rapidly developing field with the potential to tackle the increasing computational challenges faced in high-energy physics. In this work, we explore the potential and ...
The Persistence of Cultural Memory: Investigating Multimodal Iconicity in Diffusion Models : Abstract: Our work addresses the ambiguity between generalization and memorization in text-to-image diffusion models, focusing on a specific case we term multimodal iconicity. This refers to instances...
Retrofit: Continual Learning with Bounded Forgetting for Security Applications : Abstract: Modern security analytics are increasingly powered by deep learning models, but their performance often degrades as threat landscapes evolve and data representations shift. While continual l...
Epistemic Error Decomposition for Multi-step Time Series Forecasting: Rethinking Bias-Variance in Recursive and Direct Strategies : Abstract: Multi-step forecasting is often described through a simple rule of thumb: recursive strategies are said to have high bias and low variance, while direct strategies are said to have low bias ...
Benchmarking Visual LLMs Resilience to Unanswerable Questions on Visually Rich Documents : Abstract: The evolution of Visual Large Language Models (VLLMs) has revolutionized the automatic understanding of Visually Rich Documents (VRDs), which contain both textual and visual elements. Althou...
Context-aware Adaptive Visualizations for Critical Decision Making : Abstract: Effective decision-making often relies on timely insights from complex visual data. While Information Visualization (InfoVis) dashboards can support this process, they rarely adapt to users'...
Inferring response times of perceptual decisions with Poisson variational autoencoders : Abstract: Many properties of perceptual decision making are well-modeled by deep neural networks. However, such architectures typically treat decisions as instantaneous readouts, overlooking the tempo...
ImAgent: A Unified Multimodal Agent Framework for Test-Time Scalable Image Generation : Abstract: Recent text-to-image (T2I) models have made remarkable progress in generating visually realistic and semantically coherent images. However, they still suffer from randomness and inconsistenc...
Intrinsic Dimension Estimation for Radio Galaxy Zoo using Diffusion Models : Abstract: In this work, we estimate the intrinsic dimension (iD) of the Radio Galaxy Zoo (RGZ) dataset using a score-based diffusion model. We examine how the iD estimates vary as a function of Bayesi...
PAS : Prelim Attention Score for Detecting Object Hallucinations in Large Vision--Language Models : Abstract: Large vision-language models (LVLMs) are powerful, yet they remain unreliable due to object hallucinations. In this work, we show that in many hallucinatory predictions the LVLM effectively ...
Volumetric Ergodic Control : Abstract: Ergodic control synthesizes optimal coverage behaviors over spatial distributions for nonlinear systems. However, existing formulations model the robot as a non-volumetric point, but in prac...
Human-AI collaborative autonomous synthesis with pulsed laser deposition for remote epitaxy : Abstract: Autonomous laboratories typically rely on data-driven decision-making, occasionally with human-in-the-loop oversight to inject domain expertise. Fully leveraging AI agents, however, requires...
A Unified Convergence Analysis for Semi-Decentralized Learning: Sampled-to-Sampled vs. Sampled-to-All Communication : Abstract: In semi-decentralized federated learning, devices primarily rely on device-to-device communication but occasionally interact with a central server. Periodically, a sampled subset of devices ...
Private Frequency Estimation Via Residue Number Systems : Abstract: We present \textsf{ModularSubsetSelection} (MSS), a new algorithm for locally differentially private (LDP) frequency estimation. Given a universe of size $k$ and $n$ users, our $\varepsilon$...
Towards Efficient and Reliable AI Through Neuromorphic Principles : Abstract: Artificial intelligence (AI) research today is largely driven by ever-larger neural network models trained on graphics processing units (GPUs). This paradigm has yielded remarkable progress,...
Semantic Web: Past, Present, and Future (with Machine Learning on Knowledge Graphs and Language Models on Knowledge Graphs) : Abstract: Ever since the vision was formulated, the Semantic Web has inspired many generations of innovations. Semantic technologies have been used to share vast amounts of information on the Web, enh...
CoEvo: Continual Evolution of Symbolic Solutions Using Large Language Models : Abstract: The discovery of symbolic solutions -- mathematical expressions, logical rules, and algorithmic structures -- is fundamental to advancing scientific and engineering progress. However, trad...
NetGent: Agent-Based Automation of Network Application Workflows : Abstract: We present NetGent, an AI-agent framework for automating complex application workflows to generate realistic network traffic datasets. Developing generalizable ML models for networking requi...
Synthetic Data-Driven Prompt Tuning for Financial QA over Tables and Documents : Abstract: Financial documents like earning reports or balance sheets often involve long tables and multi-page reports. Large language models have become a new tool to help numerical reasoning and unde...
Data Complexity of Querying Description Logic Knowledge Bases under Cost-Based Semantics : Abstract: In this paper, we study the data complexity of querying inconsistent weighted description logic (DL) knowledge bases under recently-introduced cost-based semantics. In a nutshell, the idea i...
GreatSplicing: A Semantically Rich Splicing Dataset : Abstract: In existing splicing forgery datasets, the insufficient semantic variety of spliced regions causes trained detection models to overfit semantic features rather than learn genuine splicing tr...
Survey in Characterization of Semantic Change : Abstract: Live languages continuously evolve to integrate the cultural change of human societies. This evolution manifests through neologisms (new words) or \textbf{semantic changes} of words (new mea...
Partial Information Decomposition for Data Interpretability and Feature Selection : Abstract: In this paper, we introduce Partial Information Decomposition of Features (PIDF), a new paradigm for simultaneous data interpretability and feature selection. Contrary to traditional methods...
Posterior Label Smoothing for Node Classification : Abstract: Label smoothing is a widely studied regularization technique in machine learning. However, its potential for node classification in graph-structured data, spanning homophilic to heterophilic...
Are language models rational? The case of coherence norms and belief revision : Abstract: Do norms of rationality apply to machine learning models, in particular language models? In this paper we investigate this question by focusing on a special subset of rational norms: coheren...
Towards Formalizing Spuriousness of Biased Datasets Using Partial Information Decomposition : Abstract: Spuriousness arises when there is an association between two or more variables in a dataset that are not causally related. In this work, we propose an explainability framework to preemptivel...
MOSABench: Multi-Object Sentiment Analysis Benchmark for Evaluating Multimodal Large Language Models Understanding of Complex Image : Abstract: Multimodal large language models (MLLMs) have shown remarkable progress in high-level semantic tasks such as visual question answering, image captioning, and emotion recognition. However, de...
LDC: Learning to Generate Research Idea with Dynamic Control : Abstract: Recent advancements in large language models (LLMs) have demonstrated their potential in automating the scientific research ideation. Existing approaches primarily focus on prompting techniq...
NOCTIS: Novel Object Cyclic Threshold based Instance Segmentation : Abstract: Instance segmentation of novel objects instances in RGB images, given some example images for each object, is a well known problem in computer vision. Designing a model general enough to be ...
Advanced Torrential Loss Function for Precipitation Forecasting : Abstract: Accurate precipitation forecasting is becoming increasingly important in the context of climate change. In response, machine learning-based approaches have recently gained attention as an em...
Efficient Learning-Based Control of a Legged Robot in Lunar Gravity : Abstract: Legged robots are promising candidates for exploring challenging areas on low-gravity bodies such as the Moon, Mars, or asteroids, thanks to their advanced mobility on unstructured terrain. ...
Threat Modeling for Enhancing Security of IoT Audio Classification Devices under a Secure Protocols Framework : Abstract: The rapid proliferation of IoT nodes equipped with microphones and capable of performing on-device audio classification exposes highly sensitive data while operating under tight resource con...
SPUR: A Plug-and-Play Framework for Integrating Spatial Audio Understanding and Reasoning into Large Audio-Language Models : Abstract: Spatial perception is central to auditory intelligence, enabling accurate understanding of real-world acoustic scenes and advancing human-level perception of the world around us. While recen...
LAD-BNet: Lag-Aware Dual-Branch Networks for Real-Time Energy Forecasting on Edge Devices : Abstract: Real-time energy forecasting on edge devices represents a major challenge for smart grid optimization and intelligent buildings. We present LAD-BNet (Lag-Aware Dual-Branch Network), an innov...
LT-Soups: Bridging Head and Tail Classes via Subsampled Model Soups : Abstract: Real-world datasets typically exhibit long-tailed (LT) distributions, where a few head classes dominate and many tail classes are severely underrepresented. While recent work shows that para...
Satisficing and Optimal Generalised Planning via Goal Regression (Extended Version) : Abstract: Generalised planning (GP) refers to the task of synthesising programs that solve families of related planning problems. We introduce a novel, yet simple method for GP: given a set of trainin...
GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models : Abstract: The advent of Unified Multimodal Models (UMMs) signals a paradigm shift in artificial intelligence, moving from passive perception to active, cross-modal generation. Despite their unpreceden...
Multi-agent Undercover Gaming: Hallucination Removal via Counterfactual Test for Multimodal Reasoning : Abstract: Hallucination continues to pose a major obstacle in the reasoning capabilities of large language models (LLMs). Although the Multi-Agent Debate (MAD) paradigm offers a promising solution by ...
STaR: Towards Cognitive Table Reasoning via Slow-Thinking Large Language Models : Abstract: Table reasoning with the large language models (LLMs) is a fundamental path toward building intelligent systems that can understand and analyze over structured data. While recent progress ha...
UAVBench: An Open Benchmark Dataset for Autonomous and Agentic AI UAV Systems via LLM-Generated Flight Scenarios : Abstract: Autonomous aerial systems increasingly rely on large language models (LLMs) for mission planning, perception, and decision-making, yet the lack of standardized and physically grounded benchm...
AIonopedia: an LLM agent orchestrating multimodal learning for ionic liquid discovery : Abstract: The discovery of novel Ionic Liquids (ILs) is hindered by critical challenges in property prediction, including limited data, poor model accuracy, and fragmented workflows. Leveraging the po...
A Workflow for Full Traceability of AI Decisions : Abstract: An ever increasing number of high-stake decisions are made or assisted by automated systems employing brittle artificial intelligence technology. There is a substantial risk that some of the...
Can You Tell the Difference? Contrastive Explanations for ABox Entailments : Abstract: We introduce the notion of contrastive ABox explanations to answer questions of the type "Why is a an instance of C, but b is not?". While there are various approaches for explaining positiv...
EcoAlign: An Economically Rational Framework for Efficient LVLM Alignment : Abstract: Large Vision-Language Models (LVLMs) exhibit powerful reasoning capabilities but suffer sophisticated jailbreak vulnerabilities. Fundamentally, aligning LVLMs is not just a safety challenge ...
RLSLM: A Hybrid Reinforcement Learning Framework Aligning Rule-Based Social Locomotion Model with Human Social Norms : Abstract: Navigating human-populated environments without causing discomfort is a critical capability for socially-aware agents. While rule-based approaches offer interpretability through predefined p...
KarmaTS: A Universal Simulation Platform for Multivariate Time Series with Functional Causal Dynamics : Abstract: We introduce KarmaTS, an interactive framework for constructing lag-indexed, executable spatiotemporal causal graphical models for multivariate time series (MTS) simulation. Motivated by the...
MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism : Abstract: Recent progress in large language models (LLMs) has been propelled by reinforcement learning with verifiable rewards (RLVR) and test-time scaling. However, the limited output length of LLMs ...
Robust and Efficient Communication in Multi-Agent Reinforcement Learning : Abstract: Multi-agent reinforcement learning (MARL) has made significant strides in enabling coordinated behaviors among autonomous agents. However, most existing approaches assume that communication ...
CURENet: Combining Unified Representations for Efficient Chronic Disease Prediction : Abstract: Electronic health records (EHRs) are designed to synthesize diverse data types, including unstructured clinical notes, structured lab tests, and time-series visit data. Physicians draw on th...
Experience-Guided Adaptation of Inference-Time Reasoning Strategies : Abstract: Enabling agentic AI systems to adapt their problem-solving approaches based on post-training interactions remains a fundamental challenge. While systems that update and maintain a memory at ...
Aligning Machiavellian Agents: Behavior Steering via Test-Time Policy Shaping : Abstract: The deployment of decision-making AI agents presents a critical challenge in maintaining alignment with human values or guidelines while operating in complex, dynamic environments. Agents tr...
Unsupervised Cycle Detection in Agentic Applications : Abstract: Agentic applications powered by Large Language Models exhibit non-deterministic behaviors that can form hidden execution cycles, silently consuming resources without triggering explicit erro...
Data Analysis and Performance Evaluation of Simulation Deduction Based on LLMs : Abstract: Data analysis and performance evaluation of simulation deduction plays a pivotal role in modern warfare, which enables military personnel to gain invaluable insights into the potential effec...
Cognitively-Inspired Episodic Memory Architectures for Accurate and Efficient Character AI : Abstract: Large language models show promise for embodying historical characters in dialogue systems, but existing approaches face a critical trade-off: simple retrieval-augmented generation produces ...
Hybrid Quantum Transformer for Language Generation : Abstract: Although quantum computing has been increasingly applied to replace classical computation, most existing quantum or hybrid models remain confined to simple tasks, with no successful applicat...
Empirical Characterization of Temporal Constraint Processing in LLMs : Abstract: When deploying LLMs in agentic architectures requiring real-time decisions under temporal constraints, we assume they reliably determine whether action windows remain open or have closed. Th...
Spectral Neuro-Symbolic Reasoning II: Semantic Node Merging, Entailment Filtering, and Knowledge Graph Alignment : Abstract: This report extends the Spectral Neuro-Symbolic Reasoning (Spectral NSR) framework by introducing three semantically grounded enhancements: (1) transformer-based node merging using contextua...
Preference Orchestrator: Prompt-Aware Multi-Objective Alignment for Large Language Models : Abstract: While Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse natural language processing tasks, aligning these models with varying human preferences across mul...
Evaluating Open-Weight Large Language Models for Structured Data Extraction from Narrative Medical Reports Across Multiple Use Cases and Languages : Abstract: Large language models (LLMs) are increasingly used to extract structured information from free-text clinical records, but prior work often focuses on single tasks, limited models, and Englis...
Test-Time Steering for Lossless Text Compression via Weighted Product of Experts : Abstract: Lossless compression techniques are crucial in an era of rapidly growing data. Traditional universal compressors like gzip offer low computational overhead, high speed, and broad applicabili...
Evaluating Modern Large Language Models on Low-Resource and Morphologically Rich Languages:A Cross-Lingual Benchmark Across Cantonese, Japanese, and Turkish : Abstract: Large language models (LLMs) have achieved impressive results in high-resource languages like English, yet their effectiveness in low-resource and morphologically rich languages remains unde...
Guarding the Meaning: Self-Supervised Training for Semantic Robustness in Guard Models : Abstract: Guard models are a critical component of LLM safety, but their sensitivity to superficial linguistic variations remains a key vulnerability. We show that even meaning-preserving paraphrases ...
Evaluating LLM Understanding via Structured Tabular Decision Simulations : Abstract: Large language models (LLMs) often achieve impressive predictive accuracy, yet correctness alone does not imply genuine understanding. True LLM understanding, analogous to human expertise, r...
Towards Fine-Grained Code-Switch Speech Translation with Semantic Space Alignment : Abstract: Code-switching (CS) speech translation (ST) refers to translating speech that alternates between two or more languages into a target language text, which poses significant challenges due to ...
Continual Learning of Domain Knowledge from Human Feedback in Text-to-SQL : Abstract: Large Language Models (LLMs) can generate SQL queries from natural language questions but struggle with database-specific schemas and tacit domain knowledge. We introduce a framework for con...
Learn to Select: Exploring Label Distribution Divergence for In-Context Demonstration Selection in Text Classification : Abstract: In-context learning (ICL) for text classification, which uses a few input-label demonstrations to describe a task, has demonstrated impressive performance on large language models (LLMs). Ho...
Pre-Attention Expert Prediction and Prefetching for Mixture-of-Experts Large Language Models : Abstract: Mixture-of-Experts (MoE) Large Language Models (LLMs) efficiently scale-up the model while keeping relatively low inference cost. As MoE models only activate part of the experts, related wor...
Who Gets the Reward, Who Gets the Blame? Evaluation-Aligned Training Signals for Multi-LLM Agents : Abstract: Large Language Models (LLMs) in multi-agent systems (MAS) have shown promise for complex tasks, yet current training methods lack principled ways to connect system-level evaluation with agen...
Equilibrium Dynamics and Mitigation of Gender Bias in Synthetically Generated Data : Abstract: Recursive prompting with large language models enables scalable synthetic dataset generation but introduces the risk of bias amplification. We investigate gender bias dynamics across three g...
Saying the Unsaid: Revealing the Hidden Language of Multimodal Systems Through Telephone Games : Abstract: Recent closed-source multimodal systems have made great advances, but their hidden language for understanding the world remains opaque because of their black-box architectures. In this paper...
Evaluating from Benign to Dynamic Adversarial: A Squid Game for Large Language Models : Abstract: Contemporary benchmarks are struggling to keep pace with the development of large language models (LLMs). Although they are indispensable to evaluate model performance on various tasks, it i...
Do AI Voices Learn Social Nuances? A Case of Politeness and Speech Rate : Abstract: Voice-based artificial intelligence is increasingly expected to adhere to human social conventions, but can it learn implicit cues that are not explicitly programmed? This study investigates...
$\pi$-Attention: Periodic Sparse Transformers for Efficient Long-Context Modeling : Abstract: Transformers have revolutionized natural language processing, but their quadratic complexity with respect to sequence length remains a fundamental bottleneck for long-range modeling. While s...
Bias-Restrained Prefix Representation Finetuning for Mathematical Reasoning : Abstract: Parameter-Efficient finetuning (PEFT) enhances model performance on downstream tasks by updating a minimal subset of parameters. Representation finetuning (ReFT) methods further improve effi...
Towards Uncertainty Quantification in Generative Model Learning : Abstract: While generative models have become increasingly prevalent across various domains, fundamental concerns regarding their reliability persist. A crucial yet understudied aspect of these models...
Do Not Merge My Model! Safeguarding Open-Source LLMs Against Unauthorized Model Merging : Abstract: Model merging has emerged as an efficient technique for expanding large language models (LLMs) by integrating specialized expert models. However, it also introduces a new threat: model mergi...
BadThink: Triggered Overthinking Attacks on Chain-of-Thought Reasoning in Large Language Models : Abstract: Recent advances in Chain-of-Thought (CoT) prompting have substantially improved the reasoning capabilities of large language models (LLMs), but have also introduced their computational effic...
PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization : Abstract: Long context LLMs are vulnerable to prompt injection, where an attacker can inject an instruction in a long context to induce an LLM to generate an attacker-desired output. Existing prompt i...
Understanding the Nature of Depth-1 Equivariant Quantum Circuit : Abstract: The Equivariant Quantum Circuit (EQC) for the Travelling Salesman Problem (TSP) has been shown to achieve near-optimal performance in solving small TSP problems (up to 20 nodes) using only t...
Surrogate-Based Differentiable Pipeline for Shape Optimization : Abstract: Gradient-based optimization of engineering designs is limited by non-differentiable components in the typical computer-aided engineering (CAE) workflow, which calculates performance metrics ...
TEDxTN: A Three-way Speech Translation Corpus for Code-Switched Tunisian Arabic - English : Abstract: In this paper, we introduce TEDxTN, the first publicly available Tunisian Arabic to English speech translation dataset. This work is in line with the ongoing effort to mitigate the data scar...
Fast Neural Tangent Kernel Alignment, Norm and Effective Rank via Trace Estimation : Abstract: The Neural Tangent Kernel (NTK) characterizes how a model's state evolves over Gradient Descent. Computing the full NTK matrix is often infeasible, especially for recurrent architectures. He...
Discounted Cuts: A Stackelberg Approach to Network Disruption : Abstract: We study a Stackelberg variant of the classical Most Vital Links problem, modeled as a one-round adversarial game between an attacker and a defender. The attacker strategically removes up to...
The Map of Misbelief: Tracing Intrinsic and Extrinsic Hallucinations Through Attention Patterns : Abstract: Large Language Models (LLMs) are increasingly deployed in safety-critical domains, yet remain susceptible to hallucinations. While prior works have proposed confidence representation methods...
FlowPath: Learning Data-Driven Manifolds with Invertible Flows for Robust Irregularly-sampled Time Series Classification : Abstract: Modeling continuous-time dynamics from sparse and irregularly-sampled time series remains a fundamental challenge. Neural controlled differential equations provide a principled framework for...
Behaviour Policy Optimization: Provably Lower Variance Return Estimates for Off-Policy Reinforcement Learning : Abstract: Many reinforcement learning algorithms, particularly those that rely on return estimates for policy improvement, can suffer from poor sample efficiency and training instability due to high-v...
Optimal Welfare in Noncooperative Network Formation under Attack : Abstract: Communication networks are essential for our economy and our everyday lives. This makes them lucrative targets for attacks. Today, we see an ongoing battle between criminals that try to disr...
Reinforcing Stereotypes of Anger: Emotion AI on African American Vernacular English : Abstract: Automated emotion detection is widely used in applications ranging from well-being monitoring to high-stakes domains like mental health and hiring. However, models often rely on annotations ...
STAMP: Spatial-Temporal Adapter with Multi-Head Pooling : Abstract: Time series foundation models (TSFMs) pretrained on data from multiple domains have shown strong performance on diverse modeling tasks. Various efforts have been made to develop foundation m...
Leveraging Parameter Space Symmetries for Reasoning Skill Transfer in LLMs : Abstract: Task arithmetic is a powerful technique for transferring skills between Large Language Models (LLMs), but it often suffers from negative interference when models have diverged during trainin...
Adaptive Digital Twin of Sheet Metal Forming via Proper Orthogonal Decomposition-Based Koopman Operator with Model Predictive Control : Abstract: Digital Twin (DT) technologies are transforming manufacturing by enabling real-time prediction, monitoring, and control of complex processes. Yet, applying DT to deformation-based metal form...
HPCAgentTester: A Multi-Agent LLM Approach for Enhanced HPC Unit Test Generation : Abstract: Unit testing in High-Performance Computing (HPC) is critical but challenged by parallelism, complex algorithms, and diverse hardware. Traditional methods often fail to address non-determinis...
Accuracy-Preserving CNN Pruning Method under Limited Data Availability : Abstract: Convolutional Neural Networks (CNNs) are widely used in image recognition and have succeeded in various domains. CNN models have become larger-scale to improve accuracy and generalization pe...
Generative Artificial Intelligence Adoption Among Bangladeshi Journalists: Exploring Journalists' Awareness, Acceptance, Usage, and Organizational Stance on Generative AI : Abstract: Newsrooms and journalists across the world are adopting Generative AI (GenAI). Drawing on in-depth interviews with 23 journalists, this study identifies Bangladeshi journalists' awareness, a...
Short-Window Sliding Learning for Real-Time Violence Detection via LLM-based Auto-Labeling : Abstract: This paper proposes a Short-Window Sliding Learning framework for real-time violence detection in CCTV footages. Unlike conventional long-video training approaches, the proposed method divid...
Incorporating Spatial Information into Goal-Conditioned Hierarchical Reinforcement Learning via Graph Representations : Abstract: The integration of graphs with Goal-conditioned Hierarchical Reinforcement Learning (GCHRL) has recently gained attention, as intermediate goals (subgoals) can be effectively sampled from gr...
A Multifaceted Analysis of Negative Bias in Large Language Models through the Lens of Parametric Knowledge : Abstract: Negative bias refers to the tendency of large language models (LLMs) to excessively generate negative responses in binary decision tasks (e.g., yes-no question answering). Previous research ...
MCN-CL: Multimodal Cross-Attention Network and Contrastive Learning for Multimodal Emotion Recognition : Abstract: Multimodal emotion recognition plays a key role in many domains, including mental health monitoring, educational interaction, and human-computer interaction. However, existing methods often ...
DINOv3 as a Frozen Encoder for CRPS-Oriented Probabilistic Rainfall Nowcasting : Abstract: This paper proposes a competitive and computationally efficient approach to probabilistic rainfall nowcasting. A video projector (V-JEPA Vision Transformer) associated to a lightweight proba...
CLIPPan: Adapting CLIP as A Supervisor for Unsupervised Pansharpening : Abstract: Despite remarkable advancements in supervised pansharpening neural networks, these methods face domain adaptation challenges of resolution due to the intrinsic disparity between simulated re...
Expert-Guided Prompting and Retrieval-Augmented Generation for Emergency Medical Service Question Answering : Abstract: Large language models (LLMs) have shown promise in medical question answering, yet they often overlook the domain-specific expertise that professionals depend on, such as the clinical subjec...
Automated Analysis of Learning Outcomes and Exam Questions Based on Bloom's Taxonomy : Abstract: This paper explores the automatic classification of exam questions and learning outcomes according to Bloom's Taxonomy. A small dataset of 600 sentences labeled with six cognitive categories...
Evaluating Large Language Models on Rare Disease Diagnosis: A Case Study using House M.D : Abstract: Large language models (LLMs) have demonstrated capabilities across diverse domains, yet their performance on rare disease diagnosis from narrative medical cases remains underexplored. We int...
Synthetic Voices, Real Threats: Evaluating Large Text-to-Speech Models in Generating Harmful Audio : Abstract: Modern text-to-speech (TTS) systems, particularly those built on Large Audio-Language Models (LALMs), generate high-fidelity speech that faithfully reproduces input text and mimics specified...
GraphToxin: Reconstructing Full Unlearned Graphs from Graph Unlearning : Abstract: Graph unlearning has emerged as a promising solution for complying with "the right to be forgotten" regulations by enabling the removal of sensitive information upon request. However, this s...
Exposing Weak Links in Multi-Agent Systems under Adversarial Prompting : Abstract: LLM-based agents are increasingly deployed in multi-agent systems (MAS). As these systems move toward real-world applications, their security becomes paramount. Existing research largely eva...
Text-guided Weakly Supervised Framework for Dynamic Facial Expression Recognition : Abstract: Dynamic facial expression recognition (DFER) aims to identify emotional states by modeling the temporal changes in facial movements across video sequences. A key challenge in DFER is the man...
How Data Quality Affects Machine Learning Models for Credit Risk Assessment : Abstract: Machine Learning (ML) models are being increasingly employed for credit risk evaluation, with their effectiveness largely hinging on the quality of the input data. In this paper we investiga...
PAS: A Training-Free Stabilizer for Temporal Encoding in Video LLMs : Abstract: Video LLMs suffer from temporal inconsistency: small shifts in frame timing can flip attention and suppress relevant frames. We trace this instability to the common extension of Rotary Posit...
Binary Verification for Zero-Shot Vision : Abstract: We propose a training-free, binary verification workflow for zero-shot vision with off-the-shelf VLMs. It comprises two steps: (i) quantization, which turns the open-ended query into a multi...
DiscoX: Benchmarking Discourse-Level Translation task in Expert Domains : Abstract: The evaluation of discourse-level translation in expert domains remains inadequate, despite its centrality to knowledge dissemination and cross-lingual scholarly communication. While these t...
When Data is the Algorithm: A Systematic Study and Curation of Preference Optimization Datasets : Abstract: Aligning large language models (LLMs) is a central objective of post-training, often achieved through reward modeling and reinforcement learning methods. Among these, direct preference optim...
DialogGraph-LLM: Graph-Informed LLMs for End-to-End Audio Dialogue Intent Recognition : Abstract: Recognizing speaker intent in long audio dialogues among speakers has a wide range of applications, but is a non-trivial AI task due to complex inter-dependencies in speaker utterances and s...
MSMT-FN: Multi-segment Multi-task Fusion Network for Marketing Audio Classification : Abstract: Audio classification plays an essential role in sentiment analysis and emotion recognition, especially for analyzing customer attitudes in marketing phone calls. Efficiently categorizing cus...
VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models : Abstract: Despite the remarkable success of Vision-Language Models (VLMs), their performance on a range of complex visual tasks is often hindered by a "visual processing bottleneck": a propensity to l...
Automata-Based Steering of Large Language Models for Diverse Structured Generation : Abstract: Large language models (LLMs) are increasingly tasked with generating structured outputs. While structured generation methods ensure validity, they often lack output diversity, a critical lim...
Data Poisoning Vulnerabilities Across Healthcare AI Architectures: A Security Threat Analysis : Abstract: Healthcare AI systems face major vulnerabilities to data poisoning that current defenses and regulations cannot adequately address. We analyzed eight attack scenarios in four categories: arc...
AirCopBench: A Benchmark for Multi-drone Collaborative Embodied Perception and Reasoning : Abstract: Multimodal Large Language Models (MLLMs) have shown promise in single-agent vision tasks, yet benchmarks for evaluating multi-agent collaborative perception remain scarce. This gap is critic...
Algorithms Trained on Normal Chest X-rays Can Predict Health Insurance Types : Abstract: Artificial intelligence is revealing what medicine never intended to encode. Deep vision models, trained on chest X-rays, can now detect not only disease but also invisible traces of social ...
CrossMed: A Multimodal Cross-Task Benchmark for Compositional Generalization in Medical Imaging : Abstract: Recent advances in multimodal large language models have enabled unified processing of visual and textual inputs, offering promising applications in general-purpose medical AI. However, thei...
SemanticNN: Compressive and Error-Resilient Semantic Offloading for Extremely Weak Devices : Abstract: With the rapid growth of the Internet of Things (IoT), integrating artificial intelligence (AI) on extremely weak embedded devices has garnered significant attention, enabling improved real-...
Correcting Mean Bias in Text Embeddings: A Refined Renormalization with Training-Free Improvements on MMTEB : Abstract: We find that current text embedding models produce outputs with a consistent bias, i.e., each embedding vector $e$ can be decomposed as $\tilde{e} + μ$, where $μ$ is almost identical across ...
Enhancing Graph Representations with Neighborhood-Contextualized Message-Passing : Abstract: Graph neural networks (GNNs) have become an indispensable tool for analyzing relational data. In the literature, classical GNNs may be classified into three variants: convolutional, attentio...
PINGS-X: Physics-Informed Normalized Gaussian Splatting with Axes Alignment for Efficient Super-Resolution of 4D Flow MRI : Abstract: 4D flow magnetic resonance imaging (MRI) is a reliable, non-invasive approach for estimating blood flow velocities, vital for cardiovascular diagnostics. Unlike conventional MRI focused on a...
LiteAttention: A Temporal Sparse Attention for Diffusion Transformers : Abstract: Diffusion Transformers, particularly for video generation, achieve remarkable quality but suffer from quadratic attention complexity, leading to prohibitive latency. Existing acceleration me...
From Retinal Pixels to Patients: Evolution of Deep Learning Research in Diabetic Retinopathy Screening : Abstract: Diabetic Retinopathy (DR) remains a leading cause of preventable blindness, with early detection critical for reducing vision loss worldwide. Over the past decade, deep learning has transfor...
S2D-ALIGN: Shallow-to-Deep Auxiliary Learning for Anatomically-Grounded Radiology Report Generation : Abstract: Radiology Report Generation (RRG) aims to automatically generate diagnostic reports from radiology images. To achieve this, existing methods have leveraged the powerful cross-modal generatio...
The Second Law of Intelligence: Controlling Ethical Entropy in Autonomous Systems : Abstract: We propose that unconstrained artificial intelligence obeys a Second Law analogous to thermodynamics, where ethical entropy, defined as a measure of divergence from intended goals, increases...
Co-EPG: A Framework for Co-Evolution of Planning and Grounding in Autonomous GUI Agents : Abstract: Graphical User Interface (GUI) task automation constitutes a critical frontier in artificial intelligence research. While effective GUI agents synergistically integrate planning and groundin...
Picking a Representative Set of Solutions in Multiobjective Optimization: Axioms, Algorithms, and Experiments : Abstract: Many real-world decision-making problems involve optimizing multiple objectives simultaneously, rendering the selection of the most preferred solution a non-trivial problem: All Pareto optim...
Structure-Aware Encodings of Argumentation Properties for Clique-width : Abstract: Structural measures of graphs, such as treewidth, are central tools in computational complexity resulting in efficient algorithms when exploiting the parameter. It is even known that modern ...
Potential Outcome Rankings for Counterfactual Decision Making : Abstract: Counterfactual decision-making in the face of uncertainty involves selecting the optimal action from several alternatives using causal reasoning. Decision-makers often rank expected potentia...
From Efficiency to Adaptivity: A Deeper Look at Adaptive Reasoning in Large Language Models : Abstract: Recent advances in large language models (LLMs) have made reasoning a central benchmark for evaluating intelligence. While prior surveys focus on efficiency by examining how to shorten reaso...
HARNESS: Human-Agent Risk Navigation and Event Safety System for Proactive Hazard Forecasting in High-Risk DOE Environments : Abstract: Operational safety at mission-critical work sites is a top priority given the complex and hazardous nature of daily tasks. This paper presents the Human-Agent Risk Navigation and Event Safet...
HyperComplEx: Adaptive Multi-Space Knowledge Graph Embeddings : Abstract: Knowledge graphs have emerged as fundamental structures for representing complex relational data across scientific and enterprise domains. However, existing embedding methods face critical l...
Advanced Tool for Traffic Crash Analysis: An AI-Driven Multi-Agent Approach to Pre-Crash Reconstruction : Abstract: Traffic collision reconstruction traditionally relies on human expertise, often yielding inconsistent results when analyzing incomplete multimodal data. This study develops a multi-agent AI ...
Enhancing Demand-Oriented Regionalization with Agentic AI and Local Heterogeneous Data for Adaptation Planning : Abstract: Conventional planning units or urban regions, such as census tracts, zip codes, or neighborhoods, often do not capture the specific demands of local communities and lack the flexibility to i...
LLM enhanced graph inference for long-term disease progression modelling : Abstract: Understanding the interactions between biomarkers among brain regions during neurodegenerative disease is essential for unravelling the mechanisms underlying disease progression. For example...
Multi-Agent Legal Verifier Systems for Data Transfer Planning : Abstract: Legal compliance in AI-driven data transfer planning is becoming increasingly critical under stringent privacy regulations such as the Japanese Act on the Protection of Personal Information ...
Requirements for Aligned, Dynamic Resolution of Conflicts in Operational Constraints : Abstract: Deployed, autonomous AI systems must often evaluate multiple plausible courses of action (extended sequences of behavior) in novel or under-specified contexts. Despite extensive training, th...
AI Agent-Driven Framework for Automated Product Knowledge Graph Construction in E-Commerce : Abstract: The rapid expansion of e-commerce platforms generates vast amounts of unstructured product data, creating significant challenges for information retrieval, recommendation systems, and data a...
Faster Symmetry Breaking Constraints for Abstract Structures : Abstract: In constraint programming and related paradigms, a modeller specifies their problem in a modelling language for a solver to search and return its solution(s). Using high-level modelling lang...
Key Decision-Makers in Multi-Agent Debates: Who Holds the Power? : Abstract: Recent studies on LLM agent scaling have highlighted the potential of Multi-Agent Debate (MAD) to enhance reasoning abilities. However, the critical aspect of role allocation strategies rema...
Autonomous Vehicle Path Planning by Searching With Differentiable Simulation : Abstract: Planning allows an agent to safely refine its actions before executing them in the real world. In autonomous driving, this is crucial to avoid collisions and navigate in complex, dense traff...
ARCTraj: A Dataset and Benchmark of Human Reasoning Trajectories for Abstract Problem Solving : Abstract: We present ARCTraj, a dataset and methodological framework for modeling human reasoning through complex visual tasks in the Abstraction and Reasoning Corpus (ARC). While ARC has inspired ext...

Research Sources: 406 | Generated: 11/17/2025