AI Research News Feeds for January 8th, 2026

AI RESEARCH PAPERS & ACADEMIC SOURCES

Learn2Reg 2024: New Benchmark Datasets Driving Progress on New Challenges : Abstract: Medical image registration is critical for clinical applications, and fair benchmarking of different methods is essential for monitoring ongoing progress in the field. To date, the Learn2Reg...
Plasticine: A Traceable Diffusion Model for Medical Image Translation : Abstract: Domain gaps arising from variations in imaging devices and population distributions pose significant challenges for machine learning in medical image analysis. Existing image-to-image transl...
Difficulty Controlled Diffusion Model for Synthesizing Effective Training Data : Abstract: Generative models have become a powerful tool for synthesizing training data in computer vision tasks. Current approaches solely focus on aligning generated images with the target dataset di...
Efficient 3D affinely equivariant CNNs with adaptive fusion of augmented spherical Fourier-Bessel bases : Abstract: Filter-decomposition-based group equivariant convolutional neural networks (CNNs) have shown promising stability and data efficiency for 3D image feature extraction. However, these networks,...
Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models : Abstract: Foundation models like CLIP allow zero-shot transfer on various tasks without additional training data. Yet, the zero-shot performance is less competitive than a fully supervised one. Thus, ...
CLAP: Contrastive Latent Action Pretraining for Learning Vision-Language-Action Models from Human Videos : Abstract: Generalist Vision-Language-Action models are currently hindered by the scarcity of robotic data compared to the abundance of human video demonstrations. Existing Latent Action Models attempt...
A low-complexity method for efficient depth-guided image deblurring : Abstract: Image deblurring is a challenging problem in imaging due to its highly ill-posed nature. Deep learning models have shown great success in tackling this problem but the quest for the best ima...
Staged Voxel-Level Deep Reinforcement Learning for 3D Medical Image Segmentation with Noisy Annotations : Abstract: Deep learning has achieved significant advancements in medical image segmentation. Currently, obtaining accurate segmentation outcomes is critically reliant on large-scale datasets with high...
GeoDiff-SAR: A Geometric Prior Guided Diffusion Model for SAR Image Generation : Abstract: Synthetic Aperture Radar (SAR) imaging results are highly sensitive to observation geometries and the geometric parameters of targets. However, existing generative methods primarily operate ...
Edit2Restore: Few-Shot Image Restoration via Parameter-Efficient Adaptation of Pre-trained Editing Models : Abstract: Image restoration has traditionally required training specialized models on thousands of paired examples per degradation type. We challenge this paradigm by demonstrating that powerful pre-t...
Choreographing a World of Dynamic Objects : Abstract: Dynamic objects in our physical 4D (3D + time) world are constantly evolving, deforming, and interacting with other objects, leading to diverse 4D scene dynamics. In this paper, we present a...
ImLoc: Revisiting Visual Localization with Image-based Representation : Abstract: Existing visual localization methods are typically either 2D image-based, which are easy to build and maintain but limited in effective geometric reasoning, or 3D structure-based, which achi...
ToTMNet: FFT-Accelerated Toeplitz Temporal Mixing Network for Lightweight Remote Photoplethysmography : Abstract: Remote photoplethysmography (rPPG) estimates a blood volume pulse (BVP) waveform from facial videos captured by commodity cameras. Although recent deep models improve robustness compared to ...
Diffusion-DRF: Differentiable Reward Flow for Video Diffusion Fine-Tuning : Abstract: Direct Preference Optimization (DPO) has recently improved Text-to-Video (T2V) generation by enhancing visual fidelity and text alignment. However, current methods rely on non-differentiable...
GeoReason: Aligning Thinking And Answering In Remote Sensing Vision-Language Models Via Logical Consistency Reinforcement Learning : Abstract: The evolution of Remote Sensing Vision-Language Models (RS-VLMs) emphasizes the importance of transitioning from perception-centric recognition toward high-level deductive reasoning to enhanc...
Gen3R: 3D Scene Generation Meets Feed-Forward Reconstruction : Abstract: We present Gen3R, a method that bridges the strong priors of foundational reconstruction models and video diffusion models for scene-level 3D generation. We repurpose the VGGT reconstruction...
Thinking with Frames: Generative Video Distortion Evaluation via Frame Reward Model : Abstract: Recent advances in video reward models and post-training strategies have improved text-to-video (T2V) generation. While these models typically assess visual quality, motion quality, and text...
Padé Neurons for Efficient Neural Models : Abstract: Neural networks commonly employ the McCulloch-Pitts neuron model, which is a linear model followed by a point-wise non-linear activation. Various researchers have already advanced inherently...
PosterVerse: A Full-Workflow Framework for Commercial-Grade Poster Generation with HTML-Based Scalable Typography : Abstract: Commercial-grade poster design demands the seamless integration of aesthetic appeal with precise, informative content delivery. Current automated poster generation systems face significant l...
FUSION: Full-Body Unified Motion Prior for Body and Hands via Diffusion : Abstract: Hands are central to interacting with our surroundings and conveying gestures, making their inclusion essential for full-body motion synthesis. Despite this, existing human motion synthesis ...
ResTok: Learning Hierarchical Residuals in 1D Visual Tokenizers for Autoregressive Image Generation : Abstract: Existing 1D visual tokenizers for autoregressive (AR) generation largely follow the design principles of language modeling, as they are built directly upon transformers whose priors originat...
HemBLIP: A Vision-Language Model for Interpretable Leukemia Cell Morphology Analysis : Abstract: Microscopic evaluation of white blood cell morphology is central to leukemia diagnosis, yet current deep learning models often act as black boxes, limiting clinical trust and adoption. We in...
A Comparative Study of 3D Model Acquisition Methods for Synthetic Data Generation of Agricultural Products : Abstract: In the manufacturing industry, computer vision systems based on artificial intelligence (AI) are widely used to reduce costs and increase production. Training these AI models requires a larg...
MVP: Enhancing Video Large Language Models via Self-supervised Masked Video Prediction : Abstract: Reinforcement learning based post-training paradigms for Video Large Language Models (VideoLLMs) have achieved significant success by optimizing for visual-semantic tasks such as captioning ...
I2E: From Image Pixels to Actionable Interactive Environments for Text-Guided Image Editing : Abstract: Existing text-guided image editing methods primarily rely on an end-to-end pixel-level inpainting paradigm. Despite its success in simple scenarios, this paradigm still significantly struggles ...
HyperCOD: The First Challenging Benchmark and Baseline for Hyperspectral Camouflaged Object Detection : Abstract: RGB-based camouflaged object detection struggles in real-world scenarios where color and texture cues are ambiguous. While hyperspectral image offers a powerful alternative by capturing fine...
MATANet: A Multi-context Attention and Taxonomy-Aware Network for Fine-Grained Underwater Recognition of Marine Species : Abstract: Fine-grained classification of marine animals supports ecology, biodiversity and habitat conservation, and evidence-based policy-making. However, existing methods often overlook contextual i...
Towards Real-world Lens Active Alignment with Unlabeled Data via Domain Adaptation : Abstract: Active Alignment (AA) is a key technology for the large-scale automated assembly of high-precision optical systems. Compared with labor-intensive per-model on-device calibration, a digital-t...
BREATH-VL: Vision-Language-Guided 6-DoF Bronchoscopy Localization via Semantic-Geometric Fusion : Abstract: Vision-language models (VLMs) have recently shown remarkable performance in navigation and localization tasks by leveraging large-scale pretraining for semantic understanding. However, apply...
PhysVideoGenerator: Towards Physically Aware Video Generation via Latent Physics Guidance : Abstract: Current video generation models produce high-quality aesthetic videos but often struggle to learn representations of real-world physics dynamics, resulting in artifacts such as unnatural obj...
MGPC: Multimodal Network for Generalizable Point Cloud Completion With Modality Dropout and Progressive Decoding : Abstract: Point cloud completion aims to recover complete 3D geometry from partial observations caused by limited viewpoints and occlusions. Existing learning-based works, including 3D Convolutional N...
VideoMemory: Toward Consistent Video Generation via Memory Integration : Abstract: Maintaining consistent characters, props, and environments across multiple shots is a central challenge in narrative video generation. Existing models can produce high-quality short clips bu...
CrackSegFlow: Controllable Flow-Matching Synthesis for Generalizable Crack Segmentation with the CSF-50K Benchmark : Abstract: Automated crack segmentation is essential for scalable condition assessment of pavements and civil infrastructure, yet practical deployment is limited by scarce pixel-level labels and severe...
Shape Classification using Approximately Convex Segment Features : Abstract: The existing object classification techniques based on descriptive features rely on object alignment to compute the similarity of objects for classification. This paper replaces the necessit...
Unveiling Text in Challenging Stone Inscriptions: A Character-Context-Aware Patching Strategy for Binarization : Abstract: Binarization is a popular first step towards text extraction in historical artifacts. Stone inscription images pose severe challenges for binarization due to poor contrast between etched cha...
Adaptive Attention Distillation for Robust Few-Shot Segmentation under Environmental Perturbations : Abstract: Few-shot segmentation (FSS) aims to rapidly learn novel class concepts from limited examples to segment specific targets in unseen images, and has been widely applied in areas such as medica...
Detecting AI-Generated Images via Distributional Deviations from Real Images : Abstract: The rapid advancement of generative models has significantly enhanced the quality of AI-generated images, raising concerns about misinformation and the erosion of public trust. Detecting AI-...
SpatiaLoc: Leveraging Multi-Level Spatial Enhanced Descriptors for Cross-Modal Localization : Abstract: Cross-modal localization using text and point clouds enables robots to localize themselves via natural language descriptions, with applications in autonomous navigation and interaction betwe...
CloudMatch: Weak-to-Strong Consistency Learning for Semi-Supervised Cloud Detection : Abstract: Due to the high cost of annotating accurate pixel-level labels, semi-supervised learning has emerged as a promising approach for cloud detection. In this paper, we propose CloudMatch, a semi...
Physics-Constrained Cross-Resolution Enhancement Network for Optics-Guided Thermal UAV Image Super-Resolution : Abstract: Optics-guided thermal UAV image super-resolution has attracted significant research interest due to its potential in all-weather monitoring applications. However, existing methods typically ...
Semantic Belief-State World Model for 3D Human Motion Prediction : Abstract: Human motion prediction has traditionally been framed as a sequence regression problem where models extrapolate future joint coordinates from observed pose histories. While effective over sh...
G2P: Gaussian-to-Point Attribute Alignment for Boundary-Aware 3D Semantic Segmentation : Abstract: Semantic segmentation on point clouds is critical for 3D scene understanding. However, sparse and irregular point distributions provide limited appearance evidence, making geometry-only feat...
REFA: Real-time Egocentric Facial Animations for Virtual Reality : Abstract: We present a novel system for real-time tracking of facial expressions using egocentric views captured from a set of infrared cameras embedded in a virtual reality (VR) headset. Our technolo...
Understanding Reward Hacking in Text-to-Image Reinforcement Learning : Abstract: Reinforcement learning (RL) has become a standard approach for post-training large language models and, more recently, for improving image generation models, which uses reward functions to e...
ThinkRL-Edit: Thinking in Reinforcement Learning for Reasoning-Centric Image Editing : Abstract: Instruction-driven image editing with unified multimodal generative models has advanced rapidly, yet their underlying visual reasoning remains limited, leading to suboptimal performance on r...
WeedRepFormer: Reparameterizable Vision Transformers for Real-Time Waterhemp Segmentation and Gender Classification : Abstract: We present WeedRepFormer, a lightweight multi-task Vision Transformer designed for simultaneous waterhemp segmentation and gender classification. Existing agricultural models often struggle ...
GAMBIT: A Gamified Jailbreak Framework for Multimodal Large Language Models : Abstract: Multimodal Large Language Models (MLLMs) have become widely deployed, yet their safety alignment remains fragile under adversarial inputs. Previous work has shown that increasing inference s...
Better, But Not Sufficient: Testing Video ANNs Against Macaque IT Dynamics : Abstract: Feedforward artificial neural networks (ANNs) trained on static images remain the dominant models of the primate ventral visual stream, yet they are intrinsically limited to static compu...
A Novel Unified Approach to Deepfake Detection : Abstract: The advancements in the field of AI are increasingly giving rise to various threats. One of the most prominent of them is the synthesis and misuse of Deepfakes. To sustain trust in this digit...
Guardians of the Hair: Rescuing Soft Boundaries in Depth, Stereo, and Novel Views : Abstract: Soft boundaries, like thin hairs, are commonly observed in natural and computer-generated imagery, but they remain challenging for 3D vision due to the ambiguous mixing of foreground and bac...
RelightAnyone: A Generalized Relightable 3D Gaussian Head Model : Abstract: 3D Gaussian Splatting (3DGS) has become a standard approach to reconstruct and render photorealistic 3D head avatars. A major challenge is to relight the avatars to match any scene illuminat...
InsertGNN: Can Graph Neural Networks Outperform Humans in TOEFL Sentence Insertion Problem? : Abstract: The integration of sentences poses an intriguing challenge within the realm of NLP, but it has not garnered the attention it deserves. Existing methods that focus on sentence arrangement, te...
Stable Language Guidance for Vision-Language-Action Models : Abstract: Vision-Language-Action (VLA) models have demonstrated impressive capabilities in generalized robotic control; however, they remain notoriously brittle to linguistic perturbations. We identif...
SoK: Privacy Risks and Mitigations in Retrieval-Augmented Generation Systems : Abstract: The continued promise of Large Language Models (LLMs), particularly in their natural language understanding and generation capabilities, has driven a rapidly increasing interest in identifyi...
Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control : Abstract: Recent commercial systems such as Suno demonstrate strong capabilities in long-form song generation, while academic research remains largely non-reproducible due to the lack of publicly avai...
EASLT: Emotion-Aware Sign Language Translation : Abstract: Sign Language Translation (SLT) is a complex cross-modal task requiring the integration of Manual Signals (MS) and Non-Manual Signals (NMS). While recent gloss-free SLT methods have made str...
STELLA: Self-Reflective Terminology-Aware Framework for Building an Aerospace Information Retrieval Benchmark : Abstract: Tasks in the aerospace industry heavily rely on searching and reusing large volumes of technical documents, yet there is no public information retrieval (IR) benchmark that reflects the term...
RiskCueBench: Benchmarking Anticipatory Reasoning from Early Risk Cues in Video-Language Models : Abstract: With the rapid growth of video centered social media, the ability to anticipate risky events from visual data is a promising direction for ensuring public safety and preventing real world ac...
How Real is Your Jailbreak? Fine-grained Jailbreak Evaluation with Anchored Reference : Abstract: Jailbreak attacks present a significant challenge to the safety of Large Language Models (LLMs), yet current automated evaluation methods largely rely on coarse classifications that focus ma...
Roles of MLLMs in Visually Rich Document Retrieval for RAG: A Survey : Abstract: Visually rich documents (VRDs) challenge retrieval-augmented generation (RAG) with layout-dependent semantics, brittle OCR, and evidence spread across complex figures and structured tables. ...
SciNetBench: A Relation-Aware Benchmark for Scientific Literature Retrieval Agents : Abstract: The rapid development of AI agents has spurred the development of advanced research tools, such as Deep Research. Achieving this requires a nuanced understanding of the relations within scient...
All That Glisters Is Not Gold: A Benchmark for Reference-Free Counterfactual Financial Misinformation Detection : Abstract: We introduce RFC Bench, a benchmark for evaluating large language models on financial misinformation under realistic news. RFC Bench operates at the paragraph level and captures the contextu...
LLMberjack: Guided Trimming of Debate Trees for Multi-Party Conversation Creation : Abstract: We present LLMberjack, a platform for creating multi-party conversations starting from existing debates, originally structured as reply trees. The system offers an interactive interface that...
SearchAttack: Red-Teaming LLMs against Real-World Threats via Framing Unsafe Web Information-Seeking Tasks : Abstract: Recently, people have suffered and become increasingly aware of the unreliability gap in LLMs for open and knowledge-intensive tasks, and thus turn to search-augmented LLMs to mitigate this ...
KDCM: Reducing Hallucination in LLMs through Explicit Reasoning Structures : Abstract: To mitigate hallucinations in large language models (LLMs), we propose a framework that focuses on errors induced by prompts. Our method extends a chain-style knowledge distillation approach...
Bridging the Discrete-Continuous Gap: Unified Multimodal Generation via Coupled Manifold Discrete Absorbing Diffusion : Abstract: The bifurcation of generative modeling into autoregressive approaches for discrete data (text) and diffusion approaches for continuous data (images) hinders the development of truly unified ...
Modular Prompt Optimization: Optimizing Structured Prompts with Section-Local Textual Gradients : Abstract: Prompt quality plays a central role in controlling the behavior, reliability, and reasoning performance of large language models (LLMs), particularly for smaller open-source instruction-tune...
When Helpers Become Hazards: A Benchmark for Analyzing Multimodal LLM-Powered Safety in Daily Life : Abstract: As Multimodal Large Language Models (MLLMs) become an indispensable assistant in human life, the unsafe content generated by MLLMs poses a danger to human behavior, perpetually overhanging h...
Analyzing and Improving Cross-lingual Knowledge Transfer for Machine Translation : Abstract: Multilingual machine translation systems aim to make knowledge accessible across languages, yet learning effective cross-lingual representations remains challenging. These challenges are esp...
SpeakerSleuth: Evaluating Large Audio-Language Models as Judges for Multi-turn Speaker Consistency : Abstract: Large Audio-Language Models (LALMs) as judges have emerged as a prominent approach for evaluating speech generation quality, yet their ability to assess speaker consistency across multi-turn...
Simulated Students in Tutoring Dialogues: Substance or Illusion? : Abstract: Advances in large language models (LLMs) enable many new innovations in education. However, evaluating the effectiveness of new technology requires real students, which is time-consuming and...
VotIE: Information Extraction from Meeting Minutes : Abstract: Municipal meeting minutes record key decisions in local democratic processes. Unlike parliamentary proceedings, which typically adhere to standardized formats, they encode voting outcomes in...
Benchmark^2: Systematic Evaluation of LLM Benchmarks : Abstract: The rapid proliferation of benchmarks for evaluating large language models (LLMs) has created an urgent need for systematic methods to assess benchmark quality itself. We propose Benchmark^2...
RADAR: Retrieval-Augmented Detector with Adversarial Refinement for Robust Fake News Detection : Abstract: To efficiently combat the spread of LLM-generated misinformation, we present RADAR, a retrieval-augmented detector with adversarial refinement for robust fake news detection. Our approach em...
Doc-PP: Document Policy Preservation Benchmark for Large Vision-Language Models : Abstract: The deployment of Large Vision-Language Models (LVLMs) for real-world document question answering is often constrained by dynamic, user-defined policies that dictate information disclosure b...
When Models Decide and When They Bind: A Two-Stage Computation for Multiple-Choice Question-Answering : Abstract: Multiple-choice question answering (MCQA) is easy to evaluate but adds a meta-task: models must both solve the problem and output the symbol that *represents* the answer, conflating reasonin...
Decide Then Retrieve: A Training-Free Framework with Uncertainty-Guided Triggering and Dual-Path Retrieval : Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge, but existing approaches indiscriminately trigger retrieval and rely on single-...
Evaluating Small Decoder-Only Language Models for Grammar Correction and Text Simplification : Abstract: Large language models have become extremely popular recently due to their ability to achieve strong performance on a variety of tasks, such as text generation and rewriting, but their size a...
Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning : Abstract: The integration of large language models (LLMs) with external tools has significantly expanded the capabilities of AI agents. However, as the diversity of both LLMs and tools increases, sele...
PartisanLens: A Multilingual Dataset of Hyperpartisan and Conspiratorial Immigration Narratives in European Media : Abstract: Detecting hyperpartisan narratives and Population Replacement Conspiracy Theories (PRCT) is essential to addressing the spread of misinformation. These complex narratives pose a significant ...
What Does Loss Optimization Actually Teach, If Anything? Knowledge Dynamics in Continual Pre-training of LLMs : Abstract: Continual Pre-Training (CPT) is widely used for acquiring and updating factual knowledge in LLMs. This practice treats loss as a proxy for knowledge learning, while offering no grounding int...
Rethinking Table Pruning in TableQA: From Sequential Revisions to Gold Trajectory-Supervised Parallel Search : Abstract: Table Question Answering (TableQA) benefits significantly from table pruning, which extracts compact sub-tables by eliminating redundant cells to streamline downstream reasoning. However, ex...
Step Potential Advantage Estimation: Harnessing Intermediate Confidence and Correctness for Efficient Mathematical Reasoning : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) elicits long chain-of-thought reasoning in large language models (LLMs), but outcome-based rewards lead to coarse-grained advantage esti...
VietMed-MCQ: A Consistency-Filtered Data Synthesis Framework for Vietnamese Traditional Medicine Evaluation : Abstract: Large Language Models (LLMs) have demonstrated remarkable proficiency in general medical domains. However, their performance significantly degrades in specialized, culturally specific domain...
HearSay Benchmark: Do Audio LLMs Leak What They Hear? : Abstract: While Audio Large Language Models (ALLMs) have achieved remarkable progress in understanding and generation, their potential privacy implications remain largely unexplored. This paper takes ...
Tracing the complexity profiles of different linguistic phenomena through the intrinsic dimension of LLM representations : Abstract: We explore the intrinsic dimension (ID) of LLM representations as a marker of linguistic complexity, asking if different ID profiles across LLM layers differentially characterize formal and ...
Do LLM Self-Explanations Help Users Predict Model Behavior? Evaluating Counterfactual Simulatability with Pragmatic Perturbations : Abstract: Large Language Models (LLMs) can produce verbalized self-explanations, yet prior studies suggest that such rationales may not reliably reflect the model's true decision process. We ask wheth...
Whose Facts Win? LLM Source Preferences under Knowledge Conflicts : Abstract: As large language models (LLMs) are more frequently used in retrieval-augmented generation pipelines, it is increasingly relevant to study their behavior under knowledge conflicts. Thus far,...
Stuttering-Aware Automatic Speech Recognition for Indonesian Language : Abstract: Automatic speech recognition systems have achieved remarkable performance on fluent speech but continue to degrade significantly when processing stuttered speech, a limitation that is partic...
MIND: From Passive Mimicry to Active Reasoning through Capability-Aware Multi-Perspective CoT Distillation : Abstract: While Large Language Models (LLMs) have emerged with remarkable capabilities in complex tasks through Chain-of-Thought reasoning, practical resource constraints have sparked interest in tran...
Visual Merit or Linguistic Crutch? A Close Look at DeepSeek-OCR : Abstract: DeepSeek-OCR utilizes an optical 2D mapping approach to achieve high-ratio vision-text compression, claiming to decode text tokens exceeding ten times the input visual tokens. While this sug...
AirNav: A Large-Scale Real-World UAV Vision-and-Language Navigation Dataset with Natural and Diverse Instructions : Abstract: Existing Unmanned Aerial Vehicle (UAV) Vision-Language Navigation (VLN) datasets face issues such as dependence on virtual environments, lack of naturalness in instructions, and limited scal...
RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models : Abstract: As large language models (LLMs) become integral to safety-critical applications, ensuring their robustness against adversarial prompts is paramount. However, existing red teaming datasets su...
Evaluation Framework for AI Creativity: A Case Study Based on Story Generation : Abstract: Evaluating creative text generation remains a challenge because existing reference-based metrics fail to capture the subjective nature of creativity. We propose a structured evaluation frame...
DisastQA: A Comprehensive Benchmark for Evaluating Question Answering in Disaster Management : Abstract: Accurate question answering (QA) in disaster management requires reasoning over uncertain and conflicting information, a setting poorly captured by existing benchmarks built on clean evidenc...
eTracer: Towards Traceable Text Generation via Claim-Level Grounding : Abstract: How can system-generated responses be efficiently verified, especially in the high-stakes biomedical domain? To address this challenge, we introduce eTracer, a plug-and-play framework that e...
SyncThink: A Training-Free Strategy to Align Inference Termination with Reasoning Saturation : Abstract: Chain-of-Thought (CoT) prompting improves reasoning but often produces long and redundant traces that substantially increase inference cost. We present SyncThink, a training-free and plug-an...
ELO: Efficient Layer-Specific Optimization for Continual Pretraining of Multilingual LLMs : Abstract: We propose an efficient layer-specific optimization (ELO) method designed to enhance continual pretraining (CP) for specific languages in multilingual large language models (MLLMs). This app...
LLM-MC-Affect: LLM-Based Monte Carlo Modeling of Affective Trajectories and Latent Ambiguity for Interpersonal Dynamic Insight : Abstract: Emotional coordination is a core property of human interaction that shapes how relational meaning is constructed in real time. While text-based affect inference has become increasingly feasi...
Agent-Dice: Disentangling Knowledge Updates via Geometric Consensus for Agent Continual Learning : Abstract: Large Language Model (LLM)-based agents significantly extend the utility of LLMs by interacting with dynamic environments. However, enabling agents to continually learn new tasks without cat...
Reasoning Model Is Superior LLM-Judge, Yet Suffers from Biases : Abstract: This paper presents the first systematic comparison investigating whether Large Reasoning Models (LRMs) are superior judges to non-reasoning LLMs. Our empirical analysis yields four key findi...
Analyzing Reasoning Shifts in Audio Deepfake Detection under Adversarial Attacks: The Reasoning Tax versus Shield Bifurcation : Abstract: Audio Language Models (ALMs) offer a promising shift towards explainable audio deepfake detections (ADDs), moving beyond black-box classifiers by providing some level of transparenc...
DiVA: Fine-grained Factuality Verification with Agentic-Discriminative Verifier : Abstract: Despite the significant advancements of Large Language Models (LLMs), their factuality remains a critical challenge, fueling growing interest in factuality verification. Existing research on...
OLA: Output Language Alignment in Code-Switched LLM Interactions : Abstract: Code-switching, alternating between languages within a conversation, is natural for multilingual users, yet poses fundamental challenges for large language models (LLMs). When a user code-sw...
PsychEthicsBench: Evaluating Large Language Models Against Australian Mental Health Ethics : Abstract: The increasing integration of large language models (LLMs) into mental health applications necessitates robust frameworks for evaluating professional safety alignment. Current evaluative app...
How Do Large Language Models Learn Concepts During Continual Pre-Training? : Abstract: Human beings primarily understand the world through concepts (e.g., dog), abstract mental representations that structure perception, reasoning, and learning. However, how large language mode...
DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs : Abstract: Chain-of-Thought (CoT) reasoning improves multi-step mathematical problem solving in large language models but remains vulnerable to exposure bias and error accumulation, as early mistakes p...
EvolMem: A Cognitive-Driven Benchmark for Multi-Session Dialogue Memory : Abstract: Despite recent advances in understanding and leveraging long-range conversational memory, existing benchmarks still lack systematic evaluation of large language models (LLMs) across diverse m...
DeepSynth-Eval: Objectively Evaluating Information Consolidation in Deep Survey Writing : Abstract: The evolution of Large Language Models (LLMs) towards autonomous agents has catalyzed progress in Deep Research. While retrieval capabilities are well-benchmarked, the post-retrieval synthes...
PALM-Bench: A Comprehensive Benchmark for Personalized Audio-Language Models : Abstract: Large Audio-Language Models (LALMs) have demonstrated strong performance in audio understanding and generation. Yet, our extensive benchmarking reveals that their behavior is largely generic...
Self-Explaining Hate Speech Detection with Moral Rationales : Abstract: Hate speech detection models rely on surface-level lexical features, increasing vulnerability to spurious correlations and limiting robustness, cultural contextualization, and interpretabili...
Prompting Underestimates LLM Capability for Time Series Classification : Abstract: Prompt-based evaluations suggest that large language models (LLMs) perform poorly on time series classification, raising doubts about whether they encode meaningful temporal structure. We sh...
Enhancing Linguistic Competence of Language Models through Pre-training with Language Learning Tasks : Abstract: Language models (LMs) are pre-trained on raw text datasets to generate text sequences token-by-token. While this approach facilitates the learning of world knowledge and reasoning, it does n...
The Critical Role of Aspects in Measuring Document Similarity : Abstract: We introduce ASPECTSIM, a simple and interpretable framework that requires conditioning document similarity on an explicitly specified aspect, which is different from the traditional holisti...
PCoA: A New Benchmark for Medical Aspect-Based Summarization With Phrase-Level Context Attribution : Abstract: Verifying system-generated summaries remains challenging, as effective verification requires precise attribution to the source context, which is especially crucial in high-stakes medical dom...
Implicit Graph, Explicit Retrieval: Towards Efficient and Interpretable Long-horizon Memory for Large Language Models : Abstract: Long-horizon applications increasingly require large language models (LLMs) to answer queries when relevant evidence is sparse and dispersed across very long contexts. Existing memory system...
Rendering Data Unlearnable by Exploiting LLM Alignment Mechanisms : Abstract: Large language models (LLMs) are increasingly trained on massive, heterogeneous text corpora, raising serious concerns about the unauthorised use of proprietary or personal data during model...
Breaking the Assistant Mold: Modeling Behavioral Variation in LLM Based Procedural Character Generation : Abstract: Procedural content generation has enabled vast virtual worlds through levels, maps, and quests, but large-scale character generation remains underexplored. We identify two alignment-induced ...
Mitigating Label Noise using Prompt-Based Hyperbolic Meta-Learning in Open-Set Domain Generalization : Abstract: Open-Set Domain Generalization (OSDG) is a challenging task requiring models to accurately predict familiar categories while minimizing confidence for unknown categories to effectively rejec...
Causal Invariance Learning via Efficient Nonconvex Optimization : Abstract: Identifying the causal relationship among variables from observational data is an important yet challenging task. This work focuses on identifying the direct causes of an outcome and estimat...
SQL2Circuits: Estimating Cardinalities, Execution Times, and Costs for SQL Queries with Quantum Natural Language Processing : Abstract: Recent advances in quantum computing have led to progress in exploring quantum applications across diverse fields, including databases and data management. This work presents a quantum machi...
Inference in conditioned dynamics through causality restoration : Abstract: Computing observables from conditioned dynamics is typically computationally hard, because, although obtaining independent samples efficiently from the unconditioned dynamics is usually feas...
CktGen: Automated Analog Circuit Design with Generative Artificial Intelligence : Abstract: The automatic synthesis of analog circuits presents significant challenges. Most existing approaches formulate the problem as a single-objective optimization task, overlooking that design sp...
Federated Clustering: An Unsupervised Cluster-Wise Training for Decentralized Data Distributions : Abstract: Federated Learning (FL) is a pivotal approach in decentralized machine learning, especially when data privacy is crucial and direct data sharing is impractical. While FL is typically associa...
Graph Reinforcement Learning for Power Grids: A Comprehensive Survey : Abstract: The increasing share of renewable energy and distributed electricity generation requires the development of deep learning approaches to address the lack of flexibility inherent in traditiona...
Tipping Point Forecasting in Non-Stationary Dynamics on Function Spaces : Abstract: Tipping points are abrupt, drastic, and often irreversible changes in the evolution of non-stationary and chaotic dynamical systems. For instance, increased greenhouse gas concentrations are...
Scanner-Induced Domain Shifts Undermine the Robustness of Pathology Foundation Models : Abstract: Pathology foundation models (PFMs) have become central to computational pathology, aiming to offer general encoders for feature extraction from whole-slide images (WSIs). Despite strong benc...
FLEx: Language Modeling with Few-shot Language Explanations : Abstract: Language models have become effective at a wide range of tasks, from math problem solving to open-domain question answering. However, they still make mistakes, and these mistakes are often r...
A Theoretical and Empirical Taxonomy of Imbalance in Binary Classification : Abstract: Class imbalance significantly degrades classification performance, yet its effects are rarely analyzed from a unified theoretical perspective. We propose a principled framework based on thre...
A Single-Loop Bilevel Deep Learning Method for Optimal Control of Obstacle Problems : Abstract: Optimal control of obstacle problems arises in a wide range of applications and is computationally challenging due to its nonsmoothness, nonlinearity, and bilevel structure. Classical numeri...
Equivariant Neural Networks for Force-Field Models of Lattice Systems : Abstract: Machine-learning (ML) force fields enable large-scale simulations with near-first-principles accuracy at substantially reduced computational cost. Recent work has extended ML force-field app...
Cells on Autopilot: Adaptive Cell (Re)Selection via Reinforcement Learning : Abstract: The widespread deployment of 5G networks, together with the coexistence of 4G/LTE networks, provides mobile devices a diverse set of candidate cells to connect to. However, associating mobil...
Unsupervised Modular Adaptive Region Growing and RegionMix Classification for Wind Turbine Segmentation : Abstract: Reliable operation of wind turbines requires frequent inspections, as even minor surface damages can degrade aerodynamic performance, reduce energy output, and accelerate blade wear. Central...
Using Small Language Models to Reverse-Engineer Machine Learning Pipelines Structures : Abstract: Background: Extracting the stages that structure Machine Learning (ML) pipelines from source code is key for gaining a deeper understanding of data science practices. However, the diversity ...
Provably Finding a Hidden Dense Submatrix among Many Planted Dense Submatrices via Convex Programming : Abstract: We consider the densest submatrix problem, which seeks the submatrix of fixed size of a given binary matrix that contains the most nonzero entries. This problem is a natural generalization o...
Lightweight and perceptually-guided voice conversion for electro-laryngeal speech : Abstract: Electro-laryngeal (EL) speech is characterized by constant pitch, limited prosody, and mechanical noise, reducing naturalness and intelligibility. We propose a lightweight adaptation of the ...
Bayesian Monocular Depth Refinement via Neural Radiance Fields : Abstract: Monocular depth estimation has applications in many fields, such as autonomous navigation and extended reality, making it an essential computer vision task. However, current methods often pr...
From No-Regret to Strategically Robust Learning in Repeated Auctions : Abstract: In Bayesian single-item auctions, a monotone bidding strategy (one that prescribes a higher bid for a higher value type) can be equivalently represented as a partition of the quantile space ...
Beyond Physical Labels: Redefining Domains for Robust WiFi-based Gesture Recognition : Abstract: In this paper, we propose GesFi, a novel WiFi-based gesture recognition system that introduces WiFi latent domain mining to redefine domains directly from the data itself. GesFi first proces...
EvalBlocks: A Modular Pipeline for Rapidly Evaluating Foundation Models in Medical Imaging : Abstract: Developing foundation models in medical imaging requires continuous monitoring of downstream performance. Researchers are burdened with tracking numerous experiments, design choices, and the...
From Brute Force to Semantic Insight: Performance-Guided Data Transformation Design with LLMs : Abstract: Large language models (LLMs) have achieved notable performance in code synthesis; however, data-aware augmentation remains a limiting factor, handled via heuristic design or brute-force appr...
Physically Consistent Machine Learning for Melting Temperature Prediction of Refractory High-Entropy Alloys : Abstract: Predicting the melting temperature (Tm) of multi-component and high-entropy alloys (HEAs) is critical for high-temperature applications but computationally expensive using traditional CALPHA...
Compact Example-Based Explanations for Language Models : Abstract: Training data influence estimation methods quantify the contribution of training documents to a model's output, making them a promising source of information for example-based explanations. ...
Accounting for Optimal Control in the Sizing of Isolated Hybrid Renewable Energy Systems Using Imitation Learning : Abstract: Decarbonization of isolated or off-grid energy systems through phase-in of large shares of intermittent solar or wind generation requires co-installation of energy storage or continued use o...
NeuronScope: A Multi-Agent Framework for Explaining Polysemantic Neurons in Language Models : Abstract: Neuron-level interpretation in large language models (LLMs) is fundamentally challenged by widespread polysemanticity, where individual neurons respond to multiple distinct semantic concepts...
TRec: Egocentric Action Recognition using 2D Point Tracks : Abstract: We present a novel approach for egocentric action recognition that leverages 2D point tracks as an additional motion cue. While most existing methods rely on RGB appearance, human pose estim...
Learning from Limited Labels: Transductive Graph Label Propagation for Indian Music Analysis : Abstract: Supervised machine learning frameworks rely on extensive labeled datasets for robust performance on real-world tasks. However, there is a lack of large annotated datasets in audio and music ...
Systematic Evaluation of Depth Backbones and Semantic Cues for Monocular Pseudo-LiDAR 3D Detection : Abstract: Monocular 3D object detection offers a low-cost alternative to LiDAR, yet remains less accurate due to the difficulty of estimating metric depth from a single image. We systematically evalua...
Shielded RecRL: Explanation Generation for Recommender Systems without Ranking Degradation : Abstract: We introduce Shielded RecRL, a reinforcement learning approach to generate personalized explanations for recommender systems without sacrificing the system's original ranking performance. Un...
Provably Convergent Decentralized Optimization over Directed Graphs under Generalized Smoothness : Abstract: Decentralized optimization has become a fundamental tool for large-scale learning systems; however, most existing methods rely on the classical Lipschitz smoothness assumption, which is ofte...
Persona-aware and Explainable Bikeability Assessment: A Vision-Language Model Approach : Abstract: Bikeability assessment is essential for advancing sustainable urban transportation and creating cyclist-friendly cities, and it requires incorporating users' perceptions of safety and comfor...
Online Learning with Limited Information in the Sliding Window Model : Abstract: Motivated by recent work on the experts problem in the streaming model, we consider the experts problem in the sliding window model. The sliding window model is a well-studied model that cap...
CALM: Culturally Self-Aware Language Models : Abstract: Cultural awareness in language models is the capacity to understand and adapt to diverse cultural contexts. However, most existing approaches treat culture as static background knowledge, ov...
Latent Geometry of Taste: Scalable Low-Rank Matrix Factorization : Abstract: Scalability and data sparsity remain critical bottlenecks for collaborative filtering on massive interaction datasets. This work investigates the latent geometry of user preferences using th...
Experimental Comparison of Light-Weight and Deep CNN Models Across Diverse Datasets : Abstract: Our results reveal that a well-regularized shallow architecture can serve as a highly competitive baseline across heterogeneous domains - from smart-city surveillance to agricultural variety...
Measures of classification bias derived from sample size analysis : Abstract: We propose the use of a simple intuitive principle for measuring algorithmic classification bias: the significance of the differences in a classifier's error rates across the various demogra...
Provable Acceleration of Distributed Optimization with Local Updates : Abstract: In conventional distributed optimization, each agent performs a single local update between two communication rounds with its neighbors to synchronize solutions. Inspired by the success of u...
DeepLeak: Privacy Enhancing Hardening of Model Explanations Against Membership Leakage : Abstract: Machine learning (ML) explainability is central to algorithmic transparency in high-stakes settings such as predictive diagnostics and loan approval. However, these same domains require rigo...
PIVONet: A Physically-Informed Variational Neuro ODE Model for Efficient Advection-Diffusion Fluid Simulation : Abstract: We present PIVONet (Physically-Informed Variational ODE Neural Network), a unified framework that integrates Neural Ordinary Differential Equations (Neuro-ODEs) with Continuous Normalizing F...
A path to natural language through tokenisation and transformers : Abstract: Natural languages exhibit striking regularities in their statistical structure, including notably the emergence of Zipf's and Heaps' laws. Despite this, it remains broadly unclear how these ...
Higher order PCA-like rotation-invariant features for detailed shape descriptors modulo rotation : Abstract: PCA can be used for rotation invariant features, describing a shape with its $p_{ab}=E[(x_a-E[x_a])(x_b-E[x_b])]$ covariance matrix approximating shape by ellipsoid, allowing for rotation in...
On the Identifiability of Regime-Switching Models with Multi-Lag Dependencies : Abstract: Identifiability is central to the interpretability of deep latent variable models, ensuring parameterisations are uniquely determined by the data-generating distribution. However, it remains...
Listen to Rhythm, Choose Movements: Autoregressive Multimodal Dance Generation via Diffusion and Mamba with Decoupled Dance Dataset : Abstract: Advances in generative models and sequence learning have greatly promoted research in dance motion generation, yet current methods still suffer from coarse semantic control and poor coherenc...
TRYLOCK: Defense-in-Depth Against LLM Jailbreaks via Layered Preference and Representation Engineering : Abstract: Large language models remain vulnerable to jailbreak attacks, and single-layer defenses often trade security for usability. We present TRYLOCK, the first defense-in-depth architecture that c...
MetagenBERT: a Transformer-based Architecture using Foundational genomic Large Language Models for novel Metagenome Representation : Abstract: Metagenomic disease prediction commonly relies on species abundance tables derived from large, incomplete reference catalogs, constraining resolution and discarding valuable information cont...
WRAVAL -- WRiting Assist eVALuation : Abstract: The emergence of Large Language Models (LLMs) has shifted language model evaluation toward reasoning and problem-solving tasks as measures of general intelligence. Small Language Models (SLM...
Jailbreak-Zero: A Path to Pareto Optimal Red Teaming for Large Language Models : Abstract: This paper introduces Jailbreak-Zero, a novel red teaming methodology that shifts the paradigm of Large Language Model (LLM) safety evaluation from a constrained example-based approach to a ...
Lightweight Test-Time Adaptation for EMG-Based Gesture Recognition : Abstract: Reliable long-term decoding of surface electromyography (EMG) is hindered by signal drift caused by electrode shifts, muscle fatigue, and posture changes. While state-of-the-art models achie...
Robust Physics Discovery from Highly Corrupted Data: A PINN Framework Applied to the Nonlinear Schrödinger Equation : Abstract: We demonstrate a deep learning framework capable of recovering physical parameters from the Nonlinear Schrödinger Equation (NLSE) under severe noise conditions. By integrating Physics-Inform...
Agentic Rubrics as Contextual Verifiers for SWE Agents : Abstract: Verification is critical for improving agents: it provides the reward signal for Reinforcement Learning and enables inference-time gains through Test-Time Scaling (TTS). Despite its importan...
MORPHFED: Federated Learning for Cross-institutional Blood Morphology Analysis : Abstract: Automated blood morphology analysis can support hematological diagnostics in low- and middle-income countries (LMICs) but remains sensitive to dataset shifts from staining variability, imagi...
Causal Data Augmentation for Robust Fine-Tuning of Tabular Foundation Models : Abstract: Fine-tuning tabular foundation models (TFMs) under data scarcity is challenging, as early stopping on even scarcer validation data often fails to capture true generalization performance. We ...
Minimum distance classification for nonlinear dynamical systems : Abstract: We address the problem of classifying trajectory data generated by some nonlinear dynamics, where each class corresponds to a distinct dynamical system. We propose Dynafit, a kernel-based me...
Using Legacy Polysomnography Data to Train a Radar System to Quantify Sleep in Older Adults and People living with Dementia : Abstract: Objective: Ultra-wideband radar technology offers a promising solution for unobtrusive and cost-effective in-home sleep monitoring. However, the limited availability of radar sleep data pose...
LinkD: AutoRegressive Diffusion Model for Mechanical Linkage Synthesis : Abstract: Designing mechanical linkages to achieve target end-effector trajectories presents a fundamental challenge due to the intricate coupling between continuous node placements, discrete topologi...
Symbolic Regression for Shared Expressions: Introducing Partial Parameter Sharing : Abstract: Symbolic Regression aims to find symbolic expressions that describe datasets. Due to better interpretability, it is a machine learning paradigm particularly powerful for scientific discovery...
Modeling Behavioral Patterns in News Recommendations Using Fuzzy Neural Networks : Abstract: News recommender systems are increasingly driven by black-box models, offering little transparency for editorial decision-making. In this work, we introduce a transparent recommender system ...
Stage-specific cancer survival prediction enriched by explainable machine learning : Abstract: Despite the fact that cancer survivability rates vary greatly between stages, traditional survival prediction models have frequently been trained and assessed using examples from all combine...
Feature-Aware One-Shot Federated Learning via Hierarchical Token Sequences : Abstract: One-shot federated learning (OSFL) reduces the communication cost and privacy risks of iterative federated learning by constructing a global model with a single round of communication. Howev...
Detecting Semantic Backdoors in a Mystery Shopping Scenario : Abstract: Detecting semantic backdoors in classification models (where some classes can be activated by certain natural, but out-of-distribution inputs) is an important problem that has received relat...
Quantum vs. Classical Machine Learning: A Benchmark Study for Financial Prediction : Abstract: In this paper, we present a reproducible benchmarking framework that systematically compares QML models with architecture-matched classical counterparts across three financial tasks: (i) dir...
Prompt Tuning without Labeled Samples for Zero-Shot Node Classification in Text-Attributed Graphs : Abstract: Node classification is a fundamental problem in information retrieval with many real-world applications, such as community detection in social networks, grouping articles published online an...
Improving Compactness and Reducing Ambiguity of CFIRE Rule-Based Explanations : Abstract: Models trained on tabular data are widely used in sensitive domains, increasing the demand for explanation methods to meet transparency needs. CFIRE is a recent algorithm in this domain that...
Probabilistic Transformers for Joint Modeling of Global Weather Dynamics and Decision-Centric Variables : Abstract: Weather forecasts sit upstream of high-stakes decisions in domains such as grid operations, aviation, agriculture, and emergency response. Yet forecast users often face a difficult trade-off...
EDCO: Dynamic Curriculum Orchestration for Domain-specific Large Language Model Fine-tuning : Abstract: Domain-specific large language models (LLMs), typically developed by fine-tuning a pre-trained general-purpose LLM on specialized datasets, represent a significant advancement in applied AI....
ETR: Outcome-Guided Elastic Trust Regions for Policy Optimization : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as an important paradigm for unlocking reasoning capabilities in large language models, exemplified by the success of OpenAI...
The Geometry of the Pivot: A Note on Lazy Pivoted Cholesky and Farthest Point Sampling : Abstract: Low-rank approximations of large kernel matrices are ubiquitous in machine learning, particularly for scaling Gaussian Processes to massive datasets. The Pivoted Cholesky decomposition is a ...
Rethinking Recurrent Neural Networks for Time Series Forecasting: A Reinforced Recurrent Encoder with Prediction-Oriented Proximal Policy Optimization : Abstract: Time series forecasting plays a crucial role in contemporary engineering information systems for supporting decision-making across various industries, where Recurrent Neural Networks (RNNs) ...
Stochastic Voronoi Ensembles for Anomaly Detection : Abstract: Anomaly detection aims to identify data instances that deviate significantly from the majority of data, which has been widely used in fraud detection, network security, and industrial quality co...
Quantum Classical Ridgelet Neural Network For Time Series Model : Abstract: In this study, we present a quantum computing method that incorporates ridgelet transforms into the quantum processing pipelines for time series data. Here, the Ridgelet neural network is int...
Kantorovich-Type Stochastic Neural Network Operators for the Mean-Square Approximation of Certain Second-Order Stochastic Processes : Abstract: Artificial neural network operators (ANNOs) have been widely used for approximating deterministic input-output functions; however, their extension to random dynamics remains comparatively un...
Learning Shortest Paths When Data is Scarce : Abstract: Digital twins and other simulators are increasingly used to support routing decisions in large-scale networks. However, simulator outputs often exhibit systematic bias, while ground-truth me...
Mathematical Foundations of Polyphonic Music Generation via Structural Inductive Bias : Abstract: This monograph introduces a novel approach to polyphonic music generation by addressing the "Missing Middle" problem through structural inductive bias. Focusing on Beethoven's piano sonatas ...
A Comparative Study of Traditional Machine Learning, Deep Learning, and Large Language Models for Mental Health Forecasting using Smartphone Sensing Data : Abstract: Smartphone sensing offers an unobtrusive and scalable way to track daily behaviors linked to mental health, capturing changes in sleep, mobility, and phone use that often precede symptoms of...
Local Gradient Regulation Stabilizes Federated Learning under Client Heterogeneity : Abstract: Federated learning (FL) enables collaborative model training across distributed clients without sharing raw data, yet its stability is fundamentally challenged by statistical heterogeneity i...
Variational Inference, Entropy, and Orthogonality: A Unified Theory of Mixture-of-Experts : Abstract: Mixture-of-Experts models enable large language models to scale efficiently, as they only activate a subset of experts for each input. Their core mechanisms, Top-k routing and auxiliary load...
Local Intrinsic Dimensionality of Ground Motion Data for Early Detection of Complex Catastrophic Slope Failure : Abstract: Local Intrinsic Dimensionality (LID) has shown strong potential for identifying anomalies and outliers in high-dimensional data across a wide range of real-world applications, including land...
Green's-Function Spherical Neural Operators for Biological Heterogeneity : Abstract: Spherical deep learning has been widely applied to a broad range of real-world problems. Existing approaches often face challenges in balancing strong spherical geometric inductive biases wi...
From Bits to Chips: An LLM-based Hardware-Aware Quantization Agent for Streamlined Deployment of LLMs : Abstract: Deploying models, especially large language models (LLMs), is becoming increasingly attractive to a broader user base, including those without specialized expertise. However, due to the reso...
Hybrid Approach for Driver Behavior Analysis with Machine Learning, Feature Optimization, and Explainable AI : Abstract: Progressive driver behavior analytics is crucial for improving road safety and mitigating the issues caused by aggressive or inattentive driving. Previous studies have employed machine learn...
VNU-Bench: A Benchmarking Dataset for Multi-Source Multimodal News Video Understanding : Abstract: News videos are carefully edited multimodal narratives that combine narration, visuals, and external quotations into coherent storylines. In recent years, there have been significant advance...
Sensor to Pixels: Decentralized Swarm Gathering via Image-Based Reinforcement Learning : Abstract: This study highlights the potential of image-based reinforcement learning methods for addressing swarm-related tasks. In multi-agent reinforcement learning, effective policy learning depends...
Inferring Clinically Relevant Molecular Subtypes of Pancreatic Cancer from Routine Histopathology Using Deep Learning : Abstract: Molecular subtyping of PDAC into basal-like and classical has established prognostic and predictive value. However, its use in clinical practice is limited by cost, turnaround time, and tiss...
SIGMA: Scalable Spectral Insights for LLM Collapse : Abstract: The rapid adoption of synthetic data for training Large Language Models (LLMs) has introduced the technical challenge of "model collapse", a degenerative process where recursive training on m...
Weather-Aware Transformer for Real-Time Route Optimization in Drone-as-a-Service Operations : Abstract: This paper presents a novel framework to accelerate route prediction in Drone-as-a-Service operations through weather-aware deep learning models. While classical path-planning algorithms, su...
Enhancing Small Dataset Classification Using Projected Quantum Kernels with Convolutional Neural Networks : Abstract: Convolutional Neural Networks (CNNs) have shown promising results in efficiency and accuracy in image classification. However, their efficacy often relies on large, labeled datasets, posing ...
Physics-Informed Gaussian Process Regression for the Constitutive Modeling of Concrete: A Data-Driven Improvement to Phenomenological Models : Abstract: Understanding and modeling the constitutive behavior of concrete is crucial for civil and defense applications, yet widely used phenomenological models such as Karagozian & Case concrete (K...
LUT-KAN: Segment-wise LUT Quantization for Fast KAN Inference : Abstract: Kolmogorov-Arnold Networks (KAN) replace scalar weights by learnable univariate functions, often implemented with B-splines. This design can be accurate and interpretable, but it makes infe...
Web Fraud Attacks Against LLM-Driven Multi-Agent Systems : Abstract: With the proliferation of LLM-driven multi-agent systems (MAS), the security of Web links has become a critical concern. Once MAS is induced to trust a malicious link, attackers can use it a...
FedDUAL: A Dual-Strategy with Adaptive Loss and Dynamic Aggregation for Mitigating Data Heterogeneity in Federated Learning : Abstract: Federated Learning (FL) marks a transformative approach to distributed model training by combining locally optimized models from various clients into a unified global model. While FL preserv...
SSSD: Simply-Scalable Speculative Decoding : Abstract: Speculative Decoding has emerged as a popular technique for accelerating inference in Large Language Models. However, most existing approaches yield only modest improvements in production se...
Computing Universal Plans for Partially Observable Multi-Agent Routing Using Answer Set Programming : Abstract: Multi-agent routing problems have gained significant attention recently due to their wide range of industrial applications, ranging from logistics warehouse automation to indoor service robo...
Instructor-inspired Machine Learning for Robust Molecular Property Prediction : Abstract: Machine learning catalyzes a revolution in chemical and biological science. However, its efficacy heavily depends on the availability of labeled data, and annotating biochemical data is extr...
Discovering the Representation Bottleneck of Graph Neural Networks : Abstract: Graph neural networks (GNNs) rely mainly on the message-passing paradigm to propagate node features and build interactions, and different graph learning problems require different ranges of ...
Multi-Agent LLM Orchestration Achieves Deterministic, High-Quality Decision Support for Incident Response : Abstract: Large language models (LLMs) promise to accelerate incident response in production systems, yet single-agent approaches generate vague, unusable recommendations. We present MyAntFarm.ai, a r...
Chain-of-Thought as a Lens: Evaluating Structured Reasoning Alignment between Human Preferences and Large Language Models : Abstract: This paper primarily demonstrates a method to quantitatively assess the alignment between multi-step, structured reasoning in large language models and human preferences. We introduce the Ali...
FÆRDXEL: An Expert System for Danish Traffic Law : Abstract: We present FÆRDXEL, a tool for symbolic reasoning in the domain of Danish traffic law. FÆRDXEL combines techniques from logic programming with a novel interface that allows users to navigate...
Embedding Autonomous Agents in Resource-Constrained Robotic Platforms : Abstract: Many embedded devices operate under resource constraints and in dynamic environments, requiring local decision-making capabilities. Enabling devices to make independent decisions in such env...
Clinical Data Goes MEDS? Let's OWL make sense of it : Abstract: The application of machine learning on healthcare data is often hindered by the lack of standardized and semantically explicit representation, leading to limited interoperability and reprodu...
Klear: Unified Multi-Task Audio-Video Joint Generation : Abstract: Audio-video joint generation has progressed rapidly, yet substantial challenges still remain. Non-commercial approaches still suffer audio-visual asynchrony, poor lip-speech alignment, and u...
Wow, wo, val! A Comprehensive Embodied World Model Evaluation Turing Test : Abstract: As world models gain momentum in Embodied AI, an increasing number of works explore using video foundation models as predictive world models for downstream embodied tasks like 3D prediction ...
ContextFocus: Activation Steering for Contextual Faithfulness in Large Language Models : Abstract: Large Language Models (LLMs) encode vast amounts of parametric knowledge during pre-training. As world knowledge evolves, effective deployment increasingly depends on their ability to faithf...
Pixel-Wise Multimodal Contrastive Learning for Remote Sensing Images : Abstract: Satellites continuously generate massive volumes of data, particularly for Earth observation, including satellite image time series (SITS). However, most deep learning models are designed to...
InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training : Abstract: GUI agents that interact with graphical interfaces on behalf of users represent a promising direction for practical AI assistants. However, training such agents is hindered by the scarcity o...
Quantifying the Impact of Modules and Their Interactions in the PSO-X Framework : Abstract: The PSO-X framework incorporates dozens of modules that have been proposed for solving single-objective continuous optimization problems using particle swarm optimization. While modular fram...
Layer-wise Positional Bias in Short-Context Language Modeling : Abstract: Language models often show a preference for using information from specific positions in the input regardless of semantic relevance. While positional bias has been studied in various context...
CSSG: Measuring Code Similarity with Semantic Graphs : Abstract: Existing code similarity metrics, such as BLEU, CodeBLEU, and TSED, largely rely on surface-level string overlap or abstract syntax tree structures, and often fail to capture deeper semantic...
Analyzing Reasoning Consistency in Large Multimodal Models under Cross-Modal Conflicts : Abstract: Large Multimodal Models (LMMs) have demonstrated impressive capabilities in video reasoning via Chain-of-Thought (CoT). However, the robustness of their reasoning chains remains questionable...
Mind the Generative Details: Direct Localized Detail Preference Optimization for Video Diffusion Models : Abstract: Aligning text-to-video diffusion models with human preferences is crucial for generating high-quality videos. Existing Direct Preference Optimization (DPO) methods rely on multi-sample rankin...
HoneyTrap: Deceiving Large Language Model Attackers to Honeypot Traps with Resilient Multi-Agent Defense : Abstract: Jailbreak attacks pose significant threats to large language models (LLMs), enabling attackers to bypass safeguards. However, existing reactive defense approaches struggle to keep up with th...
A Scheduling Framework for Efficient MoE Inference on Edge GPU-NDP Systems : Abstract: Mixture-of-Experts (MoE) models facilitate edge deployment by decoupling model capacity from active computation, yet their large memory footprint drives the need for GPU systems with near-da...
Large-Scale Aspect-Based Sentiment Analysis with Reasoning-Infused LLMs : Abstract: We introduce Arctic-ABSA, a collection of powerful models for real-life aspect-based sentiment analysis (ABSA). Our models are tailored to commercial needs, trained on a large corpus of publ...
FOREVER: Forgetting Curve-Inspired Memory Replay for Language Model Continual Learning : Abstract: Continual learning (CL) for large language models (LLMs) aims to enable sequential knowledge acquisition without catastrophic forgetting. Memory replay methods are widely used for their prac...
Bayes-PD: Exploring a Sequence to Binding Bayesian Neural Network model trained on Phage Display data : Abstract: Phage display is a powerful laboratory technique used to study the interactions between proteins and other molecules, whether other proteins, peptides, DNA or RNA. The under-utilisation of t...
FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection : Abstract: Vision-Language Models (VLMs) have shown remarkable performance in User Interface (UI) grounding tasks, driven by their ability to process increasingly high-resolution screenshots. However, ...
A Gap Between Decision Trees and Neural Networks : Abstract: We study when geometric simplicity of decision boundaries, used here as a notion of interpretability, can conflict with accurate approximation of axis-aligned decision trees by shallow neura...
An Algebraic Representation Theorem for Linear GENEOs in Geometric Machine Learning : Abstract: Geometric and Topological Deep Learning are rapidly growing research areas that enhance machine learning through the use of geometric and topological structures. Within this framework, Group...
Adaptive-Boundary-Clipping GRPO: Ensuring Bounded Ratios for Stable and Generalizable Training : Abstract: Group Relative Policy Optimization (GRPO) has emerged as a popular algorithm for reinforcement learning with large language models (LLMs). However, upon analyzing its clipping mechanism, we ...
Spectral Manifold Regularization for Stable and Modular Routing in Deep MoE Architectures : Abstract: Mixture of Experts (MoE) architectures enable efficient scaling of neural networks but suffer from expert collapse, where routing converges to a few dominant experts. This reduces model capa...
IndexTTS 2.5 Technical Report : Abstract: In prior work, we introduced IndexTTS 2, a zero-shot neural text-to-speech foundation model comprising two core components: a transformer-based Text-to-Semantic (T2S) module and a non-autore...
FLNet: Flood-Induced Agriculture Damage Assessment using Super Resolution of Satellite Images : Abstract: Distributing government relief efforts after a flood is challenging. In India, the crops are widely affected by floods; therefore, making rapid and accurate crop damage assessment is crucial...
Women Worry, Men Adopt: How Gendered Perceptions Shape the Use of Generative AI : Abstract: Generative artificial intelligence (GenAI) is diffusing rapidly, yet its adoption is strikingly unequal. Using nationally representative UK survey data from 2023 to 2024, we show that women ...
What Matters For Safety Alignment? : Abstract: This paper presents a comprehensive empirical study of safety alignment capabilities. We evaluate what matters for safety alignment in LLMs and LRMs to provide essential insights for dev...
Implementing the First-Order Logic of Here and There : Abstract: We present automated theorem provers for the first-order logic of here and there (HT). They are based on a native sequent calculus for the logic of HT and an axiomatic embedding of the logic...
When Numbers Start Talking: Implicit Numerical Coordination Among LLM-Based Agents : Abstract: LLM-based agents increasingly operate in multi-agent environments where strategic interaction and coordination are required. While existing work has largely focused on individual agents or ...
On the Trap Space Semantics of Normal Logic Programs : Abstract: The logical semantics of normal logic programs has traditionally been based on the notions of Clark's completion and two-valued or three-valued canonical models, including supported, stable,...
Logic Tensor Network-Enhanced Generative Adversarial Network : Abstract: In this paper, we introduce Logic Tensor Network-Enhanced Generative Adversarial Network (LTN-GAN), a novel framework that enhances Generative Adversarial Networks (GANs) by incorporating Lo...
IDESplat: Iterative Depth Probability Estimation for Generalizable 3D Gaussian Splatting : Abstract: Generalizable 3D Gaussian Splatting aims to directly predict Gaussian parameters using a feed-forward network for scene reconstruction. Among these parameters, Gaussian means are particularl...
AI Generated Text Detection : Abstract: The rapid development of large language models has led to an increase in AI-generated text, with students increasingly using LLM-generated content as their own work, which violates academic ...
Where meaning lives: Layer-wise accessibility of psycholinguistic features in encoder and decoder language models : Abstract: Understanding where transformer language models encode psychologically meaningful aspects of meaning is essential for both theory and practice. We conduct a systematic layer-wise probing stu...
An Algorithmic Framework for Systematic Literature Reviews: A Case Study for Financial Narratives : Abstract: This paper introduces an algorithmic framework for conducting systematic literature reviews (SLRs), designed to improve efficiency, reproducibility, and selection quality assessment in the l...
Do LLMs Really Memorize Personally Identifiable Information? Revisiting PII Leakage with a Cue-Controlled Memorization Framework : Abstract: Large Language Models (LLMs) have been reported to "leak" Personally Identifiable Information (PII), with successful PII reconstruction often interpreted as evidence of memorization. We prop...
NeoAMT: Neologism-Aware Agentic Machine Translation with Reinforcement Learning : Abstract: Neologism-aware machine translation aims to translate source sentences containing neologisms into target languages. This field remains underexplored compared with general machine translation...
Criminal Liability of Generative Artificial Intelligence Providers for User-Generated Child Sexual Abuse Material : Abstract: The development of more powerful Generative Artificial Intelligence (GenAI) has expanded its capabilities and the variety of outputs. This has introduced significant legal challenges, includ...
Membox: Weaving Topic Continuity into Long-Range Memory for LLM Agents : Abstract: Human-agent dialogues often exhibit topic continuity, a stable thematic frame that evolves through temporally adjacent exchanges, yet most large language model (LLM) agent memory systems fail ...
PointWorld: Scaling 3D World Models for In-The-Wild Robotic Manipulation : Abstract: Humans anticipate, from a glance and a contemplated action of their bodies, how the 3D world will respond, a capability that is equally vital for robotic manipulation. We introduce PointWorl...
Scalable Machine Learning Force Fields for Macromolecular Systems Through Long-Range Aware Message Passing : Abstract: Machine learning force fields (MLFFs) have revolutionized molecular simulations by providing quantum mechanical accuracy at the speed of molecular mechanical computations. However, a fundame...
Learning Shrinks the Hard Tail: Training-Dependent Inference Scaling in a Solvable Linear Model : Abstract: We analyze neural scaling laws in a solvable model of last-layer fine-tuning where targets have intrinsic, instance-heterogeneous difficulty. In our Latent Instance Difficulty (LID) model, e...
Evaluation of Multilingual LLMs Personalized Text Generation Capabilities Targeting Groups and Social-Media Platforms : Abstract: The capabilities of large language models to generate coherent multilingual text have continuously improved in recent years, which raises concerns about their potential misuse. Previous research ...
Bridging OLAP and RAG: A Multidimensional Approach to the Design of Corpus Partitioning : Abstract: Retrieval-Augmented Generation (RAG) systems are increasingly deployed on large-scale document collections, often comprising millions of documents and tens of millions of text chunks. In ind...
O-Researcher: An Open Ended Deep Research Model via Multi-Agent Distillation and Agentic RL : Abstract: The performance gap between closed-source and open-source large language models (LLMs) is largely attributed to disparities in access to high-quality training data. To bridge this gap, we in...
RadDiff: Describing Differences in Radiology Image Sets with Natural Language : Abstract: Understanding how two radiology image sets differ is critical for generating clinical insights and for interpreting medical AI systems. We introduce RadDiff, a multimodal agentic system that...
From Laboratory to Real-World Applications: Benchmarking Agentic Code Reasoning at the Repository Level : Abstract: As large language models (LLMs) evolve into autonomous agents, evaluating repository-level reasoning, the ability to maintain logical consistency across massive, real-world, interdependent f...
CSMCIR: CoT-Enhanced Symmetric Alignment with Memory Bank for Composed Image Retrieval : Abstract: Composed Image Retrieval (CIR) enables users to search for target images using both a reference image and manipulation text, offering substantial advantages over single-modality retrieval sy...
R³L: Reflect-then-Retry Reinforcement Learning with Language-Guided Exploration, Pivotal Credit, and Positive Amplification : Abstract: Reinforcement learning drives recent advances in LLM reasoning and agentic capabilities, yet current approaches struggle with both exploration and exploitation. Exploration suffers from low ...
The Power of 10: New Rules for the Digital World : Abstract: As artificial intelligence rapidly advances, society is increasingly captivated by promises of superhuman machines and seamless digital futures. Yet these visions often obscure mounting soci...
MHRC-Bench: A Multilingual Hardware Repository-Level Code Completion benchmark : Abstract: Large language models (LLMs) have achieved strong performance on code completion tasks in general-purpose programming languages. However, existing repository-level code completion benchmarks...
Investigating Knowledge Distillation Through Neural Networks for Protein Binding Affinity Prediction : Abstract: The trade-off between predictive accuracy and data availability makes it difficult to predict protein-protein binding affinity accurately. The lack of experimentally resolved protein struct...
TreeAdv: Tree-Structured Advantage Redistribution for Group-Based RL : Abstract: Reinforcement learning with group-based objectives, such as Group Relative Policy Optimization (GRPO), is a common framework for aligning large language models on complex reasoning tasks. Ho...
Inference Attacks Against Graph Generative Diffusion Models : Abstract: Graph generative diffusion models have recently emerged as a powerful paradigm for generating complex graph structures, effectively capturing intricate dependencies and relationships within ...
ADEPT: Adaptive Dynamic Early-Exit Process for Transformers : Abstract: The inference of large language models imposes significant computational workloads, often requiring the processing of billions of parameters. Although early-exit strategies have proven effec...
Can AI Chatbots Provide Coaching in Engineering? Beyond Information Processing Toward Mastery : Abstract: Engineering education faces a double disruption: traditional apprenticeship models that cultivated judgment and tacit skill are eroding, just as generative AI emerges as an informal coaching...
A Pre-trained Reaction Embedding Descriptor Capturing Bond Transformation Patterns : Abstract: With the rise of data-driven reaction prediction models, effective reaction descriptors are crucial for bridging the gap between real-world chemistry and digital representations. However, ge...
From Implicit to Explicit: Token-Efficient Logical Supervision for Mathematical Reasoning in LLMs : Abstract: Recent studies reveal that large language models (LLMs) exhibit limited logical reasoning abilities in mathematical problem-solving, instead often relying on pattern-matching and memorizatio...
Towards Compositional Generalization of LLMs via Skill Taxonomy Guided Data Synthesis : Abstract: Large Language Models (LLMs) and agent-based systems often struggle with compositional generalization due to a data bottleneck in which complex skill combinations follow a long-tailed, power...
Disentangling Aleatoric and Epistemic Uncertainty in Physics-Informed Neural Networks. Application to Insulation Material Degradation Prognostics : Abstract: Physics-Informed Neural Networks (PINNs) provide a framework for integrating physical laws with data. However, their application to Prognostics and Health Management (PHM) remains constraine...
Discontinuous Galerkin finite element operator network for solving non-smooth PDEs : Abstract: We introduce Discontinuous Galerkin Finite Element Operator Network (DG-FEONet), a data-free operator learning framework that combines the strengths of the discontinuous Galerkin (DG) metho...
e5-omni: Explicit Cross-modal Alignment for Omni-modal Embeddings : Abstract: Modern information systems often involve different types of items, e.g., a text query, an image, a video clip, or an audio segment. This motivates omni-modal embedding models that map hetero...
AMIR-GRPO: Inducing Implicit Preference Signals into GRPO : Abstract: Reinforcement learning has become the primary paradigm for aligning large language models (LLMs) on complex reasoning tasks, with group relative policy optimization (GRPO) widely used in lar...
Group and Exclusive Sparse Regularization-based Continual Learning of CNNs : Abstract: We present a regularization-based approach for continual learning (CL) of fixed capacity convolutional neural networks (CNN) that does not suffer from the problem of catastrophic forgetting ...
In Search of Grandmother Cells: Tracing Interpretable Neurons in Tabular Representations : Abstract: Foundation models are powerful yet often opaque in their decision-making. A topic of continued interest in both neuroscience and artificial intelligence is whether some neurons behave like g...
ReLA: Representation Learning and Aggregation for Job Scheduling with Reinforcement Learning : Abstract: Job scheduling is widely used in real-world manufacturing systems to assign ordered job operations to machines under various constraints. Existing solutions remain limited by long running ti...
MFC-RFNet: A Multi-scale Guided Rectified Flow Network for Radar Sequence Prediction : Abstract: Accurate and high-resolution precipitation nowcasting from radar echo sequences is crucial for disaster mitigation and economic planning, yet it remains a significant challenge. Key difficul...
ReStyle-TTS: Relative and Continuous Style Control for Zero-Shot Speech Synthesis : Abstract: Zero-shot text-to-speech models can clone a speaker's timbre from a short reference audio, but they also strongly inherit the speaking style present in the reference. As a result, synthesizi...
Evaluating the Pre-Consultation Ability of LLMs using Diagnostic Guidelines : Abstract: We introduce EPAG, a benchmark dataset and framework designed for Evaluating the Pre-consultation Ability of LLMs using diagnostic Guidelines. LLMs are evaluated directly through HPI-diagnos...
Investigation into respiratory sound classification for an imbalanced data set using hybrid LSTM-KAN architectures : Abstract: Respiratory sounds captured via auscultation contain critical clues for diagnosing pulmonary conditions. Automated classification of these sounds faces challenges due to subtle acoustic diff...
Policy-Guided Search on Tree-of-Thoughts for Efficient Problem Solving with Bounded Language Model Queries : Abstract: Recent studies explored integrating state-space search algorithms with Language Models (LM) to perform look-ahead on the token generation process, the "Tree-of-Thoughts" (ToT), generated b...
ALERT: Zero-shot LLM Jailbreak Detection via Internal Discrepancy Amplification : Abstract: Despite rich safety alignment strategies, large language models (LLMs) remain highly susceptible to jailbreak attacks, which compromise safety guardrails and pose serious security risks. Exi...
From Chains to Graphs: Self-Structured Reasoning for General-Domain LLMs : Abstract: Large Language Models (LLMs) show strong reasoning ability in open-domain question answering, yet their reasoning processes are typically linear and often logically inconsistent. In contrast...
Can LLMs See Without Pixels? Benchmarking Spatial Intelligence from Textual Descriptions : Abstract: Recent advancements in Spatial Intelligence (SI) have predominantly relied on Vision-Language Models (VLMs), yet a critical question remains: does spatial understanding originate from visual...
Deontic Knowledge Graphs for Privacy Compliance in Multimodal Disaster Data Sharing : Abstract: Disaster response requires sharing heterogeneous artifacts, from tabular assistance records to UAS imagery, under overlapping privacy mandates. Operational systems often reduce compliance to...
A Proposed Paradigm for Imputing Missing Multi-Sensor Data in the Healthcare Domain : Abstract: Chronic diseases such as diabetes pose significant management challenges, particularly due to the risk of complications like hypoglycemia, which require timely detection and intervention. Co...
Evaluating LLMs for Police Decision-Making: A Framework Based on Police Action Scenarios : Abstract: The use of Large Language Models (LLMs) in police operations is growing, yet an evaluation framework tailored to police operations remains absent. While LLMs' responses may not always be leg...
Value-Action Alignment in Large Language Models under Privacy-Prosocial Conflict : Abstract: Large language models (LLMs) are increasingly used to simulate decision-making tasks involving personal data sharing, where privacy concerns and prosocial motivations can push choices in opp...
Layer-Order Inversion: Rethinking Latent Multi-Hop Reasoning in Large Language Models : Abstract: Large language models (LLMs) perform well on multi-hop reasoning, yet how they internally compose multiple facts remains unclear. Recent work proposes the hop-aligned circuit hypothesis, ...
VeRPO: Verifiable Dense Reward Policy Optimization for Code Generation : Abstract: Effective reward design is a central challenge in Reinforcement Learning (RL) for code generation. Mainstream pass/fail outcome rewards enforce functional correctness via executing unit test...
A Reinforcement Learning-Based Model for Mapping and Goal-Directed Navigation Using Multiscale Place Fields : Abstract: Autonomous navigation in complex and partially observable environments remains a central challenge in robotics. Several bio-inspired models of mapping and navigation based on place cells in ...
Mem-Gallery: Benchmarking Multimodal Long-Term Conversational Memory for MLLM Agents : Abstract: Long-term memory is a critical capability for multimodal large language model (MLLM) agents, particularly in conversational settings where information accumulates and evolves over time. Howe...
Deploy-Master: Automating the Deployment of 50,000+ Agent-Ready Scientific Tools in One Day : Abstract: Open-source scientific software is abundant, yet most tools remain difficult to compile, configure, and reuse, sustaining a small-workshop mode of scientific computing. This deployment bottl...
Bootstrapping Code Translation with Weighted Multilanguage Exploration : Abstract: Code translation across multiple programming languages is essential yet challenging due to two vital obstacles: scarcity of parallel data paired with executable test oracles, and optimizatio...
IntroLM: Introspective Language Models via Prefilling-Time Self-Evaluation : Abstract: A major challenge for the operation of large language models (LLMs) is how to predict whether a specific LLM will produce sufficiently high-quality output for a given query. Existing approac...
Reasoning Pattern Alignment Merging for Adaptive Reasoning : Abstract: Recent large reasoning models (LRMs) have made substantial progress in complex reasoning tasks, yet they often generate lengthy reasoning paths for every query, incurring unnecessary computa...
Beyond Perplexity: A Lightweight Benchmark for Knowledge Retention in Supervised Fine-Tuning : Abstract: Supervised Fine-Tuning (SFT) is a standard approach for injecting domain knowledge into Large Language Models (LLMs). However, relying on validation perplexity to monitor training is often i...
SDCD: Structure-Disrupted Contrastive Decoding for Mitigating Hallucinations in Large Vision-Language Models : Abstract: Large Vision-Language Models (LVLMs) demonstrate significant progress in multimodal understanding and reasoning, yet object hallucination remains a critical challenge. While existing researc...
Cyberattack Detection in Virtualized Microgrids Using LightGBM and Knowledge-Distilled Classifiers : Abstract: Modern microgrids depend on distributed sensing and communication interfaces, making them increasingly vulnerable to cyber physical disturbances that threaten operational continuity and equi...
Submodular Evaluation Subset Selection in Automatic Prompt Optimization : Abstract: Automatic prompt optimization reduces manual prompt engineering, but relies on task performance measured on a small, often randomly sampled evaluation subset as its main source of feedback s...
CroBIM-U: Uncertainty-Driven Referring Remote Sensing Image Segmentation : Abstract: Referring remote sensing image segmentation aims to localize specific targets described by natural language within complex overhead imagery. However, due to extreme scale variations, dense s...
Efficient Sequential Recommendation for Long Term User Interest Via Personalization : Abstract: Recent years have witnessed the success of sequential modeling, generative recommenders, and large language models for recommendation. Though the scaling law has been validated for sequential mode...
Online Decision-Making Under Uncertainty for Vehicle-to-Building Systems : Abstract: Vehicle-to-building (V2B) systems integrate physical infrastructures, such as smart buildings and electric vehicles (EVs) connected to chargers at the building, with digital control mechanis...
SegNSP: Revisiting Next Sentence Prediction for Linear Text Segmentation : Abstract: Linear text segmentation is a long-standing problem in natural language processing (NLP), focused on dividing continuous text into coherent and semantically meaningful units. Despite its imp...
EpiQAL: Benchmarking Large Language Models in Epidemiological Question Answering for Enhanced Alignment and Reasoning : Abstract: Reliable epidemiological reasoning requires synthesizing study evidence to infer disease burden, transmission dynamics, and intervention effects at the population level. Existing medical que...
Content vs. Form: What Drives the Writing Score Gap Across Socioeconomic Backgrounds? A Generated Panel Approach : Abstract: Students from different socioeconomic backgrounds exhibit persistent gaps in test scores, gaps that can translate into unequal educational and labor-market outcomes later in life. In many as...
FROST-Drive: Scalable and Efficient End-to-End Driving with a Frozen Vision Encoder : Abstract: End-to-end (E2E) models in autonomous driving aim to directly map sensor inputs to control commands, but their ability to generalize to novel and complex scenarios remains a key challenge. T...
An Expectation-Maximization Algorithm for Domain Adaptation in Gaussian Causal Models : Abstract: We study the problem of imputing a designated target variable that is systematically missing in a shifted deployment domain, when a Gaussian causal DAG is available from a fully observed sou...
Automated Feedback Generation for Undergraduate Mathematics: Development and Evaluation of an AI Teaching Assistant : Abstract: Intelligent tutoring systems have long enabled automated immediate feedback on student work when it is presented in a tightly structured format and when problems are very constrained, but re...
Microeconomic Foundations of Multi-Agent Learning : Abstract: Modern AI systems increasingly operate inside markets and institutions where data, behavior, and incentives are endogenous. This paper develops an economic foundation for multi-agent learnin...
Soft Contextualized Encoder For User Defined Text Classification : Abstract: User-Defined Text Classification (UDTC) considers the challenge of classifying input text to user-specified, previously unseen classes, a setting that arises frequently in real-world applica...
Grading Scale Impact on LLM-as-a-Judge: Human-LLM Alignment Is Highest on 0-5 Grading Scale : Abstract: Large language models (LLMs) are increasingly used as automated evaluators, yet prior works demonstrate that these LLM judges often lack consistency in scoring when the prompt is altered. Ho...
Discriminating real and synthetic super-resolved audio samples using embedding-based classifiers : Abstract: Generative adversarial networks (GANs) and diffusion models have recently achieved state-of-the-art performance in audio super-resolution (ADSR), producing perceptually convincing wideband a...
MARVEL: A Multi Agent-based Research Validator and Enabler using Large Language Models : Abstract: We present MARVEL (https://ligogpt.mit.edu/marvel), a locally deployable, open-source framework for domain-aware question answering and assisted scientific research. It is designed to addres...
The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models : Abstract: Mixture of Experts models are widely assumed to achieve domain specialization through sparse routing. In this work, we question this assumption by introducing COMMITTEEAUDIT, a post hoc fram...
Spectral Archaeology: The Causal Topology of Model Evolution : Abstract: Behavioral benchmarks tell us what a model does, but not how. We introduce a training-free mechanistic probe using attention-graph spectra. Treating each layer as a token g...
Training-Free Adaptation of New-Generation LLMs using Legacy Clinical Models : Abstract: Adapting language models to the clinical domain through continued pretraining and fine-tuning requires costly retraining for each new model generation. We propose Cross-Architecture Proxy Tu...
Jailbreaking LLMs Without Gradients or Priors: Effective and Transferable Attacks : Abstract: As Large Language Models (LLMs) are increasingly deployed in safety-critical domains, rigorously evaluating their robustness against adversarial jailbreaks is essential. However, current saf...
Tigrinya Number Verbalization: Rules, Algorithm, and Implementation : Abstract: We present a systematic formalization of Tigrinya cardinal and ordinal number verbalization, addressing a gap in computational resources for the language. This work documents the canonical r...
Eye-Q: A Multilingual Benchmark for Visual Word Puzzle Solving and Image-to-Phrase Reasoning : Abstract: Vision-Language Models (VLMs) have achieved strong performance on standard vision-language benchmarks, yet often rely on surface-level recognition rather than deeper reasoning. We propose vi...
Metaphors are a Source of Cross-Domain Misalignment of Large Reasoning Models : Abstract: Earlier research has shown that metaphors influence humans' decision making, which raises the question of whether metaphors also influence large language models (LLMs)' reasoning pathways, c...
MMErroR: A Benchmark for Erroneous Reasoning in Vision-Language Models : Abstract: Recent advances in Vision-Language Models (VLMs) have improved performance in multi-modal learning, raising the question of whether these models truly understand the content they process. Cr...
Attention mechanisms in neural networks : Abstract: Attention mechanisms represent a fundamental paradigm shift in neural network architectures, enabling models to selectively focus on relevant portions of input sequences through learned weig...
Extreme-value forest fire prediction A study of the Loss Function in an Ordinality Scheme : Abstract: Wildfires are highly imbalanced natural hazards in both space and severity, making the prediction of extreme events particularly challenging. In this work, we introduce the first ordinal cla...
Bare-Metal Tensor Virtualization: Overcoming the Memory Wall in Edge-AI Inference on ARM64 : Abstract: The deployment of Large Language Models (LLMs) on edge devices is fundamentally constrained by the "Memory Wall", the bottleneck where data movement latency outstrips arithmetic throughput. S...
HEEGNet: Hyperbolic Embeddings for EEG : Abstract: Electroencephalography (EEG)-based brain-computer interfaces facilitate direct communication with a computer, enabling promising applications in human-computer interactions. However, their u...
Aligning Findings with Diagnosis: A Self-Consistent Reinforcement Learning Framework for Trustworthy Radiology Reporting : Abstract: Multimodal Large Language Models (MLLMs) have shown strong potential for radiology report generation, yet their clinical translation is hindered by architectural heterogeneity and the preval...
Ratio-Variance Regularized Policy Optimization for Efficient LLM Fine-tuning : Abstract: On-policy reinforcement learning (RL), particularly Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO), has become the dominant paradigm for fine-tuning large l...
CaricatureGS: Exaggerating 3D Gaussian Splatting Faces With Gaussian Curvature : Abstract: A photorealistic and controllable 3D caricaturization framework for faces is introduced. We start with an intrinsic Gaussian curvature-based surface exaggeration technique, which, when coupl...
Deep Learning-Based Image Recognition for Soft-Shell Shrimp Classification : Abstract: With the integration of information technology into aquaculture, production has become more stable and continues to grow annually. As consumer demand for high-quality aquatic products rises,...
Why LLMs Aren't Scientists Yet: Lessons from Four Autonomous Research Attempts : Abstract: We report a case study of four end-to-end attempts to autonomously generate ML research papers using a pipeline of six LLM agents mapped to stages of the scientific workflow. Of these four, ...
VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models : Abstract: Vision-Language-Action (VLA) models, which integrate pretrained large Vision-Language Models (VLM) into their policy backbone, are gaining significant attention for their promising generaliz...
Mass Concept Erasure in Diffusion Models with Concept Hierarchy : Abstract: The success of diffusion models has raised concerns about the generation of unsafe or harmful content, prompting concept erasure approaches that fine-tune modules to suppress specific concep...
AI-Driven Cybersecurity Threats: A Survey of Emerging Risks and Defensive Strategies : Abstract: Artificial Intelligence's dual-use nature is revolutionizing the cybersecurity landscape, introducing new threats across four main categories: deepfakes and synthetic media, adversarial AI a...
CageDroneRF: A Large-Scale RF Benchmark and Toolkit for Drone Perception : Abstract: We present CageDroneRF (CDRF), a large-scale benchmark for Radio-Frequency (RF) drone detection and identification built from real-world captures and systematically generated synthetic varia...
PC2P: Multi-Agent Path Finding via Personalized-Enhanced Communication and Crowd Perception : Abstract: Distributed Multi-Agent Path Finding (MAPF) integrated with Multi-Agent Reinforcement Learning (MARL) has emerged as a prominent research focus, enabling real-time cooperative decision-makin...
130k Lines of Formal Topology in Two Weeks: Simple and Cheap Autoformalization for Everyone? : Abstract: This is a brief description of a project that has already autoformalized a large portion of the general topology from the Munkres textbook (which has in total 241 pages in 7 chapters and 39 ...
AgentMark: Utility-Preserving Behavioral Watermarking for Agents : Abstract: LLM-based agents are increasingly deployed to autonomously solve complex tasks, raising urgent needs for IP protection and regulatory provenance. While content watermarking effectively attri...
Lightweight Transformer Architectures for Edge Devices in Real-Time Applications : Abstract: The deployment of transformer-based models on resource-constrained edge devices represents a critical challenge in enabling real-time artificial intelligence applications. This comprehensive...
Automated Post-Incident Policy Gap Analysis via Threat-Informed Evidence Mapping using Large Language Models : Abstract: Cybersecurity post-incident reviews are essential for identifying control failures and improving organisational resilience, yet they remain labour-intensive, time-consuming, and heavily reli...
HyperCLOVA X 32B Think : Abstract: In this report, we present HyperCLOVA X 32B Think, a vision-language model designed with particular emphasis on reasoning within the Korean linguistic and cultural context, as well as agenti...
Feedback Indices to Evaluate LLM Responses to Rebuttals for Multiple Choice Type Questions : Abstract: We present a systematic framework of indices designed to characterize Large Language Model (LLM) responses when challenged with rebuttals during a chat. Assessing how LLMs respond to user di...
AI-Guided Discovery of Novel Ionic Liquid Solvents for Industrial CO2 Capture : Abstract: We present an AI-driven approach to discover compounds with optimal properties for CO2 capture from flue gas, refinery emissions' primary source. Focusing on ionic liquids (ILs) as alternativ...
$\alpha^3$-Bench: A Unified Benchmark of Safety, Robustness, and Efficiency for LLM-Based UAV Agents over 6G Networks : Abstract: Large Language Models (LLMs) are increasingly used as high-level controllers for autonomous Unmanned Aerial Vehicle (UAV) missions. However, existing evaluations rarely assess whether such a...
A Quantum Model for Constrained Markowitz Modern Portfolio Using Slack Variables to Process Mixed-Binary Optimization under QAOA : Abstract: Effectively encoding inequality constraints is a primary obstacle in applying quantum algorithms to financial optimization. A quantum model for Markowitz portfolio optimization is presented ...
MixRx: Predicting Drug Combination Interactions with LLMs : Abstract: MixRx uses Large Language Models (LLMs) to classify drug combination interactions as Additive, Synergistic, or Antagonistic, given a multi-drug patient history. We evaluate the performance o...
Topic Segmentation Using Generative Language Models : Abstract: Topic segmentation using generative Large Language Models (LLMs) remains relatively unexplored. Previous methods use semantic similarity between sentences, but such models lack the long rang...
LLM_annotate: A Python package for annotating and analyzing fiction characters : Abstract: LLM_annotate is a Python package for analyzing the personality of fiction characters with large language models. It standardizes workflows for annotating character behaviors in full texts (e...
GuardEval: A Multi-Perspective Benchmark for Evaluating Safety, Fairness, and Robustness in LLM Moderators : Abstract: As large language models (LLMs) become deeply embedded in daily life, the urgent need for safer moderation systems, distinguishing naive from harmful requests while upholding appropr...
Less is more: Not all samples are effective for evaluation : Abstract: The versatility of Large Language Models (LLMs) in vertical domains has spurred the development of numerous specialized evaluation benchmarks. However, these benchmarks often suffer from sig...
Advances and Challenges in Semantic Textual Similarity: A Comprehensive Survey : Abstract: Semantic Textual Similarity (STS) research has expanded rapidly since 2021, driven by advances in transformer architectures, contrastive learning, and domain-specific techniques. This survey...
The Instruction Gap: LLMs get lost in Following Instruction : Abstract: Large Language Models (LLMs) have shown remarkable capabilities in natural language understanding and generation, yet their deployment in enterprise environments reveals a critical limitatio...
OpenAI GPT-5 System Card : Abstract: This is the system card published alongside the OpenAI GPT-5 launch, August 2025. GPT-5 is a unified system with a smart and fast model that answers most questions, a deeper reasoning mode...
Benchmarking and Adapting On-Device Large Language Models for Clinical Decision Support : Abstract: Large language models (LLMs) have rapidly advanced in clinical decision-making, yet the deployment of proprietary systems is hindered by privacy concerns and reliance on cloud-based infrastr...
Internal Reasoning vs. External Control: A Thermodynamic Analysis of Sycophancy in Large Language Models : Abstract: Large Language Models frequently exhibit sycophancy, prioritizing user agreeableness over correctness. We investigate whether this requires external regulation or can be mitigated by interna...
DeepResearch-Slice: Bridging the Retrieval-Utilization Gap via Explicit Text Slicing : Abstract: Deep Research agents predominantly optimize search policies to maximize retrieval probability. However, we identify a critical bottleneck: the retrieval-utilization gap, where models fail to...
Agent Drift: Quantifying Behavioral Degradation in Multi-Agent LLM Systems Over Extended Interactions : Abstract: Multi-agent Large Language Model (LLM) systems have emerged as powerful architectures for complex task decomposition and collaborative problem-solving. However, their long-term behavioral st...
ComfySearch: Autonomous Exploration and Reasoning for ComfyUI Workflows : Abstract: AI-generated content has progressed from monolithic models to modular workflows, especially on platforms like ComfyUI, allowing users to customize complex creative pipelines. However, the la...
MobileDreamer: Generative Sketch World Model for GUI Agent : Abstract: Mobile GUI agents have shown strong potential in real-world automation and practical applications. However, most existing agents remain reactive, making decisions mainly from current screen,...
Anti-Length Shift: Dynamic Outlier Truncation for Training Efficient Reasoning Models : Abstract: Large reasoning models enhanced by reinforcement learning with verifiable rewards have achieved significant performance gains by extending their chain-of-thought. However, this paradigm incu...
Trade-R1: Bridging Verifiable Rewards to Stochastic Environments via Process-Level Reasoning Verification : Abstract: Reinforcement Learning (RL) has enabled Large Language Models (LLMs) to achieve remarkable reasoning in domains like mathematics and coding, where verifiable rewards provide clear signals. H...
Current Agents Fail to Leverage World Model as Tool for Foresight : Abstract: Agents built on vision-language models increasingly face tasks that demand anticipating future states rather than relying on short-horizon reasoning. Generative world models offer a promisin...
Investigating the Grounding Bottleneck for a Large-Scale Configuration Problem: Existing Tools and Constraint-Aware Guessing : Abstract: Answer set programming (ASP) aims to realize the AI vision: The user specifies the problem, and the computer solves it. Indeed, ASP has made this vision true in many application domains. How...
xDNN(ASP): Explanation Generation System for Deep Neural Networks powered by Answer Set Programming : Abstract: Explainable artificial intelligence (xAI) has gained significant attention in recent years. Among other things, explainability for deep neural networks has been a topic of intensive researc...
Formally Explaining Decision Tree Models with Answer Set Programming : Abstract: Decision tree models, including random forests and gradient-boosted decision trees, are widely used in machine learning due to their high predictive performance. However, their complex stru...
XAI-LAW: A Logic Programming Tool for Modeling, Explaining, and Learning Legal Decisions : Abstract: We propose an approach to model articles of the Italian Criminal Code (ICC), using Answer Set Programming (ASP), and to semi-automatically learn legal rules from examples based on prior judi...
Defeasible Conditionals using Answer Set Programming : Abstract: Defeasible entailment is concerned with drawing plausible conclusions from incomplete information. A foundational framework for modelling defeasible entailment is the KLM framework. Introduc...
ROI-Reasoning: Rational Optimization for Inference via Pre-Computation Meta-Cognition : Abstract: Large language models (LLMs) can achieve strong reasoning performance with sufficient computation, but they do not inherently know how much computation a task requires. We study budgeted inf...
EntroCoT: Enhancing Chain-of-Thought via Adaptive Entropy-Guided Segmentation : Abstract: Chain-of-Thought (CoT) prompting has significantly enhanced the mathematical reasoning capabilities of Large Language Models. We find existing fine-tuning datasets frequently suffer from the...
Personalized Medication Planning via Direct Domain Modeling and LLM-Generated Heuristics : Abstract: Personalized medication planning involves selecting medications and determining a dosing schedule to achieve medical goals specific to each individual patient. Previous work successfully dem...
Sandwich Reasoning: An Answer-Reasoning-Answer Approach for Low-Latency Query Correction : Abstract: Query correction is a critical entry point in modern search pipelines, demanding high accuracy strictly within real-time latency constraints. Chain-of-Thought (CoT) reasoning improves accura...
How Does the Thinking Step Influence Model Safety? An Entropy-based Safety Reminder for LRMs : Abstract: Large Reasoning Models (LRMs) achieve remarkable success through explicit thinking steps, yet the thinking steps introduce a novel risk by potentially amplifying unsafe behaviors. Despite th...
Architecting Agentic Communities using Design Patterns : Abstract: The rapid evolution of Large Language Models (LLM) and subsequent Agentic AI technologies requires systematic architectural guidance for building sophisticated, production-grade systems. Thi...
Interleaved Tool-Call Reasoning for Protein Function Understanding : Abstract: Recent advances in large language models (LLMs) have highlighted the effectiveness of chain-of-thought reasoning in symbolic domains such as mathematics and programming. However, our study s...
Controllable LLM Reasoning via Sparse Autoencoder-Based Steering : Abstract: Large Reasoning Models (LRMs) exhibit human-like cognitive reasoning strategies (e.g. backtracking, cross-verification) during the reasoning process, which improves their performance on complex ...
SCRIBE: Structured Mid-Level Supervision for Tool-Using Language Models : Abstract: Training reliable tool-augmented agents remains a significant challenge, largely due to the difficulty of credit assignment in multi-step reasoning. While process-level reward models offer a...
ReEfBench: Quantifying the Reasoning Efficiency of LLMs : Abstract: Test-time scaling has enabled Large Language Models (LLMs) to tackle complex reasoning, yet the limitations of current Chain-of-Thought (CoT) evaluation obscure whether performance gains st...
STAR-S: Improving Safety Alignment through Self-Taught Reasoning on Safety Rules : Abstract: Defending against jailbreak attacks is crucial for the safe deployment of Large Language Models (LLMs). Recent research has attempted to improve safety by training models to reason over safe...
Variance Computation for Weighted Model Counting with Knowledge Compilation Approach : Abstract: One of the most important queries in knowledge compilation is weighted model counting (WMC), which has been applied to probabilistic inference on various models, such as Bayesian networks. I...
Evolving Programmatic Skill Networks : Abstract: We study continual skill acquisition in open-ended embodied environments where an agent must construct, refine, and reuse an expanding library of executable skills. We introduce the Programm...
Personalization of Large Foundation Models for Health Interventions : Abstract: Large foundation models (LFMs) transform healthcare AI in prevention, diagnostics, and treatment. However, whether LFMs can provide truly personalized treatment recommendations remains an op...
CPGPrompt: Translating Clinical Guidelines into LLM-Executable Decision Support : Abstract: Clinical practice guidelines (CPGs) provide evidence-based recommendations for patient care; however, integrating them into Artificial Intelligence (AI) remains challenging. Previous approac...
Toward Maturity-Based Certification of Embodied AI: Quantifying Trustworthiness Through Measurement Mechanisms : Abstract: We propose a maturity-based framework for certifying embodied AI systems through explicit measurement mechanisms. We argue that certifiable embodied AI requires structured assessment framewo...
Exploration Through Introspection: A Self-Aware Reward Model : Abstract: Understanding how artificial agents model internal mental states is central to advancing Theory of Mind in AI. Evidence points to a unified system for self- and other-awareness. We explore t...
Enhancing LLM Instruction Following: An Evaluation-Driven Multi-Agentic Workflow for Prompt Instructions Optimization : Abstract: Large Language Models (LLMs) often generate substantively relevant content but fail to adhere to formal constraints, leading to outputs that are conceptually correct but procedurally flawed....
Digital Red Queen: Adversarial Program Evolution in Core War with LLMs : Abstract: Large language models (LLMs) are increasingly being used to evolve solutions to problems in many domains, in a process inspired by biological evolution. However, unlike biological evolution,...
Mastering the Game of Go with Self-play Experience Replay : Abstract: The game of Go has long served as a benchmark for artificial intelligence, demanding sophisticated strategic reasoning and long-term planning. Previous approaches such as AlphaGo and its suc...

Research Sources: 393 | Generated: 1/8/2026