AI Research News Feeds for January 15th, 2026

AI RESEARCH PAPERS & ACADEMIC SOURCES

Positional Embedding-Aware Activations : Abstract: We present a neural network architecture designed to naturally learn a positional embedding and overcome the spectral bias towards lower frequencies faced by conventional activation function...
Multimodal Signal Processing For Thermo-Visible-Lidar Fusion In Real-time 3D Semantic Mapping : Abstract: In complex environments, autonomous robot navigation and environmental perception pose higher requirements for SLAM technology. This paper presents a novel method for semantically enhancing ...
Learning Whole-Body Human-Humanoid Interaction from Human-Human Demonstrations : Abstract: Enabling humanoid robots to physically interact with humans is a critical frontier, but progress is hindered by the scarcity of high-quality Human-Humanoid Interaction (HHoI) data. While lev...
Equi-ViT: Rotational Equivariant Vision Transformer for Robust Histopathology Analysis : Abstract: Vision Transformers (ViTs) have gained rapid adoption in computational pathology for their ability to model long-range dependencies through self-attention, addressing the limitations of conv...
POWDR: Pathology-preserving Outpainting with Wavelet Diffusion for 3D MRI : Abstract: Medical imaging datasets often suffer from class imbalance and limited availability of pathology-rich cases, which constrains the performance of machine learning models for segmentation, cla...
GOUHFI 2.0: A Next-Generation Toolbox for Brain Segmentation and Cortex Parcellation at Ultra-High Field MRI : Abstract: Ultra-High Field MRI (UHF-MRI) is increasingly used in large-scale neuroimaging studies, yet automatic brain segmentation and cortical parcellation remain challenging due to signal inhomogen...
W-DUALMINE: Reliability-Weighted Dual-Expert Fusion With Residual Correlation Preservation for Medical Image Fusion : Abstract: Medical image fusion integrates complementary information from multiple imaging modalities to improve clinical interpretation. However, existing deep learning-based methods, including recent ...
SAM3-DMS: Decoupled Memory Selection for Multi-target Video Segmentation of SAM3 : Abstract: Segment Anything 3 (SAM3) has established a powerful foundation that robustly detects, segments, and tracks specified targets in videos. However, in its original implementation, its group-le...
COMPOSE: Hypergraph Cover Optimization for Multi-view 3D Human Pose Estimation : Abstract: 3D pose estimation from sparse multi-views is a critical task for numerous applications, including action recognition, sports analysis, and human-robot interaction. Optimization-based method...
Efficient Camera-Controlled Video Generation of Static Scenes via Sparse Diffusion and 3D Rendering : Abstract: Modern video generative models based on diffusion models can produce very realistic clips, but they are computationally inefficient, often requiring minutes of GPU time for just a few second...
STEP3-VL-10B Technical Report : Abstract: We present STEP3-VL-10B, a lightweight open-source foundation model designed to redefine the trade-off between compact efficiency and frontier-level multimodal intelligence. STEP3-VL-10B is ...
SCE-SLAM: Scale-Consistent Monocular SLAM via Scene Coordinate Embeddings : Abstract: Monocular visual SLAM enables 3D reconstruction from internet video and autonomous navigation on resource-constrained platforms, yet suffers from scale drift, i.e., the gradual divergence of...
Self-Supervised Animal Identification for Long Videos : Abstract: Identifying individual animals in long-duration videos is essential for behavioral ecology, wildlife monitoring, and livestock management. Traditional methods require extensive manual annota...
LiteEmbed: Adapting CLIP to Rare Classes : Abstract: Large-scale vision-language models such as CLIP achieve strong zero-shot recognition but struggle with classes that are rarely seen during pretraining, including newly emerging entities and ...
Image2Garment: Simulation-ready Garment Generation from a Single Image : Abstract: Estimating physically accurate, simulation-ready garments from a single image is challenging due to the absence of image-to-physics datasets and the ill-posed nature of this problem. Prior m...
AquaFeat+: an Underwater Vision Learning-based Enhancement Method for Object Detection, Classification, and Tracking : Abstract: Underwater video analysis is particularly challenging due to factors such as low lighting, color distortion, and turbidity, which compromise visual data quality and directly impact the perfo...
CogRail: Benchmarking VLMs in Cognitive Intrusion Perception for Intelligent Railway Transportation Systems : Abstract: Accurate and early perception of potential intrusion targets is essential for ensuring the safety of railway transportation systems. However, most existing systems focus narrowly on object c...
GRCF: Two-Stage Groupwise Ranking and Calibration Framework for Multimodal Sentiment Analysis : Abstract: Most Multimodal Sentiment Analysis research has focused on point-wise regression. While straightforward, this approach is sensitive to label noise and neglects whether one sample is more pos...
Sim2real Image Translation Enables Viewpoint-Robust Policies from Fixed-Camera Datasets : Abstract: Vision-based policies for robot manipulation have achieved significant recent success, but are still brittle to distribution shifts such as camera viewpoint variations. Robot demonstration d...
Iterative Differential Entropy Minimization (IDEM) method for fine rigid pairwise 3D Point Cloud Registration: A Focus on the Metric : Abstract: Point cloud registration is a central theme in computer vision, with alignment algorithms continuously improving for greater robustness. Commonly used methods evaluate Euclidean distances be...
OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding : Abstract: We propose OpenVoxel, a training-free algorithm for grouping and captioning sparse voxels for the open-vocabulary 3D scene understanding tasks. Given the sparse voxel rasterization (SVR) mod...
Trustworthy Longitudinal Brain MRI Completion: A Deformation-Based Approach with KAN-Enhanced Diffusion Model : Abstract: Longitudinal brain MRI is essential for lifespan study, yet high attrition rates often lead to missing data, complicating analysis. Deep generative models have been explored, but most rely s...
Hot-Start from Pixels: Low-Resolution Visual Tokens for Chinese Language Modeling : Abstract: Large language models typically represent Chinese characters as discrete index-based tokens, largely ignoring their visual form. For logographic scripts, visual structure carries semantic an...
Bipartite Mode Matching for Vision Training Set Search from a Hierarchical Data Server : Abstract: We explore a situation in which the target domain is accessible, but real-time data annotation is not feasible. Instead, we would like to construct an alternative training set from a large-s...
GlovEgo-HOI: Bridging the Synthetic-to-Real Gap for Industrial Egocentric Human-Object Interaction Detection : Abstract: Egocentric Human-Object Interaction (EHOI) analysis is crucial for industrial safety, yet the development of robust models is hindered by the scarcity of annotated domain-specific data. We a...
Video Joint-Embedding Predictive Architectures for Facial Expression Recognition : Abstract: This paper introduces a novel application of Video Joint-Embedding Predictive Architectures (V-JEPAs) for Facial Expression Recognition (FER). Departing from conventional pre-training method...
V-DPM: 4D Video Reconstruction with Dynamic Point Maps : Abstract: Powerful 3D representations such as DUSt3R invariant point maps, which encode 3D shape and camera parameters, have significantly advanced feed-forward 3D reconstruction. While point maps ass...
MAD: Motion Appearance Decoupling for efficient Driving World Models : Abstract: Recent video diffusion models generate photorealistic, temporally coherent videos, yet they fall short as reliable world models for autonomous driving, where structured motion and physically...
PrivLEX: Detecting legal concepts in images through Vision-Language Models : Abstract: We present PrivLEX, a novel image privacy classifier that grounds its decisions in legally defined personal data concepts. PrivLEX is the first interpretable privacy classifier aligned with ...
Video-MSR: Benchmarking Multi-hop Spatial Reasoning Capabilities of MLLMs : Abstract: Spatial reasoning has emerged as a critical capability for Multimodal Large Language Models (MLLMs), drawing increasing attention and rapid advancement. However, existing benchmarks primaril...
Radiomics-Integrated Deep Learning with Hierarchical Loss for Osteosarcoma Histology Classification : Abstract: Osteosarcoma (OS) is an aggressive primary bone malignancy. Accurate histopathological assessment of viable versus non-viable tumor regions after neoadjuvant chemotherapy is critical for pro...
Detail Loss in Super-Resolution Models Based on the Laplacian Pyramid and Repeated Upscaling and Downscaling Process : Abstract: With advances in artificial intelligence, image processing has gained significant interest. Image super-resolution is a vital technology closely related to real-world applications, as it enh...
Spectral Complex Autoencoder Pruning: A Fidelity-Guided Criterion for Extreme Structured Channel Compression : Abstract: We propose Spectral Complex Autoencoder Pruning (SCAP), a reconstruction-based criterion that measures functional redundancy at the level of individual output channels. For each convolutiona...
See More, Store Less: Memory-Efficient Resolution for Video Moment Retrieval : Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have improved image recognition and reasoning, but video-related tasks remain challenging due to memory constraints from dense fra...
Beyond the final layer: Attentive multilayer fusion for vision transformers : Abstract: With the rise of large-scale foundation models, efficiently adapting them to downstream tasks remains a central challenge. Linear probing, which freezes the backbone and trains a lightweight...
Frequency Error-Guided Under-sampling Optimization for Multi-Contrast MRI Reconstruction : Abstract: Magnetic resonance imaging (MRI) plays a vital role in clinical diagnostics, yet it remains hindered by long acquisition times and motion artifacts. Multi-contrast MRI reconstruction has eme...
Multi-Modal LLM based Image Captioning in ICT: Bridging the Gap Between General and Industry Domain : Abstract: In the information and communications technology (ICT) industry, training a domain-specific large language model (LLM) or constructing a retrieval-augmented generation system requires a subs...
GaussianFluent: Gaussian Simulation for Dynamic Scenes with Mixed Materials : Abstract: 3D Gaussian Splatting (3DGS) has emerged as a prominent 3D representation for high-fidelity and real-time rendering. Prior work has coupled physics simulation with Gaussians, but predominant...
BrainSegNet: A Novel Framework for Whole-Brain MRI Parcellation Enhanced by Large Models : Abstract: Whole-brain parcellation from MRI is a critical yet challenging task due to the complexity of subdividing the brain into numerous small, irregularly shaped regions. Traditionally, template-reg...
PhyRPR: Training-Free Physics-Constrained Video Generation : Abstract: Recent diffusion-based video generation models can synthesize visually plausible videos, yet they often struggle to satisfy physical constraints. A key reason is that most existing approache...
Hybrid guided variational autoencoder for visual place recognition : Abstract: Autonomous agents such as cars, robots and drones need to precisely localize themselves in diverse environments, including in GPS-denied indoor environments. One approach for precise localiz...
Integrating Diverse Assignment Strategies into DETRs : Abstract: Label assignment is a critical component in object detectors, particularly within DETR-style frameworks where the one-to-one matching strategy, despite its end-to-end elegance, suffers from ...
A$^2$TG: Adaptive Anisotropic Textured Gaussians for Efficient 3D Scene Representation : Abstract: Gaussian Splatting has emerged as a powerful representation for high-quality, real-time 3D scene rendering. While recent works extend Gaussians with learnable textures to enrich visual appea...
DeTracker: Motion-decoupled Vehicle Detection and Tracking in Unstabilized Satellite Videos : Abstract: Satellite videos provide continuous observations of surface dynamics but pose significant challenges for multi-object tracking (MOT), especially under unstabilized conditions where platform ...
Knowledge-Embedded and Hypernetwork-Guided Few-Shot Substation Meter Defect Image Generation Method : Abstract: Substation meters play a critical role in monitoring and ensuring the stable operation of power grids, yet their detection of cracks and other physical defects is often hampered by a severe ...
CLIDD: Cross-Layer Independent Deformable Description for Efficient and Discriminative Local Feature Representation : Abstract: Robust local feature representations are essential for spatial intelligence tasks such as robot navigation and augmented reality. Establishing reliable correspondences requires descriptors t...
SPOT-Face: Forensic Face Identification using Attention Guided Optimal Transport : Abstract: Person identification in forensic investigations becomes very challenging when common identification means for DNA (i.e., hair strands, soft tissue) are not available. Current methods utiliz...
Disentangle Object and Non-object Infrared Features via Language Guidance : Abstract: Infrared object detection focuses on identifying and locating objects in complex environments (e.g., dark, snow, and rain) where visible imaging cameras are disabled by poor illumination. How...
SpikeVAEDiff: Neural Spike-based Natural Visual Scene Reconstruction via VD-VAE and Versatile Diffusion : Abstract: Reconstructing natural visual scenes from neural activity is a key challenge in neuroscience and computer vision. We present SpikeVAEDiff, a novel two-stage framework that combines a Very De...
Affostruction: 3D Affordance Grounding with Generative Reconstruction : Abstract: This paper addresses the problem of affordance grounding from RGBD images of an object, which aims to localize surface regions corresponding to a text query that describes an action on the o...
Pairing-free Group-level Knowledge Distillation for Robust Gastrointestinal Lesion Classification in White-Light Endoscopy : Abstract: White-Light Imaging (WLI) is the standard for endoscopic cancer screening, but Narrow-Band Imaging (NBI) offers superior diagnostic details. A key challenge is transferring knowledge from NB...
Point Tracking as a Temporal Cue for Robust Myocardial Segmentation in Echocardiography Videos : Abstract: Purpose: Myocardium segmentation in echocardiography videos is a challenging task due to low contrast, noise, and anatomical variability. Traditional deep learning models either process fram...
From Performance to Practice: Knowledge-Distilled Segmentator for On-Premises Clinical Workflows : Abstract: Deploying medical image segmentation models in routine clinical workflows is often constrained by on-premises infrastructure, where computational resources are fixed and cloud-based inferenc...
Architecture inside the mirage: evaluating generative image models on architectural style, elements, and typologies : Abstract: Generative artificial intelligence (GenAI) text-to-image systems are increasingly used to generate architectural imagery, yet their capacity to reproduce accurate images in a historically ru...
From Snow to Rain: Evaluating Robustness, Calibration, and Complexity of Model-Based Robust Training : Abstract: Robustness to natural corruptions remains a critical challenge for reliable deep learning, particularly in safety-sensitive domains. We study a family of model-based training approaches that...
SSVP: Synergistic Semantic-Visual Prompting for Industrial Zero-Shot Anomaly Detection : Abstract: Zero-Shot Anomaly Detection (ZSAD) leverages Vision-Language Models (VLMs) to enable supervision-free industrial inspection. However, existing ZSAD paradigms are constrained by single visual...
SkinFlow: Efficient Information Transmission for Open Dermatological Diagnosis via Dynamic Visual Encoding and Staged RL : Abstract: General-purpose Large Vision-Language Models (LVLMs), despite their massive scale, often falter in dermatology due to "diffuse attention" - the inability to disentangle subtle pathological l...
Beyond Seen Bounds: Class-Centric Polarization for Single-Domain Generalized Deep Metric Learning : Abstract: Single-domain generalized deep metric learning (SDG-DML) faces the dual challenge of both category and domain shifts during testing, limiting real-world applications. Therefore, aiming to le...
LPCAN: Lightweight Pyramid Cross-Attention Network for Rail Surface Defect Detection Using RGB-D Data : Abstract: This paper addresses the limitations of current vision-based rail defect detection methods, including high computational complexity, excessive parameter counts, and suboptimal accuracy. We p...
LP-LLM: End-to-End Real-World Degraded License Plate Text Recognition via Large Multimodal Models : Abstract: Real-world License Plate Recognition (LPR) faces significant challenges from severe degradations such as motion blur, low resolution, and complex illumination. The prevailing "restoration-th...
Towards Open Environments and Instructions: General Vision-Language Navigation via Fast-Slow Interactive Reasoning : Abstract: Vision-Language Navigation aims to enable agents to navigate to a target location based on language instructions. Traditional VLN often follows a closed-set assumption, i.e., training and tes...
SAM-Aug: Leveraging SAM Priors for Few-Shot Parcel Segmentation in Satellite Time Series : Abstract: Few-shot semantic segmentation of time-series remote sensing images remains a critical challenge, particularly in regions where labeled data is scarce or costly to obtain. While state-of-the...
Small but Mighty: Dynamic Wavelet Expert-Guided Fine-Tuning of Large-Scale Models for Optical Remote Sensing Object Segmentation : Abstract: Accurately localizing and segmenting relevant objects from optical remote sensing images (ORSIs) is critical for advancing remote sensing applications. Existing methods are typically built u...
Vision Foundation Models for Domain Generalisable Cross-View Localisation in Planetary Ground-Aerial Robotic Teams : Abstract: Accurate localisation in planetary robotics enables the advanced autonomy required to support the increased scale and scope of future missions. The successes of the Ingenuity helicopter and ...
Exploring Reliable Spatiotemporal Dependencies for Efficient Visual Tracking : Abstract: Recent advances in transformer-based lightweight object tracking have established new standards across benchmarks, leveraging the global receptive field and powerful feature extraction capab...
Depth-Wise Representation Development Under Blockwise Self-Supervised Learning for Video Vision Transformers : Abstract: End-to-end backpropagation couples all layers through a global error signal, enabling coordinated learning but requiring long-range credit assignment. Motivated by recent progress in blockwi...
Changes in Visual Attention Patterns for Detection Tasks due to Dependencies on Signal and Background Spatial Frequencies : Abstract: We aim to investigate the impact of image and signal properties on visual attention mechanisms during a signal detection task in digital images. The application of insight yielded from this ...
Instance camera focus prediction for crystal agglomeration classification : Abstract: Agglomeration refers to the process of crystal clustering due to interparticle forces. Crystal agglomeration analysis from microscopic images is challenging due to the inherent limitations o...
SAM-pose2seg: Pose-Guided Human Instance Segmentation in Crowds : Abstract: Segment Anything (SAM) provides an unprecedented foundation for human segmentation, but may struggle under occlusion, where keypoints may be partially or fully invisible. We adapt SAM 2.1 fo...
Thermo-LIO: A Novel Multi-Sensor Integrated System for Structural Health Monitoring : Abstract: Traditional two-dimensional thermography, despite being non-invasive and useful for defect detection in the construction field, is limited in effectively assessing complex geometries, inacce...
Variance-Penalized MC-Dropout as a Learned Smoothing Prior for Brain Tumour Segmentation : Abstract: Brain tumor segmentation is essential for diagnosis and treatment planning, yet many CNN and U-Net based approaches produce noisy boundaries in regions of tumor infiltration. We introduce UA...
Compressing Vision Transformers in Geospatial Transfer Learning with Manifold-Constrained Optimization : Abstract: Deploying geospatial foundation models on resource-constrained edge devices demands compact architectures that maintain high downstream performance. However, their large parameter counts and...
TAG-MoE: Task-Aware Gating for Unified Generative Mixture-of-Experts : Abstract: Unified image generation and editing models suffer from severe task interference in dense diffusion transformers architectures, where a shared parameter space must compromise between conflic...
The Semantic Lifecycle in Embodied AI: Acquisition, Representation and Storage via Foundation Models : Abstract: Semantic information in embodied AI is inherently multi-source and multi-stage, making it challenging to fully leverage for achieving stable perception-to-action loops in real-world environm...
Residual Cross-Modal Fusion Networks for Audio-Visual Navigation : Abstract: Audio-visual embodied navigation aims to enable an agent to autonomously localize and reach a sound source in unseen 3D environments by leveraging auditory cues. The key challenge of this ta...
R$^2$BD: A Reconstruction-Based Method for Generalizable and Efficient Detection of Fake Images : Abstract: Recently, reconstruction-based methods have gained attention for AIGC image detection. These methods leverage pre-trained diffusion models to reconstruct inputs and measure residuals for dis...
Bias Detection and Rotation-Robustness Mitigation in Vision-Language Models and Generative Image Models : Abstract: Vision-Language Models (VLMs) and generative image models have achieved remarkable performance across multimodal tasks, yet their robustness and fairness under input transformations remain i...
"Hiding in Plain Sight": Designing Synthetic Dialog Generation for Uncovering Socially Situated Norms : Abstract: Naturally situated conversations encapsulate the social norms inherent to their context, reflecting both the relationships between interlocutors and the underlying communicative intent. In t...
Can Editing LLMs Inject Harm? : Abstract: Large Language Models (LLMs) have emerged as a new information channel. Meanwhile, one critical but under-explored question is: Is it possible to bypass the safety alignment and inject harmf...
Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts : Abstract: Text-to-image diffusion models, e.g. Stable Diffusion (SD), lately have shown remarkable ability in high-quality content generation, and become one of the representatives for the recent wave...
ShortCoder: Knowledge-Augmented Syntax Optimization for Token-Efficient Code Generation : Abstract: Code generation tasks aim to automate the conversion of user requirements into executable code, significantly reducing manual development efforts and enhancing software productivity. The eme...
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning : Abstract: Multi-agent systems have evolved into practical LLM-driven collaborators for many applications, gaining robustness from diversity and cross-checking. However, multi-agent RL (MARL) training ...
Show, don't tell -- Providing Visual Error Feedback for Handwritten Documents : Abstract: Handwriting remains an essential skill, particularly in education. Therefore, providing visual feedback on handwritten documents is an important but understudied area. We outline the many ch...
Permutation Matching Under Parikh Budgets: Linear-Time Detection, Packing, and Disjoint Selection : Abstract: We study permutation (jumbled/Abelian) pattern matching over a general alphabet $Σ$. Given a pattern P of length m and a text T of length n, the classical task is to decide whether T contain...
Dissecting Judicial Reasoning in U.S. Copyright Damage Awards : Abstract: Judicial reasoning in copyright damage awards poses a core challenge for computational legal analysis. Although federal courts follow the 1976 Copyright Act, their interpretations and factor...
Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception : Abstract: We introduce a voice-agentic framework that learns one critical omni-understanding skill: knowing when to trust itself versus when to consult external audio perception. Our work is motivated...
SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing : Abstract: The recent surge in open-source Multimodal Large Language Models (MLLM) frameworks, such as LLaVA, provides a convenient kickoff for artificial intelligence developers and researchers. Howev...
Long-term Task-oriented Agent: Proactive Long-term Intent Maintenance in Dynamic Environments : Abstract: Current large language model agents predominantly operate under a reactive paradigm, responding only to immediate user queries within short-term sessions. This limitation hinders their abili...
AviationLMM: A Large Multimodal Foundation Model for Civil Aviation : Abstract: Civil aviation is a cornerstone of global transportation and commerce, and ensuring its safety, efficiency and customer satisfaction is paramount. Yet conventional Artificial Intelligence (A...
Human-AI Co-design for Clinical Prediction Models : Abstract: Developing safe, effective, and practically useful clinical prediction models (CPMs) traditionally requires iterative collaboration between clinical experts, data scientists, and informatici...
StegoStylo: Squelching Stylometric Scrutiny through Steganographic Stitching : Abstract: Stylometry--the identification of an author through analysis of a text's style (i.e., authorship attribution)--serves many constructive purposes: it supports copyright and plagiarism investi...
PluriHarms: Benchmarking the Full Spectrum of Human Judgments on AI Harm : Abstract: Current AI safety frameworks, which often treat harmfulness as binary, lack the flexibility to handle borderline cases where humans meaningfully disagree. To build more pluralistic systems, ...
Reading or Reasoning? Format Decoupled Reinforcement Learning for Document OCR : Abstract: Reading text from images or scanned documents via OCR models has been a longstanding focus of researchers. Intuitively, text reading is perceived as a straightforward perceptual task, and ex...
Empathy Applicability Modeling for General Health Queries : Abstract: LLMs are increasingly being integrated into clinical workflows, yet they often lack clinical empathy, an essential aspect of effective doctor-patient communication. Existing NLP frameworks f...
LLMs can Compress LLMs: Adaptive Pruning by Agents : Abstract: As Large Language Models (LLMs) continue to scale, post-training pruning has emerged as a promising approach to reduce computational costs while preserving performance. Existing methods such...
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation : Abstract: Deep research systems are widely used for multi-step web research, analysis, and cross-source synthesis, yet their evaluation remains challenging. Existing benchmarks often require annotatio...
Creating a Hybrid Rule and Neural Network Based Semantic Tagger using Silver Standard Data: the PyMUSAS framework for Multilingual Semantic Annotation : Abstract: Word Sense Disambiguation (WSD) has been widely evaluated using the semantic frameworks of WordNet, BabelNet, and the Oxford Dictionary of English. However, for the UCREL Semantic Analysis S...
TaxoBell: Gaussian Box Embeddings for Self-Supervised Taxonomy Expansion : Abstract: Taxonomies form the backbone of structured knowledge representation across diverse domains, enabling applications such as e-commerce catalogs, semantic search, and biomedical discovery. Yet,...
LLMs Got Rhythm? Hybrid Phonological Filtering for Greek Poetry Rhyme Detection and Generation : Abstract: Large Language Models (LLMs), despite their remarkable capabilities across NLP tasks, struggle with phonologically-grounded phenomena like rhyme detection and generation. This is even more e...
DPWriter: Reinforcement Learning with Diverse Planning Branching for Creative Writing : Abstract: Reinforcement learning (RL)-based enhancement of large language models (LLMs) often leads to reduced output diversity, undermining their utility in open-ended tasks like creative writing. Cu...
Dialogue Telemetry: Turn-Level Instrumentation for Autonomous Information Gathering : Abstract: Autonomous systems conducting schema-grounded information-gathering dialogues face an instrumentation gap, lacking turn-level observables for monitoring acquisition efficiency and detecting ...
Benchmarking Post-Training Quantization of Large Language Models under Microscaling Floating Point Formats : Abstract: Microscaling Floating-Point (MXFP) has emerged as a promising low-precision format for large language models (LLMs). Despite various post-training quantization (PTQ) algorithms being propose...
SERM: Self-Evolving Relevance Model with Agent-Driven Learning from Massive Query Streams : Abstract: Due to the dynamically evolving nature of real-world query streams, relevance models struggle to generalize to practical search scenarios. A sophisticated solution is self-evolution techniqu...
MVSS: A Unified Framework for Multi-View Structured Survey Generation : Abstract: Scientific surveys require not only summarizing large bodies of literature, but also organizing them into clear and coherent conceptual structures. Existing automatic survey generation metho...
SlidesGen-Bench: Evaluating Slides Generation via Computational and Quantitative Metrics : Abstract: The rapid evolution of Large Language Models (LLMs) has fostered diverse paradigms for automated slide generation, ranging from code-driven layouts to image-centric synthesis. However, evalu...
Improving Symbolic Translation of Language Models for Logical Reasoning : Abstract: The use of formal language for deductive logical reasoning aligns well with language models (LMs), where translating natural language (NL) into first-order logic (FOL) and employing an exter...
Where Knowledge Collides: A Mechanistic Study of Intra-Memory Knowledge Conflict in Language Models : Abstract: In language models (LMs), intra-memory knowledge conflict largely arises when inconsistent information about the same event is encoded within the model's parametric knowledge. While prior wo...
Bias Dynamics in BabyLMs: Towards a Compute-Efficient Sandbox for Democratising Pre-Training Debiasing : Abstract: Pre-trained language models (LMs) have, over the last few years, grown substantially in both societal adoption and training costs. This rapid growth in size has constrained progress in under...
Structured Knowledge Representation through Contextual Pages for Retrieval-Augmented Generation : Abstract: Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by incorporating external knowledge. Recently, some works have incorporated iterative knowledge accumulation proces...
The Imperfective Paradox in Large Language Models : Abstract: Do Large Language Models (LLMs) genuinely grasp the compositional semantics of events, or do they rely on surface-level probabilistic heuristics? We investigate the Imperfective Paradox, a l...
Relation Extraction Capabilities of LLMs on Clinical Text: A Bilingual Evaluation for English and Turkish : Abstract: The scarcity of annotated datasets for clinical information extraction in non-English languages hinders the evaluation of large language model (LLM)-based methods developed primarily in Engl...
Frame of Reference: Addressing the Challenges of Common Ground Representation in Situational Dialogs : Abstract: Common ground plays a critical role in situated spoken dialogues, where interlocutors must establish and maintain shared references to entities, events, and relations to sustain coherent int...
Improving Implicit Hate Speech Detection via a Community-Driven Multi-Agent Framework : Abstract: This work proposes a contextualised detection framework for implicitly hateful speech, implemented as a multi-agent system comprising a central Moderator Agent and dynamically constructed Co...
Understanding or Memorizing? A Case Study of German Definite Articles in Language Models : Abstract: Language models perform well on grammatical agreement, but it is unclear whether this reflects rule-based generalization or memorization. We study this question for German definite singular ...
ReGraM: Region-First Knowledge Graph Reasoning for Medical Question Answering : Abstract: Recent studies in medical question answering (Medical QA) have actively explored the integration of large language models (LLMs) with biomedical knowledge graphs (KGs) to improve factual acc...
MCGA: A Multi-task Classical Chinese Literary Genre Audio Corpus : Abstract: With the rapid advancement of Multimodal Large Language Models (MLLMs), their potential has garnered significant attention in Chinese Classical Studies (CCS). While existing research has pri...
When to Invoke: Refining LLM Fairness with Toxicity Assessment : Abstract: Large Language Models (LLMs) are increasingly used for toxicity assessment in online moderation systems, where fairness across demographic groups is essential for equitable treatment. Howeve...
TeachPro: Multi-Label Qualitative Teaching Evaluation via Cross-View Graph Synergy and Semantic Anchored Evidence Encoding : Abstract: Standardized Student Evaluations of Teaching often suffer from low reliability, restricted response options, and response distortion. Existing machine learning methods that mine open-ended co...
When to Trust: A Causality-Aware Calibration Framework for Accurate Knowledge Graph Retrieval-Augmented Generation : Abstract: Knowledge Graph Retrieval-Augmented Generation (KG-RAG) extends the RAG paradigm by incorporating structured knowledge from knowledge graphs, enabling Large Language Models (LLMs) to perform...
UserLM-R1: Modeling Human Reasoning in User Language Models with Multi-Reward Reinforcement Learning : Abstract: User simulators serve as the critical interactive environment for agent post-training, and an ideal user simulator generalizes across domains and proactively engages in negotiation by challe...
A.X K1 Technical Report : Abstract: We introduce A.X K1, a 519B-parameter Mixture-of-Experts (MoE) language model trained from scratch. Our design leverages scaling laws to optimize training configurations and vocabulary size ...
ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection : Abstract: Supervised fine-tuning (SFT) is a fundamental post-training strategy to align Large Language Models (LLMs) with human intent. However, traditional SFT often ignores the one-to-many nature of...
OrthoGeoLoRA: Geometric Parameter-Efficient Fine-Tuning for Structured Social Science Concept Retrieval on the Web : Abstract: Large language models and text encoders increasingly power web-based information systems in the social sciences, including digital libraries, data catalogues, and search interfaces used by r...
Identity-Robust Language Model Generation via Content Integrity Preservation : Abstract: Large Language Model (LLM) outputs often vary across user sociodemographic attributes, leading to disparities in factual accuracy, utility, and safety, even for objective questions where dem...
Adaptive Multi-Stage Patent Claim Generation with Unified Quality Assessment : Abstract: Current patent claim generation systems face three fundamental limitations: poor cross-jurisdictional generalization, inadequate semantic relationship modeling between claims and prior art, ...
Contrastive Bi-Encoder Models for Multi-Label Skill Extraction: Enhancing ESCO Ontology Matching with BERT and Attention Mechanisms : Abstract: Fine-grained labor market analysis increasingly relies on mapping unstructured job advertisements to standardized skill taxonomies such as ESCO. This mapping is naturally formulated as an Ex...
SubTokenTest: A Practical Benchmark for Real-World Sub-token Understanding : Abstract: Recent advancements in large language models (LLMs) have significantly enhanced their reasoning capabilities. However, they continue to struggle with basic character-level tasks, such as cou...
From Symbolic to Natural-Language Relations: Rethinking Knowledge Graph Construction in the Era of Large Language Models : Abstract: Knowledge graphs (KGs) have commonly been constructed using predefined symbolic relation schemas, typically implemented as categorical relation labels. This design has notable shortcomings: ...
Mi:dm 2.0 Korea-centric Bilingual Language Models : Abstract: We introduce Mi:dm 2.0, a bilingual large language model (LLM) specifically engineered to advance Korea-centric AI. This model goes beyond Korean text processing by integrating the values, r...
Beyond Consensus: Perspectivist Modeling and Evaluation of Annotator Disagreement in NLP : Abstract: Annotator disagreement is widespread in NLP, particularly for subjective and ambiguous tasks such as toxicity detection and stance analysis. While early approaches treated disagreement as no...
Efficient Multilingual Dialogue Processing via Translation Pipelines and Distilled Language Models : Abstract: This paper presents team Kl33n3x's multilingual dialogue summarization and question answering system developed for the NLPAI4Health 2025 shared task. The approach employs a three-stage pipel...
SITA: Learning Speaker-Invariant and Tone-Aware Speech Representations for Low-Resource Tonal Languages : Abstract: Tonal low-resource languages are widely spoken yet remain underserved by modern speech technology. A key challenge is learning representations that are robust to nuisance variation such as g...
Is Grokking Worthwhile? Functional Analysis and Transferability of Generalization Circuits in Transformers : Abstract: While Large Language Models (LLMs) excel at factual retrieval, they often struggle with the "curse of two-hop reasoning" in compositional tasks. Recent research suggests that parameter-shari...
Can LLMs interpret figurative language as humans do?: surface-level vs representational similarity : Abstract: Large language models generate judgments that resemble those of humans. Yet the extent to which these models align with human judgments in interpreting figurative and socially grounded langu...
SpectraQuery: A Hybrid Retrieval-Augmented Conversational Assistant for Battery Science : Abstract: Scientific reasoning increasingly requires linking structured experimental data with the unstructured literature that explains it, yet most large language model (LLM) assistants cannot reaso...
OpenDecoder: Open Large Language Model Decoding to Incorporate Document Quality in RAG : Abstract: The development of large language models (LLMs) has achieved superior performance in a range of downstream tasks, including LLM-based retrieval-augmented generation (RAG). The quality of gen...
Multicultural Spyfall: Assessing LLMs through Dynamic Multilingual Social Deduction Game : Abstract: The rapid advancement of Large Language Models (LLMs) has necessitated more robust evaluation methods that go beyond static benchmarks, which are increasingly prone to data saturation and le...
TranslateGemma Technical Report : Abstract: We present TranslateGemma, a suite of open machine translation models based on the Gemma 3 foundation models. To enhance the inherent multilingual capabilities of Gemma 3 for the translation...
Entropy Sentinel: Continuous LLM Accuracy Monitoring from Decoding Entropy Traces in STEM : Abstract: Deploying LLMs raises two coupled challenges: (1) monitoring - estimating where a model underperforms as traffic and domains drift - and (2) improvement - prioritizing data acquisition to cl...
Evaluating Role-Consistency in LLMs for Counselor Training : Abstract: The rise of online counseling services has highlighted the need for effective training methods for future counselors. This paper extends research on VirCo, a Virtual Client for Online Counse...
NewsScope: Schema-Grounded Cross-Domain News Claim Extraction with Open Models : Abstract: Automated news verification requires structured claim extraction, but existing approaches either lack schema compliance or generalize poorly across domains. This paper presents NewsScope, a ...
Más contexto no es mejor. Paradoja de la dilución vectorial en RAG corporativos : Abstract: Recent "Contextualized Chunking" techniques inject summaries to improve context in RAG, but they introduce a "vector dilution" that obscures the local content. Evaluating different...
Gaming the Answer Matcher: Examining the Impact of Text Manipulation on Automated Judgment : Abstract: Automated answer matching, which leverages LLMs to evaluate free-text responses by comparing them to a reference answer, shows substantial promise as a scalable and aligned alternative to hu...
PediaMind-R1: A Temperament-Aware Language Model for Personalized Early Childhood Care Reasoning via Cognitive Modeling and Preference Alignment : Abstract: This paper presents PediaMind-R1, a domain-specialized large language model designed to achieve active personalization in intelligent parenting scenarios. Unlike conventional systems that pr...
Scalable and Reliable Evaluation of AI Knowledge Retrieval Systems: RIKER and the Coherent Simulated Universe : Abstract: Evaluating knowledge systems (LLMs, RAG, knowledge graphs, etc.) faces fundamental challenges: static benchmarks are vulnerable to contamination, LLM-based judges exhibit systematic biases, a...
Resisting Correction: How RLHF Makes Language Models Ignore External Safety Signals in Natural Conversation : Abstract: Safety architectures for language models increasingly rely on external monitors to detect errors and inject corrective signals at inference time. For such systems to function in interactive ...
Triples and Knowledge-Infused Embeddings for Clustering and Classification of Scientific Documents : Abstract: The increasing volume and complexity of scientific literature demand robust methods for organizing and understanding research documents. In this study, we explore how structured knowledge, s...
Consistency-Aware Editing for Entity-level Unlearning in Language Models : Abstract: Large language models (LLMs) risk retaining sensitive, copyrighted, or harmful information from their training data. Entity-level unlearning addresses this issue by removing all knowledge of...
Recursive Knowledge Synthesis for Multi-LLM Systems: Stability Analysis and Tri-Agent Audit Framework : Abstract: This paper presents a tri-agent cross-validation framework for analyzing stability and explainability in multi-model large language systems. The architecture integrates three heterogeneous L...
Companion Agents: A Table-Information Mining Paradigm for Text-to-SQL : Abstract: Large-scale Text-to-SQL benchmarks such as BIRD typically assume complete and accurate database annotations as well as readily available external knowledge, which fails to reflect common ind...
A Review: PTSD in Pre-Existing Medical Condition on Social Media : Abstract: Post-Traumatic Stress Disorder (PTSD) is a multifaceted mental health condition, particularly challenging for individuals with pre-existing medical conditions. This review critically examine...
DeliberationBench: When Do More Voices Hurt? A Controlled Study of Multi-LLM Deliberation Protocols : Abstract: Multi-agent systems where Large Language Models (LLMs) deliberate to form consensus have gained significant attention, yet their practical value over simpler methods remains under-scrutinize...
Variational Bayesian Inference for Tensor Robust Principal Component Analysis : Abstract: Tensor Robust Principal Component Analysis (TRPCA) holds a crucial position in machine learning and computer vision. It aims to recover underlying low-rank structures and to characterize the...
SGAC: A Graph Neural Network Framework for Imbalanced and Structure-Aware AMP Classification : Abstract: Classifying Antimicrobial Peptides (AMPs) from the vast collection of peptides derived from metagenomic sequencing offers a promising avenue for combating antibiotic resistance. However, mos...
Mathematical Derivation Graphs: A Relation Extraction Task in STEM Manuscripts : Abstract: Recent advances in natural language processing (NLP), particularly with the emergence of large language models (LLMs), have significantly enhanced the field of textual analysis. However, whi...
Game of Coding: Sybil Resistant Decentralized Machine Learning with Minimal Trust Assumption : Abstract: Coding theory plays a crucial role in ensuring data integrity and reliability across various domains, from communication to computation and storage systems. However, its reliance on trust as...
Know Yourself Better: Diverse Object-Related Features Improve Open Set Recognition : Abstract: Open set recognition (OSR) is a critical aspect of machine learning, addressing the challenge of detecting novel classes during inference. Within the realm of deep learning, neural classifie...
Human-in-the-Loop Segmentation of Multi-species Coral Imagery : Abstract: Marine surveys by robotic underwater and surface vehicles result in substantial quantities of coral reef imagery; however, labeling these images is expensive and time-consuming for domain exp...
Beyond One-Size-Fits-All: A Survey of Personalized Affective Computing in Human-Agent Interaction : Abstract: In personalized machine learning, the aim of personalization is to train a model that caters to a specific individual or group of individuals by optimizing one or more performance metrics an...
Neural Emulator Superiority: When Machine Learning for PDEs Surpasses its Training Data : Abstract: Neural operators or emulators for PDEs trained on data from numerical solvers are conventionally assumed to be limited by their training data's fidelity. We challenge this assumption by iden...
Differentially private federated learning for localized control of infectious disease dynamics : Abstract: In times of epidemics, swift reaction is necessary to mitigate epidemic spreading. For this reaction, localized approaches have several advantages, limiting necessary resources and reducing ...
Dynamics-Aligned Latent Imagination in Contextual World Models for Zero-Shot Generalization : Abstract: Real-world reinforcement learning demands adaptation to unseen environmental conditions without costly retraining. Contextual Markov Decision Processes (cMDP) model this challenge, but exist...
Exploring the Secondary Risks of Large Language Models : Abstract: Ensuring the safety and alignment of Large Language Models is a significant challenge with their growing integration into critical applications and societal functions. While prior research h...
Benchmarking Positional Encodings for GNNs and Graph Transformers : Abstract: Positional Encodings (PEs) are essential for injecting structural information into Graph Neural Networks (GNNs), particularly Graph Transformers, yet their empirical impact remains insuffici...
Lens: A Knowledge-Guided Foundation Model for Network Traffic : Abstract: Network traffic refers to the amount of data being sent and received over the Internet or any system that connects computers. Analyzing network traffic is vital for security and management, ...
Soft Contrastive Learning for Time Series : Abstract: Contrastive learning has been shown to be effective for learning representations from time series in a self-supervised way. However, contrasting similar time series instances or values from adjacent ...
Reinforcement Learning with Exogenous States and Rewards : Abstract: Exogenous state variables and rewards can slow reinforcement learning by injecting uncontrolled variation into the reward signal. This paper formalizes exogenous state variables and rewards ...
Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning : Abstract: Vision-Language-Action (VLA) tasks require reasoning over complex visual scenes and executing adaptive actions in dynamic environments. While recent studies on reasoning VLAs show that expli...
Value-Aware Numerical Representations for Transformer Language Models : Abstract: Transformer-based language models often achieve strong results on mathematical reasoning benchmarks while remaining fragile on basic numerical understanding and arithmetic operations. A cent...
Routing with Generated Data: Annotation-Free LLM Skill Estimation and Expert Selection : Abstract: Large Language Model (LLM) routers dynamically select optimal models for given inputs. Existing approaches typically assume access to ground-truth labeled data, which is often unavailable in...
Identifying Models Behind Text-to-Image Leaderboards : Abstract: Text-to-image (T2I) models are increasingly popular, producing a large share of AI-generated images online. To compare model quality, voting-based leaderboards have become the standard, rely...
PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records : Abstract: While GUI agents have shown strong performance under explicit and completion instructions, real-world deployment requires aligning with users' more complex implicit intents. In this work, we...
LLM for Large-Scale Optimization Model Auto-Formulation: A Lightweight Few-Shot Learning Approach : Abstract: Large-scale optimization is a key backbone of modern business decision-making. However, building these models is often labor-intensive and time-consuming. We address this by proposing LEAN-L...
Linear Complexity Self-Supervised Learning for Music Understanding with Random Quantizer : Abstract: In recent years, foundation models have become very popular due to their exceptional performance, mainly in natural language (NLP) tasks where they were first introduced. These models usuall...
Residual Power Flow for Neural Solvers : Abstract: The energy transition challenges operational tasks based on simulations and optimisation. These computations need to be fast and flexible as the grid is ever-expanding, and renewables' uncer...
CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion : Abstract: To teach robots complex manipulation tasks, it is now a common practice to fine-tune a pre-trained vision-language-action model (VLA) on task-specific data. However, since this recipe update...
Towards Robust Cross-Dataset Object Detection Generalization under Domain Specificity : Abstract: Object detectors often perform well in-distribution, yet degrade sharply on a different benchmark. We study cross-dataset object detection (CD-OD) through a lens of setting specificity. We g...
High-fidelity lunar topographic reconstruction across diverse terrain and illumination environments using deep learning : Abstract: Topographic models are essential for characterizing planetary surfaces and for inferring underlying geological processes. Nevertheless, meter-scale topographic data remain limited, which con...
SoK: Enhancing Cryptographic Collaborative Learning with Differential Privacy : Abstract: In collaborative learning (CL), multiple parties jointly train a machine learning model on their private datasets. However, data cannot be shared directly due to privacy concerns. To ensure...
Do Transformers Understand Ancient Roman Coin Motifs Better than CNNs? : Abstract: Automated analysis of ancient coins has the potential to help researchers extract more historical insights from large collections of coins and to help collectors understand what they are buy...
Ability Transfer and Recovery via Modularized Parameters Localization : Abstract: Large language models can be continually pre-trained or fine-tuned to improve performance in specific domains, languages, or skills, but this specialization often degrades other capabilities...
High-Performance Serverless Computing: A Systematic Literature Review on Serverless for HPC, AI, and Big Data : Abstract: The widespread deployment of large-scale, compute-intensive applications such as high-performance computing, artificial intelligence, and big data is leading to convergence between cloud and...
Explainable Autoencoder-Based Anomaly Detection in IEC 61850 GOOSE Networks : Abstract: The IEC 61850 Generic Object-Oriented Substation Event (GOOSE) protocol plays a critical role in real-time protection and automation of digital substations, yet its lack of native security m...
Cluster Workload Allocation: Semantic Soft Affinity Using Natural Language Processing : Abstract: Cluster workload allocation often requires complex configurations, creating a usability gap. This paper introduces a semantic, intent-driven scheduling paradigm for cluster systems using Nat...
Magnifying change: Rapid burn scar mapping with multi-resolution, multi-source satellite imagery : Abstract: Delineating wildfire affected areas using satellite imagery remains challenging due to irregular and spatially heterogeneous spectral changes across the electromagnetic spectrum. While recen...
LatencyPrism: Online Non-intrusive Latency Sculpting for SLO-Guaranteed LLM Inference : Abstract: LLM inference latency critically determines user experience and operational costs, directly impacting throughput under SLO constraints. Even brief latency spikes degrade service quality desp...
Annealed Relaxation of Speculative Decoding for Faster Autoregressive Image Generation : Abstract: Despite significant progress in autoregressive image generation, inference remains slow due to the sequential nature of AR models and the ambiguity of image tokens, even when using speculati...
N-EIoU-YOLOv9: A Signal-Aware Bounding Box Regression Loss for Lightweight Mobile Detection of Rice Leaf Diseases : Abstract: In this work, we propose N-EIoU-YOLOv9, a lightweight detection framework based on a signal-aware bounding box regression loss derived from non-monotonic gradient focusing and geometric deco...
Deep Learning-based Binary Analysis for Vulnerability Detection in x86-64 Machine Code : Abstract: While much of the current research in deep learning-based vulnerability detection relies on disassembled binaries, this paper explores the feasibility of extracting features directly from ra...
A Machine Learning Approach Towards Runtime Optimisation of Matrix Multiplication : Abstract: The GEneral Matrix Multiplication (GEMM) is one of the essential algorithms in scientific computing. Single-thread GEMM implementations are well-optimised with techniques like blocking and a...
How Many Human Judgments Are Enough? Feasibility Limits of Human Preference Evaluation : Abstract: Human preference evaluations are widely used to compare generative models, yet it remains unclear how many judgments are required to reliably detect small improvements. We show that when pre...
Horseshoe Mixtures-of-Experts (HS-MoE) : Abstract: Horseshoe mixtures-of-experts (HS-MoE) models provide a Bayesian framework for sparse expert selection in mixture-of-experts architectures. We combine the horseshoe prior's adaptive global-l...
Universal Latent Homeomorphic Manifolds: Cross-Domain Representation Learning via Homeomorphism Verification : Abstract: We present the Universal Latent Homeomorphic Manifold (ULHM), a framework that unifies semantic representations (e.g., human descriptions, diagnostic labels) and observation-driven machine r...
An Inexact Weighted Proximal Trust-Region Method : Abstract: In [R. J. Baraldi and D. P. Kouri, Math. Program., 201:1 (2023), pp. 559-598], the authors introduced a trust-region method for minimizing the sum of a smooth nonconvex and a nonsmooth conve...
Tail-Sensitive KL and Rényi Convergence of Unadjusted Hamiltonian Monte Carlo via One-Shot Couplings : Abstract: Hamiltonian Monte Carlo (HMC) algorithms are among the most widely used sampling methods in high dimensional settings, yet their convergence properties are poorly understood in divergences t...
Block Decomposable Methods for Large-Scale Optimization Problems : Abstract: This dissertation explores block decomposable methods for large-scale optimization problems. It focuses on alternating direction method of multipliers (ADMM) schemes and block coordinate des...
Machine Learning-Driven Creep Law Discovery Across Alloy Compositional Space : Abstract: High-temperature creep characterization of structural alloys traditionally relies on serial uniaxial tests, which are highly inefficient for exploring the large search space of alloy composi...
Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models : Abstract: Recent advances in world models have shown promise for modeling future dynamics of environmental states, enabling agents to reason and act without accessing real environments. Current method...
ConvoLearn: A Dataset of Constructivist Tutor-Student Dialogue : Abstract: In educational applications, LLMs exhibit several fundamental pedagogical limitations, such as their tendency to reveal solutions rather than support dialogic learning. We introduce ConvoLea...
Fine Grained Evaluation of LLMs-as-Judges : Abstract: A good deal of recent research has focused on how Large Language Models (LLMs) may be used as 'judges' in place of humans to evaluate the quality of the output produced by various text /...
Navigating Ideation Space: Decomposed Conceptual Representations for Positioning Scientific Ideas : Abstract: Scientific discovery is a cumulative process and requires new ideas to be situated within an ever-expanding landscape of existing knowledge. An emerging and critical challenge is how to iden...
Comprehensive Machine Learning Benchmarking for Fringe Projection Profilometry with Photorealistic Synthetic Data : Abstract: Machine learning approaches for fringe projection profilometry (FPP) are hindered by the lack of large, diverse datasets and comprehensive benchmarking protocols. This paper introduces the f...
Adaptive few-shot learning for robust part quality classification in two-photon lithography : Abstract: Two-photon lithography (TPL) is an advanced additive manufacturing (AM) technique for fabricating high-precision micro-structures. While computer vision (CV) is proven for automated quality...
Learning Domain-Invariant Representations for Cross-Domain Image Registration via Scene-Appearance Disentanglement : Abstract: Image registration under domain shift remains a fundamental challenge in computer vision and medical imaging: when source and target images exhibit systematic intensity differences, the brig...
ForensicFormer: Hierarchical Multi-Scale Reasoning for Cross-Domain Image Forgery Detection : Abstract: The proliferation of AI-generated imagery and sophisticated editing tools has rendered traditional forensic methods ineffective for cross-domain forgery detection. We present ForensicFormer,...
Directional Attractors in LLM Reasoning: How Similarity Retrieval Steers Iterative Summarization Based Reasoning : Abstract: Iterative summarization based reasoning frameworks such as InftyThink enable long-horizon reasoning in large language models (LLMs) by controlling context growth, but they repeatedly regener...
Emissions and Performance Trade-off Between Small and Large Language Models : Abstract: The advent of Large Language Models (LLMs) has raised concerns about their enormous carbon footprint, starting with energy-intensive training and continuing through repeated inference. This ...
Rubric-Conditioned LLM Grading: Alignment, Uncertainty, and Robustness : Abstract: Automated short-answer grading (ASAG) remains a challenging task due to the linguistic variability of student responses and the need for nuanced, rubric-aligned partial credit. While Large L...
From Adversarial Poetry to Adversarial Tales: An Interpretability Research Agenda : Abstract: Safety mechanisms in LLMs remain vulnerable to attacks that reframe harmful requests through culturally coded structures. We introduce Adversarial Tales, a jailbreak technique that embeds ha...
Contrastive Geometric Learning Unlocks Unified Structure- and Ligand-Based Drug Design : Abstract: Structure-based and ligand-based computational drug design have traditionally relied on disjoint data sources and modeling assumptions, limiting their joint use at scale. In this work, we in...
Disentangling Task Conflicts in Multi-Task LoRA via Orthogonal Gradient Projection : Abstract: Multi-Task Learning (MTL) combined with Low-Rank Adaptation (LoRA) has emerged as a promising direction for parameter-efficient deployment of Large Language Models (LLMs). By sharing a singl...
Exploring Fine-Tuning for Tabular Foundation Models : Abstract: Tabular Foundation Models (TFMs) have recently shown strong in-context learning capabilities on structured data, achieving zero-shot performance comparable to traditional machine learning me...
From Prompt to Protocol: Fast Charging Batteries with Large Language Models : Abstract: Efficiently optimizing battery charging protocols is challenging because each evaluation is slow, costly, and non-differentiable. Many existing approaches address this difficulty by heavily ...
Toward Understanding Unlearning Difficulty: A Mechanistic Perspective and Circuit-Guided Difficulty Metric : Abstract: Machine unlearning is becoming essential for building trustworthy and compliant language models. Yet unlearning success varies considerably across individual samples: some are reliably erase...
Energy-Entropy Regularization: The True Power of Minimal Looped Transformers : Abstract: Recent research suggests that looped Transformers have superior reasoning capabilities compared to standard deep architectures. Current approaches to training single-head looped architecture...
Constraint- and Score-Based Nonlinear Granger Causality Discovery with Kernels : Abstract: Kernel-based methods are used in the context of Granger Causality to enable the identification of nonlinear causal relationships between time series variables. In this paper, we show that tw...
Private LLM Inference on Consumer Blackwell GPUs: A Practical Guide for Cost-Effective Local Deployment in SMEs : Abstract: SMEs increasingly seek alternatives to cloud LLM APIs, which raise data privacy concerns. Dedicated cloud GPU instances offer improved privacy but with limited guarantees and ongoing costs, ...
Class Adaptive Conformal Training : Abstract: Deep neural networks have achieved remarkable success across a variety of tasks, yet they often suffer from unreliable probability estimates. As a result, they can be overconfident in their ...
Parallelizable memory recurrent units : Abstract: With the emergence of massively parallel processing units, parallelization has become a desirable property for new sequence models. The ability to parallelize the processing of sequences wit...
Deep Operator Networks for Surrogate Modeling of Cyclic Adsorption Processes with Varying Initial Conditions : Abstract: Deep Operator Networks are emerging as fundamental tools among various neural network types to learn mappings between function spaces, and have recently gained attention due to their ability...
Terminally constrained flow-based generative models from an optimal control perspective : Abstract: We address the problem of sampling from terminally constrained distributions with pre-trained flow-based generative models through an optimal control formulation. Theoretically, we character...
SimMerge: Learning to Select Merge Operators from Similarity Signals : Abstract: Model merging enables multiple large language models (LLMs) to be combined into a single model while preserving performance. This makes it a valuable tool in LLM development, offering a comp...
FairGU: Fairness-aware Graph Unlearning in Social Network : Abstract: Graph unlearning has emerged as a critical mechanism for supporting sustainable and privacy-preserving social networks, enabling models to remove the influence of deleted nodes and thereby b...
Searth Transformer: A Transformer Architecture Incorporating Earth's Geospheric Physical Priors for Global Mid-Range Weather Forecasting : Abstract: Accurate global medium-range weather forecasting is fundamental to Earth system science. Most existing Transformer-based forecasting models adopt vision-centric architectures that neglect th...
On the Hardness of Computing Counterfactual and Semifactual Explanations in XAI : Abstract: Providing clear explanations to the choices of machine learning models is essential for these models to be deployed in crucial applications. Counterfactual and semi-factual explanations have...
Late Breaking Results: Quamba-SE: Soft-edge Quantizer for Activations in State Space Models : Abstract: We propose Quamba-SE, a soft-edge quantizer for State Space Model (SSM) activation quantization. Unlike existing methods, using standard INT8 operation, Quamba-SE employs three adaptive scal...
DeepLight: A Sobolev-trained Image-to-Image Surrogate Model for Light Transport in Tissue : Abstract: In optoacoustic imaging, recovering the absorption coefficients of tissue by inverting the light transport remains a challenging problem. Improvements in solving this problem can greatly ben...
Draw it like Euclid: Teaching transformer models to generate CAD profiles using ruler and compass construction steps : Abstract: We introduce a new method of generating Computer Aided Design (CAD) profiles via a sequence of simple geometric constructions including curve offsetting, rotations and intersections. These s...
Preliminary Tests of the Anticipatory Classifier System with Hindsight Experience Replay : Abstract: This paper introduces ACS2HER, a novel integration of the Anticipatory Classifier System (ACS2) with the Hindsight Experience Replay (HER) mechanism. While ACS2 is highly effective at buildi...
GeoRA: Geometry-Aware Low-Rank Adaptation for RLVR : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is crucial for advancing large-scale reasoning models. However, existing parameter-efficient methods, such as PiSSA and MiLoRA, are desi...
Single-Round Clustered Federated Learning via Data Collaboration Analysis for Non-IID Data : Abstract: Federated Learning (FL) enables distributed learning across multiple clients without sharing raw data. When statistical heterogeneity across clients is severe, Clustered Federated Learning (...
Enhancing Spatial Reasoning in Large Language Models for Metal-Organic Frameworks Structure Prediction : Abstract: Metal-organic frameworks (MOFs) are porous crystalline materials with broad applications such as carbon capture and drug delivery, yet accurately predicting their 3D structures remains a sig...
Learning to Trust Experience: A Monitor-Trust-Regulator Framework for Learning under Unobservable Feedback Reliability : Abstract: Learning under unobservable feedback reliability poses a distinct challenge beyond optimization robustness: a system must decide whether to learn from an experience, not only how to learn st...
RIFT: Repurposing Negative Samples via Reward-Informed Fine-Tuning : Abstract: While Supervised Fine-Tuning (SFT) and Rejection Sampling Fine-Tuning (RFT) are standard for LLM alignment, they either rely on costly expert data or discard valuable negative samples, leadi...
HGATSolver: A Heterogeneous Graph Attention Solver for Fluid-Structure Interaction : Abstract: Fluid-structure interaction (FSI) systems involve distinct physical domains, fluid and solid, governed by different partial differential equations and coupled at a dynamic interface. While l...
XLinear: A Lightweight and Accurate MLP-Based Model for Long-Term Time Series Forecasting with Exogenous Inputs : Abstract: Despite the prevalent assumption of uniform variable importance in long-term time series forecasting models, real world applications often exhibit asymmetric causal relationships and varying...
Reward Learning through Ranking Mean Squared Error : Abstract: Reward design remains a significant bottleneck in applying reinforcement learning (RL) to real-world problems. A popular alternative is reward learning, where reward functions are inferred f...
GIFT: Unlocking Global Optimality in Post-Training via Finite-Temperature Gibbs Initialization : Abstract: The prevailing post-training paradigm for Large Reasoning Models (LRMs)--Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL)--suffers from an intrinsic optimization mismatch...
From Hawkes Processes to Attention: Time-Modulated Mechanisms for Event Sequences : Abstract: Marked Temporal Point Processes (MTPPs) arise naturally in medical, social, commercial, and financial domains. However, existing Transformer-based methods mostly inject temporal information ...
$D^2Prune$: Sparsifying Large Language Models via Dual Taylor Expansion and Attention Distribution Awareness : Abstract: Large language models (LLMs) face significant deployment challenges due to their massive computational demands. While pruning offers a promising compression solution, existing methods suff...
Geometric Stability: The Missing Axis of Representations : Abstract: Analysis of learned representations has a blind spot: it focuses on similarity, measuring how closely embeddings align with external references, but similarity reveals only what is represe...
BalDRO: A Distributionally Robust Optimization based Framework for Large Language Model Unlearning : Abstract: As Large Language Models (LLMs) increasingly shape online content, removing targeted information from well-trained LLMs (also known as LLM unlearning) has become critical for web governance....
DP-FEDSOFIM: Differentially Private Federated Stochastic Optimization using Regularized Fisher Information Matrix : Abstract: Differentially private federated learning (DP-FL) suffers from slow convergence under tight privacy budgets due to the overwhelming noise introduced to preserve privacy. While adaptive optim...
Multi-Teacher Ensemble Distillation: A Mathematical Framework for Probability-Domain Knowledge Aggregation : Abstract: Building on the probability-domain distillation framework of Sparse-KD, we develop an axiomatic, operator-theoretic framework for multi-teacher ensemble knowledge distillation. Rather than p...
Efficient Clustering in Stochastic Bandits : Abstract: We study the Bandit Clustering (BC) problem under the fixed confidence setting, where the objective is to group a collection of data sequences (arms) into clusters through sequential samplin...
KTCF: Actionable Recourse in Knowledge Tracing via Counterfactual Explanations for Education : Abstract: Using Artificial Intelligence to improve teaching and learning benefits greater adaptivity and scalability in education. Knowledge Tracing (KT) is recognized for student modeling task due to...
Interpretable Probability Estimation with LLMs via Shapley Reconstruction : Abstract: Large Language Models (LLMs) demonstrate potential to estimate the probability of uncertain events, by leveraging their extensive knowledge and reasoning capabilities. This ability can be ap...
Discrete Solution Operator Learning for Geometry-Dependent PDEs : Abstract: Neural operator learning accelerates PDE solution by approximating operators as mappings between continuous function spaces. Yet in many engineering settings, varying geometry induces discre...
EvasionBench: Detecting Evasive Answers in Financial Q&A via Multi-Model Consensus and LLM-as-Judge : Abstract: Detecting evasive answers in earnings calls is critical for financial transparency, yet progress is hindered by the lack of large-scale benchmarks. We introduce EvasionBench, comprising 30,0...
Enhancing Imbalanced Electrocardiogram Classification: A Novel Approach Integrating Data Augmentation through Wavelet Transform and Interclass Fusion : Abstract: Imbalanced electrocardiogram (ECG) data hampers the efficacy and resilience of algorithms in the automated processing and interpretation of cardiovascular diagnostic information, which in tu...
Comparative Assessment of Concrete Compressive Strength Prediction at Industry Scale Using Embedding-based Neural Networks, Transformers, and Traditional Machine Learning Approaches : Abstract: Concrete is the most widely used construction material worldwide; however, reliable prediction of compressive strength remains challenging due to material heterogeneity, variable mix proport...
Hidden States as Early Signals: Step-level Trace Evaluation and Pruning for Efficient Test-Time Scaling : Abstract: Large Language Models (LLMs) can enhance reasoning capabilities through test-time scaling by generating multiple traces. However, the combination of lengthy reasoning traces with multiple sa...
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning : Abstract: In this report, we introduce DASD-4B-Thinking, a lightweight yet highly capable, fully open-source reasoning model. It achieves SOTA performance among open-source models of comparable scale ...
MMR-GRPO: Accelerating GRPO-Style Training through Diversity-Aware Reward Reweighting : Abstract: Group Relative Policy Optimization (GRPO) has become a standard approach for training mathematical reasoning models; however, its reliance on multiple completions per prompt makes training c...
SRT: Accelerating Reinforcement Learning via Speculative Rollout with Tree-Structured Cache : Abstract: We present Speculative Rollout with Tree-Structured Cache (SRT), a simple, model-free approach to accelerate on-policy reinforcement learning (RL) for language models without sacrificing dis...
Lean Clients, Full Accuracy: Hybrid Zeroth- and First-Order Split Federated Learning : Abstract: Split Federated Learning (SFL) enables collaborative training between resource-constrained edge devices and a compute-rich server. Communication overhead is a central issue in SFL and can be...
Resolving Predictive Multiplicity for the Rashomon Set : Abstract: The existence of multiple, equally accurate models for a given predictive task leads to predictive multiplicity, where a "Rashomon set" of models achieve similar accuracy but diverge in t...
Deep Incomplete Multi-View Clustering via Hierarchical Imputation and Alignment : Abstract: Incomplete multi-view clustering (IMVC) aims to discover shared cluster structures from multi-view data with partial observations. The core challenges lie in accurately imputing missing view...
SCaLE: Switching Cost aware Learning and Exploration : Abstract: This work addresses the fundamental problem of unbounded metric movement costs in bandit online convex optimization, by considering high-dimensional dynamic quadratic hitting costs and $\ell...
Layer-Parallel Training for Transformers : Abstract: We present a new training methodology for transformers using a multilevel, layer-parallel approach. Through a neural ODE formulation of transformers, our application of a multilevel parallel...
Meta-learning to Address Data Shift in Time Series Classification : Abstract: Across engineering and scientific domains, traditional deep learning (TDL) models perform well when training and test data share the same distribution. However, the dynamic nature of real-wo...
Universal Dynamics of Warmup Stable Decay: understanding WSD beyond Transformers : Abstract: The Warmup Stable Decay (WSD) learning rate scheduler has recently become popular, largely due to its good performance and flexibility when training large language models. It remains an open...
Physics-Guided Counterfactual Explanations for Large-Scale Multivariate Time Series: Application in Scalable and Interpretable SEP Event Prediction : Abstract: Accurate prediction of solar energetic particle events is vital for safeguarding satellites, astronauts, and space-based infrastructure. Modern space weather monitoring generates massive vol...
Optimising for Energy Efficiency and Performance in Machine Learning : Abstract: The ubiquity of machine learning (ML) and the demand for ever-larger models bring an increase in energy consumption and environmental impact. However, little is known about the energy scalin...
Continuous Fairness On Data Streams : Abstract: We study the problem of enforcing continuous group fairness over windows in data streams. We propose a novel fairness model that ensures group fairness at a finer granularity level (referred...
Breaking the Bottlenecks: Scalable Diffusion Models for 3D Molecular Generation : Abstract: Diffusion models have emerged as a powerful class of generative models for molecular design, capable of capturing complex structural distributions and achieving high fidelity in 3D molecule ...
DriftGuard: A Hierarchical Framework for Concept Drift Detection and Remediation in Supply Chain Forecasting : Abstract: Supply chain forecasting models degrade over time as real-world conditions change. Promotions shift, consumer preferences evolve, and supply disruptions alter demand patterns, causing what i...
XGBoost Forecasting of NEPSE Index Log Returns with Walk Forward Validation : Abstract: This study develops a robust machine learning framework for one-step-ahead forecasting of daily log-returns in the Nepal Stock Exchange (NEPSE) Index using the XGBoost regressor. A comprehen...
Spectral Generative Flow Models: A Physics-Inspired Replacement for Vectorized Large Language Models : Abstract: We introduce Spectral Generative Flow Models (SGFMs), a physics-inspired alternative to transformer-based large language models. Instead of representing text or video as sequences of discret...
Attention Consistency Regularization for Interpretable Early-Exit Neural Networks : Abstract: Early-exit neural networks enable adaptive inference by allowing predictions at intermediate layers, reducing computational cost. However, early exits often lack interpretability and may foc...

Research Sources: 270 | Generated: 1/15/2026