AI Research News Feeds for December 12th, 2025

AI RESEARCH PAPERS & ACADEMIC SOURCES

l0-Regularized Sparse Coding-based Interpretable Network for Multi-Modal Image Fusion : Abstract: Multi-modal image fusion (MMIF) enhances the information content of the fused image by combining the unique as well as common features obtained from different modality sensor images, improvi...
Joint2Human: High-quality 3D Human Generation via Compact Spherical Embedding of 3D Joints : Abstract: 3D human generation is increasingly significant in various applications. However, the direct use of 2D generative methods in 3D generation often results in losing local details, while method...
Effective Online Exam Proctoring by Combining Lightweight Face Detection and Deep Recognition : Abstract: Online exams, conducted via video conferencing platforms such as Zoom, have become popular in educational institutions since COVID-19. While convenient, ensuring the integrity and security o...
Dual Cluster Contrastive learning for Object Re-Identification : Abstract: Recently, cluster contrastive learning has been proven effective for object ReID by computing the contrastive loss between the individual features and the cluster memory. However, existing m...
Design of a six wheel suspension and a three-axis linear actuation mechanism for a laser weeding robot : Abstract: Mobile robots are increasingly utilized in agriculture to automate labor-intensive tasks such as weeding, sowing, harvesting and soil analysis. Recently, agricultural robots have been develo...
StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space : Abstract: We introduce StereoSpace, a diffusion-based framework for monocular-to-stereo synthesis that models geometry purely through viewpoint conditioning, without explicit depth or warping. A canon...
WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World : Abstract: Generative world models are reshaping embodied AI, enabling agents to synthesize realistic 4D driving environments that look convincing but often fail physically or behaviorally. Despite rap...
Empowering Dynamic Urban Navigation with Stereo and Mid-Level Vision : Abstract: The success of foundation models in language and vision motivated research in fully end-to-end robot navigation foundation models (NFMs). NFMs directly map monocular visual input to control ...
Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization : Abstract: Visual concept personalization aims to transfer only specific image attributes, such as identity, expression, lighting, and style, into unseen contexts. However, existing methods rely on hol...
Group Diffusion: Enhancing Image Generation by Unlocking Cross-Sample Collaboration : Abstract: In this work, we explore an untapped signal in diffusion model inference. While all previous methods generate images independently at inference, we instead ask if samples can be generated co...
E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training : Abstract: Self-supervised pre-training has revolutionized foundation models for languages, individual 2D images and videos, but remains largely unexplored for learning 3D-aware representations from mu...
ClusIR: Towards Cluster-Guided All-in-One Image Restoration : Abstract: All-in-One Image Restoration (AiOIR) aims to recover high-quality images from diverse degradations within a unified framework. However, existing methods often fail to explicitly model degrad...
Towards Efficient and Effective Multi-Camera Encoding for End-to-End Driving : Abstract: We present Flex, an efficient and effective scene encoder that addresses the computational bottleneck of processing high-volume multi-camera data in end-to-end autonomous driving. Flex emplo...
MeViS: A Multi-Modal Dataset for Referring Motion Expression Video Segmentation : Abstract: This paper proposes a large-scale multi-modal dataset for referring motion expression video segmentation, focusing on segmenting and tracking target objects in videos based on language descr...
VL-JEPA: Joint Embedding Predictive Architecture for Vision-language : Abstract: We introduce VL-JEPA, a vision-language model built on a Joint Embedding Predictive Architecture (JEPA). Instead of autoregressively generating tokens as in classical VLMs, VL-JEPA predicts ...
GaussianHeadTalk: Wobble-Free 3D Talking Heads with Audio Driven Gaussian Splatting : Abstract: Speech-driven talking heads have recently emerged and enable interactive avatars. However, real-world applications are limited, as current methods achieve high visual fidelity but slow or fa...
FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos : Abstract: Motion understanding is fundamental to physical reasoning, enabling models to infer dynamics and predict future states. However, state-of-the-art models still struggle on recent motion bench...
DuetSVG: Unified Multimodal SVG Generation with Internal Visual Guidance : Abstract: Recent vision-language model (VLM)-based approaches have achieved impressive results on SVG generation. However, because they generate only text and lack visual signals during decoding, they...
PubTables-v2: A new large-scale dataset for full-page and multi-page table extraction : Abstract: Table extraction (TE) is a key challenge in visual document understanding. Traditional approaches detect tables first, then recognize their structure. Recently, interest has surged in develo...
MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos : Abstract: Motion capture now underpins content creation far beyond digital humans, yet most existing pipelines remain species- or template-specific. We formalize this gap as Category-Agnostic Motion C...
From Macro to Micro: Benchmarking Microscopic Spatial Intelligence on Molecules via Vision-Language Models : Abstract: This paper introduces the concept of Microscopic Spatial Intelligence (MiSI), the capability to perceive and reason about the spatial relationships of invisible microscopic entities, which i...
SWiT-4D: Sliding-Window Transformer for Lossless and Parameter-Free Temporal 4D Generation : Abstract: Despite significant progress in 4D content generation, the conversion of monocular videos into high-quality animated 3D assets with explicit 4D meshes remains considerably challenging. The s...
PoseGAM: Robust Unseen Object Pose Estimation via Geometry-Aware Multi-View Reasoning : Abstract: 6D object pose estimation, which predicts the transformation of an object relative to the camera, remains challenging for unseen objects. Existing approaches typically rely on explicitly con...
Self-Ensemble Post Learning for Noisy Domain Generalization : Abstract: While computer vision and machine learning have made great progress, their robustness is still challenged by two key issues: data distribution shift and label noise. When domain generalizati...
Graph Laplacian Transformer with Progressive Sampling for Prostate Cancer Grading : Abstract: Prostate cancer grading from whole-slide images (WSIs) remains a challenging task due to the large-scale nature of WSIs, the presence of heterogeneous tissue structures, and difficulty of se...
Blood Pressure Prediction for Coronary Artery Disease Diagnosis using Coronary Computed Tomography Angiography : Abstract: Computational fluid dynamics (CFD) based simulation of coronary blood flow provides valuable hemodynamic markers, such as pressure gradients, for diagnosing coronary artery disease (CAD). Ho...
LDP: Parameter-Efficient Fine-Tuning of Multimodal LLM for Medical Report Generation : Abstract: Colonoscopic polyp diagnosis is pivotal for early colorectal cancer detection, yet traditional automated reporting suffers from inconsistencies and hallucinations due to the scarcity of high...
IRG-MotionLLM: Interleaving Motion Generation, Assessment and Refinement for Text-to-Motion Generation : Abstract: Recent advances in motion-aware large language models have shown remarkable promise for unifying motion understanding and generation tasks. However, these models typically treat understandin...
Video Depth Propagation : Abstract: Depth estimation in videos is essential for visual perception in real-world applications. However, existing methods either rely on simple frame-by-frame monocular models, leading to temporal...
SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving : Abstract: End-to-end autonomous driving methods built on vision language models (VLMs) have undergone rapid development driven by their universal visual understanding and strong reasoning capabilities...
CheXmask-U: Quantifying uncertainty in landmark-based anatomical segmentation for X-ray images : Abstract: Uncertainty estimation is essential for the safe clinical deployment of medical image segmentation systems, enabling the identification of unreliable predictions and supporting human oversig...
Geo6DPose: Fast Zero-Shot 6D Object Pose Estimation via Geometry-Filtered Feature Matching : Abstract: Recent progress in zero-shot 6D object pose estimation has been driven largely by large-scale models and cloud-based inference. However, these approaches often introduce high latency, elevat...
XDen-1K: A Density Field Dataset of Real-World Objects : Abstract: A deep understanding of the physical world is a central goal for embodied AI and realistic simulation. While current models excel at capturing an object's surface geometry and appearance, th...
NaviHydra: Controllable Navigation-guided End-to-end Autonomous Driving with Hydra-distillation : Abstract: The complexity of autonomous driving scenarios requires robust models that can interpret high-level navigation commands and generate safe trajectories. While traditional rule-based systems c...
TriDF: Evaluating Perception, Detection, and Hallucination for Interpretable DeepFake Detection : Abstract: Advances in generative modeling have made it increasingly easy to fabricate realistic portrayals of individuals, creating serious risks for security, communication, and public trust. Detecti...
K-Track: Kalman-Enhanced Tracking for Accelerating Deep Point Trackers on Edge Devices : Abstract: Point tracking in video sequences is a foundational capability for real-world computer vision applications, including robotics, autonomous systems, augmented reality, and video analysis. Whi...
DOCR-Inspector: Fine-Grained and Automated Evaluation of Document Parsing with VLM : Abstract: Document parsing aims to transform unstructured PDF images into semi-structured data, facilitating the digitization and utilization of information in diverse domains. While vision language m...
Lang2Motion: Bridging Language and Motion through Joint Embedding Spaces : Abstract: We present Lang2Motion, a framework for language-guided point trajectory generation by aligning motion manifolds with joint embedding spaces. Unlike prior work focusing on human motion or vi...
Robust Multi-Disease Retinal Classification via Xception-Based Transfer Learning and W-Net Vessel Segmentation : Abstract: In recent years, the incidence of vision-threatening eye diseases has risen dramatically, necessitating scalable and accurate screening solutions. This paper presents a comprehensive study o...
Track and Caption Any Motion: Query-Free Motion Discovery and Description in Videos : Abstract: We propose Track and Caption Any Motion (TCAM), a motion-centric framework for automatic video understanding that discovers and describes motion patterns without user queries. Understanding ...
Salient Object Detection in Complex Weather Conditions via Noise Indicators : Abstract: Salient object detection (SOD), a foundational task in computer vision, has advanced from single-modal to multi-modal paradigms to enhance generalization. However, most existing SOD methods ...
Unleashing Degradation-Carrying Features in Symmetric U-Net: Simpler and Stronger Baselines for All-in-One Image Restoration : Abstract: All-in-one image restoration aims to handle diverse degradations (e.g., noise, blur, adverse weather) within a unified framework, yet existing methods increasingly rely on complex architectu...
Audio-sync Video Instance Editing with Granularity-Aware Mask Refiner : Abstract: Recent advancements in video generation highlight that realistic audio-visual synchronization is crucial for engaging content creation. However, existing video editing methods largely overlo...
Data-Efficient American Sign Language Recognition via Few-Shot Prototypical Networks : Abstract: Isolated Sign Language Recognition (ISLR) is critical for bridging the communication gap between the Deaf and Hard-of-Hearing (DHH) community and the hearing world. However, robust ISLR is f...
Grounding Everything in Tokens for Multimodal Large Language Models : Abstract: Multimodal large language models (MLLMs) have made significant advancements in vision understanding and reasoning. However, the autoregressive Transformer architecture used by MLLMs requires...
Blink: Dynamic Visual Token Resolution for Enhanced Multimodal Understanding : Abstract: Multimodal large language models (MLLMs) have achieved remarkable progress on various vision-language tasks, yet their visual perception remains limited. Humans, in comparison, perceive comp...
Take a Peek: Efficient Encoder Adaptation for Few-Shot Semantic Segmentation via LoRA : Abstract: Few-shot semantic segmentation (FSS) aims to segment novel classes in query images using only a small annotated support set. While prior research has mainly focused on improving decoders, th...
3D Blood Pulsation Maps : Abstract: We present Pulse3DFace, the first dataset of its kind for estimating 3D blood pulsation maps. These maps can be used to develop models of dynamic facial blood pulsation, enabling the creatio...
Robust Shape from Focus via Multiscale Directional Dilated Laplacian and Recurrent Network : Abstract: Shape-from-Focus (SFF) is a passive depth estimation technique that infers scene depth by analyzing focus variations in a focal stack. Most recent deep learning-based SFF methods typically o...
Error-Propagation-Free Learned Video Compression With Dual-Domain Progressive Temporal Alignment : Abstract: Existing frameworks for learned video compression suffer from a dilemma between inaccurate temporal alignment and error propagation for motion estimation and compensation (ME/MC). The separa...
Neural Collapse in Test-Time Adaptation : Abstract: Test-Time Adaptation (TTA) enhances model robustness to out-of-distribution (OOD) data by updating the model online during inference, yet existing methods lack theoretical insights into the ...
TransLocNet: Cross-Modal Attention for Aerial-Ground Vehicle Localization with Contrastive Learning : Abstract: Aerial-ground localization is difficult due to large viewpoint and modality gaps between ground-level LiDAR and overhead imagery. We propose TransLocNet, a cross-modal attention framework th...
MultiHateLoc: Towards Temporal Localisation of Multimodal Hate Content in Online Videos : Abstract: The rapid growth of video content on platforms such as TikTok and YouTube has intensified the spread of multimodal hate speech, where harmful cues emerge subtly and asynchronously across vis...
Adaptive Dual-Weighted Gravitational Point Cloud Denoising Method : Abstract: High-quality point cloud data is a critical foundation for tasks such as autonomous driving and 3D reconstruction. However, LiDAR-based point cloud acquisition is often affected by various d...
Self-Supervised Contrastive Embedding Adaptation for Endoscopic Image Matching : Abstract: Accurate spatial understanding is essential for image-guided surgery, augmented reality integration and context awareness. In minimally invasive procedures, where visual input is the sole in...
RaLiFlow: Scene Flow Estimation with 4D Radar and LiDAR Point Clouds : Abstract: Recent multimodal fusion methods, integrating images with LiDAR point clouds, have shown promise in scene flow estimation. However, the fusion of 4D millimeter wave radar and LiDAR remains u...
Breaking the Vicious Cycle: Coherent 3D Gaussian Splatting from Sparse and Motion-Blurred Views : Abstract: 3D Gaussian Splatting (3DGS) has emerged as a state-of-the-art method for novel view synthesis. However, its performance heavily relies on dense, high-quality input imagery, an assumption th...
Point to Span: Zero-Shot Moment Retrieval for Navigating Unseen Hour-Long Videos : Abstract: Zero-shot Long Video Moment Retrieval (ZLVMR) is the task of identifying temporal segments in hour-long videos using a natural language query without task-specific training. The core technic...
Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task : Abstract: Video Question Answering (VideoQA) task serves as a critical playground for evaluating whether foundation models can effectively perceive, understand, and reason about dynamic real-world sce...
mmCounter: Static People Counting in Dense Indoor Scenarios Using mmWave Radar : Abstract: mmWave radars struggle to detect or count individuals in dense, static (non-moving) groups due to limitations in spatial resolution and reliance on movement for detection. We present mmCount...
Hybrid Transformer-Mamba Architecture for Weakly Supervised Volumetric Medical Segmentation : Abstract: Weakly supervised semantic segmentation offers a label-efficient solution to train segmentation models for volumetric medical imaging. However, existing approaches often rely on 2D encoders ...
Topology-Agnostic Animal Motion Generation from Text Prompt : Abstract: Motion generation is fundamental to computer animation and widely used across entertainment, robotics, and virtual environments. While recent methods achieve impressive results, most rely on...
CoSPlan: Corrective Sequential Planning via Scene Graph Incremental Updates : Abstract: Large-scale Vision-Language Models (VLMs) exhibit impressive complex reasoning capabilities but remain largely unexplored in visual sequential planning, i.e., executing multi-step actions to...
Zero-shot Adaptation of Stable Diffusion via Plug-in Hierarchical Degradation Representation for Real-World Super-Resolution : Abstract: Real-World Image Super-Resolution (Real-ISR) aims to recover high-quality images from low-quality inputs degraded by unknown and complex real-world factors. Real-world scenarios involve dive...
A Conditional Generative Framework for Synthetic Data Augmentation in Segmenting Thin and Elongated Structures in Biological Images : Abstract: Thin and elongated filamentous structures, such as microtubules and actin filaments, often play important roles in biological systems. Segmenting these filaments in biological images is a fu...
Simple Yet Effective Selective Imputation for Incomplete Multi-view Clustering : Abstract: Incomplete multi-view data, where different views suffer from missing and unbalanced observations, pose significant challenges for clustering. Existing imputation-based methods attempt to es...
StainNet: A Special Staining Self-Supervised Vision Transformer for Computational Pathology : Abstract: Foundation models trained with self-supervised learning (SSL) on large-scale histological images have significantly accelerated the development of computational pathology. These models can s...
EchoingPixels: Cross-Modal Adaptive Token Reduction for Efficient Audio-Visual LLMs : Abstract: Audio-Visual Large Language Models (AV-LLMs) face prohibitive computational overhead from massive audio and video tokens. Token reduction, while extensively explored for video-only LLMs, is ...
Point2Pose: A Generative Framework for 3D Human Pose Estimation with Multi-View Point Cloud Dataset : Abstract: We propose a novel generative approach for 3D human pose estimation. 3D human pose estimation poses several key challenges due to the complex geometry of the human body, self-occluding joint...
ConStruct: Structural Distillation of Foundation Models for Prototype-Based Weakly Supervised Histopathology Segmentation : Abstract: Weakly supervised semantic segmentation (WSSS) in histopathology relies heavily on classification backbones, yet these models often localize only the most discriminative regions and struggle...
DualProtoSeg: Simple and Efficient Design with Text- and Image-Guided Prototype Learning for Weakly Supervised Histopathology Image Segmentation : Abstract: Weakly supervised semantic segmentation (WSSS) in histopathology seeks to reduce annotation cost by learning from image-level labels, yet it remains limited by inter-class homogeneity, intra...
Efficient-VLN: A Training-Efficient Vision-Language Navigation Model : Abstract: Multimodal large language models (MLLMs) have shown promising potential in Vision-Language Navigation (VLN). However, their practical development is severely hindered by the substantial trai...
Physically Aware 360° View Generation from a Single Image using Disentangled Scene Embeddings : Abstract: We introduce Disentangled360, an innovative 3D-aware technology that integrates the advantages of direction disentangled volume rendering with single-image 360° unique view synthesis for app...
ShotDirector: Directorially Controllable Multi-Shot Video Generation with Cinematographic Transitions : Abstract: Shot transitions play a pivotal role in multi-shot video generation, as they determine the overall narrative expression and the directorial design of visual storytelling. However, recent pro...
Sample-wise Adaptive Weighting for Transfer Consistency in Adversarial Distillation : Abstract: Adversarial distillation in the standard min-max adversarial training framework aims to transfer adversarial robustness from a large, robust teacher network to a compact student. However, ex...
Long-LRM++: Preserving Fine Details in Feed-Forward Wide-Coverage Reconstruction : Abstract: Recent advances in generalizable Gaussian splatting (GS) have enabled feed-forward reconstruction of scenes from tens of input views. Long-LRM notably scales this paradigm to 32 input images...
VLM-NCD: Novel Class Discovery with Vision-Based Large Language Models : Abstract: Novel Class Discovery aims to utilise prior knowledge of known classes to classify and discover unknown classes from unlabelled data. Existing NCD methods for images primarily rely on visual...
GDKVM: Echocardiography Video Segmentation via Spatiotemporal Key-Value Memory with Gated Delta Rule : Abstract: Accurate segmentation of cardiac chambers in echocardiography sequences is crucial for the quantitative analysis of cardiac function, aiding in clinical diagnosis and treatment. The imaging ...
THE-Pose: Topological Prior with Hybrid Graph Fusion for Estimating Category-Level 6D Object Pose : Abstract: Category-level object pose estimation requires both global context and local structure to ensure robustness against intra-class variations. However, 3D graph convolution (3D-GC) methods only...
Multi-dimensional Preference Alignment by Conditioning Reward Itself : Abstract: Reinforcement Learning from Human Feedback has emerged as a standard for aligning diffusion models. However, we identify a fundamental limitation in the standard DPO formulation because it r...
Emerging Standards for Machine-to-Machine Video Coding : Abstract: Machines are increasingly becoming the primary consumers of visual data, yet most deployments of machine-to-machine systems still rely on remote inference where pixel-based video is streamed...
Latent Chain-of-Thought World Modeling for End-to-End Driving : Abstract: Recent Vision-Language-Action (VLA) models for autonomous driving explore inference-time reasoning as a way to improve driving performance and safety in challenging scenarios. Most prior wor...
Feature Coding for Scalable Machine Vision : Abstract: Deep neural networks (DNNs) drive modern machine vision but are challenging to deploy on edge devices due to high compute demands. Traditional approaches-running the full model on-device or ...
Topological Conditioning for Mammography Models via a Stable Wavelet-Persistence Vectorization : Abstract: Breast cancer is the most commonly diagnosed cancer in women and a leading cause of cancer death worldwide. Screening mammography reduces mortality, yet interpretation still suffers from sub...
Hierarchical Instance Tracking to Balance Privacy Preservation with Accessible Information : Abstract: We propose a novel task, hierarchical instance tracking, which entails tracking all instances of predefined categories of objects and parts, while maintaining their hierarchical relationship...
TraceFlow: Dynamic 3D Reconstruction of Specular Scenes Driven by Ray Tracing : Abstract: We present TraceFlow, a novel framework for high-fidelity rendering of dynamic specular scenes by addressing two key challenges: precise reflection direction estimation and physically accura...
Neuromorphic Eye Tracking for Low-Latency Pupil Detection : Abstract: Eye tracking for wearable systems demands low latency and milliwatt-level power, but conventional frame-based pipelines struggle with motion blur, high compute cost, and limited temporal res...
The Spatial Semantics of Iconic Gesture : Abstract: The current multimodal turn in linguistic theory leaves a crucial question unanswered: what is the meaning of iconic gestures, and how does it compose with speech meaning? We argue for a sep...
CompanionCast: A Multi-Agent Conversational AI Framework with Spatial Audio for Social Co-Viewing Experiences : Abstract: Social presence is central to the enjoyment of watching content together, yet modern media consumption is increasingly solitary. We investigate whether multi-agent conversational AI systems ...
BRACE: A Benchmark for Robust Audio Caption Quality Evaluation : Abstract: Automatic audio captioning is essential for audio understanding, enabling applications such as accessibility and content indexing. However, evaluating the quality of audio captions remains a...
Watermarks for Language Models via Probabilistic Automata : Abstract: A recent watermarking scheme for language models achieves distortion-free embedding and robustness to edit-distance attacks. However, it suffers from limited generation diversity and high de...
Diffusion Is Your Friend in Show, Suggest and Tell : Abstract: Diffusion Denoising models demonstrated impressive results across generative Computer Vision tasks, but they still fail to outperform standard autoregressive solutions in the discrete domain...
Planning, Living and Judging: A Multi-agent LLM-based Framework for Cyclical Urban Planning : Abstract: Urban regeneration presents significant challenges within the context of urbanization, requiring adaptive approaches to tackle evolving needs. Leveraging advancements in large language model...
Computational emotion analysis with multimodal LLMs: Current evidence on an emerging methodological opportunity : Abstract: Emotions are central to politics and analyzing their role in political communication has a long tradition. As research increasingly leverages audio-visual materials to analyze the display of...
Quantifying Emotional Tone in Tolkien's The Hobbit: Dialogue Sentiment Analysis with RegEx, NRC-VAD, and Python : Abstract: This study analyzes the emotional tone of dialogue in J. R. R. Tolkien's The Hobbit (1937) using computational text analysis. Dialogue was extracted with regular expressions, then preprocess...
TRIDENT: A Redundant Architecture for Caribbean-Accented Emergency Speech Triage : Abstract: Emergency speech recognition systems exhibit systematic performance degradation on non-standard English varieties, creating a critical gap in services for Caribbean populations. We present T...
From Data Scarcity to Data Care: Reimagining Language Technologies for Serbian and other Low-Resource Languages : Abstract: Large language models are commonly trained on dominant languages like English, and their representation of low resource languages typically reflects cultural and linguistic biases present in...
AgriGPT-Omni: A Unified Speech-Vision-Text Framework for Multilingual Agricultural Intelligence : Abstract: Despite rapid advances in multimodal large language models, agricultural applications remain constrained by the lack of multilingual speech data, unified multimodal architectures, and compre...
RoleRMBench & RoleRM: Towards Reward Modeling for Profile-Based Role Play in Dialogue Systems : Abstract: Reward modeling has become a cornerstone of aligning large language models (LLMs) with human preferences. Yet, when extended to subjective and open-ended domains such as role play, existing ...
XDoGE: Multilingual Data Reweighting to Enhance Language Inclusivity in LLMs : Abstract: Current large language models (LLMs) are trained on massive amounts of text data, primarily from a few dominant languages. Studies suggest that this over-reliance on high-resource languages,...
Grammaticality Judgments in Humans and Language Models: Revisiting Generative Grammar with LLMs : Abstract: What counts as evidence for syntactic structure? In traditional generative grammar, systematic contrasts in grammaticality such as subject-auxiliary inversion and the licensing of parasitic ...
Decoding Student Minds: Leveraging Conversational Agents for Psychological and Learning Analysis : Abstract: This paper presents a psychologically-aware conversational agent designed to enhance both learning performance and emotional well-being in educational settings. The system combines Large Lan...
Enhancing Next-Generation Language Models with Knowledge Graphs: Extending Claude, Mistral IA, and GPT-4 via KG-BERT : Abstract: Large language models (LLMs) like Claude, Mistral IA, and GPT-4 excel in NLP but lack structured knowledge, leading to factual inconsistencies. We address this by integrating Knowledge Graph...
Semantic Reconstruction of Adversarial Plagiarism: A Context-Aware Framework for Detecting and Restoring "Tortured Phrases" in Scientific Literature : Abstract: The integrity and reliability of scientific literature is facing a serious threat by adversarial text generation techniques, specifically from the use of automated paraphrasing tools to mask...
T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground : Abstract: We introduce T-pro 2.0, an open-weight Russian LLM for hybrid reasoning and efficient inference. The model supports direct answering and reasoning-trace generation, using a Cyrillic-dense to...
Generate-Then-Validate: A Novel Question Generation Approach Using Small Language Models : Abstract: We explore the use of small language models (SLMs) for automatic question generation as a complement to the prevalent use of their large counterparts in learning analytics research. We prese...
Adapting to Change: A Comparison of Continual and Transfer Learning for Modeling Building Thermal Dynamics under Concept Drifts : Abstract: Transfer Learning (TL) is currently the most effective approach for modeling building thermal dynamics when only limited data are available. TL uses a pretrained model that is fine-tuned to ...
When Worse is Better: Navigating the compression-generation tradeoff in visual tokenization : Abstract: Current image generation methods are based on a two-stage training approach. In stage 1, an auto-encoder is trained to compress an image into a latent space; in stage 2, a generative model i...
Extrapolating Jet Radiation with Autoregressive Transformers : Abstract: Generative networks are an exciting tool for fast LHC event generation. Usually, they are used to generate configurations with a fixed number of particles. Autoregressive transformers allow us to generate events containing variable numbers of particles, very ...
Deep Operator BSDE: a Numerical Scheme to Approximate Solution Operators : Abstract: Motivated by dynamic risk measures and conditional $g$-expectations, in this work we propose a numerical method to approximate the solution operator given by a Backward Stochastic Differenti...
IRG: Modular Synthetic Relational Database Generation with Complex Relational Schemas : Abstract: Relational databases (RDBs) are widely used by corporations and governments to store multiple related tables. Their relational schemas pose unique challenges to synthetic data generation for...
Understanding Outer Optimizers in Local SGD: Learning Rates, Momentum, and Acceleration : Abstract: Modern machine learning often requires training with large batch size, distributed data, and massively parallel compute hardware (like mobile and other edge devices or distributed data cente...
Enhanced Spatial Clustering of Single-Molecule Localizations with Graph Neural Networks : Abstract: Single-molecule localization microscopy generates point clouds corresponding to fluorophore localizations. Spatial cluster identification and analysis of these point clouds are crucial for e...
Deferred Poisoning: Making the Model More Vulnerable via Hessian Singularization : Abstract: Recent studies have shown that deep learning models are very vulnerable to poisoning attacks. Many defense methods have been proposed to address this issue. However, traditional poisoning at...
Noisy Spiking Actor Network for Exploration : Abstract: As a general method for exploration in deep reinforcement learning (RL), NoisyNet can produce problem-specific exploration strategies. Spiking neural networks (SNNs), due to their binary fir...
Curriculum-Based Reinforcement Learning for Autonomous UAV Navigation in Unknown Curved Tubular Conduit : Abstract: Autonomous drone navigation in confined tubular environments remains a major challenge due to the constraining geometry of the conduits, the proximity of the walls, and the perceptual limita...
Noisy Quantum Learning Theory : Abstract: We develop a framework for learning from noisy quantum experiments, focusing on fault-tolerant devices accessing uncharacterized systems through noisy couplings. Our starting point is the co...
Hermitian Yang-Mills connections on general vector bundles: geometry and physical Yukawa couplings : Abstract: We compute solutions to the Hermitian Yang-Mills equations on holomorphic vector bundles $V$ via an alternating optimisation procedure founded on geometric machine learning. The proposed met...
Distributionally Robust Regret Optimal Control Under Moment-Based Ambiguity Sets : Abstract: In this paper, we consider a class of finite-horizon, linear-quadratic stochastic control problems, where the probability distribution governing the noise process is unknown but assumed to b...
Iterative Compositional Data Generation for Robot Control : Abstract: Collecting robotic manipulation data is expensive, making it impractical to acquire demonstrations for the combinatorially large space of tasks that arise in multi-object, multi-robot, and m...
A Differentiable Digital Twin of Distributed Link Scheduling for Contention-Aware Networking : Abstract: Many routing and flow optimization problems in wired networks can be solved efficiently using minimum cost flow formulations. However, this approach does not extend to wireless multi-hop net...
Physics-informed Polynomial Chaos Expansion with Enhanced Constrained Optimization Solver and D-optimal Sampling : Abstract: Physics-informed polynomial chaos expansions (PC$^2$) provide an efficient physically constrained surrogate modeling framework by embedding governing equations and other physical constraints...
An Elementary Proof of the Near Optimality of LogSumExp Smoothing : Abstract: We consider the design of smoothings of the (coordinate-wise) max function in $\mathbb{R}^d$ in the infinity norm. The LogSumExp function $f(x)=\ln(\sum^d_i\exp(x_i))$ provides a classical s...
Deep sets and event-level maximum-likelihood estimation for fast pile-up jet rejection in ATLAS : Abstract: Multiple proton-proton collisions (pile-up) occur at every bunch crossing at the LHC, with the mean number of interactions expected to reach 80 during Run 3 and up to 200 at the High-Luminos...
Quantum Approaches to Urban Logistics: From Core QAOA to Clustered Scalability : Abstract: The Traveling Salesman Problem (TSP) is a fundamental challenge in combinatorial optimization, widely applied in logistics and transportation. As the size of TSP instances grows, traditional...
Script Gap: Evaluating LLM Triage on Indian Languages in Native vs Roman Scripts in a Real World Setting : Abstract: Large Language Models (LLMs) are increasingly deployed in high-stakes clinical applications in India. In many such settings, speakers of Indian languages frequently communicate using romaniz...
OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification : Abstract: Large language models (LLMs) have achieved significant progress in solving complex reasoning tasks by Reinforcement Learning with Verifiable Rewards (RLVR). This advancement is also insepara...
PMB-NN: Physiology-Centred Hybrid AI for Personalized Hemodynamic Monitoring from Photoplethysmography : Abstract: Continuous monitoring of blood pressure (BP) and hemodynamic parameters such as peripheral resistance (R) and arterial compliance (C) is critical for early vascular dysfunction detection. W...
Sharp Monocular View Synthesis in Less Than a Second : Abstract: We present SHARP, an approach to photorealistic view synthesis from a single image. Given a single photograph, SHARP regresses the parameters of a 3D Gaussian representation of the depicted ...
Optimal transport unlocks end-to-end learning for single-molecule localization : Abstract: Single-molecule localization microscopy (SMLM) allows reconstructing biology-relevant structures beyond the diffraction limit by detecting and localizing individual fluorophores -- fluoresce...
Virtual camera detection: Catching video injection attacks in remote biometric systems : Abstract: Face anti-spoofing (FAS) is a vital component of remote biometric authentication systems based on facial recognition, increasingly used across web-based applications. Among emerging threats,...
Adaptive Intrusion Detection System Leveraging Dynamic Neural Models with Adversarial Learning for 5G/6G Networks : Abstract: Intrusion Detection Systems (IDS) are critical components in safeguarding 5G/6G networks from both internal and external cyber threats. While traditional IDS approaches rely heavily on signa...
Authority Backdoor: A Certifiable Backdoor Mechanism for Authoring DNNs : Abstract: Deep Neural Networks (DNNs), as valuable intellectual property, face unauthorized use. Existing protections, such as digital watermarking, are largely passive; they provide only post-hoc own...
Topology-Guided Quantum GANs for Constrained Graph Generation : Abstract: Quantum computing (QC) promises theoretical advantages, benefiting computational problems that would not be efficiently classically simulatable. However, much of this theoretical speedup dep...
Flexible Deep Neural Networks for Partially Linear Survival Data : Abstract: We propose a flexible deep neural network (DNN) framework for modeling survival data within a partially linear regression structure. The approach preserves interpretability through a paramet...
Causal Reasoning Favors Encoders: On The Limits of Decoder-Only Models : Abstract: In-context learning (ICL) underpins recent advances in large language models (LLMs), although its role and performance in causal reasoning remain unclear. Causal reasoning demands multihop ...
Hyperspectral Image Data Reduction for Endmember Extraction : Abstract: Endmember extraction from hyperspectral images aims to identify the spectral signatures of materials present in a scene. Recent studies have shown that self-dictionary methods can achieve hi...
From Lab to Reality: A Practical Evaluation of Deep Learning Models and LLMs for Vulnerability Detection : Abstract: Vulnerability detection methods based on deep learning (DL) have shown strong performance on benchmark datasets, yet their real-world effectiveness remains underexplored. Recent work suggest...
Supervised Learning of Random Neural Architectures Structured by Latent Random Fields on Compact Boundaryless Multiply-Connected Manifolds : Abstract: This paper introduces a new probabilistic framework for supervised learning in neural systems. It is designed to model complex, uncertain systems whose random outputs are strongly non-Gaussi...
Diffusion differentiable resampling : Abstract: This paper is concerned with differentiable resampling in the context of sequential Monte Carlo (e.g., particle filtering). We propose a new informative resampling method that is instantly p...
RoboNeuron: A Modular Framework Linking Foundation Models and ROS for Embodied AI : Abstract: Current embodied AI systems face severe engineering impediments, primarily characterized by poor cross-scenario adaptability, rigid inter-module coupling, and fragmented inference accelerati...
Residual subspace evolution strategies for nonlinear inverse problems : Abstract: Nonlinear inverse problems often feature noisy, non-differentiable, or expensive residual evaluations that make Jacobian-based solvers unreliable. Popular derivative-free optimizers such as ...
Tracking large chemical reaction networks and rare events by neural networks : Abstract: Chemical reaction networks are widely used to model stochastic dynamics in chemical kinetics, systems biology and epidemiology. Solving the chemical master equation that governs these system...
Error Analysis of Generalized Langevin Equations with Approximated Memory Kernels : Abstract: We analyze prediction error in stochastic dynamical systems with memory, focusing on generalized Langevin equations (GLEs) formulated as stochastic Volterra equations. We establish that, und...
Solving Semi-Supervised Few-Shot Learning from an Auto-Annotation Perspective : Abstract: Semi-supervised few-shot learning (SSFSL) formulates real-world applications like "auto-annotation", as it aims to learn a model over a few labeled and abundant unlabeled examples to annot...
Design Space Exploration of DMA based Finer-Grain Compute Communication Overlap : Abstract: As both ML training and inference are increasingly distributed, parallelization techniques that shard (divide) an ML model across the GPUs of a distributed system are often deployed. With such tec...
Galaxy Phase-Space and Field-Level Cosmology: The Strength of Semi-Analytic Models : Abstract: Semi-analytic models are a widely used approach to simulate galaxy properties within a cosmological framework, relying on simplified yet physically motivated prescriptions. They have also pr...
On Learning-Curve Monotonicity for Maximum Likelihood Estimators : Abstract: The property of learning-curve monotonicity, highlighted in a recent series of work by Loog, Mey and Viering, describes algorithms which only improve in average performance given more data, ...
AutoMedic: An Automated Evaluation Framework for Clinical Conversational Agents with Medical Dataset Grounding : Abstract: Evaluating large language models (LLMs) has recently emerged as a critical issue for safe and trustworthy application of LLMs in the medical domain. Although a variety of static medical ques...
The Interplay of Statistics and Noisy Optimization: Learning Linear Predictors with Random Data Weights : Abstract: We analyze gradient descent with randomly weighted data points in a linear regression model, under a generic weighting distribution. This includes various forms of stochastic gradient descen...
Semantic-Aware Confidence Calibration for Automated Audio Captioning : Abstract: Automated audio captioning models frequently produce overconfident predictions regardless of semantic accuracy, limiting their reliability in deployment. This deficiency stems from two facto...
Inference for Batched Adaptive Experiments : Abstract: The advantages of adaptive experiments have led to their rapid adoption in economics, other fields, as well as among practitioners. However, adaptive experiments pose challenges for causal i...
STARS: Semantic Tokens with Augmented Representations for Recommendation at Scale : Abstract: Real-world ecommerce recommender systems must deliver relevant items under strict tens-of-milliseconds latency constraints despite challenges such as cold-start products, rapidly shifting us...
A Model-Guided Neural Network Method for the Inverse Scattering Problem : Abstract: Inverse medium scattering is an ill-posed, nonlinear wave-based imaging problem arising in medical imaging, remote sensing, and non-destructive testing. Machine learning (ML) methods offer i...
Push Smarter, Not Harder: Hierarchical RL-Diffusion Policy for Efficient Nonprehensile Manipulation : Abstract: Nonprehensile manipulation, such as pushing objects across cluttered environments, presents a challenging control problem due to complex contact dynamics and long-horizon planning requiremen...
Independent Density Estimation : Abstract: Large-scale Vision-Language models have achieved remarkable results in various domains, such as image captioning and conditioned image generation. Nevertheless, these models still encounte...
LxCIM: a new rank-based binary classifier performance metric invariant to local exchange of classes : Abstract: Binary classification is one of the oldest, most prevalent, and studied problems in machine learning. However, the metrics used to evaluate model performance have received comparatively litt...
Enhancing Fake-News Detection with Node-Level Topological Features : Abstract: In recent years, the proliferation of misinformation and fake news has posed serious threats to individuals and society, spurring intense research into automated detection methods. Previous ...
TDC-Cache: A Trustworthy Decentralized Cooperative Caching Framework for Web3.0 : Abstract: The rapid growth of Web3.0 is transforming the Internet from a centralized structure to decentralized, which empowers users with unprecedented self-sovereignty over their own data. However, ...
QSTAformer: A Quantum-Enhanced Transformer for Robust Short-Term Voltage Stability Assessment against Adversarial Attacks : Abstract: Short-term voltage stability assessment (STVSA) is critical for secure power system operation. While classical machine learning-based methods have demonstrated strong performance, they still...
Bidirectional Normalizing Flow: From Data to Noise and Back : Abstract: Normalizing Flows (NFs) have been established as a principled framework for generative modeling. Standard NFs consist of a forward process and a reverse process: the forward process maps dat...
Asynchronous Reasoning: Training-Free Interactive Thinking LLMs : Abstract: Many state-of-the-art LLMs are trained to think before giving their answer. Reasoning can greatly improve language model capabilities and safety, but it also makes them less interactive: giv...
Digital Twin Supervised Reinforcement Learning Framework for Autonomous Underwater Navigation : Abstract: Autonomous navigation in underwater environments remains a major challenge due to the absence of GPS, degraded visibility, and the presence of submerged obstacles. This article investigates ...
Physics-Informed Learning of Flow Distribution and Receiver Heat Losses in Parabolic Trough Solar Fields : Abstract: Parabolic trough Concentrating Solar Power (CSP) plants operate large hydraulic networks of collector loops that must deliver a uniform outlet temperature despite spatially heterogeneous opt...
Classifier Reconstruction Through Counterfactual-Aware Wasserstein Prototypes : Abstract: Counterfactual explanations provide actionable insights by identifying minimal input changes required to achieve a desired model prediction. Beyond their interpretability benefits, counterfa...
Guided Transfer Learning for Discrete Diffusion Models : Abstract: Discrete diffusion models achieve strong performance across language and other discrete domains, providing a powerful alternative to autoregressive models. However, their strong performance ...
Scaling Behavior of Discrete Diffusion Language Models : Abstract: Modern LLM pre-training consumes vast amounts of compute and training data, making the scaling behavior, or scaling laws, of different models a key distinguishing factor. Discrete diffusion ...
Bayesian Symbolic Regression via Posterior Sampling : Abstract: Symbolic regression is a powerful tool for discovering governing equations directly from data, but its sensitivity to noise hinders its broader application. This paper introduces a Sequentia...
Learning Controllable and Diverse Player Behaviors in Multi-Agent Environments : Abstract: This paper introduces a reinforcement learning framework that enables controllable and diverse player behaviors without relying on human gameplay data. Existing approaches often require larg...
Interpretable and Steerable Concept Bottleneck Sparse Autoencoders : Abstract: Sparse autoencoders (SAEs) promise a unified approach for mechanistic interpretability, concept discovery, and model steering in LLMs and LVLMs. However, realizing this potential requires th...
Template-Free Retrosynthesis with Graph-Prior Augmented Transformers : Abstract: Retrosynthesis reaction prediction seeks to infer plausible reactant molecules for a given product and is a central problem in computer-aided organic synthesis. Despite recent progress, many...
Generalized Spherical Neural Operators: Green's Function Formulation : Abstract: Neural operators offer powerful approaches for solving parametric partial differential equations, but extending them to spherical domains remains challenging due to the need to preserve intr...
Beyond the Black Box: Identifiable Interpretation and Control in Generative Models via Causal Minimality : Abstract: Deep generative models, while revolutionizing fields like image and text generation, largely operate as opaque black boxes, hindering human understanding, control, and alignment. While metho...
HybridVFL: Disentangled Feature Learning for Edge-Enabled Vertical Federated Multimodal Classification : Abstract: Vertical Federated Learning (VFL) offers a privacy-preserving paradigm for Edge AI scenarios like mobile health diagnostics, where sensitive multimodal data reside on distributed, resource-c...
Learning by Analogy: A Causal Framework for Composition Generalization : Abstract: Compositional generalization -- the ability to understand and generate novel combinations of learned concepts -- enables models to extend their capabilities beyond limited experiences. While...
DCFO Additional Material : Abstract: Outlier detection identifies data points that significantly deviate from the majority of the data distribution. Explaining outliers is crucial for understanding the underlying factors that c...
Token Sample Complexity of Attention : Abstract: As context windows in large language models continue to expand, it is essential to characterize how attention behaves at extreme sequence lengths. We introduce token-sample complexity: the r...
Supporting Migration Policies with Forecasts: Illegal Border Crossings in Europe through a Mixed Approach : Abstract: This paper presents a mixed-methodology to forecast illegal border crossings in Europe across five key migratory routes, with a one-year time horizon. The methodology integrates machine lear...
Uncertainty-Preserving QBNNs: Multi-Level Quantization of SVI-Based Bayesian Neural Networks for Image Classification : Abstract: Bayesian Neural Networks (BNNs) provide principled uncertainty quantification but suffer from substantial computational and memory overhead compared to deterministic networks. While quantiza...
Multi-Objective Reward and Preference Optimization: Theory and Algorithms : Abstract: This thesis develops theoretical frameworks and algorithms that advance constrained reinforcement learning (RL) across control, preference learning, and alignment of large language models. T...
THeGAU: Type-Aware Heterogeneous Graph Autoencoder and Augmentation : Abstract: Heterogeneous Graph Neural Networks (HGNNs) are effective for modeling Heterogeneous Information Networks (HINs), which encode complex multi-typed entities and relations. However, HGNNs ofte...
Is the Information Bottleneck Robust Enough? Towards Label-Noise Resistant Information Bottleneck Learning : Abstract: The Information Bottleneck (IB) principle facilitates effective representation learning by preserving label-relevant information while compressing irrelevant information. However, its strong...
Unlocking the Address Book: Dissecting the Sparse Semantic Structure of LLM Key-Value Caches via Sparse Autoencoders : Abstract: The Key-Value (KV) cache is the primary memory bottleneck in long-context Large Language Models, yet it is typically treated as an opaque numerical tensor. In this work, we propose \textbf{S...
Mode-Seeking for Inverse Problems with Diffusion Models : Abstract: A pre-trained unconditional diffusion model, combined with posterior sampling or maximum a posteriori (MAP) estimation techniques, can solve arbitrary inverse problems without task-specific ...
Disentangled and Distilled Encoder for Out-of-Distribution Reasoning with Rademacher Guarantees : Abstract: Recently, the disentangled latent space of a variational autoencoder (VAE) has been used to reason about multi-label out-of-distribution (OOD) test samples that are derived from different di...
Hybrid Physics-ML Model for Forward Osmosis Flux with Complete Uncertainty Quantification : Abstract: Forward Osmosis (FO) is a promising low-energy membrane separation technology, but challenges in accurately modelling its water flux (Jw) persist due to complex internal mass transfer phenom...
Metacognitive Sensitivity for Test-Time Dynamic Model Selection : Abstract: A key aspect of human cognition is metacognition - the ability to assess one's own knowledge and judgment reliability. While deep learning models can express confidence in their predictions,...
The Operator Origins of Neural Scaling Laws: A Generalized Spectral Transport Dynamics of Deep Learning : Abstract: Modern deep networks operate in a rough, finite-regularity regime where Jacobian-induced operators exhibit heavy-tailed spectra and strong basis drift. In this work, we derive a unified oper...
Fitting magnetization data using continued fraction of straight lines : Abstract: Magnetization of a ferromagnetic substance in response to an externally applied magnetic field increases with the strength of the field. This is because at the microscopic level, magnetic mo...
Better Prevent than Tackle: Valuing Defense in Soccer Based on Graph Neural Networks : Abstract: Evaluating defensive performance in soccer remains challenging, as effective defending is often expressed not through visible on-ball actions such as interceptions and tackles, but through p...
An Interpretable AI Tool for SAVR vs TAVR in Low to Intermediate Risk Patients with Severe Aortic Stenosis : Abstract: Background. Treatment selection for low to intermediate risk patients with severe aortic stenosis between surgical (SAVR) and transcatheter (TAVR) aortic valve replacement remains variable i...
A Kernel-based Resource-efficient Neural Surrogate for Multi-fidelity Prediction of Aerodynamic Field : Abstract: Surrogate models provide fast alternatives to costly aerodynamic simulations and are extremely useful in design and optimization applications. This study proposes the use of a recent kernel-...
R^2-HGP: A Double-Regularized Gaussian Process for Heterogeneous Transfer Learning : Abstract: Multi-output Gaussian process (MGP) models have attracted significant attention for their flexibility and uncertainty-quantification capabilities, and have been widely adopted in multi-sourc...
Exact Recovery of Non-Random Missing Multidimensional Time Series via Temporal Isometric Delay-Embedding Transform : Abstract: Non-random missing data is a ubiquitous yet undertreated flaw in multidimensional time series, fundamentally threatening the reliability of data-driven analysis and decision-making. Pure low...
MiniF2F-Dafny: LLM-Guided Mathematical Theorem Proving via Auto-Active Verification : Abstract: We present miniF2F-Dafny, the first translation of the mathematical reasoning benchmark miniF2F to an automated theorem prover: Dafny. Previously, the benchmark existed only in interactive t...
Assessing Neuromorphic Computing for Fingertip Force Decoding from Electromyography : Abstract: High-density surface electromyography (HD-sEMG) provides a noninvasive neural interface for assistive and rehabilitation control, but mapping neural activity to user motor intent remains cha...
CIEGAD: Cluster-Conditioned Interpolative and Extrapolative Framework for Geometry-Aware and Domain-Aligned Data Augmentation : Abstract: In practical deep learning deployment, the scarcity of data and the imbalance of label distributions often lead to semantically uncovered regions within the real-world data distribution, hin...
Rethinking Causal Discovery Through the Lens of Exchangeability : Abstract: Causal discovery methods have traditionally been developed under two distinct regimes: independent and identically distributed (i.i.d.) and timeseries data, each governed by separate modelli...
Murmur2Vec: A Hashing Based Solution For Embedding Generation Of COVID-19 Spike Sequences : Abstract: Early detection and characterization of coronavirus disease (COVID-19), caused by SARS-CoV-2, remain critical for effective clinical response and public-health planning. The global availabil...
Sequence-to-Image Transformation for Sequence Classification Using Rips Complex Construction and Chaos Game Representation : Abstract: Traditional feature engineering approaches for molecular sequence classification suffer from sparsity issues and computational complexity, while deep learning models often underperform on ta...
Partitioning the Sample Space for a More Precise Shannon Entropy Estimation : Abstract: Reliable data-driven estimation of Shannon entropy from small data sets, where the number of examples is potentially smaller than the number of possible outcomes, is a critical matter in sev...
Text2Graph: Combining Lightweight LLMs and GNNs for Efficient Text Classification in Label-Scarce Scenarios : Abstract: Large Language Models (LLMs) have become effective zero-shot classifiers, but their high computational requirements and environmental costs limit their practicality for large-scale annotatio...
Mitigating Exposure Bias in Risk-Aware Time Series Forecasting with Soft Tokens : Abstract: Autoregressive forecasting is central to predictive control in diabetes and hemodynamic management, where different operating zones carry different clinical risks. Standard models trained wi...
Local LLM Ensembles for Zero-shot Portuguese Named Entity Recognition : Abstract: Large Language Models (LLMs) excel in many Natural Language Processing (NLP) tasks through in-context learning but often under-perform in Named Entity Recognition (NER), especially for lower...
SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation : Abstract: In the unsupervised pre-training for reinforcement learning, the agent aims to learn a prior policy for downstream tasks without relying on task-specific reward functions. We focus on state ...
Robust Gradient Descent via Heavy-Ball Momentum with Predictive Extrapolation : Abstract: Accelerated gradient methods like Nesterov's Accelerated Gradient (NAG) achieve faster convergence on well-conditioned problems but often diverge on ill-conditioned or non-convex landscapes ...
Latent Action World Models for Control with Unlabeled Trajectories : Abstract: Inspired by how humans combine direct interaction with action-free experience (e.g., videos), we study world models that learn from heterogeneous data. Standard world models typically rely o...
BAMBO: Construct Ability and Efficiency LLM Pareto Set via Bayesian Adaptive Multi-objective Block-wise Optimization : Abstract: Constructing a Pareto set is pivotal for navigating the capability-efficiency trade-offs in Large Language Models (LLMs); however, existing merging techniques remain inadequate for this task...
HGC-Herd: Efficient Heterogeneous Graph Condensation via Representative Node Herding : Abstract: Heterogeneous graph neural networks (HGNNs) have demonstrated strong capability in modeling complex semantics across multi-type nodes and relations. However, their scalability to large-scale...
Faster Results from a Smarter Schedule: Reframing Collegiate Cross Country through Analysis of the National Running Club Database : Abstract: Collegiate cross country teams often build their season schedules on intuition rather than evidence, partly because large-scale performance datasets are not publicly accessible. To address t...
Risk-Bounded Multi-Agent Visual Navigation via Iterative Risk Allocation : Abstract: Safe navigation is essential for autonomous systems operating in hazardous environments, especially when multiple agents must coordinate using only high-dimensional visual observations. Whil...
MaskedManipulator: Versatile Whole-Body Manipulation : Abstract: We tackle the challenges of synthesizing versatile, physically simulated human motions for full-body object manipulation. Unlike prior methods that are focused on detailed motion tracking, t...
PlanetServe: A Decentralized, Scalable, and Privacy-Preserving Overlay for Democratizing Large Language Model Serving : Abstract: While significant progress has been made in research and development on open-source and cost-efficient large-language models (LLMs), serving scalability remains a critical challenge, particu...
ShapeWords: Guiding Text-to-Image Synthesis with 3D Shape-Aware Prompts : Abstract: We introduce ShapeWords, an approach for synthesizing images based on 3D shape guidance and text prompts. ShapeWords incorporates target 3D shape information within specialized tokens embedd...
Brain-like emergent properties in deep networks: impact of network architecture, datasets and training : Abstract: Despite the rapid pace at which deep networks are improving on standardized vision benchmarks, they are still outperformed by humans on real-world vision tasks. One solution to this problem ...
BioMedGPT-Mol: Multi-task Learning for Molecular Understanding and Generation : Abstract: Molecules play a crucial role in biomedical research and discovery, particularly in the field of small molecule drug development. Given the rapid advancements in large language models, espec...
Machine Learning for Quantifier Selection in cvc5 : Abstract: In this work we considerably improve the state-of-the-art SMT solving on first-order quantified problems by efficient machine learning guidance of quantifier selection. Quantifiers represent...
Multi-Robot Path Planning Combining Heuristics and Multi-Agent Reinforcement Learning : Abstract: Multi-robot path finding in dynamic environments is a highly challenging classic problem. In the movement process, robots need to avoid collisions with other moving robots while minimizing t...
SceneMaker: Open-set 3D Scene Generation with Decoupled De-occlusion and Pose Estimation Model : Abstract: We propose a decoupled 3D scene generation framework called SceneMaker in this work. Due to the lack of sufficient open-set de-occlusion and pose estimation priors, existing methods struggle...
Hierarchical Dataset Selection for High-Quality Data Sharing : Abstract: The success of modern machine learning hinges on access to high-quality training data. In many real-world scenarios, such as acquiring data from public repositories or sharing across institu...
Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation : Abstract: Reinforcement learning (RL), earlier proven to be effective in large language and multi-modal models, has been successfully extended to enhance 2D image generation recently. However, applyin...
ImplicitRDP: An End-to-End Visual-Force Diffusion Policy with Structural Slow-Fast Learning : Abstract: Human-level contact-rich manipulation relies on the distinct roles of two key modalities: vision provides spatially rich but temporally slow global context, while force sensing captures rapi...
AlcheMinT: Fine-grained Temporal Control for Multi-Reference Consistent Video Generation : Abstract: Recent advances in subject-driven video generation with large diffusion models have enabled personalized content synthesis conditioned on user-provided subjects. However, existing methods la...
Mull-Tokens: Modality-Agnostic Latent Thinking : Abstract: Reasoning goes beyond language; the real world requires reasoning about space, time, affordances, and much more that words alone cannot convey. Existing multimodal models exploring the poten...
OmniView: An All-Seeing Diffusion Model for 3D and 4D View Synthesis : Abstract: Prior approaches injecting camera control into diffusion models have focused on specific subsets of 4D consistency tasks: novel view synthesis, text-to-video with camera control, image-to-vi...
Stronger Normalization-Free Transformers : Abstract: Although normalization layers have long been viewed as indispensable components of deep learning architectures, the recent introduction of Dynamic Tanh (DyT) has demonstrated that alternativ...
Empirical evaluation of the Frank-Wolfe methods for constructing white-box adversarial attacks : Abstract: The construction of adversarial attacks for neural networks appears to be a crucial challenge for their deployment in various services. To estimate the adversarial robustness of a neural net...
Any4D: Unified Feed-Forward Metric 4D Reconstruction : Abstract: We present Any4D, a scalable multi-view transformer for metric-scale, dense feed-forward 4D reconstruction. Any4D directly generates per-pixel motion and geometry predictions for N frames, i...
BabyVLM-V2: Toward Developmentally Grounded Pretraining and Benchmarking of Vision Foundation Models : Abstract: Early children's developmental trajectories set up a natural goal for sample-efficient pretraining of vision foundation models. We introduce BabyVLM-V2, a developmentally grounded framework ...
Decoupled Q-Chunking : Abstract: Temporal-difference (TD) methods learn state and action values efficiently by bootstrapping from their own future value predictions, but such a self-bootstrapping mechanism is prone to boots...
SparseSwaps: Tractable LLM Pruning Mask Refinement at Scale : Abstract: The resource requirements of Neural Networks can be significantly reduced through pruning -- the removal of seemingly less important parameters. However, with the rise of Large Language Mode...
UrbanAI 2025 Challenge: Linear vs Transformer Models for Long-Horizon Exogenous Temperature Forecasting : Abstract: We study long-horizon exogenous-only temperature forecasting - a challenging univariate setting where only the past values of the indoor temperature are used for prediction - using linear an...
MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence : Abstract: Spatial understanding over continuous visual input is crucial for MLLMs to evolve into general-purpose assistants in physical environments. Yet there is still no comprehensive benchmark that...
Generative Modeling from Black-box Corruptions via Self-Consistent Stochastic Interpolants : Abstract: Transport-based methods have emerged as a leading paradigm for building generative models from large, clean datasets. However, in many scientific and engineering domains, clean data are ofte...
Extrapolation of Periodic Functions Using Binary Encoding of Continuous Numerical Values : Abstract: We report the discovery that binary encoding allows neural networks to extrapolate periodic functions beyond their training bounds. We introduce Normalized Base-2 Encoding (NB2E) as a method...
What matters for Representation Alignment: Global Information or Spatial Structure? : Abstract: Representation alignment (REPA) guides generative training by distilling representations from a strong, pretrained vision encoder to intermediate diffusion features. We investigate a fundame...
LabelFusion: Learning to Fuse LLMs and Transformer Classifiers for Robust Text Classification : Abstract: LabelFusion is a fusion ensemble for text classification that learns to combine a traditional transformer-based classifier (e.g., RoBERTa) with one or more Large Language Models (LLMs such a...
The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality : Abstract: We introduce The FACTS Leaderboard, an online leaderboard suite and associated set of benchmarks that comprehensively evaluates the ability of language models to generate factually accurate ...
Natural Language Interface for Firewall Configuration : Abstract: This paper presents the design and prototype implementation of a natural language interface for configuring enterprise firewalls. The framework allows administrators to express access contro...
Developing and Evaluating a Large Language Model-Based Automated Feedback System Grounded in Evidence-Centered Design for Supporting Physics Problem Solving : Abstract: Generative AI offers new opportunities for individualized and adaptive learning, particularly through large language model (LLM)-based feedback systems. While LLMs can produce effective feed...
Grow Up and Merge: Scaling Strategies for Efficient Language Adaptation : Abstract: Achieving high-performing language models which include medium- and lower-resource languages remains a challenge. Massively multilingual models still underperform compared to language-specif...
Metaphor-based Jailbreaking Attacks on Text-to-Image Models : Abstract: Text-to-image (T2I) models commonly incorporate defense mechanisms to prevent the generation of sensitive images. Unfortunately, recent jailbreaking attacks have shown that adversarial promp...
Designing AI-Resilient Assessments Using Interconnected Problems: A Theoretically Grounded and Empirically Validated Framework : Abstract: The rapid adoption of generative AI has undermined traditional modular assessments in computing education, creating a disconnect between academic evaluation and industry practice. This paper...
Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving : Abstract: Large language models (LLMs) have achieved significant progress in solving complex reasoning tasks by Reinforcement Learning with Verifiable Rewards (RLVR). This advancement is also insepara...
LGAN: An Efficient High-Order Graph Neural Network via the Line Graph Aggregation : Abstract: Graph Neural Networks (GNNs) have emerged as a dominant paradigm for graph classification. Specifically, most existing GNNs mainly rely on the message passing strategy between neighbor nodes...
Textual Data Bias Detection and Mitigation - An Extensible Pipeline with Experimental Evaluation : Abstract: Textual data used to train large language models (LLMs) exhibits multifaceted bias manifestations encompassing harmful language and skewed demographic distributions. Regulations such as the ...
PACIFIC: a framework for generating benchmarks to check Precise Automatically Checked Instruction Following In Code : Abstract: Large Language Model (LLM)-based code assistants have emerged as a powerful application of generative AI, demonstrating impressive capabilities in code generation and comprehension. A key re...
How to Brake? Ethical Emergency Braking with Deep Reinforcement Learning : Abstract: Connected and automated vehicles (CAVs) have the potential to enhance driving safety, for example by enabling safe vehicle following and more efficient traffic scheduling. For such future de...
Rethinking Popularity Bias in Collaborative Filtering via Analytical Vector Decomposition : Abstract: Popularity bias fundamentally undermines the personalization capabilities of collaborative filtering (CF) models, causing them to disproportionately recommend popular items while neglecting ...
Evaluating Gemini Robotics Policies in a Veo World Simulator : Abstract: Generative world models hold significant potential for simulating interactions with visuomotor policies in varied environments. Frontier video models can enable generation of realistic obser...
Beyond Pixels: A Training-Free, Text-to-Text Framework for Remote Sensing Image Retrieval : Abstract: Semantic retrieval of remote sensing (RS) images is a critical task fundamentally challenged by the "semantic gap", the discrepancy between a model's low-level visual features and ...
LLM-Auction: Generative Auction towards LLM-Native Advertising : Abstract: The rapid advancement of large language models (LLMs) necessitates novel monetization strategies, among which LLM-native advertising has emerged as a promising paradigm by naturally integrat...
Adaptive Replay Buffer for Offline-to-Online Reinforcement Learning : Abstract: Offline-to-Online Reinforcement Learning (O2O RL) faces a critical dilemma in balancing the use of a fixed offline dataset with newly collected online experiences. Standard methods, often re...
UACER: An Uncertainty-Aware Critic Ensemble Framework for Robust Adversarial Reinforcement Learning : Abstract: Robust adversarial reinforcement learning has emerged as an effective paradigm for training agents to handle uncertain disturbance in real environments, with critical applications in sequent...
T-SKM-Net: Trainable Neural Network Framework for Linear Constraint Satisfaction via Sampling Kaczmarz-Motzkin Method : Abstract: Neural network constraint satisfaction is crucial for safety-critical applications such as power system optimization, robotic path planning, and autonomous driving. However, existing constra...
Maximum Risk Minimization with Random Forests : Abstract: We consider a regression setting where observations are collected in different environments modeled by different data distributions. The field of out-of-distribution (OOD) generalization aim...
Clustered Federated Learning with Hierarchical Knowledge Distillation : Abstract: Clustered Federated Learning (CFL) has emerged as a powerful approach for addressing data heterogeneity and ensuring privacy in large distributed IoT environments. By clustering clients and ...
An M-Health Algorithmic Approach to Identify and Assess Physiotherapy Exercises in Real Time : Abstract: This work presents an efficient algorithmic framework for real-time identification, classification, and evaluation of human physiotherapy exercises using mobile devices. The proposed method ...
Cooperative Retrieval-Augmented Generation for Question Answering: Mutual Information Exchange and Ranking by Contrasting Layers : Abstract: Since large language models (LLMs) have a tendency to generate factually inaccurate output, retrieval-augmented generation (RAG) has gained significant attention as a key means to mitigate t...
Beyond Endpoints: Path-Centric Reasoning for Vectorized Off-Road Network Extraction : Abstract: Deep learning has advanced vectorized road extraction in urban settings, yet off-road environments remain underexplored and challenging. A significant domain gap causes advanced models to fa...
How to Trick Your AI TA: A Systematic Study of Academic Jailbreaking in LLM Code Evaluation : Abstract: The use of Large Language Models (LLMs) as automatic judges for code evaluation is becoming increasingly prevalent in academic environments. But their reliability can be compromised by stude...
Sliding Window Attention Adaptation : Abstract: The self-attention mechanism in Transformer-based Large Language Models (LLMs) scales quadratically with input length, making long-context inference expensive. Sliding window attention (SWA)...
The Eminence in Shadow: Exploiting Feature Boundary Ambiguity for Robust Backdoor Attacks : Abstract: Deep neural networks (DNNs) underpin critical applications yet remain vulnerable to backdoor attacks, typically reliant on heuristic brute-force methods. Despite significant empirical advanc...
Confucius Code Agent: An Open-sourced AI Software Engineer at Industrial Scale : Abstract: Real-world AI software engineering demands coding agents that can reason over massive repositories, maintain durable memory across and within long sessions, and robustly coordinate complex t...
Cross-modal Retrieval Models for Stripped Binary Analysis : Abstract: LLM-agent based binary code analysis has demonstrated significant potential across a wide range of software security scenarios, including vulnerability detection, malware analysis, etc. In a...
The Best of the Two Worlds: Harmonizing Semantic and Hash IDs for Sequential Recommendation : Abstract: Conventional Sequential Recommender Systems (SRS) typically assign unique Hash IDs (HID) to construct item embeddings. These HID embeddings effectively learn collaborative information from h...
Towards Fine-Grained Recognition with Large Visual Language Models: Benchmark and Optimization Strategies : Abstract: Large Vision Language Models (LVLMs) have made remarkable progress, enabling sophisticated vision-language interaction and dialogue applications. However, existing benchmarks primarily focus...
Neural personal sound zones with flexible bright zone control : Abstract: Personal sound zone (PSZ) reproduction system, which attempts to create distinct virtual acoustic scenes for different listeners at their respective positions within the same spatial area us...
D2M: A Decentralized, Privacy-Preserving, Incentive-Compatible Data Marketplace for Collaborative Learning : Abstract: The rising demand for collaborative machine learning and data analytics calls for secure and decentralized data sharing frameworks that balance privacy, trust, and incentives. Existing appro...
GPG: Generalized Policy Gradient Theorem for Transformer-based Policies : Abstract: We present the Generalized Policy Gradient (GPG) Theorem, specifically designed for Transformer-based policies. Notably, we demonstrate that both standard Policy Gradient Theorem and GRPO em...
Visual Funnel: Resolving Contextual Blindness in Multimodal Large Language Models : Abstract: Multimodal Large Language Models (MLLMs) demonstrate impressive reasoning capabilities, but often fail to perceive fine-grained visual details, limiting their applicability in precision-dema...
Dynamics of Agentic Loops in Large Language Models: A Geometric Theory of Trajectories : Abstract: Agentic systems built on large language models operate through recursive feedback loops, where each output becomes the next input. Yet the geometric behavior of these agentic loops (whether ...
A Privacy-Preserving Cloud Architecture for Distributed Machine Learning at Scale : Abstract: Distributed machine learning systems require strong privacy guarantees, verifiable compliance, and scalable deployment across heterogeneous and multi-cloud environments. This work introduc...
Multilingual VLM Training: Adapting an English-Trained VLM to French : Abstract: Artificial intelligence has made great progress in recent years, particularly in the development of Vision-Language Models (VLMs) that understand both visual and textual data. However, thes...
Translating Informal Proofs into Formal Proofs Using a Chain of States : Abstract: We address the problem of translating informal mathematical proofs expressed in natural language into formal proofs in Lean4 under a constrained computational budget. Our approach is grounde...
High-Dimensional Data Processing: Benchmarking Machine Learning and Deep Learning Architectures in Local and Distributed Environments : Abstract: This document reports the sequence of practices and methodologies implemented during the Big Data course. It details the workflow beginning with the processing of the Epsilon dataset through...
FLARE: A Wireless Side-Channel Fingerprinting Attack on Federated Learning : Abstract: Federated Learning (FL) enables collaborative model training across distributed devices while safeguarding data and user privacy. However, FL remains susceptible to privacy threats that can ...
MotionEdit: Benchmarking and Learning Motion-Centric Image Editing : Abstract: We introduce MotionEdit, a novel dataset for motion-centric image editing, the task of modifying subject actions and interactions while preserving identity, structure, and physical plausibili...
Graph Neural Network Based Adaptive Threat Detection for Cloud Identity and Access Management Logs : Abstract: The rapid expansion of cloud infrastructures and distributed identity systems has significantly increased the complexity and attack surface of modern enterprises. Traditional rule based or s...
Computing Evolutionarily Stable Strategies in Imperfect-Information Games : Abstract: We present an algorithm for computing evolutionarily stable strategies (ESSs) in symmetric perfect-recall extensive-form games of imperfect information. Our main algorithm is for two-player ...
Hybrid Learning and Optimization-Based Dynamic Scheduling for DL Workloads on Heterogeneous GPU Clusters : Abstract: Modern cloud platforms increasingly host large-scale deep learning (DL) workloads, demanding high-throughput, low-latency GPU scheduling. However, the growing heterogeneity of GPU clusters a...
RobustSora: De-Watermarked Benchmark for Robust AI-Generated Video Detection : Abstract: The proliferation of AI-generated video technologies poses challenges to information integrity. While recent benchmarks advance AIGC video detection, they overlook a critical factor: many st...
InFerActive: Towards Scalable Human Evaluation of Large Language Models through Interactive Inference : Abstract: Human evaluation remains the gold standard for evaluating outputs of Large Language Models (LLMs). The current evaluation paradigm reviews numerous individual responses, leading to significa...
Adaptive Information Routing for Multimodal Time Series Forecasting : Abstract: Time series forecasting is a critical task for artificial intelligence with numerous real-world applications. Traditional approaches primarily rely on historical time series data to predict ...
Federated Domain Generalization with Latent Space Inversion : Abstract: Federated domain generalization (FedDG) addresses distribution shifts among clients in a federated learning framework. FedDG methods aggregate the parameters of locally trained client models...
Offscript: Automated Auditing of Instruction Adherence in LLMs : Abstract: Large Language Models (LLMs) and generative search systems are increasingly used for information seeking by diverse populations with varying preferences for knowledge sourcing and presentati...
Enhancing Large Language Models for End-to-End Circuit Analysis Problem Solving : Abstract: Large language models (LLMs) have shown strong performance in data-rich domains such as programming, but their reliability in engineering tasks remains limited. Circuit analysis -- requiring...
Unforgotten Safety: Preserving Safety Alignment of Large Language Models with Continual Learning : Abstract: The safety alignment of large language models (LLMs) is becoming increasingly important with their democratization. In this paper, we study the safety degradation that comes with adapting LL...
PARAN: Persona-Augmented Review ANswering system on Food Delivery Review Dataset : Abstract: Personalized review response generation presents a significant challenge in domains where user information is limited, such as food delivery platforms. While large language models (LLMs) off...
Universal Hirschberg for Width Bounded Dynamic Programs : Abstract: Hirschberg's algorithm (1975) reduces the space complexity for the longest common subsequence problem from $O(N^2)$ to $O(N)$ via recursive midpoint bisection on a grid dynamic program (DP)....
Workflow is All You Need: Escaping the "Statistical Smoothing Trap" via High-Entropy Information Foraging and Adversarial Pacing : Abstract: Central to long-form text generation in vertical domains is the "impossible trinity" confronting current large language models (LLMs): the simultaneous achievement of low hallucination, deep...
VocSim: A Training-free Benchmark for Zero-shot Content Identity in Single-source Audio : Abstract: General-purpose audio representations aim to map acoustically variable instances of the same event to nearby points, resolving content identity in a zero-shot setting. Unlike supervised clas...
CHyLL: Learning Continuous Neural Representations of Hybrid Systems : Abstract: Learning the flows of hybrid systems that have both continuous and discrete time dynamics is challenging. The existing method learns the dynamics in each discrete mode, which suffers from th...
MedXAI: A Retrieval-Augmented and Self-Verifying Framework for Knowledge-Guided Medical Image Analysis : Abstract: Accurate and interpretable image-based diagnosis remains a fundamental challenge in medical AI, particularly under domain shifts and rare-class conditions. Deep learning models often str...
Defining the Scope of Learning Analytics: An Axiomatic Approach for Analytic Practice and Measurable Learning Phenomena : Abstract: Learning Analytics (LA) has rapidly expanded through practical and technological innovation, yet its foundational identity has remained theoretically under-specified. This paper addresses th...
What Kind of Reasoning (if any) is an LLM actually doing? On the Stochastic Nature and Abductive Appearance of Large Language Models : Abstract: This article looks at how reasoning works in current Large Language Models (LLMs) that function using the token-completion method. It examines their stochastic nature and their similarity to...
Classifying Metamorphic versus Single-Fold Proteins with Statistical Learning and AlphaFold2 : Abstract: The remarkable success of AlphaFold2 in providing accurate atomic-level prediction of protein structures from their amino acid sequence has transformed approaches to the protein folding prob...
DB2-TransF: All You Need Is Learnable Daubechies Wavelets for Time Series Forecasting : Abstract: Time series forecasting requires models that can efficiently capture complex temporal dependencies, especially in large-scale and high-dimensional settings. While Transformer-based architect...
Detailed balance in large language model-driven agents : Abstract: Large language model (LLM)-driven agents are emerging as a powerful new paradigm for solving complex problems. Despite the empirical success of these practices, a theoretical framework to un...
MetaVoxel: Joint Diffusion Modeling of Imaging and Clinical Metadata : Abstract: Modern deep learning methods have achieved impressive results across tasks from disease classification, estimating continuous biomarkers, to generating realistic medical images. Most of thes...
Intelligently Weighting Multiple Reference Models for Direct Preference Optimization of LLMs : Abstract: Fine-tuning is integral for aligning large language models (LLMs) with human preferences. Multiple-Reference Preference Optimization (MRPO) builds on Direct Preference Optimization (DPO) by ...
Cluster-Dags as Powerful Background Knowledge For Causal Discovery : Abstract: Finding cause-effect relationships is of key importance in science. Causal discovery aims to recover a graph from data that succinctly describes these cause-effect relationships. However, cu...
ABBSPO: Adaptive Bounding Box Scaling and Symmetric Prior based Orientation Prediction for Detecting Aerial Image Objects : Abstract: Weakly supervised oriented object detection (WS-OOD) has gained attention as a cost-effective alternative to fully supervised methods, providing both efficiency and high accuracy. Among weak...
ZK-APEX: Zero-Knowledge Approximate Personalized Unlearning with Executable Proofs : Abstract: Machine unlearning aims to remove the influence of specific data points from a trained model to satisfy privacy, copyright, and safety requirements. In real deployments, providers distribute...
ELANA: A Simple Energy and Latency Analyzer for LLMs : Abstract: The latency and power consumption of large language models (LLMs) are major constraints when serving them across a wide spectrum of hardware platforms, from mobile edge devices to cloud GPU ...
Norm-Governed Multi-Agent Decision-Making in Simulator-Coupled Environments: The Reinsurance Constrained Multi-Agent Simulation Process (R-CMASP) : Abstract: Reinsurance decision-making exhibits the core structural properties that motivate multi-agent models: distributed and asymmetric information, partial observability, heterogeneous epistemic r...
IoTEdu: Access Control, Detection, and Automatic Incident Response in Academic IoT Networks : Abstract: The growing presence of IoT devices in academic environments has increased operational complexity and exposed security weaknesses, especially in academic institutions without unified policie...
On Decision-Making Agents and Higher-Order Causal Processes : Abstract: We establish a precise correspondence between decision-making agents in partially observable Markov decision processes (POMDPs) and one-input process functions, the classical limit of higher...
Multi-Granular Node Pruning for Circuit Discovery : Abstract: Circuit discovery aims to identify minimal subnetworks that are responsible for specific behaviors in large language models (LLMs). Existing approaches primarily rely on iterative edge pruni...
LLMs Can Assist with Proposal Selection at Large User Facilities : Abstract: We explore how large language models (LLMs) can enhance the proposal selection process at large user facilities, offering a scalable, consistent, and cost-effective alternative to traditiona...
V-OCBF: Learning Safety Filters from Offline Data via Value-Guided Offline Control Barrier Functions : Abstract: Ensuring safety in autonomous systems requires controllers that satisfy hard, state-wise constraints without relying on online interaction. While existing Safe Offline RL methods typically e...
Agile Deliberation: Concept Deliberation for Subjective Visual Classification : Abstract: From content moderation to content curation, applications requiring vision classifiers for visual concepts are rapidly expanding. Existing human-in-the-loop approaches typically assume users...
HAROOD: A Benchmark for Out-of-distribution Generalization in Sensor-based Human Activity Recognition : Abstract: Sensor-based human activity recognition (HAR) mines activity patterns from the time-series sensory data. In realistic scenarios, variations across individuals, devices, environments, and tim...
Replace, Don't Expand: Mitigating Context Dilution in Multi-Hop RAG via Fixed-Budget Evidence Assembly : Abstract: Retrieval-Augmented Generation (RAG) systems often fail on multi-hop queries when the initial retrieval misses a bridge fact. Prior corrective approaches, such as Self-RAG, CRAG, and Adaptiv...
COMPARE: Clinical Optimization with Modular Planning and Assessment via RAG-Enhanced AI-OCT: Superior Decision Support for Percutaneous Coronary Intervention Compared to ChatGPT-5 and Junior Operators : Abstract: Background: While intravascular imaging, particularly optical coherence tomography (OCT), improves percutaneous coronary intervention (PCI) outcomes, its interpretation is operator-dependent...
Remember Me, Refine Me: A Dynamic Procedural Memory Framework for Experience-Driven Agent Evolution : Abstract: Procedural memory enables large language model (LLM) agents to internalize "how-to" knowledge, theoretically reducing redundant trial-and-error. However, existing frameworks predominantly su...
Enhancing Radiology Report Generation and Visual Grounding using Reinforcement Learning : Abstract: Recent advances in vision-language models (VLMs) have improved Chest X-ray (CXR) interpretation in multiple aspects. However, many medical VLMs rely solely on supervised fine-tuning (SFT), w...
Challenges of Evaluating LLM Safety for User Welfare : Abstract: Safety evaluations of large language models (LLMs) typically focus on universal risks like dangerous capabilities or undesirable propensities. However, millions use LLMs for personal advice ...
AEBNAS: Strengthening Exit Branches in Early-Exit Networks through Hardware-Aware Neural Architecture Search : Abstract: Early-exit networks are effective solutions for reducing the overall energy consumption and latency of deep learning models by adjusting computation based on the complexity of input data. By...
On the Dynamics of Multi-Agent LLM Communities Driven by Value Diversity : Abstract: As Large Language Models (LLM) based multi-agent systems become increasingly prevalent, the collective behaviors, e.g., collective intelligence, of such artificial communities have drawn gro...
CAPTAIN: Semantic Feature Injection for Memorization Mitigation in Text-to-Image Diffusion Models : Abstract: Diffusion models can unintentionally reproduce training examples, raising privacy and copyright concerns as these systems are increasingly deployed at scale. Existing inference-time mitigati...
Refinement Contrastive Learning of Cell-Gene Associations for Unsupervised Cell Type Identification : Abstract: Unsupervised cell type identification is crucial for uncovering and characterizing heterogeneous populations in single cell omics studies. Although a range of clustering methods have been de...
Phythesis: Physics-Guided Evolutionary Scene Synthesis for Energy-Efficient Data Center Design via LLMs : Abstract: Data center (DC) infrastructure serves as the backbone to support the escalating demand for computing capacity. Traditional design methodologies that blend human expertise with specialized s...
NormCode: A Semi-Formal Language for Context-Isolated AI Planning : Abstract: Multistep workflows that chain large language model (LLM) calls suffer from context pollution: as information accumulates across steps, models hallucinate, confuse intermediate outputs, and ...
Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning : Abstract: Large language model (LLM) agents exhibit strong mathematical problem-solving abilities and can even solve International Mathematical Olympiad (IMO) level problems with the assistance of for...
Zero-shot 3D Map Generation with LLM Agents: A Dual-Agent Architecture for Procedural Content Generation : Abstract: Procedural Content Generation (PCG) offers scalable methods for algorithmically creating complex, customizable worlds. However, controlling these pipelines requires the precise configuration...
When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection : Abstract: The landscape of scientific peer review is rapidly evolving with the integration of Large Language Models (LLMs). This shift is driven by two parallel trends: the widespread individual adopt...
Targeted Data Protection for Diffusion Model by Matching Training Trajectory : Abstract: Recent advancements in diffusion models have made fine-tuning text-to-image models for personalization increasingly accessible, but have also raised significant concerns regarding unauthoriz...
Representation of the structure of graphs by sequences of instructions : Abstract: The representation of graphs is commonly based on the adjacency matrix concept. This formulation is the foundation of most algebraic and computational approaches to graph processing. The adv...
Boosting RL-Based Visual Reasoning with Selective Adversarial Entropy Intervention : Abstract: Recently, reinforcement learning (RL) has become a common choice in enhancing the reasoning capabilities of vision-language models (VLMs). Considering existing RL-based finetuning methods, ...
AgentProg: Empowering Long-Horizon GUI Agents with Program-Guided Context Management : Abstract: The rapid development of mobile GUI agents has stimulated growing research interest in long-horizon task automation. However, building agents for these tasks faces a critical bottleneck: the...
LLM-Empowered Representation Learning for Emerging Item Recommendation : Abstract: In this work, we tackle the challenge of recommending emerging items, whose interactions gradually accumulate over time. Existing methods often overlook this dynamic process, typically assum...
REMISVFU: Vertical Federated Unlearning via Representation Misdirection for Intermediate Output Feature : Abstract: Data-protection regulations such as the GDPR grant every participant in a federated system a right to be forgotten. Federated unlearning has therefore emerged as a research frontier, aiming ...
On the Collapse of Generative Paths: A Criterion and Correction for Diffusion Steering : Abstract: Inference-time steering enables pretrained diffusion/flow models to be adapted to new tasks without retraining. A widely used approach is the ratio-of-densities method, which defines a time-...
User-Feedback-Driven Continual Adaptation for Vision-and-Language Navigation : Abstract: Vision-and-Language Navigation (VLN) requires agents to navigate complex environments by following natural-language instructions. General Scene Adaptation for VLN (GSA-VLN) shifts the focus ...
EpiPlanAgent: Agentic Automated Epidemic Response Planning : Abstract: Epidemic response planning is essential yet traditionally reliant on labor-intensive manual methods. This study aimed to design and evaluate EpiPlanAgent, an agent-based system using large l...
InfoCom: Kilobyte-Scale Communication-Efficient Collaborative Perception with Information Bottleneck : Abstract: Precise environmental perception is critical for the reliability of autonomous driving systems. While collaborative perception mitigates the limitations of single-agent perception through in...
Trustworthy Orchestration Artificial Intelligence by the Ten Criteria with Control-Plane Governance : Abstract: As Artificial Intelligence (AI) systems increasingly assume consequential decision-making roles, a widening gap has emerged between technical capabilities and institutional accountability. E...
Investigating The Functional Roles of Attention Heads in Vision Language Models: Evidence for Reasoning Modules : Abstract: Despite excelling on multimodal benchmarks, vision-language models (VLMs) largely remain a black box. In this paper, we propose a novel interpretability framework to systematically analyze t...
Neuronal Attention Circuit (NAC) for Representation Learning : Abstract: Attention improves representation learning over RNNs, but its discrete nature limits continuous-time (CT) modeling. We introduce Neuronal Attention Circuit (NAC), a novel, biologically plaus...
Reverse Thinking Enhances Missing Information Detection in Large Language Models : Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in various reasoning tasks, yet they often struggle with problems involving missing information, exhibiting issues such...
ID-PaS: Identity-Aware Predict-and-Search for General Mixed-Integer Linear Programs : Abstract: Mixed-Integer Linear Programs (MIPs) are powerful and flexible tools for modeling a wide range of real-world combinatorial optimization problems. Predict-and-Search methods operate by using ...
An exploration for higher efficiency in multi objective optimisation with reinforcement learning : Abstract: Efficiency in optimisation and search processes persists to be one of the challenges, which affects the performance and use of optimisation algorithms. Utilising a pool of operators instead ...
CP-Env: Evaluating Large Language Models on Clinical Pathways in a Controllable Hospital Environment : Abstract: Medical care follows complex clinical pathways that extend beyond isolated physician-patient encounters, emphasizing decision-making and transitions between different stages. Current benchma...
The 2025 Foundation Model Transparency Index : Abstract: Foundation model developers are among the world's most important companies. As these companies become increasingly consequential, how do their transparency practices evolve? The 2025 Foundat...
AgriRegion: Region-Aware Retrieval for High-Fidelity Agricultural Advice : Abstract: Large Language Models (LLMs) have demonstrated significant potential in democratizing access to information. However, in the domain of agriculture, general-purpose models frequently suffer f...
Modeling Narrative Archetypes in Conspiratorial Narratives: Insights from Singapore-Based Telegram Groups : Abstract: Conspiratorial discourse is increasingly embedded within digital communication ecosystems, yet its structure and spread remain difficult to study. This work analyzes conspiratorial narrative...
Robust AI Security and Alignment: A Sisyphean Endeavor? : Abstract: This manuscript establishes information-theoretic limitations for robustness of AI security and alignment by extending Gödel's incompleteness theorem to AI. Knowing these limitations and pre...
Interpretable Embeddings with Sparse Autoencoders: A Data Analysis Toolkit : Abstract: Analyzing large-scale text corpora is a core challenge in machine learning, crucial for tasks like identifying undesirable model behaviors or biases in training data. Current methods often r...
Linear socio-demographic representations emerge in Large Language Models from indirect cues : Abstract: We investigate how LLMs encode sociodemographic attributes of human conversational partners inferred from indirect cues such as names and occupations. We show that LLMs develop linear repres...
Mind the Gap! Pathways Towards Unifying AI Safety and Ethics Research : Abstract: While much research in artificial intelligence (AI) has focused on scaling capabilities, the accelerating pace of development makes countervailing work on producing harmless, "aligned" syste...
Parallel Decoder Transformer: Model-Internal Parallel Decoding with Speculative Invariance via Note Conditioning : Abstract: Autoregressive decoding in Large Language Models (LLMs) is inherently sequential, creating a latency bottleneck that scales linearly with output length. While "Decomposition-and-Fill" meth...
SimWorld-Robotics: Synthesizing Photorealistic and Dynamic Urban Environments for Multimodal Robot Navigation and Collaboration : Abstract: Recent advances in foundation models have shown promising results in developing generalist robotics that can perform diverse tasks in open-ended scenarios given multimodal inputs. However, c...
DynaMate: An Autonomous Agent for Protein-Ligand Molecular Dynamics Simulations : Abstract: Force field-based molecular dynamics (MD) simulations are indispensable for probing the structure, dynamics, and functions of biomolecular systems, including proteins and protein-ligand comp...
Exploring LLMs for Scientific Information Extraction Using The SciEx Framework : Abstract: Large language models (LLMs) are increasingly touted as powerful tools for automating scientific information extraction. However, existing methods and tools often struggle with the realities...
Fuzzy Hierarchical Multiplex : Abstract: A new fuzzy optimization framework that extends FCM causality is proposed. This model utilizes the dynamics to map data into metrics and create a framework that examines logical implication ...
Echo-CoPilot: A Multi-View, Multi-Task Agent for Echocardiography Interpretation and Reporting : Abstract: Echocardiography is central to contemporary cardiovascular care, but full-study interpretation remains a cognitively demanding, multi-view task that is still performed manually. While recent...
Exploring Health Misinformation Detection with Multi-Agent Debate : Abstract: Fact-checking health-related claims has become increasingly critical as misinformation proliferates online. Effective verification requires both the retrieval of high-quality evidence and ri...
Suzume-chan: Your Personal Navigator as an Embodied Information Hub : Abstract: Access to expert knowledge often requires real-time human communication. Digital tools improve access to information but rarely create the sense of connection needed for deep understanding. ...
ExaCraft: Dynamic Learning Context Adaptation for Personalized Educational Examples : Abstract: Learning is most effective when it's connected to relevant, relatable examples that resonate with learners on a personal level. However, existing educational AI tools don't focus on generati...

Research Sources: 360 | Generated: 12/12/2025