AI Research News Feeds for December 5th, 2025

AI RESEARCH PAPERS & ACADEMIC SOURCES

Efficient stereo matching on embedded GPUs with zero-means cross correlation : Abstract: Mobile stereo-matching systems have become an important part of many applications, such as automated-driving vehicles and autonomous robots. Accurate stereo-matching methods usually lead to ...
Polygon Intersection-over-Union Loss for Viewpoint-Agnostic Monocular 3D Vehicle Detection : Abstract: Monocular 3D object detection is a challenging task because depth information is difficult to obtain from 2D images. A subset of viewpoint-agnostic monocular 3D detection methods also do not...
Surface-Based Visibility-Guided Uncertainty for Continuous Active 3D Neural Reconstruction : Abstract: View selection is critical in active 3D neural reconstruction as it impacts the contents of the training set and the quality of the final output. Recent view selection strategies emphasize the vi...
OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation : Abstract: Category-level articulated object pose estimation focuses on the pose estimation of unknown articulated objects within known categories. Despite its significance, this task remains challengi...
Multimodal Markup Document Models for Graphic Design Completion : Abstract: We introduce MarkupDM, a multimodal markup document model that represents graphic design as an interleaved multimodal document consisting of both markup language and images. Unlike existing ...
Learning Geodesics of Geometric Shape Deformations From Images : Abstract: This paper presents a novel method, named geodesic deformable networks (GDN), that for the first time enables the learning of geodesic flows of deformation fields derived from images. In par...
Turbo-GS: Accelerating 3D Gaussian Fitting for High-Quality Radiance Fields : Abstract: Novel-view synthesis is an important problem in computer vision with applications in 3D reconstruction, mixed reality, and robotics. Recent methods like 3D Gaussian Splatting (3DGS) have bec...
A dynamic memory assignment strategy for dilation-based ICP algorithm on embedded GPUs : Abstract: This paper proposes a memory-efficient optimization strategy for the high-performance point cloud registration algorithm VANICP, enabling lightweight execution on embedded GPUs with constrai...
Reflection Removal through Efficient Adaptation of Diffusion Transformers : Abstract: We introduce a diffusion-transformer (DiT) framework for single-image reflection removal that leverages the generalization strengths of foundation diffusion models in the restoration setting...
Self-Supervised Learning for Transparent Object Depth Completion Using Depth from Non-Transparent Objects : Abstract: The perception of transparent objects is one of the well-known challenges in computer vision. Conventional depth sensors have difficulty in sensing the depth of transparent objects due to re...
Generative Neural Video Compression via Video Diffusion Prior : Abstract: We present GNVC-VD, the first DiT-based generative neural video compression framework built upon an advanced video generation foundation model, where spatio-temporal latent compression and s...
RAMEN: Resolution-Adjustable Multimodal Encoder for Earth Observation : Abstract: Earth observation (EO) data spans a wide range of spatial, spectral, and temporal resolutions, from high-resolution optical imagery to low-resolution multispectral products or radar time ser...
Semantic-Guided Two-Stage GAN for Face Inpainting with Hybrid Perceptual Encoding : Abstract: Facial image inpainting aims to restore the missing or corrupted regions in face images while preserving identity, structural consistency and photorealistic image quality, a task specifica...
Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image : Abstract: Generating interactive and dynamic 4D scenes from a single static image remains a core challenge. Most existing generate-then-reconstruct and reconstruct-then-generate methods decouple geome...
4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer : Abstract: Constructing 4D language fields is crucial for embodied AI, augmented/virtual reality, and 4D scene understanding, as they provide enriched semantic representations of dynamic environments a...
BulletTime: Decoupled Control of Time and Camera Pose for Video Generation : Abstract: Emerging video diffusion models achieve high visual fidelity but fundamentally couple scene dynamics with camera motion, limiting their ability to provide precise spatial and temporal contro...
Object Reconstruction under Occlusion with Generative Priors and Contact-induced Constraints : Abstract: Object geometry is key information for robot manipulation. Yet, object reconstruction is a challenging task because cameras only capture partial observations of objects, especially when occl...
Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression : Abstract: Recent advances in autoregressive video diffusion have enabled real-time frame streaming, yet existing solutions still suffer from temporal repetition, drift, and motion deceleration. We fin...
Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark : Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have significantly improved performance on tasks such as visual grounding and visual question answering. However, the reasoning pr...
SA-IQA: Redefining Image Quality Assessment for Spatial Aesthetics with Multi-Dimensional Rewards : Abstract: In recent years, Image Quality Assessment (IQA) for AI-generated images (AIGI) has advanced rapidly; however, existing methods primarily target portraits and artistic images, lacking a syste...
EvoIR: Towards All-in-One Image Restoration via Evolutionary Frequency Modulation : Abstract: All-in-One Image Restoration (AiOIR) tasks often involve diverse degradations that require robust and versatile strategies. However, most existing approaches typically lack explicit frequency...
ShadowDraw: From Any Object to Shadow-Drawing Compositional Art : Abstract: We introduce ShadowDraw, a framework that transforms ordinary 3D objects into shadow-drawing compositional art. Given a 3D object, our system predicts scene parameters, including object pose...
ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning : Abstract: Reward models are critical for aligning vision-language systems with human preferences, yet current approaches suffer from hallucination, weak visual grounding, and an inability to use tools...
Splannequin: Freezing Monocular Mannequin-Challenge Footage with Dual-Detection Splatting : Abstract: Synthesizing high-fidelity frozen 3D scenes from monocular Mannequin-Challenge (MC) videos is a unique problem distinct from standard dynamic scene reconstruction. Instead of focusing on mod...
Light-X: Generative 4D Video Rendering with Camera and Illumination Control : Abstract: Recent advances in illumination control extend image-based methods to video, yet still face a trade-off between lighting fidelity and temporal consistency. Moving beyond relighting, a key ...
The changing surface of the world's roads : Abstract: Resilient road infrastructure is a cornerstone of the UN Sustainable Development Goals. Yet a primary indicator of network functionality and resilience is critically lacking: a comprehensive...
Efficient Spatially-Variant Convolution via Differentiable Sparse Kernel Complex : Abstract: Image convolution with complex kernels is a fundamental operation in photography, scientific imaging, and animation effects, yet direct dense convolution is computationally prohibitive on re...
Hardware-aware Neural Architecture Search of Early Exiting Networks on Edge Accelerators : Abstract: Advancements in high-performance computing and cloud technologies have enabled the development of increasingly sophisticated Deep Learning (DL) models. However, the growing demand for embedd...
Shared Multi-modal Embedding Space for Face-Voice Association : Abstract: The FAME 2026 challenge comprises two demanding tasks: training face-voice associations combined with a multilingual setting that includes testing on languages on which the model was not tra...
From Generated Human Videos to Physically Plausible Robot Trajectories : Abstract: Video generation models are rapidly improving in their ability to synthesize human actions in novel contexts, holding the potential to serve as high-level planners for contextual robot contr...
Towards Cross-View Point Correspondence in Vision-Language Models : Abstract: Cross-view correspondence is a fundamental capability for spatial understanding and embodied AI. However, it is still far from being realized in Vision-Language Models (VLMs), especially in ...
OmniScaleSR: Unleashing Scale-Controlled Diffusion Prior for Faithful and Realistic Arbitrary-Scale Image Super-Resolution : Abstract: Arbitrary-scale super-resolution (ASSR) overcomes the limitation of traditional super-resolution (SR) methods that operate only at fixed scales (e.g., 4x), enabling a single model to handle ...
Measuring the Unspoken: A Disentanglement Model and Benchmark for Psychological Analysis in the Wild : Abstract: Generative psychological analysis of in-the-wild conversations faces two fundamental challenges: (1) existing Vision-Language Models (VLMs) fail to resolve Articulatory-Affective Ambiguity, ...
E3AD: An Emotion-Aware Vision-Language-Action Model for Human-Centric End-to-End Autonomous Driving : Abstract: End-to-end autonomous driving (AD) systems increasingly adopt vision-language-action (VLA) models, yet they typically ignore the passenger's emotional state, which is central to comfort and ...
MT-Depth: Multi-task Instance feature analysis for the Depth Completion : Abstract: Depth completion plays a vital role in 3D perception systems, especially in scenarios where sparse depth data must be densified for tasks such as autonomous driving, robotics, and augmented ...
Order Matters: 3D Shape Generation from Sequential VR Sketches : Abstract: VR sketching lets users explore and iterate on ideas directly in 3D, offering a faster and more intuitive alternative to conventional CAD tools. However, existing sketch-to-shape models igno...
PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling : Abstract: Consistent image generation requires faithfully preserving identities, styles, and logical coherence across multiple images, which is essential for applications such as storytelling and char...
LaFiTe: A Generative Latent Field for 3D Native Texturing : Abstract: Generating high-fidelity, seamless textures directly on 3D surfaces, what we term 3D-native texturing, remains a fundamental open challenge, with the potential to overcome long-standing limi...
EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture : Abstract: We propose EMMA, an efficient and unified architecture for multimodal understanding, generation and editing. Specifically, EMMA primarily consists of 1) an efficient autoencoder with a 32x c...
RobustSplat++: Decoupling Densification, Dynamics, and Illumination for In-the-Wild 3DGS : Abstract: 3D Gaussian Splatting (3DGS) has gained significant attention for its real-time, photo-realistic rendering in novel-view synthesis and 3D modeling. However, existing methods struggle with ac...
LatentFM: A Latent Flow Matching Approach for Generative Medical Image Segmentation : Abstract: Generative models have achieved remarkable progress with the emergence of flow matching (FM). It has demonstrated strong generative capabilities and attracted significant attention as a simu...
FreeGen: Feed-Forward Reconstruction-Generation Co-Training for Free-Viewpoint Driving Scene Synthesis : Abstract: Closed-loop simulation and scalable pre-training for autonomous driving require synthesizing free-viewpoint driving scenes. However, existing datasets and generative pipelines rarely provide...
A Sanity Check for Multi-In-Domain Face Forgery Detection in the Real World : Abstract: Existing methods for deepfake detection aim to develop generalizable detectors. Although "generalizable" detection is the ultimate target, with limited training forgeries and domains,...
Autoregressive Image Generation Needs Only a Few Lines of Cached Tokens : Abstract: Autoregressive (AR) visual generation has emerged as a powerful paradigm for image and multimodal synthesis, owing to its scalability and generality. However, existing AR image generation su...
Contact-Aware Refinement of Human Pose Pseudo-Ground Truth via Bioimpedance Sensing : Abstract: Capturing accurate 3D human pose in the wild would provide valuable data for training pose estimation and motion generation methods. While video-based estimation approaches have become incre...
SP-Det: Self-Prompted Dual-Text Fusion for Generalized Multi-Label Lesion Detection : Abstract: Automated lesion detection in chest X-rays has demonstrated significant potential for improving clinical diagnosis by precisely localizing pathological abnormalities. While recent promptable...
SDG-Track: A Heterogeneous Observer-Follower Framework for High-Resolution UAV Tracking on Embedded Platforms : Abstract: Real-time tracking of small unmanned aerial vehicles (UAVs) on edge devices faces a fundamental resolution-speed conflict. Downsampling high-resolution imagery to standard detector input siz...
You Only Train Once (YOTO): A Retraining-Free Object Detection Framework : Abstract: Object detection constitutes the primary task within the domain of computer vision. It is utilized in numerous domains. Nonetheless, object detection continues to encounter the issue of cata...
Equivariant Symmetry-Aware Head Pose Estimation for Fetal MRI : Abstract: We present E(3)-Pose, a novel fast pose estimation method that jointly and explicitly models rotation equivariance and object symmetry. Our work is motivated by the challenging problem of ac...
ReflexFlow: Rethinking Learning Objective for Exposure Bias Alleviation in Flow Matching : Abstract: Despite tremendous recent progress, Flow Matching methods still suffer from exposure bias due to discrepancies between training and inference. This paper investigates the root causes of exposure ...
Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion : Abstract: Latent Diffusion Models (LDMs) inherently follow a coarse-to-fine generation process, where high-level semantic structure is generated slightly earlier than fine-grained texture. This indica...
Virtually Unrolling the Herculaneum Papyri by Diffeomorphic Spiral Fitting : Abstract: The Herculaneum Papyri are a collection of rolled papyrus documents that were charred and buried by the famous eruption of Mount Vesuvius. They promise to contain a wealth of previously unse...
LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging : Abstract: 3D vision foundation models like Visual Geometry Grounded Transformer (VGGT) have advanced greatly in geometric perception. However, it is time-consuming and memory-intensive for long sequen...
Towards Adaptive Fusion of Multimodal Deep Networks for Human Action Recognition : Abstract: This study introduces a pioneering methodology for human action recognition by harnessing deep neural network techniques and adaptive fusion strategies across multiple modalities, including ...
FASTer: Toward Efficient Autoregressive Vision Language Action Modeling via neural Action Tokenization : Abstract: Autoregressive vision-language-action (VLA) models have recently demonstrated strong capabilities in robotic manipulation. However, their core process of action tokenization often involves a...
GeoPE: A Unified Geometric Positional Embedding for Structured Tensors : Abstract: Standard Vision Transformers flatten 2D images into 1D sequences, disrupting the natural spatial topology. While Rotary Positional Embedding (RoPE) excels in 1D, it inherits this limitation,...
Balanced Few-Shot Episodic Learning for Accurate Retinal Disease Diagnosis : Abstract: Automated retinal disease diagnosis is vital given the rising prevalence of conditions such as diabetic retinopathy and macular degeneration. Conventional deep learning approaches require la...
Stable Single-Pixel Contrastive Learning for Semantic and Geometric Tasks : Abstract: We pilot a family of stable contrastive losses for learning pixel-level representations that jointly capture semantic and geometric information. Our approach maps each pixel of an image to a...
Back to Basics: Motion Representation Matters for Human Motion Generation Using Diffusion Model : Abstract: Diffusion models have emerged as a widely utilized and successful methodology in human motion synthesis. Task-oriented diffusion models have significantly advanced action-to-motion, text-to-...
UltraImage: Rethinking Resolution Extrapolation in Image Diffusion Transformers : Abstract: Recent image diffusion transformers achieve high-fidelity generation, but struggle to generate images beyond their training resolutions, suffering from content repetition and quality degradation. In this ...
DuGI-MAE: Improving Infrared Mask Autoencoders via Dual-Domain Guidance : Abstract: Infrared imaging plays a critical role in low-light and adverse weather conditions. However, due to the distinct characteristics of infrared images, existing foundation models such as Masked...
EgoLCD: Egocentric Video Generation with Long Context Diffusion : Abstract: Generating long, coherent egocentric videos is difficult, as hand-object interactions and procedural tasks require reliable long-term memory. Existing autoregressive models suffer from conte...
VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory : Abstract: Autoregressive (AR) diffusion enables streaming, interactive long-video generation by producing frames causally, yet maintaining coherence over minute-scale horizons remains challenging due ...
Boundary-Aware Test-Time Adaptation for Zero-Shot Medical Image Segmentation : Abstract: Due to the scarcity of annotated data and the substantial computational cost of the models, conventional tuning methods in medical image segmentation face critical challenges. Current approaches...
WiFi-based Cross-Domain Gesture Recognition Using Attention Mechanism : Abstract: While fulfilling communication tasks, wireless signals can also be used to sense the environment. Among various types of sensing media, WiFi signals offer advantages such as widespread avail...
Identity Clue Refinement and Enhancement for Visible-Infrared Person Re-Identification : Abstract: Visible-Infrared Person Re-Identification (VI-ReID) is a challenging cross-modal matching task due to significant modality discrepancies. While current methods mainly focus on learning modal...
Auto3R: Automated 3D Reconstruction and Scanning via Data-driven Uncertainty Quantification : Abstract: Traditional high-quality 3D scanning and reconstruction typically relies on human labor to plan the scanning procedure. With the rapid development of embodied systems such as drones and robo...
PhyVLLM: Physics-Guided Video Language Model with Motion-Appearance Disentanglement : Abstract: Video Large Language Models (Video LLMs) have shown impressive performance across a wide range of video-language tasks. However, they often fail in scenarios requiring a deeper understanding...
Refaçade: Editing Object with Given Reference Texture : Abstract: Recent advances in diffusion models have brought remarkable progress in image and video editing, yet some tasks remain underexplored. In this paper, we introduce a new task, Object Retexture...
Detection of Intoxicated Individuals from Facial Video Sequences via a Recurrent Fusion Model : Abstract: Alcohol consumption is a significant public health concern and a major cause of accidents and fatalities worldwide. This study introduces a novel video-based facial sequence analysis approac...
X-Humanoid: Robotize Human Videos to Generate Humanoid Videos at Scale : Abstract: The advancement of embodied AI has unlocked significant potential for intelligent humanoid robots. However, progress in both Vision-Language-Action (VLA) models and world models is severely ...
VideoMem: Enhancing Ultra-Long Video Understanding via Adaptive Memory Management : Abstract: Ultra-long video understanding remains an open challenge, as existing vision language models (VLMs) falter on such content due to limited context length and inefficient long-term memory rete...
Gaussian Entropy Fields: Driving Adaptive Sparsity in 3D Gaussian Optimization : Abstract: 3D Gaussian Splatting (3DGS) has emerged as a leading technique for novel view synthesis, demonstrating exceptional rendering efficiency. Well-reconstructed surfaces can be chara...
Counterfeit Answers: Adversarial Forgery against OCR-Free Document Visual Question Answering : Abstract: Document Visual Question Answering (DocVQA) enables end-to-end reasoning grounded on information present in a document input. While recent models have shown impressive capabilities, they rem...
COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence : Abstract: Visual Spatial Reasoning is crucial for enabling Multimodal Large Language Models (MLLMs) to understand object properties and spatial relationships, yet current models still struggle with 3D...
Dataset creation for supervised deep learning-based analysis of microscopic images - review of important considerations and recommendations : Abstract: Supervised deep learning (DL) receives great interest for automated analysis of microscopic images with an increasing body of literature supporting its potential. The development and validat...
Prompt2Craft: Generating Functional Craft Assemblies with LLMs : Abstract: Inspired by traditional handmade crafts, where a person improvises assemblies based on the available objects, we formally introduce the Craft Assembly Task. It is a robotic assembly task tha...
TARDis: Time Attenuated Representation Disentanglement for Incomplete Multi-Modal Tumor Segmentation and Classification : Abstract: Tumor segmentation and diagnosis in contrast-enhanced Computed Tomography (CT) rely heavily on the physiological dynamics of contrast agents. However, obtaining a complete multi-phase series...
Infrared UAV Target Tracking with Dynamic Feature Refinement and Global Contextual Attention Knowledge Distillation : Abstract: Unmanned aerial vehicle (UAV) target tracking based on thermal infrared imaging has been one of the most important sensing technologies in anti-UAV applications. However, the infrared UAV ta...
SAM3-I: Segment Anything with Instructions : Abstract: Segment Anything Model 3 (SAM3) has advanced open-vocabulary segmentation through promptable concept segmentation, allowing users to segment all instances corresponding to a given concept, t...
Malicious Image Analysis via Vision-Language Segmentation Fusion: Detection, Element, and Location in One-shot : Abstract: Detecting illicit visual content demands more than image-level NSFW flags; moderators must also know what objects make an image illegal and where those objects occur. We introduce a zero-sho...
Denoise to Track: Harnessing Video Diffusion Priors for Robust Correspondence : Abstract: In this work, we introduce HeFT (Head-Frequency Tracker), a zero-shot point tracking framework that leverages the visual priors of pretrained video diffusion models. To better understand how...
I2I-Bench: A Comprehensive Benchmark Suite for Image-to-Image Editing Models : Abstract: Image editing models are advancing rapidly, yet comprehensive evaluation remains a significant challenge. Existing image editing benchmarks generally suffer from limited task scopes, insuffi...
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length : Abstract: Existing diffusion-based video generation methods are fundamentally constrained by sequential computation and long-horizon inconsistency, limiting their practical adoption in real-time, stre...
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation : Abstract: Efficient streaming video generation is critical for simulating interactive and dynamic worlds. Existing methods distill few-step video diffusion models with sliding window attention, using ...
UniLight: A Unified Representation for Lighting : Abstract: Lighting has a strong influence on visual appearance, yet understanding and representing lighting in images remains notoriously difficult. Various lighting representations exist, such as env...
Learning Single-Image Super-Resolution in the JPEG Compressed Domain : Abstract: Deep learning models have grown increasingly complex, with input data sizes scaling accordingly. Despite substantial advances in specialized deep learning hardware, data loading continues to...
Gamma-from-Mono: Road-Relative, Metric, Self-Supervised Monocular Geometry for Vehicular Applications : Abstract: Accurate perception of the vehicle's 3D surroundings, including fine-scale road geometry, such as bumps, slopes, and surface irregularities, is essential for safe and comfortable vehicle con...
How (Mis)calibrated is Your Federated CLIP and What To Do About It? : Abstract: While vision-language models like CLIP have been extensively studied, their calibration, crucial for reliable predictions, has received limited attention. Although a few prior works have exa...
Real-time Cricket Sorting By Sex : Abstract: The global demand for sustainable protein sources is driving increasing interest in edible insects, with Acheta domesticus (house cricket) identified as one of the most suitable species for ...
Mind-to-Face: Neural-Driven Photorealistic Avatar Synthesis via EEG Decoding : Abstract: Current expressive avatar systems rely heavily on visual cues, failing when faces are occluded or when emotions remain internal. We present Mind-to-Face, the first framework that decodes non...
DisentangleFormer: Spatial-Channel Decoupling for Multi-Channel Vision : Abstract: Vision Transformers face a fundamental limitation: standard self-attention jointly processes spatial and channel dimensions, leading to entangled representations that prevent independent mod...
SyncTrack4D: Cross-Video Motion Alignment and Video Synchronization for Multi-Video 4D Gaussian Splatting : Abstract: Modeling dynamic 3D scenes is challenging due to their high-dimensional nature, which requires aggregating information from multiple views to reconstruct time-evolving 3D geometry and motion...
A Retrieval-Augmented Generation Approach to Extracting Algorithmic Logic from Neural Networks : Abstract: Reusing existing neural-network components is central to research efficiency, yet discovering, extracting, and validating such modules across thousands of open-source repositories remains di...
Open Set Face Forgery Detection via Dual-Level Evidence Collection : Abstract: The proliferation of face forgeries has increasingly undermined confidence in the authenticity of online content. Given the rapid development of face forgery generation algorithms, new fake ...
MAFNet: Multi-frequency Adaptive Fusion Network for Real-time Stereo Matching : Abstract: Existing stereo matching networks typically rely on either cost-volume construction based on 3D convolutions or deformation methods based on iterative optimization. The former incurs signifi...
FMA-Net++: Motion- and Exposure-Aware Real-World Joint Video Super-Resolution and Deblurring : Abstract: Real-world video restoration is plagued by complex degradations from motion coupled with dynamically varying exposure, a key challenge largely overlooked by prior works and a common artifac...
Fourier-Attentive Representation Learning: A Fourier-Guided Framework for Few-Shot Generalization in Vision-Language Models : Abstract: Large-scale pre-trained Vision-Language Models (VLMs) have demonstrated strong few-shot learning capabilities. However, these methods typically learn holistic representations where an image'...
Performance Evaluation of Transfer Learning Based Medical Image Classification Techniques for Disease Detection : Abstract: Medical image classification plays an increasingly vital role in identifying various diseases by classifying medical images, such as X-rays, MRIs and CT scans, into different categories base...
Dual-Stream Spectral Decoupling Distillation for Remote Sensing Object Detection : Abstract: Knowledge distillation is an effective and hardware-friendly method, which plays a key role in lightweighting remote sensing object detection. However, existing distillation methods often en...
UTrice: Unifying Primitives in Differentiable Ray Tracing and Rasterization via Triangles for Particle-Based 3D Scenes : Abstract: Ray tracing 3D Gaussian particles enables realistic effects such as depth of field, refractions, and flexible camera modeling for novel-view synthesis. However, existing methods trace Gaussi...
Explainable Parkinson's Disease Gait Recognition Using Multimodal RGB-D Fusion and Large Language Models : Abstract: Accurate and interpretable gait analysis plays a crucial role in the early detection of Parkinson's disease (PD), yet most existing approaches remain limited by single-modality inputs, low rob...
Self-Paced and Self-Corrective Masked Prediction for Movie Trailer Generation : Abstract: As a challenging video editing task, movie trailer generation involves selecting and reorganizing movie shots to create engaging trailers. Currently, most existing automatic trailer generati...
MindDrive: An All-in-One Framework Bridging World Models and Vision-Language Model for End-to-End Autonomous Driving : Abstract: End-to-End autonomous driving (E2E-AD) has emerged as a new paradigm, where trajectory planning plays a crucial role. Existing studies mainly follow two directions: trajectory generation ori...
StreamEQA: Towards Streaming Video Understanding for Embodied Scenarios : Abstract: As embodied intelligence advances toward real-world deployment, the ability to continuously perceive and reason over streaming visual inputs becomes essential. In such settings, an agent mus...
GuidNoise: Single-Pair Guided Diffusion for Generalized Noise Synthesis : Abstract: Recent image denoising methods have leveraged generative modeling for real noise synthesis to address the costly acquisition of real-world noisy data. However, these generative models typica...
dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning : Abstract: The autonomous driving community is increasingly focused on addressing the challenges posed by out-of-distribution (OOD) driving scenarios. A dominant research trend seeks to enhance end-to-...
UniTS: Unified Time Series Generative Model for Remote Sensing : Abstract: One of the primary objectives of satellite remote sensing is to capture the complex dynamics of the Earth environment, which encompasses tasks such as reconstructing continuous cloud-free ti...
DeRA: Decoupled Representation Alignment for Video Tokenization : Abstract: This paper presents DeRA, a novel 1D video tokenizer that decouples the spatial-temporal representation learning in video tokenization to achieve better training efficiency and performance. ...
Not All Birds Look The Same: Identity-Preserving Generation For Birds : Abstract: Since the advent of controllable image generation, increasingly rich modes of control have enabled greater customization and accessibility for everyday users. Zero-shot, identity-preserving ...
Controllable Long-term Motion Generation with Extended Joint Targets : Abstract: Generating stable and controllable character motion in real-time is a key challenge in computer animation. Existing methods often fail to provide fine-grained control or suffer from motion d...
Shift-Window Meets Dual Attention: A Multi-Model Architecture for Specular Highlight Removal : Abstract: Inevitable specular highlights in practical environments severely impair the visual performance, thus degrading the task effectiveness and efficiency. Although there exist considerable metho...
Dual-branch Prompting for Multimodal Machine Translation : Abstract: Multimodal Machine Translation (MMT) typically enhances text-only translation by incorporating aligned visual features. Despite the remarkable progress, state-of-the-art MMT approaches often...
Beyond Flicker: Detecting Kinematic Inconsistencies for Generalizable Deepfake Video Detection : Abstract: Generalizing deepfake detection to unseen manipulations remains a key challenge. A recent approach to tackle this issue is to train a network with pristine face images that have been manipul...
OnSight Pathology: A real-time platform-agnostic computational pathology companion for histopathology : Abstract: The microscopic examination of surgical tissue remains a cornerstone of disease classification but relies on subjective interpretations and access to highly specialized experts, which can co...
Look Around and Pay Attention: Multi-camera Point Tracking Reimagined with Transformers : Abstract: This paper presents LAPA (Look Around and Pay Attention), a novel end-to-end transformer-based architecture for multi-camera point tracking that integrates appearance-based matching with geo...
Generalized Event Partonomy Inference with Structured Hierarchical Predictive Learning : Abstract: Humans naturally perceive continuous experience as a hierarchy of temporally nested events, fine-grained actions embedded within coarser routines. Replicating this structure in computer visi...
MoReGen: Multi-Agent Motion-Reasoning Engine for Code-based Text-to-Video Synthesis : Abstract: While text-to-video (T2V) generation has achieved remarkable progress in photorealism, generating intent-aligned videos that faithfully obey physics principles remains a core challenge. In t...
ReasonX: MLLM-Guided Intrinsic Image Decomposition : Abstract: Intrinsic image decomposition aims to separate images into physical components such as albedo, depth, normals, and illumination. While recent diffusion- and transformer-based models benefit ...
6 Fingers, 1 Kidney: Natural Adversarial Medical Images Reveal Critical Weaknesses of Vision-Language Models : Abstract: Vision-language models are increasingly integrated into clinical workflows. However, existing benchmarks primarily assess performance on common anatomical presentations and fail to capture t...
MVRoom: Controllable 3D Indoor Scene Generation with Multi-View Diffusion Models : Abstract: We introduce MVRoom, a controllable novel view synthesis (NVS) pipeline for 3D indoor scenes that uses multi-view diffusion conditioned on a coarse 3D layout. MVRoom employs a two-stage desi...
Geschlechtsübergreifende Maskulina im Sprachgebrauch: Eine korpusbasierte Untersuchung zu lexemspezifischen Unterschieden (Cross-Gender Masculines in Language Use: A Corpus-Based Study of Lexeme-Specific Differences) : Abstract: This study examines the distribution and linguistic characteristics of generic masculines (GM) in contemporary German press texts. The use of masculine personal nouns to refer to mixed-gende...
OsmT: Bridging OpenStreetMap Queries and Natural Language with Open-source Tag-aware Language Models : Abstract: Bridging natural language and structured query languages is a long-standing challenge in the database community. While recent advances in language models have shown promise in this direction...
SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs : Abstract: Extreme low-bit quantization is critical for efficiently deploying Large Language Models (LLMs), yet it often leads to severe performance degradation at 2-bits and even 4-bits (e.g., MXFP4)....
Model Whisper: Steering Vectors Unlock Large Language Models' Potential in Test-time : Abstract: It is a critical challenge to efficiently unlock the powerful reasoning potential of Large Language Models (LLMs) for specific tasks or new distributions. Existing test-time adaptation metho...
EtCon: Edit-then-Consolidate for Reliable Knowledge Editing : Abstract: Knowledge editing aims to update specific facts in large language models (LLMs) without full retraining. Prior efforts sought to tune the knowledge layers of LLMs, proving effective for maki...
Challenging the Abilities of Large Language Models in Italian: a Community Initiative : Abstract: The rapid progress of Large Language Models (LLMs) has transformed natural language processing and broadened its impact across research and society. Yet, systematic evaluation of these model...
AdiBhashaa: A Community-Curated Benchmark for Machine Translation into Indian Tribal Languages : Abstract: Large language models and multilingual machine translation (MT) systems increasingly drive access to information, yet many languages of the tribal communities remain effectively invisible in...
DaLA: Danish Linguistic Acceptability Evaluation Guided by Real World Errors : Abstract: We present an enhanced benchmark for evaluating linguistic acceptability in Danish. We first analyze the most common errors found in written Danish. Based on this analysis, we introduce a se...
DAMASHA: Detecting AI in Mixed Adversarial Texts via Segmentation with Human-interpretable Attribution : Abstract: In the age of advanced large language models (LLMs), the boundaries between human and AI-generated text are becoming increasingly blurred. We address the challenge of segmenting mixed-author...
Mitigating Catastrophic Forgetting in Target Language Adaptation of LLMs via Source-Shielded Updates : Abstract: Expanding the linguistic diversity of instruct large language models (LLMs) is crucial for global accessibility but is often hindered by the reliance on costly specialized target language la...
SEAL: Self-Evolving Agentic Learning for Conversational Question Answering over Knowledge Graphs : Abstract: Knowledge-based conversational question answering (KBCQA) confronts persistent challenges in resolving coreference, modeling contextual dependencies, and executing complex logical reasoning....
LLMs Know More Than Words: A Genre Study with Syntax, Metaphor & Phonetics : Abstract: Large language models (LLMs) demonstrate remarkable potential across diverse language related tasks, yet whether they capture deeper linguistic properties, such as syntactic structure, phone...
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction : Abstract: The evolution of Large Language Models (LLMs) from passive responders to autonomous agents necessitates a fundamental shift in learning paradigms -- from static imitation to incentive-driven...
Factuality and Transparency Are All RAG Needs! Self-Explaining Contrastive Evidence Re-ranking : Abstract: This extended abstract introduces Self-Explaining Contrastive Evidence Re-Ranking (CER), a novel method that restructures retrieval around factual evidence by fine-tuning embeddings with con...
Human-Centred Evaluation of Text-to-Image Generation Models for Self-expression of Mental Distress: A Dataset Based on GPT-4o : Abstract: Effective communication is central to achieving positive healthcare outcomes in mental health contexts, yet international students often face linguistic and cultural barriers that hinder the...
Retrieval-Augmented Few-Shot Prompting Versus Fine-Tuning for Code Vulnerability Detection : Abstract: Few-shot prompting has emerged as a practical alternative to fine-tuning for leveraging the capabilities of large language models (LLMs) in specialized tasks. However, its effectiveness depe...
Towards Contextual Sensitive Data Detection : Abstract: The emergence of open data portals necessitates more attention to protecting sensitive data before datasets get published and exchanged. While an abundance of methods for suppressing sensiti...
Can machines perform a qualitative data analysis? Reading the debate with Alan Turing : Abstract: This paper reflects on the literature that rejects the use of Large Language Models (LLMs) in qualitative data analysis. It illustrates through empirical evidence as well as critical reflect...
Balancing Safety and Helpfulness in Healthcare AI Assistants through Iterative Preference Alignment : Abstract: Large Language Models (LLMs) are increasingly used in healthcare, yet ensuring their safety and trustworthiness remains a barrier to deployment. Conversational medical assistants must avoid ...
Text-Only Training for Image Captioning with Retrieval Augmentation and Modality Gap Correction : Abstract: Image captioning has drawn considerable attention from the natural language processing and computer vision fields. Aiming to reduce the reliance on curated data, several studies have explore...
Limit cycles for speech : Abstract: Rhythmic fluctuations in acoustic energy and accompanying neuronal excitations in cortical oscillations are characteristic of human speech, yet whether a corresponding rhythmicity inheres in...
Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs : Abstract: Graph topology is a fundamental determinant of memory leakage in multi-agent LLM systems, yet its effects remain poorly quantified. We introduce MAMA (Multi-Agent Memory Attack), a framework...
Towards Ethical Multi-Agent Systems of Large Language Models: A Mechanistic Interpretability Perspective : Abstract: Large language models (LLMs) have been widely deployed in various applications, often functioning as autonomous agents that interact with each other in multi-agent systems. While these syste...
Are LLMs Truly Multilingual? Exploring Zero-Shot Multilingual Capability of LLMs for Information Retrieval: An Italian Healthcare Use Case : Abstract: Large Language Models (LLMs) have become a key topic in AI and NLP, transforming sectors like healthcare, finance, education, and marketing by improving customer service, automating tasks, p...
The AI Consumer Index (ACE) : Abstract: We introduce the first version of the AI Consumer Index (ACE), a benchmark for assessing whether frontier AI models can perform high-value consumer tasks. ACE contains a hidden held-out set o...
Algorithmic Thinking Theory : Abstract: Large language models (LLMs) have proven to be highly effective for solving complex reasoning tasks. Surprisingly, their capabilities can often be improved by iterating on previously generat...
One-shot acceleration of transient PDE solvers via online-learned preconditioners : Abstract: Data-driven acceleration of scientific computing workflows has been a high-profile aim of machine learning (ML) for science, with numerical simulation of transient partial differential equat...
On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral : Abstract: Tool-integrated (TI) reinforcement learning (RL) enables large language models (LLMs) to perform multi-step reasoning by interacting with external tools such as search engines and retrievers...
SQuARE: Structured Query & Adaptive Retrieval Engine For Tabular Formats : Abstract: Accurate question answering over real spreadsheets remains difficult due to multirow headers, merged cells, and unit annotations that disrupt naive chunking, while rigid SQL views fail on fi...
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle : Abstract: Real-world enterprise data intelligence workflows encompass data engineering that turns raw sources into analysis-ready tables and data analysis that converts those tables into decision-ori...
ClusterFusion: Hybrid Clustering with Embedding Guidance and LLM Adaptation : Abstract: Text clustering is a fundamental task in natural language processing, yet traditional clustering algorithms with pre-trained embeddings often struggle in domain-specific contexts without cos...
LangSAT: A Novel Framework Combining NLP and Reinforcement Learning for SAT Solving : Abstract: Our work presents a novel reinforcement learning (RL) based framework to optimize heuristic selection within the conflict-driven clause learning (CDCL) process, improving the efficiency of B...
MASE: Interpretable NLP Models via Model-Agnostic Saliency Estimation : Abstract: Deep neural networks (DNNs) have made significant strides in Natural Language Processing (NLP), yet their interpretability remains elusive, particularly when evaluating their intricate decis...
RapidUn: Influence-Driven Parameter Reweighting for Efficient Large Language Model Unlearning : Abstract: Removing specific data influence from large language models (LLMs) remains challenging, as retraining is costly and existing approximate unlearning methods are often unstable. The challenge ...
MSME: A Multi-Stage Multi-Expert Framework for Zero-Shot Stance Detection : Abstract: LLM-based approaches have recently achieved impressive results in zero-shot stance detection. However, they still struggle in complex real-world scenarios, where stance understanding require...
UW-BioNLP at ChemoTimelines 2025: Thinking, Fine-Tuning, and Dictionary-Enhanced LLM Systems for Chemotherapy Timeline Extraction : Abstract: The ChemoTimelines shared task benchmarks methods for constructing timelines of systemic anticancer treatment from electronic health records of cancer patients. This paper describes our meth...
EvoEdit: Lifelong Free-Text Knowledge Editing through Latent Perturbation Augmentation and Knowledge-driven Parameter Fusion : Abstract: Adjusting the outdated knowledge of large language models (LLMs) after deployment remains a major challenge. This difficulty has spurred the development of knowledge editing, which seeks to ...
AdmTree: Compressing Lengthy Context with Adaptive Semantic Trees : Abstract: The quadratic complexity of self-attention constrains Large Language Models (LLMs) in processing long contexts, a capability essential for many advanced applications. Context compression aim...
ADAPT: Learning Task Mixtures for Budget-Constrained Instruction Tuning : Abstract: We propose ADAPT, a meta-learning algorithm that learns task sampling proportions under an explicit token budget for multi-task instruction tuning. Instead of fixing task weights by h...
LexGenius: An Expert-Level Benchmark for Large Language Models in Legal General Intelligence : Abstract: Legal general intelligence (GI) refers to artificial intelligence (AI) that encompasses legal understanding, reasoning, and decision-making, simulating the expertise of legal experts across ...
ArterialNet: Reconstructing Arterial Blood Pressure Waveform with Wearable Pulsatile Signals, a Cohort-Aware Approach : Abstract: Goal: Continuous arterial blood pressure (ABP) waveform is invasive but essential for hemodynamic monitoring. Current non-invasive techniques reconstruct ABP waveforms with pulsatile signals...
Convolutional Monge Mapping between EEG Datasets to Support Independent Component Labeling : Abstract: EEG recordings contain rich information about neural activity but are subject to artifacts, noise, and superficial differences due to sensors, amplifiers, and filtering. Independent componen...
Beyond I-Con: Exploring New Dimension of Distance Measures in Representation Learning : Abstract: The Information Contrastive (I-Con) framework revealed that over 23 representation learning methods implicitly minimize KL divergence between data and learned distributions that encode simil...
Towards an end-to-end artificial intelligence driven global weather forecasting system : Abstract: The weather forecasting system is important for science and society, and significant achievements have been made in applying artificial intelligence (AI) to medium-range weather forecasting....
Contract-Driven QoE Auditing for Speech and Singing Services: From MOS Regression to Service Graphs : Abstract: Subjective mean opinion scores (MOS) remain the de facto target for non-intrusive speech and singing quality assessment. However, MOS is a scalar that collapses heterogeneous user expectatio...
Tokenizing Buildings: A Transformer for Layout Synthesis : Abstract: We introduce Small Building Model (SBM), a Transformer-based architecture for layout synthesis in Building Information Modeling (BIM) scenes. We address the question of how to tokenize build...
Series of quasi-uniform scatterings with fast search, root systems and neural network classifications : Abstract: In this paper we describe an approach to construct large extendable collections of vectors in predefined spaces of given dimensions. These collections are useful for neural network latent sp...
STELLA: Guiding Large Language Models for Time Series Forecasting with Semantic Abstractions : Abstract: Recent adaptations of Large Language Models (LLMs) for time series forecasting often fail to effectively enhance information for raw series, leaving LLM reasoning capabilities underutilized....
Shorting Dynamics and Structured Kernel Regularization : Abstract: This paper develops a nonlinear operator dynamic that progressively removes the influence of a prescribed feature subspace while retaining maximal structure elsewhere. The induced sequence o...
Environment-Aware Channel Inference via Cross-Modal Flow: From Multimodal Sensing to Wireless Channels : Abstract: Accurate channel state information (CSI) underpins reliable and efficient wireless communication. However, acquiring CSI via pilot estimation incurs substantial overhead, especially in massi...
Rethinking the Use of Vision Transformers for AI-Generated Image Detection : Abstract: Rich feature representations derived from CLIP-ViT have been widely utilized in AI-generated image detection. While most existing methods primarily leverage features from the final layer, we...
Learning Causality for Longitudinal Data : Abstract: This thesis develops methods for causal inference and causal representation learning (CRL) in high-dimensional, time-varying data. The first contribution introduces the Causal Dynamic Vari...
Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias in LVLM-Based Text-to-Image Models : Abstract: Large vision-language model (LVLM) based text-to-image (T2I) systems have become the dominant paradigm in image generation, yet whether they amplify social biases remains insufficiently unde...
Towards a unified framework for guided diffusion models : Abstract: Guided or controlled data generation with diffusion models (partial preliminary results of this work appeared in the International Conference on Machine Learning 2025)...
Evolutionary Architecture Search through Grammar-Based Sequence Alignment : Abstract: Neural architecture search (NAS) in expressive search spaces is a computationally hard problem, but it also holds the potential to automatically discover completely novel and performant arch...
HTR-ConvText: Leveraging Convolution and Textual Information for Handwritten Text Recognition : Abstract: Handwritten Text Recognition remains challenging due to the limited data, high writing style variance, and scripts with complex diacritics. Existing approaches, though partially addressing thes...
Model-Free Assessment of Simulator Fidelity via Quantile Curves : Abstract: Simulation of complex systems originated in manufacturing and queuing applications. It is now widely used for large-scale, ML-based systems in research, education, and consumer surveys. Howe...
Arbitrage: Efficient Reasoning via Advantage-Aware Speculation : Abstract: Modern Large Language Models achieve impressive reasoning capabilities with long Chain of Thoughts, but they incur substantial computational cost during inference, and this motivates techniq...
QKAN-LSTM: Quantum-inspired Kolmogorov-Arnold Long Short-term Memory : Abstract: Long short-term memory (LSTM) models are a particular type of recurrent neural networks (RNNs) that are central to sequential modeling tasks in domains such as urban telecommunication foreca...
Meta-Learning for Quantum Optimization via Quantum Sequence Model : Abstract: The Quantum Approximate Optimization Algorithm (QAOA) is a leading approach for solving combinatorial optimization problems on near-term quantum processors. However, finding good variational...
Control Consistency Losses for Diffusion Bridges : Abstract: Simulating the conditioned dynamics of diffusion processes, given their initial and terminal states, is an important but challenging problem in the sciences. The difficulty is particularly p...
Foundations of Diffusion Models in General State Spaces: A Self-Contained Introduction : Abstract: Although diffusion models now occupy a central place in generative modeling, introductory treatments commonly assume Euclidean data and seldom clarify their connection to discrete-state anal...
Structured Document Translation via Format Reinforcement Learning : Abstract: Recent works on structured text translation remain limited to the sentence level, as they struggle to effectively handle the complex document-level XML or HTML structures. To address this, w...
Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning : Abstract: Long context reasoning in large language models (LLMs) has demonstrated enhancement of their cognitive capabilities via chain-of-thought (CoT) inference. Training such models is usually done...
NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation : Abstract: Standard diffusion corrupts data using Gaussian noise whose Fourier coefficients have random magnitudes and random phases. While effective for unconditional or text-to-image generation, corr...
DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation : Abstract: Recent unified multimodal large language models (MLLMs) have shown impressive capabilities, incorporating chain-of-thought (CoT) reasoning for enhanced text-to-image generation. However, exi...
Safe Online Bid Optimization with Return on Investment and Budget Constraints : Abstract: In online marketing, the advertisers aim to balance achieving high volumes and high profitability. The companies' business units address this tradeoff by maximizing the volumes while guarant...
ImageNot: A contrast with ImageNet preserves model rankings : Abstract: We introduce ImageNot, a dataset constructed explicitly to be drastically different than ImageNet while matching its scale. ImageNot is designed to test the external validity of deep learnin...
FusionBench: A Unified Library and Comprehensive Benchmark for Deep Model Fusion : Abstract: Deep model fusion is an emerging technique that unifies the predictions or parameters of several deep neural networks into a single better-performing model in a cost-effective and data-effic...
NITRO-D: Native Integer-only Training of Deep Convolutional Neural Networks : Abstract: Quantization is a pivotal technique for managing the growing computational and memory demands of Deep Neural Networks (DNNs). By reducing the number of bits used to represent weights and act...
Convergence Analysis for Deep Sparse Coding via Convolutional Neural Networks : Abstract: In this work, we explore the intersection of sparse coding theory and deep learning to enhance our understanding of feature extraction capabilities in advanced neural network architectures. ...
Educational Cone Model in Embedding Vector Spaces : Abstract: Human-annotated datasets with explicit difficulty ratings are essential in intelligent educational systems. Although embedding vector spaces are widely used to represent semantic closeness a...
Computational Linguistics Meets Libyan Dialect: A Study on Dialect Identification : Abstract: This study investigates logistic regression, linear support vector machine, multinomial Naive Bayes, and Bernoulli Naive Bayes for classifying Libyan dialect utterances gathered from Twitter...
Polynomiogram: An Integrated Framework for Root Visualization and Generative Art : Abstract: This work presents the Polynomiogram framework, an integrated computational platform for exploring, visualizing, and generating art from polynomial root systems. The main innovation is a fle...
The Geometry of Benchmarks: A New Path Toward AGI : Abstract: Benchmarks are the primary tool for assessing progress in artificial intelligence (AI), yet current practice evaluates models on isolated test suites and provides little guidance for reasoni...
Inference-time Stochastic Refinement of GRU-Normalizing Flow for Real-time Video Motion Transfer : Abstract: Real-time video motion transfer applications such as immersive gaming and vision-based anomaly detection require accurate yet diverse future predictions to support realistic synthesis and ro...
Plug-and-Play Image Restoration with Flow Matching: A Continuous Viewpoint : Abstract: Flow matching-based generative models have been integrated into the plug-and-play image restoration framework, and the resulting plug-and-play flow matching (PnP-Flow) model has achieved som...
Bayes-DIC Net: Estimating Digital Image Correlation Uncertainty with Bayesian Neural Networks : Abstract: This paper introduces a novel method for generating a high-quality Digital Image Correlation (DIC) dataset based on non-uniform B-spline surfaces. By randomly generating control point coordina...
One Detector Fits All: Robust and Adaptive Detection of Malicious Packages from PyPI to Enterprises : Abstract: The rise of supply chain attacks via malicious Python packages demands robust detection solutions. Current approaches, however, overlook two critical challenges: robustness against adversari...
Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive Alignment : Abstract: Recent advances in multimodal LLMs (MLLMs) have demonstrated their remarkable capability to generate descriptive captions for input videos. However, these models suffer from factual inaccu...
AutoGuard: A Self-Healing Proactive Security Layer for DevSecOps Pipelines Using Reinforcement Learning : Abstract: Contemporary DevSecOps pipelines have to deal with evolving security threats in a continuously integrated and deployed environment. Existing methods, such as rule-based intrusion detect...
Constructive Approximation under Carleman's Condition, with Applications to Smoothed Analysis : Abstract: A classical result of Carleman, based on the theory of quasianalytic functions, shows that polynomials are dense in $L^2(μ)$ for any $μ$ such that the moments $\int x^k dμ$ do not grow too r...
Informative missingness and its implications in semi-supervised learning : Abstract: Semi-supervised learning (SSL) constructs classifiers using both labelled and unlabelled data. It leverages information from labelled samples, whose acquisition is often costly or labour-int...
Sarcasm Detection on Reddit Using Classical Machine Learning and Feature Engineering : Abstract: Sarcasm is common in online discussions, yet difficult for machines to identify because the intended meaning often contradicts the literal wording. In this work, I study sarcasm detection us...
Predicting Time-Dependent Flow Over Complex Geometries Using Operator Networks : Abstract: Fast, geometry-generalizing surrogates for unsteady flow remain challenging. We present a time-dependent, geometry-aware Deep Operator Network that predicts velocity fields for moderate-Re f...
NORi: An ML-Augmented Ocean Boundary Layer Parameterization : Abstract: NORi is a machine-learned (ML) parameterization of ocean boundary layer turbulence that is physics-based and augmented with neural networks. NORi stands for neural ordinary differential equa...
Mathematical Framing for Different Agent Strategies : Abstract: We introduce a unified mathematical and probabilistic framework for understanding and comparing diverse AI agent strategies. We bridge the gap between high-level agent design concepts, such ...
When Robots Should Say "I Don't Know": Benchmarking Abstention in Embodied Question Answering : Abstract: Embodied Question Answering (EQA) requires an agent to interpret language, perceive its environment, and navigate within 3D scenes to produce responses. Existing EQA benchmarks assume that e...
SEASON: Mitigating Temporal Hallucination in Video Large Language Models via Self-Diagnostic Contrastive Decoding : Abstract: Video Large Language Models (VideoLLMs) have shown remarkable progress in video understanding. However, these models still struggle to effectively perceive and exploit rich temporal informat...
Semi Centralized Training Decentralized Execution Architecture for Multi Agent Deep Reinforcement Learning in Traffic Signal Control : Abstract: Multi-agent reinforcement learning (MARL) has emerged as a promising paradigm for adaptive traffic signal control (ATSC) of multiple intersections. Existing approaches typically follow eithe...
Fermionic neural Gibbs states : Abstract: We introduce fermionic neural Gibbs states (fNGS), a variational framework for modeling finite-temperature properties of strongly interacting fermions. fNGS starts from a reference mean-fiel...
Recurrent Neural Networks with Linear Structures for Electricity Price Forecasting : Abstract: We present a novel recurrent neural network architecture designed explicitly for day-ahead electricity price forecasting, aimed at improving short-term decision-making and operational manage...
Provable FDR Control for Deep Feature Selection: Deep MLPs and Beyond : Abstract: We develop a flexible feature selection framework based on deep neural networks that approximately controls the false discovery rate (FDR), a measure of Type-I error. The method applies to a...
Continuous-time reinforcement learning for optimal switching over multiple regimes : Abstract: This paper studies the continuous-time reinforcement learning (RL) for optimal switching problems across multiple regimes. We consider a type of exploratory formulation under entropy regular...
Sequential Enumeration in Large Language Models : Abstract: Reliably counting and generating sequences of items remain a significant challenge for neural networks, including Large Language Models (LLMs). Indeed, although this capability is readily ha...
Complementary Characterization of Agent-Based Models via Computational Mechanics and Diffusion Models : Abstract: This article extends the preprint "Characterizing Agent-Based Model Dynamics via ε-Machines and Kolmogorov-Style Complexity" by introducing diffusion models as orthogonal and complementary...
Pick-to-Learn for Systems and Control: Data-driven Synthesis with State-of-the-art Safety Guarantees : Abstract: Data-driven methods have become paramount in modern systems and control problems characterized by growing levels of complexity. In safety-critical environments, deploying these methods requi...
TimesNet-Gen: Deep Learning-based Site Specific Strong Motion Generation : Abstract: Effective earthquake risk reduction relies on accurate site-specific evaluations. This requires models that can represent the influence of local site conditions on ground motion characterist...
TRINITY: An Evolved LLM Coordinator : Abstract: Combining diverse foundation models is promising, but weight-merging is limited by mismatched architectures and closed APIs. Trinity addresses this with a lightweight coordinator that orches...
Towards Continuous-Time Approximations for Stochastic Gradient Descent without Replacement : Abstract: Gradient optimization algorithms using epochs, that is those based on stochastic gradient descent without replacement (SGDo), are predominantly used to train machine learning models in pract...
A Tutorial on Regression Analysis: From Linear Models to Deep Learning -- Lecture Notes on Artificial Intelligence : Abstract: This article serves as the regression analysis lecture notes in the Intelligent Computing course cluster (including the courses of Artificial Intelligence, Data Mining, Machine Learning, and...
RLHFSpec: Breaking the Efficiency Bottleneck in RLHF Training via Adaptive Drafting : Abstract: Reinforcement Learning from Human Feedback (RLHF) is an important fine-tuning technique for large language models (LLMs) and comprises three stages: generation, inference, and training. The ...
MemLoRA: Distilling Expert Adapters for On-Device Memory Systems : Abstract: Memory-augmented Large Language Models (LLMs) have demonstrated remarkable consistency during prolonged dialogues by storing relevant memories and incorporating them as context. Such memory-...
A result relating convex n-widths to covering numbers with some applications to neural networks : Abstract: In general, approximating classes of functions defined over high-dimensional input spaces by linear combinations of a fixed set of basis functions or "features" is known to be hard. Typica...
Multi-Agent Reinforcement Learning for Intraday Operating Rooms Scheduling under Uncertainty : Abstract: Intraday surgical scheduling is a multi-objective decision problem under uncertainty, balancing elective throughput, urgent and emergency demand, delays, sequence-dependent setups, and overti...
CARL: Critical Action Focused Reinforcement Learning for Multi-Step Agent : Abstract: Agents capable of accomplishing complex tasks through multiple interactions with the environment have emerged as a popular research direction. However, in such multi-step settings, the conve...
Amortized Inference of Multi-Modal Posteriors using Likelihood-Weighted Normalizing Flows : Abstract: We present a novel technique for amortized posterior estimation using Normalizing Flows trained with likelihood-weighted importance sampling. This approach allows for the efficient inference...
Realizable Abstractions: Near-Optimal Hierarchical Reinforcement Learning : Abstract: The main focus of Hierarchical Reinforcement Learning (HRL) is studying how large Markov Decision Processes (MDPs) can be more efficiently solved when addressed in a modular way, by combinin...
Efficient Generative Transformer Operators For Million-Point PDEs : Abstract: We introduce ECHO, a transformer-operator framework for generating million-point PDE trajectories. While existing neural operators (NOs) have shown promise for solving partial differential e...
Dual-Path Region-Guided Attention Network for Ground Reaction Force and Moment Regression : Abstract: Accurate estimation of three-dimensional ground reaction forces and moments (GRFs/GRMs) is crucial for both biomechanics research and clinical rehabilitation evaluation. In this study, we fo...
SuperActivators: Only the Tail of the Distribution Contains Reliable Concept Signals : Abstract: Concept vectors aim to enhance model interpretability by linking internal representations with human-understandable semantics, but their utility is often limited by noisy and inconsistent ac...
Multi-LLM Collaboration for Medication Recommendation : Abstract: As healthcare increasingly turns to AI for scalable and trustworthy clinical decision support, ensuring reliability in model reasoning remains a critical challenge. Individual large language...
Hybrid Quantum-Classical Autoencoders for Unsupervised Network Intrusion Detection : Abstract: Unsupervised anomaly-based intrusion detection requires models that can generalize to attack patterns not observed during training. This work presents the first large-scale evaluation of hyb...
David vs. Goliath: Can Small Models Win Big with Agentic AI in Hardware Design? : Abstract: Large Language Model (LLM) inference demands massive compute and energy, making domain-specific tasks expensive and unsustainable. As foundation models keep scaling, we ask: Is bigger always ...
OMTRA: A Multi-Task Generative Model for Structure-Based Drug Design : Abstract: Structure-based drug design (SBDD) focuses on designing small-molecule ligands that bind to specific protein pockets. Computational methods are integral in modern SBDD workflows and often ma...
Gradient Descent with Provably Tuned Learning-rate Schedules : Abstract: Gradient-based iterative optimization methods are the workhorse of modern machine learning. They crucially rely on careful tuning of parameters like learning rate and momentum. However, one ...
The Geometry of Intelligence: Deterministic Functional Topology as a Foundation for Real-World Perception : Abstract: Real-world physical processes do not generate arbitrary variability: their signals concentrate on compact and low-variability subsets of functional space. This geometric structure enables ra...
TV2TV: A Unified Framework for Interleaved Language and Video Generation : Abstract: Video generation models are rapidly advancing, but can still struggle with complex video outputs that require significant semantic branching or repeated high-level reasoning about what shoul...
Deep infant brain segmentation from multi-contrast MRI : Abstract: Segmentation of magnetic resonance images (MRI) facilitates analysis of human brain development by delineating anatomical structures. However, in infants and young children, accurate segment...
Value Gradient Guidance for Flow Matching Alignment : Abstract: While methods exist for aligning flow matching models--a popular and effective class of generative models--with human preferences, existing approaches fail to achieve both adaptation efficie...
The Universal Weight Subspace Hypothesis : Abstract: We show that deep neural networks trained across diverse tasks exhibit remarkably similar low-dimensional parametric subspaces. We provide the first large-scale empirical evidence that demon...
Rethinking AI Evaluation in Education: The TEACH-AI Framework and Benchmark for Generative AI Assistants : Abstract: As generative artificial intelligence (AI) continues to transform education, most existing AI evaluations rely primarily on technical performance metrics such as accuracy or task efficiency ...
AI-Enabled grading with near-domain data for scaling feedback with human-level accuracy : Abstract: Constructed-response questions are crucial to encourage generative processing and test a learner's understanding of core concepts. However, the limited availability of instructor time, large...
Patient Safety Risks from AI Scribes: Signals from End-User Feedback : Abstract: AI scribes are transforming clinical documentation at scale. However, their real-world performance remains understudied, especially regarding their impacts on patient safety. To this end, we...
Measuring Agents in Production : Abstract: AI agents are actively running in production across diverse industries, yet little is publicly known about which technical approaches enable successful real-world deployments. We present the...
Enhancing next token prediction based pre-training for jet foundation models : Abstract: Next token prediction is an attractive pre-training task for jet foundation models, in that it is simulation free and enables excellent generative capabilities that can transfer across datas...
The Initialization Determines Whether In-Context Learning Is Gradient Descent : Abstract: In-context learning (ICL) in large language models (LLMs) is a striking phenomenon, yet its underlying mechanisms remain only partially understood. Previous work connects linear self-attenti...
Bootstrapped Mixed Rewards for RL Post-Training: Injecting Canonical Action Order : Abstract: Post-training with reinforcement learning (RL) typically optimizes a single scalar objective and ignores structure in how solutions are produced. We ask whether a scalar hint toward a canoni...
GRASP: GRouped Activation Shared Parameterization for Parameter-Efficient Fine-Tuning and Robust Inference of Transformers : Abstract: Parameter-efficient fine-tuning (PEFT) provides a scalable alternative to full-model adaptation by updating only a small subset of parameters in large pre-trained models. We introduce GRASP ...
When do spectral gradient updates help in deep learning? : Abstract: Spectral gradient methods, such as the recently popularized Muon optimizer, are a promising alternative to standard Euclidean gradient descent for training deep neural networks and transform...
Evaluating Long-Context Reasoning in LLM-Based WebAgents : Abstract: As large language model (LLM)-based agents become increasingly integrated into daily digital interactions, their ability to reason across long interaction histories becomes crucial for provi...
RNNs perform task computations by dynamically warping neural representations : Abstract: Analysing how neural networks represent data features in their activations can help interpret how they perform tasks. Hence, a long line of work has focused on mathematically characterising ...
Data-regularized Reinforcement Learning for Diffusion Models at Scale : Abstract: Aligning generative diffusion models with human preferences via reinforcement learning (RL) is critical yet challenging. Most existing algorithms are often vulnerable to reward hacking, such...
RGE-GCN: Recursive Gene Elimination with Graph Convolutional Networks for RNA-seq based Early Cancer Detection : Abstract: Early detection of cancer plays a key role in improving survival rates, but identifying reliable biomarkers from RNA-seq data is still a major challenge. The data are high-dimensional, and c...
Long-Horizon Model-Based Offline Reinforcement Learning Without Conservatism : Abstract: Popular offline reinforcement learning (RL) methods rely on conservatism, either by penalizing out-of-dataset actions or by restricting planning horizons. In this work, we question the unive...
Distance Is All You Need: Radial Dispersion for Uncertainty Estimation in Large Language Models : Abstract: Detecting when large language models (LLMs) are uncertain is critical for building reliable systems, yet existing methods are overly complicated, relying on brittle semantic clustering or in...
SmartAlert: Implementing Machine Learning-Driven Clinical Decision Support for Inpatient Lab Utilization Reduction : Abstract: Repetitive laboratory testing unlikely to yield clinically useful information is a common practice that burdens patients and increases healthcare costs. Education and feedback interventions ...
STeP-Diff: Spatio-Temporal Physics-Informed Diffusion Models for Mobile Fine-Grained Pollution Forecasting : Abstract: Fine-grained air pollution forecasting is crucial for urban management and the development of healthy buildings. Deploying portable sensors on mobile platforms such as cars and buses offers ...
Learning to Orchestrate Agents in Natural Language with the Conductor : Abstract: Powerful large language models (LLMs) from different providers have been expensively trained and finetuned to specialize across varying domains. In this work, we introduce a new kind of Cond...
Feature Engineering vs. Deep Learning for Automated Coin Grading: A Comparative Study on Saint-Gaudens Double Eagles : Abstract: We challenge the common belief that deep learning always trumps older techniques, using the example of grading Saint-Gaudens Double Eagle gold coins automatically. In our work, we put a feat...
GraphBench: Next-generation graph learning benchmarking : Abstract: Machine learning on graphs has recently achieved impressive progress in various domains, including molecular property prediction and chip design. However, benchmarking practices remain fragm...
Context-Aware Mixture-of-Experts Inference on CXL-Enabled GPU-NDP Systems : Abstract: Mixture-of-Experts (MoE) models scale large language models through conditional computation, but inference becomes memory-bound once expert weights exceed the capacity of GPU memory. In this...
Prototype-Based Semantic Consistency Alignment for Domain Adaptive Retrieval : Abstract: Domain adaptive retrieval aims to transfer knowledge from a labeled source domain to an unlabeled target domain, enabling effective retrieval while mitigating domain discrepancies. However, ...
Explainable Graph Representation Learning via Graph Pattern Analysis : Abstract: Explainable artificial intelligence (XAI) is an important area in the AI community, and interpretability is crucial for building robust and trustworthy AI models. While previous work has exp...
On the Limits of Test-Time Compute: Sequential Reward Filtering for Better Inference : Abstract: Test-time compute (TTC) has become an increasingly prominent paradigm for enhancing large language models (LLMs). Despite the empirical success of methods such as best-of-n (BoN) sampling ...
Diffusion Fine-Tuning via Reparameterized Policy Gradient of the Soft Q-Function : Abstract: Diffusion models excel at generating high-likelihood samples but often require alignment with downstream objectives. Existing fine-tuning methods for diffusion models significantly suffer fr...
LeMat-GenBench: A Unified Evaluation Framework for Crystal Generative Models : Abstract: Generative machine learning (ML) models hold great promise for accelerating materials discovery through the inverse design of inorganic crystals, enabling an unprecedented exploration of che...
Reliable Statistical Guarantees for Conformal Predictors with Small Datasets : Abstract: Surrogate models (including deep neural networks and other machine learning algorithms in supervised learning) are capable of approximating arbitrarily complex, high-dimensional input-output...
Temp-SCONE: A Novel Out-of-Distribution Detection and Domain Generalization Framework for Wild Data with Temporal Shift : Abstract: Open-world learning (OWL) requires models that can adapt to evolving environments while reliably detecting out-of-distribution (OOD) inputs. Existing approaches, such as SCONE, achieve robus...
Exploiting ftrace's function_graph Tracer Features for Machine Learning: A Case Study on Encryption Detection : Abstract: This paper proposes using the Linux kernel ftrace framework, particularly the function graph tracer, to generate informative system level data for machine learning (ML) applications. Experim...
QoSDiff: An Implicit Topological Embedding Learning Framework Leveraging Denoising Diffusion and Adversarial Attention for Robust QoS Prediction : Abstract: Accurate Quality of Service (QoS) prediction is fundamental to service computing, providing essential data-driven guidance for service selection and ensuring superior user experiences. Howev...
Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space : Abstract: Large language model (LLM) agents -- LLMs that dynamically interact with an environment over long horizons -- have become an increasingly important area of research, enabling automation in c...
Score Matching for Estimating Finite Point Processes : Abstract: Score matching estimators have garnered significant attention in recent years because they eliminate the need to compute normalizing constants, thereby mitigating the computational challenge...
Rethinking Decoupled Knowledge Distillation: A Predictive Distribution Perspective : Abstract: In the history of knowledge distillation, the focus has once shifted over time from logit-based to feature-based approaches. However, this transition has been revisited with the advent of De...
Federated Learning for Anomaly Detection in Maritime Movement Data : Abstract: This paper introduces M3fed, a novel solution for federated learning of movement anomaly detection models. This innovation has the potential to improve data privacy and reduce communication ...
Contract-Governed Training for Earth Observation: Observed Service Agreement Graphs and Coverage-Accuracy Trade-offs : Abstract: Earth observation (EO) models are frequently trained under implicit sampling policies that optimize global accuracy but provide no explicit guarantees on who (which regions, classes, or miss...
ASCIIBench: Evaluating Language-Model-Based Understanding of Visually-Oriented Text : Abstract: Large language models (LLMs) have demonstrated several emergent behaviors with scale, including reasoning and fluency in long-form text generation. However, they continue to struggle with ta...
Decoding Large Language Diffusion Models with Foreseeing Movement : Abstract: Large Language Diffusion Models (LLDMs) benefit from a flexible decoding mechanism that enables parallelized inference and controllable generations over autoregressive models. Yet such flexi...
MechDetect: Detecting Data-Dependent Errors : Abstract: Data quality monitoring is a core challenge in modern information processing systems. While many approaches to detect data errors or shifts have been proposed, few studies investigate the me...
Mitigating the Curse of Detail: Scaling Arguments for Feature Learning and Sample Complexity : Abstract: Two pressing topics in the theory of deep learning are the interpretation of feature learning mechanisms and the determination of implicit bias of networks in the rich regime. Current theori...
BEP: A Binary Error Propagation Algorithm for Binary Neural Networks Training : Abstract: Binary Neural Networks (BNNs), which constrain both weights and activations to binary values, offer substantial reductions in computational complexity, memory footprint, and energy consumpti...
Network of Theseus (like the ship) : Abstract: A standard assumption in deep learning is that the inductive bias introduced by a neural network architecture must persist from training through inference. The architecture you train with is...
ActVAE: Modelling human activity schedules with a deep conditional generative approach : Abstract: Modelling the complexity and diversity of human activity scheduling behaviour is inherently challenging. We demonstrate a deep conditional-generative machine learning approach for the modell...
Fine-Tuning ChemBERTa for Predicting Inhibitory Activity Against TDP1 Using Deep Learning : Abstract: Predicting the inhibitory potency of small molecules against Tyrosyl-DNA Phosphodiesterase 1 (TDP1), a key target in overcoming cancer chemoresistance, remains a critical challenge in early dr...
Studying Various Activation Functions and Non-IID Data for Machine Learning Model Robustness : Abstract: Adversarial training is an effective method to improve the machine learning (ML) model robustness. Most existing studies typically consider the Rectified linear unit (ReLU) activation functi...
Sponsored Questions and How to Auction Them : Abstract: Online platforms connect users with relevant products and services using ads. A key challenge is that a user's search query often leaves their true intent ambiguous. Typically, platforms pas...
TARA Test-by-Adaptive-Ranks for Quantum Anomaly Detection with Conformal Prediction Guarantees : Abstract: Quantum key distribution (QKD) security fundamentally relies on the ability to distinguish genuine quantum correlations from classical eavesdropper simulations, yet existing certification me...
Large Language Models for Limited Noisy Data: A Gravitational Wave Identification Study : Abstract: This work investigates whether large language models (LLMs) offer advantages over traditional neural networks for astronomical data processing, in regimes with non-Gaussian, non-stationary n...
Polarization by Design: How Elites Could Shape Mass Preferences as AI Reduces Persuasion Costs : Abstract: In democracies, major policy decisions typically require some form of majority or consensus, so elites must secure mass support to govern. Historically, elites could shape support only throu...
Mobile-Agent-RAG: Driving Smart Multi-Agent Coordination with Contextual Knowledge Empowerment for Long-Horizon Mobile Automation : Abstract: Mobile agents show immense potential, yet current state-of-the-art (SoTA) agents exhibit inadequate success rates on real-world, long-horizon, cross-application tasks. We attribute this bott...
Large Language Model-Based Agents for Software Engineering: A Survey : Abstract: The recent advance in Large Language Models (LLMs) has shaped a new paradigm of AI agents, i.e., LLM-based agents. Compared to standalone LLMs, LLM-based agents substantially extend the vers...
A Survey on Recommendation Unlearning: Fundamentals, Taxonomy, Evaluation, and Open Questions : Abstract: Recommender systems have become increasingly influential in shaping user behavior and decision-making, highlighting their growing impact in various domains. Meanwhile, the widespread adoptio...
Public Sentiment Analysis of Traffic Management Policies in Knoxville: A Social Media Driven Study : Abstract: This study presents a comprehensive analysis of public sentiment toward traffic management policies in Knoxville, Tennessee, utilizing social media data from Twitter and Reddit platforms. We...
The BEAT-CF Causal Model: A model for guiding the design of trials and observational analyses of cystic fibrosis exacerbations : Abstract: Loss of lung function in cystic fibrosis (CF) occurs progressively, punctuated by acute pulmonary exacerbations (PEx) in which abrupt declines in lung function are not fully recovered. A key...
Lost in Modality: Evaluating the Effectiveness of Text-Based Membership Inference Attacks on Large Multimodal Models : Abstract: Large Multimodal Language Models (MLLMs) are emerging as one of the foundational tools in an expanding range of applications. Consequently, understanding training-data leakage in these syste...
Thucy: An LLM-based Multi-Agent System for Claim Verification across Relational Databases : Abstract: In today's age, it is becoming increasingly difficult to decipher truth from lies. Every day, politicians, media outlets, and public figures make conflicting claims, often abo...
BookRAG: A Hierarchical Structure-aware Index-based Approach for Retrieval-Augmented Generation on Complex Documents : Abstract: As an effective method to boost the performance of Large Language Models (LLMs) on the question answering (QA) task, Retrieval-Augmented Generation (RAG), which queries highly relevant infor...
World Models for Autonomous Navigation of Terrestrial Robots from LIDAR Observations : Abstract: Autonomous navigation of terrestrial robots using Reinforcement Learning (RL) from LIDAR observations remains challenging due to the high dimensionality of sensor data and the sample ineffic...
AsymPuzl: An Asymmetric Puzzle for multi-agent cooperation : Abstract: Large Language Model (LLM) agents are increasingly studied in multi-turn, multi-agent scenarios, yet most existing setups emphasize open-ended role-play rather than controlled evaluation. We...
Cell-cell communication inference and analysis: biological mechanisms, computational approaches, and future opportunities : Abstract: In multicellular organisms, cells coordinate their activities through cell-cell communication (CCC), which are crucial for development, tissue homeostasis, and disease progression. Recent ad...
A Learning-based Control Methodology for Transitioning VTOL UAVs : Abstract: Transition control poses a critical challenge in Vertical Take-Off and Landing Unmanned Aerial Vehicle (VTOL UAV) development due to the tilting rotor mechanism, which shifts the center of g...
State Space Models for Bioacoustics: A comparative Evaluation with Transformers : Abstract: In this study, we evaluate the efficacy of the Mamba model in the field of bioacoustics. We first pretrain a Mamba-based audio large language model (LLM) on a large corpus of audio data usin...
KVNAND: Efficient On-Device Large Language Model Inference Using DRAM-Free In-Flash Computing : Abstract: Deploying large language models (LLMs) on edge devices enables personalized agents with strong privacy and low cost. However, with tens to hundreds of billions of parameters, single-batch au...
Matrix Editing Meets Fair Clustering: Parameterized Algorithms and Complexity : Abstract: We study the computational problem of computing a fair means clustering of discrete vectors, which admits an equivalent formulation as editing a colored matrix into one with few distinct col...
Context-Aware Hierarchical Learning: A Two-Step Paradigm towards Safer LLMs : Abstract: Large Language Models (LLMs) have emerged as powerful tools for diverse applications. However, their uniform token processing paradigm introduces critical vulnerabilities in instruction hand...
AI/ML in 3GPP 5G Advanced - Services and Architecture : Abstract: The 3rd Generation Partnership Project (3GPP), the standards body for mobile networks, is in the final phase of Release 19 standardization and is beginning Release 20. Artificial Intelligenc...
Bayesian Optimization for Automatic Tuning of Torque-Level Nonlinear Model Predictive Control : Abstract: This paper presents an auto-tuning framework for torque-based Nonlinear Model Predictive Control (nMPC), where the MPC serves as a real-time controller for optimal joint torque commands. The...
MPCFormer: A physics-informed data-driven approach for explainable socially-aware autonomous driving : Abstract: Autonomous Driving (AD) vehicles still struggle to exhibit human-like behavior in highly dynamic and interactive traffic scenarios. The key challenge lies in AD's limited ability to interact...
Hierarchical Vision Language Action Model Using Success and Failure Demonstrations : Abstract: Prior Vision-Language-Action (VLA) models are typically trained on teleoperated successful demonstrations, while discarding numerous failed attempts that occur naturally during data collecti...
Beyond the Black Box: A Cognitive Architecture for Explainable and Aligned AI : Abstract: Current AI paradigms, as "architects of experience," face fundamental challenges in explainability and value alignment. This paper introduces "Weight-Calculatism," a novel cognitive architec...
When Do Symbolic Solvers Enhance Reasoning in Large Language Models? : Abstract: Large Reasoning Models (LRMs) achieve strong performance on complex reasoning tasks by generating long Chains of Thought (CoTs). However, this paradigm might incur substantial token overhead...
Prior preferences in active inference agents: soft, hard, and goal shaping : Abstract: Active inference proposes expected free energy as an objective for planning and decision-making to adequately balance exploitative and explorative drives in learning agents. The exploitative...
Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia : Abstract: Large Language Model (LLM) agents have demonstrated impressive capabilities for social interaction and are increasingly being deployed in situations where they might engage with both human a...
Multimodal Reinforcement Learning with Agentic Verifier for AI Agents : Abstract: Agentic reasoning models trained with multimodal reinforcement learning (MMRL) have become increasingly capable, yet they are almost universally optimized using sparse, outcome-based rewards...
Multi-Agent Reinforcement Learning with Communication-Constrained Priors : Abstract: Communication is one of the effective means to improve the learning of cooperative policy in multi-agent systems. However, in most real-world scenarios, lossy communication is a prevalent is...
PARC: An Autonomous Self-Reflective Coding Agent for Robust Execution of Long-Horizon Tasks : Abstract: We introduce PARC, a coding agent for the autonomous and robust execution of long-horizon computational tasks. PARC is built on a hierarchical multi-agent architecture incorporating task pla...
Reason-Plan-ReAct: A Reasoner-Planner Supervising a ReAct Executor for Complex Enterprise Tasks : Abstract: Despite recent advances, autonomous agents often struggle to solve complex tasks in enterprise domains that require coordinating multiple tools and processing diverse data sources. This stru...
DeepRule: An Integrated Framework for Automated Business Rule Generation via Deep Predictive Modeling and Hybrid Search Optimization : Abstract: This paper proposes DeepRule, an integrated framework for automated business rule generation in retail assortment and pricing optimization. Addressing the systematic misalignment between exi...
MemVerse: Multimodal Memory for Lifelong Learning Agents : Abstract: Despite rapid progress in large-scale language and vision models, AI agents still suffer from a fundamental limitation: they cannot remember. Without reliable memory, agents catastrophically...
RoCo: Role-Based LLMs Collaboration for Automatic Heuristic Design : Abstract: Automatic Heuristic Design (AHD) has gained traction as a promising solution for solving combinatorial optimization problems (COPs). Large Language Models (LLMs) have emerged and become a pr...
Omni-AutoThink: Adaptive Multimodal Reasoning via Reinforcement Learning : Abstract: Recent advances in Omni models have enabled unified multimodal perception and generation. However, most existing systems still exhibit rigid reasoning behaviors, either overthinking simple p...
A Hierarchical Tree-based approach for creating Configurable and Static Deep Research Agent (Static-DRA) : Abstract: The advancement in Large Language Models has driven the creation of complex agentic systems, such as Deep Research Agents (DRAs), to overcome the limitations of static Retrieval Augmented Ge...
Autonomous Agents and Policy Compliance: A Framework for Reasoning About Penalties : Abstract: This paper presents a logic programming-based framework for policy-aware autonomous agents that can reason about potential penalties for non-compliance and act accordingly. While prior work ...
Benchmark for Planning and Control with Large Language Model Agents: Blocksworld with Model Context Protocol : Abstract: Industrial automation increasingly requires flexible control strategies that can adapt to changing tasks and environments. Agents based on Large Language Models (LLMs) offer potential for su...
AI-Driven Document Redaction in UK Public Authorities: Implementation Gaps, Regulatory Challenges, and the Human Oversight Imperative : Abstract: Document redaction in public authorities faces critical challenges as traditional manual approaches struggle to balance growing transparency demands with increasingly stringent data protecti...
Quantifying the Potential to Escape Filter Bubbles: A Behavior-Aware Measure via Contrastive Simulation : Abstract: Nowadays, recommendation systems have become crucial to online platforms, shaping user exposure by accurate preference modeling. However, such an exposure strategy can also reinforce users' ...
Echoes of AI Harms: A Human-LLM Synergistic Framework for Bias-Driven Harm Anticipation : Abstract: The growing influence of Artificial Intelligence (AI) systems on decision-making in critical domains has exposed their potential to cause significant harms, often rooted in biases embedded a...
Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem : Abstract: Since 2019, the Hugging Face Model Hub has been the primary global platform for sharing open weight AI models. By releasing a dataset of the complete history of weekly model downloads (June ...
Will Power Return to the Clouds? From Divine Authority to GenAI Authority : Abstract: Generative AI systems now mediate newsfeeds, search rankings, and creative content for hundreds of millions of users, positioning a handful of private firms as de-facto arbiters of truth. Dr...
Irresponsible AI: big tech's influence on AI research and associated impacts : Abstract: The accelerated development, deployment and adoption of artificial intelligence systems has been fuelled by the increasing involvement of big tech. This has been accompanied by increasing et...
AtomDisc: An Atom-level Tokenizer that Boosts Molecular LLMs and Reveals Structure-Property Associations : Abstract: Advances in large language models (LLMs) are accelerating discovery in molecular science. However, adapting molecular information to the serialized, token-based processing of LLMs remains a ...
Beyond Code Pairs: Dialogue-Based Data Generation for LLM Code Translation : Abstract: Large language models (LLMs) have shown remarkable capabilities in code translation, yet their performance deteriorates in low-resource programming domains such as Fortran and emerging frame...
When Harmful Content Gets Camouflaged: Unveiling Perception Failure of LVLMs with CamHarmTI : Abstract: Large vision-language models (LVLMs) are increasingly used for tasks where detecting multimodal harmful content is crucial, such as online content moderation. However, real-world harmful con...
Community Quality and Influence Maximization: An Empirical Study : Abstract: Influence maximization in social networks plays a vital role in applications such as viral marketing, epidemiology, product recommendation, opinion mining, and counter-terrorism. A common ap...
Ensemble Privacy Defense for Knowledge-Intensive LLMs against Membership Inference Attacks : Abstract: Retrieval-Augmented Generation (RAG) and Supervised Finetuning (SFT) have become the predominant paradigms for equipping Large Language Models (LLMs) with external knowledge for diverse, kno...

Research Sources: 336 | Generated: 12/5/2025