AI RESEARCH PAPERS & ACADEMIC SOURCES
- Efficient stereo matching on embedded GPUs with zero-means cross correlation : Abstract: Mobile stereo-matching systems have become an important part of many applications, such as automated-driving vehicles and autonomous robots. Accurate stereo-matching methods usually lead to ...
- Polygon Intersection-over-Union Loss for Viewpoint-Agnostic Monocular 3D Vehicle Detection : Abstract: Monocular 3D object detection is a challenging task because depth information is difficult to obtain from 2D images. A subset of viewpoint-agnostic monocular 3D detection methods also do not...
- Surface-Based Visibility-Guided Uncertainty for Continuous Active 3D Neural Reconstruction : Abstract: View selection is critical in active 3D neural reconstruction as it impacts the contents of the training set and the resulting final output quality. Recent view selection strategies emphasize the vi...
- OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation : Abstract: Category-level articulated object pose estimation focuses on the pose estimation of unknown articulated objects within known categories. Despite its significance, this task remains challengi...
- Multimodal Markup Document Models for Graphic Design Completion : Abstract: We introduce MarkupDM, a multimodal markup document model that represents graphic design as an interleaved multimodal document consisting of both markup language and images. Unlike existing ...
- Learning Geodesics of Geometric Shape Deformations From Images : Abstract: This paper presents a novel method, named geodesic deformable networks (GDN), that for the first time enables the learning of geodesic flows of deformation fields derived from images. In par...
- Turbo-GS: Accelerating 3D Gaussian Fitting for High-Quality Radiance Fields : Abstract: Novel-view synthesis is an important problem in computer vision with applications in 3D reconstruction, mixed reality, and robotics. Recent methods like 3D Gaussian Splatting (3DGS) have bec...
- A dynamic memory assignment strategy for dilation-based ICP algorithm on embedded GPUs : Abstract: This paper proposes a memory-efficient optimization strategy for the high-performance point cloud registration algorithm VANICP, enabling lightweight execution on embedded GPUs with constrai...
- Reflection Removal through Efficient Adaptation of Diffusion Transformers : Abstract: We introduce a diffusion-transformer (DiT) framework for single-image reflection removal that leverages the generalization strengths of foundation diffusion models in the restoration setting...
- Self-Supervised Learning for Transparent Object Depth Completion Using Depth from Non-Transparent Objects : Abstract: The perception of transparent objects is one of the well-known challenges in computer vision. Conventional depth sensors have difficulty in sensing the depth of transparent objects due to re...
- Generative Neural Video Compression via Video Diffusion Prior : Abstract: We present GNVC-VD, the first DiT-based generative neural video compression framework built upon an advanced video generation foundation model, where spatio-temporal latent compression and s...
- RAMEN: Resolution-Adjustable Multimodal Encoder for Earth Observation : Abstract: Earth observation (EO) data spans a wide range of spatial, spectral, and temporal resolutions, from high-resolution optical imagery to low-resolution multispectral products or radar time ser...
- Semantic-Guided Two-Stage GAN for Face Inpainting with Hybrid Perceptual Encoding : Abstract: Facial image inpainting aims to restore the missing or corrupted regions in face images while preserving identity, structural consistency and photorealistic image quality, a task specifica...
- Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image : Abstract: Generating interactive and dynamic 4D scenes from a single static image remains a core challenge. Most existing generate-then-reconstruct and reconstruct-then-generate methods decouple geome...
- 4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer : Abstract: Constructing 4D language fields is crucial for embodied AI, augmented/virtual reality, and 4D scene understanding, as they provide enriched semantic representations of dynamic environments a...
- BulletTime: Decoupled Control of Time and Camera Pose for Video Generation : Abstract: Emerging video diffusion models achieve high visual fidelity but fundamentally couple scene dynamics with camera motion, limiting their ability to provide precise spatial and temporal contro...
- Object Reconstruction under Occlusion with Generative Priors and Contact-induced Constraints : Abstract: Object geometry is key information for robot manipulation. Yet, object reconstruction is a challenging task because cameras only capture partial observations of objects, especially when occl...
- Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression : Abstract: Recent advances in autoregressive video diffusion have enabled real-time frame streaming, yet existing solutions still suffer from temporal repetition, drift, and motion deceleration. We fin...
- Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark : Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have significantly improved performance on tasks such as visual grounding and visual question answering. However, the reasoning pr...
- SA-IQA: Redefining Image Quality Assessment for Spatial Aesthetics with Multi-Dimensional Rewards : Abstract: In recent years, Image Quality Assessment (IQA) for AI-generated images (AIGI) has advanced rapidly; however, existing methods primarily target portraits and artistic images, lacking a syste...
- EvoIR: Towards All-in-One Image Restoration via Evolutionary Frequency Modulation : Abstract: All-in-One Image Restoration (AiOIR) tasks often involve diverse degradations that require robust and versatile strategies. However, most existing approaches typically lack explicit frequency...
- ShadowDraw: From Any Object to Shadow-Drawing Compositional Art : Abstract: We introduce ShadowDraw, a framework that transforms ordinary 3D objects into shadow-drawing compositional art. Given a 3D object, our system predicts scene parameters, including object pose...
- ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning : Abstract: Reward models are critical for aligning vision-language systems with human preferences, yet current approaches suffer from hallucination, weak visual grounding, and an inability to use tools...
- Splannequin: Freezing Monocular Mannequin-Challenge Footage with Dual-Detection Splatting : Abstract: Synthesizing high-fidelity frozen 3D scenes from monocular Mannequin-Challenge (MC) videos is a unique problem distinct from standard dynamic scene reconstruction. Instead of focusing on mod...
- Light-X: Generative 4D Video Rendering with Camera and Illumination Control : Abstract: Recent advances in illumination control extend image-based methods to video, yet still face a trade-off between lighting fidelity and temporal consistency. Moving beyond relighting, a key ...
- The changing surface of the world's roads : Abstract: Resilient road infrastructure is a cornerstone of the UN Sustainable Development Goals. Yet a primary indicator of network functionality and resilience is critically lacking: a comprehensive...
- Efficient Spatially-Variant Convolution via Differentiable Sparse Kernel Complex : Abstract: Image convolution with complex kernels is a fundamental operation in photography, scientific imaging, and animation effects, yet direct dense convolution is computationally prohibitive on re...
- Hardware-aware Neural Architecture Search of Early Exiting Networks on Edge Accelerators : Abstract: Advancements in high-performance computing and cloud technologies have enabled the development of increasingly sophisticated Deep Learning (DL) models. However, the growing demand for embedd...
- Shared Multi-modal Embedding Space for Face-Voice Association : Abstract: The FAME 2026 challenge comprises two demanding tasks: training face-voice associations combined with a multilingual setting that includes testing on languages on which the model was not tra...
- From Generated Human Videos to Physically Plausible Robot Trajectories : Abstract: Video generation models are rapidly improving in their ability to synthesize human actions in novel contexts, holding the potential to serve as high-level planners for contextual robot contr...
- Towards Cross-View Point Correspondence in Vision-Language Models : Abstract: Cross-view correspondence is a fundamental capability for spatial understanding and embodied AI. However, it is still far from being realized in Vision-Language Models (VLMs), especially in ...
- OmniScaleSR: Unleashing Scale-Controlled Diffusion Prior for Faithful and Realistic Arbitrary-Scale Image Super-Resolution : Abstract: Arbitrary-scale super-resolution (ASSR) overcomes the limitation of traditional super-resolution (SR) methods that operate only at fixed scales (e.g., 4x), enabling a single model to handle ...
- Measuring the Unspoken: A Disentanglement Model and Benchmark for Psychological Analysis in the Wild : Abstract: Generative psychological analysis of in-the-wild conversations faces two fundamental challenges: (1) existing Vision-Language Models (VLMs) fail to resolve Articulatory-Affective Ambiguity, ...
- E3AD: An Emotion-Aware Vision-Language-Action Model for Human-Centric End-to-End Autonomous Driving : Abstract: End-to-end autonomous driving (AD) systems increasingly adopt vision-language-action (VLA) models, yet they typically ignore the passenger's emotional state, which is central to comfort and ...
- MT-Depth: Multi-task Instance feature analysis for the Depth Completion : Abstract: Depth completion plays a vital role in 3D perception systems, especially in scenarios where sparse depth data must be densified for tasks such as autonomous driving, robotics, and augmented ...
- Order Matters: 3D Shape Generation from Sequential VR Sketches : Abstract: VR sketching lets users explore and iterate on ideas directly in 3D, offering a faster and more intuitive alternative to conventional CAD tools. However, existing sketch-to-shape models igno...
- PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling : Abstract: Consistent image generation requires faithfully preserving identities, styles, and logical coherence across multiple images, which is essential for applications such as storytelling and char...
- LaFiTe: A Generative Latent Field for 3D Native Texturing : Abstract: Generating high-fidelity, seamless textures directly on 3D surfaces, what we term 3D-native texturing, remains a fundamental open challenge, with the potential to overcome long-standing limi...
- EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture : Abstract: We propose EMMA, an efficient and unified architecture for multimodal understanding, generation and editing. Specifically, EMMA primarily consists of 1) An efficient autoencoder with a 32x c...
- RobustSplat++: Decoupling Densification, Dynamics, and Illumination for In-the-Wild 3DGS : Abstract: 3D Gaussian Splatting (3DGS) has gained significant attention for its real-time, photo-realistic rendering in novel-view synthesis and 3D modeling. However, existing methods struggle with ac...
- LatentFM: A Latent Flow Matching Approach for Generative Medical Image Segmentation : Abstract: Generative models have achieved remarkable progress with the emergence of flow matching (FM). It has demonstrated strong generative capabilities and attracted significant attention as a simu...
- FreeGen: Feed-Forward Reconstruction-Generation Co-Training for Free-Viewpoint Driving Scene Synthesis : Abstract: Closed-loop simulation and scalable pre-training for autonomous driving require synthesizing free-viewpoint driving scenes. However, existing datasets and generative pipelines rarely provide...
- A Sanity Check for Multi-In-Domain Face Forgery Detection in the Real World : Abstract: Existing methods for deepfake detection aim to develop generalizable detectors. Although "generalizable" is the ultimate target once and for all, with limited training forgeries and domains,...
- Autoregressive Image Generation Needs Only a Few Lines of Cached Tokens : Abstract: Autoregressive (AR) visual generation has emerged as a powerful paradigm for image and multimodal synthesis, owing to its scalability and generality. However, existing AR image generation su...
- Contact-Aware Refinement of Human Pose Pseudo-Ground Truth via Bioimpedance Sensing : Abstract: Capturing accurate 3D human pose in the wild would provide valuable data for training pose estimation and motion generation methods. While video-based estimation approaches have become incre...
- SP-Det: Self-Prompted Dual-Text Fusion for Generalized Multi-Label Lesion Detection : Abstract: Automated lesion detection in chest X-rays has demonstrated significant potential for improving clinical diagnosis by precisely localizing pathological abnormalities. While recent promptable...
- SDG-Track: A Heterogeneous Observer-Follower Framework for High-Resolution UAV Tracking on Embedded Platforms : Abstract: Real-time tracking of small unmanned aerial vehicles (UAVs) on edge devices faces a fundamental resolution-speed conflict. Downsampling high-resolution imagery to standard detector input siz...
- You Only Train Once (YOTO): A Retraining-Free Object Detection Framework : Abstract: Object detection constitutes the primary task within the domain of computer vision. It is utilized in numerous domains. Nonetheless, object detection continues to encounter the issue of cata...
- Equivariant Symmetry-Aware Head Pose Estimation for Fetal MRI : Abstract: We present E(3)-Pose, a novel fast pose estimation method that jointly and explicitly models rotation equivariance and object symmetry. Our work is motivated by the challenging problem of ac...
- ReflexFlow: Rethinking Learning Objective for Exposure Bias Alleviation in Flow Matching : Abstract: Despite tremendous recent progress, Flow Matching methods still suffer from exposure bias due to discrepancies in training and inference. This paper investigates the root causes of exposure ...
- Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion : Abstract: Latent Diffusion Models (LDMs) inherently follow a coarse-to-fine generation process, where high-level semantic structure is generated slightly earlier than fine-grained texture. This indica...
- Virtually Unrolling the Herculaneum Papyri by Diffeomorphic Spiral Fitting : Abstract: The Herculaneum Papyri are a collection of rolled papyrus documents that were charred and buried by the famous eruption of Mount Vesuvius. They promise to contain a wealth of previously unse...
- LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging : Abstract: 3D vision foundation models like Visual Geometry Grounded Transformer (VGGT) have advanced greatly in geometric perception. However, it is time-consuming and memory-intensive for long sequen...
- Towards Adaptive Fusion of Multimodal Deep Networks for Human Action Recognition : Abstract: This study introduces a pioneering methodology for human action recognition by harnessing deep neural network techniques and adaptive fusion strategies across multiple modalities, including ...
- FASTer: Toward Efficient Autoregressive Vision Language Action Modeling via neural Action Tokenization : Abstract: Autoregressive vision-language-action (VLA) models have recently demonstrated strong capabilities in robotic manipulation. However, their core process of action tokenization often involves a...
- GeoPE: A Unified Geometric Positional Embedding for Structured Tensors : Abstract: Standard Vision Transformers flatten 2D images into 1D sequences, disrupting the natural spatial topology. While Rotary Positional Embedding (RoPE) excels in 1D, it inherits this limitation,...
- Balanced Few-Shot Episodic Learning for Accurate Retinal Disease Diagnosis : Abstract: Automated retinal disease diagnosis is vital given the rising prevalence of conditions such as diabetic retinopathy and macular degeneration. Conventional deep learning approaches require la...
- Stable Single-Pixel Contrastive Learning for Semantic and Geometric Tasks : Abstract: We pilot a family of stable contrastive losses for learning pixel-level representations that jointly capture semantic and geometric information. Our approach maps each pixel of an image to a...
- Back to Basics: Motion Representation Matters for Human Motion Generation Using Diffusion Model : Abstract: Diffusion models have emerged as a widely utilized and successful methodology in human motion synthesis. Task-oriented diffusion models have significantly advanced action-to-motion, text-to-...
- UltraImage: Rethinking Resolution Extrapolation in Image Diffusion Transformers : Abstract: Recent image diffusion transformers achieve high-fidelity generation, but struggle to generate images beyond these scales, suffering from content repetition and quality degradation. In this ...
- DuGI-MAE: Improving Infrared Mask Autoencoders via Dual-Domain Guidance : Abstract: Infrared imaging plays a critical role in low-light and adverse weather conditions. However, due to the distinct characteristics of infrared images, existing foundation models such as Masked...
- EgoLCD: Egocentric Video Generation with Long Context Diffusion : Abstract: Generating long, coherent egocentric videos is difficult, as hand-object interactions and procedural tasks require reliable long-term memory. Existing autoregressive models suffer from conte...
- VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory : Abstract: Autoregressive (AR) diffusion enables streaming, interactive long-video generation by producing frames causally, yet maintaining coherence over minute-scale horizons remains challenging due ...
- Boundary-Aware Test-Time Adaptation for Zero-Shot Medical Image Segmentation : Abstract: Due to the scarcity of annotated data and the substantial computational costs of models, conventional tuning methods in medical image segmentation face critical challenges. Current approaches...
- WiFi-based Cross-Domain Gesture Recognition Using Attention Mechanism : Abstract: While fulfilling communication tasks, wireless signals can also be used to sense the environment. Among various types of sensing media, WiFi signals offer advantages such as widespread avail...
- Identity Clue Refinement and Enhancement for Visible-Infrared Person Re-Identification : Abstract: Visible-Infrared Person Re-Identification (VI-ReID) is a challenging cross-modal matching task due to significant modality discrepancies. While current methods mainly focus on learning modal...
- Auto3R: Automated 3D Reconstruction and Scanning via Data-driven Uncertainty Quantification : Abstract: Traditional high-quality 3D scanning and reconstruction typically relies on human labor to plan the scanning procedure. With the rapid development of embodied systems such as drones and robo...
- PhyVLLM: Physics-Guided Video Language Model with Motion-Appearance Disentanglement : Abstract: Video Large Language Models (Video LLMs) have shown impressive performance across a wide range of video-language tasks. However, they often fail in scenarios requiring a deeper understanding...
- Refaçade: Editing Object with Given Reference Texture : Abstract: Recent advances in diffusion models have brought remarkable progress in image and video editing, yet some tasks remain underexplored. In this paper, we introduce a new task, Object Retexture...
- Detection of Intoxicated Individuals from Facial Video Sequences via a Recurrent Fusion Model : Abstract: Alcohol consumption is a significant public health concern and a major cause of accidents and fatalities worldwide. This study introduces a novel video-based facial sequence analysis approac...
- X-Humanoid: Robotize Human Videos to Generate Humanoid Videos at Scale : Abstract: The advancement of embodied AI has unlocked significant potential for intelligent humanoid robots. However, progress in both Vision-Language-Action (VLA) models and world models is severely ...
- VideoMem: Enhancing Ultra-Long Video Understanding via Adaptive Memory Management : Abstract: Ultra-long video understanding remains an open challenge, as existing vision language models (VLMs) falter on such content due to limited context length and inefficient long-term memory rete...
- Gaussian Entropy Fields: Driving Adaptive Sparsity in 3D Gaussian Optimization : Abstract: 3D Gaussian Splatting (3DGS) has emerged as a leading technique for novel view synthesis, demonstrating exceptional rendering efficiency. Well-reconstructed surfaces can be chara...
- Counterfeit Answers: Adversarial Forgery against OCR-Free Document Visual Question Answering : Abstract: Document Visual Question Answering (DocVQA) enables end-to-end reasoning grounded on information present in a document input. While recent models have shown impressive capabilities, they rem...
- COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence : Abstract: Visual Spatial Reasoning is crucial for enabling Multimodal Large Language Models (MLLMs) to understand object properties and spatial relationships, yet current models still struggle with 3D...
- Dataset creation for supervised deep learning-based analysis of microscopic images - review of important considerations and recommendations : Abstract: Supervised deep learning (DL) receives great interest for automated analysis of microscopic images with an increasing body of literature supporting its potential. The development and validat...
- Prompt2Craft: Generating Functional Craft Assemblies with LLMs : Abstract: Inspired by traditional handmade crafts, where a person improvises assemblies based on the available objects, we formally introduce the Craft Assembly Task. It is a robotic assembly task tha...
- TARDis: Time Attenuated Representation Disentanglement for Incomplete Multi-Modal Tumor Segmentation and Classification : Abstract: Tumor segmentation and diagnosis in contrast-enhanced Computed Tomography (CT) rely heavily on the physiological dynamics of contrast agents. However, obtaining a complete multi-phase series...
- Infrared UAV Target Tracking with Dynamic Feature Refinement and Global Contextual Attention Knowledge Distillation : Abstract: Unmanned aerial vehicle (UAV) target tracking based on thermal infrared imaging has been one of the most important sensing technologies in anti-UAV applications. However, the infrared UAV ta...
- SAM3-I: Segment Anything with Instructions : Abstract: Segment Anything Model 3 (SAM3) has advanced open-vocabulary segmentation through promptable concept segmentation, allowing users to segment all instances corresponding to a given concept, t...
- Malicious Image Analysis via Vision-Language Segmentation Fusion: Detection, Element, and Location in One-shot : Abstract: Detecting illicit visual content demands more than image-level NSFW flags; moderators must also know what objects make an image illegal and where those objects occur. We introduce a zero-sho...
- Denoise to Track: Harnessing Video Diffusion Priors for Robust Correspondence : Abstract: In this work, we introduce HeFT (Head-Frequency Tracker), a zero-shot point tracking framework that leverages the visual priors of pretrained video diffusion models. To better understand how...
- I2I-Bench: A Comprehensive Benchmark Suite for Image-to-Image Editing Models : Abstract: Image editing models are advancing rapidly, yet comprehensive evaluation remains a significant challenge. Existing image editing benchmarks generally suffer from limited task scopes, insuffi...
- Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length : Abstract: Existing diffusion-based video generation methods are fundamentally constrained by sequential computation and long-horizon inconsistency, limiting their practical adoption in real-time, stre...
- Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation : Abstract: Efficient streaming video generation is critical for simulating interactive and dynamic worlds. Existing methods distill few-step video diffusion models with sliding window attention, using ...
- UniLight: A Unified Representation for Lighting : Abstract: Lighting has a strong influence on visual appearance, yet understanding and representing lighting in images remains notoriously difficult. Various lighting representations exist, such as env...
- Learning Single-Image Super-Resolution in the JPEG Compressed Domain : Abstract: Deep learning models have grown increasingly complex, with input data sizes scaling accordingly. Despite substantial advances in specialized deep learning hardware, data loading continues to...
- Gamma-from-Mono: Road-Relative, Metric, Self-Supervised Monocular Geometry for Vehicular Applications : Abstract: Accurate perception of the vehicle's 3D surroundings, including fine-scale road geometry, such as bumps, slopes, and surface irregularities, is essential for safe and comfortable vehicle con...
- How (Mis)calibrated is Your Federated CLIP and What To Do About It? : Abstract: While vision-language models like CLIP have been extensively studied, their calibration, crucial for reliable predictions, has received limited attention. Although a few prior works have exa...
- Real-time Cricket Sorting By Sex : Abstract: The global demand for sustainable protein sources is driving increasing interest in edible insects, with Acheta domesticus (house cricket) identified as one of the most suitable species for ...
- Mind-to-Face: Neural-Driven Photorealistic Avatar Synthesis via EEG Decoding : Abstract: Current expressive avatar systems rely heavily on visual cues, failing when faces are occluded or when emotions remain internal. We present Mind-to-Face, the first framework that decodes non...
- DisentangleFormer: Spatial-Channel Decoupling for Multi-Channel Vision : Abstract: Vision Transformers face a fundamental limitation: standard self-attention jointly processes spatial and channel dimensions, leading to entangled representations that prevent independent mod...
- SyncTrack4D: Cross-Video Motion Alignment and Video Synchronization for Multi-Video 4D Gaussian Splatting : Abstract: Modeling dynamic 3D scenes is challenging due to their high-dimensional nature, which requires aggregating information from multiple views to reconstruct time-evolving 3D geometry and motion...
- A Retrieval-Augmented Generation Approach to Extracting Algorithmic Logic from Neural Networks : Abstract: Reusing existing neural-network components is central to research efficiency, yet discovering, extracting, and validating such modules across thousands of open-source repositories remains di...
- Open Set Face Forgery Detection via Dual-Level Evidence Collection : Abstract: The proliferation of face forgeries has increasingly undermined confidence in the authenticity of online content. Given the rapid development of face forgery generation algorithms, new fake ...
- MAFNet: Multi-frequency Adaptive Fusion Network for Real-time Stereo Matching : Abstract: Existing stereo matching networks typically rely on either cost-volume construction based on 3D convolutions or deformation methods based on iterative optimization. The former incurs signifi...
- FMA-Net++: Motion- and Exposure-Aware Real-World Joint Video Super-Resolution and Deblurring : Abstract: Real-world video restoration is plagued by complex degradations from motion coupled with dynamically varying exposure, a key challenge largely overlooked by prior works and a common artifac...
- Fourier-Attentive Representation Learning: A Fourier-Guided Framework for Few-Shot Generalization in Vision-Language Models : Abstract: Large-scale pre-trained Vision-Language Models (VLMs) have demonstrated strong few-shot learning capabilities. However, these methods typically learn holistic representations where an image'...
- Performance Evaluation of Transfer Learning Based Medical Image Classification Techniques for Disease Detection : Abstract: Medical image classification plays an increasingly vital role in identifying various diseases by classifying medical images, such as X-rays, MRIs and CT scans, into different categories base...
- Dual-Stream Spectral Decoupling Distillation for Remote Sensing Object Detection : Abstract: Knowledge distillation is an effective and hardware-friendly method, which plays a key role in lightweighting remote sensing object detection. However, existing distillation methods often en...
- UTrice: Unifying Primitives in Differentiable Ray Tracing and Rasterization via Triangles for Particle-Based 3D Scenes : Abstract: Ray tracing 3D Gaussian particles enables realistic effects such as depth of field, refractions, and flexible camera modeling for novel-view synthesis. However, existing methods trace Gaussi...
- Explainable Parkinsons Disease Gait Recognition Using Multimodal RGB-D Fusion and Large Language Models : Abstract: Accurate and interpretable gait analysis plays a crucial role in the early detection of Parkinson's disease (PD), yet most existing approaches remain limited by single-modality inputs, low rob...
- Self-Paced and Self-Corrective Masked Prediction for Movie Trailer Generation : Abstract: As a challenging video editing task, movie trailer generation involves selecting and reorganizing movie shots to create engaging trailers. Currently, most existing automatic trailer generati...
- MindDrive: An All-in-One Framework Bridging World Models and Vision-Language Model for End-to-End Autonomous Driving : Abstract: End-to-End autonomous driving (E2E-AD) has emerged as a new paradigm, where trajectory planning plays a crucial role. Existing studies mainly follow two directions: trajectory generation ori...
- StreamEQA: Towards Streaming Video Understanding for Embodied Scenarios : Abstract: As embodied intelligence advances toward real-world deployment, the ability to continuously perceive and reason over streaming visual inputs becomes essential. In such settings, an agent mus...
- GuidNoise: Single-Pair Guided Diffusion for Generalized Noise Synthesis : Abstract: Recent image denoising methods have leveraged generative modeling for real noise synthesis to address the costly acquisition of real-world noisy data. However, these generative models typica...
- dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning : Abstract: The autonomous driving community is increasingly focused on addressing the challenges posed by out-of-distribution (OOD) driving scenarios. A dominant research trend seeks to enhance end-to-...
- UniTS: Unified Time Series Generative Model for Remote Sensing : Abstract: One of the primary objectives of satellite remote sensing is to capture the complex dynamics of the Earth environment, which encompasses tasks such as reconstructing continuous cloud-free ti...
- DeRA: Decoupled Representation Alignment for Video Tokenization : Abstract: This paper presents DeRA, a novel 1D video tokenizer that decouples the spatial-temporal representation learning in video tokenization to achieve better training efficiency and performance. ...
- Not All Birds Look The Same: Identity-Preserving Generation For Birds : Abstract: Since the advent of controllable image generation, increasingly rich modes of control have enabled greater customization and accessibility for everyday users. Zero-shot, identity-preserving ...
- Controllable Long-term Motion Generation with Extended Joint Targets : Abstract: Generating stable and controllable character motion in real-time is a key challenge in computer animation. Existing methods often fail to provide fine-grained control or suffer from motion d...
- Shift-Window Meets Dual Attention: A Multi-Model Architecture for Specular Highlight Removal : Abstract: Inevitable specular highlights in practical environments severely impair the visual performance, thus degrading the task effectiveness and efficiency. Although there exist considerable metho...
- Dual-branch Prompting for Multimodal Machine Translation : Abstract: Multimodal Machine Translation (MMT) typically enhances text-only translation by incorporating aligned visual features. Despite the remarkable progress, state-of-the-art MMT approaches often...
- Beyond Flicker: Detecting Kinematic Inconsistencies for Generalizable Deepfake Video Detection : Abstract: Generalizing deepfake detection to unseen manipulations remains a key challenge. A recent approach to tackle this issue is to train a network with pristine face images that have been manipul...
- OnSight Pathology: A real-time platform-agnostic computational pathology companion for histopathology : Abstract: The microscopic examination of surgical tissue remains a cornerstone of disease classification but relies on subjective interpretations and access to highly specialized experts, which can co...
- Look Around and Pay Attention: Multi-camera Point Tracking Reimagined with Transformers : Abstract: This paper presents LAPA (Look Around and Pay Attention), a novel end-to-end transformer-based architecture for multi-camera point tracking that integrates appearance-based matching with geo...
- Generalized Event Partonomy Inference with Structured Hierarchical Predictive Learning : Abstract: Humans naturally perceive continuous experience as a hierarchy of temporally nested events, fine-grained actions embedded within coarser routines. Replicating this structure in computer visi...
- MoReGen: Multi-Agent Motion-Reasoning Engine for Code-based Text-to-Video Synthesis : Abstract: While text-to-video (T2V) generation has achieved remarkable progress in photorealism, generating intent-aligned videos that faithfully obey physics principles remains a core challenge. In t...
- ReasonX: MLLM-Guided Intrinsic Image Decomposition : Abstract: Intrinsic image decomposition aims to separate images into physical components such as albedo, depth, normals, and illumination. While recent diffusion- and transformer-based models benefit ...
- 6 Fingers, 1 Kidney: Natural Adversarial Medical Images Reveal Critical Weaknesses of Vision-Language Models : Abstract: Vision-language models are increasingly integrated into clinical workflows. However, existing benchmarks primarily assess performance on common anatomical presentations and fail to capture t...
- MVRoom: Controllable 3D Indoor Scene Generation with Multi-View Diffusion Models : Abstract: We introduce MVRoom, a controllable novel view synthesis (NVS) pipeline for 3D indoor scenes that uses multi-view diffusion conditioned on a coarse 3D layout. MVRoom employs a two-stage desi...
- Geschlechtsübergreifende Maskulina im Sprachgebrauch: Eine korpusbasierte Untersuchung zu lexemspezifischen Unterschieden (Generic masculines in language use: a corpus-based study of lexeme-specific differences) : Abstract: This study examines the distribution and linguistic characteristics of generic masculines (GM) in contemporary German press texts. The use of masculine personal nouns to refer to mixed-gende...
- OsmT: Bridging OpenStreetMap Queries and Natural Language with Open-source Tag-aware Language Models : Abstract: Bridging natural language and structured query languages is a long-standing challenge in the database community. While recent advances in language models have shown promise in this direction...
- SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs : Abstract: Extreme low-bit quantization is critical for efficiently deploying Large Language Models (LLMs), yet it often leads to severe performance degradation at 2-bits and even 4-bits (e.g., MXFP4)....
- Model Whisper: Steering Vectors Unlock Large Language Models' Potential in Test-time : Abstract: It is a critical challenge to efficiently unlock the powerful reasoning potential of Large Language Models (LLMs) for specific tasks or new distributions. Existing test-time adaptation metho...
- EtCon: Edit-then-Consolidate for Reliable Knowledge Editing : Abstract: Knowledge editing aims to update specific facts in large language models (LLMs) without full retraining. Prior efforts sought to tune the knowledge layers of LLMs, proving effective for maki...
- Challenging the Abilities of Large Language Models in Italian: a Community Initiative : Abstract: The rapid progress of Large Language Models (LLMs) has transformed natural language processing and broadened its impact across research and society. Yet, systematic evaluation of these model...
- AdiBhashaa: A Community-Curated Benchmark for Machine Translation into Indian Tribal Languages : Abstract: Large language models and multilingual machine translation (MT) systems increasingly drive access to information, yet many languages of the tribal communities remain effectively invisible in...
- DaLA: Danish Linguistic Acceptability Evaluation Guided by Real World Errors : Abstract: We present an enhanced benchmark for evaluating linguistic acceptability in Danish. We first analyze the most common errors found in written Danish. Based on this analysis, we introduce a se...
- DAMASHA: Detecting AI in Mixed Adversarial Texts via Segmentation with Human-interpretable Attribution : Abstract: In the age of advanced large language models (LLMs), the boundaries between human and AI-generated text are becoming increasingly blurred. We address the challenge of segmenting mixed-author...
- Mitigating Catastrophic Forgetting in Target Language Adaptation of LLMs via Source-Shielded Updates : Abstract: Expanding the linguistic diversity of instruct large language models (LLMs) is crucial for global accessibility but is often hindered by the reliance on costly specialized target language la...
- SEAL: Self-Evolving Agentic Learning for Conversational Question Answering over Knowledge Graphs : Abstract: Knowledge-based conversational question answering (KBCQA) confronts persistent challenges in resolving coreference, modeling contextual dependencies, and executing complex logical reasoning....
- LLMs Know More Than Words: A Genre Study with Syntax, Metaphor & Phonetics : Abstract: Large language models (LLMs) demonstrate remarkable potential across diverse language related tasks, yet whether they capture deeper linguistic properties, such as syntactic structure, phone...
- Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction : Abstract: The evolution of Large Language Models (LLMs) from passive responders to autonomous agents necessitates a fundamental shift in learning paradigms, from static imitation to incentive-driven...
- Factuality and Transparency Are All RAG Needs! Self-Explaining Contrastive Evidence Re-ranking : Abstract: This extended abstract introduces Self-Explaining Contrastive Evidence Re-Ranking (CER), a novel method that restructures retrieval around factual evidence by fine-tuning embeddings with con...
- Human-Centred Evaluation of Text-to-Image Generation Models for Self-expression of Mental Distress: A Dataset Based on GPT-4o : Abstract: Effective communication is central to achieving positive healthcare outcomes in mental health contexts, yet international students often face linguistic and cultural barriers that hinder the...
- Retrieval-Augmented Few-Shot Prompting Versus Fine-Tuning for Code Vulnerability Detection : Abstract: Few-shot prompting has emerged as a practical alternative to fine-tuning for leveraging the capabilities of large language models (LLMs) in specialized tasks. However, its effectiveness depe...
- Towards Contextual Sensitive Data Detection : Abstract: The emergence of open data portals necessitates more attention to protecting sensitive data before datasets get published and exchanged. While an abundance of methods for suppressing sensiti...
- Can machines perform a qualitative data analysis? Reading the debate with Alan Turing : Abstract: This paper reflects on the literature that rejects the use of Large Language Models (LLMs) in qualitative data analysis. It illustrates through empirical evidence as well as critical reflect...
- Balancing Safety and Helpfulness in Healthcare AI Assistants through Iterative Preference Alignment : Abstract: Large Language Models (LLMs) are increasingly used in healthcare, yet ensuring their safety and trustworthiness remains a barrier to deployment. Conversational medical assistants must avoid ...
- Text-Only Training for Image Captioning with Retrieval Augmentation and Modality Gap Correction : Abstract: Image captioning has drawn considerable attention from the natural language processing and computer vision fields. Aiming to reduce the reliance on curated data, several studies have explore...
- Limit cycles for speech : Abstract: Rhythmic fluctuations in acoustic energy and accompanying neuronal excitations in cortical oscillations are characteristic of human speech, yet whether a corresponding rhythmicity inheres in...
- Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs : Abstract: Graph topology is a fundamental determinant of memory leakage in multi-agent LLM systems, yet its effects remain poorly quantified. We introduce MAMA (Multi-Agent Memory Attack), a framework...
- Towards Ethical Multi-Agent Systems of Large Language Models: A Mechanistic Interpretability Perspective : Abstract: Large language models (LLMs) have been widely deployed in various applications, often functioning as autonomous agents that interact with each other in multi-agent systems. While these syste...
- Are LLMs Truly Multilingual? Exploring Zero-Shot Multilingual Capability of LLMs for Information Retrieval: An Italian Healthcare Use Case : Abstract: Large Language Models (LLMs) have become a key topic in AI and NLP, transforming sectors like healthcare, finance, education, and marketing by improving customer service, automating tasks, p...
- The AI Consumer Index (ACE) : Abstract: We introduce the first version of the AI Consumer Index (ACE), a benchmark for assessing whether frontier AI models can perform high-value consumer tasks. ACE contains a hidden held-out set o...
- Algorithmic Thinking Theory : Abstract: Large language models (LLMs) have proven to be highly effective for solving complex reasoning tasks. Surprisingly, their capabilities can often be improved by iterating on previously generat...
- One-shot acceleration of transient PDE solvers via online-learned preconditioners : Abstract: Data-driven acceleration of scientific computing workflows has been a high-profile aim of machine learning (ML) for science, with numerical simulation of transient partial differential equat...
- On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral : Abstract: Tool-integrated (TI) reinforcement learning (RL) enables large language models (LLMs) to perform multi-step reasoning by interacting with external tools such as search engines and retrievers...
- SQuARE: Structured Query & Adaptive Retrieval Engine For Tabular Formats : Abstract: Accurate question answering over real spreadsheets remains difficult due to multirow headers, merged cells, and unit annotations that disrupt naive chunking, while rigid SQL views fail on fi...
- DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle : Abstract: Real-world enterprise data intelligence workflows encompass data engineering that turns raw sources into analysis-ready tables and data analysis that converts those tables into decision-ori...
- ClusterFusion: Hybrid Clustering with Embedding Guidance and LLM Adaptation : Abstract: Text clustering is a fundamental task in natural language processing, yet traditional clustering algorithms with pre-trained embeddings often struggle in domain-specific contexts without cos...
- LangSAT: A Novel Framework Combining NLP and Reinforcement Learning for SAT Solving : Abstract: Our work presents a novel reinforcement learning (RL) based framework to optimize heuristic selection within the conflict-driven clause learning (CDCL) process, improving the efficiency of B...
- MASE: Interpretable NLP Models via Model-Agnostic Saliency Estimation : Abstract: Deep neural networks (DNNs) have made significant strides in Natural Language Processing (NLP), yet their interpretability remains elusive, particularly when evaluating their intricate decis...
- RapidUn: Influence-Driven Parameter Reweighting for Efficient Large Language Model Unlearning : Abstract: Removing specific data influence from large language models (LLMs) remains challenging, as retraining is costly and existing approximate unlearning methods are often unstable. The challenge ...
- MSME: A Multi-Stage Multi-Expert Framework for Zero-Shot Stance Detection : Abstract: LLM-based approaches have recently achieved impressive results in zero-shot stance detection. However, they still struggle in complex real-world scenarios, where stance understanding require...
- UW-BioNLP at ChemoTimelines 2025: Thinking, Fine-Tuning, and Dictionary-Enhanced LLM Systems for Chemotherapy Timeline Extraction : Abstract: The ChemoTimelines shared task benchmarks methods for constructing timelines of systemic anticancer treatment from electronic health records of cancer patients. This paper describes our meth...
- EvoEdit: Lifelong Free-Text Knowledge Editing through Latent Perturbation Augmentation and Knowledge-driven Parameter Fusion : Abstract: Adjusting the outdated knowledge of large language models (LLMs) after deployment remains a major challenge. This difficulty has spurred the development of knowledge editing, which seeks to ...
- AdmTree: Compressing Lengthy Context with Adaptive Semantic Trees : Abstract: The quadratic complexity of self-attention constrains Large Language Models (LLMs) in processing long contexts, a capability essential for many advanced applications. Context compression aim...
- ADAPT: Learning Task Mixtures for Budget-Constrained Instruction Tuning : Abstract: We propose ADAPT, a meta-learning algorithm that learns task sampling proportions under an explicit token budget for multi-task instruction tuning. Instead of fixing task weights by h...
- LexGenius: An Expert-Level Benchmark for Large Language Models in Legal General Intelligence : Abstract: Legal general intelligence (GI) refers to artificial intelligence (AI) that encompasses legal understanding, reasoning, and decision-making, simulating the expertise of legal experts across ...
- ArterialNet: Reconstructing Arterial Blood Pressure Waveform with Wearable Pulsatile Signals, a Cohort-Aware Approach : Abstract: Goal: Continuous arterial blood pressure (ABP) waveform is invasive but essential for hemodynamic monitoring. Current non-invasive techniques reconstruct ABP waveforms with pulsatile signals...
- Convolutional Monge Mapping between EEG Datasets to Support Independent Component Labeling : Abstract: EEG recordings contain rich information about neural activity but are subject to artifacts, noise, and superficial differences due to sensors, amplifiers, and filtering. Independent componen...
- Beyond I-Con: Exploring New Dimension of Distance Measures in Representation Learning : Abstract: The Information Contrastive (I-Con) framework revealed that over 23 representation learning methods implicitly minimize KL divergence between data and learned distributions that encode simil...
- Towards an end-to-end artificial intelligence driven global weather forecasting system : Abstract: The weather forecasting system is important for science and society, and significant achievements have been made in applying artificial intelligence (AI) to medium-range weather forecasting....
- Contract-Driven QoE Auditing for Speech and Singing Services: From MOS Regression to Service Graphs : Abstract: Subjective mean opinion scores (MOS) remain the de-facto target for non-intrusive speech and singing quality assessment. However, MOS is a scalar that collapses heterogeneous user expectatio...
- Tokenizing Buildings: A Transformer for Layout Synthesis : Abstract: We introduce Small Building Model (SBM), a Transformer-based architecture for layout synthesis in Building Information Modeling (BIM) scenes. We address the question of how to tokenize build...
- Series of quasi-uniform scatterings with fast search, root systems and neural network classifications : Abstract: In this paper we describe an approach to construct large extendable collections of vectors in predefined spaces of given dimensions. These collections are useful for neural network latent sp...
- STELLA: Guiding Large Language Models for Time Series Forecasting with Semantic Abstractions : Abstract: Recent adaptations of Large Language Models (LLMs) for time series forecasting often fail to effectively enhance information for raw series, leaving LLM reasoning capabilities underutilized....
- Shorting Dynamics and Structured Kernel Regularization : Abstract: This paper develops a nonlinear operator dynamic that progressively removes the influence of a prescribed feature subspace while retaining maximal structure elsewhere. The induced sequence o...
- Environment-Aware Channel Inference via Cross-Modal Flow: From Multimodal Sensing to Wireless Channels : Abstract: Accurate channel state information (CSI) underpins reliable and efficient wireless communication. However, acquiring CSI via pilot estimation incurs substantial overhead, especially in massi...
- Rethinking the Use of Vision Transformers for AI-Generated Image Detection : Abstract: Rich feature representations derived from CLIP-ViT have been widely utilized in AI-generated image detection. While most existing methods primarily leverage features from the final layer, we...
- Learning Causality for Longitudinal Data : Abstract: This thesis develops methods for causal inference and causal representation learning (CRL) in high-dimensional, time-varying data. The first contribution introduces the Causal Dynamic Vari...
- Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias in LVLM-Based Text-to-Image Models : Abstract: Large vision-language model (LVLM) based text-to-image (T2I) systems have become the dominant paradigm in image generation, yet whether they amplify social biases remains insufficiently unde...
- Towards a unified framework for guided diffusion models : Abstract: Guided or controlled data generation with diffusion models (partial preliminary results of this work appeared in the International Conference on Machine Learning 2025)...
- Evolutionary Architecture Search through Grammar-Based Sequence Alignment : Abstract: Neural architecture search (NAS) in expressive search spaces is a computationally hard problem, but it also holds the potential to automatically discover completely novel and performant arch...
- HTR-ConvText: Leveraging Convolution and Textual Information for Handwritten Text Recognition : Abstract: Handwritten Text Recognition remains challenging due to limited data, high writing-style variance, and scripts with complex diacritics. Existing approaches, though they partially address thes...
- Model-Free Assessment of Simulator Fidelity via Quantile Curves : Abstract: Simulation of complex systems originated in manufacturing and queuing applications. It is now widely used for large-scale, ML-based systems in research, education, and consumer surveys. Howe...
- Arbitrage: Efficient Reasoning via Advantage-Aware Speculation : Abstract: Modern Large Language Models achieve impressive reasoning capabilities with long Chain of Thoughts, but they incur substantial computational cost during inference, and this motivates techniq...
- QKAN-LSTM: Quantum-inspired Kolmogorov-Arnold Long Short-term Memory : Abstract: Long short-term memory (LSTM) models are a particular type of recurrent neural networks (RNNs) that are central to sequential modeling tasks in domains such as urban telecommunication foreca...
- Meta-Learning for Quantum Optimization via Quantum Sequence Model : Abstract: The Quantum Approximate Optimization Algorithm (QAOA) is a leading approach for solving combinatorial optimization problems on near-term quantum processors. However, finding good variational...
- Control Consistency Losses for Diffusion Bridges : Abstract: Simulating the conditioned dynamics of diffusion processes, given their initial and terminal states, is an important but challenging problem in the sciences. The difficulty is particularly p...
- Foundations of Diffusion Models in General State Spaces: A Self-Contained Introduction : Abstract: Although diffusion models now occupy a central place in generative modeling, introductory treatments commonly assume Euclidean data and seldom clarify their connection to discrete-state anal...
- Structured Document Translation via Format Reinforcement Learning : Abstract: Recent works on structured text translation remain limited to the sentence level, as they struggle to effectively handle the complex document-level XML or HTML structures. To address this, w...
- Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning : Abstract: Long context reasoning in large language models (LLMs) has demonstrated enhancement of their cognitive capabilities via chain-of-thought (CoT) inference. Training such models is usually done...
- NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation : Abstract: Standard diffusion corrupts data using Gaussian noise whose Fourier coefficients have random magnitudes and random phases. While effective for unconditional or text-to-image generation, corr...
- DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation : Abstract: Recent unified multimodal large language models (MLLMs) have shown impressive capabilities, incorporating chain-of-thought (CoT) reasoning for enhanced text-to-image generation. However, exi...
- Safe Online Bid Optimization with Return on Investment and Budget Constraints : Abstract: In online marketing, the advertisers aim to balance achieving high volumes and high profitability. The companies' business units address this tradeoff by maximizing the volumes while guarant...
- ImageNot: A contrast with ImageNet preserves model rankings : Abstract: We introduce ImageNot, a dataset constructed explicitly to be drastically different than ImageNet while matching its scale. ImageNot is designed to test the external validity of deep learnin...
- FusionBench: A Unified Library and Comprehensive Benchmark for Deep Model Fusion : Abstract: Deep model fusion is an emerging technique that unifies the predictions or parameters of several deep neural networks into a single better-performing model in a cost-effective and data-effic...
- NITRO-D: Native Integer-only Training of Deep Convolutional Neural Networks : Abstract: Quantization is a pivotal technique for managing the growing computational and memory demands of Deep Neural Networks (DNNs). By reducing the number of bits used to represent weights and act...
- Convergence Analysis for Deep Sparse Coding via Convolutional Neural Networks : Abstract: In this work, we explore the intersection of sparse coding theory and deep learning to enhance our understanding of feature extraction capabilities in advanced neural network architectures. ...
- Educational Cone Model in Embedding Vector Spaces : Abstract: Human-annotated datasets with explicit difficulty ratings are essential in intelligent educational systems. Although embedding vector spaces are widely used to represent semantic closeness a...
- Computational Linguistics Meets Libyan Dialect: A Study on Dialect Identification : Abstract: This study investigates logistic regression, linear support vector machine, multinomial Naive Bayes, and Bernoulli Naive Bayes for classifying Libyan dialect utterances gathered from Twitter...
- Polynomiogram: An Integrated Framework for Root Visualization and Generative Art : Abstract: This work presents the Polynomiogram framework, an integrated computational platform for exploring, visualizing, and generating art from polynomial root systems. The main innovation is a fle...
- The Geometry of Benchmarks: A New Path Toward AGI : Abstract: Benchmarks are the primary tool for assessing progress in artificial intelligence (AI), yet current practice evaluates models on isolated test suites and provides little guidance for reasoni...
- Inference-time Stochastic Refinement of GRU-Normalizing Flow for Real-time Video Motion Transfer : Abstract: Real-time video motion transfer applications such as immersive gaming and vision-based anomaly detection require accurate yet diverse future predictions to support realistic synthesis and ro...
- Plug-and-Play Image Restoration with Flow Matching: A Continuous Viewpoint : Abstract: Flow matching-based generative models have been integrated into the plug-and-play image restoration framework, and the resulting plug-and-play flow matching (PnP-Flow) model has achieved som...
- Bayes-DIC Net: Estimating Digital Image Correlation Uncertainty with Bayesian Neural Networks : Abstract: This paper introduces a novel method for generating high-quality Digital Image Correlation (DIC) dataset based on non-uniform B-spline surfaces. By randomly generating control point coordina...
- One Detector Fits All: Robust and Adaptive Detection of Malicious Packages from PyPI to Enterprises : Abstract: The rise of supply chain attacks via malicious Python packages demands robust detection solutions. Current approaches, however, overlook two critical challenges: robustness against adversari...
- Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive Alignment : Abstract: Recent advancement in multimodal LLMs (MLLMs) has demonstrated their remarkable capability to generate descriptive captions for input videos. However, these models suffer from factual inaccu...
- AutoGuard: A Self-Healing Proactive Security Layer for DevSecOps Pipelines Using Reinforcement Learning : Abstract: Contemporary DevSecOps pipelines must keep security up to date in continuously integrated and deployed environments. Existing methods, such as rule-based intrusion detect...
- Constructive Approximation under Carleman's Condition, with Applications to Smoothed Analysis : Abstract: A classical result of Carleman, based on the theory of quasianalytic functions, shows that polynomials are dense in $L^2(\mu)$ for any $\mu$ such that the moments $\int x^k \, d\mu$ do not grow too r...
- Informative missingness and its implications in semi-supervised learning : Abstract: Semi-supervised learning (SSL) constructs classifiers using both labelled and unlabelled data. It leverages information from labelled samples, whose acquisition is often costly or labour-int...
- Sarcasm Detection on Reddit Using Classical Machine Learning and Feature Engineering : Abstract: Sarcasm is common in online discussions, yet difficult for machines to identify because the intended meaning often contradicts the literal wording. In this work, I study sarcasm detection us...
- Predicting Time-Dependent Flow Over Complex Geometries Using Operator Networks : Abstract: Fast, geometry-generalizing surrogates for unsteady flow remain challenging. We present a time-dependent, geometry-aware Deep Operator Network that predicts velocity fields for moderate-Re f...
- NORi: An ML-Augmented Ocean Boundary Layer Parameterization : Abstract: NORi is a machine-learned (ML) parameterization of ocean boundary layer turbulence that is physics-based and augmented with neural networks. NORi stands for neural ordinary differential equa...
- Mathematical Framing for Different Agent Strategies : Abstract: We introduce a unified mathematical and probabilistic framework for understanding and comparing diverse AI agent strategies. We bridge the gap between high-level agent design concepts, such ...
- When Robots Should Say "I Don't Know": Benchmarking Abstention in Embodied Question Answering : Abstract: Embodied Question Answering (EQA) requires an agent to interpret language, perceive its environment, and navigate within 3D scenes to produce responses. Existing EQA benchmarks assume that e...
- SEASON: Mitigating Temporal Hallucination in Video Large Language Models via Self-Diagnostic Contrastive Decoding : Abstract: Video Large Language Models (VideoLLMs) have shown remarkable progress in video understanding. However, these models still struggle to effectively perceive and exploit rich temporal informat...
- Semi Centralized Training Decentralized Execution Architecture for Multi Agent Deep Reinforcement Learning in Traffic Signal Control : Abstract: Multi-agent reinforcement learning (MARL) has emerged as a promising paradigm for adaptive traffic signal control (ATSC) of multiple intersections. Existing approaches typically follow eithe...
- Fermionic neural Gibbs states : Abstract: We introduce fermionic neural Gibbs states (fNGS), a variational framework for modeling finite-temperature properties of strongly interacting fermions. fNGS starts from a reference mean-fiel...
- Recurrent Neural Networks with Linear Structures for Electricity Price Forecasting : Abstract: We present a novel recurrent neural network architecture designed explicitly for day-ahead electricity price forecasting, aimed at improving short-term decision-making and operational manage...
- Provable FDR Control for Deep Feature Selection: Deep MLPs and Beyond : Abstract: We develop a flexible feature selection framework based on deep neural networks that approximately controls the false discovery rate (FDR), a measure of Type-I error. The method applies to a...
- Continuous-time reinforcement learning for optimal switching over multiple regimes : Abstract: This paper studies the continuous-time reinforcement learning (RL) for optimal switching problems across multiple regimes. We consider a type of exploratory formulation under entropy regular...
- Sequential Enumeration in Large Language Models : Abstract: Reliably counting and generating sequences of items remain a significant challenge for neural networks, including Large Language Models (LLMs). Indeed, although this capability is readily ha...
- Complementary Characterization of Agent-Based Models via Computational Mechanics and Diffusion Models : Abstract: This article extends the preprint "Characterizing Agent-Based Model Dynamics via $ε$-Machines and Kolmogorov-Style Complexity" by introducing diffusion models as orthogonal and complementary...
- Pick-to-Learn for Systems and Control: Data-driven Synthesis with State-of-the-art Safety Guarantees : Abstract: Data-driven methods have become paramount in modern systems and control problems characterized by growing levels of complexity. In safety-critical environments, deploying these methods requi...
- TimesNet-Gen: Deep Learning-based Site Specific Strong Motion Generation : Abstract: Effective earthquake risk reduction relies on accurate site-specific evaluations. This requires models that can represent the influence of local site conditions on ground motion characterist...
- TRINITY: An Evolved LLM Coordinator : Abstract: Combining diverse foundation models is promising, but weight-merging is limited by mismatched architectures and closed APIs. Trinity addresses this with a lightweight coordinator that orches...
- Towards Continuous-Time Approximations for Stochastic Gradient Descent without Replacement : Abstract: Gradient optimization algorithms using epochs, that is those based on stochastic gradient descent without replacement (SGDo), are predominantly used to train machine learning models in pract...
- A Tutorial on Regression Analysis: From Linear Models to Deep Learning -- Lecture Notes on Artificial Intelligence : Abstract: This article serves as the regression analysis lecture notes in the Intelligent Computing course cluster (including the courses of Artificial Intelligence, Data Mining, Machine Learning, and...
- RLHFSpec: Breaking the Efficiency Bottleneck in RLHF Training via Adaptive Drafting : Abstract: Reinforcement Learning from Human Feedback (RLHF) is an important fine-tuning technique for large language models (LLMs) and comprises three stages: generation, inference, and training. The ...
- MemLoRA: Distilling Expert Adapters for On-Device Memory Systems : Abstract: Memory-augmented Large Language Models (LLMs) have demonstrated remarkable consistency during prolonged dialogues by storing relevant memories and incorporating them as context. Such memory-...
- A result relating convex n-widths to covering numbers with some applications to neural networks : Abstract: In general, approximating classes of functions defined over high-dimensional input spaces by linear combinations of a fixed set of basis functions or "features" is known to be hard. Typica...
- Multi-Agent Reinforcement Learning for Intraday Operating Rooms Scheduling under Uncertainty : Abstract: Intraday surgical scheduling is a multi-objective decision problem under uncertainty: balancing elective throughput, urgent and emergency demand, delays, sequence-dependent setups, and overti...
- CARL: Critical Action Focused Reinforcement Learning for Multi-Step Agent : Abstract: Agents capable of accomplishing complex tasks through multiple interactions with the environment have emerged as a popular research direction. However, in such multi-step settings, the conve...
- Amortized Inference of Multi-Modal Posteriors using Likelihood-Weighted Normalizing Flows : Abstract: We present a novel technique for amortized posterior estimation using Normalizing Flows trained with likelihood-weighted importance sampling. This approach allows for the efficient inference...
- Realizable Abstractions: Near-Optimal Hierarchical Reinforcement Learning : Abstract: The main focus of Hierarchical Reinforcement Learning (HRL) is studying how large Markov Decision Processes (MDPs) can be more efficiently solved when addressed in a modular way, by combinin...
- Efficient Generative Transformer Operators For Million-Point PDEs : Abstract: We introduce ECHO, a transformer-operator framework for generating million-point PDE trajectories. While existing neural operators (NOs) have shown promise for solving partial differential e...
- Dual-Path Region-Guided Attention Network for Ground Reaction Force and Moment Regression : Abstract: Accurate estimation of three-dimensional ground reaction forces and moments (GRFs/GRMs) is crucial for both biomechanics research and clinical rehabilitation evaluation. In this study, we fo...
- SuperActivators: Only the Tail of the Distribution Contains Reliable Concept Signals : Abstract: Concept vectors aim to enhance model interpretability by linking internal representations with human-understandable semantics, but their utility is often limited by noisy and inconsistent ac...
- Multi-LLM Collaboration for Medication Recommendation : Abstract: As healthcare increasingly turns to AI for scalable and trustworthy clinical decision support, ensuring reliability in model reasoning remains a critical challenge. Individual large language...
- Hybrid Quantum-Classical Autoencoders for Unsupervised Network Intrusion Detection : Abstract: Unsupervised anomaly-based intrusion detection requires models that can generalize to attack patterns not observed during training. This work presents the first large-scale evaluation of hyb...
- David vs. Goliath: Can Small Models Win Big with Agentic AI in Hardware Design? : Abstract: Large Language Model (LLM) inference demands massive compute and energy, making domain-specific tasks expensive and unsustainable. As foundation models keep scaling, we ask: Is bigger always ...
- OMTRA: A Multi-Task Generative Model for Structure-Based Drug Design : Abstract: Structure-based drug design (SBDD) focuses on designing small-molecule ligands that bind to specific protein pockets. Computational methods are integral in modern SBDD workflows and often ma...
- Gradient Descent with Provably Tuned Learning-rate Schedules : Abstract: Gradient-based iterative optimization methods are the workhorse of modern machine learning. They crucially rely on careful tuning of parameters like learning rate and momentum. However, one ...
- The Geometry of Intelligence: Deterministic Functional Topology as a Foundation for Real-World Perception : Abstract: Real-world physical processes do not generate arbitrary variability: their signals concentrate on compact and low-variability subsets of functional space. This geometric structure enables ra...
- TV2TV: A Unified Framework for Interleaved Language and Video Generation : Abstract: Video generation models are rapidly advancing, but can still struggle with complex video outputs that require significant semantic branching or repeated high-level reasoning about what shoul...
- Deep infant brain segmentation from multi-contrast MRI : Abstract: Segmentation of magnetic resonance images (MRI) facilitates analysis of human brain development by delineating anatomical structures. However, in infants and young children, accurate segment...
- Value Gradient Guidance for Flow Matching Alignment : Abstract: While methods exist for aligning flow matching models--a popular and effective class of generative models--with human preferences, existing approaches fail to achieve both adaptation efficie...
- The Universal Weight Subspace Hypothesis : Abstract: We show that deep neural networks trained across diverse tasks exhibit remarkably similar low-dimensional parametric subspaces. We provide the first large-scale empirical evidence that demon...
- Rethinking AI Evaluation in Education: The TEACH-AI Framework and Benchmark for Generative AI Assistants : Abstract: As generative artificial intelligence (AI) continues to transform education, most existing AI evaluations rely primarily on technical performance metrics such as accuracy or task efficiency ...
- AI-Enabled grading with near-domain data for scaling feedback with human-level accuracy : Abstract: Constructed-response questions are crucial to encourage generative processing and test a learner's understanding of core concepts. However, the limited availability of instructor time, large...
- Patient Safety Risks from AI Scribes: Signals from End-User Feedback : Abstract: AI scribes are transforming clinical documentation at scale. However, their real-world performance remains understudied, especially regarding their impacts on patient safety. To this end, we...
- Measuring Agents in Production : Abstract: AI agents are actively running in production across diverse industries, yet little is publicly known about which technical approaches enable successful real-world deployments. We present the...
- Enhancing next token prediction based pre-training for jet foundation models : Abstract: Next token prediction is an attractive pre-training task for jet foundation models, in that it is simulation free and enables excellent generative capabilities that can transfer across datas...
- The Initialization Determines Whether In-Context Learning Is Gradient Descent : Abstract: In-context learning (ICL) in large language models (LLMs) is a striking phenomenon, yet its underlying mechanisms remain only partially understood. Previous work connects linear self-attenti...
- Bootstrapped Mixed Rewards for RL Post-Training: Injecting Canonical Action Order : Abstract: Post-training with reinforcement learning (RL) typically optimizes a single scalar objective and ignores structure in how solutions are produced. We ask whether a scalar hint toward a canoni...
- GRASP: GRouped Activation Shared Parameterization for Parameter-Efficient Fine-Tuning and Robust Inference of Transformers : Abstract: Parameter-efficient fine-tuning (PEFT) provides a scalable alternative to full-model adaptation by updating only a small subset of parameters in large pre-trained models. We introduce GRASP ...
- When do spectral gradient updates help in deep learning? : Abstract: Spectral gradient methods, such as the recently popularized Muon optimizer, are a promising alternative to standard Euclidean gradient descent for training deep neural networks and transform...
- Evaluating Long-Context Reasoning in LLM-Based WebAgents : Abstract: As large language model (LLM)-based agents become increasingly integrated into daily digital interactions, their ability to reason across long interaction histories becomes crucial for provi...
- RNNs perform task computations by dynamically warping neural representations : Abstract: Analysing how neural networks represent data features in their activations can help interpret how they perform tasks. Hence, a long line of work has focused on mathematically characterising ...
- Data-regularized Reinforcement Learning for Diffusion Models at Scale : Abstract: Aligning generative diffusion models with human preferences via reinforcement learning (RL) is critical yet challenging. Most existing algorithms are often vulnerable to reward hacking, such...
- RGE-GCN: Recursive Gene Elimination with Graph Convolutional Networks for RNA-seq based Early Cancer Detection : Abstract: Early detection of cancer plays a key role in improving survival rates, but identifying reliable biomarkers from RNA-seq data is still a major challenge. The data are high-dimensional, and c...
- Long-Horizon Model-Based Offline Reinforcement Learning Without Conservatism : Abstract: Popular offline reinforcement learning (RL) methods rely on conservatism, either by penalizing out-of-dataset actions or by restricting planning horizons. In this work, we question the unive...
- Distance Is All You Need: Radial Dispersion for Uncertainty Estimation in Large Language Models : Abstract: Detecting when large language models (LLMs) are uncertain is critical for building reliable systems, yet existing methods are overly complicated, relying on brittle semantic clustering or in...
- SmartAlert: Implementing Machine Learning-Driven Clinical Decision Support for Inpatient Lab Utilization Reduction : Abstract: Repetitive laboratory testing unlikely to yield clinically useful information is a common practice that burdens patients and increases healthcare costs. Education and feedback interventions ...
- STeP-Diff: Spatio-Temporal Physics-Informed Diffusion Models for Mobile Fine-Grained Pollution Forecasting : Abstract: Fine-grained air pollution forecasting is crucial for urban management and the development of healthy buildings. Deploying portable sensors on mobile platforms such as cars and buses offers ...
- Learning to Orchestrate Agents in Natural Language with the Conductor : Abstract: Powerful large language models (LLMs) from different providers have been expensively trained and finetuned to specialize across varying domains. In this work, we introduce a new kind of Cond...
- Feature Engineering vs. Deep Learning for Automated Coin Grading: A Comparative Study on Saint-Gaudens Double Eagles : Abstract: We challenge the common belief that deep learning always trumps older techniques, using the example of grading Saint-Gaudens Double Eagle gold coins automatically. In our work, we put a feat...
- GraphBench: Next-generation graph learning benchmarking : Abstract: Machine learning on graphs has recently achieved impressive progress in various domains, including molecular property prediction and chip design. However, benchmarking practices remain fragm...
- Context-Aware Mixture-of-Experts Inference on CXL-Enabled GPU-NDP Systems : Abstract: Mixture-of-Experts (MoE) models scale large language models through conditional computation, but inference becomes memory-bound once expert weights exceed the capacity of GPU memory. In this...
- Prototype-Based Semantic Consistency Alignment for Domain Adaptive Retrieval : Abstract: Domain adaptive retrieval aims to transfer knowledge from a labeled source domain to an unlabeled target domain, enabling effective retrieval while mitigating domain discrepancies. However, ...
- Explainable Graph Representation Learning via Graph Pattern Analysis : Abstract: Explainable artificial intelligence (XAI) is an important area in the AI community, and interpretability is crucial for building robust and trustworthy AI models. While previous work has exp...
- On the Limits of Test-Time Compute: Sequential Reward Filtering for Better Inference : Abstract: Test-time compute (TTC) has become an increasingly prominent paradigm for enhancing large language models (LLMs). Despite the empirical success of methods such as best-of-$n$ (BoN) sampling ...
- Diffusion Fine-Tuning via Reparameterized Policy Gradient of the Soft Q-Function : Abstract: Diffusion models excel at generating high-likelihood samples but often require alignment with downstream objectives. Existing fine-tuning methods for diffusion models significantly suffer fr...
- LeMat-GenBench: A Unified Evaluation Framework for Crystal Generative Models : Abstract: Generative machine learning (ML) models hold great promise for accelerating materials discovery through the inverse design of inorganic crystals, enabling an unprecedented exploration of che...
- Reliable Statistical Guarantees for Conformal Predictors with Small Datasets : Abstract: Surrogate models (including deep neural networks and other machine learning algorithms in supervised learning) are capable of approximating arbitrarily complex, high-dimensional input-output...
- Temp-SCONE: A Novel Out-of-Distribution Detection and Domain Generalization Framework for Wild Data with Temporal Shift : Abstract: Open-world learning (OWL) requires models that can adapt to evolving environments while reliably detecting out-of-distribution (OOD) inputs. Existing approaches, such as SCONE, achieve robus...
- Exploiting ftrace's function_graph Tracer Features for Machine Learning: A Case Study on Encryption Detection : Abstract: This paper proposes using the Linux kernel ftrace framework, particularly the function graph tracer, to generate informative system level data for machine learning (ML) applications. Experim...
- QoSDiff: An Implicit Topological Embedding Learning Framework Leveraging Denoising Diffusion and Adversarial Attention for Robust QoS Prediction : Abstract: Accurate Quality of Service (QoS) prediction is fundamental to service computing, providing essential data-driven guidance for service selection and ensuring superior user experiences. Howev...
- Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space : Abstract: Large language model (LLM) agents -- LLMs that dynamically interact with an environment over long horizons -- have become an increasingly important area of research, enabling automation in c...
- Score Matching for Estimating Finite Point Processes : Abstract: Score matching estimators have garnered significant attention in recent years because they eliminate the need to compute normalizing constants, thereby mitigating the computational challenge...
- Rethinking Decoupled Knowledge Distillation: A Predictive Distribution Perspective : Abstract: In the history of knowledge distillation, the focus has once shifted over time from logit-based to feature-based approaches. However, this transition has been revisited with the advent of De...
- Federated Learning for Anomaly Detection in Maritime Movement Data : Abstract: This paper introduces M3fed, a novel solution for federated learning of movement anomaly detection models. This innovation has the potential to improve data privacy and reduce communication ...
- Contract-Governed Training for Earth Observation: Observed Service Agreement Graphs and Coverage-Accuracy Trade-offs : Abstract: Earth observation (EO) models are frequently trained under implicit sampling policies that optimize global accuracy but provide no explicit guarantees on who (which regions, classes, or miss...
- ASCIIBench: Evaluating Language-Model-Based Understanding of Visually-Oriented Text : Abstract: Large language models (LLMs) have demonstrated several emergent behaviors with scale, including reasoning and fluency in long-form text generation. However, they continue to struggle with ta...
- Decoding Large Language Diffusion Models with Foreseeing Movement : Abstract: Large Language Diffusion Models (LLDMs) benefit from a flexible decoding mechanism that enables parallelized inference and controllable generations over autoregressive models. Yet such flexi...
- MechDetect: Detecting Data-Dependent Errors : Abstract: Data quality monitoring is a core challenge in modern information processing systems. While many approaches to detect data errors or shifts have been proposed, few studies investigate the me...
- Mitigating the Curse of Detail: Scaling Arguments for Feature Learning and Sample Complexity : Abstract: Two pressing topics in the theory of deep learning are the interpretation of feature learning mechanisms and the determination of implicit bias of networks in the rich regime. Current theori...
- BEP: A Binary Error Propagation Algorithm for Binary Neural Networks Training : Abstract: Binary Neural Networks (BNNs), which constrain both weights and activations to binary values, offer substantial reductions in computational complexity, memory footprint, and energy consumpti...
- Network of Theseus (like the ship) : Abstract: A standard assumption in deep learning is that the inductive bias introduced by a neural network architecture must persist from training through inference. The architecture you train with is...
- ActVAE: Modelling human activity schedules with a deep conditional generative approach : Abstract: Modelling the complexity and diversity of human activity scheduling behaviour is inherently challenging. We demonstrate a deep conditional-generative machine learning approach for the modell...
- Fine-Tuning ChemBERTa for Predicting Inhibitory Activity Against TDP1 Using Deep Learning : Abstract: Predicting the inhibitory potency of small molecules against Tyrosyl-DNA Phosphodiesterase 1 (TDP1), a key target in overcoming cancer chemoresistance, remains a critical challenge in early dr...
- Studying Various Activation Functions and Non-IID Data for Machine Learning Model Robustness : Abstract: Adversarial training is an effective method to improve the machine learning (ML) model robustness. Most existing studies typically consider the Rectified linear unit (ReLU) activation functi...
- Sponsored Questions and How to Auction Them : Abstract: Online platforms connect users with relevant products and services using ads. A key challenge is that a user's search query often leaves their true intent ambiguous. Typically, platforms pas...
- TARA Test-by-Adaptive-Ranks for Quantum Anomaly Detection with Conformal Prediction Guarantees : Abstract: Quantum key distribution (QKD) security fundamentally relies on the ability to distinguish genuine quantum correlations from classical eavesdropper simulations, yet existing certification me...
- Large Language Models for Limited Noisy Data: A Gravitational Wave Identification Study : Abstract: This work investigates whether large language models (LLMs) offer advantages over traditional neural networks for astronomical data processing, in regimes with non-Gaussian, non-stationary n...
- Polarization by Design: How Elites Could Shape Mass Preferences as AI Reduces Persuasion Costs : Abstract: In democracies, major policy decisions typically require some form of majority or consensus, so elites must secure mass support to govern. Historically, elites could shape support only throu...
- Mobile-Agent-RAG: Driving Smart Multi-Agent Coordination with Contextual Knowledge Empowerment for Long-Horizon Mobile Automation : Abstract: Mobile agents show immense potential, yet current state-of-the-art (SoTA) agents exhibit inadequate success rates on real-world, long-horizon, cross-application tasks. We attribute this bott...
- Large Language Model-Based Agents for Software Engineering: A Survey : Abstract: The recent advance in Large Language Models (LLMs) has shaped a new paradigm of AI agents, i.e., LLM-based agents. Compared to standalone LLMs, LLM-based agents substantially extend the vers...
- A Survey on Recommendation Unlearning: Fundamentals, Taxonomy, Evaluation, and Open Questions : Abstract: Recommender systems have become increasingly influential in shaping user behavior and decision-making, highlighting their growing impact in various domains. Meanwhile, the widespread adoptio...
- Public Sentiment Analysis of Traffic Management Policies in Knoxville: A Social Media Driven Study : Abstract: This study presents a comprehensive analysis of public sentiment toward traffic management policies in Knoxville, Tennessee, utilizing social media data from Twitter and Reddit platforms. We...
- The BEAT-CF Causal Model: A model for guiding the design of trials and observational analyses of cystic fibrosis exacerbations : Abstract: Loss of lung function in cystic fibrosis (CF) occurs progressively, punctuated by acute pulmonary exacerbations (PEx) in which abrupt declines in lung function are not fully recovered. A key...
- Lost in Modality: Evaluating the Effectiveness of Text-Based Membership Inference Attacks on Large Multimodal Models : Abstract: Large Multimodal Language Models (MLLMs) are emerging as one of the foundational tools in an expanding range of applications. Consequently, understanding training-data leakage in these syste...
- Thucy: An LLM-based Multi-Agent System for Claim Verification across Relational Databases : Abstract: In today's age, it is becoming increasingly difficult to decipher truth from lies. Every day, politicians, media outlets, and public figures make conflicting claims, often abo...
- BookRAG: A Hierarchical Structure-aware Index-based Approach for Retrieval-Augmented Generation on Complex Documents : Abstract: As an effective method to boost the performance of Large Language Models (LLMs) on the question answering (QA) task, Retrieval-Augmented Generation (RAG), which queries highly relevant infor...
- World Models for Autonomous Navigation of Terrestrial Robots from LIDAR Observations : Abstract: Autonomous navigation of terrestrial robots using Reinforcement Learning (RL) from LIDAR observations remains challenging due to the high dimensionality of sensor data and the sample ineffic...
- AsymPuzl: An Asymmetric Puzzle for multi-agent cooperation : Abstract: Large Language Model (LLM) agents are increasingly studied in multi-turn, multi-agent scenarios, yet most existing setups emphasize open-ended role-play rather than controlled evaluation. We...
- Cell-cell communication inference and analysis: biological mechanisms, computational approaches, and future opportunities : Abstract: In multicellular organisms, cells coordinate their activities through cell-cell communication (CCC), which are crucial for development, tissue homeostasis, and disease progression. Recent ad...
- A Learning-based Control Methodology for Transitioning VTOL UAVs : Abstract: Transition control poses a critical challenge in Vertical Take-Off and Landing Unmanned Aerial Vehicle (VTOL UAV) development due to the tilting rotor mechanism, which shifts the center of g...
- State Space Models for Bioacoustics: A comparative Evaluation with Transformers : Abstract: In this study, we evaluate the efficacy of the Mamba model in the field of bioacoustics. We first pretrain a Mamba-based audio large language model (LLM) on a large corpus of audio data usin...
- KVNAND: Efficient On-Device Large Language Model Inference Using DRAM-Free In-Flash Computing : Abstract: Deploying large language models (LLMs) on edge devices enables personalized agents with strong privacy and low cost. However, with tens to hundreds of billions of parameters, single-batch au...
- Matrix Editing Meets Fair Clustering: Parameterized Algorithms and Complexity : Abstract: We study the computational problem of computing a fair means clustering of discrete vectors, which admits an equivalent formulation as editing a colored matrix into one with few distinct col...
- Context-Aware Hierarchical Learning: A Two-Step Paradigm towards Safer LLMs : Abstract: Large Language Models (LLMs) have emerged as powerful tools for diverse applications. However, their uniform token processing paradigm introduces critical vulnerabilities in instruction hand...
- AI/ML in 3GPP 5G Advanced - Services and Architecture : Abstract: The 3rd Generation Partnership Project (3GPP), the standards body for mobile networks, is in the final phase of Release 19 standardization and is beginning Release 20. Artificial Intelligenc...
- Bayesian Optimization for Automatic Tuning of Torque-Level Nonlinear Model Predictive Control : Abstract: This paper presents an auto-tuning framework for torque-based Nonlinear Model Predictive Control (nMPC), where the MPC serves as a real-time controller for optimal joint torque commands. The...
- MPCFormer: A physics-informed data-driven approach for explainable socially-aware autonomous driving : Abstract: Autonomous Driving (AD) vehicles still struggle to exhibit human-like behavior in highly dynamic and interactive traffic scenarios. The key challenge lies in AD's limited ability to interact...
- Hierarchical Vision Language Action Model Using Success and Failure Demonstrations : Abstract: Prior Vision-Language-Action (VLA) models are typically trained on teleoperated successful demonstrations, while discarding numerous failed attempts that occur naturally during data collecti...
- Beyond the Black Box: A Cognitive Architecture for Explainable and Aligned AI : Abstract: Current AI paradigms, as "architects of experience," face fundamental challenges in explainability and value alignment. This paper introduces "Weight-Calculatism," a novel cognitive architec...
- When Do Symbolic Solvers Enhance Reasoning in Large Language Models? : Abstract: Large Reasoning Models (LRMs) achieve strong performance on complex reasoning tasks by generating long Chains of Thought (CoTs). However, this paradigm might incur substantial token overhead...
- Prior preferences in active inference agents: soft, hard, and goal shaping : Abstract: Active inference proposes expected free energy as an objective for planning and decision-making to adequately balance exploitative and explorative drives in learning agents. The exploitative...
- Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia : Abstract: Large Language Model (LLM) agents have demonstrated impressive capabilities for social interaction and are increasingly being deployed in situations where they might engage with both human a...
- Multimodal Reinforcement Learning with Agentic Verifier for AI Agents : Abstract: Agentic reasoning models trained with multimodal reinforcement learning (MMRL) have become increasingly capable, yet they are almost universally optimized using sparse, outcome-based rewards...
- Multi-Agent Reinforcement Learning with Communication-Constrained Priors : Abstract: Communication is one of the effective means to improve the learning of cooperative policy in multi-agent systems. However, in most real-world scenarios, lossy communication is a prevalent is...
- PARC: An Autonomous Self-Reflective Coding Agent for Robust Execution of Long-Horizon Tasks : Abstract: We introduce PARC, a coding agent for the autonomous and robust execution of long-horizon computational tasks. PARC is built on a hierarchical multi-agent architecture incorporating task pla...
- Reason-Plan-ReAct: A Reasoner-Planner Supervising a ReAct Executor for Complex Enterprise Tasks : Abstract: Despite recent advances, autonomous agents often struggle to solve complex tasks in enterprise domains that require coordinating multiple tools and processing diverse data sources. This stru...
- DeepRule: An Integrated Framework for Automated Business Rule Generation via Deep Predictive Modeling and Hybrid Search Optimization : Abstract: This paper proposes DeepRule, an integrated framework for automated business rule generation in retail assortment and pricing optimization. Addressing the systematic misalignment between exi...
- MemVerse: Multimodal Memory for Lifelong Learning Agents : Abstract: Despite rapid progress in large-scale language and vision models, AI agents still suffer from a fundamental limitation: they cannot remember. Without reliable memory, agents catastrophically...
- RoCo: Role-Based LLMs Collaboration for Automatic Heuristic Design : Abstract: Automatic Heuristic Design (AHD) has gained traction as a promising solution for solving combinatorial optimization problems (COPs). Large Language Models (LLMs) have emerged and become a pr...
- Omni-AutoThink: Adaptive Multimodal Reasoning via Reinforcement Learning : Abstract: Recent advances in Omni models have enabled unified multimodal perception and generation. However, most existing systems still exhibit rigid reasoning behaviors, either overthinking simple p...
- A Hierarchical Tree-based approach for creating Configurable and Static Deep Research Agent (Static-DRA) : Abstract: The advancement in Large Language Models has driven the creation of complex agentic systems, such as Deep Research Agents (DRAs), to overcome the limitations of static Retrieval Augmented Ge...
- Autonomous Agents and Policy Compliance: A Framework for Reasoning About Penalties : Abstract: This paper presents a logic programming-based framework for policy-aware autonomous agents that can reason about potential penalties for non-compliance and act accordingly. While prior work ...
- Benchmark for Planning and Control with Large Language Model Agents: Blocksworld with Model Context Protocol : Abstract: Industrial automation increasingly requires flexible control strategies that can adapt to changing tasks and environments. Agents based on Large Language Models (LLMs) offer potential for su...
- AI-Driven Document Redaction in UK Public Authorities: Implementation Gaps, Regulatory Challenges, and the Human Oversight Imperative : Abstract: Document redaction in public authorities faces critical challenges as traditional manual approaches struggle to balance growing transparency demands with increasingly stringent data protecti...
- Quantifying the Potential to Escape Filter Bubbles: A Behavior-Aware Measure via Contrastive Simulation : Abstract: Nowadays, recommendation systems have become crucial to online platforms, shaping user exposure by accurate preference modeling. However, such an exposure strategy can also reinforce users' ...
- Echoes of AI Harms: A Human-LLM Synergistic Framework for Bias-Driven Harm Anticipation : Abstract: The growing influence of Artificial Intelligence (AI) systems on decision-making in critical domains has exposed their potential to cause significant harms, often rooted in biases embedded a...
- Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem : Abstract: Since 2019, the Hugging Face Model Hub has been the primary global platform for sharing open weight AI models. By releasing a dataset of the complete history of weekly model downloads (June ...
- Will Power Return to the Clouds? From Divine Authority to GenAI Authority : Abstract: Generative AI systems now mediate newsfeeds, search rankings, and creative content for hundreds of millions of users, positioning a handful of private firms as de-facto arbiters of truth. Dr...
- Irresponsible AI: big tech's influence on AI research and associated impacts : Abstract: The accelerated development, deployment and adoption of artificial intelligence systems has been fuelled by the increasing involvement of big tech. This has been accompanied by increasing et...
- AtomDisc: An Atom-level Tokenizer that Boosts Molecular LLMs and Reveals Structure–Property Associations : Abstract: Advances in large language models (LLMs) are accelerating discovery in molecular science. However, adapting molecular information to the serialized, token-based processing of LLMs remains a ...
- Beyond Code Pairs: Dialogue-Based Data Generation for LLM Code Translation : Abstract: Large language models (LLMs) have shown remarkable capabilities in code translation, yet their performance deteriorates in low-resource programming domains such as Fortran and emerging frame...
- When Harmful Content Gets Camouflaged: Unveiling Perception Failure of LVLMs with CamHarmTI : Abstract: Large vision-language models (LVLMs) are increasingly used for tasks where detecting multimodal harmful content is crucial, such as online content moderation. However, real-world harmful con...
- Community Quality and Influence Maximization: An Empirical Study : Abstract: Influence maximization in social networks plays a vital role in applications such as viral marketing, epidemiology, product recommendation, opinion mining, and counter-terrorism. A common ap...
- Ensemble Privacy Defense for Knowledge-Intensive LLMs against Membership Inference Attacks : Abstract: Retrieval-Augmented Generation (RAG) and Supervised Finetuning (SFT) have become the predominant paradigms for equipping Large Language Models (LLMs) with external knowledge for diverse, kno...
Research Sources: 336 | Generated: 12/5/2025
