AI RESEARCH PAPERS & ACADEMIC SOURCES
- Self-Supervised One-Step Diffusion Refinement for Snapshot Compressive Imaging : Abstract: Snapshot compressive imaging (SCI) captures multispectral images (MSIs) using a single coded two-dimensional (2-D) measurement, but reconstructing high-fidelity MSIs from these compressed in...
- Ensuring Force Safety in Vision-Guided Robotic Manipulation via Implicit Tactile Calibration : Abstract: In dynamic environments, robots often encounter constrained movement trajectories when manipulating objects with specific properties, such as doors. Therefore, applying the appropriate force...
- 3D MedDiffusion: A 3D Medical Latent Diffusion Model for Controllable and High-quality Medical Image Generation : Abstract: The generation of medical images presents significant challenges due to their high-resolution and three-dimensional nature. Existing methods often yield suboptimal performance in generating ...
- AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving : Abstract: Vision-Language-Action (VLA) models in autonomous driving systems have recently demonstrated transformative potential by integrating multimodal perception with decision-making capabilities. ...
- SpriteHand: Real-Time Versatile Hand-Object Interaction with Autoregressive Video Generation : Abstract: Modeling and synthesizing complex hand-object interactions remains a significant challenge, even for state-of-the-art physics engines. Conventional simulation-based approaches rely on explic...
- SGDiff: Scene Graph Guided Diffusion Model for Image Collaborative SegCaptioning : Abstract: Controllable image semantic understanding tasks, such as captioning or segmentation, necessitate users to input a prompt (e.g., text or bounding boxes) to predict a unique outcome, presentin...
- Artemis: Structured Visual Reasoning for Perception Policy Learning : Abstract: Recent reinforcement-learning frameworks for visual perception policy have begun to incorporate intermediate reasoning chains expressed in natural language. Empirical observations indicate t...
- PAI-Bench: A Comprehensive Benchmark For Physical AI : Abstract: Physical AI aims to develop models that can perceive and predict real-world dynamics; yet, the extent to which current multi-modal large language models and video generative models support t...
- Learning Visual Affordance from Audio : Abstract: We introduce Audio-Visual Affordance Grounding (AV-AG), a new task that segments object interaction regions from action sounds. Unlike existing approaches that rely on textual instructions o...
- MV-TAP: Tracking Any Point in Multi-View Videos : Abstract: Multi-view camera systems enable rich observations of complex real-world scenes, and understanding dynamic objects in multi-view settings has become central to various applications. In this ...
- AirSim360: A Panoramic Simulation Platform within Drone View : Abstract: The field of 360-degree omnidirectional understanding has been receiving increasing attention for advancing spatial intelligence. However, the lack of large-scale and diverse data remains a ...
- TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models : Abstract: Unified multimodal models (UMMs) aim to jointly perform multimodal understanding and generation within a single framework. We present TUNA, a native UMM that builds a unified continuous visu...
- Generative Video Motion Editing with 3D Point Tracks : Abstract: Camera and object motions are central to a video's narrative. However, precisely editing these captured motions remains a significant challenge, especially under complex object movements. Cu...
- Objects in Generated Videos Are Slower Than They Appear: Models Suffer Sub-Earth Gravity and Don't Know Galileo's Principle...for now : Abstract: Video generators are increasingly evaluated as potential world models, which requires them to encode and understand physical laws. We investigate their representation of a fundamental law: g...
- Data-Centric Visual Development for Self-Driving Labs : Abstract: Self-driving laboratories offer a promising path toward reducing the labor-intensive, time-consuming, and often irreproducible workflows in the biological sciences. Yet their stringent preci...
- Foundation Models for Trajectory Planning in Autonomous Driving: A Review of Progress and Open Challenges : Abstract: The emergence of multi-modal foundation models has markedly transformed the technology for autonomous driving, shifting away from conventional and mostly hand-crafted design choices towards ...
- Learning from Watching: Scalable Extraction of Manipulation Trajectories from Human Videos : Abstract: Collecting high-quality data for training large-scale robotic models typically relies on real robot platforms, which is labor-intensive and costly, whether via teleoperation or scripted demo...
- ICD-Net: Inertial Covariance Displacement Network for Drone Visual-Inertial SLAM : Abstract: Visual-inertial SLAM systems often exhibit suboptimal performance due to multiple confounding factors including imperfect sensor calibration, noisy measurements, rapid motion dynamics, low i...
- VISTAv2: World Imagination for Indoor Vision-and-Language Navigation : Abstract: Vision-and-Language Navigation (VLN) requires agents to follow language instructions while acting in continuous real-world spaces. Prior image imagination based VLN work shows benefits for d...
- Coarse-to-Fine Non-Rigid Registration for Side-Scan Sonar Mosaicking : Abstract: Side-scan sonar mosaicking plays a crucial role in large-scale seabed mapping but is challenged by complex non-linear, spatially varying distortions due to diverse sonar acquisition conditio...
- Bootstrap Dynamic-Aware 3D Visual Representation for Scalable Robot Learning : Abstract: Despite strong results on recognition and segmentation, current 3D visual pre-training methods often underperform on robotic manipulation. We attribute this gap to two factors: the lack of s...
- Arcadia: Toward a Full-Lifecycle Framework for Embodied Lifelong Learning : Abstract: We contend that embodied learning is fundamentally a lifecycle problem rather than a single-stage optimization. Systems that optimize only one link (data collection, simulation, learning, or...
- HMARK: Radioactive Multi-Bit Semantic-Latent Watermarking for Diffusion Models : Abstract: Modern generative diffusion models rely on vast training datasets, often including images with uncertain ownership or usage rights. Radioactive watermarks -- marks that transfer to a model's...
- MoLT: Mixture of Layer-Wise Tokens for Efficient Audio-Visual Learning : Abstract: In this paper, we propose Mixture of Layer-Wise Tokens (MoLT), a parameter- and memory-efficient adaptation framework for audio-visual learning. The key idea of MoLT is to replace convention...
- Ternary-Input Binary-Weight CNN Accelerator Design for Miniature Object Classification System with Query-Driven Spatial DVS : Abstract: Miniature imaging systems are essential for space-constrained applications but are limited by memory and power constraints. While machine learning can reduce data size by extracting key feat...
- MILE: A Mechanically Isomorphic Exoskeleton Data Collection System with Fingertip Visuotactile Sensing for Dexterous Manipulation : Abstract: Imitation learning provides a promising approach to dexterous hand manipulation, but its effectiveness is limited by the lack of large-scale, high-fidelity data. Existing data-collection pip...
- Fast, Robust, Permutation-and-Sign Invariant SO(3) Pattern Alignment : Abstract: We address the correspondence-free alignment of two rotation sets on SO(3), a core task in calibration and registration that is often impeded by missing time alignment, outliers, and unk...
- Sign Language Recognition using Bidirectional Reservoir Computing : Abstract: Sign language recognition (SLR) facilitates communication between deaf and hearing individuals. Deep learning is widely used to develop SLR-based systems; however, it is computationally inte...
- Audio-Visual World Models: Towards Multisensory Imagination in Sight and Sound : Abstract: World models simulate environmental dynamics to enable agents to plan and reason about future states. While existing approaches have primarily focused on visual observations, real-world perc...
- FOM-Nav: Frontier-Object Maps for Object Goal Navigation : Abstract: This paper addresses the Object Goal Navigation problem, where a robot must efficiently find a target object in an unknown environment. Existing implicit memory-based methods struggle with l...
- Opening the Sim-to-Real Door for Humanoid Pixel-to-Action Policy Transfer : Abstract: Recent progress in GPU-accelerated, photorealistic simulation has opened a scalable data-generation path for robot learning, where massive physics and visual randomization allow policies to ...
- Estimation of Kinematic Motion from Dashcam Footage : Abstract: The goal of this paper is to explore the accuracy of dashcam footage to predict the actual kinematic motion of a car-like vehicle. Our approach uses ground truth information from the vehicle...
- Panda: Self-distillation of Reusable Sensor-level Representations for High Energy Physics : Abstract: Liquid argon time projection chambers (LArTPCs) provide dense, high-fidelity 3D measurements of particle interactions and underpin current and future neutrino and rare-event experiments. Phy...
- TagSplat: Topology-Aware Gaussian Splatting for Dynamic Mesh Modeling and Tracking : Abstract: Topology-consistent dynamic model sequences are essential for applications such as animation and model editing. However, existing 4D reconstruction methods face challenges in generating high...
- NavForesee: A Unified Vision-Language World Model for Hierarchical Planning and Dual-Horizon Navigation Prediction : Abstract: Embodied navigation for long-horizon tasks, guided by complex natural language instructions, remains a formidable challenge in artificial intelligence. Existing agents often struggle with ro...
- Revisiting Direct Encoding: Learnable Temporal Dynamics for Static Image Spiking Neural Networks : Abstract: Handling static images that lack inherent temporal dynamics remains a fundamental challenge for spiking neural networks (SNNs). In directly trained SNNs, static inputs are typically repeated...
- Disentangling Progress in Medical Image Registration: Beyond Trend-Driven Architectures towards Domain-Specific Strategies : Abstract: Medical image registration drives quantitative analysis across organs, modalities, and patient populations. Recent deep learning methods often combine low-level "trend-driven" computational ...
- Guardian: Detecting Robotic Planning and Execution Errors with Vision-Language Models : Abstract: Robust robotic manipulation requires reliable failure detection and recovery. Although current Vision-Language Models (VLMs) show promise, their accuracy and generalization are limited by th...
- Efficient Generative Adversarial Networks for Color Document Image Enhancement and Binarization Using Multi-scale Feature Extraction : Abstract: The outcome of text recognition for degraded color documents is often unsatisfactory due to interference from various contaminants. To extract information more efficiently for text recogniti...
- Continuous Perception Matters: Diagnosing Temporal Integration Failures in Multimodal Models : Abstract: Continuous perception, the ability to integrate visual observations over time in a continuous stream fashion, is essential for robust real-world understanding, yet remains largely untested i...
- SynPlay: Large-Scale Synthetic Human Data with Real-World Diversity for Aerial-View Perception : Abstract: We introduce SynPlay, a large-scale synthetic human dataset purpose-built for advancing multi-perspective human localization, with a predominant focus on aerial-view perception. SynPlay depa...
- Sketch-guided Cage-based 3D Gaussian Splatting Deformation : Abstract: 3D Gaussian Splatting (GS) is one of the most promising novel 3D representations that has received great interest in computer graphics and computer vision. While various systems have introdu...
- SuperMat: Physically Consistent PBR Material Estimation at Interactive Rates : Abstract: Decomposing physically-based materials from images into their constituent properties remains challenging, particularly when maintaining both computational efficiency and physical consistency...
- Manual-PA: Learning 3D Part Assembly from Instruction Diagrams : Abstract: Assembling furniture amounts to solving the discrete-continuous optimization task of selecting the furniture parts to assemble and estimating their connecting poses in a physically realistic...
- SizeGS: Size-aware Compression of 3D Gaussian Splatting via Mixed Integer Programming : Abstract: Recent advances in 3D Gaussian Splatting (3DGS) have greatly improved 3D reconstruction. However, its substantial data size poses a significant challenge for transmission and storage. While ...
- U-FaceBP: Uncertainty-aware Bayesian Ensemble Deep Learning for Face Video-based Blood Pressure Measurement : Abstract: Blood pressure (BP) measurement is crucial for daily health assessment. Remote photoplethysmography (rPPG), which extracts pulse waves from face videos captured by a camera, has the potentia...
- DCText: Scheduled Attention Masking for Visual Text Generation via Divide-and-Conquer Strategy : Abstract: Despite recent text-to-image models achieving high-fidelity text rendering, they still struggle with long or multiple texts due to diluted global attention. We propose DCText, a training-free...
- Gaussian Swaying: Surface-Based Framework for Aerodynamic Simulation with 3D Gaussians : Abstract: Branches swaying in the breeze, flags rippling in the wind, and boats rocking on the water all show how aerodynamics shape natural motion -- an effect crucial for realism in vision and graph...
- Lost in Distortion: Uncovering the Domain Gap Between Computer Vision and Brain Imaging - A Study on Pretraining for Age Prediction : Abstract: Large-scale brain imaging datasets provide unprecedented opportunities for developing domain foundation models through pretraining. However, unlike natural image datasets in computer vision,...
- IVCR-200K: A Large-Scale Multi-turn Dialogue Benchmark for Interactive Video Corpus Retrieval : Abstract: In recent years, significant developments have been made in both video retrieval and video moment retrieval tasks, which respectively retrieve complete videos or moments for a given text que...
- TokenPure: Watermark Removal through Tokenized Appearance and Structural Guidance : Abstract: In the digital economy era, digital watermarking serves as a critical basis for ownership proof of massive replicable content, including AI-generated and other virtual assets. Designing robu...
- AlignVid: Training-Free Attention Scaling for Semantic Fidelity in Text-Guided Image-to-Video Generation : Abstract: Text-guided image-to-video (TI2V) generation has recently achieved remarkable progress, particularly in maintaining subject consistency and temporal coherence. However, existing methods stil...
- EvalTalker: Learning to Evaluate Real-Portrait-Driven Multi-Subject Talking Humans : Abstract: Speech-driven Talking Human (TH) generation, commonly known as "Talker," currently faces limitations in multi-subject driving capabilities. Extending this paradigm to "Multi-Talker," capable...
- InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision : Abstract: Large-scale video-text pretraining achieves strong performance but depends on noisy, synthetic captions with limited semantic coverage, often overlooking implicit world knowledge such as obj...
- Handwritten Text Recognition for Low Resource Languages : Abstract: Despite considerable progress in handwritten text recognition, paragraph-level handwritten text recognition, especially in low-resource languages, such as Hindi, Urdu and similar scripts, re...
- OpenBox: Annotate Any Bounding Boxes in 3D : Abstract: Unsupervised and open-vocabulary 3D object detection has recently gained attention, particularly in autonomous driving, where reducing annotation costs and recognizing unseen objects are cri...
- SRAM: Shape-Realism Alignment Metric for No Reference 3D Shape Evaluation : Abstract: 3D generation and reconstruction techniques have been widely used in computer games, film, and other content creation areas. As the application grows, there is a growing demand for 3D shapes...
- Textured Geometry Evaluation: Perceptual 3D Textured Shape Metric via 3D Latent-Geometry Network : Abstract: Textured high-fidelity 3D models are crucial for games, AR/VR, and film, but human-aligned evaluation methods still fall behind despite recent advances in 3D reconstruction and generation. E...
- Reversible Inversion for Training-Free Exemplar-guided Image Editing : Abstract: Exemplar-guided Image Editing (EIE) aims to modify a source image according to a visual reference. Existing approaches often require large-scale pre-training to learn relationships between t...
- PointNet4D: A Lightweight 4D Point Cloud Video Backbone for Online and Offline Perception in Robotic Applications : Abstract: Understanding dynamic 4D environments (3D space evolving over time) is critical for robotic and interactive systems. These applications demand systems that can process streaming point cloud vi...
- FRAMER: Frequency-Aligned Self-Distillation with Adaptive Modulation Leveraging Diffusion Priors for Real-World Image Super-Resolution : Abstract: Real-image super-resolution (Real-ISR) seeks to recover HR images from LR inputs with mixed, unknown degradations. While diffusion models surpass GANs in perceptual quality, they under-recon...
- MDiff4STR: Mask Diffusion Model for Scene Text Recognition : Abstract: Mask Diffusion Models (MDMs) have recently emerged as a promising alternative to auto-regressive models (ARMs) for vision-language tasks, owing to their flexible balance of efficiency and ac...
- ViRectify: A Challenging Benchmark for Video Reasoning Correction with Multimodal Large Language Models : Abstract: As multimodal large language models (MLLMs) frequently exhibit errors in complex video reasoning scenarios, correcting these errors is critical for uncovering their weaknesses and improving ...
- ResDiT: Evoking the Intrinsic Resolution Scalability in Diffusion Transformers : Abstract: Leveraging pre-trained Diffusion Transformers (DiTs) for high-resolution (HR) image synthesis often leads to spatial layout collapse and degraded texture fidelity. Prior work mitigates these...
- Language-Guided Open-World Anomaly Segmentation : Abstract: Open-world and anomaly segmentation methods seek to enable autonomous driving systems to detect and segment both known and unknown objects in real-world scenes. However, existing methods do ...
- FastAnimate: Towards Learnable Template Construction and Pose Deformation for Fast 3D Human Avatar Animation : Abstract: 3D human avatar animation aims at transforming a human avatar from an arbitrary initial pose to a specified target pose using deformation algorithms. Existing approaches typically divide thi...
- CourtMotion: Learning Event-Driven Motion Representations from Skeletal Data for Basketball : Abstract: This paper presents CourtMotion, a spatiotemporal modeling framework for analyzing and predicting game events and plays as they develop in professional basketball. Anticipating basketball ev...
- ChronosObserver: Taming 4D World with Hyperspace Diffusion Sampling : Abstract: Although prevailing camera-controlled video generation models can produce cinematic results, lifting them directly to the generation of 3D-consistent and high-fidelity time-synchronized mult...
- A variational method for curve extraction with curvature-dependent energies : Abstract: We introduce a variational approach for extracting curves between a list of possible endpoints, based on the discretization of an energy and Smirnov's decomposition theorem for vector fields...
- ELVIS: Enhance Low-Light for Video Instance Segmentation in the Dark : Abstract: Video instance segmentation (VIS) for low-light content remains highly challenging for both humans and machines alike, due to adverse imaging conditions including noise, blur and low-contras...
- QuantumCanvas: A Multimodal Benchmark for Visual Learning of Atomic Interactions : Abstract: Despite rapid advances in molecular and materials machine learning, most models still lack physical transferability: they fit correlations across whole molecules or crystals rather than lear...
- FlashVGGT: Efficient and Scalable Visual Geometry Transformers with Compressed Descriptor Attention : Abstract: 3D reconstruction from multi-view images is a core challenge in computer vision. Recently, feed-forward methods have emerged as efficient and robust alternatives to traditional per-scene opt...
- Toward Content-based Indexing and Retrieval of Head and Neck CT with Abscess Segmentation : Abstract: Abscesses in the head and neck represent an acute infectious process that can potentially lead to sepsis or mortality if not diagnosed and managed promptly. Accurate detection and delineatio...
- Depth Matching Method Based on ShapeDTW for Oil-Based Mud Imager : Abstract: In well logging operations using the oil-based mud (OBM) microresistivity imager, which employs an interleaved design with upper and lower pad sets, depth misalignment issues persist between...
- SPARK: Sim-ready Part-level Articulated Reconstruction with VLM Knowledge : Abstract: Articulated 3D objects are critical for embodied AI, robotics, and interactive scene understanding, yet creating simulation-ready assets remains labor-intensive and requires expert modeling ...
- Generative Editing in the Joint Vision-Language Space for Zero-Shot Composed Image Retrieval : Abstract: Composed Image Retrieval (CIR) enables fine-grained visual search by combining a reference image with a textual modification. While supervised CIR methods achieve high accuracy, their relian...
- ViT³: Unlocking Test-Time Training in Vision : Abstract: Test-Time Training (TTT) has recently emerged as a promising direction for efficient sequence modeling. TTT reformulates attention operation as an online learning problem, constructing a com...
- DB-KAUNet: An Adaptive Dual Branch Kolmogorov-Arnold UNet for Retinal Vessel Segmentation : Abstract: Accurate segmentation of retinal vessels is crucial for the clinical diagnosis of numerous ophthalmic and systemic diseases. However, traditional Convolutional Neural Network (CNN) methods e...
- Bridging the Scale Gap: Balanced Tiny and General Object Detection in Remote Sensing Imagery : Abstract: Tiny object detection in remote sensing imagery has attracted significant research interest in recent years. Despite recent progress, achieving balanced detection performance across diverse ...
- GRASP: Guided Residual Adapters with Sample-wise Partitioning : Abstract: Recent advances in text-to-image diffusion models enable high-fidelity generation across diverse prompts. However, these models falter in long-tail settings, such as medical imaging, where r...
- Open-world Hand-Object Interaction Video Generation Based on Structure and Contact-aware Representation : Abstract: Generating realistic hand-object interactions (HOI) videos is a significant challenge due to the difficulty of modeling physical constraints (e.g., contact and occlusion between hands and ma...
- Cross-Domain Validation of a Resection-Trained Self-Supervised Model on Multicentre Mesothelioma Biopsies : Abstract: Accurate subtype classification and outcome prediction in mesothelioma are essential for guiding therapy and patient care. Most computational pathology models are trained on large tissue ima...
- DreamingComics: A Story Visualization Pipeline via Subject and Layout Customized Generation using Video Models : Abstract: Current story visualization methods tend to position subjects solely by text and face challenges in maintaining artistic consistency. To address these limitations, we introduce DreamingComic...
- SSR: Semantic and Spatial Rectification for CLIP-based Weakly Supervised Segmentation : Abstract: In recent years, Contrastive Language-Image Pretraining (CLIP) has been widely applied to Weakly Supervised Semantic Segmentation (WSSS) tasks due to its powerful cross-modal semantic unders...
- FreqEdit: Preserving High-Frequency Features for Robust Multi-Turn Image Editing : Abstract: Instruction-based image editing through natural language has emerged as a powerful paradigm for intuitive visual manipulation. While recent models achieve impressive results on single edits,...
- HiconAgent: History Context-aware Policy Optimization for GUI Agents : Abstract: Graphical User Interface (GUI) agents require effective use of historical context to perform sequential navigation tasks. While incorporating past actions and observations can improve decisi...
- VideoScoop: A Non-Traditional Domain-Independent Framework For Video Analysis : Abstract: Automatically understanding video contents is important for several applications in Civic Monitoring (CM), general Surveillance (SL), Assisted Living (AL), etc. Decades of Image and Video An...
- Robust Rigid and Non-Rigid Medical Image Registration Using Learnable Edge Kernels : Abstract: Medical image registration is crucial for various clinical and research applications including disease diagnosis or treatment planning which require alignment of images from different modali...
- Evaluating SAM2 for Video Semantic Segmentation : Abstract: The Segment Anything Model 2 (SAM2) has proven to be a powerful foundation model for promptable visual object segmentation in both images and videos, capable of storing object-aware mem...
- Learned Image Compression for Earth Observation: Implications for Downstream Segmentation Tasks : Abstract: The rapid growth of data from satellite-based Earth observation (EO) systems poses significant challenges in data transmission and storage. We evaluate the potential of task-specific learned...
- SAM3-UNet: Simplified Adaptation of Segment Anything Model 3 : Abstract: In this paper, we introduce SAM3-UNet, a simplified variant of Segment Anything Model 3 (SAM3), designed to adapt SAM3 for downstream tasks at a low cost. Our SAM3-UNet consists of three com...
- Generative Action Tell-Tales: Assessing Human Motion in Synthesized Videos : Abstract: Despite rapid advances in video generative models, robust metrics for evaluating visual and temporal correctness of complex human actions remain elusive. Critically, existing pure-vision enc...
- Seeing through Imagination: Learning Scene Geometry via Implicit Spatial World Modeling : Abstract: Spatial reasoning, the ability to understand and interpret the 3D structure of the world, is a critical yet underdeveloped capability in Multimodal Large Language Models (MLLMs). Current met...
- CauSight: Learning to Supersense for Visual Causal Discovery : Abstract: Causal thinking enables humans to understand not just what is seen, but why it happens. To replicate this capability in modern AI systems, we introduce the task of visual causal discovery. I...
- OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic : Abstract: Recently, two-stage fine-tuning strategies, e.g., acquiring essential driving knowledge through supervised fine-tuning (SFT) and further enhancing decision-making and planning via reinforcem...
- PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models : Abstract: Driven by the growing capacity and training scale, Text-to-Video (T2V) generation models have recently achieved substantial progress in video quality, length, and instruction-following capab...
- Register Any Point: Scaling 3D Point Cloud Registration by Flow Matching : Abstract: Point cloud registration aligns multiple unposed point clouds into a common frame, and is a core step for 3D reconstruction and robot localization. In this work, we cast registration as cond...
- COACH: Collaborative Agents for Contextual Highlighting - A Multi-Agent Framework for Sports Video Analysis : Abstract: Intelligent sports video analysis demands a comprehensive understanding of temporal context, from micro-level actions to macro-level game strategies. Existing end-to-end models often struggl...
- TransientTrack: Advanced Multi-Object Tracking and Classification of Cancer Cells with Transient Fluorescent Signals : Abstract: Tracking cells in time-lapse videos is an essential technique for monitoring cell population dynamics at a single-cell level. Current methods for cell tracking are developed on videos with m...
- KM-ViPE: Online Tightly Coupled Vision-Language-Geometry Fusion for Open-Vocabulary Semantic SLAM : Abstract: We present KM-ViPE (Knowledge Mapping Video Pose Engine), a real-time open-vocabulary SLAM framework for uncalibrated monocular cameras in dynamic environments. Unlike systems requiring dept...
- StyleYourSmile: Cross-Domain Face Retargeting Without Paired Multi-Style Data : Abstract: Cross-domain face retargeting requires disentangled control over identity, expressions, and domain-specific stylistic attributes. Existing methods, typically trained on real-world faces, eit...
- SARL: Spatially-Aware Self-Supervised Representation Learning for Visuo-Tactile Perception : Abstract: Contact-rich robotic manipulation requires representations that encode local geometry. Vision provides global context but lacks direct measurements of properties such as texture and hardness...
- Med-VCD: Mitigating Hallucination for Medical Large Vision Language Models through Visual Contrastive Decoding : Abstract: Large vision-language models (LVLMs) are now central to healthcare applications such as medical visual question answering and imaging report generation. Yet, these models remain vulnerable t...
- Physical ID-Transfer Attacks against Multi-Object Tracking via Adversarial Trajectory : Abstract: Multi-Object Tracking (MOT) is a critical task in computer vision, with applications ranging from surveillance systems to autonomous driving. However, threats to MOT algorithms have yet been...
- Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models : Abstract: The rapid growth of visual tokens in multimodal large language models (MLLMs) leads to excessive memory consumption and inference latency, especially when handling high-resolution images and...
- NeuroVolve: Evolving Visual Stimuli toward Programmable Neural Objectives : Abstract: What visual information is encoded in individual brain regions, and how do distributed patterns combine to create their neural representations? Prior work has used generative models to repli...
- SatireDecoder: Visual Cascaded Decoupling for Enhancing Satirical Image Comprehension : Abstract: Satire, a form of artistic expression combining humor with implicit critique, holds significant social value by illuminating societal issues. Despite its cultural and societal significance, ...
- Scaling Down to Scale Up: Towards Operationally-Efficient and Deployable Clinical Models via Cross-Modal Low-Rank Adaptation for Medical Vision-Language Models : Abstract: Foundation models trained via vision-language pretraining have demonstrated strong zero-shot capabilities across diverse image domains, yet their application to volumetric medical imaging re...
- Silhouette-based Gait Foundation Model : Abstract: Gait patterns play a critical role in human identification and healthcare analytics, yet current progress remains constrained by small, narrowly designed models that fail to scale or general...
- Affordance-First Decomposition for Continual Learning in Video-Language Understanding : Abstract: Continual learning for video-language understanding is increasingly important as models face non-stationary data, domains, and query styles, yet prevailing solutions blur what should stay s...
- CAR-Net: A Cascade Refinement Network for Rotational Motion Deblurring under Angle Information Uncertainty : Abstract: We propose a new neural network architecture called CAR-net (CAscade Refinement Network) to deblur images that are subject to rotational motion blur. Our architecture is specifically designe...
- RS-ISRefiner: Towards Better Adapting Vision Foundation Models for Interactive Segmentation of Remote Sensing Images : Abstract: Interactive image segmentation (IIS) plays a critical role in generating precise annotations for remote sensing imagery, where objects often exhibit scale variations, irregular boundaries and...
- TrajDiff: End-to-end Autonomous Driving without Perception Annotation : Abstract: End-to-end autonomous driving systems directly generate driving policies from raw sensor inputs. While these systems can extract effective environmental features for planning, relying on aux...
- Multi-GRPO: Multi-Group Advantage Estimation for Text-to-Image Generation with Tree-Based Trajectories and Multiple Rewards : Abstract: Recently, Group Relative Policy Optimization (GRPO) has shown promising potential for aligning text-to-image (T2I) models, yet existing GRPO-based methods suffer from two critical limitation...
- Joint Multi-scale Gated Transformer and Prior-guided Convolutional Network for Learned Image Compression : Abstract: Recently, learned image compression methods have made remarkable achievements, some of which have outperformed the traditional image codec VVC. The advantages of learned image compression me...
- Charts Are Not Images: On the Challenges of Scientific Chart Editing : Abstract: Generative models, such as diffusion and autoregressive approaches, have demonstrated impressive capabilities in editing natural images. However, applying these tools to scientific charts re...
- Seeing the Wind from a Falling Leaf : Abstract: A longstanding goal in computer vision is to model motions from videos, while the representations behind motions, i.e. the invisible physical interactions that cause objects to deform and mo...
- The Outline of Deception: Physical Adversarial Attacks on Traffic Signs Using Edge Patches : Abstract: Intelligent driving systems are vulnerable to physical adversarial attacks on traffic signs. These attacks can cause misclassification, leading to erroneous driving decisions that compromise...
- DEJIMA: A Novel Large-scale Japanese Dataset for Image Captioning and Visual Question Answering : Abstract: This work addresses the scarcity of high-quality, large-scale resources for Japanese Vision-and-Language (V&L) modeling. We present a scalable and reproducible pipeline that integrates large...
- PolarGS: Polarimetric Cues for Ambiguity-Free Gaussian Splatting with Accurate Geometry Recovery : Abstract: Recent advances in surface reconstruction for 3D Gaussian Splatting (3DGS) have enabled remarkable geometric accuracy. However, their performance degrades in photometrically ambiguous region...
- CircleFlow: Flow-Guided Camera Blur Estimation using a Circle Grid Target : Abstract: The point spread function (PSF) serves as a fundamental descriptor linking the real-world scene to the captured signal, manifesting as camera blur. Accurate PSF estimation is crucial for bot...
- Thinking with Drafts: Speculative Temporal Reasoning for Efficient Long Video Understanding : Abstract: Long video understanding is essential for human-like intelligence, enabling coherent perception and reasoning over extended temporal contexts. While the emerging thinking-with-frames paradig...
- IRPO: Boosting Image Restoration via Post-training GRPO : Abstract: Recent advances in post-training paradigms have achieved remarkable success in high-level generation tasks, yet their potential for low-level vision remains rarely explored. Existing image r...
- PanFlow: Decoupled Motion Control for Panoramic Video Generation : Abstract: Panoramic video generation has attracted growing attention due to its applications in virtual reality and immersive media. However, existing methods lack explicit motion control and struggle...
- AFRAgent: An Adaptive Feature Renormalization Based High Resolution Aware GUI agent : Abstract: There is a growing demand for mobile user interface (UI) automation, driven by its broad applications across industries. With the advent of visual language models (VLMs), GUI automation has ...
- Smol-GS: Compact Representations for Abstract 3D Gaussian Splatting : Abstract: We present Smol-GS, a novel method for learning compact representations for 3D Gaussian Splatting (3DGS). Our approach learns highly efficient encodings in 3D space that integrate both spati...
- Neural Discrete Representation Learning for Sparse-View CBCT Reconstruction: From Algorithm Design to Prospective Multicenter Clinical Evaluation : Abstract: Cone beam computed tomography (CBCT)-guided puncture has become an established approach for diagnosing and treating early- to mid-stage thoracic tumours, yet the associated radiation exposur...
- Feed-Forward 3D Gaussian Splatting Compression with Long-Context Modeling : Abstract: 3D Gaussian Splatting (3DGS) has emerged as a revolutionary 3D representation. However, its substantial data size poses a major barrier to widespread adoption. While feed-forward 3DGS compre...
- Quantum-Inspired Spectral Geometry for Neural Operator Equivalence and Structured Pruning : Abstract: The rapid growth of multimodal intelligence on resource-constrained and heterogeneous domestic hardware exposes critical bottlenecks: multimodal feature heterogeneity, real-time requirements...
- HanDyVQA: A Video QA Benchmark for Fine-Grained Hand-Object Interaction Dynamics : Abstract: Hand-object interaction (HOI) inherently involves dynamics where human manipulations produce distinct spatio-temporal effects on objects. However, existing semantic HOI benchmarks focused ei...
- Multilingual Training-Free Remote Sensing Image Captioning : Abstract: Remote sensing image captioning has advanced rapidly through encoder-decoder models, although the reliance on large annotated datasets and the focus on English restrict global applicabilit...
- Accelerating Streaming Video Large Language Models via Hierarchical Token Compression : Abstract: Streaming Video Large Language Models (VideoLLMs) have demonstrated impressive performance across various video understanding tasks, but they face significant challenges in real-time deploym...
- SwiftVLA: Unlocking Spatiotemporal Dynamics for Lightweight VLA Models at Minimal Overhead : Abstract: Vision-Language-Action (VLA) models built on pretrained Vision-Language Models (VLMs) show strong potential but are limited in practicality due to their large parameter counts. To mitigate t...
- TalkingPose: Efficient Face and Gesture Animation with Feedback-guided Diffusion Model : Abstract: Recent advancements in diffusion models have significantly improved the realism and generalizability of character-driven animation, enabling the synthesis of high-quality motion from just a ...
- Dual-Projection Fusion for Accurate Upright Panorama Generation in Robotic Vision : Abstract: Panoramic cameras, capable of capturing a 360-degree field of view, are crucial in robotic vision, particularly in environments with sparse features. However, non-upright panoramas due to un...
- LAHNet: Local Attentive Hashing Network for Point Cloud Registration : Abstract: Most existing learning-based point cloud descriptors for point cloud registration focus on perceiving local information of point clouds to generate distinctive features. However, a reasonabl...
- SceneProp: Combining Neural Network and Markov Random Field for Scene-Graph Grounding : Abstract: Grounding complex, compositional visual queries with multiple objects and relationships is a fundamental challenge for vision-language models. While standard phrase grounding methods excel a...
- Binary-Gaussian: Compact and Progressive Representation for 3D Gaussian Segmentation : Abstract: 3D Gaussian Splatting (3D-GS) has emerged as an efficient 3D representation and a promising foundation for semantic tasks like segmentation. However, existing 3D-GS-based segmentation method...
- Adaptive Evidential Learning for Temporal-Semantic Robustness in Moment Retrieval : Abstract: In the domain of moment retrieval, accurately identifying temporal segments within videos based on natural language queries remains challenging. Traditional methods often employ pre-trained ...
- Efficient and Scalable Monocular Human-Object Interaction Motion Reconstruction : Abstract: Generalized robots must learn from diverse, large-scale human-object interactions (HOI) to operate robustly in the real world. Monocular internet videos offer a nearly limitless and readily ...
- PhotoFramer: Multi-modal Image Composition Instruction : Abstract: Composition matters during the photo-taking process, yet many casual users struggle to frame well-composed images. To provide composition guidance, we introduce PhotoFramer, a multi-modal co...
- S2AM3D: Scale-controllable Part Segmentation of 3D Point Cloud : Abstract: Part-level point cloud segmentation has recently attracted significant attention in 3D computer vision. Nevertheless, existing research is constrained by two major challenges: native 3D mode...
- LISA-3D: Lifting Language-Image Segmentation to 3D via Multi-View Consistency : Abstract: Text-driven 3D reconstruction demands a mask generator that simultaneously understands open-vocabulary instructions and remains consistent across viewpoints. We present LISA-3D, a two-stage ...
- Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model : Abstract: Recovering pixel-wise geometric properties from a single image is fundamentally ill-posed due to appearance ambiguity and non-injective mappings between 2D observations and 3D structures. Wh...
- TRoVe: Discovering Error-Inducing Static Feature Biases in Temporal Vision-Language Models : Abstract: Vision-language models (VLMs) have made great strides in addressing temporal understanding tasks, which involve characterizing visual changes across a sequence of images. However, recent wor...
- Accelerating Inference of Masked Image Generators via Reinforcement Learning : Abstract: Masked Generative Models (MGMs) demonstrate strong capabilities in generating high-fidelity images. However, they need many sampling steps to create high-quality generations, resulting in sl...
- Learning Eigenstructures of Unstructured Data Manifolds : Abstract: We introduce a novel framework that directly learns a spectral basis for shape and manifold analysis from unstructured data, eliminating the need for traditional operator selection, discreti...
- Structural Prognostic Event Modeling for Multimodal Cancer Survival Analysis : Abstract: The integration of histology images and gene profiles has shown great promise for improving survival prediction in cancer. However, current approaches often struggle to model intra- and inte...
- OmniFD: A Unified Model for Versatile Face Forgery Detection : Abstract: Face forgery detection encompasses multiple critical tasks, including identifying forged images and videos and localizing manipulated regions and temporal segments. Current approaches typica...
- Weakly Supervised Continuous Micro-Expression Intensity Estimation Using Temporal Deep Neural Network : Abstract: Micro-facial expressions are brief and involuntary facial movements that reflect genuine emotional states. While most prior work focuses on classifying discrete micro-expression categories, ...
- VSRD++: Autolabeling for 3D Object Detection via Instance-Aware Volumetric Silhouette Rendering : Abstract: Monocular 3D object detection is a fundamental yet challenging task in 3D scene understanding. Existing approaches heavily depend on supervised learning with extensive 3D annotations, which ...
- TabletopGen: Instance-Level Interactive 3D Tabletop Scene Generation from Text or Single Image : Abstract: Generating high-fidelity, physically interactive 3D simulated tabletop scenes is essential for embodied AI, especially for robotic manipulation policy learning and data synthesis. However, c...
- PSR: Scaling Multi-Subject Personalized Image Generation with Pairwise Subject-Consistency Rewards : Abstract: Personalized generation models for a single subject have demonstrated remarkable effectiveness, highlighting their significant potential. However, when extended to multiple subjects, existin...
- TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition : Abstract: Table recognition (TR) aims to transform table images into semi-structured representations such as HTML or Markdown. As a core component of document parsing, TR has long relied on supervised...
- ViscNet: Vision-Based In-line Viscometry for Fluid Mixing Process : Abstract: Viscosity measurement is essential for process monitoring and autonomous laboratory operation, yet conventional viscometers remain invasive and require controlled laboratory environments tha...
- nnMobileNet++: Towards Efficient Hybrid Networks for Retinal Image Analysis : Abstract: Retinal imaging is a critical, non-invasive modality for the early detection and monitoring of ocular and systemic diseases. Deep learning, particularly convolutional neural networks (CNNs),...
- Supervised Contrastive Machine Unlearning of Background Bias in Sonar Image Classification with Fine-Grained Explainable AI : Abstract: Acoustic sonar image analysis plays a critical role in object detection and classification, with applications in both civilian and defense domains. Despite the availability of real and synth...
- EGG-Fusion: Efficient 3D Reconstruction with Geometry-aware Gaussian Surfel on the Fly : Abstract: Real-time 3D reconstruction is a fundamental task in computer graphics. Recently, differentiable-rendering-based SLAM systems have demonstrated significant potential, enabling photorealistic s...
- TBT-Former: Learning Temporal Boundary Distributions for Action Localization : Abstract: Temporal Action Localization (TAL) remains a fundamental challenge in video understanding, aiming to identify the start time, end time, and category of all action instances within untrimmed ...
- GermanPartiesQA: Benchmarking Commercial Large Language Models and AI Companions for Political Alignment and Sycophancy : Abstract: Large language models (LLMs) are increasingly shaping citizens' information ecosystems. Products incorporating LLMs, such as chatbots and AI Companions, are now widely used for decision supp...
- DL-CapsNet: A Deep and Light Capsule Network : Abstract: Capsule Network (CapsNet) is among the promising classifiers and a possible successor of the classifiers built based on Convolutional Neural Network (CNN). CapsNet is more accurate than CNNs...
- ProvRain: Rain-Adaptive Denoising and Vehicle Detection via MobileNet-UNet and Faster R-CNN : Abstract: Provident vehicle detection has considerable scope for detecting vehicles at night. The extraction of features other than the headlamps of vehicles allows us to detect oncoming veh...
- Adapter Shield: A Unified Framework with Built-in Authentication for Preventing Unauthorized Zero-Shot Image-to-Image Generation : Abstract: With the rapid progress in diffusion models, image synthesis has advanced to the stage of zero-shot image-to-image generation, where high-fidelity replication of facial identities or artisti...
- Conceptual Evaluation of Deep Visual Stereo Odometry for the MARWIN Radiation Monitoring Robot in Accelerator Tunnels : Abstract: The MARWIN robot operates at the European XFEL to perform autonomous radiation monitoring in long, monotonous accelerator tunnels where conventional localization approaches struggle. Its cur...
- Exploring Diagnostic Prompting Approach for Multimodal LLM-based Visual Complexity Assessment: A Case Study of Amazon Search Result Pages : Abstract: This study investigates whether diagnostic prompting can improve Multimodal Large Language Model (MLLM) reliability for visual complexity assessment of Amazon Search Results Pages (SRP). We ...
- Multi-modal On-Device Learning for Monocular Depth Estimation on Ultra-low-power MCUs : Abstract: Monocular depth estimation (MDE) plays a crucial role in enabling spatially-aware applications in Ultra-low-power (ULP) Internet-of-Things (IoT) platforms. However, the limited number of par...
- Exploring Automated Recognition of Instructional Activity and Discourse from Multimodal Classroom Data : Abstract: Observation of classroom interactions can provide concrete feedback to teachers, but current methods rely on manual annotation, which is resource-intensive and hard to scale. This work explo...
- TeleViT1.0: Teleconnection-aware Vision Transformers for Subseasonal to Seasonal Wildfire Pattern Forecasts : Abstract: Forecasting wildfires weeks to months in advance is difficult, yet crucial for planning fuel treatments and allocating resources. While short-term predictions typically rely on local weather...
- Deep Filament Extraction for 3D Concrete Printing : Abstract: The architecture, engineering and construction (AEC) industry is constantly evolving to meet the demand for sustainable and effective design and construction of the built environment. In the...
- TinyViT: Field Deployable Transformer Pipeline for Solar Panel Surface Fault and Severity Screening : Abstract: Sustained operation of solar photovoltaic assets hinges on accurate detection and prioritization of surface faults across vast, geographically distributed modules. While multi modal imaging ...
- Local and Global Context-and-Object-part-Aware Superpixel-based Data Augmentation for Deep Visual Recognition : Abstract: Cutmix-based data augmentation, which uses a cut-and-paste strategy, has shown remarkable generalization capabilities in deep learning. However, existing methods primarily consider global se...
- Mammo-FM: Breast-specific foundational model for Integrated Mammographic Diagnosis, Prognosis, and Reporting : Abstract: Breast cancer is one of the leading causes of death among women worldwide. We introduce Mammo-FM, the first foundation model specifically for mammography, pretrained on the largest and most ...
- ReactionMamba: Generating Short & Long Human Reaction Sequences : Abstract: We present ReactionMamba, a novel framework for generating long 3D human reaction motions. ReactionMamba integrates a motion VAE for efficient motion encoding with Mamba-based state-space m...
- Relightable Holoported Characters: Capturing and Relighting Dynamic Human Performance from Sparse Views : Abstract: We present Relightable Holoported Characters (RHC), a novel person-specific method for free-view rendering and relighting of full-body and highly dynamic humans solely observed from sparse-v...
- UniDiff: Parameter-Efficient Adaptation of Diffusion Models for Land Cover Classification with Multi-Modal Remotely Sensed Imagery and Sparse Annotations : Abstract: Sparse annotations fundamentally constrain multimodal remote sensing: even recent state-of-the-art supervised methods such as MSFMamba are limited by the availability of labeled data, restri...
- HeartFormer: Semantic-Aware Dual-Structure Transformers for 3D Four-Chamber Cardiac Point Cloud Reconstruction : Abstract: We present the first geometric deep learning framework based on point cloud representation for 3D four-chamber cardiac reconstruction from cine MRI data. This work addresses a long-standing ...
- Rethinking Lung Cancer Screening: AI Nodule Detection and Diagnosis Outperforms Radiologists, Leading Models, and Standards Beyond Size and Growth : Abstract: Early detection of malignant lung nodules is critical, but its dependence on size and growth in screening inherently delays diagnosis. We present an AI system that redefines lung cancer scre...
- TGSFormer: Scalable Temporal Gaussian Splatting for Embodied Semantic Scene Completion : Abstract: Embodied 3D Semantic Scene Completion (SSC) infers dense geometry and semantics from continuous egocentric observations. Most existing Gaussian-based methods rely on random initialization of...
- Optimizing Distributional Geometry Alignment with Optimal Transport for Generative Dataset Distillation : Abstract: Dataset distillation seeks to synthesize a compact distilled dataset, enabling models trained on it to achieve performance comparable to models trained on the full dataset. Recent methods fo...
- ART-ASyn: Anatomy-aware Realistic Texture-based Anomaly Synthesis Framework for Chest X-Rays : Abstract: Unsupervised anomaly detection aims to identify anomalies without pixel-level annotations. Synthetic anomaly-based methods exhibit a unique capacity to introduce controllable irregularities ...
- Odometry Without Correspondence from Inertially Constrained Ruled Surfaces : Abstract: Visual odometry techniques typically rely on feature extraction from a sequence of images and subsequent computation of optical flow. This point-to-point correspondence between two consecuti...
- MVAD: A Comprehensive Multimodal Video-Audio Dataset for AIGC Detection : Abstract: The rapid advancement of AI-generated multimodal video-audio content has raised significant concerns regarding information security and content authenticity. Existing synthetic video dataset...
- Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models : Abstract: Vision-language pretrained models (VLPs) such as CLIP have achieved remarkable success, but are also highly vulnerable to backdoor attacks. Given a model fine-tuned by an untrusted third par...
- mmPred: Radar-based Human Motion Prediction in the Dark : Abstract: Existing Human Motion Prediction (HMP) methods based on RGB-D cameras are sensitive to lighting conditions and raise privacy concerns, limiting their real-world applications such as firefigh...
- SMamDiff: Spatial Mamba for Stochastic Human Motion Prediction : Abstract: With intelligent room-side sensing and service robots widely deployed, human motion prediction (HMP) is essential for safe, proactive assistance. However, many existing HMP methods either pr...
- MM-DETR: An Efficient Multimodal Detection Transformer with Mamba-Driven Dual-Granularity Fusion and Frequency-Aware Modality Adapters : Abstract: Multimodal remote sensing object detection aims to achieve more accurate and robust perception under challenging conditions by fusing complementary information from different modalities. How...
- THCRL: Trusted Hierarchical Contrastive Representation Learning for Multi-View Clustering : Abstract: Multi-View Clustering (MVC) has garnered increasing attention in recent years. It is capable of partitioning data samples into distinct groups by learning a consensus representation. However...
- POLARIS: Projection-Orthogonal Least Squares for Robust and Adaptive Inversion in Diffusion Models : Abstract: The Inversion-Denoising Paradigm, which is based on diffusion models, excels in diverse image editing and restoration tasks. We revisit its mechanism and reveal a critical, overlooked factor...
- Pore-scale Image Patch Dataset and A Comparative Evaluation of Pore-scale Facial Features : Abstract: The weak-texture nature of facial skin regions presents significant challenges for local descriptor matching in applications such as facial motion analysis and 3D face reconstruction. Althou...
- EZ-SP: Fast and Lightweight Superpoint-Based 3D Segmentation : Abstract: Superpoint-based pipelines provide an efficient alternative to point- or voxel-based 3D semantic segmentation, but are often bottlenecked by their CPU-bound partition step. We propose a lear...
- WiseEdit: Benchmarking Cognition- and Creativity-Informed Image Editing : Abstract: Recent image editing models boast next-level intelligent capabilities, facilitating cognition- and creativity-informed image editing. Yet, existing benchmarks provide too narrow a scope for ...
- Better, Stronger, Faster: Tackling the Trilemma in MLLM-based Segmentation with Simultaneous Textual Mask Prediction : Abstract: Integrating segmentation into Multimodal Large Language Models (MLLMs) presents a core trilemma: simultaneously preserving dialogue ability, achieving high segmentation performance, and ensu...
- SplatFont3D: Structure-Aware Text-to-3D Artistic Font Generation with Part-Level Style Control : Abstract: Artistic font generation (AFG) can assist human designers in creating innovative artistic fonts. However, most previous studies primarily focus on 2D artistic fonts in flat design, leaving p...
- PhysGen: Physically Grounded 3D Shape Generation for Industrial Design : Abstract: Existing generative models for 3D shapes can synthesize high-fidelity and visually plausible shapes. For certain classes of shapes that have undergone an engineering design process, the real...
- Recovering Origin Destination Flows from Bus CCTV: Early Results from Nairobi and Kigali : Abstract: Public transport in sub-Saharan Africa (SSA) often operates in overcrowded conditions where existing automated systems fail to capture reliable passenger flow data. Leveraging onboard CCTV a...
- What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards : Abstract: Recent video diffusion models can synthesize visually compelling clips, yet often violate basic physical laws (objects float, accelerations drift, and collisions behave inconsistently), reveali...
- Recognizing Pneumonia in Real-World Chest X-rays with a Classifier Trained with Images Synthetically Generated by Nano Banana : Abstract: We trained a classifier with synthetic chest X-ray (CXR) images generated by Nano Banana, the latest AI model for image generation and editing, released by Google. When directly applied to r...
- Structured Context Learning for Generic Event Boundary Detection : Abstract: Generic Event Boundary Detection (GEBD) aims to identify moments in videos that humans perceive as event boundaries. This paper proposes a novel method for addressing this task, called Struc...
- Learning What Helps: Task-Aligned Context Selection for Vision Tasks : Abstract: Humans often resolve visual uncertainty by comparing an image with relevant examples, but ViTs lack the ability to identify which examples would improve their predictions. We present Task-Al...
- CC-FMO: Camera-Conditioned Zero-Shot Single Image to 3D Scene Generation with Foundation Model Orchestration : Abstract: High-quality 3D scene generation from a single image is crucial for AR/VR and embodied AI applications. Early approaches struggle to generalize due to reliance on specialized models trained ...
- Terrain Sensing with Smartphone Structured Light: 2D Dynamic Time Warping for Grid Pattern Matching : Abstract: Low-cost mobile rovers often operate on uneven terrain where small bumps or tilts are difficult to perceive visually but can significantly affect locomotion stability. To address this proble...
- Image Generation as a Visual Planner for Robotic Manipulation : Abstract: Generating realistic robotic manipulation videos is an important step toward unifying perception, planning, and action in embodied agents. While existing video diffusion models require large...
- Cross-Temporal 3D Gaussian Splatting for Sparse-View Guided Scene Update : Abstract: Maintaining consistent 3D scene representations over time is a significant challenge in computer vision. Updating 3D scenes from sparse-view observations is crucial for various real-world ap...
- SAIDO: Generalizable Detection of AI-Generated Images via Scene-Aware and Importance-Guided Dynamic Optimization in Continual Learning : Abstract: The widespread misuse of image generation technologies has raised security concerns, driving the development of AI-generated image detection methods. However, generalization has become a key...
- Asset-Driven Sematic Reconstruction of Dynamic Scene with Multi-Human-Object Interactions : Abstract: Real-world human-built environments are highly dynamic, involving multiple humans and their complex interactions with surrounding objects. While 3D geometry modeling of such scenes is crucia...
- A Comparison of Human and ChatGPT Classification Performance on Complex Social Media Data : Abstract: Generative artificial intelligence tools, like ChatGPT, are an increasingly utilized resource among computational social scientists. Nevertheless, there remains space for improved understand...
- FastPOS: Language-Agnostic Scalable POS Tagging Framework Low-Resource Use Case : Abstract: This study proposes a language-agnostic transformer-based POS tagging framework designed for low-resource languages, using Bangla and Hindi as case studies. With only three lines of framewor...
- Auxiliary-Hyperparameter-Free Sampling: Entropy Equilibrium for Text Generation : Abstract: Token sampling strategies critically influence text generation quality in large language models (LLMs). However, existing methods introduce additional hyperparameters, requiring extensive tu...
- WaterSearch: A Quality-Aware Search-based Watermarking Framework for Large Language Models : Abstract: Watermarking acts as a critical safeguard in text generated by Large Language Models (LLMs). By embedding identifiable signals into model outputs, watermarking enables reliable attribution a...
- Reward Auditor: Inference on Reward Modeling Suitability in Real-World Perturbed Scenarios : Abstract: Reliable reward models (RMs) are critical for ensuring the safe alignment of large language models (LLMs). However, current evaluation methods focus solely on preference perception accuracie...
- Dr.Mi-Bench: A Modular-integrated Benchmark for Scientific Deep Research Agent : Abstract: The explosive growth in academic literature necessitates automated deep research (DR) agents, yet their evaluation remains a significant challenge. First, existing benchmarks often focus nar...
- Advancing Academic Chatbots: Evaluation of Non Traditional Outputs : Abstract: Most evaluations of large language models focus on standard tasks such as factual question answering or short summarization. This research expands that scope in two directions: first, by com...
- ELR-1000: A Community-Generated Dataset for Endangered Indic Indigenous Languages : Abstract: We present a culturally-grounded multimodal dataset of 1,060 traditional recipes crowdsourced from rural communities across remote regions of Eastern India, spanning 10 endangered languages....
- How do we measure privacy in text? A survey of text anonymization metrics : Abstract: In this work, we aim to clarify and reconcile metrics for evaluating privacy protection in text through a systematic survey. Although text anonymization is essential for enabling NLP researc...
- DrawingBench: Evaluating Spatial Reasoning and UI Interaction Capabilities of Large Language Models through Mouse-Based Drawing Tasks : Abstract: As agentic AI systems increasingly operate autonomously, establishing trust through verifiable evaluation becomes critical. Yet existing benchmarks lack the transparency and auditability nee...
- Generalist Large Language Models Outperform Clinical Tools on Medical Benchmarks : Abstract: Specialized clinical AI assistants are rapidly entering medical practice, often framed as safer or more reliable than general-purpose large language models (LLMs). Yet, unlike frontier model...
- Conveying Imagistic Thinking in Traditional Chinese Medicine Translation: A Prompt Engineering and LLM-Based Evaluation Framework : Abstract: Traditional Chinese Medicine (TCM) theory is built on imagistic thinking, in which medical principles and diagnostic and therapeutic logic are structured through metaphor and metonymy. Howev...
- MARSAD: A Multi-Functional Tool for Real-Time Social Media Analysis : Abstract: MARSAD is a multifunctional natural language processing (NLP) platform designed for real-time social media monitoring and analysis, with a particular focus on the Arabic-speaking world. It e...
- DyFuLM: An Advanced Multimodal Framework for Sentiment Analysis : Abstract: Understanding sentiment in complex textual expressions remains a fundamental challenge in affective computing. To address this, we propose a Dynamic Fusion Learning Model (DyFuLM), a multimo...
- Multilingual Conversational AI for Financial Assistance: Bridging Language Barriers in Indian FinTech : Abstract: India's linguistic diversity presents both opportunities and challenges for fintech platforms. While the country has 31 major languages and over 100 minor ones, only 10% of the population u...
- MCAT: Scaling Many-to-Many Speech-to-Text Translation with MLLMs to 70 Languages : Abstract: Multimodal Large Language Models (MLLMs) have achieved great success in Speech-to-Text Translation (S2TT) tasks. However, current research is constrained by two key challenges: language cove...
- Language Diversity: Evaluating Language Usage and AI Performance on African Languages in Digital Spaces : Abstract: This study examines the digital representation of African languages and the challenges this presents for current language detection tools. We evaluate their performance on Yoruba, Kinyarwand...
- MAC-SLU: Multi-Intent Automotive Cabin Spoken Language Understanding Benchmark : Abstract: Spoken Language Understanding (SLU), which aims to extract user semantics to execute downstream tasks, is a crucial component of task-oriented dialog systems. Existing SLU datasets generally...
- MMAG: Mixed Memory-Augmented Generation for Large Language Models Applications : Abstract: Large Language Models (LLMs) excel at generating coherent text within a single prompt but fall short in sustaining relevance, personalization, and continuity across extended interactions. Hu...
- Self-Supervised Borrowing Detection on Multilingual Wordlists : Abstract: This paper presents a fully self-supervised approach to borrowing detection in multilingual wordlists. The method combines two sources of information: PMI similarities based on a global corr...
- Beware of Reasoning Overconfidence: Pitfalls in the Reasoning Process for Multi-solution Tasks : Abstract: Large Language Models (LLMs) excel in reasoning tasks requiring a single correct answer, but they perform poorly in multi-solution tasks that require generating comprehensive and diverse ans...
- Reasoning About the Unsaid: Misinformation Detection with Omission-Aware Graph Inference : Abstract: This paper investigates the detection of misinformation, which deceives readers by explicitly fabricating misleading content or implicitly omitting important information necessary for inform...
- Beyond SFT: Reinforcement Learning for Safer Large Reasoning Models with Better Reasoning Ability : Abstract: Large reasoning models (LRMs) extend large language models by generating explicit chain-of-thought (CoT) reasoning, significantly improving mathematical and logical problem solving. However,...
- OPOR-Bench: Evaluating Large Language Models on Online Public Opinion Report Generation : Abstract: Online Public Opinion Reports consolidate news and social media for timely crisis management by governments and enterprises. While large language models have made automated report generation...
- Latent Debate: A Surrogate Framework for Interpreting LLM Thinking : Abstract: Understanding the internal thinking process of Large Language Models (LLMs) and the cause of hallucinations remains a key challenge. To this end, we introduce latent debate, a novel framewor...
- How Far Are We from Genuinely Useful Deep Research Agents? : Abstract: Deep Research Agents (DRAs) aim to automatically produce analyst-level reports through iterative information retrieval and synthesis. However, most existing DRAs were validated on question-a...
- The Art of Scaling Test-Time Compute for Large Language Models : Abstract: Test-time scaling (TTS) -- the dynamic allocation of compute during inference -- is a promising direction for improving reasoning in large language models (LLMs). However, a systematic compa...
- VeriPy - A New Python-Based Approach for SDR Pipelined/Unrolled Hardware Accelerator Generation : Abstract: Software-defined radio (SDR) plays an important role in the communication field by providing a flexible and customized communication system for different purposes according to the needs. To ...
- Breaking It Down: Domain-Aware Semantic Segmentation for Retrieval Augmented Generation : Abstract: Document chunking is a crucial component of Retrieval-Augmented Generation (RAG), as it directly affects the retrieval of relevant and precise context. Conventional fixed-length and recursiv...
- Whose Personae? Synthetic Persona Experiments in LLM Research and Pathways to Transparency : Abstract: Synthetic personae experiments have become a prominent method in Large Language Model alignment research, yet the representativeness and ecological validity of these personae vary considerab...
- Bias Testing and Mitigation in Black Box LLMs using Metamorphic Relations : Abstract: The widespread deployment of Large Language Models (LLMs) has intensified concerns about subtle social biases embedded in their outputs. Existing guardrails often fail when faced with indire...
- Generalized Medical Phrase Grounding : Abstract: Medical phrase grounding (MPG) maps textual descriptions of radiological findings to corresponding image regions. These grounded reports are easier to interpret, especially for non-experts. ...
- Large Language Models Cannot Reliably Detect Vulnerabilities in JavaScript: The First Systematic Benchmark and Evaluation : Abstract: Researchers have proposed numerous methods to detect vulnerabilities in JavaScript, especially those assisted by Large Language Models (LLMs). However, the actual capability of LLMs in JavaS...
- BackportBench: A Multilingual Benchmark for Automated Backporting of Patches : Abstract: Many modern software projects evolve rapidly to incorporate new features and security patches. It is important for users to update their dependencies to safer versions, but many still use ol...
- DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 : Abstract: The digitization of healthcare has facilitated the sharing and re-using of medical data but has also raised concerns about confidentiality and privacy. HIPAA (Health Insurance Portability an...
- Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts : Abstract: Whisper is a multitask and multilingual speech model covering 99 languages. It yields commendable automatic speech recognition (ASR) results in a subset of its covered languages, but the mod...
- Extending Multilingual Machine Translation through Imitation Learning : Abstract: Despite the growing variety of languages supported by existing multilingual neural machine translation (MNMT) models, most of the world's languages are still being left behind. We aim to ext...
- Koopman operators with intrinsic observables in rigged reproducing kernel Hilbert spaces : Abstract: This paper presents a novel approach for estimating the Koopman operator defined on a reproducing kernel Hilbert space (RKHS) and its spectra. We propose an estimation method, which we call J...
- GuideGen: A Text-Guided Framework for Paired Full-torso Anatomy and CT Volume Generation : Abstract: The recently emerging conditional diffusion models seem promising for mitigating the labor and expenses in building large 3D medical imaging datasets. However, previous studies on 3D CT gene...
- Finite Operator Learning: Bridging Neural Operators and Numerical Methods for Efficient Parametric Solution and Optimization of PDEs : Abstract: We introduce a method that combines neural operators, physics-informed machine learning, and standard numerical methods for solving PDEs. The proposed approach extends each of the aforementi...
- Hi-EF: Benchmarking Emotion Forecasting in Human-interaction : Abstract: Affective Forecasting is a psychology task that involves predicting an individual's future emotional responses, often hampered by reliance on external factors leading to inaccuracies, and t...
- What can we learn about Reionization astrophysical parameters using Gaussian Process Regression? : Abstract: Reionization is one of the least understood processes in the evolution history of the Universe, mostly because of the numerous astrophysical processes occurring simultaneously about which we...
- Rank Matters: Understanding and Defending Model Inversion Attacks via Low-Rank Feature Filtering : Abstract: Model Inversion Attacks (MIAs) pose a significant threat to data privacy by reconstructing sensitive training samples from the knowledge embedded in trained machine learning models. Despite ...
- Heterogeneous transfer learning for high-dimensional regression with feature mismatch : Abstract: We consider Heterogeneous Transfer Learning (HTL) from a source to a new target domain for high-dimensional regression with differing feature sets. Most homogeneous TL methods assume that ta...
- Beyond Linearity and Time-Homogeneity: Relational Hyper Event Models with Time-Varying Non-Linear Effects : Abstract: Recent technological advances have made it easier to collect large and complex networks of time-stamped relational events connecting two or more entities. Relational hyper-event models (RHEM...
- On detection probabilities of link invariants : Abstract: We prove that the detection rate of n-crossing alternating links by many standard link invariants decays exponentially in n, implying that they detect alternating links with probability zero...
- Towards Corpus-Grounded Agentic LLMs for Multilingual Grammatical Analysis : Abstract: Empirical grammar research has become increasingly data-driven, but the systematic analysis of annotated corpora still requires substantial methodological and technical effort. We explore ho...
- Minimal-Edit Instruction Tuning for Low-Resource Indic GEC : Abstract: Grammatical error correction for Indic languages faces limited supervision, diverse scripts, and rich morphology. We propose an augmentation-free setup that uses instruction-tuned large lang...
- Lost without translation -- Can transformer (language models) understand mood states? : Abstract: Background: Large Language Models show promise in psychiatry but are English-centric. Their ability to understand mood states in other languages is unclear, as different languages have their...
- IndicParam: Benchmark to evaluate LLMs on low-resource Indic Languages : Abstract: While large language models excel on high-resource multilingual tasks, low- and extremely low-resource Indic languages remain severely under-evaluated. We present IndicParam, a human-curated...
- CourseTimeQA: A Lecture-Video Benchmark and a Latency-Constrained Cross-Modal Fusion Method for Timestamped QA : Abstract: We study timestamped question answering over educational lecture videos under a single-GPU latency/memory budget. Given a natural-language query, the system retrieves relevant timestamped se...
- Mitigating the Threshold Priming Effect in Large Language Model-Based Relevance Judgments via Personality Infusing : Abstract: Recent research has explored LLMs as scalable tools for relevance labeling, but studies indicate they are susceptible to priming effects, where prior relevance judgments influence later ones...
- A Taxonomy of Errors in English as she is spoke: Toward an AI-Based Method of Error Analysis for EFL Writing Instruction : Abstract: This study describes the development of an AI-assisted error analysis system designed to identify, categorize, and correct writing errors in English. Utilizing Large Language Models (LLMs) l...
- CryptoBench: A Dynamic Benchmark for Expert-Level Evaluation of LLM Agents in Cryptocurrency : Abstract: This paper introduces CryptoBench, the first expert-curated, dynamic benchmark designed to rigorously evaluate the real-world capabilities of Large Language Model (LLM) agents in the uniquel...
- Developing a Comprehensive Framework for Sentiment Analysis in Turkish : Abstract: In this thesis, we developed a comprehensive framework for sentiment analysis that takes its many aspects into account mainly for Turkish. We have also proposed several approaches specific t...
- Catch Me If You Can: How Smaller Reasoning Models Pretend to Reason with Mathematical Fidelity : Abstract: Current evaluation of mathematical reasoning in language models relies primarily on answer accuracy, potentially masking fundamental failures in logical computation. We introduce a diagnosti...
- Prism: A Minimal Compositional Metalanguage for Specifying Agent Behavior : Abstract: Prism is a small, compositional metalanguage for specifying the behaviour of tool-using software agents. Rather than introducing ad hoc control constructs, Prism is built around a fixed core...
- Sycophancy Claims about Language Models: The Missing Human-in-the-Loop : Abstract: Sycophantic response patterns in Large Language Models (LLMs) have been increasingly claimed in the literature. We review methodological challenges in measuring LLM sycophancy and identify f...
- Graphing the Truth: Structured Visualizations for Automated Hallucination Detection in LLMs : Abstract: Large Language Models have rapidly advanced in their ability to interpret and generate natural language. In enterprise settings, they are frequently augmented with closed-source domain knowl...
- ECO: Energy-Constrained Operator Learning for Chaotic Dynamics with Boundedness Guarantees : Abstract: Chaos is a fundamental feature of many complex dynamical systems, including weather systems and fluid turbulence. These systems are inherently difficult to predict due to their extreme sensi...
- A robust generalizable device-agnostic deep learning model for sleep-wake determination from triaxial wrist accelerometry : Abstract: Study Objectives: Wrist accelerometry is widely used for inferring sleep-wake state. Previous works demonstrated poor wake detection, without cross-device generalizability and validation in ...
- Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling : Abstract: As large language models have grown larger, low-precision numerical formats such as NVFP4 have become increasingly popular due to the speed and memory benefits they provide. However, to acce...
- Improved Mean Flows: On the Challenges of Fastforward Generative Models : Abstract: MeanFlow (MF) has recently been established as a framework for one-step generative modeling. However, its "fastforward" nature introduces key challenges in both the training objective and ...
- Scheduling and Aggregation Design for Asynchronous Federated Learning over Wireless Networks : Abstract: Federated Learning (FL) is a collaborative machine learning (ML) framework that combines on-device training and server-based aggregation to train a common ML model among distributed agents. ...
- Deep sub-ensembles meets quantile regression: uncertainty-aware imputation for time series : Abstract: Real-world time series data often exhibits substantial missing values, posing challenges for advanced analysis. A common approach to addressing this issue is imputation, where the primary ch...
- Accelerating data-driven algorithm selection for combinatorial partitioning problems : Abstract: In clustering algorithm selection, we are given a massive dataset and must efficiently select which clustering algorithm to use. We study this problem in a semi-supervised setting, with an u...
- LCEN: A Nonlinear, Interpretable Feature Selection and Machine Learning Algorithm : Abstract: Interpretable models can have advantages over black-box models, and interpretability is essential for the application of machine learning in critical settings, such as aviation or medicine. ...
- STC-ViT: Spatio Temporal Continuous Vision Transformer for Medium-range Global Weather Forecasting : Abstract: Operational Numerical Weather Prediction (NWP) systems rely on computationally expensive physics-based models. Recently, transformer models have shown remarkable potential in weather foreca...
- Spectral Convolutional Conditional Neural Processes : Abstract: Neural processes (NPs) are probabilistic meta-learning models that map sets of observations to posterior predictive distributions, enabling inference at arbitrary domain points. Their capaci...
- CoxSE: Exploring the Potential of Self-Explaining Neural Networks with Cox Proportional Hazards Model for Survival Analysis : Abstract: The Cox Proportional Hazards (CPH) model has long been the preferred survival model for its explainability. However, to increase its predictive power beyond its linear log-risk, it was exten...
- Causal Feature Selection Method for Contextual Multi-Armed Bandits in Recommender System : Abstract: Effective feature selection is essential for optimizing contextual multi-armed bandits (CMABs) in large-scale online systems, where suboptimal features can degrade rewards, interpretability,...
- Skewed Neuronal Heterogeneity Enhances Efficiency On Various Computing Systems : Abstract: Heterogeneity is a ubiquitous property of many biological systems and has profound implications for computation. While it is conceivable to optimize neuronal and synaptic heterogeneity for a...
- Gaming and Cooperation in Federated Learning: What Can Happen and How to Monitor It : Abstract: The success of federated learning (FL) ultimately depends on how strategic participants behave under partial observability, yet most formulations still treat FL as a static optimization prob...
- Exploring Variational Graph Autoencoders for Distribution Grid Data Generation : Abstract: To address the lack of public power system data for machine learning research in energy networks, we investigate the use of variational graph autoencoders (VGAEs) for synthetic distribution ...
- Event2Vec: A Geometric Approach to Learning Composable Representations of Event Sequences : Abstract: The study of neural representations, both in biological and artificial systems, is increasingly revealing the importance of geometric and topological structures. Inspired by this, we introdu...
- How Muon's Spectral Design Benefits Generalization: A Study on Imbalanced Data : Abstract: The growing adoption of spectrum-aware matrix-valued optimizers such as Muon and Shampoo in deep learning motivates a systematic study of their generalization properties and, in particular, ...
- Families of costs with zero and nonnegative MTW tensor in optimal transport and the c-divergences : Abstract: We study the information geometry of $\mathsf{c}$-divergences from families of costs of the form $\mathsf{c}(x, \bar{x}) = \mathsf{u}(x^{\mathfrak{t}}\bar{x})$ through the optimal transport point of vi...
- Non-Negative Matrix Factorization Using Non-Von Neumann Computers : Abstract: Non-negative matrix factorization (NMF) is a matrix decomposition problem with applications in unsupervised learning. The general form of this problem (along with many of its variants) is NP...
- Realistic Handwritten Multi-Digit Writer (MDW) Number Recognition Challenges : Abstract: Isolated digit classification has served as a motivating problem for decades of machine learning research. In real settings, numbers often occur as multiple digits, all written by the same p...
- Fragmentation is Efficiently Learnable by Quantum Neural Networks : Abstract: Hilbert space fragmentation is a phenomenon in which the Hilbert space of a quantum system is dynamically decoupled into exponentially many Krylov subspaces. We can define the Schur transfor...
- FC-ADL: Efficient Microservice Anomaly Detection and Localisation Through Functional Connectivity : Abstract: Microservices have transformed software architecture through the creation of modular and independent services. However, they introduce operational complexities in service integration and sys...
- Hierarchical Semantic Alignment for Image Clustering : Abstract: Image clustering is a classic problem in computer vision, which categorizes images into different groups. Recent studies utilize nouns as external semantic knowledge to improve clustering ...
- Outcome-Aware Spectral Feature Learning for Instrumental Variable Regression : Abstract: We address the problem of causal effect estimation in the presence of hidden confounders using nonparametric instrumental variable (IV) regression. An established approach is to use estimato...
- Thompson Sampling for Multi-Objective Linear Contextual Bandit : Abstract: We study the multi-objective linear contextual bandit problem, where multiple possible conflicting objectives must be optimized simultaneously. We propose MOL-TS, the first...
- Mitigating Indirect Prompt Injection via Instruction-Following Intent Analysis : Abstract: Indirect prompt injection attacks (IPIAs), where large language models (LLMs) follow malicious instructions hidden in input data, pose a critical threat to LLM-powered agents. In this paper,...
- MM-ACT: Learn from Multimodal Parallel Generation to Act : Abstract: A generalist robotic policy needs both semantic understanding for task planning and the ability to interact with the environment through predictive capabilities. To tackle this, we present M...
- Sleep Apnea Detection on a Wireless Multimodal Wearable Device Without Oxygen Flow Using a Mamba-based Deep Learning Approach : Abstract: Objectives: We present and evaluate a Mamba-based deep-learning model for diagnosis and event-level characterization of sleep disordered breathing based on signals from the ANNE One, a non-i...
- Joint Partitioning and Placement of Foundation Models for Real-Time Edge AI : Abstract: Inference over large-scale foundation models within heterogeneous edge environments necessitates a fundamentally reconfigurable orchestration substrate. Static partitioning of model layers p...
- The Silence that Speaks: Neural Estimation via Communication Gaps : Abstract: Accurate remote state estimation is a fundamental component of many autonomous and networked dynamical systems, where multiple decision-making agents interact and communicate over shared, ba...
- Building Trustworthy AI for Materials Discovery: From Autonomous Laboratories to Z-scores : Abstract: Accelerated material discovery increasingly relies on artificial intelligence and machine learning, collectively termed "AI/ML". A key challenge in using AI is ensuring that human scientists...
- Neural Variable Name Repair: Learning to Rename Identifiers for Readability : Abstract: Developers routinely work with source files whose variable names are generic or misleading, and with teams moving quickly, many functions are left undocumented. This slows comprehension, inc...
- High-dimensional Mean-Field Games by Particle-based Flow Matching : Abstract: Mean-field games (MFGs) study the Nash equilibrium of systems with a continuum of interacting agents, which can be formulated as the fixed-point of optimal control problems. They provide a u...
- The Evolution of Learning Algorithms for Artificial Neural Networks : Abstract: In this paper we investigate a neural network model in which weights between computational nodes are modified according to a local learning rule. To determine whether local learning rules ar...
- Closing the Approximation Gap of Partial AUC Optimization: A Tale of Two Formulations : Abstract: As a variant of the Area Under the ROC Curve (AUC), the partial AUC (PAUC) focuses on a specific range of false positive rate (FPR) and/or true positive rate (TPR) in the ROC curve. It is a ...
- Implicitly Normalized Online PCA: A Regularized Algorithm with Exact High-Dimensional Dynamics : Abstract: Many online learning algorithms, including classical online PCA methods, enforce explicit normalization steps that discard the evolving norm of the parameter vector. We show that this norm c...
- Bayesian Optimization for Non-Cooperative Game-Based Radio Resource Management : Abstract: Radio resource management in modern cellular networks often calls for the optimization of complex utility functions that are potentially conflicting between different base stations (BSs). Co...
- Samplability makes learning easier : Abstract: The standard definition of PAC learning (Valiant 1984) requires learners to succeed under all distributions -- even ones that are intractable to sample from. This stands in contrast to sampl...
- Experimental Methods, Health Indicators, and Diagnostic Strategies for Retired Lithium-ion Batteries: A Comprehensive Review : Abstract: Reliable health assessment of retired lithium-ion batteries is essential for safe and economically viable second-life deployment, yet remains difficult due to sparse measurements, incomplete...
- Securing Large Language Models (LLMs) from Prompt Injection Attacks : Abstract: Large Language Models (LLMs) are increasingly being deployed in real-world applications, but their flexibility exposes them to prompt injection attacks. These attacks leverage the model's in...
- Optimizing Stroke Risk Prediction: A Machine Learning Pipeline Combining ROS-Balanced Ensembles and XAI : Abstract: Stroke is a major cause of death and permanent impairment, making it a major worldwide health concern. For prompt intervention and successful preventative tactics, early risk assessment is e...
- Modality-Augmented Fine-Tuning of Foundation Robot Policies for Cross-Embodiment Manipulation on GR1 and G1 : Abstract: This paper presents a modality-augmented fine-tuning framework designed to adapt foundation robot policies to diverse humanoid embodiments. We validate our approach across two distinct setti...
- SocialDriveGen: Generating Diverse Traffic Scenarios with Controllable Social Interactions : Abstract: The generation of realistic and diverse traffic scenarios in simulation is essential for developing and evaluating autonomous driving systems. However, most simulation frameworks rely on rul...
- Modeling Wavelet Transformed Quantum Support Vector for Network Intrusion Detection : Abstract: Network traffic anomaly detection is a critical cybersecurity challenge requiring robust solutions for complex Internet of Things (IoT) environments. We present a novel hybrid quantum-clas...
- BlinkBud: Detecting Hazards from Behind via Sampled Monocular 3D Detection on a Single Earbud : Abstract: Failing to be aware of speeding vehicles approaching from behind poses a huge threat to the road safety of pedestrians and cyclists. In this paper, we propose BlinkBud, which utilizes a sing...
- Masked Symbol Modeling for Demodulation of Oversampled Baseband Communication Signals in Impulsive Noise-Dominated Channels : Abstract: Recent breakthroughs in natural language processing show that the attention mechanism in Transformer networks, trained via masked-token prediction, enables models to capture the semantic context...
- MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification : Abstract: We present Conformer-based decoders for the LibriBrain 2025 PNPL competition, targeting two foundational MEG tasks: Speech Detection and Phoneme Classification. Our approach adapts a compact...
- Enhancing BERT Fine-Tuning for Sentiment Analysis in Lower-Resourced Languages : Abstract: Limited data for low-resource languages typically yield weaker language models (LMs). Since pre-training is compute-intensive, it is more pragmatic to target improvements during fine-tuning....
- hls4ml: A Flexible, Open-Source Platform for Deep Learning Acceleration on Reconfigurable Hardware : Abstract: We present hls4ml, a free and open-source platform that translates machine learning (ML) models from modern deep learning frameworks into high-level synthesis (HLS) code that can be integrat...
- Heuristic algorithms for the stochastic critical node detection problem : Abstract: Given a network, the critical node detection problem finds a subset of nodes whose removal disrupts the network connectivity. Since many real-world systems are naturally modeled as graphs, a...
- Learning Reduced Representations for Quantum Classifiers : Abstract: Data sets that are specified by a large number of features are currently outside the area of applicability for quantum machine learning algorithms. An immediate solution to this impasse is t...
- Semantic-aware Random Convolution and Source Matching for Domain Generalization in Medical Image Segmentation : Abstract: We tackle the challenging problem of single-source domain generalization (DG) for medical image segmentation. To this end, we aim to train a network on one domain (e.g., CT) and directly...
- Neural Networks for Predicting Permeability Tensors of 2D Porous Media: Comparison of Convolution- and Transformer-based Architectures : Abstract: Permeability is a central concept in the macroscopic description of flow through porous media, with applications spanning from oil recovery to hydrology. Traditional methods for determining ...
- Cuffless Blood Pressure Estimation from Six Wearable Sensor Modalities in Multi-Motion-State Scenarios : Abstract: Cardiovascular disease (CVD) is a leading cause of morbidity and mortality worldwide, and sustained hypertension is an often silent risk factor, making cuffless continuous blood pressure (BP...
- Bayesian Ambiguity Contraction-based Adaptive Robust Markov Decision Processes for Adversarial Surveillance Missions : Abstract: Collaborative Combat Aircraft (CCAs) are envisioned to enable autonomous Intelligence, Surveillance, and Reconnaissance (ISR) missions in contested environments, where adversaries may act st...
- Differentially Private and Federated Structure Learning in Bayesian Networks : Abstract: Learning the structure of a Bayesian network from decentralized data poses two major challenges: (i) ensuring rigorous privacy guarantees for participants, and (ii) avoiding communication co...
- Common Structure Discovery in Collections of Bipartite Networks: Application to Pollination Systems : Abstract: Bipartite networks are widely used to encode the ecological interactions. Being able to compare the organization of bipartite networks is a first step toward a better understanding of how en...
- Multimodal Mixture-of-Experts for ISAC in Low-Altitude Wireless Networks : Abstract: Integrated sensing and communication (ISAC) is a key enabler for low-altitude wireless networks (LAWNs), providing simultaneous environmental perception and data transmission in complex aeri...
- GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation : Abstract: We present GR-RL, a robotic learning framework that turns a generalist vision-language-action (VLA) policy into a highly capable specialist for long-horizon dexterous manipulation. Assuming ...
- Much Ado About Noising: Dispelling the Myths of Generative Robotic Control : Abstract: Generative models, like flows and diffusions, have recently emerged as popular and efficacious policy parameterizations in robotics. There has been much speculation as to the factors underly...
- Decision Tree Embedding by Leaf-Means : Abstract: Decision trees and random forests remain highly competitive for classification on medium-sized, standard datasets due to their robustness, minimal preprocessing requirements, and interpretabi...
- Dimension-free error estimate for diffusion model and optimal scheduling : Abstract: Diffusion generative models have emerged as powerful tools for producing synthetic data from an empirically observed distribution. A common approach involves simulating the time-reversal of ...
- Fundamentals of Regression : Abstract: This chapter opens with a review of classic tools for regression, a subset of machine learning that seeks to find relationships between variables. With the advent of scientific machine learn...
- A Nonlinear Low-rank Representation Model with Convolutional Neural Network for Imputing Water Quality Data : Abstract: Water quality monitoring is a core component of ecological environmental protection. However, due to sensor failure or other unavoidable factors, missing data are common in long-term monito...
- Differentiable Weightless Controllers: Learning Logic Circuits for Continuous Control : Abstract: We investigate whether continuous-control policies can be represented and learned as discrete logic circuits instead of continuous neural networks. We introduce Differentiable Weightless Con...
- Winning Solutions for the Rayan AI Contest: Compositional Retrieval, Zero-Shot Anomaly Detection, and Backdoor Detection : Abstract: This report presents solutions to three machine learning challenges: compositional image retrieval, zero-shot anomaly detection, and backdoored model detection. In compositional image retrie...
- Walking on the Fiber: A Simple Geometric Approximation for Bayesian Neural Networks : Abstract: Bayesian Neural Networks provide a principled framework for uncertainty quantification by modeling the posterior distribution of network parameters. However, exact posterior inference is com...
- Label Forensics: Interpreting Hard Labels in Black-Box Text Classifier : Abstract: The widespread adoption of natural language processing techniques has led to an unprecedented growth of text classifiers across the modern web. Yet many of these models circulate with their ...
- End-to-end Deep Reinforcement Learning for Stochastic Multi-objective Optimization in C-VRPTW : Abstract: In this work, we consider learning-based applications in routing to solve a Vehicle Routing variant characterized by stochasticity and multiple objectives. Such problems are representative o...
- TimePred: efficient and interpretable offline change point detection for high volume data - with application to industrial process monitoring : Abstract: Change-point detection (CPD) in high-dimensional, large-volume time series is challenging for statistical consistency, scalability, and interpretability. We introduce TimePred, a self-superv...
- Do Large Language Models Walk Their Talk? Measuring the Gap Between Implicit Associations, Self-Report, and Behavioral Altruism : Abstract: We investigate whether Large Language Models (LLMs) exhibit altruistic tendencies, and critically, whether their implicit associations and self-reports predict actual altruistic behavior. Us...
- Scaling and context steer LLMs along the same computational path as the human brain : Abstract: Recent studies suggest that the representations learned by large language models (LLMs) are partially aligned to those of the human brain. However, whether and why this alignment score arise...
- In-context Inverse Optimality for Fair Digital Twins: A Preference-based approach : Abstract: Digital Twins (DTs) are increasingly used as autonomous decision-makers in complex socio-technical systems. Their mathematically optimal decisions often diverge from human expectations, expo...
- Morphling: Fast, Fused, and Flexible GNN Training at Scale : Abstract: Graph Neural Networks (GNNs) present a fundamental hardware challenge by fusing irregular, memory-bound graph traversals with regular, compute-intensive dense matrix operations. While framew...
- A unified framework for geometry-independent operator learning in cardiac electrophysiology simulations : Abstract: Accurate maps of atrial electrical activation are essential for personalised treatment of arrhythmias, yet biophysically detailed simulations remain computationally intensive for real-time c...
- Beyond Scaffold: A Unified Spatio-Temporal Gradient Tracking Method : Abstract: In distributed and federated learning algorithms, communication overhead is often reduced by performing multiple local updates between communication rounds. However, due to data heterogeneit...
- Automating modeling in mechanics: LLMs as designers of physics-constrained neural networks for constitutive modeling of materials : Abstract: Large language model (LLM)-based agentic frameworks increasingly adopt the paradigm of dynamically generating task-specific agents. We suggest that not only agents but also specialized softw...
- MSPT: Efficient Large-Scale Physical Modeling via Parallelized Multi-Scale Attention : Abstract: A key scalability challenge in neural solvers for industrial-scale physics simulations is efficiently capturing both fine-grained local interactions and long-range global dependencies across...
- SA-ADP: Sensitivity-Aware Adaptive Differential Privacy for Large Language Models : Abstract: Despite advances in the use of large language models (LLMs) in downstream tasks, their ability to memorize information has raised privacy concerns. Therefore, protecting personally identifia...
- Mofasa: A Step Change in Metal-Organic Framework Generation : Abstract: Mofasa is an all-atom latent diffusion model with state-of-the-art performance for generating Metal-Organic Frameworks (MOFs). These are highly porous crystalline materials used to harvest w...
- On the Unreasonable Effectiveness of Last-layer Retraining : Abstract: Last-layer retraining (LLR) methods -- wherein the last layer of a neural network is reinitialized and retrained on a held-out set following ERM training -- have garnered interest as an effi...
- How Does RL Post-training Induce Skill Composition? A Case Study on Countdown : Abstract: While reinforcement learning (RL) successfully enhances reasoning in large language models, its role in fostering compositional generalization (the ability to synthesize novel skills from kn...
- The Active and Noise-Tolerant Strategic Perceptron : Abstract: We initiate the study of active learning algorithms for classifying strategic agents. Active learning is a well-established framework in machine learning in which the learner selectively que...
- DeepCAVE: A Visualization and Analysis Tool for Automated Machine Learning : Abstract: Hyperparameter optimization (HPO), as a central paradigm of AutoML, is crucial for leveraging the full potential of machine learning (ML) models; yet its complexity poses challenges in under...
- Forget Less, Retain More: A Lightweight Regularizer for Rehearsal-Based Continual Learning : Abstract: Deep neural networks suffer from catastrophic forgetting, where performance on previous tasks degrades after training on a new task. This issue arises due to the model's tendency to overwrit...
- The Mean-Field Dynamics of Transformers : Abstract: We develop a mathematical framework that interprets Transformer attention as an interacting particle system and studies its continuum (mean-field) limits. By idealizing attention continuous ...
- New Spiking Architecture for Multi-Modal Decision-Making in Autonomous Vehicles : Abstract: This work proposes an end-to-end multi-modal reinforcement learning framework for high-level decision-making in autonomous vehicles. The framework integrates heterogeneous sensory input, inc...
- Domain-Decomposed Graph Neural Network Surrogate Modeling for Ice Sheets : Abstract: Accurate yet efficient surrogate models are essential for large-scale simulations of partial differential equations (PDEs), particularly for uncertainty quantification (UQ) tasks that demand...
- Elastic Weight Consolidation for Knowledge Graph Continual Learning: An Empirical Evaluation : Abstract: Knowledge graphs (KGs) require continual updates as new information emerges, but neural embedding models suffer from catastrophic forgetting when learning new tasks sequentially. We evaluate...
- Provably Safe Model Updates : Abstract: Safety-critical environments are inherently dynamic. Distribution shifts, emerging vulnerabilities, and evolving requirements demand continuous updates to machine learning models. Yet even b...
- Delays in Spiking Neural Networks: A State Space Model Approach : Abstract: Spiking neural networks (SNNs) are biologically inspired, event-driven models that are suitable for processing temporal data and offer energy-efficient computation when implemented on neurom...
- A Footprint-Aware, High-Resolution Approach for Carbon Flux Prediction Across Diverse Ecosystems : Abstract: Natural climate solutions (NCS) offer an approach to mitigating carbon dioxide (CO2) emissions. However, monitoring the carbon drawdown of ecosystems over large geographic areas remains chal...
- KV Pareto: Systems-Level Optimization of KV Cache and Model Compression for Long Context Inference : Abstract: Long-context Large Language Models (LLMs) face significant memory bottlenecks during inference due to the linear growth of key-value (KV) cache with sequence length. While individual optimiz...
- Low-Rank Prehab: Preparing Neural Networks for SVD Compression : Abstract: Low-rank approximation methods such as singular value decomposition (SVD) and its variants (e.g., Fisher-weighted SVD, Activation SVD) have recently emerged as effective tools for neural net...
- Feature-Based Semantics-Aware Scheduling for Energy-Harvesting Federated Learning : Abstract: Federated Learning (FL) on resource-constrained edge devices faces a critical challenge: The computational energy required for training Deep Neural Networks (DNNs) often dominates communicat...
- AlignSAE: Concept-Aligned Sparse Autoencoders : Abstract: Large Language Models (LLMs) encode factual knowledge within hidden parametric spaces that are difficult to inspect or control. While Sparse Autoencoders (SAEs) can decompose hidden activati...
- Efficient Turing Machine Simulation with Transformers : Abstract: Constant bit-size Transformers are known to be Turing complete, but existing constructions require $\Omega(s(n))$ chain-of-thought (CoT) steps per simulated Turing machine (TM) step, leading to i...
- SetupKit: Efficient Multi-Corner Setup/Hold Time Characterization Using Bias-Enhanced Interpolation and Active Learning : Abstract: Accurate setup/hold time characterization is crucial for modern chip timing closure, but its reliance on potentially millions of SPICE simulations across diverse process-voltage-temperature (...
- SafeCiM: Investigating Resilience of Hybrid Floating-Point Compute-in-Memory Deep Learning Accelerators : Abstract: Deep Neural Networks (DNNs) continue to grow in complexity with Large Language Models (LLMs) incorporating vast numbers of parameters. Handling these parameters efficiently in traditional ac...
- A Fast and Efficient Modern BERT based Text-Conditioned Diffusion Model for Medical Image Segmentation : Abstract: In recent times, denoising diffusion probabilistic models (DPMs) have proven effective for medical image generation and denoising, and as representation learners for downstream segmentation....
- SemImage: Semantic Image Representation for Text, a Novel Framework for Embedding Disentangled Linguistic Features : Abstract: We propose SemImage, a novel method for representing a text document as a two-dimensional semantic image to be processed by convolutional neural networks (CNNs). In a SemImage, each word is ...
- Predicting COVID-19 Prevalence Using Wastewater RNA Surveillance: A Semi-Supervised Learning Approach with Temporal Feature Trust : Abstract: As COVID-19 transitions into an endemic disease that remains constantly present in the population at a stable level, monitoring its prevalence without invasive measures becomes increasingly ...
- Comparative Analysis of Vision Transformer, Convolutional, and Hybrid Architectures for Mental Health Classification Using Actigraphy-Derived Images : Abstract: This work examines how three different image-based methods, VGG16, ViT-B/16, and CoAtNet-Tiny, perform in identifying depression, schizophrenia, and healthy controls using daily actigraphy r...
- Learning with Physical Constraints : Abstract: This chapter provides three tutorial exercises on physics-constrained regression. These are implemented as toy problems that seek to mimic grand challenges in (1) the super-resolution and da...
- Hybrid Synthetic Data Generation with Domain Randomization Enables Zero-Shot Vision-Based Part Inspection Under Extreme Class Imbalance : Abstract: Machine learning, particularly deep learning, is transforming industrial quality inspection. Yet, training robust machine learning models typically requires large volumes of high-quality lab...
- Comparing Two Proxy Methods for Causal Identification : Abstract: Identifying causal effects in the presence of unmeasured variables is a fundamental challenge in causal inference, for which proxy variable methods have emerged as a powerful solution. We co...
- AutocleanEEG ICVision: Automated ICA Artifact Classification Using Vision-Language AI : Abstract: We introduce EEG Autoclean Vision Language AI (ICVision), a first-of-its-kind system that emulates expert-level EEG ICA component classification through AI-agent vision and natural language r...
- Beyond Expected Goals: A Probabilistic Framework for Shot Occurrences in Soccer : Abstract: Expected goals (xG) models estimate the probability that a shot results in a goal from its context (e.g., location, pressure), but they operate only on observed shots. We propose xG+, a poss...
- Statistical Inference under Adaptive Sampling with LinUCB : Abstract: Adaptively collected data has become ubiquitous within modern practice. However, even seemingly benign adaptive sampling schemes can introduce severe biases, rendering traditional statistica...
- DAISI: Data Assimilation with Inverse Sampling using Stochastic Interpolants : Abstract: Data assimilation (DA) is a cornerstone of scientific and engineering applications, combining model forecasts with sparse and noisy observations to estimate latent system states. Classical D...
- Stochastic Dominance Constrained Optimization with S-shaped Utilities: Poor-Performance-Region Algorithm and Neural Network : Abstract: We investigate the static portfolio selection problem of S-shaped and non-concave utility maximization under first-order and second-order stochastic dominance (SD) constraints. In many S-sha...
- An Interpretable Operator-Learning Model for Electric Field Profile Reconstruction in Discharges Based on the EFISH Method : Abstract: Machine learning (ML) models have recently been used to reconstruct electric field distributions from EFISH signal profiles (the 'inverse EFISH problem'). This addresses the line-of-sight EFIS...
- The Information Theory of Similarity : Abstract: We establish a precise mathematical equivalence between witness-based similarity systems (REWA) and Shannon's information theory. We prove that witness overlap is mutual information, that RE...
- EnzyCLIP: A Cross-Attention Dual Encoder Framework with Contrastive Learning for Predicting Enzyme Kinetic Constants : Abstract: Accurate prediction of enzyme kinetic parameters is crucial for drug discovery, metabolic engineering, and synthetic biology applications. Current computational approaches face limitations i...
- An RKHS Perspective on Tree Ensembles : Abstract: Random Forests and Gradient Boosting are among the most effective algorithms for supervised learning on tabular data. Both belong to the class of tree-based ensemble methods, where predictio...
- RECTor: Robust and Efficient Correlation Attack on Tor : Abstract: Tor is a widely used anonymity network that conceals user identities by routing traffic through encrypted relays, yet it remains vulnerable to traffic correlation attacks that deanonymize us...
- A Highly Configurable Framework for Large-Scale Thermal Building Data Generation to drive Machine Learning Research : Abstract: Data-driven modeling of building thermal dynamics is emerging as an increasingly important field of research for large-scale intelligent building control. However, research in data-driven mo...
- No-Regret Gaussian Process Optimization of Time-Varying Functions : Abstract: Sequential optimization of black-box functions from noisy evaluations has been widely studied, with Gaussian Process bandit algorithms such as GP-UCB guaranteeing no-regret in stationary set...
- Robust Precoding for Resilient Cell-Free Networks : Abstract: This paper presents a robust precoder design for resilient cell-free massive MIMO (CF-mMIMO) systems that minimizes the weighted sum of desired signal mean square error (MSE) and residual in...
- GCMCG: A Clustering-Aware Graph Attention and Expert Fusion Network for Multi-Paradigm, Multi-task, and Cross-Subject EEG Decoding : Abstract: Brain-Computer Interfaces (BCIs) based on Motor Execution (ME) and Motor Imagery (MI) electroencephalogram (EEG) signals offer a direct pathway for human-machine interaction. However, develo...
- Statistical-computational gap in multiple Gaussian graph alignment : Abstract: We investigate the existence of a statistical-computational gap in multiple Gaussian graph alignment. We first generalize a previously established informational threshold from Vassaux and Ma...
- Large Language Models for Software Engineering: A Reproducibility Crisis : Abstract: Reproducibility is a cornerstone of scientific progress, yet its state in large language model (LLM)-based software engineering (SE) research remains poorly understood. This paper presents t...
- Self-sufficient Independent Component Analysis via KL Minimizing Flows : Abstract: We study the problem of learning disentangled signals from data using non-linear Independent Component Analysis (ICA). Motivated by advances in self-supervised learning, we propose to learn ...
- Restricted Block Permutation for Two-Sample Testing : Abstract: We study a structured permutation scheme for two-sample testing that restricts permutations to single cross-swaps between block-selected representatives. Our analysis yields three main resul...
- Efficient Matroid Bandit Linear Optimization Leveraging Unimodality : Abstract: We study the combinatorial semi-bandit problem under matroid constraints. The regret achieved by recent approaches is optimal, in the sense that it matches the lower bound. Yet, time complex...
- Financial Text Classification Based On rLoRA Finetuning On Qwen3-8B model : Abstract: Financial text classification has increasingly become an important aspect in quantitative trading systems and related tasks, such as financial sentiment analysis and the classification of fi...
- Privacy Preserving Diffusion Models for Mixed-Type Tabular Data Generation : Abstract: We introduce DP-FinDiff, a differentially private diffusion framework for synthesizing mixed-type tabular data. DP-FinDiff employs embedding-based representations for categorical features, r...
- Using physics-inspired Singular Learning Theory to understand grokking & other phase transitions in modern neural networks : Abstract: Classical statistical inference and learning theory often fail to explain the success of modern neural networks. A key reason is that these models are non-identifiable (singular), violating ...
- Flow Matching for Tabular Data Synthesis : Abstract: Synthetic data generation is an important tool for privacy-preserving data sharing. While diffusion models have set recent benchmarks, flow matching (FM) offers a promising alternative. This...
- Towards Precision Protein-Ligand Affinity Prediction Benchmark: A Complete and Modification-Aware DAVIS Dataset : Abstract: Advancements in AI for science unlock capabilities for critical drug discovery tasks such as protein-ligand binding affinity prediction. However, current models overfit to existing oversimp...
- Exploiting Function-Family Structure in Analog Circuit Optimization : Abstract: Analog circuit optimization is typically framed as black-box search over arbitrary smooth functions, yet device physics constrains performance mappings to structured families: exponential de...
- Upcycled and Merged MoE Reward Model for Mitigating Reward Hacking : Abstract: Reward models play a critical role in Reinforcement Learning from Human Feedback (RLHF) by assessing the consistency between generated outputs and human preferences. However, conventional re...
- ESMC: MLLM-Based Embedding Selection for Explainable Multiple Clustering : Abstract: Typical deep clustering methods, while achieving notable progress, can only provide one clustering result per dataset. This limitation arises from their assumption of a fixed underlying data...
- Forecasting India's Demographic Transition Under Fertility Policy Scenarios Using hybrid LSTM-PINN Model : Abstract: Demographic forecasting remains a fundamental challenge for policy planning in rapidly evolving nations such as India, where fertility transitions, policy interventions, and age structured d...
- Text Mining Analysis of Symptom Patterns in Medical Chatbot Conversations : Abstract: The fast growth of digital health systems has led to a need to better comprehend how they interpret and represent patient-reported symptoms. Chatbots have been used in healthcare to provide ...
- AI Agent for Source Finding by SoFiA-2 for SKA-SDC2 : Abstract: Source extraction is crucial in analyzing data from next-generation, large-scale sky surveys in radio bands, such as the Square Kilometre Array (SKA). Several source extraction programs, inc...
- What Is Preference Optimization Doing, How and Why? : Abstract: Preference optimization (PO) is indispensable for large language models (LLMs), with methods such as direct preference optimization (DPO) and proximal policy optimization (PPO) achieving gre...
- Sigma: The Key for Vision-Language-Action Models toward Telepathic Alignment : Abstract: To address the gap in humanoid robot cognitive systems regarding the lack of a time-updable mediating thought space between semantics and continuous control, this study constructs and trains...
- Estimating the Effective Rank of Vision Transformers via Low-Rank Factorization : Abstract: Deep networks are heavily over-parameterized, yet their learned representations often admit low-rank structure. We introduce a framework for estimating a model's intrinsic dimensionality by ...
- Soft Quality-Diversity Optimization : Abstract: Quality-Diversity (QD) algorithms constitute a branch of optimization that is concerned with discovering a diverse and high-quality set of solutions to an optimization problem. Current QD me...
- ReJump: A Tree-Jump Representation for Analyzing and Improving LLM Reasoning : Abstract: Large Reasoning Models (LRMs) are Large Language Models (LLMs) explicitly trained to generate long-form Chain-of-Thoughts (CoTs), achieving impressive success on challenging tasks like math ...
- Uncertainty Quantification for Deep Regression using Contextualised Normalizing Flows : Abstract: Quantifying uncertainty in deep regression models is important both for understanding the confidence of the model and for safe decision-making in high-risk domains. Existing approaches that ...
- Prediction-space knowledge markets for communication-efficient federated learning on multimedia tasks : Abstract: Federated learning (FL) enables collaborative training over distributed multimedia data but suffers acutely from statistical heterogeneity and communication constraints, especially when clie...
- City-Conditioned Memory for Multi-City Traffic and Mobility Forecasting : Abstract: Deploying spatio-temporal forecasting models across many cities is difficult: traffic networks differ in size and topology, data availability can vary by orders of magnitude, and new cities ...
- Robust Probabilistic Load Forecasting for a Single Household: A Comparative Study from SARIMA to Transformers on the REFIT Dataset : Abstract: Probabilistic forecasting is essential for modern risk management, allowing decision-makers to quantify uncertainty in critical systems. This paper tackles this challenge using the volatile ...
- The Spectral Dimension of NTKs is Constant: A Theory of Implicit Regularization, Finite-Width Stability, and Scalable Estimation : Abstract: Modern deep networks are heavily overparameterized yet often generalize well, suggesting a form of low intrinsic complexity not reflected by parameter counts. We study this complexity at ini...
- Towards Active Synthetic Data Generation for Finetuning Language Models : Abstract: A common and effective means for improving language model capabilities involves finetuning a "student" language model's parameters on generations from a more proficient "teacher" model. ...
- Partially Equivariant Reinforcement Learning in Symmetry-Breaking Environments : Abstract: Group symmetries provide a powerful inductive bias for reinforcement learning (RL), enabling efficient generalization across symmetric states and actions via group-invariant Markov Decision ...
- D-CTNet: A Dual-Branch Channel-Temporal Forecasting Network with Frequency-Domain Correction : Abstract: Accurate Multivariate Time Series (MTS) forecasting is crucial for collaborative design of complex systems, Digital Twin building, and maintenance ahead of time. However, the collaborative i...
- Memory-Integrated Reconfigurable Adapters: A Unified Framework for Settings with Multiple Tasks : Abstract: Organisms constantly pivot between tasks such as evading predators, foraging, traversing rugged terrain, and socializing, often within milliseconds. Remarkably, they preserve knowledge of on...
- WUSH: Near-Optimal Adaptive Transforms for LLM Quantization : Abstract: Quantization to low bitwidth is a standard approach for deploying large language models, however, a few extreme weights and activations stretch the dynamic range and reduce the effective res...
- Goal-Driven Reward by Video Diffusion Models for Reinforcement Learning : Abstract: Reinforcement Learning (RL) has achieved remarkable success in various domains, yet it often relies on carefully designed programmatic reward functions to guide agent behavior. Designing suc...
- Subgroup Validity in Machine Learning for Echocardiogram Data : Abstract: Echocardiogram datasets enable training deep learning models to automate interpretation of cardiac ultrasound, thereby expanding access to accurate readings of diagnostically-useful images. ...
- Upper Approximation Bounds for Neural Oscillators : Abstract: Neural oscillators, originating from the second-order ordinary differential equations (ODEs), have demonstrated competitive performance in stably learning causal mappings between long-term s...
- Associative Syntax and Maximal Repetitions reveal context-dependent complexity in fruit bat communication : Abstract: This study presents an unsupervised method to infer discreteness, syntax and temporal structures of fruit bat vocalizations, as a case study of graded vocal systems, and evaluates the compl...
- Bayesian dynamic scheduling of multipurpose batch processes under incomplete look-ahead information : Abstract: Multipurpose batch processes are becoming increasingly popular in manufacturing industries since they adapt to low-volume, high-value products and shifting demands. These processes often operate i...
- Projection-Free CNN Pruning via Frank-Wolfe with Momentum: Sparser Models with Less Pretraining : Abstract: We investigate algorithmic variants of the Frank-Wolfe (FW) optimization method for pruning convolutional neural networks. This is motivated by the "Lottery Ticket Hypothesis", which suggest...
- Dynamic Algorithm for Explainable k-medians Clustering under lp Norm : Abstract: We study the problem of explainable k-medians clustering introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (2020). In this problem, the goal is to construct a threshold decision tree...
- Fiber Bundle Networks: A Geometric Machine Learning Paradigm : Abstract: We propose Fiber Bundle Networks (FiberNet), a novel machine learning framework integrating differential geometry with machine learning. Unlike traditional deep neural networks relying on bl...
- From Regression to Classification: Exploring the Benefits of Categorical Representations of Energy in MLIPs : Abstract: Density Functional Theory (DFT) is a widely used computational method for estimating the energy and behavior of molecules. Machine Learning Interatomic Potentials (MLIPs) are models trained ...
- LGDC: Latent Graph Diffusion via Spectrum-Preserving Coarsening : Abstract: Graph generation is a critical task across scientific domains. Existing methods fall broadly into two categories: autoregressive models, which iteratively expand graphs, and one-shot models,...
- Learning to Reconstruct Temperature Field from Sparse Observations with Implicit Physics Priors : Abstract: Accurate reconstruction of temperature field of heat-source systems (TFR-HSS) is crucial for thermal monitoring and reliability assessment in engineering applications such as electronic devi...
- Know Thyself by Knowing Others: Learning Neuron Identity from Population Context : Abstract: Neurons process information in ways that depend on their cell type, connectivity, and the brain region in which they are embedded. However, inferring these factors from neural activity remai...
- Sum Rate Maximization in STAR-RIS-UAV-Assisted Networks: A CA-DDPG Approach for Joint Optimization : Abstract: With the rapid advances in programmable materials, reconfigurable intelligent surfaces (RIS) have become a pivotal technology for future wireless communications. The simultaneous transmittin...
- Research on Milling Machine Predictive Maintenance Based on Machine Learning and SHAP Analysis in Intelligent Manufacturing Environment : Abstract: In the context of intelligent manufacturing, this paper conducts a series of experimental studies on the predictive maintenance of industrial milling machine equipment based on the AI4I 2020...
- A Comparative Study of Machine Learning Algorithms for Electricity Price Forecasting with LIME-Based Interpretability : Abstract: With the rapid development of electricity markets, price volatility has significantly increased, making accurate forecasting crucial for power system operations and market decisions. Traditi...
- CoSineVerifier: Tool-Augmented Answer Verification for Computation-Oriented Scientific Questions : Abstract: Answer verification methods are widely employed in language model training pipelines spanning data curation, evaluation, and reinforcement learning with verifiable rewards (RLVR). While prio...
- On the Tension Between Optimality and Adversarial Robustness in Policy Optimization : Abstract: Achieving optimality and adversarial robustness in deep reinforcement learning has long been regarded as conflicting goals. Nonetheless, recent theoretical insights presented in CAR suggest ...
- Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe : Abstract: Recent efforts on Diffusion Mixture-of-Experts (MoE) models have primarily focused on developing more sophisticated routing mechanisms. However, we observe that the underlying architectural ...
- Efficient Hyperparameter Search for Non-Stationary Model Training : Abstract: Online learning is the cornerstone of applications like recommendation and advertising systems, where models continuously adapt to shifting data distributions. Model training for such system...
- milearn: A Python Package for Multi-Instance Machine Learning : Abstract: We introduce milearn, a Python package for multi-instance learning (MIL) that follows the familiar scikit-learn fit/predict interface while providing a unified framework for both classical a...
- Directed evolution algorithm drives neural prediction : Abstract: Neural prediction offers a promising approach to forecasting the individual variability of neurocognitive functions and disorders and providing prognostic indicators for personalized inventi...
- A Fine Evaluation Method for Cube Copying Test for Early Detection of Alzheimer's Disease : Abstract: Background: Impairment of visual spatial cognitive function is the most common early clinical manifestation of Alzheimer's Disease (AD). When the Montreal Cognitive Assessment (MoCA) uses th...
- CLAPS: Posterior-Aware Conformal Intervals via Last-Layer Laplace : Abstract: We present CLAPS, a posterior-aware conformal regression method that pairs a Last-Layer Laplace Approximation with split-conformal calibration. From the resulting Gaussian posterior, CLAPS d...
- RE-LLM: Integrating Large Language Models into Renewable Energy Systems : Abstract: Energy system models are increasingly employed to guide long-term planning in multi-sectoral environments where decisions span electricity, heat, transport, land use, and industry. While the...
- On Global Applicability and Location Transferability of Generative Deep Learning Models for Precipitation Downscaling : Abstract: Deep learning offers promising capabilities for the statistical downscaling of climate and weather forecasts, with generative approaches showing particular success in capturing fine-scale pr...
- Fantastic Features and Where to Find Them: A Probing Method to combine Features from Multiple Foundation Models : Abstract: Foundation models (FMs) trained with different objectives and data learn diverse representations, making some more effective than others for specific downstream tasks. Existing adaptation st...
- Fourier Neural Operators Explained: A Practical Perspective : Abstract: Partial differential equations (PDEs) govern a wide variety of dynamical processes in science and engineering, yet obtaining their numerical solutions often requires high-resolution discreti...
- Stay Unique, Stay Efficient: Preserving Model Personality in Multi-Task Merging : Abstract: Model merging has emerged as a promising paradigm for enabling multi-task capabilities without additional training. However, existing methods often experience substantial performance degrada...
- Conformal Prediction for Multi-Source Detection on a Network : Abstract: Detecting the origin of information or infection spread in networks is a fundamental challenge with applications in misinformation tracking, epidemiology, and beyond. We study the multi-sour...
- Tractable Weighted First-Order Model Counting with Bounded Treewidth Binary Evidence : Abstract: The Weighted First-Order Model Counting Problem (WFOMC) asks to compute the weighted sum of models of a given first-order logic sentence over a given domain. Conditioning WFOMC on evidence -...
- Maximizing the efficiency of human feedback in AI alignment: a comparative analysis : Abstract: Reinforcement Learning from Human Feedback (RLHF) relies on preference modeling to align machine learning systems with human values, yet the popular approach of random pair sampling with Bra...
- Measuring What LLMs Think They Do: SHAP Faithfulness and Deployability on Financial Tabular Classification : Abstract: Large Language Models (LLMs) have attracted significant attention for classification tasks, offering a flexible alternative to trusted classical machine learning models like LightGBM through...
- Faster Verified Explanations for Neural Networks : Abstract: Verified explanations are a theoretically-principled way to explain the decisions taken by neural networks, which are otherwise black-box in nature. However, these techniques face significan...
- We Still Don't Understand High-Dimensional Bayesian Optimization : Abstract: High-dimensional spaces have challenged Bayesian optimization (BO). Existing methods aim to overcome this so-called curse of dimensionality by carefully encoding structural assumptions, from...
- Hybrid Context-Fusion Attention (CFA) U-Net and Clustering for Robust Seismic Horizon Interpretation : Abstract: Interpreting seismic horizons is a critical task for characterizing subsurface structures in hydrocarbon exploration. Recent advances in deep learning, particularly U-Net-based architectures...
- Emergent Riemannian geometry over learning discrete computations on continuous manifolds : Abstract: Many tasks require mapping continuous input data (e.g. images) to discrete task outputs (e.g. class labels). Yet, how neural networks learn to perform such discrete computations on continuou...
- TIE: A Training-Inversion-Exclusion Framework for Visually Interpretable and Uncertainty-Guided Out-of-Distribution Detection : Abstract: Deep neural networks often struggle to recognize when an input lies outside their training experience, leading to unreliable and overconfident predictions. Building dependable machine learni...
- Self-Supervised Dynamical System Representations for Physiological Time-Series : Abstract: The effectiveness of self-supervised learning (SSL) for physiological time series depends on the ability of a pretraining objective to preserve information about the underlying physiological...
- SD-CGAN: Conditional Sinkhorn Divergence GAN for DDoS Anomaly Detection in IoT Networks : Abstract: The increasing complexity of IoT edge networks presents significant challenges for anomaly detection, particularly in identifying sophisticated Denial-of-Service (DoS) attacks and zero-day e...
- Scalable and Interpretable Scientific Discovery via Sparse Variational Gaussian Process Kolmogorov-Arnold Networks (SVGP KAN) : Abstract: Kolmogorov-Arnold Networks (KANs) offer a promising alternative to Multi-Layer Perceptron (MLP) by placing learnable univariate functions on network edges, enhancing interpretability. Howeve...
- Data-Driven Modeling and Correction of Vehicle Dynamics : Abstract: We develop a data-driven framework for learning and correcting non-autonomous vehicle dynamics. Physics-based vehicle models are often simplified for tractability and therefore exhibit inher...
- Challenges of Heterogeneity in Big Data: A Comparative Study of Classification in Large-Scale Structured and Unstructured Domains : Abstract: This study analyzes the impact of heterogeneity ("Variety") in Big Data by comparing classification strategies across structured (Epsilon) and unstructured (Rest-Mex, IMDB) domains. A dual m...
- Introducing AI-Driven IoT Energy Management Framework : Abstract: Power consumption has become a critical aspect of modern life due to the consistent reliance on technological advancements. Reducing power consumption or following power usage predictions ca...
- Adaptive prediction theory combining offline and online learning : Abstract: Real-world intelligence systems usually operate by combining offline learning and online adaptation with highly correlated and non-stationary system data or signals, which, however, has rare...
- Provable Memory Efficient Self-Play Algorithm for Model-free Reinforcement Learning : Abstract: The thriving field of multi-agent reinforcement learning (MARL) studies how a group of interacting agents make decisions autonomously in a shared dynamic environment. Existing theoretical st...
- Sample-Efficient Tabular Self-Play for Offline Robust Reinforcement Learning : Abstract: Multi-agent reinforcement learning (MARL), as a thriving field, explores how multiple agents independently make decisions in a shared dynamic environment. Due to environmental uncertainties,...
- Learning Causal States Under Partial Observability and Perturbation : Abstract: A critical challenge for reinforcement learning (RL) is making decisions based on incomplete and noisy observations, especially in perturbed and partially observable Markov decision processe...
- Efficient and Programmable Exploration of Synthesizable Chemical Space : Abstract: The constrained nature of synthesizable chemical space poses a significant challenge for sampling molecules that are both synthetically accessible and possess desired properties. In this wor...
- Solving Neural Min-Max Games: The Role of Architecture, Initialization & Dynamics : Abstract: Many emerging applications - such as adversarial training, AI alignment, and robust optimization - can be framed as zero-sum games between neural nets, with von Neumann-Nash equilibria (NE) ...
- TrendGNN: Towards Understanding of Epidemics, Beliefs, and Behaviors : Abstract: Epidemic outcomes have a complex interplay with human behavior and beliefs. Most of the forecasting literature has focused on the task of predicting epidemic signals using simple mechanistic...
- Privacy-Preserving Generative Modeling and Clinical Validation of Longitudinal Health Records for Chronic Disease : Abstract: Data privacy is a critical challenge in modern medical workflows as the adoption of electronic patient records has grown rapidly. Stringent data protection regulations limit access to clinic...
- Rep3Net: An Approach Exploiting Multimodal Representation for Molecular Bioactivity Prediction : Abstract: In early stage drug discovery, bioactivity prediction of molecules against target proteins plays a crucial role. Traditional QSAR models that utilize molecular descriptor based data often s...
- Hyperbolic Continuous Structural Entropy for Hierarchical Clustering : Abstract: Hierarchical clustering is a fundamental machine-learning technique for grouping data points into dendrograms. However, existing hierarchical clustering methods encounter two primary challen...
- Pushing the Boundaries of Interpretability: Incremental Enhancements to the Explainable Boosting Machine : Abstract: The widespread adoption of complex machine learning models in high-stakes domains has brought the "black-box" problem to the forefront of responsible AI research. This paper aims at addressi...
- Algorithmic Guarantees for Distilling Supervised and Offline RL Datasets : Abstract: Given a training dataset, the goal of dataset distillation is to derive a synthetic dataset such that models trained on the latter perform as well as those trained on the training dataset. I...
- DQ4FairIM: Fairness-aware Influence Maximization using Deep Reinforcement Learning : Abstract: The Influence Maximization (IM) problem aims to select a set of seed nodes within a given budget to maximize the spread of influence in a social network. However, real-world social networks ...
- A Graph Neural Network Approach for Localized and High-Resolution Temperature Forecasting : Abstract: Heatwaves are intensifying worldwide and are among the deadliest weather disasters. The burden falls disproportionately on marginalized populations and the Global South, where under-resource...
- Pre-Generating Multi-Difficulty PDE Data for Few-Shot Neural PDE Solvers : Abstract: A key aspect of learned partial differential equation (PDE) solvers is that the main cost often comes from generating training data with classical solvers rather than learning the model itse...
- Non-Asymptotic Convergence of Discrete Diffusion Models: Masked and Random Walk dynamics : Abstract: We investigate the theoretical underpinnings of Discrete Diffusion Models (DDMs) on discrete state spaces. Unlike in the continuous setting, where diffusion models are well understood both th...
- Statistical NLP for Optimization of Clinical Trial Success Prediction in Pharmaceutical R&D : Abstract: This work presents the development and evaluation of an NLP-enabled probabilistic classifier designed to estimate the probability of technical and regulatory success (pTRS) for clinical tria...
- Look as You Think: Unifying Reasoning and Visual Evidence Attribution for Verifiable Document RAG via Reinforcement Learning : Abstract: Aiming to identify precise evidence sources from visual documents, visual evidence attribution for visual document retrieval-augmented generation (VD-RAG) ensures reliable and verifiable pre...
- VICoT-Agent: A Vision-Interleaved Chain-of-Thought Framework for Interpretable Multimodal Reasoning and Scalable Remote Sensing Analysis : Abstract: The current remote sensing image analysis task is increasingly evolving from traditional object recognition to complex intelligence reasoning, which places higher requirements on the model's...
- Schema Matching on Graph: Iterative Graph Exploration for Efficient and Explainable Data Integration : Abstract: Schema matching is a critical task in data integration, particularly in the medical domain where disparate Electronic Health Record (EHR) systems must be aligned to standard models like OMOP...
- Template-assisted Contrastive Learning of Task-oriented Dialogue Sentence Embeddings : Abstract: Learning high quality sentence embeddings from dialogues has drawn increasing attention as it is essential to solve a variety of dialogue-oriented tasks with low annotation cost. Annotating...
- Can-SAVE: Deploying Low-Cost and Population-Scale Cancer Screening via Survival Analysis Variables and EHR : Abstract: Conventional medical cancer screening methods are costly, labor-intensive, and extremely difficult to scale. Although AI can improve cancer detection, most systems rely on complex or special...
- Reliable Reasoning Beyond Natural Language : Abstract: Despite their linguistic competence, Large Language Models (LLMs) often struggle to reason reliably and flexibly. To identify these shortcomings, we introduce the Non-Linear Reasoning (NLR) ...
- NeKo: Cross-Modality Post-Recognition Error Correction with Tasks-Guided Mixture-of-Experts Language Model : Abstract: Construction of a general-purpose post-recognition error corrector poses a crucial question: how can we most effectively train a model on a large mixture of domain datasets? The answer would...
- Localized Conformal Multi-Quantile Regression : Abstract: Standard conformal prediction methods guarantee marginal coverage but often produce inefficient intervals that fail to adapt to local heteroscedasticity, while recent localized approaches of...
- Adversarial Exploitation of Data Diversity Improves Visual Localization : Abstract: Visual localization, which estimates a camera's pose within a known scene, is a fundamental capability for autonomous systems. While absolute pose regression (APR) methods have shown promise...
- Blind Inverse Problem Solving Made Easy by Text-to-Image Latent Diffusion : Abstract: This paper considers blind inverse image restoration, the task of predicting a target image from a degraded source when the degradation (i.e. the forward operator) is unknown. Existing solut...
- PRIMA: Multi-Image Vision-Language Models for Reasoning Segmentation : Abstract: Despite significant advancements in Large Vision-Language Models (LVLMs)' capabilities, existing pixel-grounding models operate in single-image settings, limiting their ability to perform de...
- Reasoning-Intensive Regression : Abstract: AI researchers and practitioners increasingly apply large language models (LLMs) to what we call reasoning-intensive regression (RiR), i.e., deducing subtle numerical scores from text. Unlik...
- Robust Detection of Synthetic Tabular Data under Schema Variability : Abstract: The rise of powerful generative models has sparked concerns over data authenticity. While detection methods have been extensively developed for images and text, the case of tabular data, des...
- AHAMask: Reliable Task Specification for Large Audio Language Models without Instructions : Abstract: Although current large audio language models (LALMs) extend text large language models (LLMs) with generic acoustic understanding abilities, they usually suffer from prompt sensitivity, wher...
- ORACLE: Explaining Feature Interactions in Neural Networks with ANOVA : Abstract: We introduce ORACLE, a framework that explains neural networks on tabular and scientific design data. It fits ANOVA-style main and pairwise interaction effects to a model's prediction surfac...
- From Black Hole to Galaxy: Neural Operator Framework for Accretion and Feedback Dynamics : Abstract: Modeling how supermassive black holes co-evolve with their host galaxies is notoriously hard because the relevant physics spans nine orders of magnitude in scale, from milliparsecs to megapar...
- RoleMotion: A Large-Scale Dataset towards Robust Scene-Specific Role-Playing Motion Synthesis with Fine-grained Descriptions : Abstract: In this paper, we introduce RoleMotion, a large-scale human motion dataset that encompasses a wealth of role-playing and functional motion data tailored to fit various specific scenes. Exist...
- HalluGraph: Auditable Hallucination Detection for Legal RAG Systems via Knowledge Graph Alignment : Abstract: Legal AI systems powered by retrieval-augmented generation (RAG) face a critical accountability challenge: when an AI assistant cites case law, statutes, or contractual clauses, practitioner...
- Learning the Boundary of Solvability: Aligning LLMs to Detect Unsolvable Problems : Abstract: Ensuring LLM reliability requires not only solving complex problems but also recognizing when a problem is unsolvable. Current models often struggle to distinguish objective unsolvability (i...
- ICAD-LLM: One-for-All Anomaly Detection via In-Context Learning with Large Language Models : Abstract: Anomaly detection (AD) is a fundamental task of critical importance across numerous domains. Current systems increasingly operate in rapidly evolving environments that generate diverse yet i...
- StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos : Abstract: Streaming video understanding requires models not only to process temporally incoming frames, but also to anticipate user intention for realistic applications like AR glasses. While prior st...
- Weight Space Representation Learning with Neural Fields : Abstract: In this work, we investigate the potential of weights to serve as effective representations, focusing on neural fields. Our key insight is that constraining the optimization space through a ...
- Dual Randomized Smoothing: Beyond Global Noise Variance : Abstract: Randomized Smoothing (RS) is a prominent technique for certifying the robustness of neural networks against adversarial perturbations. With RS, achieving high accuracy at small radii require...
- Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights : Abstract: Current multimodal models aim to transcend the limitations of single-modality representations by unifying understanding and generation, often using text-to-image (T2I) tasks to calibrate sem...
- InnoGym: Benchmarking the Innovation Potential of AI Agents : Abstract: LLMs and Agents have achieved impressive progress in code generation, mathematical reasoning, and scientific discovery. However, existing benchmarks primarily measure correctness, overlookin...
- Deconstructing Generative Diversity: An Information Bottleneck Analysis of Discrete Latent Generative Models : Abstract: Generative diversity varies significantly across discrete latent generative models such as AR, MIM, and Diffusion. We propose a diagnostic framework, grounded in Information Bottleneck (IB) ...
- Mitigating Gender Bias in Depression Detection via Counterfactual Inference : Abstract: Audio-based depression detection models have demonstrated promising performance but often suffer from gender bias due to imbalanced training data. Epidemiological statistics show a higher pr...
- BHRAM-IL: A Benchmark for Hallucination Recognition and Assessment in Multiple Indian Languages : Abstract: Large language models (LLMs) are increasingly deployed in multilingual applications but often generate plausible yet incorrect or misleading outputs, known as hallucinations. While hallucina...
- Topological Order in Deep State : Abstract: Topologically ordered states are among the most interesting quantum phases of matter that host emergent quasi-particles having fractional charge and obeying fractional quantum statistics. Th...
- Cross-Lingual Interleaving for Speech Language Models : Abstract: Spoken Language Models (SLMs) aim to learn linguistic competence directly from speech using discrete units, widening access to Natural Language Processing (NLP) technologies for languages wi...
- Unifying Sign and Magnitude for Optimizing Deep Vision Networks via ThermoLion : Abstract: The training of deep vision models is fundamentally a signal recovery problem amidst high-dimensional stochastic noise. Current optimization paradigms impose a static compromise on informati...
- Exploring Human Perceptions of AI Responses: Insights from a Mixed-Methods Study on Risk Mitigation in Generative Models : Abstract: With the rapid uptake of generative AI, investigating human perceptions of generated responses has become crucial. A major challenge is their 'aptitude' for hallucinating and generating harm...
- Real-World Robot Control by Deep Active Inference With a Temporally Hierarchical World Model : Abstract: Robots in uncertain real-world environments must perform both goal-directed and exploratory actions. However, most deep learning-based control methods neglect exploration and struggle under ...
- Rectifying LLM Thought from Lens of Optimization : Abstract: Recent advancements in large language models (LLMs) have been driven by their emergent reasoning capabilities, particularly through long chain-of-thought (CoT) prompting, which enables thoro...
- SVRG and Beyond via Posterior Correction : Abstract: Stochastic Variance Reduced Gradient (SVRG) and its variants aim to speed-up training by using gradient corrections, but have seen limited success in deep learning. Here, we show surprising ...
- An Empirical Study of Agent Developer Practices in AI Agent Frameworks : Abstract: The rise of large language models (LLMs) has sparked a surge of interest in agents, leading to the rapid growth of agent frameworks. Agent frameworks are software toolkits and libraries that...
- Agentic Policy Optimization via Instruction-Policy Co-Evolution : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has advanced the reasoning capability of large language models (LLMs), enabling autonomous agents that can conduct effective multi-turn ...
- GrndCtrl: Grounding World Models via Self-Supervised Reward Alignment : Abstract: Recent advances in video world modeling have enabled large-scale generative models to simulate embodied environments with high visual fidelity, providing strong priors for prediction, planni...
- AI-Driven Optimization under Uncertainty for Mineral Processing Operations : Abstract: The global capacity for mineral processing must expand rapidly to meet the demand for critical minerals, which are essential for building the clean energy technologies necessary to mitigate ...
- Forecasting in Offline Reinforcement Learning for Non-stationary Environments : Abstract: Offline Reinforcement Learning (RL) provides a promising avenue for training policies from pre-collected datasets when gathering additional interaction data is infeasible. However, existing ...
- RoaD: Rollouts as Demonstrations for Closed-Loop Supervised Fine-Tuning of Autonomous Driving Policies : Abstract: Autonomous driving policies are typically trained via open-loop behavior cloning of human demonstrations. However, such policies suffer from covariate shift when deployed in closed loop, lea...
- Learning Sim-to-Real Humanoid Locomotion in 15 Minutes : Abstract: Massively parallel simulation has reduced reinforcement learning (RL) training time for robots from days to minutes. However, achieving fast and reliable sim-to-real RL for humanoid control ...
- Visual Sync: Multi-Camera Synchronization via Cross-View Object Motion : Abstract: Today, people can easily record memorable moments, ranging from concerts, sports events, and lectures to family gatherings and birthday parties, with multiple consumer cameras. However, synchroniz...
- A Diffusion Model Framework for Maximum Entropy Reinforcement Learning : Abstract: Diffusion models have achieved remarkable success in data-driven learning and in sampling from complex, unnormalized target distributions. Building on this progress, we reinterpret Maximum E...
- EfficientFlow: Efficient Equivariant Flow Policy Learning for Embodied AI : Abstract: Generative modeling has recently shown remarkable promise for visuomotor policy learning, enabling flexible and expressive control across diverse embodied AI tasks. However, existing generat...
- Meta-Reinforcement Learning for Building Energy Management System : Abstract: The building sector is one of the largest contributors to global energy consumption. Improving its energy efficiency is essential for reducing operational costs and greenhouse gas emissions....
- A Comprehensive Survey and Guide to Multimodal Large Language Models in Vision-Language Tasks : Abstract: This survey and application guide to multimodal large language models (MLLMs) explores the rapidly developing field of MLLMs, examining their architectures, applications, and impact on AI and...
- The Station: An Open-World Environment for AI-Driven Discovery : Abstract: We introduce the STATION, an open-world multi-agent environment for autonomous scientific discovery. The Station simulates a complete scientific ecosystem, where agents can engage in long sc...
- Improving Region Representation Learning from Urban Imagery with Noisy Long-Caption Supervision : Abstract: Region representation learning plays a pivotal role in urban computing by extracting meaningful features from unlabeled urban data. Analogous to how perceived facial age reflects an individu...
- World Model Robustness via Surprise Recognition : Abstract: AI systems deployed in the real world must contend with distractions and out-of-distribution (OOD) noise that can destabilize their policies and lead to unsafe behavior. While robust trainin...
- Mode-Conditioning Unlocks Superior Test-Time Scaling : Abstract: Parallel sampling promises substantial gains in test-time scaling, but its effectiveness is sharply limited by diversity collapse, where models concentrate on a few modes and repeated sample...
- SocialFusion: Addressing Social Degradation in Pre-trained Vision-Language Models : Abstract: Understanding social interactions from visual cues is a fundamental challenge for a socially competent AI. While powerful pre-trained vision-language models (VLMs) have shown remarkable gene...
- Open-Set Domain Adaptation Under Background Distribution Shift: Challenges and A Provably Efficient Solution : Abstract: As we deploy machine learning systems in the real world, a core challenge is to maintain a model that is performant even as the data shifts. Such shifts can take many forms: new classes may ...
- DPAC: Distribution-Preserving Adversarial Control for Diffusion Sampling : Abstract: Adversarially guided diffusion sampling often achieves the target class, but sample quality degrades as deviations between the adversarially controlled and nominal trajectories accumulate. W...
- Beyond Greenfield: AI-Driven Productivity in Documentation and Brownfield Engineering : Abstract: Brownfield engineering work involving legacy systems, incomplete documentation, and fragmented architectural knowledge poses unique challenges for the effective use of large language models ...
- 2D-ThermAl: Physics-Informed Framework for Thermal Analysis of Circuits using Generative AI : Abstract: Thermal analysis is increasingly critical in modern integrated circuits, where non-uniform power dissipation and high transistor densities can cause rapid temperature spikes and reliability ...
- Real-Time On-the-Go Annotation Framework Using YOLO for Automated Dataset Generation : Abstract: Efficient and accurate annotation of datasets remains a significant challenge for deploying object detection models such as You Only Look Once (YOLO) in real-world applications, particularly...
- A TinyML Reinforcement Learning Approach for Energy-Efficient Light Control in Low-Cost Greenhouse Systems : Abstract: This study presents a reinforcement learning (RL)-based control strategy for adaptive lighting regulation in controlled environments using a low-power microcontroller. A model-free Q-learnin...
- Data assimilation and discrepancy modeling with shallow recurrent decoders : Abstract: The requirements of modern sensing are rapidly evolving, driven by increasing demands for data efficiency, real-time processing, and deployment under limited sensing coverage. Complex physic...
- Conversion rate prediction in online advertising: modeling techniques, performance evaluation and future directions : Abstract: Conversion and conversion rate (CVR) prediction play a critical role in efficient advertising decision-making. In past decades, although researchers have developed plenty of models for CVR p...
- Toward a benchmark for CTR prediction in online advertising: datasets, evaluation protocols and perspectives : Abstract: This research designs a unified architecture of CTR prediction benchmark (Bench-CTR) platform that offers flexible interfaces with datasets and components of a wide range of CTR prediction m...
- First On-Orbit Demonstration of a Geospatial Foundation Model : Abstract: Geospatial foundation models (GeoFMs) promise broad generalisation capacity for Earth observation (EO) tasks, particularly under data-limited conditions. However, their large size poses a ba...
- TempPerturb-Eval: On the Joint Effects of Internal Temperature and External Perturbations in RAG Robustness : Abstract: The evaluation of Retrieval-Augmented Generation (RAG) systems typically examines retrieval quality and generation parameters like temperature in isolation, overlooking their interaction. Th...
- Teaching by Failure: Counter-Example-Driven Curricula for Transformer Self-Improvement : Abstract: Transformer models often exhibit brittle extrapolation, failing on inputs that are longer or structurally more complex than those seen during training. We introduce Counter-Example-Driven Cu...
- Real-World Reinforcement Learning of Active Perception Behaviors : Abstract: A robot's instantaneous sensory observations do not always reveal task-relevant state information. Under such partial observability, optimal behavior typically involves explicitly acting to ...
- Physics-Constrained Neural Dynamics: A Unified Manifold Framework for Large-Scale Power Flow Computation : Abstract: Power flow analysis is a fundamental tool for power system analysis, planning, and operational control. Traditional Newton-Raphson methods suffer from limitations such as initial value sensi...
- Pay Attention Later: From Vector Space Diffusion to Linearithmic Spectral Phase-Locking : Abstract: Standard Transformers suffer from a "Semantic Alignment Tax", a prohibitive optimization cost required to organize a chaotic initialization into a coherent geometric map via local gradient d...
- M4-BLIP: Advancing Multi-Modal Media Manipulation Detection through Face-Enhanced Local Analysis : Abstract: In the contemporary digital landscape, multi-modal media manipulation has emerged as a significant societal threat, impacting the reliability and integrity of information dissemination. Curr...
- How do trout regulate patterns of muscle contraction to optimize propulsive efficiency during steady swimming : Abstract: Understanding efficient fish locomotion offers insights for biomechanics, fluid dynamics, and engineering. Traditional studies often miss the link between neuromuscular control and whole-bod...
- Neural Network Optimal Power Flow via Energy Gradient Flow and Unified Dynamics : Abstract: Optimal Power Flow (OPF) is a core optimization problem in power system operation and planning, aiming to minimize generation costs while satisfying physical constraints such as power flow e...
- S$^2$-MLLM: Boosting Spatial Reasoning Capability of MLLMs for 3D Visual Grounding with Structural Guidance : Abstract: 3D Visual Grounding (3DVG) focuses on locating objects in 3D scenes based on natural language descriptions, serving as a fundamental task for embodied AI and robotics. Recent advances in Mul...
- LLM-as-a-Judge for Scalable Test Coverage Evaluation: Accuracy, Operational Reliability, and Cost : Abstract: Assessing software test coverage at scale remains a bottleneck in QA pipelines. We present LLM-as-a-Judge (LAJ), a production-ready, rubric-driven framework for evaluating Gherkin acceptance...
- Proactive Agentic Whiteboards: Enhancing Diagrammatic Learning : Abstract: Educators frequently rely on diagrams to explain complex concepts during lectures, yet creating clear and complete visual representations in real time while simultaneously speaking can be co...
- First, do NOHARM: towards clinically safe large language models : Abstract: Large language models (LLMs) are routinely used by physicians and patients for medical advice, yet their clinical safety profiles remain poorly characterized. We present NOHARM (Numerous Opt...
- Generative Adversarial Gumbel MCTS for Abstract Visual Composition Generation : Abstract: We study abstract visual composition, in which identity is primarily determined by the spatial configuration and relations among a small set of geometric primitives (e.g., parts, symmetry, t...
- Pascal-Weighted Genetic Algorithms: A Binomially-Structured Recombination Framework : Abstract: This paper introduces a new family of multi-parent recombination operators for Genetic Algorithms (GAs), based on normalized Pascal (binomial) coefficients. Unlike classical two-parent cross...
- Sentiment Analysis and Emotion Classification using Machine Learning Techniques for Nagamese Language - A Low-resource Language : Abstract: The Nagamese language, a.k.a. Naga Pidgin, is an Assamese-lexified creole language developed primarily as a means of communication in trade between the people from Nagaland and people from As...
- Social Media Data Mining of Human Behaviour during Bushfire Evacuation : Abstract: Traditional data sources on bushfire evacuation behaviour, such as quantitative surveys and manual observations, have severe limitations. Mining social media data related to bushfire evacuati...
- SUPERChem: A Multimodal Reasoning Benchmark in Chemistry : Abstract: Current benchmarks for evaluating the chemical reasoning capabilities of Large Language Models (LLMs) are limited by oversimplified tasks, lack of process-level evaluation, and misalignment ...
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding : Abstract: Reasoning language models have demonstrated remarkable capabilities on challenging tasks by generating elaborate chain-of-thought (CoT) solutions. However, such lengthy generation shifts the...
- Kardia-R1: Unleashing LLMs to Reason toward Understanding and Empathy for Emotional Support via Rubric-as-Judge Reinforcement Learning : Abstract: As web platforms evolve towards greater personalization and emotional complexity, conversational agents must transcend superficial empathy to demonstrate identity-aware emotional reasoning. ...
- Generative Modeling with Continuous Flows: Sample Complexity of Flow Matching : Abstract: Flow matching has recently emerged as a promising alternative to diffusion-based generative models, offering faster sampling and simpler training by learning continuous flows governed by ord...
- Diffusion Model in Latent Space for Medical Image Segmentation Task : Abstract: Medical image segmentation is crucial for clinical diagnosis and treatment planning. Traditional methods typically produce a single segmentation mask, failing to capture inherent uncertainty...
- FOD-S2R: A FOD Dataset for Sim2Real Transfer Learning based Object Detection : Abstract: Foreign Object Debris (FOD) within aircraft fuel tanks presents critical safety hazards including fuel contamination, system malfunctions, and increased maintenance costs. Despite the severi...
- Agreement-Constrained Probabilistic Minimum Bayes Risk Decoding : Abstract: Minimum Bayes risk (MBR) decoding generates high-quality translations by maximizing the expected utility of output candidates, but it evaluates all pairwise scores over the candidate set; he...
- Data-Driven Learnability Transition of Measurement-Induced Entanglement : Abstract: Measurement-induced entanglement (MIE) captures how local measurements generate long-range quantum correlations and drive dynamical phase transitions in many-body systems. Yet estimating MIE...
- Rethinking Intracranial Aneurysm Vessel Segmentation: A Perspective from Computational Fluid Dynamics Applications : Abstract: The precise segmentation of intracranial aneurysms and their parent vessels (IA-Vessel) is a critical step for hemodynamic analyses, which mainly depends on computational fluid dynamics (CFD...
- EmoRAG: Evaluating RAG Robustness to Symbolic Perturbations : Abstract: Retrieval-Augmented Generation (RAG) systems are increasingly central to robust AI, enhancing large language model (LLM) faithfulness by incorporating external knowledge. However, our study ...
- Intrinsic Structure as a Proxy for Saliency: SVD-Based Weight Preservation for Mixed-Precision Quantization in Large Language Models : Abstract: As Large Language Models (LLMs) continue to scale in parameter count, deploying them on commodity hardware has become increasingly challenging. Post-Training Quantization (PTQ) addresses thi...
- Tangram: Accelerating Serverless LLM Loading through GPU Memory Reuse and Affinity : Abstract: Serverless Large Language Models (LLMs) have emerged as a cost-effective solution for deploying AI services by enabling a 'pay-as-you-go' pricing model through GPU resource sharing. However,...
- Beyond Loss Guidance: Using PDE Residuals as Spectral Attention in Diffusion Neural Operators : Abstract: Diffusion-based solvers for partial differential equations (PDEs) are often bottlenecked by slow gradient-based test-time optimization routines that use PDE residuals for loss guidance. The...
- Structured Spectral Reasoning for Frequency-Adaptive Multimodal Recommendation : Abstract: Multimodal recommendation aims to integrate collaborative signals with heterogeneous content such as visual and textual information, but remains challenged by modality-specific noise, semant...
- Stabilizing Reinforcement Learning with LLMs: Formulation and Practices : Abstract: This paper proposes a novel formulation for reinforcement learning (RL) with large language models, explaining why and under what conditions the true sequence-level reward can be optimized v...
- Consistency Flow Model Achieves One-step Denoising Error Correction Codes : Abstract: Error Correction Codes (ECC) are fundamental to reliable digital communication, yet designing neural decoders that are both accurate and computationally efficient remains challenging. Recent...
- A Self-explainable Model of Long Time Series by Extracting Informative Structured Causal Patterns : Abstract: Explainability is essential for neural networks that model long time series, yet most existing explainable AI methods only produce point-wise importance scores and fail to capture temporal s...
- Rice-VL: Evaluating Vision-Language Models for Cultural Understanding Across ASEAN Countries : Abstract: Vision-Language Models (VLMs) excel in multimodal tasks but often exhibit Western-centric biases, limiting their effectiveness in culturally diverse regions like Southeast Asia (SEA). To add...
- PromptBridge: Cross-Model Prompt Transfer for Large Language Models : Abstract: Large language models (LLMs) underpin applications in code generation, mathematical reasoning, and agent-based workflows. In practice, systems access LLMs via commercial APIs or open-source ...
- PSA-MF: Personality-Sentiment Aligned Multi-Level Fusion for Multimodal Sentiment Analysis : Abstract: Multimodal sentiment analysis (MSA) is a research field that recognizes human sentiments by combining textual, visual, and audio modalities. The main challenge lies in integrating sentiment-...
- ZIP-RC: Zero-overhead Inference-time Prediction of Reward and Cost for Adaptive and Interpretable Generation : Abstract: Large language models excel at reasoning but lack key aspects of introspection, including anticipating their own success and the computation required to achieve it. Humans use real-time intr...
- Does Flatness imply Generalization for Logistic Loss in Univariate Two-Layer ReLU Network? : Abstract: We consider the problem of generalization of arbitrarily overparameterized two-layer ReLU Neural Networks with univariate input. Recent work showed that under square loss, flat solutions (mo...
- Multi-view diffusion geometry using intertwined diffusion trajectories : Abstract: This paper introduces a comprehensive unified framework for constructing multi-view diffusion geometries through intertwined multi-view diffusion trajectories (MDTs), a class of inhomogeneou...
- Formal Verification of Noisy Quantum Reinforcement Learning Policies : Abstract: Quantum reinforcement learning (QRL) aims to use quantum effects to create sequential decision-making policies that achieve tasks more effectively than their classical counterparts. However,...
- Teaching an Online Multi-Institutional Research Level Software Engineering Course with Industry - an Experience Report : Abstract: Covid has made online teaching and learning acceptable and students, faculty, and industry professionals are all comfortable with this mode. This comfort can be leveraged to offer an online ...
- Diffusion Fuzzy System: Fuzzy Rule Guided Latent Multi-Path Diffusion Modeling : Abstract: Diffusion models have emerged as a leading technique for generating images due to their ability to create high-resolution and realistic images. Despite their strong performance, diffusion mo...
- Deep Unsupervised Anomaly Detection in Brain Imaging: Large-Scale Benchmarking and Bias Analysis : Abstract: Deep unsupervised anomaly detection in brain magnetic resonance imaging offers a promising route to identify pathological deviations without requiring lesion-specific annotations. Yet, fragm...
- Q2D2: A Geometry-Aware Audio Codec Leveraging Two-Dimensional Quantization : Abstract: Recent neural audio codecs have achieved impressive reconstruction quality, typically relying on quantization methods such as Residual Vector Quantization (RVQ), Vector Quantization (VQ) and...
- LPCD: Unified Framework from Layer-Wise to Submodule Quantization : Abstract: Post-training quantization (PTQ) aims to preserve model-level behavior; however, most methods focus on individual linear layers. Even recent extensions, such as QEP and LoaQ, which mitigate ...
- Delta Sum Learning: an approach for fast and global convergence in Gossip Learning : Abstract: Federated Learning is a popular approach for distributed learning due to its security and computational benefits. With the advent of powerful devices in the network edge, Gossip Learning fur...
- MasHeNe: A Benchmark for Head and Neck CT Mass Segmentation using Window-Enhanced Mamba with Frequency-Domain Integration : Abstract: Head and neck masses are space-occupying lesions that can compress the airway and esophagus and may affect nerves and blood vessels. Available public datasets primarily focus on malignant le...
- Deep FlexQP: Accelerated Nonlinear Programming via Deep Unfolding : Abstract: We propose an always-feasible quadratic programming (QP) optimizer, FlexQP, which is based on an exact relaxation of the QP constraints. If the original constraints are feasible, then the op...
- Reconstructing Multi-Scale Physical Fields from Extremely Sparse Measurements with an Autoencoder-Diffusion Cascade : Abstract: Reconstructing full fields from extremely sparse and random measurements is a longstanding ill-posed inverse problem. A powerful framework for addressing such challenges is hierarchical prob...
- Neuroscience-Inspired Memory Replay for Continual Learning: A Comparative Study of Predictive Coding and Backpropagation-Based Strategies : Abstract: Continual learning remains a fundamental challenge in artificial intelligence, with catastrophic forgetting posing a significant barrier to deploying neural networks in dynamic environments....
- Melody or Machine: Detecting Synthetic Music with Dual-Stream Contrastive Learning : Abstract: The rapid evolution of end-to-end AI music generation poses an escalating threat to artistic authenticity and copyright, demanding detection methods that can keep pace. While foundational, e...
- Automatic Pith Detection in Tree Cross-Section Images Using Deep Learning : Abstract: Pith detection in tree cross-sections is essential for forestry and wood quality analysis but remains a manual, error-prone task. This study evaluates deep learning models -- YOLOv9, U-Net, ...
- XAI-Driven Skin Disease Classification: Leveraging GANs to Augment ResNet-50 Performance : Abstract: Accurate and timely diagnosis of multi-class skin lesions is hampered by subjective methods, inherent data imbalance in datasets like HAM10000, and the "black box" nature of Deep Learning (D...
- Doppler-Enhanced Deep Learning: Improving Thyroid Nodule Segmentation with YOLOv5 Instance Segmentation : Abstract: The increasing prevalence of thyroid cancer globally has led to the development of various computer-aided detection methods. Accurate segmentation of thyroid nodules is a critical first step...
- Graph-Attention Network with Adversarial Domain Alignment for Robust Cross-Domain Facial Expression Recognition : Abstract: Cross-domain facial expression recognition (CD-FER) remains difficult due to severe domain shift between training and deployment data. We propose Graph-Attention Network with Adversarial Dom...
- MambaScope: Coarse-to-Fine Scoping for Efficient Vision Mamba : Abstract: Vision Mamba has emerged as a promising and efficient alternative to Vision Transformers, yet its efficiency remains fundamentally constrained by the number of input tokens. Existing token r...
- ML-Tool-Bench: Tool-Augmented Planning for ML Tasks : Abstract: The development of autonomous machine learning (ML) agents capable of end-to-end data science workflows represents a significant frontier in artificial intelligence. These agents must orches...
- Dynamic-eDiTor: Training-Free Text-Driven 4D Scene Editing with Multimodal Diffusion Transformer : Abstract: Recent progress in 4D representations, such as Dynamic NeRF and 4D Gaussian Splatting (4DGS), has enabled dynamic 4D scene reconstruction. However, text-driven 4D scene editing remains under...
- Hierarchical Molecular Language Models (HMLMs) : Abstract: Cellular signaling networks represent complex information processing systems that have been modeled via traditional mathematical or statistical approaches. However, these methods often strug...
- Optimizing LVLMs with On-Policy Data for Effective Hallucination Mitigation : Abstract: Recently, large vision-language models (LVLMs) have risen to be a promising approach for multimodal tasks. However, principled hallucination mitigation remains a critical challenge. In this w...
- Concept-Guided Backdoor Attack on Vision Language Models : Abstract: Vision-Language Models (VLMs) have achieved impressive progress in multimodal text generation, yet their rapid adoption raises increasing concerns about security vulnerabilities. Existing ba...
- Deep Learning-Based Computer Vision Models for Early Cancer Detection Using Multimodal Medical Imaging and Radiogenomic Integration Frameworks : Abstract: Early cancer detection remains one of the most critical challenges in modern healthcare, where delayed diagnosis significantly reduces survival outcomes. Recent advancements in artificial in...
- Graph Data Augmentation with Contrastive Learning on Covariate Distribution Shift : Abstract: Covariate distribution shift occurs when certain structural features present in the test set are absent from the training set. It is a common type of out-of-distribution (OOD) problem, frequ...
- Deep Learning for Modeling and Dispatching Hybrid Wind Farm Power Generation : Abstract: Wind farms with integrated energy storage, or hybrid wind farms, are able to store energy and dispatch it to the grid following an operational strategy. For individual wind farms with integr...
- REM: Evaluating LLM Embodied Spatial Reasoning through Multi-Frame Trajectories : Abstract: Humans build viewpoint-independent cognitive maps through navigation, enabling intuitive reasoning about object permanence and spatial relations. We argue that multimodal large language mode...
- Orchestrating Rewards in the Era of Intelligence-Driven Commerce : Abstract: Despite their evolution from early copper-token schemes to sophisticated digital solutions, loyalty programs remain predominantly closed ecosystems, with brands retaining full control over a...
- MASCOT: Analyzing Malware Evolution Through A Well-Curated Source Code Dataset : Abstract: In recent years, the explosion of malware and extensive code reuse have formed complex evolutionary connections among malware specimens. The rapid pace of development makes it challenging fo...
- On the Regulatory Potential of User Interfaces for AI Agent Governance : Abstract: AI agents that take actions in their environment autonomously over extended time horizons require robust governance interventions to curb their potentially consequential risks. Prior proposa...
- Probabilistic Modeling of Multi-rater Medical Image Segmentation for Diversity and Personalization : Abstract: Medical image segmentation is inherently influenced by data uncertainty, arising from ambiguous boundaries in medical scans and inter-observer variability in diagnosis. To address this chall...
- Preventing Model Collapse via Contraction-Conditioned Neural Filters : Abstract: This paper presents a neural network filter method based on contraction operators to address model collapse in recursive training of generative models. Unlike \cite{xu2024probabilistic}, whi...
- Provable Benefit of Sign Descent: A Minimal Model Under Heavy-Tailed Class Imbalance : Abstract: Adaptive optimization methods (such as Adam) play a major role in LLM pretraining, significantly outperforming Gradient Descent (GD). Recent studies have proposed new smoothness assumptions ...
- EAG3R: Event-Augmented 3D Geometry Estimation for Dynamic and Extreme-Lighting Scenes : Abstract: Robust 3D geometry estimation from videos is critical for applications such as autonomous navigation, SLAM, and 3D scene reconstruction. Recent methods like DUSt3R demonstrate that regressin...
- SHRAG: A Framework for Combining Human-Inspired Search with RAG : Abstract: Retrieval-Augmented Generation (RAG) is gaining recognition as one of the key technological axes for next generation information retrieval, owing to its ability to mitigate the hallucination...
- Limitations of Using Identical Distributions for Training and Testing When Learning Boolean Functions : Abstract: When the distributions of the training and test data do not coincide, the problem of understanding generalization becomes considerably more complex, prompting a variety of questions. In this...
- Bias Injection Attacks on RAG Databases and Sanitization Defenses : Abstract: This paper explores attacks and defenses on vector databases in retrieval-augmented generation (RAG) systems. Prior work on knowledge poisoning attacks primarily inject false or toxic conten...
- Causal Invariance and Counterfactual Learning Driven Cooperative Game for Multi-Label Classification : Abstract: Multi-label classification (MLC) remains vulnerable to label imbalance, spurious correlations, and distribution shifts, challenges that are particularly detrimental to rare label prediction....
- Accelerating Bangla NLP Tasks with Automatic Mixed Precision: Resource-Efficient Training Preserving Model Efficacy : Abstract: Training models for Natural Language Processing (NLP) requires substantial computational resources and time, posing significant challenges, especially for NLP development in Bangla, where ac...
- Topological Federated Clustering via Gravitational Potential Fields under Local Differential Privacy : Abstract: Clustering non-independent and identically distributed (non-IID) data under local differential privacy (LDP) in federated settings presents a critical challenge: preserving privacy while mai...
- HBLLM: Wavelet-Enhanced High-Fidelity 1-Bit Quantization for LLMs : Abstract: We introduce HBLLM, a wavelet-enhanced high-fidelity $1$-bit post-training quantization method for Large Language Models (LLMs). By leveraging Haar wavelet transforms to enhance expressive c...
- TAP-CT: 3D Task-Agnostic Pretraining of Computed Tomography Foundation Models : Abstract: Existing foundation models (FMs) in the medical domain often require extensive fine-tuning or rely on training resource-intensive decoders, while many existing encoders are pretrained with o...
- Less is More: Resource-Efficient Low-Rank Adaptation : Abstract: Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient fine-tuning (PEFT) method for Large Language Models (LLMs), but it still incurs notable overhead and suffers from parameter...
- Look, Recite, Then Answer: Enhancing VLM Performance via Self-Generated Knowledge Hints : Abstract: Vision-Language Models (VLMs) exhibit significant performance plateaus in specialized domains like precision agriculture, primarily due to "Reasoning-Driven Hallucination" where linguistic p...
- Light-Weight Benchmarks Reveal the Hidden Hardware Cost of Zero-Shot Tabular Foundation Models : Abstract: Zero-shot foundation models (FMs) promise training-free prediction on tabular data, yet their hardware footprint remains poorly characterized. We present a fully reproducible benchmark that ...
- Beyond High-Entropy Exploration: Correctness-Aware Low-Entropy Segment-Based Advantage Shaping for Reasoning LLMs : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has become a central approach for improving the reasoning ability of large language models. Recent work studies RLVR through token entro...
- ForamDeepSlice: A High-Accuracy Deep Learning Framework for Foraminifera Species Classification from 2D Micro-CT Slices : Abstract: This study presents a comprehensive deep learning pipeline for the automated classification of 12 foraminifera species using 2D micro-CT slices derived from 3D scans. We curated a scientific...
- Mitigating Hallucinations in Zero-Shot Scientific Summarisation: A Pilot Study : Abstract: Large language models (LLMs) produce context inconsistency hallucinations, which are LLM generated outputs that are misaligned with the user prompt. This research project investigates whethe...
- DeformAr: Rethinking NER Evaluation through Component Analysis and Visual Analytics : Abstract: Transformer models have significantly advanced Natural Language Processing (NLP), demonstrating strong performance in English. However, their effectiveness in Arabic, particularly for Named ...
- Constant-Time Motion Planning with Manipulation Behaviors : Abstract: Recent progress in contact-rich robotic manipulation has been striking, yet most deployed systems remain confined to simple, scripted routines. One of the key barriers is the lack of motion ...
- Fine-tuning of lightweight large language models for sentiment classification on heterogeneous financial textual data : Abstract: Large language models (LLMs) play an increasingly important role in financial markets analysis by capturing signals from complex and heterogeneous textual data sources, such as tweets, new...
- Table as a Modality for Large Language Models : Abstract: To migrate the remarkable successes of Large Language Models (LLMs), the community has made numerous efforts to generalize them to the table reasoning tasks for the widely deployed tabular d...
- Multi-Modal AI for Remote Patient Monitoring in Cancer Care : Abstract: For patients undergoing systemic cancer therapy, the time between clinic visits is full of uncertainties and risks of unmonitored side effects. To bridge this gap in care, we developed and p...
- Optimizing Generative Ranking Relevance via Reinforcement Learning in Xiaohongshu Search : Abstract: Ranking relevance is a fundamental task in search engines, aiming to identify the items most relevant to a given user query. Traditional relevance models typically produce scalar scores or d...
- An Approach to Variable Clustering: K-means in Transposed Data and its Relationship with Principal Component Analysis : Abstract: Principal Component Analysis (PCA) and K-means constitute fundamental techniques in multivariate analysis. Although they are frequently applied independently or sequentially to cluster obser...
- Provenance-Driven Reliable Semantic Medical Image Vector Reconstruction via Lightweight Blockchain-Verified Latent Fingerprints : Abstract: Medical imaging is essential for clinical diagnosis, yet real-world data frequently suffers from corruption, noise, and potential tampering, challenging the reliability of AI-assisted interp...
- Chain of Unit-Physics: A Primitive-Centric Approach to Scientific Code Synthesis : Abstract: Agentic large language models are proposed as autonomous code generators for scientific computing, yet their reliability in high-stakes problems remains unclear. Developing computational sci...
- Operator-Theoretic Framework for Gradient-Free Federated Learning : Abstract: Federated learning must address heterogeneity, strict communication and computation limits, and privacy while ensuring performance. We propose an operator-theoretic framework that maps the $...
- VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference : Abstract: Vision-Language-Action models (VLAs) are becoming increasingly capable across diverse robotic tasks. However, their real-world deployment remains slow and inefficient: demonstration videos a...
- AltNet: Addressing the Plasticity-Stability Dilemma in Reinforcement Learning : Abstract: Neural networks have shown remarkable success in supervised learning when trained on a single task using a fixed dataset. However, when neural networks are trained on a reinforcement learnin...
- Goal-Oriented Multi-Agent Semantic Networking: Unifying Intents, Semantics, and Intelligence : Abstract: 6G services are evolving toward goal-oriented and AI-native communication, which are expected to deliver transformative societal benefits across various industries and promote energy sustain...
- When Safety Blocks Sense: Measuring Semantic Confusion in LLM Refusals : Abstract: Safety-aligned language models often refuse prompts that are actually harmless. Current evaluations mostly report global rates such as false rejection or compliance. These scores treat each ...
- FMTK: A Modular Toolkit for Composable Time Series Foundation Model Pipelines : Abstract: Foundation models (FMs) have opened new avenues for machine learning applications due to their ability to adapt to new and unseen tasks with minimal or no further training. Time-series found...
- Adaptive-lambda Subtracted Importance Sampled Scores in Machine Unlearning for DDPMs and VAEs : Abstract: Machine Unlearning is essential for large generative models (VAEs, DDPMs) to comply with the right to be forgotten and prevent undesired content generation without costly retraining. Existin...
- Parameter Reduction Improves Vision Transformers: A Comparative Study of Sharing and Width Reduction : Abstract: Although scaling laws and many empirical results suggest that increasing the size of Vision Transformers often improves performance, model accuracy and training behavior are not always monot...
- PIANO: Physics-informed Dual Neural Operator for Precipitation Nowcasting : Abstract: Precipitation nowcasting, key for early warning of disasters, currently relies on computationally expensive and restrictive methods that limit access to many countries. To overcome this chal...
- On The Finetuning of MLIPs Through the Lens of Iterated Maps With BPTT : Abstract: Vital to the creation of advanced materials is performing structural relaxations. Traditional approaches built on physics-derived first-principles calculations are computationally expensive,...
- CycliST: A Video Language Model Benchmark for Reasoning on Cyclical State Transitions : Abstract: We present CycliST, a novel benchmark dataset designed to evaluate Video Language Models (VLM) on their ability for textual reasoning over cyclical state transitions. CycliST captures fundam...
- Discriminative classification with generative features: bridging Naive Bayes and logistic regression : Abstract: We introduce Smart Bayes, a new classification framework that bridges generative and discriminative modeling by integrating likelihood-ratio-based generative features into a logistic-regress...
- Supporting Productivity Skill Development in College Students through Social Robot Coaching: A Proof-of-Concept : Abstract: College students often face academic challenges that hamper their productivity and well-being. Although self-help books and productivity apps are popular, they often fall short. Books provid...
- Efficiently Learning Branching Networks for Multitask Algorithmic Reasoning : Abstract: Algorithmic reasoning -- the ability to perform step-by-step logical inference -- has become a core benchmark for evaluating reasoning in graph neural networks (GNNs) and large language mode...
- Comparative Evaluation of Generative AI Models for Chest Radiograph Report Generation in the Emergency Department : Abstract: Purpose: To benchmark open-source or commercial medical image-specific VLMs against real-world radiologist-written reports. Methods: This retrospective study included adult patients who pres...
- Teleportation-Based Defenses for Privacy in Approximate Machine Unlearning : Abstract: Approximate machine unlearning aims to efficiently remove the influence of specific data points from a trained model, offering a practical alternative to full retraining. However, it introdu...
- HIMOSA: Efficient Remote Sensing Image Super-Resolution with Hierarchical Mixture of Sparse Attention : Abstract: In remote sensing applications, such as disaster detection and response, real-time efficiency and model lightweighting are of critical importance. Consequently, existing remote sensing image...
- BioArc: Discovering Optimal Neural Architectures for Biological Foundation Models : Abstract: Foundation models have revolutionized various fields such as natural language processing (NLP) and computer vision (CV). While efforts have been made to transfer the success of the foundatio...
- RealAppliance: Let High-fidelity Appliance Assets Controllable and Workable as Aligned Real Manuals : Abstract: Existing appliance assets suffer from poor rendering, incomplete mechanisms, and misalignment with manuals, leading to simulation-reality gaps that hinder appliance manipulation development....
- EduEval: A Hierarchical Cognitive Benchmark for Evaluating Large Language Models in Chinese Education : Abstract: Large language models (LLMs) demonstrate significant potential for educational applications. However, their unscrutinized deployment poses risks to educational standards, underscoring the ne...
- FiCoTS: Fine-to-Coarse LLM-Enhanced Hierarchical Cross-Modality Interaction for Time Series Forecasting : Abstract: Time series forecasting is central to data analysis and web technologies. The recent success of Large Language Models (LLMs) offers significant potential for this field, especially from the ...
- Words into World: A Task-Adaptive Agent for Language-Guided Spatial Retrieval in AR : Abstract: Traditional augmented reality (AR) systems predominantly rely on fixed class detectors or fiducial markers, limiting their ability to interpret complex, open-vocabulary natural language quer...
- Gradient Inversion in Federated Reinforcement Learning : Abstract: Federated reinforcement learning (FRL) enables distributed learning of optimal policies while preserving local data privacy through gradient sharing. However, FRL faces the risk of data priva...
- VCWorld: A Biological World Model for Virtual Cell Simulation : Abstract: Virtual cell modeling aims to predict cellular responses to perturbations. Existing virtual cell models rely heavily on large-scale single-cell datasets, learning explicit mappings between g...
- Adversarial Signed Graph Learning with Differential Privacy : Abstract: Signed graphs with positive and negative edges can model complex relationships in social networks. Leveraging on balance theory that deduces edge signs from multi-hop node pairs, signed grap...
- Tracing Mathematical Proficiency Through Problem-Solving Processes : Abstract: Knowledge Tracing (KT) aims to model student's knowledge state and predict future performance to enable personalized learning in Intelligent Tutoring Systems. However, traditional KT methods...
- Comparative Analysis of 47 Context-Based Question Answer Models Across 8 Diverse Datasets : Abstract: Context-based question answering (CBQA) models provide more accurate and relevant answers by considering the contextual information. They effectively extract specific information given a con...
- Progressive Code Integration for Abstractive Bug Report Summarization : Abstract: Bug reports are often unstructured and verbose, making it challenging for developers to efficiently comprehend software issues. Existing summarization approaches typically rely on surface-le...
- Evidence-Guided Schema Normalization for Temporal Tabular Reasoning : Abstract: Temporal reasoning over evolving semi-structured tables poses a challenge to current QA systems. We propose a SQL-based approach that involves (1) generating a 3NF schema from Wikipedia info...
- Assertion-Conditioned Compliance: A Provenance-Aware Vulnerability in Multi-Turn Tool-Calling Agents : Abstract: Multi-turn tool-calling LLMs (models capable of invoking external APIs or tools across several user turns) have emerged as a key feature in modern AI assistants, enabling extended dialogues ...
- MedCondDiff: Lightweight, Robust, Semantically Guided Diffusion for Medical Image Segmentation : Abstract: We introduce MedCondDiff, a diffusion-based framework for multi-organ medical image segmentation that is efficient and anatomically grounded. The model conditions the denoising process on se...
- Towards aligned body representations in vision models : Abstract: Human physical reasoning relies on internal "body" representations - coarse, volumetric approximations that capture an object's extent and support intuitive predictions about motion and phys...
- S^2-KD: Semantic-Spectral Knowledge Distillation Spatiotemporal Forecasting : Abstract: Spatiotemporal forecasting often relies on computationally intensive models to capture complex dynamics. Knowledge distillation (KD) has emerged as a key technique for creating lightweight s...
- Evaluating LLMs in Open-Source Games : Abstract: Large Language Models' (LLMs) programming capabilities enable their participation in open-source games: a game-theoretic setting in which players submit computer programs in lieu of actions....
- Layer Probing Improves Kinase Functional Prediction with Protein Language Models : Abstract: Protein language models (PLMs) have transformed sequence-based protein analysis, yet most applications rely only on final-layer embeddings, which may overlook biologically meaningful informa...
- An Empirical Study on the Effectiveness of Incorporating Offline RL As Online RL Subroutines : Abstract: We take the novel perspective of incorporating offline RL algorithms as subroutines of tabula rasa online RL. This is feasible because an online learning agent can repurpose its historical i...
- From Coefficients to Directions: Rethinking Model Merging with Directional Alignment : Abstract: Model merging has emerged as a practical paradigm for integrating multiple independently trained models into a single model without joint retraining. Previous studies have demonstrated the e...
- Time-Series at the Edge: Tiny Separable CNNs for Wearable Gait Detection and Optimal Sensor Placement : Abstract: We study on-device time-series analysis for gait detection in Parkinson's disease (PD) from short windows of triaxial acceleration, targeting resource-constrained wearables and edge nodes. W...
- SelfAI: Building a Self-Training AI System with LLM Agents : Abstract: Recent work on autonomous scientific discovery has leveraged LLM-based agents to integrate problem specification, experiment planning, and execution into end-to-end systems. However, these f...
- Low-Bitrate Video Compression through Semantic-Conditioned Diffusion : Abstract: Traditional video codecs optimized for pixel fidelity collapse at ultra-low bitrates and produce severe artifacts. This failure arises from a fundamental misalignment between pixel accuracy ...
- Balancing Efficiency and Fairness: An Iterative Exchange Framework for Multi-UAV Cooperative Path Planning : Abstract: Multi-UAV cooperative path planning (MUCPP) is a fundamental problem in multi-agent systems, aiming to generate collision-free trajectories for a team of unmanned aerial vehicles (UAVs) to c...
- Red Teaming Large Reasoning Models : Abstract: Large Reasoning Models (LRMs) have emerged as a powerful advancement in multi-step reasoning tasks, offering enhanced transparency and logical consistency through explicit chains of thought ...
- Significant Other AI: Identity, Memory, and Emotional Regulation as Long-Term Relational Intelligence : Abstract: Significant Others (SOs) stabilize identity, regulate emotion, and support narrative meaning-making, yet many people today lack access to such relational anchors. Recent advances in large la...
- An Approach to Joint Hybrid Decision Making between Humans and Artificial Intelligence : Abstract: Due to the progress in artificial intelligence, it is important to understand how capable artificial agents should be used when interacting with humans, since high level authority and respon...
- FR-TTS: Test-Time Scaling for NTP-based Image Generation with Effective Filling-based Reward Signal : Abstract: Test-time scaling (TTS) has become a prevalent technique in image generation, significantly boosting output quality by expanding the number of parallel samples and filtering them using pre-t...
- PEOAT: Personalization-Guided Evolutionary Question Assembly for One-Shot Adaptive Testing : Abstract: With the rapid advancement of intelligent education, Computerized Adaptive Testing (CAT) has attracted increasing attention by integrating educational psychology with deep learning technolog...
- RecruitView: A Multimodal Dataset for Predicting Personality and Interview Performance for Human Resources Applications : Abstract: Automated personality and soft skill assessment from multimodal behavioral data remains challenging due to limited datasets and methods that fail to capture geometric structure inherent in h...
- Sample-Efficient Expert Query Control in Active Imitation Learning via Conformal Prediction : Abstract: Active imitation learning (AIL) combats covariate shift by querying an expert during training. However, expert action labeling often dominates the cost, especially in GPU-intensive simulator...
- CausalAffect: Causal Discovery for Facial Affective Understanding : Abstract: Understanding human affect from facial behavior requires not only accurate recognition but also structured reasoning over the latent dependencies that drive muscle activations and their expr...
- SCALE: Selective Resource Allocation for Overcoming Performance Bottlenecks in Mathematical Test-time Scaling : Abstract: Test-time compute scaling has emerged as a powerful paradigm for enhancing mathematical reasoning in large language models (LLMs) by allocating additional computational resources during infe...
- FairMT: Fairness for Heterogeneous Multi-Task Learning : Abstract: Fairness in machine learning has been extensively studied in single-task settings, while fair multi-task learning (MTL), especially with heterogeneous tasks (classification, detection, regre...
- RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards : Abstract: With the continuous advancement of image generation technology, advanced models such as GPT-Image-1 and Qwen-Image have achieved remarkable text-to-image consistency and world knowledge. Howe...
- CACARA: Cross-Modal Alignment Leveraging a Text-Centric Approach for Cost-Effective Multimodal and Multilingual Learning : Abstract: As deep learning models evolve, new applications and challenges are rapidly emerging. Tasks that once relied on a single modality, such as text, images, or audio, are now enriched by seamles...
- ESPO: Entropy Importance Sampling Policy Optimization : Abstract: Large language model (LLM) reinforcement learning has increasingly relied on group-based policy optimization frameworks, such as GRPO and GSPO, to achieve stable fine-tuning at scale. Howeve...
- G-KV: Decoding-Time KV Cache Eviction with Global Attention : Abstract: Recent reasoning large language models (LLMs) excel in complex tasks but encounter significant computational and memory challenges due to long sequence lengths. KV cache compression has emer...
- List Replicable Reinforcement Learning : Abstract: Replicability is a fundamental challenge in reinforcement learning (RL), as RL algorithms are empirically observed to be unstable and sensitive to variations in training conditions. To forma...
- Explainable Multi-Modal Deep Learning for Automatic Detection of Lung Diseases from Respiratory Audio Signals : Abstract: Respiratory diseases remain major global health challenges, and traditional auscultation is often limited by subjectivity, environmental noise, and inter-clinician variability. This study pr...
- Describe Anything Anywhere At Any Moment : Abstract: Computer vision and robotics applications ranging from augmented reality to robot autonomy in large-scale environments require spatio-temporal memory frameworks that capture both geometric s...
- Enhancing Analogy-Based Software Effort Estimation with Firefly Algorithm Optimization : Abstract: Analogy-Based Estimation (ABE) is a popular method for non-algorithmic estimation due to its simplicity and effectiveness. The Analogy-Based Estimation (ABE) model was proposed by researcher...
- Integrating Skeleton Based Representations for Robust Yoga Pose Classification Using Deep Learning Models : Abstract: Yoga is a popular form of exercise worldwide due to its spiritual and physical health benefits, but incorrect postures can lead to injuries. Automated yoga pose classification has therefore ...
- Slovak Conceptual Dictionary : Abstract: When solving tasks in the field of natural language processing, we sometimes need dictionary tools, such as lexicons, word form dictionaries or knowledge bases. However, the availability of ...
- Wikontic: Constructing Wikidata-Aligned, Ontology-Aware Knowledge Graphs with Large Language Models : Abstract: Knowledge graphs (KGs) provide structured, verifiable grounding for large language models (LLMs), but current LLM-based systems commonly use KGs as auxiliary structures for text retrieval, l...
- IslandRun: Privacy-Aware Multi-Objective Orchestration for Distributed AI Inference : Abstract: Modern AI inference faces an irreducible tension: no single computational resource simultaneously maximizes performance, preserves privacy, minimizes cost, and maintains trust. Existing orch...
- DLRREC: Denoising Latent Representations via Multi-Modal Knowledge Fusion in Deep Recommender Systems : Abstract: Modern recommender systems struggle to effectively utilize the rich, yet high-dimensional and noisy, multi-modal features generated by Large Language Models (LLMs). Treating these features a...
- Developing Fairness-Aware Task Decomposition to Improve Equity in Post-Spinal Fusion Complication Prediction : Abstract: Fairness in clinical prediction models remains a persistent challenge, particularly in high-stakes applications such as spinal fusion surgery for scoliosis, where patient outcomes exhibit su...
- AgentODRL: A Large Language Model-based Multi-agent System for ODRL Generation : Abstract: The Open Digital Rights Language (ODRL) is a pivotal standard for automating data rights management. However, the inherent logical complexity of authorization policies, combined with the sca...
- On the Holographic Geometry of Deterministic Computation : Abstract: Standard simulations of Turing machines suggest a linear relationship between the temporal duration $t$ of a run and the amount of information that must be stored by known simulations to cer...
- Generalized Graph Transformer Variational Autoencoder : Abstract: Graph link prediction has long been a central problem in graph representation learning in both network analysis and generative modeling. Recent progress in deep learning has introduced incre...
- Hierarchical Decentralized Multi-Agent Coordination with Privacy-Preserving Knowledge Sharing: Extending AgentNet for Scalable Autonomous Systems : Abstract: Decentralized multi-agent systems have shown promise in enabling autonomous collaboration among LLM-based agents. While AgentNet demonstrated the feasibility of fully decentralized coordinat...
- Stable Voting and the Splitting of Cycles : Abstract: Algorithms for resolving majority cycles in preference aggregation have been studied extensively in computational social choice. Several sophisticated cycle-resolving methods, including Tide...
- ART: Adaptive Response Tuning Framework -- A Multi-Agent Tournament-Based Approach to LLM Response Optimization : Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation. However, single-model responses often exhibit inconsistencies, halluc...
- LLM CHESS: Benchmarking Reasoning and Instruction-Following in LLMs through Chess : Abstract: We introduce LLM CHESS, an evaluation framework designed to probe the generalization of reasoning and instruction-following abilities in large language models (LLMs) through extended agentic...
- Enhancing Talent Search Ranking with Role-Aware Expert Mixtures and LLM-based Fine-Grained Job Descriptions : Abstract: Talent search is a cornerstone of modern recruitment systems, yet existing approaches often struggle to capture nuanced job-specific preferences, model recruiter behavior at a fine-grained l...
- Use of Retrieval-Augmented Large Language Model Agent for Long-Form COVID-19 Fact-Checking : Abstract: The COVID-19 infodemic calls for scalable fact-checking solutions that handle long-form misinformation with accuracy and reliability. This study presents SAFE (system for accurate fact extra...
- MOTION: ML-Assisted On-Device Low-Latency Motion Recognition : Abstract: The use of tiny devices capable of low-latency gesture recognition is gaining momentum in everyday human-computer interaction and especially in medical monitoring fields. Embedded solutions ...
- Development and Benchmarking of a Blended Human-AI Qualitative Research Assistant : Abstract: Qualitative research emphasizes constructing meaning through iterative engagement with textual data. Traditionally this human-driven process requires navigating coder fatigue and interpretat...
- Leveraging LLMs for Design Ideation: An AI Tool to Assist Creativity : Abstract: The creative potential of computers has intrigued researchers for decades. Since the emergence of Generative AI (Gen AI), computer creativity has found many new dimensions and applications. ...
- Cultural Prompting Improves the Empathy and Cultural Responsiveness of GPT-Generated Therapy Responses : Abstract: Large Language Model (LLM)-based conversational agents offer promising solutions for mental health support, but lack cultural responsiveness for diverse populations. This study evaluated the...
- The Impact of Concept Explanations and Interventions on Human-Machine Collaboration : Abstract: Deep Neural Networks (DNNs) are often considered black boxes due to their opaque decision-making processes. To reduce their opacity Concept Models (CMs), such as Concept Bottleneck Models (C...
- Architect in the Loop Agentic Hardware Design and Verification : Abstract: The ever increasing complexity of the hardware design process demands improved hardware design and verification methodologies. With the advent of generative AI various attempts have been mad...
- A Comprehensive Survey on Surgical Digital Twin : Abstract: With the accelerating availability of multimodal surgical data and real-time computation, Surgical Digital Twins (SDTs) have emerged as virtual counterparts that mirror, predict, and inform ...
- Large Language Model for Verilog Code Generation: Literature Review and the Road Ahead : Abstract: Code generation has emerged as a critical research area at the intersection of Software Engineering (SE) and Artificial Intelligence (AI), attracting significant attention from both academia...
- A Survey on Improving Human Robot Collaboration through Vision-and-Language Navigation : Abstract: Vision-and-Language Navigation (VLN) is a multi-modal, cooperative task requiring agents to interpret human instructions, navigate 3D environments, and communicate effectively under ambiguit...
- Perturbation-mitigated USV Navigation with Distributionally Robust Reinforcement Learning : Abstract: The robustness of Unmanned Surface Vehicles (USV) is crucial when facing unknown and complex marine environments, especially when heteroscedastic observational noise poses significant challe...
- Refined Bayesian Optimization for Efficient Beam Alignment in Intelligent Indoor Wireless Environments : Abstract: Future intelligent indoor wireless environments require fast and reliable beam alignment to sustain high-throughput links under mobility and blockage. Exhaustive beam training achieves opt...
- LM4Opt-RA: A Multi-Candidate LLM Framework with Structured Ranking for Automating Network Resource Allocation : Abstract: Building on advancements in Large Language Models (LLMs), we can tackle complex analytical and mathematical reasoning tasks requiring nuanced contextual understanding. A prime example of suc...
- Constrained Network Slice Assignment via Large Language Models : Abstract: Modern networks support network slicing, which partitions physical infrastructure into virtual slices tailored to different service requirements (for example, high bandwidth or low latency)....
- Closing the Gap: Data-Centric Fine-Tuning of Vision Language Models for the Standardized Exam Questions : Abstract: Multimodal reasoning has become a cornerstone of modern AI research. Standardized exam questions offer a uniquely rigorous testbed for such reasoning, providing structured visual contexts an...
- Assessing Large Language Models in Generating RTL Design Specifications : Abstract: As IC design grows more complex, automating comprehension and documentation of RTL code has become increasingly important. Engineers currently must manually interpret existing RTL code and...
- Text Annotation via Inductive Coding: Comparing Human Experts to LLMs in Qualitative Data Analysis : Abstract: This paper investigates the automation of qualitative data analysis, focusing on inductive coding using large language models (LLMs). Unlike traditional approaches that rely on deductive met...
- Emergent Convergence in Multi-Agent LLM Annotation : Abstract: Large language models (LLMs) are increasingly deployed in collaborative settings, yet little is known about how they coordinate when treated as black-box agents. We simulate 7500 multi-agent...
- Causal Reinforcement Learning based Agent-Patient Interaction with Clinical Domain Knowledge : Abstract: Reinforcement Learning (RL) faces significant challenges in adaptive healthcare interventions, such as dementia care, where data is scarce, decisions require interpretability, and underlying...
- Socially aware navigation for mobile robots: a survey on deep reinforcement learning approaches : Abstract: Socially aware navigation is a fast-evolving research area in robotics that enables robots to move within human environments while adhering to the implicit human social norms. The advent of ...
- Reinforcement Learning from Implicit Neural Feedback for Human-Aligned Robot Control : Abstract: Conventional reinforcement learning (RL) approaches often struggle to learn effective policies under sparse reward conditions, necessitating the manual design of complex, task-specific rewar...
- KAN-SAs: Efficient Acceleration of Kolmogorov-Arnold Networks on Systolic Arrays : Abstract: Kolmogorov-Arnold Networks (KANs) have garnered significant attention for their promise of improved parameter efficiency and explainability compared to traditional Deep Neural Networks (DNNs...
- PEFT-DML: Parameter-Efficient Fine-Tuning Deep Metric Learning for Robust Multi-Modal 3D Object Detection in Autonomous Driving : Abstract: This study introduces PEFT-DML, a parameter-efficient deep metric learning framework for robust multi-modal 3D object detection in autonomous driving. Unlike conventional models that assume ...
- SpeedAug: Policy Acceleration via Tempo-Enriched Policy and RL Fine-Tuning : Abstract: Recent advances in robotic policy learning have enabled complex manipulation in real-world environments, yet the execution speed of these policies often lags behind hardware capabilities due...
- Satellite to Street: Disaster Impact Estimator : Abstract: Accurate post-disaster damage assessment is of high importance for prioritizing emergency response; however, manual interpretation of satellite imagery is slow, subjective, and hard to scale...
- Enhancing Cognitive Robotics with Commonsense through LLM-Generated Preconditions and Subgoals : Abstract: Robots often fail at everyday tasks because instructions skip commonsense details like hidden preconditions and small subgoals. Traditional symbolic planners need these details to be written...
- A CNN-Based Technique to Assist Layout-to-Generator Conversion for Analog Circuits : Abstract: We propose a technique to assist in converting a reference layout of an analog circuit into the procedural layout generator by efficiently reusing available generators for sub-cell creation....
- Diffusion-Based Synthetic Brightfield Microscopy Images for Enhanced Single Cell Detection : Abstract: Accurate single cell detection in brightfield microscopy is crucial for biological research, yet data scarcity and annotation bottlenecks limit the progress of deep learning methods. We inve...
- InF-ATPG: Intelligent FFR-Driven ATPG with Advanced Circuit Representation Guided Reinforcement Learning : Abstract: Automatic test pattern generation (ATPG) is a crucial process in integrated circuit (IC) design and testing, responsible for efficiently generating test patterns. As semiconductor technology...
- Hyper-GoalNet: Goal-Conditioned Manipulation Policy Learning with HyperNetworks : Abstract: Goal-conditioned policy learning for robotic manipulation presents significant challenges in maintaining performance across diverse objectives and environments. We introduce Hyper-GoalNet, a...
- Efficiently Sampling Interval Patterns from Numerical Databases : Abstract: Pattern sampling has emerged as a promising approach for information discovery in large databases, allowing analysts to focus on a manageable subset of patterns. In this approach, patterns a...
- From RISC-V Cores to Neuromorphic Arrays: A Tutorial on Building Scalable Digital Neuromorphic Processors : Abstract: Digital neuromorphic processors are emerging as a promising computing substrate for low-power, always-on EdgeAI applications. In this tutorial paper, we outline the main architectural design...
- NetDeTox: Adversarial and Efficient Evasion of Hardware-Security GNNs via RL-LLM Orchestration : Abstract: Graph neural networks (GNNs) have shown promise in hardware security by learning structural motifs from netlist graphs. However, this reliance on motifs makes GNNs vulnerable to adversarial ...
- Art2Music: Generating Music for Art Images with Multi-modal Feeling Alignment : Abstract: With the rise of AI-generated content (AIGC), generating perceptually natural and feeling-aligned music from multimodal inputs has become a central challenge. Existing approaches often rely ...
- RadDiff: Retrieval-Augmented Denoising Diffusion for Protein Inverse Folding : Abstract: Protein inverse folding, the design of an amino acid sequence based on a target 3D structure, is a fundamental problem of computational protein engineering. Existing methods either generate ...
- Generating Verifiable CoT from Execution-Traces : Abstract: Teaching language models to reason about code execution remains a fundamental challenge. While Chain-of-Thought (CoT) prompting has shown promise, current synthetic training data suffers fro...
- Analysis of Incursive Breast Cancer in Mammograms Using YOLO, Explainability, and Domain Adaptation : Abstract: Deep learning models for breast cancer detection from mammographic images have significant reliability problems when presented with Out-of-Distribution (OOD) inputs such as other imaging mod...
- Asm2SrcEval: Evaluating Large Language Models for Assembly-to-Source Code Translation : Abstract: Assembly-to-source code translation is a critical task in reverse engineering, cybersecurity, and software maintenance, yet systematic benchmarks for evaluating large language models on this...
- DeFi TrustBoost: Blockchain and AI for Trustworthy Decentralized Financial Decisions : Abstract: This research introduces the Decentralized Finance (DeFi) TrustBoost Framework, which combines blockchain technology and Explainable AI to address challenges faced by lenders underwriting sm...
- Tuning Universality in Deep Neural Networks : Abstract: Deep neural networks (DNNs) exhibit crackling-like avalanches whose origin lacks a mechanistic explanation. Here, I derive a stochastic theory of deep information propagation (DIP) by incorp...
- Efficient Edge-Compatible CNN for Speckle-Based Material Recognition in Laser Cutting Systems : Abstract: Accurate material recognition is critical for safe and effective laser cutting, as misidentification can lead to poor cut quality, machine damage, or the release of hazardous fumes. Laser sp...
- Orion-Bix: Bi-Axial Attention for Tabular In-Context Learning : Abstract: Tabular data drive most real-world machine learning applications, yet building general-purpose models for them remains difficult. Mixed numeric and categorical fields, weak feature structure...
- Tree Matching Networks for Natural Language Inference: Parameter-Efficient Semantic Understanding via Dependency Parse Trees : Abstract: In creating sentence embeddings for Natural Language Inference (NLI) tasks, using transformer-based models like BERT leads to high accuracy, but requires hundreds of millions of parameters. T...
- Constructing Efficient Fact-Storing MLPs for Transformers : Abstract: The success of large language models (LLMs) can be attributed in part to their ability to efficiently store factual knowledge as key-value mappings within their MLP parameters. Recent work h...
- On the Prediction of Wi-Fi Performance through Deep Learning : Abstract: Ensuring reliable and predictable communications is one of the main goals in modern industrial systems that rely on Wi-Fi networks, especially in scenarios where continuity of operation and ...
- DenseScan: Advancing 3D Scene Understanding with 2D Dense Annotation : Abstract: 3D understanding is a key capability for real-world AI assistance. High-quality data plays an important role in driving the development of the 3D understanding community. Current 3D scene un...
- CodeFlowLM: Incremental Just-In-Time Defect Prediction with Pretrained Language Models and Exploratory Insights into Defect Localization : Abstract: This work introduces CodeFlowLM, an incremental learning framework for Just-In-Time Software Defect Prediction (JIT-SDP) that leverages pre-trained language models (PLMs). Unlike traditional...
- OmniFusion: Simultaneous Multilingual Multimodal Translations via Modular Fusion : Abstract: There has been significant progress in open-source text-only translation large language models (LLMs) with better language coverage and quality. However, these models can only be used in cas...
- Polynomial Neural Sheaf Diffusion: A Spectral Filtering Approach on Cellular Sheaves : Abstract: Sheaf Neural Networks equip graph structures with a cellular sheaf: a geometric structure which assigns local vector spaces (stalks) and linear learnable restriction/transport maps to node...
- Optimizing Information Asset Investment Strategies in the Exploratory Phase of the Oil and Gas Industry: A Reinforcement Learning Approach : Abstract: Our work investigates the economic efficiency of the prevailing "ladder-step" investment strategy in oil and gas exploration, which advocates for the incremental acquisition of geological in...
- A Hierarchical Hybrid AI Approach: Integrating Deep Reinforcement Learning and Scripted Agents in Combat Simulations : Abstract: In the domain of combat simulations in support of wargaming, the development of intelligent agents has predominantly been characterized by rule-based, scripted methodologies with deep reinfo...
- USB: Unified Synthetic Brain Framework for Bidirectional Pathology-Healthy Generation and Editing : Abstract: Understanding the relationship between pathological and healthy brain structures is fundamental to neuroimaging, connecting disease diagnosis and detection with modeling, prediction, and tre...
- RL-Struct: A Lightweight Reinforcement Learning Framework for Reliable Structured Output in LLMs : Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language generation and reasoning. However, their integration into automated software ecosystems is often hi...
- CogEvo-Edu: Cognitive Evolution Educational Multi-Agent Collaborative System : Abstract: Large language models (LLMs) are increasingly deployed as conversational tutors in STEM education, yet most systems still rely on a single LLM with a static retrieval-augmented generation (R...
- Echo-N1: Affective RL Frontier : Abstract: The LLM field has spent a year perfecting RL for tasks machines already excel at, math, code, and deterministic reasoning, while completely sidestepping the domain that actually defines huma...
- Debate with Images: Detecting Deceptive Behaviors in Multimodal Large Language Models : Abstract: Are frontier AI systems becoming more capable? Certainly. Yet such progress is not an unalloyed blessing but rather a Trojan horse: behind their performance leaps lie more insidious and dest...
- GreenPlanner: Practical Floorplan Layout Generation via an Energy-Aware and Function-Feasible Generative Framework : Abstract: Building design directly affects human well-being and carbon emissions, yet generating spatial-functional and energy-compliant floorplans remains manual, costly, and non-scalable. Existing m...
- Mind the data gap: Missingness Still Shapes Large Language Model Prognoses : Abstract: Data collection often reflects human decisions. In healthcare, for instance, a referral for a diagnostic test is influenced by the patient's health, their preferences, available resources, a...
- Clinical-R1: Empowering Large Language Models for Faithful and Comprehensive Reasoning with Clinical Objective Relative Policy Optimization : Abstract: Recent advances in large language models (LLMs) have shown strong reasoning capabilities through large-scale pretraining and post-training reinforcement learning, demonstrated by DeepSeek-R1...
- EDIT: Early Diffusion Inference Termination for dLLMs Based on Dynamics of Training Gradients : Abstract: Diffusion-based large language models (dLLMs) refine token generations through iterative denoising, but answers often stabilize before all steps complete. We propose EDIT (Early Diffusion In...
- Model of human cognition : Abstract: The development of large language models (LLMs) is limited by a lack of explainability, the absence of a unifying theory, and prohibitive operational costs. We propose a neuro-theoretical fr...
- When Human Preferences Flip: An Instance-Dependent Robust Loss for RLHF : Abstract: Quality of datasets plays an important role in large language model (LLM) alignment. In collecting human feedback, however, preference flipping is ubiquitous and causes corruption in data an...
- SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs : Abstract: In this paper, we point out that the objective of the retrieval algorithms is to align with the LLM, which is similar to the objective of knowledge distillation in LLMs. We analyze the simil...
- Probing the "Psyche" of Large Reasoning Models: Understanding Through a Human Lens : Abstract: Large reasoning models (LRMs) have garnered significant attention from researchers owing to their exceptional capability in addressing complex tasks. Motivated by the observed human-like beh...
- MPR-GUI: Benchmarking and Enhancing Multilingual Perception and Reasoning in GUI Agents : Abstract: With the advancement of computational resources, Large Vision-Language Models (LVLMs) exhibit impressive Perception and Reasoning (P&R) performance on Graphical User Interface (GUI) tasks. H...
- BioPro: On Difference-Aware Gender Fairness for Vision-Language Models : Abstract: Vision-Language Models (VLMs) inherit significant social biases from their training data, notably in gender representation. Current fairness interventions often adopt a difference-unaware pe...
- Med-CMR: A Fine-Grained Benchmark Integrating Visual Evidence and Clinical Logic for Medical Complex Multimodal Reasoning : Abstract: MLLMs are beginning to appear in clinical workflows, but their ability to perform complex medical reasoning remains unclear. We present Med-CMR, a fine-grained Medical Complex Multimod...
- SemAgent: Semantic-Driven Agentic AI Empowered Trajectory Prediction in Vehicular Networks : Abstract: Efficient information exchange and reliable contextual reasoning are essential for vehicle-to-everything (V2X) networks. Conventional communication schemes often incur significant transmissi...
- Assessing model error in counterfactual worlds : Abstract: Counterfactual scenario modeling exercises that ask "what would happen if?" are one of the most common ways we plan for the future. Despite their ubiquity in planning and decision making, sc...
- ARCADIA: Scalable Causal Discovery for Corporate Bankruptcy Analysis Using Agentic AI : Abstract: This paper introduces ARCADIA, an agentic AI framework for causal discovery that integrates large-language-model reasoning with statistical diagnostics to construct valid, temporally coheren...
- One Swallow Does Not Make a Summer: Understanding Semantic Structures in Embedding Spaces : Abstract: Embedding spaces are fundamental to modern AI, translating raw data into high-dimensional vectors that encode rich semantic relationships. Yet, their internal structures remain opaque, with ...
- Hybrid-DMKG: A Hybrid Reasoning Framework over Dynamic Multimodal Knowledge Graphs for Multimodal Multihop QA with Knowledge Editing : Abstract: Multimodal Knowledge Editing (MKE) extends traditional knowledge editing to settings involving both textual and visual modalities. However, existing MKE benchmarks primarily assess final ans...
- Minimal neuron ablation triggers catastrophic collapse in the language core of Large Vision-Language Models : Abstract: Large Vision-Language Models (LVLMs) have shown impressive multimodal understanding capabilities, yet their robustness is poorly understood. In this paper, we investigate the structural vuln...
- Integrating Causal Foundation Model in Prescriptive Maintenance Framework for Optimizing Production Line OEE : Abstract: The transition to prescriptive maintenance in manufacturing is critically constrained by a dependence on predictive models. These models tend to rely on spurious correlations rather than ide...
- IndiMathBench: Autoformalizing Mathematical Reasoning Problems with a Human Touch : Abstract: We introduce IndiMathBench, a human-verified benchmark designed to evaluate mathematical theorem proving, curated using an AI-powered human-assisted pipeline for formalizing natural language...
- ChartAnchor: Chart Grounding with Structural-Semantic Fidelity : Abstract: Recent advances in multimodal large language models (MLLMs) highlight the need for benchmarks that rigorously evaluate structured chart comprehension. Chart grounding refers to the bidirectio...
- Evaluating Legal Reasoning Traces with Legal Issue Tree Rubrics : Abstract: Evaluating the quality of LLM-generated reasoning traces in expert domains (e.g., law) is essential for ensuring credibility and explainability, yet remains challenging due to the inherent c...
- Med-CRAFT: Automated Construction of Interpretable and Multi-Hop Video Workloads via Knowledge Graph Traversal : Abstract: The scarcity of high-quality, logically annotated video datasets remains a primary bottleneck in advancing Multi-Modal Large Language Models (MLLMs) for the medical domain. Traditional manua...
- Shielded Controller Units for RL with Operational Constraints Applied to Remote Microgrids : Abstract: Reinforcement learning (RL) is a powerful framework for optimizing decision-making in complex systems under uncertainty, an essential challenge in real-world settings, particularly in the co...
- Automating the Refinement of Reinforcement Learning Specifications : Abstract: Logical specifications have been shown to help reinforcement learning algorithms in achieving complex tasks. However, when a task is under-specified, agents might fail to learn useful polici...
- SimWorld: An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds : Abstract: While LLM/VLM-powered AI agents have advanced rapidly in math, coding, and computer use, their applications in complex physical and social environments remain challenging. Building agents th...
- Testing the Machine Consciousness Hypothesis : Abstract: The Machine Consciousness Hypothesis states that consciousness is a substrate-free functional property of computational systems capable of second-order perception. I propose a research progr...
- CodeDistiller: Automatically Generating Code Libraries for Scientific Coding Agents : Abstract: Automated Scientific Discovery (ASD) systems can help automatically generate and run code-based experiments, but their capabilities are limited by the code they can reliably generate from pa...
- Energy-Aware Data-Driven Model Selection in LLM-Orchestrated AI Systems : Abstract: As modern artificial intelligence (AI) systems become more advanced and capable, they can leverage a wide range of tools and models to perform complex tasks. Today, the task of orchestrating...
- Foundation Priors : Abstract: Foundation models, and in particular large language models, can generate highly informative responses, prompting growing interest in using these "synthetic" outputs as data in empirical re...
- A Benchmark of Causal vs Correlation AI for Predictive Maintenance : Abstract: Predictive maintenance in manufacturing environments presents a challenging optimization problem characterized by extreme cost asymmetry, where missed failures incur costs roughly fifty time...
- fMRI2GES: Co-speech Gesture Reconstruction from fMRI Signal with Dual Brain Decoding Alignment : Abstract: Understanding how the brain responds to external stimuli and decoding this process has been a significant challenge in neuroscience. While previous studies typically concentrated on brain-to...
- Knowledge Graph Augmented Large Language Models for Next-Visit Disease Prediction : Abstract: Electronic health records (EHRs) support powerful clinical prediction models, but existing methods typically provide coarse, post hoc explanations that offer limited value for patient-level ...
- Unsupervised decoding of encoded reasoning using language model interpretability : Abstract: As large language models become increasingly capable, there is growing concern that they may develop reasoning processes that are encoded or hidden from human oversight. To investigate wheth...
- OntoMetric: An Ontology-Guided Framework for Automated ESG Knowledge Graph Construction : Abstract: Environmental, Social, and Governance (ESG) disclosure frameworks such as SASB, TCFD, and IFRS S2 require organizations to compute and report numerous metrics for compliance, yet these requi...
- RoboDriveVLM: A Novel Benchmark and Baseline towards Robust Vision-Language Models for Autonomous Driving : Abstract: Current Vision-Language Model (VLM)-based end-to-end autonomous driving systems often leverage large language models to generate driving decisions directly based on their understanding of th...
- CuES: A Curiosity-driven and Environment-grounded Synthesis Framework for Agentic RL : Abstract: Large language model based agents are increasingly deployed in complex, tool augmented environments. While reinforcement learning provides a principled mechanism for such agents to improve t...
- Extending NGU to Multi-Agent RL: A Preliminary Study : Abstract: The Never Give Up (NGU) algorithm has proven effective in reinforcement learning tasks with sparse rewards by combining episodic novelty and intrinsic motivation. In this work, we extend NGU...
- A Fast Heuristic Search Approach for Energy-Optimal Profile Routing for Electric Vehicles : Abstract: We study the energy-optimal shortest path problem for electric vehicles (EVs) in large-scale road networks, where recuperated energy along downhill segments introduces negative energy costs....
- Benchmarking Overton Pluralism in LLMs : Abstract: We introduce a novel framework for measuring Overton pluralism in LLMs--the extent to which diverse viewpoints are represented in model outputs. We (i) formalize Overton pluralism as a set c...
- The Necessity of Imperfection: Reversing Model Collapse via Simulating Cognitive Boundedness : Abstract: Although synthetic data is widely promoted as a remedy, its prevailing production paradigm -- one optimizing for statistical smoothness -- systematically removes the long-tail, cognitively g...
- A Flexible Multi-Agent LLM-Human Framework for Fast Human Validated Tool Building : Abstract: We introduce CollabToolBuilder, a flexible multiagent LLM framework with expert-in-the-loop (HITL) guidance that iteratively learns to create tools for a target goal, aligning with human int...
- A Selective Temporal Hamming distance to find patterns in state transition event timeseries, at scale : Abstract: Discrete event systems are present both in observations of nature, socio economical sciences, and industrial systems. Standard analysis approaches do not usually exploit their dual event / s...
- Automated Risk-of-Bias Assessment of Randomized Controlled Trials: A First Look at a GEPA-trained Programmatic Prompting Framework : Abstract: Assessing risk of bias (RoB) in randomized controlled trials is essential for trustworthy evidence synthesis, but the process is resource-intensive and prone to variability across reviewers....
- Multi-Path Collaborative Reasoning via Reinforcement Learning : Abstract: Chain-of-Thought (CoT) reasoning has significantly advanced the problem-solving capabilities of Large Language Models (LLMs), yet conventional CoT often exhibits internal determinism during ...
- SynthStrategy: Extracting and Formalizing Latent Strategic Insights from LLMs in Organic Chemistry : Abstract: Modern computer-assisted synthesis planning (CASP) systems show promise in generating chemically valid reaction steps but struggle to incorporate strategic considerations such as convergent...
- LEC: Linear Expectation Constraints for False-Discovery Control in Selective Prediction and Routing Systems : Abstract: Large language models (LLMs) often generate unreliable answers, while heuristic uncertainty methods fail to fully distinguish correct from incorrect predictions, causing users to accept erro...
- CLIP-RL: Aligning Language and Policy Representations for Task Transfer in Reinforcement Learning : Abstract: Recently, there has been an increasing need to develop agents capable of solving multiple tasks within the same environment, especially when these tasks are naturally associated with languag...
- Probabilistic Neuro-Symbolic Reasoning for Sparse Historical Data: A Framework Integrating Bayesian Inference, Causal Models, and Game-Theoretic Allocation : Abstract: Modeling historical events poses fundamental challenges for machine learning: extreme data scarcity (N << 100), heterogeneous and noisy measurements, missing counterfactuals, and the require...
- Who Judges the Judge? LLM Jury-on-Demand: Building Trustworthy LLM Evaluation Systems : Abstract: As Large Language Models (LLMs) become integrated into high-stakes domains, there is a growing need for evaluation methods that are both scalable for real-time deployment and reliable for cr...
- H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons : Abstract: Large language models (LLMs) frequently generate hallucinations -- plausible but factually incorrect outputs -- undermining their reliability. While prior work has examined hallucinations fr...
- Testing Transformer Learnability on the Arithmetic Sequence of Rooted Trees : Abstract: We study whether a Large Language Model can learn the deterministic sequence of trees generated by the iterated prime factorization of the natural numbers. Each integer is mapped into a root...
- Graph Distance as Surprise: Free Energy Minimization in Knowledge Graph Reasoning : Abstract: In this work, we propose that reasoning in knowledge graph (KG) networks can be guided by surprise minimization. Entities that are close in graph distance will have lower surprise than those...
- Predicting Human Chess Moves: An AI Assisted Analysis of Chess Games Using Skill-group Specific n-gram Language Models : Abstract: Chess, a deterministic game with perfect information, has long served as a benchmark for studying strategic decision-making and artificial intelligence. Traditional chess engines or tools fo...
- Learned-Rule-Augmented Large Language Model Evaluators : Abstract: Large language models (LLMs) are predominantly used as evaluators for natural language generation (NLG) tasks, but their application to broader evaluation scenarios remains limited. In this ...
- From Atomic to Composite: Reinforcement Learning Enables Generalization in Complementary Reasoning : Abstract: The mechanism by which RL contributes to reasoning capabilities -- whether it incentivizes the synthesis of new skills or merely amplifies existing behaviors -- remains a subject of intense debate...
- Chain-of-Ground: Improving GUI Grounding via Iterative Reasoning and Reference Feedback : Abstract: GUI grounding aims to align natural language instructions with precise regions in complex user interfaces. Advanced multimodal large language models show strong ability in visual GUI groundi...
- Gold-Medal-Level Olympiad Geometry Solving with Efficient Heuristic Auxiliary Constructions : Abstract: Automated theorem proving in Euclidean geometry, particularly for International Mathematical Olympiad (IMO) level problems, remains a major challenge and an important research focus in Artif...
- Chunking Strategies for Multimodal AI Systems : Abstract: Our goal is to consolidate the landscape of multimodal chunking strategies, providing researchers and practitioners with a technical foundation and design space for developing more effective...
- A Rosetta Stone for AI Benchmarks : Abstract: Most AI benchmarks saturate within years or even months after they are introduced, making it hard to study long-run trends in AI capabilities. To address this challenge, we build a statistic...
- Reasoning Under Pressure: How do Training Incentives Influence Chain-of-Thought Monitorability? : Abstract: AI systems that output their reasoning in natural language offer an opportunity for safety -- we can monitor their chain of thought (CoT) for undesirable reasoning, such as the pursui...
- Trification: A Comprehensive Tree-based Strategy Planner and Structural Verification for Fact-Checking : Abstract: Technological advancement allows information to be shared in just a single click, which has enabled the rapid spread of false information. This makes automated fact-checking system necessary...
- ChartPoint: Guiding MLLMs with Grounding Reflection for Chart Reasoning : Abstract: Multimodal Large Language Models (MLLMs) have emerged as powerful tools for chart comprehension. However, they heavily rely on extracted content via OCR, which leads to numerical hallucinati...
Research Sources: 823 | Generated: 12/2/2025
