AI RESEARCH PAPERS & ACADEMIC SOURCES
- Multiview point cloud registration with anisotropic and space-varying localization noise : Abstract: In this paper, we address the problem of registering multiple point clouds corrupted with high anisotropic localization noise. Our approach follows the widely used framework of Gaussian mixt...
- AutoFocus-IL: VLM-based Saliency Maps for Data-Efficient Visual Imitation Learning without Extra Human Annotations : Abstract: AutoFocus-IL is a simple yet effective method to improve data efficiency and generalization in visual imitation learning by guiding policies to attend to task-relevant features rather than d...
- Inverse Rendering for High-Genus Surface Meshes from Multi-View Images : Abstract: We present a topology-informed inverse rendering approach for reconstructing high-genus surface meshes from multi-view images. Compared to 3D representations like voxels and point clouds, me...
- CNN-Based Camera Pose Estimation and Localisation of Scan Images for Aircraft Visual Inspection : Abstract: General Visual Inspection is a manual inspection process regularly used to detect and localise obvious damage on the exterior of commercial aircraft. There has been increasing demand to perf...
- Neural B-Frame Coding: Tackling Domain Shift Issues with Lightweight Online Motion Resolution Adaptation : Abstract: Learned B-frame codecs with hierarchical temporal prediction often encounter the domain-shift issue due to mismatches between the Group-of-Pictures (GOP) sizes for training and testing, lead...
- ChronoGS: Disentangling Invariants and Changes in Multi-Period Scenes : Abstract: Multi-period image collections are common in real-world applications. Cities are re-scanned for mapping, construction sites are revisited for progress tracking, and natural regions are monit...
- PrismAudio: Decomposed Chain-of-Thoughts and Multi-dimensional Rewards for Video-to-Audio Generation : Abstract: Video-to-Audio (V2A) generation requires balancing four critical perceptual dimensions: semantic consistency, audio-visual temporal synchrony, aesthetic quality, and spatial accuracy; yet ex...
- MatMart: Material Reconstruction of 3D Objects via Diffusion : Abstract: Applying diffusion models to physically-based material estimation and generation has recently gained prominence. In this paper, we propose MatMart, a novel material reconstruction framework for...
- Towards Generalizable Deepfake Detection via Forgery-aware Audio-Visual Adaptation: A Variational Bayesian Approach : Abstract: The widespread application of AIGC contents has brought not only unprecedented opportunities, but also potential security concerns, e.g., audio-visual deepfakes. Therefore, it is of great im...
- FedPoisonTTP: A Threat Model and Poisoning Attack for Federated Test-Time Personalization : Abstract: Test-time personalization in federated learning enables models at clients to adjust online to local domain shifts, enhancing robustness and personalization in deployment. Yet, existing feder...
- The Shape of Sight: A Homological Framework for Unifying Visual Perception : Abstract: Visual perception, the brain's construction of a stable world from sensory data, faces several long-standing, fundamental challenges. While often studied separately, these problems have resi...
- K-FACE: A Large-Scale KIST Face Database in Consideration with Unconstrained Environments : Abstract: In this paper, we introduce a new large-scale face database from KIST, denoted as K-FACE, and describe a novel capturing device specifically designed to obtain the data. The K-FACE database ...
- Prototypical Contrastive Learning-based CLIP Fine-tuning for Object Re-identification : Abstract: This work aims to adapt large-scale pre-trained vision-language models, such as contrastive language-image pretraining (CLIP), to enhance the performance of object reidentification (Re-ID) a...
- QGait: Toward Accurate Quantization for Gait Recognition : Abstract: Existing deep learning methods have made significant progress in gait recognition. Quantization can facilitate the application of gait models as a model-agnostic general compression techniqu...
- SketchDeco: Training-Free Latent Composition for Precise Sketch Colourisation : Abstract: We introduce SketchDeco, a training-free approach to sketch colourisation that bridges the gap between professional design needs and intuitive, region-based control. Our method empowers arti...
- PriorDrive: Enhancing Online HD Mapping with Unified Vector Priors : Abstract: High-Definition Maps (HD maps) are essential for the precise navigation and decision-making of autonomous vehicles, yet their creation and upkeep present significant cost and timeliness chal...
- Zero-Shot Coreset Selection via Iterative Subspace Sampling : Abstract: Deep learning increasingly relies on massive data with substantial storage, annotation, and training costs. To reduce costs, coreset selection finds a representative subset of data to train ...
- Splats in Splats: Robust and Effective 3D Steganography towards Gaussian Splatting : Abstract: 3D Gaussian splatting (3DGS) has demonstrated impressive 3D reconstruction performance with explicit scene representations. Given the widespread application of 3DGS in 3D reconstruction and ...
- STT-GS: Sample-Then-Transmit Edge Gaussian Splatting with Joint Client Selection and Power Control : Abstract: Edge Gaussian splatting (EGS), which aggregates data from distributed clients and trains a global GS model at the edge server, is an emerging paradigm for scene reconstruction. Unlike tradit...
- AsynEIO: Asynchronous Monocular Event-Inertial Odometry Using Gaussian Process Regression : Abstract: Event cameras, when combined with inertial sensors, show significant potential for motion estimation in challenging scenarios, such as high-speed maneuvers and low-light environments. There ...
- Evo-0: Vision-Language-Action Model with Implicit Spatial Understanding : Abstract: Vision-Language-Action (VLA) models have emerged as a promising framework for enabling generalist robots capable of perceiving, reasoning, and acting in the real world. These models usually ...
- Can Modern Vision Models Understand the Difference Between an Object and a Look-alike? : Abstract: Recent advances in computer vision have yielded models with strong performance on recognition benchmarks; however, significant gaps remain in comparison to human perception. One subtle abili...
- NVGS: Neural Visibility for Occlusion Culling in 3D Gaussian Splatting : Abstract: 3D Gaussian Splatting can exploit frustum culling and level-of-detail strategies to accelerate rendering of scenes containing a large number of primitives. However, the semi-transparent natu...
- ReAlign: Text-to-Motion Generation via Step-Aware Reward-Guided Alignment : Abstract: Text-to-motion generation, which synthesizes 3D human motions from text inputs, holds immense potential for applications in gaming, film, and robotics. Recently, diffusion-based methods have...
- Percept-WAM: Perception-Enhanced World-Awareness-Action Model for Robust End-to-End Autonomous Driving : Abstract: Autonomous driving heavily relies on accurate and robust spatial perception. Many failures arise from inaccuracies and instability, especially in long-tail scenarios and complex interactions...
- IDSplat: Instance-Decomposed 3D Gaussian Splatting for Driving Scenes : Abstract: Reconstructing dynamic driving scenes is essential for developing autonomous systems through sensor-realistic simulation. Although recent methods achieve high-fidelity reconstructions, they ...
- LAST: LeArning to Think in Space and Time for Generalist Vision-Language Models : Abstract: Humans can perceive and understand 3D space and long videos from sequential visual observations. But can vision-language models (VLMs) do the same? Recent work demonstrates that even state-of-the-art...
- BideDPO: Conditional Image Generation with Simultaneous Text and Condition Alignment : Abstract: Conditional image generation enhances text-to-image synthesis with structural, spatial, or stylistic priors, but current methods face challenges in handling conflicts between sources. These ...
- Diffusion Reconstruction-based Data Likelihood Estimation for Core-Set Selection : Abstract: Existing core-set selection methods predominantly rely on heuristic scoring signals such as training dynamics or model uncertainty, lacking explicit modeling of data likelihood. This omissio...
- ReMatch: Boosting Representation through Matching for Multimodal Retrieval : Abstract: We present ReMatch, a framework that leverages the generative strength of MLLMs for multimodal retrieval. Previous approaches treated an MLLM as a simple encoder, ignoring its generative nat...
- DensifyBeforehand: LiDAR-assisted Content-aware Densification for Efficient and Quality 3D Gaussian Splatting : Abstract: This paper addresses the limitations of existing 3D Gaussian Splatting (3DGS) methods, particularly their reliance on adaptive density control, which can lead to floating artifacts and ineff...
- IDEAL-M3D: Instance Diversity-Enriched Active Learning for Monocular 3D Detection : Abstract: Monocular 3D detection relies on just a single camera and is therefore easy to deploy. Yet, achieving reliable 3D understanding from monocular images requires substantial annotation, and 3D ...
- Dual-Granularity Semantic Prompting for Language Guidance Infrared Small Target Detection : Abstract: Infrared small target detection remains challenging due to limited feature representation and severe background interference, resulting in sub-optimal performance. While recent CLIP-inspired...
- SyncMV4D: Synchronized Multi-view Joint Diffusion of Appearance and Motion for Hand-Object Interaction Synthesis : Abstract: Hand-Object Interaction (HOI) generation plays a critical role in advancing applications across animation and robotics. Current video-based methods are predominantly single-view, which imped...
- SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation : Abstract: Preserving first-frame identity while ensuring precise motion control is a fundamental challenge in human image animation. The Image-to-Motion Binding process of the dominant Reference-to-Vi...
- MonoMSK: Monocular 3D Musculoskeletal Dynamics Estimation : Abstract: Reconstructing biomechanically realistic 3D human motion - recovering both kinematics (motion) and kinetics (forces) - is a critical challenge. While marker-based systems are lab-bound and s...
- POUR: A Provably Optimal Method for Unlearning Representations via Neural Collapse : Abstract: In computer vision, machine unlearning aims to remove the influence of specific visual concepts or training images without retraining from scratch. Studies show that existing approaches ofte...
- Syn-GRPO: Self-Evolving Data Synthesis for MLLM Perception Reasoning : Abstract: RL (reinforcement learning) methods (e.g., GRPO) for MLLM (Multimodal LLM) perception ability have attracted wide research interest owing to their remarkable generalization ability. Nevertheles...
- CellFMCount: A Fluorescence Microscopy Dataset, Benchmark, and Methods for Cell Counting : Abstract: Accurate cell counting is essential in various biomedical research and clinical applications, including cancer diagnosis, stem cell research, and immunology. Manual counting is labor-intensi...
- DiffSeg30k: A Multi-Turn Diffusion Editing Benchmark for Localized AIGC Detection : Abstract: Diffusion-based editing enables realistic modification of local image regions, making AI-generated content harder to detect. Existing AIGC detection benchmarks focus on classifying entire im...
- UISearch: Graph-Based Embeddings for Multimodal Enterprise UI Screenshots Retrieval : Abstract: Enterprise software companies maintain thousands of user interface screens across products and versions, creating critical challenges for design consistency, pattern discovery, and complianc...
- BackSplit: The Importance of Sub-dividing the Background in Biomedical Lesion Segmentation : Abstract: Segmenting small lesions in medical images remains notoriously difficult. Most prior work tackles this challenge by either designing better architectures, loss functions, or data augmentatio...
- SAM3-Adapter: Efficient Adaptation of Segment Anything 3 for Camouflage Object Segmentation, Shadow Detection, and Medical Image Segmentation : Abstract: The rapid rise of large-scale foundation models has reshaped the landscape of image segmentation, with models such as Segment Anything achieving unprecedented versatility across diverse visi...
- Ref-SAM3D: Bridging SAM3D with Text for Reference 3D Reconstruction : Abstract: SAM3D has garnered widespread attention for its strong 3D object reconstruction capabilities. However, a key limitation remains: SAM3D cannot reconstruct specific objects referred to by text...
- Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution : Abstract: Task scheduling is critical for embodied AI, enabling agents to follow natural language instructions and execute actions efficiently in 3D physical worlds. However, existing datasets often s...
- Cloud4D : Abstract: There has been great progress in improving numerical weather prediction and climate models using machine learning. However, most global models act at a kilometer-scale, making it challenging...
- Are Image-to-Video Models Good Zero-Shot Image Editors? : Abstract: Large-scale video diffusion models show strong world simulation and temporal reasoning abilities, but their use as zero-shot image editors remains underexplored. We introduce IF-Edit, a tuni...
- LumiTex: Towards High-Fidelity PBR Texture Generation with Illumination Context : Abstract: Physically-based rendering (PBR) provides a principled standard for realistic material-lighting interactions in computer graphics. Despite recent advances in generating PBR textures, existin...
- Deep Learning-based Lightweight RGB Object Tracking for Augmented Reality Devices : Abstract: Augmented Reality (AR) applications often require robust real-time tracking of objects in the user's environment to correctly overlay virtual content. Recent advances in computer vision have...
- TeamPath: Building MultiModal Pathology Experts with Reasoning AI Copilots : Abstract: Advances in AI have introduced several strong models in computational pathology to usher it into the era of multi-modal diagnosis, analysis, and interpretation. However, the current patholog...
- Robust Detection of Retinal Neovascularization in Widefield Optical Coherence Tomography : Abstract: Retinal neovascularization (RNV) is a vision threatening development in diabetic retinopathy (DR). Vision loss associated with RNV is preventable with timely intervention, making RNV clinica...
- MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots : Abstract: Grounding natural-language instructions into continuous control for quadruped robots remains a fundamental challenge in vision language action. Existing methods struggle to bridge high-level...
- Spectral Super-Resolution Neural Operator with Atmospheric Radiative Transfer Prior : Abstract: Spectral super-resolution (SSR) aims to reconstruct hyperspectral images (HSIs) from multispectral observations, with broad applications in remote sensing. Data-driven methods are widely use...
- Animated Territorial Data Extractor (ATDE): A Computer-Vision Method for Extracting Territorial Data from Animated Historical Maps : Abstract: We present Animated Territorial Data Extractor (ATDE), a computer vision tool that extracts quantitative territorial data from animated historical map videos. ATDE employs HSV-based color se...
- Switch-JustDance: Benchmarking Whole Body Motion Tracking Policies Using a Commercial Console Game : Abstract: Recent advances in whole-body robot control have enabled humanoid and legged robots to perform increasingly agile and coordinated motions. However, standardized benchmarks for evaluating the...
- Linear Algebraic Approaches to Neuroimaging Data Compression: A Comparative Analysis of Matrix and Tensor Decomposition Methods for High-Dimensional Medical Images : Abstract: This paper evaluates Tucker decomposition and Singular Value Decomposition (SVD) for compressing neuroimaging data. Tucker decomposition preserves multi-dimensional relationships, achieving ...
- Enhancing UAV Search under Occlusion using Next Best View Planning : Abstract: Search and rescue missions are often critical following sudden natural disasters or in high-risk environmental situations. The most challenging search and rescue missions involve difficult-t...
- Self-Empowering VLMs: Achieving Hierarchical Consistency via Self-Elicited Knowledge Distillation : Abstract: Vision-language models (VLMs) possess rich knowledge but often fail on hierarchical understanding tasks, where the goal is to predict a coarse-to-fine taxonomy path that remains consistent a...
- Dynamic Granularity Matters: Rethinking Vision Transformers Beyond Fixed Patch Splitting : Abstract: Vision Transformers (ViTs) have demonstrated strong capabilities in capturing global dependencies but often struggle to efficiently represent fine-grained local details. Existing multi-scale...
- Benchmarking Corruption Robustness of LVLMs: A Discriminative Benchmark and Robustness Alignment Metric : Abstract: Despite the remarkable reasoning abilities of large vision-language models (LVLMs), their robustness under visual corruptions remains insufficiently studied. Existing evaluation paradigms ex...
- ReEXplore: Improving MLLMs for Embodied Exploration with Contextualized Retrospective Experience Replay : Abstract: Embodied exploration is a target-driven process that requires embodied agents to possess fine-grained perception and knowledge-enhanced decision making. While recent attempts leverage MLLMs ...
- Beyond Reward Margin: Rethinking and Resolving Likelihood Displacement in Diffusion Models via Video Generation : Abstract: Direct Preference Optimization (DPO) has shown promising results in aligning generative outputs with human preferences by distinguishing between chosen and rejected samples. However, a criti...
- LAA3D: A Benchmark of Detecting and Tracking Low-Altitude Aircraft in 3D Space : Abstract: Perception of Low-Altitude Aircraft (LAA) in 3D space enables precise 3D object localization and behavior understanding. However, datasets tailored for 3D LAA perception remain scarce. To ad...
- Granular Computing-driven SAM: From Coarse-to-Fine Guidance for Prompt-Free Segmentation : Abstract: Prompt-free image segmentation aims to generate accurate masks without manual guidance. Typical pre-trained models, notably the Segment Anything Model (SAM), generate prompts directly at a ...
- DEAP-3DSAM: Decoder Enhanced and Auto Prompt SAM for 3D Medical Image Segmentation : Abstract: The Segment Anything Model (SAM) has recently demonstrated significant potential in medical image segmentation. Although SAM is primarily trained on 2D images, attempts have been made to app...
- Graph-based 3D Human Pose Estimation using WiFi Signals : Abstract: WiFi-based human pose estimation (HPE) has attracted increasing attention due to its resilience to occlusion and its privacy-preserving nature compared to camera-based methods. However, existing WiFi-b...
- HABIT: Human Action Benchmark for Interactive Traffic in CARLA : Abstract: Current autonomous driving (AD) simulations are critically limited by their inadequate representation of realistic and diverse human behavior, which is essential for ensuring safety and reli...
- 3M-TI: High-Quality Mobile Thermal Imaging via Calibration-free Multi-Camera Cross-Modal Diffusion : Abstract: The miniaturization of thermal sensors for mobile platforms inherently limits their spatial resolution and textural fidelity, leading to blurry and less informative images. Existing thermal ...
- MonoSR: Open-Vocabulary Spatial Reasoning from Monocular Images : Abstract: Spatial reasoning (SR), the ability to infer 3D spatial information from 2D inputs, is essential for real-world applications such as embodied AI and autonomous driving. However, existing res...
- Growing with the Generator: Self-paced GRPO for Video Generation : Abstract: Group Relative Policy Optimization (GRPO) has emerged as a powerful reinforcement learning paradigm for post-training video generation models. However, existing GRPO pipelines rely on static...
- When Semantics Regulate: Rethinking Patch Shuffle and Internal Bias for Generated Image Detection with CLIP : Abstract: The rapid progress of GANs and Diffusion Models poses new challenges for detecting AI-generated images. Although CLIP-based detectors exhibit promising generalization, they often rely on sem...
- MambaRefine-YOLO: A Dual-Modality Small Object Detector for UAV Imagery : Abstract: Small object detection in Unmanned Aerial Vehicle (UAV) imagery is a persistent challenge, hindered by low resolution and background clutter. While fusing RGB and infrared (IR) data offers a...
- FilmSceneDesigner: Chaining Set Design for Procedural Film Scene Generation : Abstract: Film set design plays a pivotal role in cinematic storytelling and shaping the visual atmosphere. However, the traditional process depends on expert-driven manual modeling, which is labor-in...
- ABM-LoRA: Activation Boundary Matching for Fast Convergence in Low-Rank Adaptation : Abstract: We propose Activation Boundary Matching for Low-Rank Adaptation (ABM-LoRA), a principled initialization strategy that substantially accelerates the convergence of low-rank adapters. While Lo...
- Test-Time Preference Optimization for Image Restoration : Abstract: Image restoration (IR) models are typically trained to recover high-quality images using L1 or LPIPS loss. To handle diverse unknown degradations, zero-shot IR methods have also been introdu...
- MetroGS: Efficient and Stable Reconstruction of Geometrically Accurate High-Fidelity Large-Scale Scenes : Abstract: Recently, 3D Gaussian Splatting and its derivatives have achieved significant breakthroughs in large-scale scene reconstruction. However, how to efficiently and stably achieve high-quality g...
- Evaluating Deep Learning and Traditional Approaches Used in Source Camera Identification : Abstract: One of the most important tasks in computer vision is identifying the device with which an image was taken, useful for facilitating further comprehensive analysis of the image. This paper ...
- nnActive: A Framework for Evaluation of Active Learning in 3D Biomedical Segmentation : Abstract: Semantic segmentation is crucial for various biomedical applications, yet its reliance on large annotated datasets presents a bottleneck due to the high cost and specialized expertise requir...
- Three-Dimensional Anatomical Data Generation Based on Artificial Neural Networks : Abstract: Surgical planning and training based on machine learning requires a large amount of 3D anatomical models reconstructed from medical imaging, which is currently one of the major bottlenecks. ...
- Leveraging Metaheuristic Approaches to Improve Deep Learning Systems for Anxiety Disorder Detection : Abstract: Despite being among the most common psychological disorders, anxiety-related conditions are still primarily identified through subjective assessments, such as clinical interviews and self-ev...
- VideoCompressa: Data-Efficient Video Understanding via Joint Temporal Compression and Spatial Reconstruction : Abstract: The scalability of video understanding models is increasingly limited by the prohibitive storage and computational costs of large-scale video datasets. While data synthesis has improved data...
- FVAR: Visual Autoregressive Modeling via Next Focus Prediction : Abstract: Visual autoregressive models achieve remarkable generation quality through next-scale predictions across multi-scale token pyramids. However, the conventional method uses uniform scale downs...
- Robust Long-term Test-Time Adaptation for 3D Human Pose Estimation through Motion Discretization : Abstract: Online test-time adaptation addresses the train-test domain gap by adapting the model on unlabeled streaming test inputs before making the final prediction. However, online adaptation for 3D...
- Rethinking Long-tailed Dataset Distillation: A Uni-Level Framework with Unbiased Recovery and Relabeling : Abstract: Dataset distillation creates a small distilled set that enables efficient training by capturing key information from the full dataset. While existing dataset distillation methods perform wel...
- DualGazeNet: A Biologically Inspired Dual-Gaze Query Network for Salient Object Detection : Abstract: Recent salient object detection (SOD) methods aim to improve performance in four key directions: semantic enhancement, boundary refinement, auxiliary task supervision, and multi-modal fusion...
- HunyuanVideo 1.5 Technical Report : Abstract: We present HunyuanVideo 1.5, a lightweight yet powerful open-source video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters...
- Neural Texture Splatting: Expressive 3D Gaussian Splatting for View Synthesis, Geometry, and Dynamic Reconstruction : Abstract: 3D Gaussian Splatting (3DGS) has emerged as a leading approach for high-quality novel view synthesis, with numerous variants extending its applicability to a broad spectrum of 3D and 4D scen...
- Parallel Vision Token Scheduling for Fast and Accurate Multimodal LMMs Inference : Abstract: Multimodal large language models (MLLMs) deliver impressive vision-language reasoning but suffer steep inference latency because self-attention scales quadratically with sequence length and ...
- Facade Segmentation for Solar Photovoltaic Suitability : Abstract: Building integrated photovoltaic (BIPV) facades represent a promising pathway towards urban decarbonization, especially where roof areas are insufficient and ground-mounted arrays are infeas...
- MagicWorld: Interactive Geometry-driven Video World Exploration : Abstract: Recent interactive video world model methods generate scene evolution conditioned on user instructions. Although they achieve impressive results, two key limitations remain. First, they fail...
- MFmamba: A Multi-function Network for Panchromatic Image Resolution Restoration Based on State-Space Model : Abstract: Remote sensing images are becoming increasingly widespread in military applications and earth resource exploration. Because of the limitation of a single sensor, we can obtain high spatial resolution graysc...
- EventSTU: Event-Guided Efficient Spatio-Temporal Understanding for Video Large Language Models : Abstract: Video large language models have demonstrated strong video understanding capabilities but suffer from high inference costs due to the massive number of tokens in long videos. Inspired by eve...
- BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models : Abstract: Backdoor attacks undermine the reliability and trustworthiness of machine learning systems by injecting hidden behaviors that can be maliciously activated at inference time. While such threa...
- One4D: Unified 4D Generation and Reconstruction via Decoupled LoRA Control : Abstract: We present One4D, a unified framework for 4D generation and reconstruction that produces dynamic 4D content as synchronized RGB frames and pointmaps. By consistently handling varying sparsit...
- AttenDence: Maximizing Attention Confidence for Test Time Adaptation : Abstract: Test-time adaptation (TTA) enables models to adapt to distribution shifts at inference time. While entropy minimization over the output distribution has proven effective for TTA, transformer...
- FineXtrol: Controllable Motion Generation via Fine-Grained Text : Abstract: Recent works have sought to enhance the controllability and precision of text-driven motion generation. Some approaches leverage large language models (LLMs) to produce more detailed texts, ...
- Human-Centric Open-Future Task Discovery: Formulation, Benchmark, and Scalable Tree-Based Search : Abstract: Recent progress in robotics and embodied AI is largely driven by Large Multimodal Models (LMMs). However, a key challenge remains underexplored: how can we advance LMMs to discover tasks tha...
- VeCoR - Velocity Contrastive Regularization for Flow Matching : Abstract: Flow Matching (FM) has recently emerged as a principled and efficient alternative to diffusion models. Standard FM encourages the learned velocity field to follow a target direction; however...
- Leveraging Adversarial Learning for Pathological Fidelity in Virtual Staining : Abstract: In addition to evaluating tumor morphology using H&E staining, immunohistochemistry is used to assess the presence of specific proteins within the tissue. However, this is a costly and labor...
- Eevee: Towards Close-up High-resolution Video-based Virtual Try-on : Abstract: Video virtual try-on technology provides a cost-effective solution for creating marketing videos in fashion e-commerce. However, its practical adoption is hindered by two critical limitation...
- CataractCompDetect: Intraoperative Complication Detection in Cataract Surgery : Abstract: Cataract surgery is one of the most commonly performed surgeries worldwide, yet intraoperative complications such as iris prolapse, posterior capsule rupture (PCR), and vitreous loss remain ...
- Peregrine: One-Shot Fine-Tuning for FHE Inference of General Deep CNNs : Abstract: We address two fundamental challenges in adapting general deep CNNs for FHE-based inference: approximating non-linear activations such as ReLU with low-degree polynomials while minimizing ac...
- Zero-shot segmentation of skin tumors in whole-slide images with vision-language foundation models : Abstract: Accurate annotation of cutaneous neoplasm biopsies represents a major challenge due to their wide morphological variability, overlapping histological patterns, and the subtle distinctions be...
- UMCL: Unimodal-generated Multimodal Contrastive Learning for Cross-compression-rate Deepfake Detection : Abstract: In deepfake detection, the varying degrees of compression employed by social media platforms pose significant challenges for model generalization and reliability. Although existing methods h...
- View-Consistent Diffusion Representations for 3D-Consistent Video Generation : Abstract: Video generation models have made significant progress in generating realistic content, enabling applications in simulation, gaming, and film making. However, current generated videos still ...
- AuViRe: Audio-visual Speech Representation Reconstruction for Deepfake Temporal Localization : Abstract: With the rapid advancement of sophisticated synthetic audio-visual content, e.g., for subtle malicious manipulations, ensuring the integrity of digital media has become paramount. This work ...
- A Self-Conditioned Representation Guided Diffusion Model for Realistic Text-to-LiDAR Scene Generation : Abstract: Text-to-LiDAR generation can customize 3D data with rich structures and diverse scenes for downstream tasks. However, the scarcity of Text-LiDAR pairs often causes insufficient training prio...
- EgoControl: Controllable Egocentric Video Generation via 3D Full-Body Poses : Abstract: Egocentric video generation with fine-grained control through body motion is a key requirement towards embodied AI agents that can simulate, predict, and plan actions. In this work, we propo...
- Unified Spherical Frontend: Learning Rotation-Equivariant Representations of Spherical Images from Any Camera : Abstract: Modern perception increasingly relies on fisheye, panoramic, and other wide field-of-view (FoV) cameras, yet most pipelines still apply planar CNNs designed for pinhole imagery on 2D grids, ...
- Early Lung Cancer Diagnosis from Virtual Follow-up LDCT Generation via Correlational Autoencoder and Latent Flow Matching : Abstract: Lung cancer is one of the most commonly diagnosed cancers, and early diagnosis is critical because the survival rate declines sharply once the disease progresses to advanced stages. However,...
- InfiniBench: Infinite Benchmarking for Visual Spatial Reasoning with Customizable Scene Complexity : Abstract: Modern vision-language models (VLMs) are expected to have abilities of spatial reasoning with diverse scene complexities, but evaluating such abilities is difficult due to the lack of benchm...
- Generating Synthetic Human Blastocyst Images for In-Vitro Fertilization Blastocyst Grading : Abstract: The success of in vitro fertilization (IVF) at many clinics relies on the accurate morphological assessment of day 5 blastocysts, a process that is often subjective and inconsistent. While a...
- Large-Scale Pre-training Enables Multimodal AI Differentiation of Radiation Necrosis from Brain Metastasis Progression on Routine MRI : Abstract: Background: Differentiating radiation necrosis (RN) from tumor progression after stereotactic radiosurgery (SRS) remains a critical challenge in brain metastases. While histopathology repres...
- Parallel qMRI Reconstruction from 4x Accelerated Acquisitions : Abstract: Magnetic Resonance Imaging (MRI) acquisitions require extensive scan times, limiting patient throughput and increasing susceptibility to motion artifacts. Accelerated parallel MRI techniques...
- EgoVITA: Learning to Plan and Verify for Egocentric Video Reasoning : Abstract: Reasoning about intentions and actions from a first-person (egocentric) perspective remains a fundamental challenge for multimodal large language models (MLLMs). Unlike third-person (exocent...
- UniFlow: Towards Zero-Shot LiDAR Scene Flow for Autonomous Vehicles via Cross-Domain Generalization : Abstract: LiDAR scene flow is the task of estimating per-point 3D motion between consecutive point clouds. Recent methods achieve centimeter-level accuracy on popular autonomous vehicle (AV) datasets,...
- Sequence-Adaptive Video Prediction in Continuous Streams using Diffusion Noise Optimization : Abstract: In this work, we investigate diffusion-based video prediction models, which forecast future video frames, for continuous video streams. In this context, the models observe continuously new t...
- MammothModa2: A Unified AR-Diffusion Framework for Multimodal Understanding and Generation : Abstract: Unified multimodal models aim to integrate understanding and generation within a single framework, yet bridging the gap between discrete semantic reasoning and high-fidelity visual synthesis...
- SatSAM2: Motion-Constrained Video Object Tracking in Satellite Imagery using Promptable SAM2 and Kalman Priors : Abstract: Existing satellite video tracking methods often struggle with generalization, requiring scenario-specific training to achieve satisfactory performance, and are prone to track loss in the pre...
- Vision Token Masking Alone Cannot Prevent PHI Leakage in Medical Document OCR: A Systematic Evaluation : Abstract: Large vision-language models (VLMs) are increasingly deployed for optical character recognition (OCR) in healthcare settings, raising critical concerns about protected health information (PH...
- Point-to-Point: Sparse Motion Guidance for Controllable Video Editing : Abstract: Accurately preserving motion while editing a subject remains a core challenge in video editing tasks. Existing methods often face a trade-off between edit and motion fidelity, as they rely o...
- RoadSceneVQA: Benchmarking Visual Question Answering in Roadside Perception Systems for Intelligent Transportation System : Abstract: Current roadside perception systems mainly focus on instance-level perception, which falls short in enabling interaction via natural language and reasoning about traffic behaviors in context....
- DiVE-k: Differential Visual Reasoning for Fine-grained Image Recognition : Abstract: Large Vision Language Models (LVLMs) possess extensive text knowledge but struggle to utilize this knowledge for fine-grained image recognition, often failing to differentiate between visua...
- Stro-VIGRU: Defining the Vision Recurrent-Based Baseline Model for Brain Stroke Classification : Abstract: Stroke is a major cause of death and disability worldwide, and early recognition is one of the key elements of successful treatment. It is common to diagnose strokes using CT scanning...
- Optimal Pose Guidance for Stereo Calibration in 3D Deformation Measurement : Abstract: Stereo optical measurement techniques, such as digital image correlation (DIC), are widely used in 3D deformation measurement as non-contact, full-field measurement methods, in which stereo ...
- SciPostLayoutTree: A Dataset for Structural Analysis of Scientific Posters : Abstract: Scientific posters play a vital role in academic communication by presenting ideas through visual summaries. Analyzing reading order and parent-child relations of posters is essential for bu...
- ConsistCompose: Unified Multimodal Layout Control for Image Composition : Abstract: Unified multimodal models that couple visual understanding with image generation have advanced rapidly, yet most systems still focus on visual grounding (aligning language with image regions)...
- A Tri-Modal Dataset and a Baseline System for Tracking Unmanned Aerial Vehicles : Abstract: With the proliferation of low altitude unmanned aerial vehicles (UAVs), visual multi-object tracking is becoming a critical security technology, demanding significant robustness even in comp...
- FlowPortal: Residual-Corrected Flow for Training-Free Video Relighting and Background Replacement : Abstract: Video relighting with background replacement is a challenging task critical for applications in film production and creative media. Existing methods struggle to balance temporal consistency,...
- MagicWand: A Universal Agent for Generation and Evaluation Aligned with User Preference : Abstract: Recent advances in AIGC (Artificial Intelligence Generated Content) models have enabled significant progress in image and video generation. However, users still struggle to obtain content th...
- TRANSPORTER: Transferring Visual Semantics from VLM Manifolds : Abstract: How do video understanding models acquire their answers? Although current Vision Language Models (VLMs) reason over complex scenes with diverse objects, action performances, and scene dynami...
- Alias-free 4D Gaussian Splatting : Abstract: Existing dynamic scene reconstruction methods based on Gaussian Splatting enable real-time rendering and generate realistic images. However, adjusting the camera's focal length or the distan...
- MimiCAT: Mimic with Correspondence-Aware Cascade-Transformer for Category-Free 3D Pose Transfer : Abstract: 3D pose transfer aims to transfer the pose-style of a source mesh to a target character while preserving both the target's geometry and the source's pose characteristic. Existing methods are...
- MASS: Motion-Aware Spatial-Temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models : Abstract: Vision Language Models (VLMs) perform well on standard video tasks but struggle with physics-driven reasoning involving motion dynamics and spatial interactions. This limitation reduces thei...
- Synthetic Curriculum Reinforces Compositional Text-to-Image Generation : Abstract: Text-to-Image (T2I) generation has long been an open problem, with compositional synthesis remaining particularly challenging. This task requires accurate rendering of complex scenes contain...
- RNN as Linear Transformer: A Closer Investigation into Representational Potentials of Visual Mamba Models : Abstract: Mamba has recently garnered attention as an effective backbone for vision tasks. However, its underlying mechanism in visual domains remains poorly understood. In this work, we systematicall...
- ViMix-14M: A Curated Multi-Source Video-Text Dataset with Long-Form, High-Quality Captions and Crawl-Free Access : Abstract: Text-to-video generation has surged in interest since Sora, yet open-source models still face a data bottleneck: there is no large, high-quality, easily obtainable video-text corpus. Existin...
- SegSplat: Feed-forward Gaussian Splatting and Open-Set Semantic Segmentation : Abstract: We have introduced SegSplat, a novel framework designed to bridge the gap between rapid, feed-forward 3D reconstruction and rich, open-vocabulary semantic understanding. By constructing a co...
- Exploring Weak-to-Strong Generalization for CLIP-based Classification : Abstract: Aligning large-scale commercial models with user intent is crucial to preventing harmful outputs. Current methods rely on human supervision but become impractical as model complexity increas...
- ChineseVideoBench: Benchmarking Multi-modal Large Models for Chinese Video Question Answering : Abstract: This paper introduces ChineseVideoBench, a pioneering benchmark specifically designed for evaluating Multimodal Large Language Models (MLLMs) in Chinese Video Question Answering. The growing...
- 4D-VGGT: A General Foundation Model with SpatioTemporal Awareness for Dynamic Scene Geometry Estimation : Abstract: We investigate a challenging task of dynamic scene geometry estimation, which requires representing both spatial and temporal features. Typically, existing methods align the two features int...
- CrossJEPA: Cross-Modal Joint-Embedding Predictive Architecture for Efficient 3D Representation Learning from 2D Images : Abstract: Image-to-point cross-modal learning has emerged to address the scarcity of large-scale 3D datasets in 3D representation learning. However, current methods that leverage 2D data often result ...
- LungX: A Hybrid EfficientNet-Vision Transformer Architecture with Multi-Scale Attention for Accurate Pneumonia Detection : Abstract: Pneumonia remains a leading global cause of mortality where timely diagnosis is critical. We introduce LungX, a novel hybrid architecture combining EfficientNet's multi-scale features, CBAM ...
- When Generative Replay Meets Evolving Deepfakes: Domain-Aware Relative Weighting for Incremental Face Forgery Detection : Abstract: The rapid advancement of face generation techniques has led to a growing variety of forgery methods. Incremental forgery detection aims to gradually update existing models with new forgery d...
- Perceptual-Evidence Anchored Reinforced Learning for Multimodal Reasoning : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has significantly advanced the reasoning capabilities of Large Language Models (LLMs) and is now being applied to Vision-Language Models...
- ReCoGS: Real-time ReColoring for Gaussian Splatting scenes : Abstract: Gaussian Splatting has emerged as a leading method for novel view synthesis, offering superior training efficiency and real-time inference compared to NeRF approaches, while still delivering...
- SineProject: Machine Unlearning for Stable Vision Language Alignment : Abstract: Multimodal Large Language Models (MLLMs) increasingly need to forget specific knowledge such as unsafe or private information without requiring full retraining. However, existing unlearning ...
- EventBench: Towards Comprehensive Benchmarking of Event-based MLLMs : Abstract: Multimodal large language models (MLLMs) have made significant advancements in event-based vision, yet the comprehensive evaluation of their capabilities within a unified benchmark remains l...
- NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering : Abstract: Vision Foundation Models (VFMs) extract spatially downsampled representations, posing challenges for pixel-level tasks. Existing upsampling approaches face a fundamental trade-off: classical...
- Alternating Perception-Reasoning for Hallucination-Resistant Video Understanding : Abstract: Sufficient visual perception is the foundation of video reasoning. Nevertheless, existing Video Reasoning LLMs suffer from perception shortcuts, relying on a flawed single-step perception pa...
- Gaze Beyond the Frame: Forecasting Egocentric 3D Visual Span : Abstract: People continuously perceive and interact with their surroundings based on underlying intentions that drive their exploration and behaviors. While research in egocentric user and scene under...
- Robust Posterior Diffusion-based Sampling via Adaptive Guidance Scale : Abstract: Diffusion models have recently emerged as powerful generative priors for solving inverse problems, achieving state-of-the-art results across various imaging tasks. A central challenge in thi...
- Uncertainty Quantification in HSI Reconstruction using Physics-Aware Diffusion Priors and Optics-Encoded Measurements : Abstract: Hyperspectral image reconstruction from a compressed measurement is a highly ill-posed inverse problem. Current data-driven methods suffer from hallucination due to the lack of spectral dive...
- Extreme Model Compression for Edge Vision-Language Models: Sparse Temporal Token Fusion and Adaptive Neural Compression : Abstract: The demand for edge AI in vision-language tasks requires models that achieve real-time performance on resource-constrained devices with limited power and memory. This paper proposes two adap...
- LRDUN: A Low-Rank Deep Unfolding Network for Efficient Spectral Compressive Imaging : Abstract: Deep unfolding networks (DUNs) have achieved remarkable success and become the mainstream paradigm for spectral compressive imaging (SCI) reconstruction. Existing DUNs are derived from full-...
- Unified Deep Learning Platform for Dust and Fault Diagnosis in Solar Panels Using Thermal and Visual Imaging : Abstract: Solar energy is one of the most abundant and widely tapped sources of renewable energy, with enormous future potential. Solar panel output can vary widely with factors like intensity, temperature,...
- Breaking Forgetting: Training-Free Few-Shot Class-Incremental Learning via Conditional Diffusion : Abstract: Efforts to overcome catastrophic forgetting in Few-Shot Class-Incremental Learning (FSCIL) have primarily focused on developing more effective gradient-based optimization strategies. In cont...
- DE-KAN: A Kolmogorov Arnold Network with Dual Encoder for accurate 2D Teeth Segmentation : Abstract: Accurate segmentation of individual teeth from panoramic radiographs remains a challenging task due to anatomical variations, irregular tooth shapes, and overlapping structures. These comple...
- HiFi-MambaV2: Hierarchical Shared-Routed MoE for High-Fidelity MRI Reconstruction : Abstract: Reconstructing high-fidelity MR images from undersampled k-space data requires recovering high-frequency details while maintaining anatomical coherence. We present HiFi-MambaV2, a hierarchic...
- Zero-Shot Video Deraining with Video Diffusion Models : Abstract: Existing video deraining methods are often trained on paired datasets, either synthetic, which limits their ability to generalize to real-world rain, or captured by static cameras, which res...
- C3Po: Cross-View Cross-Modality Correspondence by Pointmap Prediction : Abstract: Geometric models like DUSt3R have shown great advances in understanding the geometry of a scene from pairs of photos. However, they fail when the inputs are from vastly different viewpoints ...
- PhysGS: Bayesian-Inferred Gaussian Splatting for Physical Property Estimation : Abstract: Understanding physical properties such as friction, stiffness, hardness, and material composition is essential for enabling robots to interact safely and effectively with their surroundings....
- Zero-Reference Joint Low-Light Enhancement and Deblurring via Visual Autoregressive Modeling with VLM-Derived Modulation : Abstract: Real-world dark images commonly exhibit not only low visibility and contrast but also complex noise and blur, posing significant restoration challenges. Existing methods often rely on paired...
- NeAR: Coupled Neural Asset-Renderer Stack : Abstract: Neural asset authoring and neural rendering have emerged as fundamentally disjoint threads: one generates digital assets using neural networks for traditional graphics pipelines, while the o...
- RigAnyFace: Scaling Neural Facial Mesh Auto-Rigging with Unlabeled Data : Abstract: In this paper, we present RigAnyFace (RAF), a scalable neural auto-rigging framework for facial meshes of diverse topologies, including those with multiple disconnected components. RAF defor...
- From Healthy Scans to Annotated Tumors: A Tumor Fabrication Framework for 3D Brain MRI Synthesis : Abstract: The scarcity of annotated Magnetic Resonance Imaging (MRI) tumor data presents a major obstacle to accurate and automated tumor segmentation. While existing data synthesis methods offer prom...
- Robust Physical Adversarial Patches Using Dynamically Optimized Clusters : Abstract: Physical adversarial attacks on deep learning systems are concerning due to the ease of deploying such attacks, usually by placing an adversarial patch in a scene to manipulate the outcomes o...
- Data Augmentation Strategies for Robust Lane Marking Detection : Abstract: Robust lane detection is essential for advanced driver assistance and autonomous driving, yet models trained on public datasets such as CULane often fail to generalise across different camer...
- Sphinx: Efficiently Serving Novel View Synthesis using Regression-Guided Selective Refinement : Abstract: Novel View Synthesis (NVS) is the task of generating new images of a scene from viewpoints that were not part of the original input. Diffusion-based NVS can generate high-quality, temporally...
- Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers : Abstract: Recent advances in diffusion transformers have shown remarkable generalization in visual synthesis, yet most dense perception methods still rely on text-to-image (T2I) generators designed fo...
- A Theory-Inspired Framework for Few-Shot Cross-Modal Sketch Person Re-Identification : Abstract: Sketch based person re-identification aims to match hand-drawn sketches with RGB surveillance images, but remains challenging due to significant modality gaps and limited annotated data. To ...
- Neural Geometry Image-Based Representations with Optimal Transport (OT) : Abstract: Neural representations for 3D meshes are emerging as an effective solution for compact storage and efficient processing. Existing methods often rely on neural overfitting, where a coarse mes...
- Hierarchical GraphCut Phase Unwrapping based on Invariance of Diffeomorphisms Framework : Abstract: Recent years have witnessed rapid advancements in 3D scanning technologies, with applications spanning VR/AR, digital human creation, and medical imaging. Structured-light scanning with phas...
- Now You See It, Now You Don't - Instant Concept Erasure for Safe Text-to-Image and Video Generation : Abstract: Robust concept removal for text-to-image (T2I) and text-to-video (T2V) models is essential for their safe deployment. Existing methods, however, suffer from costly retraining, inference over...
- Beyond Description: Cognitively Benchmarking Fine-Grained Action for Embodied Agents : Abstract: Multimodal Large Language Models (MLLMs) show promising results as decision-making engines for embodied agents operating in complex, physical environments. However, existing benchmarks often...
- EVCC: Enhanced Vision Transformer-ConvNeXt-CoAtNet Fusion for Classification : Abstract: Hybrid vision architectures combining Transformers and CNNs have significantly advanced image classification, but they usually do so at significant computational cost. We introduce EVCC (Enh...
- Exploring Surround-View Fisheye Camera 3D Object Detection : Abstract: In this work, we explore the technical feasibility of implementing end-to-end 3D object detection (3DOD) with a surround-view fisheye camera system. Specifically, we first investigate the perf...
- CoD: A Diffusion Foundation Model for Image Compression : Abstract: Existing diffusion codecs typically build on text-to-image diffusion foundation models like Stable Diffusion. However, text conditioning is suboptimal from a compression perspective, hinderi...
- DriveFlow: Rectified Flow Adaptation for Robust 3D Object Detection in Autonomous Driving : Abstract: In autonomous driving, vision-centric 3D object detection recognizes and localizes 3D objects from RGB images. However, due to high annotation costs and diverse outdoor scenes, training data...
- Seeing What Matters: Visual Preference Policy Optimization for Visual Generation : Abstract: Reinforcement learning (RL) has become a powerful tool for post-training visual generative models, with Group Relative Policy Optimization (GRPO) increasingly used to align generators with h...
- GuideFlow: Constraint-Guided Flow Matching for Planning in End-to-End Autonomous Driving : Abstract: Driving planning is a critical component of end-to-end (E2E) autonomous driving. However, prevailing Imitative E2E Planners often suffer from multimodal trajectory mode collapse, failing to ...
- From Features to Reference Points: Lightweight and Adaptive Fusion for Cooperative Autonomous Driving : Abstract: We present RefPtsFusion, a lightweight and interpretable framework for cooperative autonomous driving. Instead of sharing large feature maps or query embeddings, vehicles exchange compact re...
- VAOT: Vessel-Aware Optimal Transport for Retinal Fundus Enhancement : Abstract: Color fundus photography (CFP) is central to diagnosing and monitoring retinal disease, yet its acquisition variability (e.g., illumination changes) often degrades image quality, which motiv...
- NI-Tex: Non-isometric Image-based Garment Texture Generation : Abstract: Existing industrial 3D garment meshes already cover most real-world clothing geometries, yet their texture diversity remains limited. To acquire more realistic textures, generative methods a...
- STCDiT: Spatio-Temporally Consistent Diffusion Transformer for High-Quality Video Super-Resolution : Abstract: We present STCDiT, a video super-resolution framework built upon a pre-trained video diffusion model, aiming to restore structurally faithful and temporally stable videos from degraded input...
- StereoDETR: Stereo-based Transformer for 3D Object Detection : Abstract: Compared to monocular 3D object detection, stereo-based 3D methods offer significantly higher accuracy but still suffer from high computational overhead and latency. The state-of-the-art ste...
- Scale What Counts, Mask What Matters: Evaluating Foundation Models for Zero-Shot Cross-Domain Wi-Fi Sensing : Abstract: While Wi-Fi sensing offers a compelling, privacy-preserving alternative to cameras, its practical utility has been fundamentally undermined by a lack of robustness across domains. Models tra...
- PartDiffuser: Part-wise 3D Mesh Generation via Discrete Diffusion : Abstract: Existing autoregressive (AR) methods for generating artist-designed meshes struggle to balance global structural consistency with high-fidelity local details, and are susceptible to error ac...
- TPG-INR: Target Prior-Guided Implicit 3D CT Reconstruction for Enhanced Sparse-view Imaging : Abstract: X-ray imaging, based on penetration, enables detailed visualization of internal structures. Building on this capability, existing implicit 3D reconstruction methods have adapted the NeRF mod...
- DetAny4D: Detect Anything 4D Temporally in a Streaming RGB Video : Abstract: Reliable 4D object detection, which refers to 3D object detection in streaming video, is crucial for perceiving and understanding the real world. Existing open-set 4D object detection method...
- SupLID: Geometrical Guidance for Out-of-Distribution Detection in Semantic Segmentation : Abstract: Out-of-Distribution (OOD) detection in semantic segmentation aims to localize anomalous regions at the pixel level, advancing beyond traditional image-level OOD techniques to better suit rea...
- Disc3D: Automatic Curation of High-Quality 3D Dialog Data via Discriminative Object Referring : Abstract: 3D Multi-modal Large Language Models (MLLMs) still lag behind their 2D peers, largely because large-scale, high-quality 3D scene-dialogue datasets remain scarce. Prior efforts hinge on expen...
- DiP: Taming Diffusion Models in Pixel Space : Abstract: Diffusion models face a fundamental trade-off between generation quality and computational efficiency. Latent Diffusion Models (LDMs) offer an efficient solution but suffer from potential in...
- VideoPerceiver: Enhancing Fine-Grained Temporal Perception in Video Multimodal Large Language Models : Abstract: We propose VideoPerceiver, a novel video multimodal large language model (VMLLM) that enhances fine-grained perception in video understanding, addressing VMLLMs' limited ability to reason ab...
- Q-Save: Towards Scoring and Attribution for Generated Video Evaluation : Abstract: We present Q-Save, a new benchmark dataset and model for holistic and explainable evaluation of AI-generated video (AIGV) quality. The dataset contains nearly 10,000 videos, each annotated with...
- SPIDER: Spatial Image CorresponDence Estimator for Robust Calibration : Abstract: Reliable image correspondences form the foundation of vision-based spatial perception, enabling recovery of 3D structure and camera poses. However, unconstrained feature matching across doma...
- CORA: Consistency-Guided Semi-Supervised Framework for Reasoning Segmentation : Abstract: Reasoning segmentation seeks pixel-accurate masks for targets referenced by complex, often implicit instructions, requiring context-dependent reasoning over the scene. Recent multimodal lang...
- Latent Dirichlet Transformer VAE for Hyperspectral Unmixing with Bundled Endmembers : Abstract: Hyperspectral images capture rich spectral information that enables per-pixel material identification; however, spectral mixing often obscures pure material signatures. To address this chall...
- Deepfake Geography: Detecting AI-Generated Satellite Images : Abstract: The rapid advancement of generative models such as StyleGAN2 and Stable Diffusion poses a growing threat to the authenticity of satellite imagery, which is increasingly vital for reliable an...
- Target-Bench: Can World Models Achieve Mapless Path Planning with Semantic Targets? : Abstract: While recent world models generate highly realistic videos, their ability to perform robot path planning remains unclear and unquantified. We introduce Target-Bench, the first benchmark spec...
- QAL: A Loss for Recall Precision Balance in 3D Reconstruction : Abstract: Volumetric learning underpins many 3D vision tasks such as completion, reconstruction, and mesh generation, yet training objectives still rely on Chamfer Distance (CD) or Earth Mover's Dista...
- Show Me: Unifying Instructional Image and Video Generation with Diffusion Models : Abstract: Generating visual instructions in a given context is essential for developing interactive world simulators. While prior works address this problem through either text-guided image manipulati...
- JigsawComm: Joint Semantic Feature Encoding and Transmission for Communication-Efficient Cooperative Perception : Abstract: Multi-agent cooperative perception (CP) promises to overcome the inherent occlusion and sensing-range limitations of single-agent systems (e.g., autonomous driving). However, its practicalit...
- ArticFlow: Generative Simulation of Articulated Mechanisms : Abstract: Recent advances in generative models have produced strong results for static 3D shapes, whereas articulated 3D generation remains challenging due to action-dependent deformations and limited...
- MINDiff: Mask-Integrated Negative Attention for Controlling Overfitting in Text-to-Image Personalization : Abstract: In the personalization process of large-scale text-to-image models, overfitting often occurs when learning a specific subject from a limited number of images. Existing methods, such as DreamBo...
- CUS-GS: A Compact Unified Structured Gaussian Splatting Framework for Multimodal Scene Representation : Abstract: Recent advances in Gaussian Splatting based 3D scene representation have shown two major trends: semantics-oriented approaches that focus on high-level understanding but lack explicit 3D geo...
- Frequency-Adaptive Sharpness Regularization for Improving 3D Gaussian Splatting Generalization : Abstract: Despite 3D Gaussian Splatting (3DGS) excelling in most configurations, it lacks generalization across novel viewpoints in a few-shot scenario because it overfits to the sparse observations. ...
- UniRSCD: A Unified Novel Architectural Paradigm for Remote Sensing Change Detection : Abstract: In recent years, remote sensing change detection has garnered significant attention due to its critical role in resource monitoring and disaster assessment. Change detection tasks exist with...
- Novel View Synthesis from A Few Glimpses via Test-Time Natural Video Completion : Abstract: Given just a few glimpses of a scene, can you imagine the movie playing out as the camera glides through it? That's the lens we take on sparse-input novel view synthesis, not only as ...
- V2X-RECT: An Efficient V2X Trajectory Prediction Framework via Redundant Interaction Filtering and Tracking Error Correction : Abstract: V2X prediction can alleviate perception incompleteness caused by limited line of sight through fusing trajectory data from infrastructure and vehicles, which is crucial to traffic safety and...
- SciEducator: Scientific Video Understanding and Educating via Deming-Cycle Multi-Agent System : Abstract: Recent advancements in multimodal large language models (MLLMs) and video agent systems have significantly improved general video understanding. However, when applied to scientific video und...
- Test-Time Temporal Sampling for Efficient MLLM Video Understanding : Abstract: Processing long videos with multimodal large language models (MLLMs) poses a significant computational challenge, as the model's self-attention mechanism scales quadratically with the number...
- Multi-speaker Attention Alignment for Multimodal Social Interaction : Abstract: Understanding social interaction in video requires reasoning over a dynamic interplay of verbal and non-verbal cues: who is speaking, to whom, and with what gaze or gestures. While Multimoda...
- HEAL: Learning-Free Source Free Unsupervised Domain Adaptation for Cross-Modality Medical Image Segmentation : Abstract: Growing demands for clinical data privacy and storage constraints have spurred advances in Source Free Unsupervised Domain Adaptation (SFUDA). SFUDA addresses the domain shift by adapting mo...
- X-ReID: Multi-granularity Information Interaction for Video-Based Visible-Infrared Person Re-Identification : Abstract: Large-scale vision-language models (e.g., CLIP) have recently achieved remarkable performance in retrieval tasks, yet their potential for Video-based Visible-Infrared Person Re-Identificatio...
- Signal: Selective Interaction and Global-local Alignment for Multi-Modal Object Re-Identification : Abstract: Multi-modal object Re-IDentification (ReID) is devoted to retrieving specific objects through the exploitation of complementary multi-modal image information. Existing methods mainly concent...
- CADTrack: Learning Contextual Aggregation with Deformable Alignment for Robust RGBT Tracking : Abstract: RGB-Thermal (RGBT) tracking aims to exploit visible and thermal infrared modalities for robust all-weather object tracking. However, existing RGBT trackers struggle to resolve modality discr...
- Adversarial Pseudo-replay for Exemplar-free Class-incremental Learning : Abstract: Exemplar-free class-incremental learning (EFCIL) aims to retain old knowledge acquired in the previous task while learning new classes, without storing the previous images due to storage con...
- FeRA: Frequency-Energy Constrained Routing for Effective Diffusion Adaptation Fine-Tuning : Abstract: Diffusion models have achieved remarkable success in generative modeling, yet how to effectively adapt large pretrained models to new tasks remains challenging. We revisit the reconstruction...
- HyM-UNet: Synergizing Local Texture and Global Context via Hybrid CNN-Mamba Architecture for Medical Image Segmentation : Abstract: Accurate organ and lesion segmentation is a critical prerequisite for computer-aided diagnosis. Convolutional Neural Networks (CNNs), constrained by their local receptive fields, often strug...
- SD-PSFNet: Sequential and Dynamic Point Spread Function Network for Image Deraining : Abstract: Image deraining is crucial for vision applications but is challenged by the complex multi-scale physics of rain and its coupling with scenes. To address this challenge, a novel approach insp...
- RAISECity: A Multimodal Agent Framework for Reality-Aligned 3D World Generation at City-Scale : Abstract: City-scale 3D generation is of great importance for the development of embodied intelligence and world models. Existing methods, however, face significant challenges regarding quality, fidel...
- Is Complete Labeling Necessary? Understanding Active Learning in Longitudinal Medical Imaging : Abstract: Detecting changes in longitudinal medical imaging using deep learning requires a substantial amount of accurately labeled data. However, labeling these images is notably more costly and time...
- RoadBench: Benchmarking MLLMs on Fine-Grained Spatial Understanding and Reasoning under Urban Road Scenarios : Abstract: Multimodal large language models (MLLMs) have demonstrated powerful capabilities in general spatial understanding and reasoning. However, their fine-grained spatial understanding and reasoni...
- State and Scene Enhanced Prototypes for Weakly Supervised Open-Vocabulary Object Detection : Abstract: Open-Vocabulary Object Detection (OVOD) aims to generalize object recognition to novel categories, while Weakly Supervised OVOD (WS-OVOD) extends this by combining box-level annotations with...
- MambaX: Image Super-Resolution with State Predictive Control : Abstract: Image super-resolution (SR) is a critical technology for overcoming the inherent hardware limitations of sensors. However, existing approaches mainly focus on directly enhancing the final re...
- Hybrid Event Frame Sensors: Modeling, Calibration, and Simulation : Abstract: Event frame hybrid sensors integrate an Active Pixel Sensor (APS) and an Event Vision Sensor (EVS) within a single chip, combining the high dynamic range and low latency of the EVS with the ...
- UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios : Abstract: Diffusion transformers have recently delivered strong text-to-image generation around 1K resolution, but we show that extending them to native 4K across diverse aspect ratios exposes a tight...
- Hierarchical Semi-Supervised Active Learning for Remote Sensing : Abstract: The performance of deep learning models in remote sensing (RS) strongly depends on the availability of high-quality labeled data. However, collecting large-scale annotations is costly and ti...
- A Lightweight, Interpretable Deep Learning System for Automated Detection of Cervical Adenocarcinoma In Situ (AIS) : Abstract: Cervical adenocarcinoma in situ (AIS) is a critical premalignant lesion whose accurate histopathological diagnosis is challenging. Early detection is essential to prevent progression to inva...
- VK-Det: Visual Knowledge Guided Prototype Learning for Open-Vocabulary Aerial Object Detection : Abstract: To identify objects beyond predefined categories, open-vocabulary aerial object detection (OVAD) leverages the zero-shot capabilities of visual-language models (VLMs) to generalize from base...
- ActDistill: General Action-Guided Self-Derived Distillation for Efficient Vision-Language-Action Models : Abstract: Recent Vision-Language-Action (VLA) models have shown impressive flexibility and generalization, yet their deployment in robotic manipulation remains limited by heavy computational overhead ...
- Less Is More: An Explainable AI Framework for Lightweight Malaria Classification : Abstract: Background and Objective: Deep learning models have high computational needs and lack interpretability but are often the first choice for medical image classification tasks. This study addre...
- Together, Then Apart: Revisiting Multimodal Survival Analysis via a Min-Max Perspective : Abstract: Integrating heterogeneous modalities such as histopathology and genomics is central to advancing survival analysis, yet most existing methods prioritize cross-modal alignment through attenti...
- Versatile Recompression-Aware Perceptual Image Super-Resolution : Abstract: Perceptual image super-resolution (SR) methods restore degraded images and produce sharp outputs. In practice, those outputs are usually recompressed for storage and transmission. Ignoring r...
- Spotlight: Identifying and Localizing Video Generation Errors Using VLMs : Abstract: Current text-to-video models (T2V) can generate high-quality, temporally coherent, and visually realistic videos. Nonetheless, errors still often occur, and are more nuanced and local compar...
- Consolidating Diffusion-Generated Video Detection with Unified Multimodal Forgery Learning : Abstract: The proliferation of videos generated by diffusion models has raised increasing concerns about information security, highlighting the urgent need for reliable detection of synthetic media. E...
- Muskie: Multi-view Masked Image Modeling for 3D Vision Pre-training : Abstract: We present Muskie, a native multi-view vision backbone designed for 3D vision tasks. Unlike existing models, which are frame-wise and exhibit limited multi-view consistency, Muskie is design...
- PromptMoE: Generalizable Zero-Shot Anomaly Detection via Visually-Guided Prompt Mixtures : Abstract: Zero-Shot Anomaly Detection (ZSAD) aims to identify and localize anomalous regions in images of unseen object classes. While recent methods based on vision-language models like CLIP show pro...
- MVS-TTA: Test-Time Adaptation for Multi-View Stereo via Meta-Auxiliary Learning : Abstract: Recent learning-based multi-view stereo (MVS) methods are data-driven and have achieved remarkable progress due to large-scale training data and advanced architectures. However, their genera...
- SFHand: A Streaming Framework for Language-guided 3D Hand Forecasting and Embodied Manipulation : Abstract: Real-time 3D hand forecasting is a critical component for fluid human-computer interaction in applications like AR and assistive robotics. However, existing methods are ill-suited for these ...
- Video4Edit: Viewing Image Editing as a Degenerate Temporal Process : Abstract: We observe that recent advances in multimodal foundation models have propelled instruction-driven image generation and editing into a genuinely cross-modal, cooperative regime. Nevertheless,...
- Compact neural networks for astronomy with optimal transport bias correction : Abstract: Astronomical imaging confronts an efficiency-resolution tradeoff that limits large-scale morphological classification and redshift prediction. We introduce WaveletMamba, a theory-driven fram...
- Matching-Based Few-Shot Semantic Segmentation Models Are Interpretable by Design : Abstract: Few-Shot Semantic Segmentation (FSS) models achieve strong performance in segmenting novel classes with minimal labeled examples, yet their decision-making processes remain largely opaque. W...
- VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format : Abstract: Recent research on video large language models (VideoLLM) predominantly focuses on model architectures and training datasets, leaving the interaction format between the user and the model un...
- BCWildfire: A Long-term Multi-factor Dataset and Deep Learning Benchmark for Boreal Wildfire Risk Prediction : Abstract: Wildfire risk prediction remains a critical yet challenging task due to the complex interactions among fuel conditions, meteorology, topography, and human activity. Despite growing interest ...
- 3D Ground Truth Reconstruction from Multi-Camera Annotations Using UKF : Abstract: Accurate 3D ground truth estimation is critical for applications such as autonomous navigation, surveillance, and robotics. This paper introduces a novel method that uses an Unscented Kalman...
- Foundational Question Generation for Video Question Answering via an Embedding-Integrated Approach : Abstract: Conventional VQA approaches primarily rely on question-answer (Q&A) pairs to learn the spatio-temporal dynamics of video content. However, most existing annotations are event-centric, which ...
- Rethinking the Encoding and Annotating of 3D Bounding Box: Corner-Aware 3D Object Detection from Point Clouds : Abstract: Center-aligned regression remains dominant in LiDAR-based 3D object detection, yet it suffers from fundamental instability: object centers often fall in sparse or empty regions of the bird's...
- BD-Net: Has Depth-Wise Convolution Ever Been Applied in Binary Neural Networks? : Abstract: Recent advances in model compression have highlighted the potential of low-bit precision techniques, with Binary Neural Networks (BNNs) attracting attention for their extreme efficiency. How...
- Efficient Score Pre-computation for Diffusion Models via Cross-Matrix Krylov Projection : Abstract: This paper presents a novel framework to accelerate score-based diffusion models. It first converts the standard stable diffusion model into the Fokker-Planck formulation which results in so...
- TSRE: Channel-Aware Typical Set Refinement for Out-of-Distribution Detection : Abstract: Out-of-Distribution (OOD) detection is a critical capability for ensuring the safe deployment of machine learning models in open-world environments, where unexpected or anomalous inputs can ...
- MedPEFT-CL: Dual-Phase Parameter-Efficient Continual Learning with Medical Semantic Adapter and Bidirectional Memory Consolidation : Abstract: Medical vision-language segmentation models suffer from catastrophic forgetting when adapting to new anatomical structures, requiring complete retraining that limits their clinical deploymen...
- Person Recognition in Aerial Surveillance: A Decade Survey : Abstract: The rapid emergence of airborne platforms and imaging sensors is enabling new forms of aerial surveillance due to their unprecedented advantages in scale, mobility, deployment, and covert ob...
- Vision-Motion-Reference Alignment for Referring Multi-Object Tracking via Multi-Modal Large Language Models : Abstract: Referring Multi-Object Tracking (RMOT) extends conventional multi-object tracking (MOT) by introducing natural language references for multi-modal fusion tracking. RMOT benchmarks only descr...
- Can Vision-Language Models Count? A Synthetic Benchmark and Analysis of Attention-Based Interventions : Abstract: Recent research suggests that Vision Language Models (VLMs) often rely on inherent biases learned during training when responding to queries about visual properties of images. These biases a...
- AngioDG: Interpretable Channel-informed Feature-modulated Single-source Domain Generalization for Coronary Vessel Segmentation in X-ray Angiography : Abstract: Cardiovascular diseases are the leading cause of death globally, with X-ray Coronary Angiography (XCA) as the gold standard during real-time cardiac interventions. Segmentation of coronary v...
- The Potential and Limitations of Vision-Language Models for Human Motion Understanding: A Case Study in Data-Driven Stroke Rehabilitation : Abstract: Vision-language models (VLMs) have demonstrated remarkable performance across a wide range of computer-vision tasks, sparking interest in their potential for digital health applications. Her...
- Towards Open-Ended Visual Scientific Discovery with Sparse Autoencoders : Abstract: Scientific archives now contain hundreds of petabytes of data across genomics, ecology, climate, and molecular biology that could reveal undiscovered patterns if systematically analyzed at s...
- Health App Reviews for Privacy & Trust (HARPT): A Corpus for Analyzing Patient Privacy Concerns, Trust in Providers and Trust in Applications : Abstract: Background: User reviews of Telehealth and Patient Portal mobile applications (apps) hereon referred to as electronic health (eHealth) apps are a rich source of unsolicited patient feedback,...
- Newton-Flow Particle Filters based on Generalized Cramér Distance : Abstract: We propose a recursive particle filter for high-dimensional problems that inherently never degenerates. The state estimate is represented by deterministic low-discrepancy particle sets. We f...
- Multimodal Generative Flows for LHC Jets : Abstract: Generative modeling of high-energy collisions at the Large Hadron Collider (LHC) offers a data-driven route to simulations, anomaly detection, among other applications. A central challenge l...
- SCARE: A Benchmark for SQL Correction and Question Answerability Classification for Reliable EHR Question Answering : Abstract: Recent advances in Large Language Models (LLMs) have enabled the development of text-to-SQL models that allow clinicians to query structured data stored in Electronic Health Records (EHRs) u...
- Community-Aligned Behavior Under Uncertainty: Evidence of Epistemic Stance Transfer in LLMs : Abstract: When large language models (LLMs) are aligned to a specific online community, do they exhibit generalizable behavioral patterns that mirror that community's attitudes and responses to new un...
- Random Text, Zipf's Law, Critical Length, and Implications for Large Language Models : Abstract: We study a deliberately simple, fully non-linguistic model of text: a sequence of independent draws from a finite alphabet of letters plus a single space symbol. A word is defined as a maxim...
- Computational frame analysis revisited: On LLMs for studying news coverage : Abstract: Computational approaches have previously shown various promises and pitfalls when it comes to the reliable identification of media frames. Generative LLMs like GPT and Claude are increasingl...
- PoETa v2: Toward More Robust Evaluation of Large Language Models in Portuguese : Abstract: Large Language Models (LLMs) exhibit significant variations in performance across linguistic and cultural contexts, underscoring the need for systematic evaluation in diverse languages. In t...
- L2V-CoT: Cross-Modal Transfer of Chain-of-Thought Reasoning via Latent Intervention : Abstract: Recently, Chain-of-Thought (CoT) reasoning has significantly enhanced the capabilities of large language models (LLMs), but Vision-Language Models (VLMs) still struggle with multi-step reaso...
- MTikGuard System: A Transformer-Based Multimodal System for Child-Safe Content Moderation on TikTok : Abstract: With the rapid rise of short-form videos, TikTok has become one of the most influential platforms among children and teenagers, but also a source of harmful content that can affect their per...
- GeeSanBhava: Sentiment Tagged Sinhala Music Video Comment Data Set : Abstract: This study introduces GeeSanBhava, a high-quality data set of Sinhala song comments extracted from YouTube and manually tagged using Russell's Valence-Arousal model by three independent human anno...
- Vector Arithmetic in Concept and Token Subspaces : Abstract: In order to predict the next token, LLMs must represent semantic and surface-level information about the current word. Previous work identified two types of attention heads that disentangle ...
- Rethinking Retrieval: From Traditional Retrieval Augmented Generation to Agentic and Non-Vector Reasoning Systems in the Financial Domain for Large Language Models : Abstract: Recent advancements in Retrieval-Augmented Generation (RAG) have enabled Large Language Models to answer financial questions using external knowledge bases of U.S. SEC filings, earnings repo...
- Agent-as-a-Graph: Knowledge Graph-Based Tool and Agent Retrieval for LLM Multi-Agent Systems : Abstract: Recent advances in Large Language Model Multi-Agent Systems enable scalable orchestration and retrieval of specialized, parallelized subagents, each equipped with hundreds or thousands of Mo...
- From Archives to Decisions: Multi-Agent Pharmaceutical Co-Scientist for Traceable Drug Discovery and Reverse Translation : Abstract: Pharmaceutical research and development has accumulated vast, heterogeneous archives of data. Much of this knowledge stems from discontinued programs, and reusing these archives is invaluabl...
- "AGI" team at SHROOM-CAP: Data-Centric Approach to Multilingual Hallucination Detection using XLM-RoBERTa : Abstract: The detection of hallucinations in multilingual scientific text generated by Large Language Models (LLMs) presents significant challenges for reliable AI systems. This paper describes our su...
- Table Comprehension in Building Codes using Vision Language Models and Domain-Specific Fine-Tuning : Abstract: Building codes contain critical information for ensuring safety, regulatory compliance, and informed decision-making in construction and engineering. Automated question answering systems ove...
- Gradient Masters at BLP-2025 Task 1: Advancing Low-Resource NLP for Bengali using Ensemble-Based Adversarial Training for Hate Speech Detection : Abstract: This paper introduces the approach of "Gradient Masters" for BLP-2025 Task 1: "Bangla Multitask Hate Speech Identification Shared Task". We present an ensemble-based fine-tuning strategy for...
- Tu crois que c'est vrai ? Diversite des regimes d'enonciation face aux fake news et mecanismes d'autoregulation conversationnelle : Abstract: This thesis addresses two paradoxes: (1) why empirical studies find that fake news represent only a small share of the information consulted and shared on social media despite the absence of...
- Towards Robust and Fair Next Visit Diagnosis Prediction under Noisy Clinical Notes with Large Language Models : Abstract: A decade of rapid advances in artificial intelligence (AI) has opened new opportunities for clinical decision support systems (CDSS), with large language models (LLMs) demonstrating strong r...
- Multi-Agent Collaborative Filtering: Orchestrating Users and Items for Agentic Recommendations : Abstract: Agentic recommendations cast recommenders as large language model (LLM) agents that can plan, reason, use tools, and interact with users of varying preferences in web applications. However, ...
- For Those Who May Find Themselves on the Red Team : Abstract: This position paper argues that literary scholars must engage with large language model (LLM) interpretability research. While doing so will involve ideological struggle, if not out-right co...
- Dealing with the Hard Facts of Low-Resource African NLP : Abstract: Creating speech datasets, models, and evaluation frameworks for low-resource languages remains challenging given the lack of a broad base of pertinent experience to draw from. This paper rep...
- Toward Trustworthy Difficulty Assessments: Large Language Models as Judges in Programming and Synthetic Tasks : Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in natural language and code generation, and are increasingly deployed as automatic judges of model outputs and learnin...
- A Benchmark for Zero-Shot Belief Inference in Large Language Models : Abstract: Beliefs are central to how humans reason, communicate, and form social connections, yet most computational approaches to studying them remain confined to narrow sociopolitical contexts and r...
- Prompt Optimization as a State-Space Search Problem : Abstract: Language Models are extremely susceptible to performance collapse with even small changes to input prompt strings. Libraries such as DSPy (from Stanford NLP) avoid this problem through demon...
- Evaluating Large Language Models on the 2026 Korean CSAT Mathematics Exam: Measuring Mathematical Ability in a Zero-Data-Leakage Setting : Abstract: This study systematically evaluated the mathematical reasoning capabilities of Large Language Models (LLMs) using the 2026 Korean College Scholastic Ability Test (CSAT) Mathematics section, ...
- CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning : Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge but still suffers from long contexts and disjoint retrieval-generation optimization. In thi...
- Large Language Models Require Curated Context for Reliable Political Fact-Checking -- Even with Reasoning and Web Search : Abstract: Large language models (LLMs) have raised hopes for automated end-to-end fact-checking, but prior studies report mixed results. As mainstream chatbots increasingly ship with reasoning capabil...
- Robust Multimodal Sentiment Analysis with Distribution-Based Feature Recovery and Fusion : Abstract: As posts on social media increase rapidly, analyzing the sentiments embedded in image-text pairs has become a popular research topic in recent years. Although existing works achieve impressi...
- Context-Aware Whisper for Arabic ASR Under Linguistic Varieties : Abstract: Low-resource ASR remains a challenging problem, especially for languages like Arabic that exhibit wide dialectal variation and limited labeled data. We propose context-aware prompting strate...
- Concept than Document: Context Compression via AMR-based Conceptual Entropy : Abstract: Large Language Models (LLMs) face information overload when handling long contexts, particularly in Retrieval-Augmented Generation (RAG) where extensive supporting documents often introduce ...
- Large Language Models for the Summarization of Czech Documents: From History to the Present : Abstract: Text summarization is the task of automatically condensing longer texts into shorter, coherent summaries while preserving the original meaning and key information. Although this task has bee...
- Cognitive Alpha Mining via LLM-Driven Code-Based Evolution : Abstract: Discovering effective predictive signals, or "alphas," from financial data with high dimensionality and extremely low signal-to-noise ratio remains a difficult open problem. Despite progre...
- FanarGuard: A Culturally-Aware Moderation Filter for Arabic Language Models : Abstract: Content moderation filters are a critical safeguard against alignment failures in language models. Yet most existing filters focus narrowly on general safety and overlook cultural context. I...
- Think Before You Prune: Selective Self-Generated Calibration for Pruning Large Reasoning Models : Abstract: Large Reasoning Models (LRMs) have demonstrated remarkable performance on complex reasoning benchmarks. However, their long chain-of-thought reasoning processes incur significant inference o...
- Reproducibility Study of Large Language Model Bayesian Optimization : Abstract: In this reproducibility study, we revisit the LLAMBO framework of Daxberger et al. (2024), a prompting-based Bayesian optimization (BO) method that uses large language models as discriminati...
- Knowledge-based Graphical Method for Safety Signal Detection in Clinical Trials : Abstract: We present a graphical, knowledge-based method for reviewing treatment-emergent adverse events (AEs) in clinical trials. The approach enhances MedDRA by adding a hidden medical knowledge lay...
- Logic of Montage : Abstract: In expressing emotions, as an expression form separate from natural language, we propose an alternative form that complements natural language, acting as a proxy or window for emotional stat...
- A Multi-Agent LLM Framework for Multi-Domain Low-Resource In-Context NER via Knowledge Retrieval, Disambiguation and Reflective Analysis : Abstract: In-context learning (ICL) with large language models (LLMs) has emerged as a promising paradigm for named entity recognition (NER) in low-resource scenarios. However, existing ICL-based NER ...
- DeCoRL: Decoupling Reasoning Chains via Parallel Sub-Step Generation and Cascaded Reinforcement for Interpretable and Scalable RLHF : Abstract: Existing reinforcement learning methods for Chain-of-Thought reasoning suffer from two critical limitations. First, they operate as monolithic black boxes that provide undifferentiated rewar...
- A symbolic Perl algorithm for the unification of Nahuatl word spellings : Abstract: In this paper, we describe a symbolic model for the automatic orthographic unification of Nawatl text documents. Our model is based on algorithms that we have previously used to analyze sent...
- Emotion-Enhanced Multi-Task Learning with LLMs for Aspect Category Sentiment Analysis : Abstract: Aspect category sentiment analysis (ACSA) has achieved remarkable progress with large language models (LLMs), yet existing approaches primarily emphasize sentiment polarity while overlooking...
- Eliciting Chain-of-Thought in Base LLMs via Gradient-Based Representation Optimization : Abstract: Chain-of-Thought (CoT) reasoning is a critical capability for large language models (LLMs), enabling them to tackle complex multi-step tasks. While base LLMs, pre-trained on general text c...
- Representational Stability of Truth in Large Language Models : Abstract: Large language models (LLMs) are widely used for factual tasks such as "What treats asthma?" or "What is the capital of Latvia?". However, it remains unclear how stably LLMs encode distincti...
- MultiBanAbs: A Comprehensive Multi-Domain Bangla Abstractive Text Summarization Dataset : Abstract: This study developed a new Bangla abstractive summarization dataset to generate concise summaries of Bangla articles from diverse sources. Most existing studies in this field have concentrat...
- Learning to Reason: Training LLMs with GPT-OSS or DeepSeek R1 Reasoning Traces : Abstract: Test-time scaling, which leverages additional computation during inference to improve model accuracy, has enabled a new class of Large Language Models (LLMs) that are able to reason through ...
- Beyond the Rubric: Cultural Misalignment in LLM Benchmarks for Sexual and Reproductive Health : Abstract: Large Language Models (LLMs) have been positioned as having the potential to expand access to health information in the Global South, yet their evaluation remains heavily dependent on benchm...
- Speech Recognition Model Improves Text-to-Speech Synthesis using Fine-Grained Reward : Abstract: Recent advances in text-to-speech (TTS) have enabled models to clone arbitrary unseen speakers and synthesize high-quality, natural-sounding speech. However, evaluation methods lag behind: t...
- When Better Teachers Don't Make Better Students: Revisiting Knowledge Distillation for CLIP Models in VQA : Abstract: Vision-language models (VLMs) have achieved remarkable success across multimodal tasks, yet their substantial computational demands hinder efficient deployment. Knowledge distillation (KD) h...
- Comparing Labeled Markov Chains: A Cantor-Kantorovich Approach : Abstract: Labeled Markov Chains (or LMCs for short) are useful mathematical objects to model complex probabilistic languages. A central challenge is to compare two LMCs, for example to assess the accu...
- From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence : Abstract: Large language models (LLMs) have fundamentally transformed automated software development by enabling direct translation of natural language descriptions into functional code, driving comme...
- Assessing the alignment between infants' visual and linguistic experience using multimodal language models : Abstract: Figuring out which objects or concepts words refer to is a central language learning challenge for young children. Most models of this process posit that children learn early object labels f...
- Understanding and Mitigating Over-refusal for Large Language Models via Safety Representation : Abstract: Large language models demonstrate powerful capabilities across various natural language processing tasks, yet they also harbor safety vulnerabilities. To enhance LLM safety, various jailbrea...
- Lost in translation: using global fact-checks to measure multilingual misinformation prevalence, spread, and evolution : Abstract: Misinformation and disinformation are growing threats in the digital age, affecting people across languages and borders. However, no research has investigated the prevalence of multilingual ...
- Llama2Vec: Unsupervised Adaptation of Large Language Models for Dense Retrieval : Abstract: Dense retrieval calls for discriminative embeddings to represent the semantic relationship between query and document. It may benefit from the use of large language models (LLMs), given LL...
- Revolutionizing Finance with LLMs: An Overview of Applications and Insights : Abstract: In recent years, Large Language Models (LLMs) like ChatGPT have seen considerable advancements and have been applied in diverse fields. Built on the Transformer architecture, these models ar...
- DCIS: Efficient Length Extrapolation of LLMs via Divide-and-Conquer Scaling Factor Search : Abstract: Large language models (LLMs) based on the Transformer architecture usually have their context length limited due to the high training cost. Recent advancements extend the context window by a...
- A joint optimization approach to identifying sparse dynamics using least squares kernel collocation : Abstract: We develop an all-at-once modeling framework for learning systems of ordinary differential equations (ODE) from scarce, partial, and noisy observations of the states. The proposed methodolog...
- Ensuring Calibration Robustness in Split Conformal Prediction Under Adversarial Attacks : Abstract: Conformal prediction (CP) provides distribution-free, finite-sample coverage guarantees but critically relies on exchangeability, a condition often violated under distribution shift. We stud...
- Differential privacy with dependent data : Abstract: Dependent data underlies many statistical studies in the social and health sciences, which often involve sensitive or private information. Differential privacy (DP) and in particular \textit...
- From Simulations to Surveys: Domain Adaptation for Galaxy Observations : Abstract: Large photometric surveys will image billions of galaxies, but we currently lack quick, reliable automated ways to infer their physical properties like morphology, stellar mass, and star for...
- Autoencoder for Position-Assisted Beam Prediction in mmWave ISAC Systems : Abstract: Integrated sensing and communication and millimeter wave (mmWave) have emerged as pivotal technologies for 6G networks. However, the narrow nature of mmWave beams requires precise alignments...
- How to Train Your Latent Control Barrier Function: Smooth Safety Filtering Under Hard-to-Model Constraints : Abstract: Latent safety filters extend Hamilton-Jacobi (HJ) reachability to operate on latent state representations and dynamics learned directly from high-dimensional observations, enabling safe visu...
- Functional Localization Enforced Deep Anomaly Detection Using Fundus Images : Abstract: Reliable detection of retinal diseases from fundus images is challenged by the variability in imaging quality, subtle early-stage manifestations, and domain shift across datasets. In this st...
- Fast Escape, Slow Convergence: Learning Dynamics of Phase Retrieval under Power-Law Data : Abstract: Scaling laws describe how learning performance improves with data, compute, or training time, and have become a central theme in modern deep learning. We study this phenomenon in a canonical...
- Equivariant Deep Equilibrium Models for Imaging Inverse Problems : Abstract: Equivariant imaging (EI) enables training signal reconstruction models without requiring ground truth data by leveraging signal symmetries. Deep equilibrium models (DEQs) are a powerful clas...
- Dendritic Convolution for Noise Image Recognition : Abstract: In real-world scenarios of image recognition, there exists substantial noise interference. Existing works primarily focus on methods such as adjusting networks or training strategies to addr...
- When and What to Recommend: Joint Modeling of Timing and Content for Active Sequential Recommendation : Abstract: Sequential recommendation models user preferences to predict the next target item. Most existing work is passive, where the system responds only when users open the application, missing chan...
- On Instability of Minimax Optimal Optimism-Based Bandit Algorithms : Abstract: Statistical inference from data generated by multi-armed bandit (MAB) algorithms is challenging due to their adaptive, non-i.i.d. nature. A classical manifestation is that sample averages of...
- Understanding Task Transfer in Vision-Language Models : Abstract: Vision-Language Models (VLMs) perform well on multimodal benchmarks but lag behind humans and specialized models on visual perception tasks like depth estimation or object counting. Finetuni...
- Uncertainty of Network Topology with Applications to Out-of-Distribution Detection : Abstract: Persistent homology (PH) is a crucial concept in computational topology, providing a multiscale topological description of a space. It is particularly significant in topological data analysi...
- Solution of Incompressible Flow Equations with Physics and Equality Constrained Artificial Neural Networks : Abstract: We present a meshless method for the solution of incompressible Navier-Stokes equations in advection-dominated regimes using physics- and equality-constrained artificial neural networks comb...
- Uncertainty-Aware Dual-Student Knowledge Distillation for Efficient Image Classification : Abstract: Knowledge distillation has emerged as a powerful technique for model compression, enabling the transfer of knowledge from large teacher networks to compact student models. However, tradition...
- Enhancing Multi-Label Thoracic Disease Diagnosis with Deep Ensemble-Based Uncertainty Quantification : Abstract: The utility of deep learning models, such as CheXNet, in high stakes clinical settings is fundamentally constrained by their purely deterministic nature, failing to provide reliable measures...
- A Reproducible Framework for Neural Topic Modeling in Focus Group Analysis : Abstract: Focus group discussions generate rich qualitative data but their analysis traditionally relies on labor-intensive manual coding that limits scalability and reproducibility. We present a rigo...
- Fairness Meets Privacy: Integrating Differential Privacy and Demographic Parity in Multi-class Classification : Abstract: The increasing use of machine learning in sensitive applications demands algorithms that simultaneously preserve data privacy and ensure fairness across potentially sensitive sub-populations...
- Compressor-VLA: Instruction-Guided Visual Token Compression for Efficient Robotic Manipulation : Abstract: Vision-Language-Action (VLA) models have emerged as a powerful paradigm in Embodied AI. However, the significant computational overhead of processing redundant visual tokens remains a critic...
- Structured Matching via Cost-Regularized Unbalanced Optimal Transport : Abstract: Unbalanced optimal transport (UOT) provides a flexible way to match or compare nonnegative finite Radon measures. However, UOT requires a predefined ground transport cost, which may misrepre...
- Collaborative Learning with Multiple Foundation Models for Source-Free Domain Adaptation : Abstract: Source-Free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to an unlabeled target domain without access to source data. Recent advances in Foundation Models (FMs) have int...
- Feature Ranking in Credit-Risk with Qudit-Based Networks : Abstract: In finance, predictive models must balance accuracy and interpretability, particularly in credit risk assessment, where model decisions carry material consequences. We present a quantum neur...
- A Robust State Filter Against Unmodeled Process And Measurement Noise : Abstract: This paper introduces a novel Kalman filter framework designed to achieve robust state estimation under both process and measurement noise. Inspired by the Weighted Observation Likelihood Fi...
- BioArtlas: Computational Clustering of Multi-Dimensional Complexity in Bioart : Abstract: Bioart's hybrid nature spanning art, science, technology, ethics, and politics defies traditional single-axis categorization. I present BioArtlas, analyzing 81 bioart works across thirteen c...
- SpectraNet: FFT-assisted Deep Learning Classifier for Deepfake Face Detection : Abstract: Detecting deepfake images is crucial in combating misinformation. We present a lightweight, generalizable binary classification model based on EfficientNet-B6, fine-tuned with transformation...
- The Unified Non-Convex Framework for Robust Causal Inference: Overcoming the Gaussian Barrier and Optimization Fragility : Abstract: This document proposes a Unified Robust Framework that re-engineers the estimation of the Average Treatment Effect on the Overlap (ATO). It synthesizes gamma-Divergence for outlier robustnes...
- Performance Guarantees for Quantum Neural Estimation of Entropies : Abstract: Estimating quantum entropies and divergences is an important problem in quantum physics, information theory, and machine learning. Quantum neural estimators (QNEs), which utilize a hybrid cl...
- TorchQuantumDistributed : Abstract: TorchQuantumDistributed (tqd) is a PyTorch-based [Paszke et al., 2019] library for accelerator-agnostic differentiable quantum state vector simulation at scale. This enables studying the beh...
- High-throughput validation of phase formability and simulation accuracy of Cantor alloys : Abstract: High-throughput methods enable accelerated discovery of novel materials in complex systems such as high-entropy alloys, which exhibit intricate phase stability across vast compositional spac...
- Artificial Intelligence Driven Workflow for Accelerating Design of Novel Photosensitizers : Abstract: The discovery of high-performance photosensitizers has long been hindered by the time-consuming and resource-intensive nature of traditional trial-and-error approaches. Here, we present \tex...
- PTF Testing Lower Bounds for Non-Gaussian Component Analysis : Abstract: This work studies information-computation gaps for statistical problems. A common approach for providing evidence of such gaps is to show sample complexity lower bounds (that are stronger th...
- Nonparametric Instrumental Variable Regression with Observed Covariates : Abstract: We study the problem of nonparametric instrumental variable regression with observed covariates, which we refer to as NPIV-O. Compared with standard nonparametric instrumental variable regre...
- Breaking the Likelihood-Quality Trade-off in Diffusion Models by Merging Pretrained Experts : Abstract: Diffusion models for image generation often exhibit a trade-off between perceptual sample quality and data likelihood: training objectives emphasizing high-noise denoising steps yield realis...
- Description of Corner Cases in Automated Driving: Goals and Challenges : Abstract: Scaling the distribution of automated vehicles requires handling various unexpected and possibly dangerous situations, termed corner cases (CC). Since many modules of automated driving syste...
- Compressing Sensor Data for Remote Assistance of Autonomous Vehicles using Deep Generative Models : Abstract: In the foreseeable future, autonomous vehicles will require human assistance in situations they cannot resolve on their own. In such scenarios, remote assistance from a human can provide th...
- High-dimensional multi-view clustering methods : Abstract: Multi-view clustering has been widely used in recent years in comparison to single-view clustering, for clear reasons, as it offers more insights into the data, which has brought with it som...
- VeML: An End-to-End Machine Learning Lifecycle for Large-scale and High-dimensional Data : Abstract: An end-to-end machine learning (ML) lifecycle consists of many iterative processes, from data preparation and ML model design to model training and then deploying the trained model for infer...
- Fairness in Streaming Submodular Maximization over a Matroid Constraint : Abstract: Streaming submodular maximization is a natural model for the task of selecting a representative subset from a large-scale dataset. If datapoints have sensitive attributes such as gender or r...
- PINNs Failure Region Localization and Refinement through White-box Adversarial Attack : Abstract: Physics-informed neural networks (PINNs) have shown great promise in solving partial differential equations (PDEs). However, vanilla PINNs often face challenges when solving complex PDEs, es...
- Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods : Abstract: Training data attribution (TDA) is concerned with understanding model behavior in terms of the training data. This paper draws attention to the common setting where one has access only to th...
- Interpreting Graph Inference with Skyline Explanations : Abstract: Inference queries have been routinely issued to graph machine learning models such as graph neural networks (GNNs) for various network analytical tasks. Nevertheless, GNN outputs are often h...
- tensorflow-riemopt: A Library for Optimization on Riemannian Manifolds : Abstract: This paper presents tensorflow-riemopt, a Python library for geometric machine learning in TensorFlow. The library provides efficient implementations of neural network layers with manifold-c...
- Advancing Autonomous Driving: DepthSense with Radar and Spatial Attention : Abstract: Depth perception is crucial for spatial understanding and has traditionally been achieved through stereoscopic imaging. However, the precision of depth estimation using stereoscopic methods ...
- Learning to Admit Optimally in an $M/M/k/k+N$ Queueing System with Unknown Service Rate : Abstract: Motivated by applications of the Erlang-B blocking model and the extended $M/M/k/k+N$ model that allows for some queueing, beyond communication networks to sizing and pricing in production, ...
- Towards Healing the Blindness of Score Matching : Abstract: Score-based divergences have been widely used in machine learning and statistics applications. Despite their empirical success, a blindness problem has been observed when using these for mul...
- Spatiotemporal Graph Convolutional Recurrent Neural Network Model for Citywide Air Pollution Forecasting : Abstract: Citywide Air Pollution Forecasting tries to precisely predict the air quality multiple hours ahead for the entire city. This topic is challenged since air pollution varies in a spatiotempora...
- When Does Bottom-up Beat Top-down in Hierarchical Community Detection? : Abstract: Hierarchical clustering of networks consists in finding a tree of communities, such that lower levels of the hierarchy reveal finer-grained community structures. There are two main classes o...
- Convergence and concentration properties of constant step-size SGD through Markov chains : Abstract: We consider the optimization of a smooth and strongly convex objective using constant step-size stochastic gradient descent (SGD) and study its properties through the prism of Markov chains....
- Bivariate DeepKriging for Large-scale Spatial Interpolation of Wind Fields : Abstract: High spatial resolution wind data are essential for a wide range of applications in climate, oceanographic and meteorological studies. Large-scale spatial interpolation or downscaling of biv...
- Automatic nodule identification and differentiation in ultrasound videos to facilitate per-nodule examination : Abstract: Ultrasound is a vital diagnostic technique in health screening, with the advantages of being non-invasive, cost-effective, and radiation-free, and therefore is widely applied in the diagnosis of n...
- Finite-dimensional approximations of push-forwards on locally analytic functionals : Abstract: This paper develops a functional-analytic framework for approximating the push-forward induced by an analytic map from finitely many samples. Instead of working directly with the map, we stu...
- Adapting Physics-Informed Neural Networks for Bifurcation Detection in Ecological Migration Models : Abstract: In this study, we explore the application of Physics-Informed Neural Networks (PINNs) to the analysis of bifurcation phenomena in ecological migration models. By integrating the fundamental ...
- The inexact power augmented Lagrangian method for constrained nonconvex optimization : Abstract: This work introduces an unconventional inexact augmented Lagrangian method where the augmenting term is a Euclidean norm raised to a power between one and two. The proposed algorithm is appl...
- Preserving Expert-Level Privacy in Offline Reinforcement Learning : Abstract: The offline reinforcement learning (RL) problem aims to learn an optimal policy from historical data collected by one or more behavioural policies (experts) by interacting with an environmen...
- Q-Learning-Based Time-Critical Data Aggregation Scheduling in IoT : Abstract: Time-critical data aggregation in Internet of Things (IoT) networks demands efficient, collision-free scheduling to minimize latency for applications like smart cities and industrial automat...
- SALPA: Spaceborne LiDAR Point Adjustment for Enhanced GEDI Footprint Geolocation : Abstract: Spaceborne Light Detection and Ranging (LiDAR) systems, such as NASA's Global Ecosystem Dynamics Investigation (GEDI), provide forest structure for global carbon assessments. However, geoloc...
- Robustness of Structured Data Extraction from Perspectively Distorted Documents : Abstract: Optical Character Recognition (OCR) for data extraction from documents is essential to intelligent informatics, such as digitizing medical records and recognizing road signs. Multi-modal Lar...
- HSMix: Hard and Soft Mixing Data Augmentation for Medical Image Segmentation : Abstract: Due to the high cost of annotation or the rarity of some diseases, medical image segmentation is often limited by data scarcity and the resulting overfitting problem. Self-supervised learnin...
- An Ecologically-Informed Deep Learning Framework for Interpretable and Validatable Habitat Mapping : Abstract: Benthic habitat mapping is challenging due to the environmental complexity of the seafloor, technological limitations, and elevated operational costs, especially in under-explored regions. This gene...
- Upstream Probabilistic Meta-Imputation for Multimodal Pediatric Pancreatitis Classification : Abstract: Pediatric pancreatitis is a progressive and debilitating inflammatory condition, including acute pancreatitis and chronic pancreatitis, that presents significant clinical diagnostic challeng...
- Multi-Agent Coordination in Autonomous Vehicle Routing: A Simulation-Based Study of Communication, Memory, and Routing Loops : Abstract: Multi-agent coordination is critical for next-generation autonomous vehicle (AV) systems, yet naive implementations of communication-based rerouting can lead to catastrophic performance degr...
- Quantum Fourier Transform Based Kernel for Solar Irradiance Forecasting : Abstract: This study proposes a Quantum Fourier Transform (QFT)-enhanced quantum kernel for short-term time-series forecasting. Each signal is windowed, amplitude-encoded, transformed by a QFT, then p...
- Prequential posteriors : Abstract: Data assimilation is a fundamental task in updating forecasting models upon observing new data, with applications ranging from weather prediction to online reinforcement learning. Deep gener...
- VisReason: A Large-Scale Dataset for Visual Chain-of-Thought Reasoning : Abstract: Chain-of-Thought (CoT) prompting has proven remarkably effective for eliciting complex reasoning in large language models (LLMs). Yet, its potential in multimodal large language models (MLLM...
- When Active Learning Fails, Uncalibrated Out of Distribution Uncertainty Quantification Might Be the Problem : Abstract: Efficiently and meaningfully estimating prediction uncertainty is important for exploration in active learning campaigns in materials discovery, where samples with high uncertainty are inter...
- LEARN: Learning End-to-End Aerial Resource-Constrained Multi-Robot Navigation : Abstract: Nano-UAV teams offer great agility yet face severe navigation challenges due to constrained onboard sensing, communication, and computation. Existing approaches rely on high-resolution visio...
- Weighted Birkhoff Averages Accelerate Data-Driven Methods : Abstract: Many data-driven algorithms in dynamical systems rely on ergodic averages that converge painfully slowly. One simple idea changes this: taper the ends. Weighted Birkhoff averages can converg...
- Variational Estimators for Node Popularity Models : Abstract: Node popularity is recognized as a key factor in modeling real-world networks, capturing heterogeneity in connectivity across communities. This concept is equally important in bipartite netw...
- Attention Guided Alignment in Efficient Vision-Language Models : Abstract: Large Vision-Language Models (VLMs) rely on effective multimodal alignment between pre-trained vision encoders and Large Language Models (LLMs) to integrate visual and textual information. T...
- Analog Physical Systems Can Exhibit Double Descent : Abstract: An important component of the success of large AI models is double descent, in which networks avoid overfitting as they grow relative to the amount of training data, instead improving their ...
- Efficient Dynamic and Momentum Aperture Optimization for Lattice Design Using Multipoint Bayesian Algorithm Execution : Abstract: We demonstrate that multipoint Bayesian algorithm execution can overcome fundamental computational challenges in storage ring design optimization. Dynamic (DA) and momentum (MA) optimization...
- Generative Model Predictive Control in Manufacturing Processes: A Review : Abstract: Manufacturing processes are inherently dynamic and uncertain, with varying parameters and nonlinear behaviors, making robust control essential for maintaining quality and reliability. Tradit...
- FastMMoE: Accelerating Multimodal Large Language Models through Dynamic Expert Activation and Routing-Aware Token Pruning : Abstract: Multimodal large language models (MLLMs) have achieved impressive performance, but high-resolution visual inputs result in long sequences of visual tokens and substantial inference latency. ...
- Arbitrage-Free Bond and Yield Curve Forecasting with Neural Filters under HJM Constraints : Abstract: We develop an arbitrage-free deep learning framework for yield curve and bond price forecasting based on the Heath-Jarrow-Morton (HJM) term-structure model and a dynamic Nelson-Siegel parame...
- Token-Controlled Re-ranking for Sequential Recommendation via LLMs : Abstract: The widespread adoption of Large Language Models (LLMs) as re-rankers is shifting recommender systems towards a user-centric paradigm. However, a significant gap remains: current re-rankers ...
- A Reinforcement Learning Framework for Resource Allocation in Uplink Carrier Aggregation in the Presence of Self Interference : Abstract: Carrier aggregation (CA) is a technique that allows mobile networks to combine multiple carriers to increase user data rate. On the uplink, for power constrained users, this translates to th...
- SPINE: Token-Selective Test-Time Reinforcement Learning with Entropy-Band Regularization : Abstract: Large language models (LLMs) and multimodal LLMs (MLLMs) excel at chain-of-thought reasoning but face distribution shift at test-time and a lack of verifiable supervision. Recent test-time r...
- A multi-view contrastive learning framework for spatial embeddings in risk modelling : Abstract: Incorporating spatial information, particularly those influenced by climate, weather, and demographic factors, is crucial for improving underwriting precision and enhancing risk management i...
- Synthesizing Precise Protocol Specs from Natural Language for Effective Test Generation : Abstract: Safety- and security-critical systems have to be thoroughly tested against their specifications. The state of practice is to have _natural language_ specifications, from which test cases are...
- Correlated-Sequence Differential Privacy : Abstract: Data streams collected from multiple sources are rarely independent. Values evolve over time and influence one another across sequences. These correlations improve prediction in healthcare, ...
- On a Reinforcement Learning Methodology for Epidemic Control, with application to COVID-19 : Abstract: This paper presents a real time, data driven decision support framework for epidemic control. We combine a compartmental epidemic model with sequential Bayesian inference and reinforcement l...
- Sparse Kalman Identification for Partially Observable Systems via Adaptive Bayesian Learning : Abstract: Sparse dynamics identification is an essential tool for discovering interpretable physical models and enabling efficient control in engineering systems. However, existing methods rely on bat...
- Blu-WERP (Web Extraction and Refinement Pipeline): A Scalable Pipeline for Preprocessing Large Language Model Datasets : Abstract: High-quality training data is fundamental to large language model (LLM) performance, yet existing preprocessing pipelines often struggle to effectively remove noise and unstructured content ...
- An operator splitting analysis of Wasserstein-Fisher-Rao gradient flows : Abstract: Wasserstein-Fisher-Rao (WFR) gradient flows have been recently proposed as a powerful sampling tool that combines the advantages of pure Wasserstein (W) and pure Fisher-Rao (FR) gradient flo...
- Towards Harnessing the Power of LLMs for ABAC Policy Mining : Abstract: This paper presents an empirical investigation into the capabilities of Large Language Models (LLMs) to perform automated Attribute-based Access Control (ABAC) policy mining. While ABAC prov...
- AdaPerceiver: Transformers with Adaptive Width, Depth, and Tokens : Abstract: Modern transformer architectures achieve remarkable performance across tasks and domains but remain rigid in how they allocate computation at inference time. Real-world deployment often requ...
- Observer Actor: Active Vision Imitation Learning with Sparse View Gaussian Splatting : Abstract: We propose Observer Actor (ObAct), a novel framework for active vision imitation learning in which the observer moves to optimal visual observations for the actor. We study ObAct on a dual-a...
- Conformal Prediction for Compositional Data : Abstract: In this work, we propose a set of conformal prediction procedures tailored to compositional responses, where outcomes are proportions that must be positive and sum to one. Building on Dirich...
- AVERY: Adaptive VLM Split Computing through Embodied Self-Awareness for Efficient Disaster Response Systems : Abstract: Unmanned Aerial Vehicles (UAVs) in disaster response require complex, queryable intelligence that on-board CNNs cannot provide. While Vision-Language Models (VLMs) offer this semantic reason...
- A Coordinated Dual-Arm Framework for Delicate Snap-Fit Assemblies : Abstract: Delicate snap-fit assemblies, such as inserting a lens into an eye-wear frame or during electronics assembly, demand timely engagement detection and rapid force attenuation to prevent oversh...
- Sparse Polyak with optimal thresholding operators for high-dimensional M-estimation : Abstract: We propose and analyze a variant of Sparse Polyak for high dimensional M-estimation problems. Sparse Polyak proposes a novel adaptive step-size rule tailored to suitably estimate the problem...
- Improving Forecasts of Suicide Attempts for Patients with Little Data : Abstract: Ecological Momentary Assessment provides real-time data on suicidal thoughts and behaviors, but predicting suicide attempts remains challenging due to their rarity and patient heterogeneity....
- ProHD: Projection-Based Hausdorff Distance Approximation : Abstract: The Hausdorff distance (HD) is a robust measure of set dissimilarity, but computing it exactly on large, high-dimensional datasets is prohibitively expensive. We propose ProHD, a pr...
- Typing Reinvented: Towards Hands-Free Input via sEMG : Abstract: We explore surface electromyography (sEMG) as a non-invasive input modality for mapping muscle activity to keyboard inputs, targeting immersive typing in next-generation human-computer inter...
- Using MLIR Transform to Design Sliced Convolution Algorithm : Abstract: This paper proposes SConvTransform, a Transform dialect extension that provides operations for optimizing 2D convolutions in MLIR. Its main operation, SConvOp, lowers Linalg convolutions int...
- Path-Constrained Retrieval: A Structural Approach to Reliable LLM Agent Reasoning Through Graph-Scoped Semantic Search : Abstract: Large Language Model agents often retrieve context from knowledge bases that lack structural consistency with the agent's current reasoning state, leading to incoherent reasoning chains. We ...
- Learning Visually Interpretable Oscillator Networks for Soft Continuum Robots from Video : Abstract: Data-driven learning of soft continuum robot (SCR) dynamics from high-dimensional observations offers flexibility but often lacks physical interpretability, while model-based approaches requ...
- Crash-Consistent Checkpointing for AI Training on macOS/APFS : Abstract: Deep learning training relies on periodic checkpoints to recover from failures, but unsafe checkpoint installation can leave corrupted files on disk. This paper presents an experimental stud...
- Brain-MGF: Multimodal Graph Fusion Network for EEG-fMRI Brain Connectivity Analysis Under Psilocybin : Abstract: Psychedelics, such as psilocybin, reorganise large-scale brain connectivity, yet how these changes are reflected across electrophysiological (electroencephalogram, EEG) and haemodynamic (fun...
- DHAuDS: A Dynamic and Heterogeneous Audio Benchmark for Test-Time Adaptation : Abstract: Audio classifiers frequently face domain shift, when models trained on one dataset lose accuracy on data recorded in acoustically different conditions. Previous Test-Time Adaptation (TTA) re...
- NeuroVascU-Net: A Unified Multi-Scale and Cross-Domain Adaptive Feature Fusion U-Net for Precise 3D Segmentation of Brain Vessels in Contrast-Enhanced T1 MRI : Abstract: Precise 3D segmentation of cerebral vasculature from T1-weighted contrast-enhanced (T1CE) MRI is crucial for safe neurosurgical planning. Manual delineation is time-consuming and prone to in...
- Reliable Selection of Heterogeneous Treatment Effect Estimators : Abstract: We study the problem of selecting the best heterogeneous treatment effect (HTE) estimator from a collection of candidates in settings where the treatment effect is fundamentally unobserved. ...
- Transforming Conditional Density Estimation Into a Single Nonparametric Regression Task : Abstract: We propose a way of transforming the problem of conditional density estimation into a single nonparametric regression task via the introduction of auxiliary samples. This allows leveraging r...
- Online Smoothed Demand Management : Abstract: We introduce and study a class of online problems called online smoothed demand management (OSDM), motivated by paradigm shifts in grid integration and energy storage for large en...
- Doubly Wild Refitting: Model-Free Evaluation of High Dimensional Black-Box Predictions under Convex Losses : Abstract: We study the problem of excess risk evaluation for empirical risk minimization (ERM) under general convex loss functions. Our contribution is an efficient refitting procedure that computes t...
- Towards Characterizing Knowledge Distillation of PPG Heart Rate Estimation Models : Abstract: Heart rate estimation from photoplethysmography (PPG) signals generated by wearable devices such as smartwatches and fitness trackers has significant implications for the health and well-bei...
- Leveraging Duration Pseudo-Embeddings in Multilevel LSTM and GCN Hypermodels for Outcome-Oriented PPM : Abstract: Existing deep learning models for Predictive Process Monitoring (PPM) struggle with temporal irregularities, particularly stochastic event durations and overlapping timestamps, limiting thei...
- Auto-ML Graph Neural Network Hypermodels for Outcome Prediction in Event-Sequence Data : Abstract: This paper introduces HGNN(O), an AutoML GNN hypermodel framework for outcome prediction on event-sequence data. Building on our earlier work on graph convolutional network hypermodels, HGNN...
- Robust and Generalizable GNN Fine-Tuning via Uncertainty-aware Adapter Learning : Abstract: Recently, fine-tuning large-scale pre-trained GNNs has yielded remarkable attention in adapting pre-trained GNN models for downstream graph learning tasks. One representative fine-tuning met...
- Hi-SAFE: Hierarchical Secure Aggregation for Lightweight Federated Learning : Abstract: Federated learning (FL) faces challenges in ensuring both privacy and communication efficiency, particularly in resource-constrained environments such as Internet of Things (IoT) and edge ne...
- Geometry-Aware Deep Congruence Networks for Manifold Learning in Cross-Subject Motor Imagery : Abstract: Cross-subject motor-imagery decoding remains a major challenge in EEG-based brain-computer interfaces due to strong subject variability and the curved geometry of covariance matrices on the ...
- MIST: Mutual Information Via Supervised Training : Abstract: We propose a fully data-driven approach to designing mutual information (MI) estimators. Since any MI estimator is a function of the observed sample from two random variables, we parameteriz...
- AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention : Abstract: Vision-Language-Action (VLA) models have demonstrated remarkable capabilities in embodied AI tasks. However, existing VLA models, often built upon Vision-Language Models (VLMs), typically pr...
- 3D Dynamic Radio Map Prediction Using Vision Transformers for Low-Altitude Wireless Networks : Abstract: Low-altitude wireless networks (LAWN) are rapidly expanding with the growing deployment of unmanned aerial vehicles (UAVs) for logistics, surveillance, and emergency response. Reliable conne...
- Resolving Node Identifiability in Graph Neural Processes via Laplacian Spectral Encodings : Abstract: Message passing graph neural networks are widely used for learning on graphs, yet their expressive power is limited by the one-dimensional Weisfeiler-Lehman test and can fail to distinguish ...
- Optimization of Deep Learning Models for Dynamic Market Behavior Prediction : Abstract: The advent of financial technology has witnessed a surge in the utilization of deep learning models to anticipate consumer conduct, a trend that has demonstrated considerable potential in en...
- Edge-Based Predictive Data Reduction for Smart Agriculture: A Lightweight Approach to Efficient IoT Communication : Abstract: The rapid growth of IoT devices has led to an enormous amount of sensor data that requires transmission to cloud servers for processing, resulting in excessive network congestion, increased ...
- Masked Diffusion Models are Secretly Learned-Order Autoregressive Models : Abstract: Masked Diffusion Models (MDMs) have emerged as one of the most promising paradigms for generative modeling over discrete domains. It is known that MDMs effectively train to decode tokens in ...
- First-order Sobolev Reinforcement Learning : Abstract: We propose a refinement of temporal-difference learning that enforces first-order Bellman consistency: the learned value function is trained to match not only the Bellman targets in value bu...
- RAVEN++: Pinpointing Fine-Grained Violations in Advertisement Videos with Active Reinforcement Reasoning : Abstract: Advertising (Ad) is a cornerstone of the digital economy, yet the moderation of video advertisements remains a significant challenge due to their complexity and the need for precise violatio...
- From Raw Features to Effective Embeddings: A Three-Stage Approach for Multimodal Recipe Recommendation : Abstract: Recipe recommendation has become an essential task in web-based food platforms. A central challenge is effectively leveraging rich multimodal features beyond user-recipe interactions. Our an...
- Empirical Comparison of Forgetting Mechanisms for UCB-based Algorithms on a Data-Driven Simulation Platform : Abstract: Many real-world bandit problems involve non-stationary reward distributions, where the optimal decision may shift due to evolving environments. However, the performance of some typical Multi...
- Unboxing the Black Box: Mechanistic Interpretability for Algorithmic Understanding of Neural Networks : Abstract: The black box nature of deep neural networks poses a significant challenge for the deployment of transparent and trustworthy artificial intelligence (AI) systems. With the growing presence o...
- Leveraging Spatiotemporal Graph Neural Networks for Multi-Store Sales Forecasting : Abstract: This work evaluates the effectiveness of spatiotemporal Graph Neural Networks (GNNs) for multi-store retail sales forecasting and compares their performance against ARIMA, LSTM, and XGBoost ...
- CDLM: Consistency Diffusion Language Models For Faster Sampling : Abstract: Diffusion Language Models (DLMs) offer a promising parallel generation paradigm but suffer from slow inference due to numerous refinement steps and the inability to use standard KV caching. ...
- Tiny-TSM: Efficiently Training a Lightweight SOTA Time Series Foundation Model : Abstract: We present Tiny-TSM, a time series foundation model characterized by small scale, economical training, and state-of-the-art performance. It comprises 23M total parameters, trained on a singl...
- Scalable Bayesian Network Structure Learning Using Tsetlin Machine to Constrain the Search Space : Abstract: The PC algorithm is a widely used method in causal inference for learning the structure of Bayesian networks. Despite its popularity, the PC algorithm suffers from significant time complexit...
- Closing Gaps in Emissions Monitoring with Climate TRACE : Abstract: Global greenhouse gas emissions estimates are essential for monitoring and mitigation planning. Yet most datasets lack one or more characteristics that enhance their actionability, such as a...
- MapFormer: Self-Supervised Learning of Cognitive Maps with Input-Dependent Positional Embeddings : Abstract: A cognitive map is an internal model which encodes the abstract relationships among entities in the world, giving humans and animals the flexibility to adapt to new situations, with a strong...
- Understanding the Staged Dynamics of Transformers in Learning Latent Structure : Abstract: While transformers can discover latent structure from context, the dynamics of how they acquire different components of the latent structure remain poorly understood. In this work, we use th...
- Targeted Manipulation: Slope-Based Attacks on Financial Time-Series Data : Abstract: A common method of attacking deep learning models is through adversarial attacks, which occur when an attacker specifically modifies the input of a model to produce an incorrect result. Adve...
- Annotation-Free Class-Incremental Learning : Abstract: Despite significant progress in continual learning, ranging from architectural novelty to clever strategies for mitigating catastrophic forgetting, most existing methods rest on a strong but u...
- Scalable Parameter-Light Spectral Method for Clustering Short Text Embeddings with a Cohesion-Based Evaluation Metric : Abstract: Clustering short text embeddings is a foundational task in natural language processing, yet remains challenging due to the need to specify the number of clusters in advance. We introduce a s...
- Enhancing Conformal Prediction via Class Similarity : Abstract: Conformal Prediction (CP) has emerged as a powerful statistical framework for high-stakes classification applications. Instead of predicting a single class, CP generates a prediction set, gu...
- Neural surrogates for designing gravitational wave detectors : Abstract: Physics simulators are essential in science and engineering, enabling the analysis, control, and design of complex systems. In experimental sciences, they are increasingly used to automate e...
- LLM-Driven Stationarity-Aware Expert Demonstrations for Multi-Agent Reinforcement Learning in Mobile Systems : Abstract: Multi-agent reinforcement learning (MARL) has been increasingly adopted in many real-world applications. While MARL enables decentralized deployment on resource-constrained edge devices, it ...
- Efficiency vs. Fidelity: A Comparative Analysis of Diffusion Probabilistic Models and Flow Matching on Low-Resource Hardware : Abstract: Denoising Diffusion Probabilistic Models (DDPMs) have established a new state-of-the-art in generative image synthesis, yet their deployment is hindered by significant computational overhead...
- Learning Robust Social Strategies with Large Language Models : Abstract: As agentic AI becomes more widespread, agents with distinct and possibly conflicting goals will interact in complex ways. These multi-agent interactions pose a fundamental challenge, particu...
- Flow Map Distillation Without Data : Abstract: State-of-the-art flow models achieve remarkable quality but require slow, iterative sampling. To accelerate this, flow maps can be distilled from pre-trained teachers, a procedure that conve...
- Causal Intervention Sequence Analysis for Fault Tracking in Radio Access Networks : Abstract: To keep modern Radio Access Networks (RAN) running smoothly, operators need to spot the real-world triggers behind Service-Level Agreement (SLA) breaches well before customers feel them. We ...
- DyPBP: Dynamic Peer Beneficialness Prediction for Cryptocurrency P2P Networking : Abstract: Distributed peer-to-peer (P2P) networking delivers the new blocks and transactions and is critical for the cryptocurrency blockchain system operations. Having poor P2P connectivity reduces t...
- Enhancing Breast Cancer Prediction with LLM-Inferred Confounders : Abstract: This study enhances breast cancer prediction by using large language models to infer the likelihood of confounding diseases, namely diabetes, obesity, and cardiovascular disease, from routin...
- CubeletWorld: A New Abstraction for Scalable 3D Modeling : Abstract: Modern cities produce vast streams of heterogeneous data, from infrastructure maps to mobility logs and satellite imagery. However, integrating these sources into coherent spatial models for...
- GANGR: GAN-Assisted Scalable and Efficient Global Routing Parallelization : Abstract: Global routing is a critical stage in electronic design automation (EDA) that enables early estimation and optimization of the routability of modern integrated circuits with respect to conge...
- Lane-Frame Quantum Multimodal Driving Forecasts for the Trajectory of Autonomous Vehicles : Abstract: Trajectory forecasting for autonomous driving must deliver accurate, calibrated multi-modal futures under tight compute and latency constraints. We propose a compact hybrid quantum architect...
- A Hybrid Classical-Quantum Fine Tuned BERT for Text Classification : Abstract: Fine-tuning BERT for text classification can be computationally challenging and requires careful hyper-parameter tuning. Recent studies have highlighted the potential of quantum algorithms t...
- Boosting Brain-inspired Path Integration Efficiency via Learning-based Replication of Continuous Attractor Neurodynamics : Abstract: The brain's Path Integration (PI) mechanism offers substantial guidance and inspiration for Brain-Inspired Navigation (BIN). However, the PI capability constructed by the Continuous Attracto...
- DeepCoT: Deep Continual Transformers for Real-Time Inference on Data Streams : Abstract: Transformer-based models have dramatically increased their size and parameter count to tackle increasingly complex tasks. At the same time, there is a growing demand for low-latency inferenc...
- Diffusion Models are Molecular Dynamics Simulators : Abstract: We prove that a denoising diffusion sampler equipped with a sequential bias across the batch dimension is exactly an Euler-Maruyama integrator for overdamped Langevin dynamics. Each reverse ...
- Periodicity-Enforced Neural Network for Designing Deterministic Lateral Displacement Devices : Abstract: Deterministic Lateral Displacement (DLD) devices enable liquid biopsy for cancer detection by separating circulating tumor cells (CTCs) from blood samples based on size, but designing these ...
- PrismSSL: One Interface, Many Modalities; A Single-Interface Library for Multimodal Self-Supervised Learning : Abstract: We present PrismSSL, a Python library that unifies state-of-the-art self-supervised learning (SSL) methods across audio, vision, graphs, and cross-modal settings in a single, modular codebas...
- Smoothed Agnostic Learning of Halfspaces over the Hypercube : Abstract: Agnostic learning of Boolean halfspaces is a fundamental problem in computational learning theory, but it is known to be computationally hard even for weak learning. Recent work [CKKMK24] pr...
- Improved Sample Complexity for Full Coverage in Compact and Continuous Spaces : Abstract: Verifying uniform conditions over continuous spaces through random sampling is fundamental in machine learning and control theory, yet classical coverage analyses often yield conservative bo...
- Data-Driven Predictive Modeling of Microfluidic Cancer Cell Separation Using a Deterministic Lateral Displacement Device : Abstract: Deterministic Lateral Displacement (DLD) devices are widely used in microfluidics for label-free, size-based separation of particles and cells, with particular promise in isolating circulati...
- Physical Reinforcement Learning : Abstract: Digital computers are power-hungry and largely intolerant of damaged components, making them potentially difficult tools for energy-limited autonomous agents in uncertain environments. Recen...
- Semi-Supervised Federated Multi-Label Feature Selection with Fuzzy Information Measures : Abstract: Multi-label feature selection (FS) reduces the dimensionality of multi-label data by removing irrelevant, noisy, and redundant features, thereby boosting the performance of multi-label learn...
- Layer-Wise High-Impact Parameter Ratio Optimization in Post-Training Quantization for Large Language Models : Abstract: Large language models (LLMs) have significantly advanced natural language processing, but their massive parameter counts create substantial computational and memory challenges during deploym...
- Adaptive Layer-Wise Transformations for Post-Training Quantization of Large Language Models : Abstract: Large language models require significant computational resources for deployment, making quantization essential for practical applications. However, the main obstacle to effective quantizati...
- High-Accuracy List-Decodable Mean Estimation : Abstract: In list-decodable learning, we are given a set of data points such that an $\alpha$-fraction of these points come from a nice distribution $D$, for some small $\alpha \ll 1$, and the goal is to output ...
- A novel k-means clustering approach using two distance measures for Gaussian data : Abstract: Clustering algorithms have long been the topic of research, representing the more popular side of unsupervised learning. Since clustering analysis is one of the best ways to find some clarit...
- Deterministic Inference across Tensor Parallel Sizes That Eliminates Training-Inference Mismatch : Abstract: Deterministic inference is increasingly critical for large language model (LLM) applications such as LLM-as-a-judge evaluation, multi-agent systems, and Reinforcement Learning (RL). However,...
- Internalizing Tools as Morphisms in Graded Transformers : Abstract: We introduce a graded formulation of internal symbolic computation for transformers. The hidden space is endowed with a grading $V=\bigoplus_{g\in G}V_g$, and symbolic operations are realize...
- Scaling Kinetic Monte-Carlo Simulations of Grain Growth with Combined Convolutional and Graph Neural Networks : Abstract: Graph neural networks (GNN) have emerged as a promising machine learning method for microstructure simulations such as grain growth. However, accurate modeling of realistic grain boundary ne...
- Transformers with RL or SFT Provably Learn Sparse Boolean Functions, But Differently : Abstract: Transformers can acquire Chain-of-Thought (CoT) capabilities to solve complex reasoning tasks through fine-tuning. Reinforcement learning (RL) and supervised fine-tuning (SFT) are two primar...
- Cost-Sensitive Conformal Training with Provably Controllable Learning Bounds : Abstract: Conformal prediction (CP) is a general framework to quantify the predictive uncertainty of machine learning models that uses a set prediction to include the true label with a valid probabili...
- Equivalence of Context and Parameter Updates in Modern Transformer Blocks : Abstract: Recent research has established that the impact of context in a vanilla transformer can be represented implicitly by forming a token-dependent, rank-1 patch to its MLP weights. This work ext...
- The Horcrux: Mechanistically Interpretable Task Decomposition for Detecting and Mitigating Reward Hacking in Embodied AI Systems : Abstract: Embodied AI agents exploit reward signal flaws through reward hacking, achieving high proxy scores while failing true objectives. We introduce Mechanistically Interpretable Task Decompositio...
- Generative Adversarial Post-Training Mitigates Reward Hacking in Live Human-AI Music Interaction : Abstract: Most applications of generative AI involve a sequential interaction in which a person inputs a prompt and waits for a response, and where reaction time and adaptivity are not important facto...
- Mitigating Catastrophic Forgetting in Streaming Generative and Predictive Learning via Stateful Replay : Abstract: Many deployed learning systems must update models on streaming data under memory constraints. The default strategy, sequential fine-tuning on each new phase, is architecture-agnostic but oft...
- On Transportability for Structural Causal Bandits : Abstract: Intelligent agents equipped with causal knowledge can optimize their action spaces to avoid unnecessary exploration. The structural causal bandit framework provides a graphical characterizat...
- Uncertainty-Aware Federated Learning for Cyber-Resilient Microgrid Energy Management : Abstract: Maintaining economic efficiency and operational reliability in microgrid energy management systems under cyberattack conditions remains challenging. Most approaches assume non-anomalous meas...
- Controllability Analysis of State Space-based Language Model : Abstract: State-space models (SSMs), particularly Mamba, have become powerful architectures for sequence modeling, yet their internal dynamics remain poorly understood compared to attention-based mode...
- Federated Anomaly Detection and Mitigation for EV Charging Forecasting Under Cyberattacks : Abstract: Electric Vehicle (EV) charging infrastructure faces escalating cybersecurity threats that can severely compromise operational efficiency and grid stability. Existing forecasting techniques a...
- An Adaptive Resonance Theory-based Topological Clustering Algorithm with a Self-Adjusting Vigilance Parameter : Abstract: Clustering in stationary and nonstationary settings, where data distributions remain static or evolve over time, requires models that can adapt to distributional shifts while preserving prev...
- Learning Rate Scheduling with Matrix Factorization for Private Training : Abstract: We study differentially private model training with stochastic gradient descent under learning rate scheduling and correlated noise. Although correlated noise, in particular via matrix facto...
- Understanding Private Learning From Feature Perspective : Abstract: Differentially private Stochastic Gradient Descent (DP-SGD) has become integral to privacy-preserving machine learning, ensuring robust privacy guarantees in sensitive domains. Despite notab...
- Curvature-Aware Safety Restoration In LLMs Fine-Tuning : Abstract: Fine-tuning Large Language Models (LLMs) for downstream tasks often compromises safety alignment, even when using parameter-efficient methods like LoRA. In this work, we uncover a notable pr...
- Hierarchical Linkage Clustering Beyond Binary Trees and Ultrametrics : Abstract: Hierarchical clustering seeks to uncover nested structures in data by constructing a tree of clusters, where deeper levels reveal finer-grained relationships. Traditional methods, including ...
- pFedBBN: A Personalized Federated Test-Time Adaptation with Balanced Batch Normalization for Class-Imbalanced Data : Abstract: Test-time adaptation (TTA) in federated learning (FL) is crucial for handling unseen data distributions across clients, particularly when faced with domain shifts and skewed class distributi...
- Active Learning with Selective Time-Step Acquisition for PDEs : Abstract: Accurately solving partial differential equations (PDEs) is critical to understanding complex scientific and engineering phenomena, yet traditional numerical solvers are computationally expe...
- Vulnerability-Aware Robust Multimodal Adversarial Training : Abstract: Multimodal learning has shown significant superiority on various tasks by integrating multiple modalities. However, the interdependencies among modalities increase the susceptibility of mult...
- scipy.spatial.transform: Differentiable Framework-Agnostic 3D Transformations in Python : Abstract: Three-dimensional rigid-body transforms, i.e. rotations and translations, are central to modern differentiable machine learning pipelines in robotics, vision, and simulation. However, numeri...
- LocaGen: Low-Overhead Indoor Localization Through Spatial Augmentation : Abstract: Indoor localization systems commonly rely on fingerprinting, which requires extensive survey efforts to obtain location-tagged signal data, limiting their real-world deployability. Recent ap...
- Bringing Stability to Diffusion: Decomposing and Reducing Variance of Training Masked Diffusion Models : Abstract: Masked diffusion models (MDMs) are a promising alternative to autoregressive models (ARMs), but they suffer from inherently much higher training variance. High variance leads to noisier grad...
- Bayesian Calibration of Engine-out NOx Models for Engine-to-Engine Transferability : Abstract: Accurate prediction of engine-out NOx is essential for meeting stringent emissions regulations and optimizing engine performance. Traditional approaches rely on models trained on data from a...
- Accelerating Time Series Foundation Models with Speculative Decoding : Abstract: Modern web applications--from real-time content recommendation and dynamic pricing to CDN optimization--increasingly rely on time-series forecasting to deliver personalized experiences to bi...
- Deep Gaussian Process Proximal Policy Optimization : Abstract: Uncertainty estimation for Reinforcement Learning (RL) is a critical component in control tasks where agents must balance safe exploration and efficient learning. While deep neural networks ...
- Adaptive Conformal Prediction for Quantum Machine Learning : Abstract: Quantum machine learning seeks to leverage quantum computers to improve upon classical machine learning algorithms. Currently, robust uncertainty quantification methods remain underdeveloped...
- Tail Distribution of Regret in Optimistic Reinforcement Learning : Abstract: We derive instance-dependent tail bounds for the regret of optimism-based reinforcement learning in finite-horizon tabular Markov decision processes with unknown transition dynamics. Focusin...
- Coherent Multi-Agent Trajectory Forecasting in Team Sports with CausalTraj : Abstract: Jointly forecasting trajectories of multiple interacting agents is a core challenge in sports analytics and other domains involving complex group dynamics. Accurate prediction enables realis...
- Reduced-Basis Deep Operator Learning for Parametric PDEs with Independently Varying Boundary and Source Data : Abstract: Parametric PDEs power modern simulation, design, and digital-twin systems, yet their many-query workloads still hinge on repeatedly solving large finite-element systems. Existing operator-le...
- A Fair OR-ML Framework for Resource Substitution in Large-Scale Networks : Abstract: Ensuring that the right resource is available at the right location and time remains a major challenge for organizations operating large-scale logistics networks. The challenge comes from un...
- From Tables to Signals: Revealing Spectral Adaptivity in TabPFN : Abstract: Task-agnostic tabular foundation models such as TabPFN have achieved impressive performance on tabular learning tasks, yet the origins of their inductive biases remain poorly understood. In ...
- TRIDENT: A Trimodal Cascade Generative Framework for Drug and RNA-Conditioned Cellular Morphology Synthesis : Abstract: Accurately modeling the relationship between perturbations, transcriptional responses, and phenotypic changes is essential for building an AI Virtual Cell (AIVC). However, existing methods t...
- ADF-LoRA: Alternating Low-Rank Aggregation for Decentralized Federated Fine-Tuning : Abstract: This paper revisits alternating low-rank updates for federated fine-tuning and examines their behavior in decentralized federated learning (DFL). While alternating the LoRA matrices has been...
- GROOT: Graph Edge Re-growth and Partitioning for the Verification of Large Designs in Logic Synthesis : Abstract: Traditional verification methods in chip design are highly time-consuming and computationally demanding, especially for large scale circuits. Graph neural networks (GNNs) have gained popular...
- Hierarchical Deep Research with Local-Web RAG: Toward Automated System-Level Materials Discovery : Abstract: We present a long-horizon, hierarchical deep research (DR) agent designed for complex materials and device discovery problems that exceed the scope of existing Machine Learning (ML) surrogat...
- DiM-TS: Bridge the Gap between Selective State Space Models and Time Series for Generative Modeling : Abstract: Time series data plays a pivotal role in a wide variety of fields but faces challenges related to privacy concerns. Recently, synthesizing data via diffusion models is viewed as a promising ...
- DynamiX: Dynamic Resource eXploration for Personalized Ad-Recommendations : Abstract: For online ad-recommendation systems, processing complete user-ad-engagement histories is both computationally intensive and noise-prone. We introduce Dynamix, a scalable, personalized seque...
- Auxiliary Gene Learning: Spatial Gene Expression Estimation by Auxiliary Gene Selection : Abstract: Spatial transcriptomics (ST) is a novel technology that enables the observation of gene expression at the resolution of individual spots within pathological tissues. ST quantifies the expres...
- Future Is Unevenly Distributed: Forecasting Ability of LLMs Depends on What We're Asking : Abstract: Large Language Models (LLMs) demonstrate partial forecasting competence across social, political, and economic events. Yet, their predictive ability varies sharply with domain structure and ...
- Radiation-Preserving Selective Imaging for Pediatric Hip Dysplasia: A Cross-Modal Ultrasound-Xray Policy with Limited Labels : Abstract: We study an ultrasound-first, radiation-preserving policy for developmental dysplasia of the hip (DDH) that requests a radiograph only when needed. We (i) pretrain modality-specific encode...
- SloMo-Fast: Slow-Momentum and Fast-Adaptive Teachers for Source-Free Continual Test-Time Adaptation : Abstract: Continual Test-Time Adaptation (CTTA) is crucial for deploying models in real-world applications with unseen, evolving target domains. Existing CTTA methods, however, often rely on source da...
- Adaptive Mesh-Quantization for Neural PDE Solvers : Abstract: Physical systems commonly exhibit spatially varying complexity, presenting a significant challenge for neural PDE solvers. While Graph Neural Networks can handle the irregular meshes require...
- Real-Time Personalized Content Adaptation through Matrix Factorization and Context-Aware Federated Learning : Abstract: Our study presents a multifaceted approach to enhancing user interaction and content relevance in social media platforms through a federated learning framework. We introduce personalized LLM...
- RRaPINNs: Residual Risk-Aware Physics Informed Neural Networks : Abstract: Physics-informed neural networks (PINNs) typically minimize average residuals, which can conceal large, localized errors. We propose Residual Risk-Aware Physics-Informed Neural Networks PINN...
- CHIPS: Efficient CLIP Adaptation via Curvature-aware Hybrid Influence-based Data Selection : Abstract: Adapting CLIP to vertical domains is typically approached by novel fine-tuning strategies or by continual pre-training (CPT) on large domain-specific datasets. Yet, data itself remains an un...
- Hyperspectral Variational Autoencoders for Joint Data Compression and Component Extraction : Abstract: Geostationary hyperspectral satellites generate terabytes of data daily, creating critical challenges for storage, transmission, and distribution to the scientific community. We present a va...
- TimePre: Bridging Accuracy, Efficiency, and Stability in Probabilistic Time-Series Forecasting : Abstract: Probabilistic Time-Series Forecasting (PTSF) is critical for uncertainty-aware decision making, but existing generative models, such as diffusion-based approaches, are computationally prohib...
- In Search of Goodness: Large Scale Benchmarking of Goodness Functions for the Forward-Forward Algorithm : Abstract: The Forward-Forward (FF) algorithm offers a biologically plausible alternative to backpropagation, enabling neural networks to learn through local updates. However, FF's efficacy relies heav...
- SAMBA: Toward a Long-Context EEG Foundation Model via Spatial Embedding and Differential Mamba : Abstract: Long-sequence electroencephalogram (EEG) modeling is essential for developing generalizable EEG representation models. This need arises from the high sampling rate of EEG data and the long r...
- Generative Myopia: Why Diffusion Models Fail at Structure : Abstract: Graph Diffusion Models (GDMs) optimize for statistical likelihood, implicitly acting as frequency filters that favor abundant substructures over spectrally critical ones. We term th...
- CycleSL: Server-Client Cyclical Update Driven Scalable Split Learning : Abstract: Split learning emerges as a promising paradigm for collaborative distributed model training, akin to federated learning, by partitioning neural networks between clients and a server without ...
- Bayesian-based Online Label Shift Estimation with Dynamic Dirichlet Priors : Abstract: Label shift, a prevalent challenge in supervised learning, arises when the class prior distribution of test data differs from that of training data, leading to significant degradation in cla...
- FOS: A Large-Scale Temporal Graph Benchmark for Scientific Interdisciplinary Link Prediction : Abstract: Interdisciplinary scientific breakthroughs mostly emerge unexpectedly, and forecasting the formation of novel research fields remains a major challenge. We introduce FOS (Future Of Science),...
- The Locally Deployable Virtual Doctor: LLM Based Human Interface for Automated Anamnesis and Database Conversion : Abstract: Recent advances in large language models made it possible to achieve high conversational performance with substantially reduced computational demands, enabling practical on-site deployment i...
- Subtract the Corruption: Training-Data-Free Corrective Machine Unlearning using Task Arithmetic : Abstract: Corrupted training data are ubiquitous. Corrective Machine Unlearning (CMU) seeks to remove the influence of such corruption post-training. Prior CMU typically assumes access to identified c...
- Multi-Agent Cross-Entropy Method with Monotonic Nonlinear Critic Decomposition : Abstract: Cooperative multi-agent reinforcement learning (MARL) commonly adopts centralized training with decentralized execution (CTDE), where centralized critics leverage global information to guide...
- QuantKAN: A Unified Quantization Framework for Kolmogorov Arnold Networks : Abstract: Kolmogorov Arnold Networks (KANs) represent a new class of neural architectures that replace conventional linear transformations and node-based nonlinearities with spline-based function appr...
- GRIT-LP: Graph Transformer with Long-Range Skip Connection and Partitioned Spatial Graphs for Accurate Ice Layer Thickness Prediction : Abstract: Graph transformers have demonstrated remarkable capability on complex spatio-temporal tasks, yet their depth is often limited by oversmoothing and weak long-range dependency modeling. To add...
- Towards Realistic Guarantees: A Probabilistic Certificate for SmoothLLM : Abstract: The SmoothLLM defense provides a certification guarantee against jailbreaking attacks, but it relies on a strict 'k-unstable' assumption that rarely holds in practice. This strong assumption...
- LogSyn: A Few-Shot LLM Framework for Structured Insight Extraction from Unstructured General Aviation Maintenance Logs : Abstract: Aircraft maintenance logs hold valuable safety data but remain underused due to their unstructured text format. This paper introduces LogSyn, a framework that uses Large Language Models (LLM...
- Reinforcement Learning for Self-Healing Material Systems : Abstract: The transition to autonomous material systems necessitates adaptive control methodologies to maximize structural longevity. This study frames the self-healing process as a Reinforcement Lear...
- Large-Scale In-Game Outcome Forecasting for Match, Team and Players in Football using an Axial Transformer Neural Network : Abstract: Football (soccer) is a sport that is characterised by complex game play, where players perform a variety of actions, such as passes, shots, tackles, fouls, in order to score goals, and ultim...
- OceanForecastBench: A Benchmark Dataset for Data-Driven Global Ocean Forecasting : Abstract: Global ocean forecasting aims to predict key ocean variables such as temperature, salinity, and currents, which is essential for understanding and describing oceanic phenomena. In recent yea...
- Sampling Control for Imbalanced Calibration in Semi-Supervised Learning : Abstract: Class imbalance remains a critical challenge in semi-supervised learning (SSL), especially when distributional mismatches between labeled and unlabeled data lead to biased classification. Al...
- SAOT: An Enhanced Locality-Aware Spectral Transformer for Solving PDEs : Abstract: Neural operators have shown great potential in solving a family of Partial Differential Equations (PDEs) by modeling the mappings between input and output functions. Fourier Neural Operator ...
- Hypergraph Contrastive Learning for both Homophilic and Heterophilic Hypergraphs : Abstract: Hypergraphs, as a generalization of traditional graphs, naturally capture high-order relationships. In recent years, hypergraph neural networks (HNNs) have been widely used to capture comple...
- Learn from Global Correlations: Enhancing Evolutionary Algorithm via Spectral GNN : Abstract: Evolutionary algorithms (EAs) simulate natural selection but have two main limitations: (1) they rarely update individuals based on global correlations, limiting comprehensive learning; (2) ...
- Benchmarking the Spatial Robustness of DNNs via Natural and Adversarial Localized Corruptions : Abstract: The robustness of deep neural networks is a crucial factor in safety-critical applications, particularly in complex and dynamic environments (e.g., medical or driving scenarios) where locali...
- Learning Primitive Embodied World Models: Towards Scalable Robotic Learning : Abstract: While video-generation-based embodied world models have gained increasing attention, their reliance on large-scale embodied interaction data remains a key bottleneck. The scarcity, difficult...
- Mind the Gap: Aligning Knowledge Bases with User Needs to Enhance Mental Health Retrieval : Abstract: Access to reliable mental health information is vital for early help-seeking, yet expanding knowledge bases is resource-intensive and often misaligned with user needs. This results in poor p...
- PRAGMA: A Profiling-Reasoned Multi-Agent Framework for Automatic Kernel Optimization : Abstract: Designing high-performance kernels requires expert-level tuning and a deep understanding of hardware characteristics. Recent advances in large language models (LLMs) have enabled automated k...
- How do data owners say no? A case study of data consent mechanisms in web-scraped vision-language AI training datasets : Abstract: The internet has become the main source of data to train modern text-to-image or vision-language models, yet it is increasingly unclear whether web-scale data collection practices for traini...
- FoleyBench: A Benchmark For Video-to-Audio Models : Abstract: Video-to-audio generation (V2A) is of increasing importance in domains such as film post-production, AR/VR, and sound design, particularly for the creation of Foley sound effects synchronize...
- Hiding in the AI Traffic: Abusing MCP for LLM-Powered Agentic Red Teaming : Abstract: Generative AI is reshaping offensive cybersecurity by enabling autonomous red team agents that can plan, execute, and adapt during penetration tests. However, existing approaches face trade-...
- Future-Back Threat Modeling: A Foresight-Driven Security Framework : Abstract: Traditional threat modeling remains reactive, focused on known TTPs and past incident data, while threat prediction and forecasting frameworks are often disconnected from operational or archi...
- Root Cause Analysis for Microservice Systems via Cascaded Conditional Learning with Hypergraphs : Abstract: Root cause analysis in microservice systems typically involves two core tasks: root cause localization (RCL) and failure type identification (FTI). Despite substantial research efforts, conv...
- EgoCogNav: Cognition-aware Human Egocentric Navigation : Abstract: Modeling the cognitive and experiential factors of human navigation is central to deepening our understanding of human-environment interaction and to enabling safe social navigation and effe...
- Learning Straight Flows: Variational Flow Matching for Efficient Generation : Abstract: Flow Matching has limited ability in achieving one-step generation due to its reliance on learned curved trajectories. Previous studies have attempted to address this limitation by either mo...
- Llamazip: Leveraging LLaMA for Lossless Text Compression and Training Dataset Detection : Abstract: This work introduces Llamazip, a novel lossless text compression algorithm based on the predictive capabilities of the LLaMA3 language model. Llamazip achieves significant data reduction by ...
- Comparative Analysis of Large Language Model Inference Serving Systems: A Performance Study of vLLM and HuggingFace TGI : Abstract: The deployment of Large Language Models (LLMs) in production environments requires efficient inference serving systems that balance throughput, latency, and resource utilization. This paper ...
- AutoSAGE: Input-Aware CUDA Scheduling for Sparse GNN Aggregation (SpMM/SDDMM) and CSR Attention : Abstract: Sparse GNN aggregations (CSR SpMM/SDDMM) vary widely in performance with degree skew, feature width, and GPU micro-architecture. We present AutoSAGE, an input-aware CUDA scheduler that choos...
- Non-stationary and Varying-discounting Markov Decision Processes for Reinforcement Learning : Abstract: Algorithms developed under stationary Markov Decision Processes (MDPs) often face challenges in non-stationary environments, and infinite-horizon formulations may not directly apply to finit...
- Generalizable and Efficient Automated Scoring with a Knowledge-Distilled Multi-Task Mixture-of-Experts : Abstract: Automated scoring of written constructed responses typically relies on separate models per task, straining computational resources, storage, and maintenance in real-world education settings....
- Copula Based Fusion of Clinical and Genomic Machine Learning Risk Scores for Breast Cancer Risk Stratification : Abstract: Clinical and genomic models are both used to predict breast cancer outcomes, but they are often combined using simple linear rules that do not account for how their risk scores relate, espec...
- Finding Pre-Injury Patterns in Triathletes from Lifestyle, Recovery and Load Dynamics Features : Abstract: Triathlon training, which involves high-volume swimming, cycling, and running, places athletes at substantial risk for overuse injuries due to repetitive physiological stress. Current injury...
- AI-driven Generation of MALDI-TOF MS for Microbial Characterization : Abstract: Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (MALDI-TOF MS) has become a cornerstone technology in clinical microbiology, enabling rapid and accurate microbia...
- QML-HCS: A Hypercausal Quantum Machine Learning Framework for Non-Stationary Environments : Abstract: QML-HCS is a research-grade framework for constructing and analyzing quantum-inspired machine learning models operating under hypercausal feedback dynamics. Hypercausal refers to AI systems ...
- Efficient Large-Scale Learning of Minimax Risk Classifiers : Abstract: Supervised learning with large-scale data usually leads to complex optimization problems, especially for classification tasks with multiple classes. Stochastic subgradient methods can enable...
- Rectifying Mean-Shift in Cascaded Precipitation Nowcasting : Abstract: Precipitation nowcasting, which aims to provide high spatio-temporal resolution precipitation forecasts by leveraging current radar observations, is a core task in regional weather forecasti...
- Boundary-Aware Adversarial Filtering for Reliable Diagnosis under Extreme Class Imbalance : Abstract: We study classification under extreme class imbalance where recall and calibration are both critical, for example in medical diagnosis scenarios. We propose AF-SMOTE, a mathematically motiva...
- Enhanced Federated Deep Multi-View Clustering under Uncertainty Scenario : Abstract: Traditional Federated Multi-View Clustering assumes uniform views across clients, yet practical deployments reveal heterogeneous view completeness with prevalent incomplete, redundant, or co...
- Smart Manufacturing: MLOps-Enabled Event-Driven Architecture for Enhanced Control in Steel Production : Abstract: We explore a Digital Twin-Based Approach for Smart Manufacturing to improve Sustainability, Efficiency, and Cost-Effectiveness for a steel production plant. Our system is based on a micro-se...
- PocketLLM: Ultimate Compression of Large Language Models via Meta Networks : Abstract: As Large Language Models (LLMs) continue to grow in size, storing and transmitting them on edge devices becomes increasingly challenging. Traditional methods like quantization and pruning st...
- TTF: A Trapezoidal Temporal Fusion Framework for LTV Forecasting in Douyin : Abstract: In the user growth scenario, Internet companies invest heavily in paid acquisition channels to acquire new users. But sustainable growth depends on acquired users' generating lifetime value ...
- BlockCert: Certified Blockwise Extraction of Transformer Mechanisms : Abstract: Mechanistic interpretability aspires to reverse-engineer neural networks into explicit algorithms, while model editing seeks to modify specific behaviours without retraining. Both areas are ...
- VDC-Agent: When Video Detailed Captioners Evolve Themselves via Agentic Self-Reflection : Abstract: We present VDC-Agent, a self-evolving framework for Video Detailed Captioning that requires neither human annotations nor larger teacher models. The agent forms a closed loop of caption gene...
- Gradient Propagation in Retrosynthetic Space: An Efficient Framework for Synthesis Plan Generation : Abstract: Retrosynthesis, which aims to identify viable synthetic pathways for target molecules by decomposing them into simpler precursors, is often treated as a search problem. However, its complexi...
- Developing an Algorithm Selector for Green Configuration in Scheduling Problems : Abstract: The Job Shop Scheduling Problem (JSP) is central to operations research, primarily optimizing energy efficiency due to its profound environmental and economic implications. Efficient schedul...
- A Comprehensive Evaluation of Large Language Models on Mental Illnesses : Abstract: Large Language Models (LLMs) have shown promise in various domains, including healthcare, with significant potential to transform mental health applications by enabling scalable and accessib...
- Functional Classification of Spiking Signal Data Using Artificial Intelligence Techniques: A Review : Abstract: Human brain neuron activities are incredibly significant nowadays. Neuronal behavior is assessed by analyzing signal data such as electroencephalography (EEG), which can offer scientists val...
- Scalable and Accurate Graph Reasoning with LLM-based Multi-Agents : Abstract: Recent research has explored the use of Large Language Models (LLMs) for tackling complex graph reasoning tasks. However, due to the intricacies of graph structures and the inherent limitati...
- Do LLMs Feel? Teaching Emotion Recognition with Prompts, Retrieval, and Curriculum Learning : Abstract: Emotion Recognition in Conversation (ERC) is a crucial task for understanding human emotions and enabling natural human-computer interaction. Although Large Language Models (LLMs) have recen...
- Multidimensional Rubric-oriented Reward Model Learning via Geometric Projection Reference Constraints : Abstract: The integration of large language models (LLMs) into medical practice offers transformative potential, yet their real-world clinical applicability remains constrained by critical alignment i...
- Cognitive Foundations for Reasoning and Their Manifestation in LLMs : Abstract: Large language models (LLMs) solve complex problems yet fail on simpler variants, suggesting they achieve correct outputs through mechanisms fundamentally different from human reasoning. To ...
- Can Large Language Models Detect Misinformation in Scientific News Reporting? : Abstract: Scientific facts are often spun in the popular press with the intent to influence public opinion and action, as was evidenced during the COVID-19 pandemic. Automatic detection of misinformat...
- Social and Ethical Risks Posed by General-Purpose LLMs for Settling Newcomers in Canada : Abstract: The non-profit settlement sector in Canada supports newcomers in achieving successful integration. This sector faces increasing operational pressures amidst rising immigration targets, which...
- PDDFormer: Pairwise Distance Distribution Graph Transformer for Crystal Material Property Prediction : Abstract: Crystal structures can be simplified as a periodic point set that repeats across three-dimensional space along an underlying lattice. Traditionally, crystal representation methods characteri...
- DreamGarden: A Designer Assistant for Growing Games from a Single Prompt : Abstract: Coding assistants are increasingly leveraged in game design, both generating code and making high-level plans. To what degree can these tools align with developer workflows, and what new mod...
- Investigating Representation Universality: Case Study on Genealogical Representations : Abstract: Motivated by interpretability and reliability, we investigate whether large language models (LLMs) deploy universal geometric structures to encode discrete, graph-structured knowledge. To th...
- Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study : Abstract: Parameter-efficient fine-tuning (PEFT) methods, which fine-tune only a subset of model parameters, offer a promising solution by reducing the computational costs of tuning large language mod...
- Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension : Abstract: Existing large video-language models (LVLMs) struggle to comprehend long videos correctly due to limited context. To address this problem, fine-tuning long-context LVLMs and employing GPT-ba...
- VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval : Abstract: Prevailing joint prediction transformers for Video Highlight Detection and Moment Retrieval (HD/MR) exhibit deficiencies in handling cross-task dynamics, achieving robust video-text alignmen...
- Life-IQA: Boosting Blind Image Quality Assessment through GCN-enhanced Layer Interaction and MoE-based Feature Decoupling : Abstract: Blind image quality assessment (BIQA) plays a crucial role in evaluating and optimizing visual experience. Most existing BIQA approaches fuse shallow and deep features extracted from backbon...
- CSD: Change Semantic Detection with only Semantic Change Masks for Damage Assessment in Conflict Zones : Abstract: Accurately and swiftly assessing damage from conflicts is crucial for humanitarian aid and regional stability. In conflict zones, damaged zones often share similar architectural styles, with...
- MedSAM3: Delving into Segment Anything with Medical Concepts : Abstract: Medical image segmentation is fundamental for biomedical discovery. Existing methods lack generalizability and demand extensive, time-consuming manual annotation for new clinical application...
- Large Language Model-Assisted Planning of Electric Vehicle Charging Infrastructure with Real-World Case Study : Abstract: The growing demand for electric vehicle (EV) charging infrastructure presents significant planning challenges, requiring efficient strategies for investment and operation to deliver cost-eff...
- Understanding, Accelerating, and Improving MeanFlow Training : Abstract: MeanFlow promises high-quality generative modeling in few steps, by jointly learning instantaneous and average velocity fields. Yet, the underlying training dynamics remain unclear. We analy...
- Mitigating Participation Imbalance Bias in Asynchronous Federated Learning : Abstract: In Asynchronous Federated Learning (AFL), the central server immediately updates the global model with each arriving client's contribution. As a result, clients perform their local training ...
- DynaMix: Generalizable Person Re-identification via Dynamic Relabeling and Mixed Data Sampling : Abstract: Generalizable person re-identification (Re-ID) aims to recognize individuals across unseen cameras and environments. While existing methods rely heavily on limited labeled multi-camera data,...
- GraphMind: Theorem Selection and Conclusion Generation Framework with Dynamic GNN for LLM Reasoning : Abstract: Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, including multi-step reasoning such as mathematical proving. However,...
- EnfoPath: Energy-Informed Analysis of Generative Trajectories in Flow Matching : Abstract: Flow-based generative models synthesize data by integrating a learned velocity field from a reference distribution to the target data distribution. Prior work has focused on endpoint metrics...
- The Core in Max-Loss Non-Centroid Clustering Can Be Empty : Abstract: We study core stability in non-centroid clustering under the max-loss objective, where each agent's loss is the maximum distance to other members of their cluster. We prove that for all $k\g...
- Physics-informed Neural Operator Learning for Nonlinear Grad-Shafranov Equation : Abstract: As artificial intelligence emerges as a transformative enabler for fusion energy commercialization, fast and accurate solvers become increasingly critical. In magnetic confinement nuclear fu...
- On the Optimality of Discrete Object Naming: a Kinship Case Study : Abstract: The structure of naming systems in natural languages hinges on a trade-off between high informativeness and low complexity. Prior work capitalizes on information theory to formalize these no...
- Uncertainty-Aware Deep Learning Framework for Remaining Useful Life Prediction in Turbofan Engines with Learned Aleatoric Uncertainty : Abstract: Accurate Remaining Useful Life (RUL) prediction coupled with uncertainty quantification remains a critical challenge in aerospace prognostics. This research introduces a novel uncertainty-aw...
- From Pixels to Posts: Retrieval-Augmented Fashion Captioning and Hashtag Generation : Abstract: This paper introduces the retrieval-augmented framework for automatic fashion caption and hashtag generation, combining multi-garment detection, attribute reasoning, and Large Language Model...
- Information Physics of Intelligence: Unifying Logical Depth and Entropy under Thermodynamic Constraints : Abstract: The rapid scaling of artificial intelligence models has revealed a fundamental tension between model capacity (storage) and inference efficiency (computation). While classical information th...
- LLM-Based Agentic Negotiation for 6G: Addressing Uncertainty Neglect and Tail-Event Risk : Abstract: A critical barrier to the trustworthiness of sixth-generation (6G) agentic autonomous networks is the uncertainty neglect bias, a cognitive tendency for large language model (LLM)-powered ag...
- Torsion-Space Diffusion for Protein Backbone Generation with Geometric Refinement : Abstract: Designing new protein structures is fundamental to computational biology, enabling advances in therapeutic molecule discovery and enzyme engineering. Existing diffusion-based generative mode...
- CLASH: A Benchmark for Cross-Modal Contradiction Detection : Abstract: Contradictory multimodal inputs are common in real-world settings, yet existing benchmarks typically assume input consistency and fail to evaluate cross-modal contradiction detection - a fun...
- Adversarial Attack-Defense Co-Evolution for LLM Safety Alignment via Tree-Group Dual-Aware Search and Optimization : Abstract: Large Language Models (LLMs) have developed rapidly in web services, delivering unprecedented capabilities while amplifying societal risks. Existing works tend to focus on either isolated ja...
- Are Large Vision Language Models Truly Grounded in Medical Images? Evidence from Italian Clinical Visual Question Answering : Abstract: Large vision language models (VLMs) have achieved impressive performance on medical visual question answering benchmarks, yet their reliance on visual information remains unclear. We investi...
- Learning Plug-and-play Memory for Guiding Video Diffusion Models : Abstract: Diffusion Transformer (DiT) based video generation models have recently achieved impressive visual quality and temporal coherence, but they still frequently violate basic physical laws and co...
- In Machina N400: Pinpointing Where a Causal Language Model Detects Semantic Violations : Abstract: How and where does a transformer notice that a sentence has gone semantically off the rails? To explore this question, we evaluated the causal language model (phi-2) using a carefully curate...
- SENTINEL: A Fully End-to-End Language-Action Model for Humanoid Whole Body Control : Abstract: Existing humanoid control systems often rely on teleoperation or modular generation pipelines that separate language understanding from physical execution. However, the former is entirely hu...
- Local Entropy Search over Descent Sequences for Bayesian Optimization : Abstract: Searching large and complex design spaces for a global optimum can be infeasible and unnecessary. A practical alternative is to iteratively refine the neighborhood of an initial design using...
- Neural Architecture Search for Quantum Autoencoders : Abstract: In recent years, machine learning and deep learning have driven advances in domains such as image classification, speech recognition, and anomaly detection by leveraging multi-layer neural n...
- MAESTRO: Multi-Agent Environment Shaping through Task and Reward Optimization : Abstract: Cooperative Multi-Agent Reinforcement Learning (MARL) faces two major design bottlenecks: crafting dense reward functions and constructing curricula that avoid local optima in high-dimension...
- Adversarial Patch Attacks on Vision-Based Cargo Occupancy Estimation via Differentiable 3D Simulation : Abstract: Computer vision systems are increasingly adopted in modern logistics operations, including the estimation of trailer occupancy for planning, routing, and billing. Although effective, such sy...
- Medusa: Cross-Modal Transferable Adversarial Attacks on Multimodal Medical Retrieval-Augmented Generation : Abstract: With the rapid advancement of retrieval-augmented vision-language models, multimodal medical retrieval-augmented generation (MMed-RAG) systems are increasingly adopted in clinical decision s...
- A Nutrition Multimodal Photoplethysmography Language Model : Abstract: Hunger and satiety dynamics shape dietary behaviors and metabolic health, yet remain difficult to capture in everyday settings. We present a Nutrition Photoplethysmography Language Model (NP...
- Solar-GECO: Perovskite Solar Cell Property Prediction with Geometric-Aware Co-Attention : Abstract: Perovskite solar cells are promising candidates for next-generation photovoltaics. However, their performance as multi-scale devices is determined by complex interactions between their const...
- Interpreting GFlowNets for Drug Discovery: Extracting Actionable Insights for Medicinal Chemistry : Abstract: Generative Flow Networks, or GFlowNets, offer a promising framework for molecular design, but their internal decision policies remain opaque. This limits adoption in drug discovery, where ch...
- Dynamic Multi-Species Bird Soundscape Generation with Acoustic Patterning and 3D Spatialization : Abstract: Generation of dynamic, scalable multi-species bird soundscapes remains a significant challenge in computer music and algorithmic sound design. Birdsongs involve rapid frequency-modulated chi...
- Data Flows and Colonial Regimes in Africa: A Critical Analysis of the Colonial Futurities Embedded in AI Ecosystems : Abstract: This chapter seeks to frame the elemental and invisible problems of AI and big data in the African context by examining digital sites and infrastructure through the lens of power and interes...
- Open-weight genome language model safeguards: Assessing robustness via adversarial fine-tuning : Abstract: Novel deep learning architectures are increasingly being applied to biological data, including genetic sequences. These models, referred to as genomic language models (gLMs), have demonstr...
- Evaluating Dataset Watermarking for Fine-tuning Traceability of Customized Diffusion Models: A Comprehensive Benchmark and Removal Approach : Abstract: Recent fine-tuning techniques for diffusion models enable them to reproduce specific image sets, such as particular faces or artistic styles, but also introduce copyright and security risks....
- What Drives Cross-lingual Ranking? Retrieval Approaches with Multilingual Language Models : Abstract: Cross-lingual information retrieval (CLIR) enables access to multilingual knowledge but remains challenging due to disparities in resources, scripts, and weak cross-lingual semantic alignmen...
- Generative Query Expansion with Multilingual LLMs for Cross-Lingual Information Retrieval : Abstract: Query expansion is the reformulation of a user query by adding semantically related information, and is an essential component of monolingual and cross-lingual information retrieval used to ...
- Explicit Tonal Tension Conditioning via Dual-Level Beam Search for Symbolic Music Generation : Abstract: State-of-the-art symbolic music generation models have recently achieved remarkable output quality, yet explicit control over compositional features, such as tonal tension, remains challengi...
- Leveraging LLMs for reward function design in reinforcement learning control tasks : Abstract: The challenge of designing effective reward functions in reinforcement learning (RL) represents a significant bottleneck, often requiring extensive human expertise and being time-consuming. ...
- DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation : Abstract: Pixel diffusion aims to generate images directly in pixel space in an end-to-end fashion. This approach avoids the limitations of VAE in the two-stage latent diffusion, offering higher model...
- An Anatomy Aware Hybrid Deep Learning Framework for Lung Cancer Tumor Stage Classification : Abstract: Accurate lung cancer tumor staging is crucial for prognosis and treatment planning. However, it remains challenging for end-to-end deep learning approaches, as such approaches often overlook...
- Predicting partially observable dynamical systems via diffusion models with a multiscale inference scheme : Abstract: Conditional diffusion models provide a natural framework for probabilistic prediction of dynamical systems and have been successfully applied to fluid dynamics and weather prediction. Howeve...
- Real-Time Object Tracking with On-Device Deep Learning for Adaptive Beamforming in Dynamic Acoustic Environments : Abstract: Advances in object tracking and acoustic beamforming are driving new capabilities in surveillance, human-computer interaction, and robotics. This work presents an embedded system that integr...
- DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research : Abstract: Deep research models perform multi-step research to produce long-form, well-attributed answers. However, most open deep research models are trained on easily verifiable short-form QA tasks v...
- In-Video Instructions: Visual Signals as Generative Control : Abstract: Large-scale video generative models have recently demonstrated strong visual capabilities, enabling the prediction of future frames that adhere to the logical and physical cues in the curren...
- UniGame: Turning a Unified Multimodal Model Into Its Own Adversary : Abstract: Unified Multimodal Models (UMMs) have shown impressive performance in both understanding and generation with a single architecture. However, UMMs still exhibit a fundamental inconsistency: u...
- Be My Eyes: Extending Large Language Models to New Modalities Through Multi-Agent Collaboration : Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in challenging, knowledge-intensive reasoning tasks. However, extending LLMs to perceive and reason over a new modality...
- Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens : Abstract: Vision-Language Models (VLMs) excel at reasoning in linguistic space but struggle with perceptual understanding that requires dense visual perception, e.g., spatial reasoning and geometric a...
- SLMFix: Leveraging Small Language Models for Error Fixing with Reinforcement Learning : Abstract: Recent advancements in large language models (LLMs) have shown very impressive capabilities in code generation across many programming languages. However, even state-of-the-art LLMs generate...
- Beyond Protein Language Models: An Agentic LLM Framework for Mechanistic Enzyme Design : Abstract: We present Genie-CAT, a tool-augmented large-language-model (LLM) system designed to accelerate scientific hypothesis generation in protein design. Using metalloproteins (e.g., ferredoxins) ...
- Prompt Less, Smile More: MTP with Semantic Engineering in Lieu of Prompt Engineering : Abstract: AI-Integrated programming is emerging as a foundational paradigm for building intelligent systems with large language models (LLMs). Recent approaches such as Meaning Typed Programming (MTP)...
- Mixture of Horizons in Action Chunking : Abstract: Vision-language-action (VLA) models have shown remarkable capabilities in robotic manipulation, but their performance is sensitive to the $\textbf{action chunk length}$ used during training,...
- OmniStruct: Universal Text-to-Structure Generation across Diverse Schemas : Abstract: The ability of Large Language Models (LLMs) to generate structured outputs that follow arbitrary schemas is crucial to a wide range of downstream tasks that require diverse structured repres...
- Toward an AI-Native Internet: Rethinking the Web Architecture for Semantic Retrieval : Abstract: The rise of Generative AI Search is fundamentally transforming how users and intelligent systems interact with the Internet. LLMs increasingly act as intermediaries between humans and web in...
- NSTR: Neural Spectral Transport Representation for Space-Varying Frequency Fields : Abstract: Implicit Neural Representations (INRs) have emerged as a powerful paradigm for representing signals such as images, audio, and 3D scenes. However, existing INR frameworks -- including MLPs w...
- Can a Second-View Image Be a Language? Geometric and Semantic Cross-Modal Reasoning for X-ray Prohibited Item Detection : Abstract: Automatic X-ray prohibited items detection is vital for security inspection and has been widely studied. Traditional methods rely on visual modality, often struggling with complex threats. W...
- Pre-training Graph Neural Networks on 2D and 3D Molecular Structures by using Multi-View Conditional Information Bottleneck : Abstract: Recent pre-training strategies for molecular graphs have attempted to use 2D and 3D molecular views as both inputs and self-supervised signals, primarily aligning graph-level representations...
- Findings of the BlackboxNLP 2025 Shared Task: Localizing Circuits and Causal Variables in Language Models : Abstract: Mechanistic interpretability (MI) seeks to uncover how language models (LMs) implement specific behaviors, yet measuring progress in MI remains challenging. The recently released Mechanistic...
- SmolKalam: Ensemble Quality-Filtered Translation at Scale for High Quality Arabic Post-Training Data : Abstract: Although the community has tackled the acquisition of high-quality Arabic pretraining data, we still lack large-scale, multi-turn Arabic datasets that include reasoning and tool calling. Nai...
- Categorical Equivariant Deep Learning: Category-Equivariant Neural Networks and Universal Approximation Theorems : Abstract: We develop a theory of category-equivariant neural networks (CENNs) that unifies group/groupoid-equivariant networks, poset/lattice-equivariant networks, graph and sheaf neural networks. Equ...
- General Agentic Memory Via Deep Research : Abstract: Memory is critical for AI agents, yet the widely-adopted static memory, aiming to create readily available memory in advance, is inevitably subject to severe information loss. To address thi...
- DocPTBench: Benchmarking End-to-End Photographed Document Parsing and Translation : Abstract: The advent of Multimodal Large Language Models (MLLMs) has unlocked the potential for end-to-end document parsing and translation. However, prevailing benchmarks such as OmniDocBench and DIT...
- RegDeepLab: A Two-Stage Decoupled Framework for Interpretable Embryo Fragmentation Grading : Abstract: The degree of embryo fragmentation serves as a critical morphological indicator for assessing embryo developmental potential in In Vitro Fertilization (IVF) clinical decision-making. However...
- Shadows in the Code: Exploring the Risks and Defenses of LLM-based Multi-Agent Software Development Systems : Abstract: The rapid advancement of Large Language Model (LLM)-driven multi-agent systems has significantly streamlined software developing tasks, enabling users with little technical expertise to deve...
- InstructAudio: Unified speech and music generation with natural language instruction : Abstract: Text-to-speech (TTS) and text-to-music (TTM) models face significant limitations in instruction-based control. TTS systems usually depend on reference audio for timbre, offer only limited te...
- Evaluating perturbation robustness of generative systems that use COBOL code inputs : Abstract: Systems incorporating large language models (LLMs) as a component are known to be sensitive (i.e., non-robust) to minor input variations that do not change the meaning of the input; such sen...
- MindEval: Benchmarking Language Models on Multi-turn Mental Health Support : Abstract: Demand for mental health support through AI chatbots is surging, though current systems present several limitations, like sycophancy or overvalidation, and reinforcement of maladaptive belie...
- Shape-Adapting Gated Experts: Dynamic Expert Routing for Colonoscopic Lesion Segmentation : Abstract: The substantial diversity in cell scale and form remains a primary challenge in computer-aided cancer detection on gigapixel Whole Slide Images (WSIs), attributable to cellular heterogeneity...
- Multimodal Continual Learning with MLLMs from Multi-scenario Perspectives : Abstract: Continual learning in visual understanding aims to deal with catastrophic forgetting in Multimodal Large Language Models (MLLMs). MLLMs deployed on devices have to continuously adapt to dyna...
- Re(Visiting) Time Series Foundation Models in Finance : Abstract: Financial time series forecasting is central to trading, portfolio optimization, and risk management, yet it remains challenging due to noisy, non-stationary, and heterogeneous data. Recent ...
- Barriers to AI Adoption: Image Concerns at Work : Abstract: Concerns about how workers are perceived can deter effective collaboration with artificial intelligence (AI). In a field experiment on a large online labor market, I hired 450 U.S.-based rem...
- Strategic Decision Framework for Enterprise LLM Adoption : Abstract: Organizations are rapidly adopting Large Language Models (LLMs) to transform their operations, yet they lack clear guidance on key decisions for adoption and implementation. While LLMs offer...
- Stage-Specific Benchmarking of Deep Learning Models for Glioblastoma Follow-Up MRI : Abstract: Differentiating true tumor progression (TP) from treatment-related pseudoprogression (PsP) in glioblastoma remains challenging, especially at early follow-up. We present the first stage-spec...
- An Analysis of Constraint-Based Multi-Agent Pathfinding Algorithms : Abstract: This study informs the design of future multi-agent pathfinding (MAPF) and multi-robot motion planning (MRMP) algorithms by guiding choices based on constraint classification for constraint-...
- KAN vs LSTM Performance in Time Series Forecasting : Abstract: This paper compares Kolmogorov-Arnold Networks (KAN) and Long Short-Term Memory networks (LSTM) for forecasting non-deterministic stock price data, evaluating predictive accuracy versus inte...
- A Unified BERT-CNN-BiLSTM Framework for Simultaneous Headline Classification and Sentiment Analysis of Bangla News : Abstract: In our daily lives, newspapers are an essential information source that impacts how the public talks about present-day issues. However, effectively navigating the vast amount of news content...
- OpenGloss: A Synthetic Encyclopedic Dictionary and Semantic Knowledge Graph : Abstract: We present OpenGloss, a synthetic encyclopedic dictionary and semantic knowledge graph for English that integrates lexicographic definitions, encyclopedic context, etymological histories, an...
- Majority of the Bests: Improving Best-of-N via Bootstrapping : Abstract: Sampling multiple outputs from a Large Language Model (LLM) and selecting the most frequent (Self-consistency) or highest-scoring (Best-of-N) candidate is a popular approach to achieve highe...
- No Free Lunch in Language Model Bias Mitigation? Targeted Bias Reduction Can Exacerbate Unmitigated LLM Biases : Abstract: Large Language Models (LLMs) inherit societal biases from their training data, potentially leading to harmful or unfair outputs. While various techniques aim to mitigate these biases, their ...
- Health system learning achieves generalist neuroimaging models : Abstract: Frontier artificial intelligence (AI) models, such as OpenAI's GPT-5 and Meta's DINOv3, have advanced rapidly through training on internet-scale public data, yet such systems lack access to ...
- Kitty: Accurate and Efficient 2-bit KV Cache Quantization with Dynamic Channel-wise Precision Boost : Abstract: The KV cache is a dominant memory bottleneck for LLM inference. While 4-bit KV quantization preserves accuracy, 2-bit often degrades it, especially on long-context reasoning. We close this g...
- Lean 5.0: A Predictive, Human-AI, and Ethically Grounded Paradigm for Construction Management : Abstract: This paper introduces Lean 5.0, a human-centric evolution of Lean-Digital integration that connects predictive analytics, AI collaboration, and continuous learning within Industry 5.0 and Co...
- FHE-Agent: Automating CKKS Configuration for Practical Encrypted Inference via an LLM-Guided Agentic Framework : Abstract: Fully Homomorphic Encryption (FHE), particularly the CKKS scheme, is a promising enabler for privacy-preserving MLaaS, but its practical deployment faces a prohibitive barrier: it heavily re...
- Deterministic Continuous Replacement: Fast and Stable Module Replacement in Pretrained Transformers : Abstract: Replacing modules in pretrained models, especially swapping quadratic self-attention for efficient attention alternatives, poses a hard optimization problem: cold-start reinitialization dest...
- Low-Rank GEMM: Efficient Matrix Multiplication via Low-Rank Approximation with FP8 Acceleration : Abstract: Large matrix multiplication is a cornerstone of modern machine learning workloads, yet traditional approaches suffer from cubic computational complexity (e.g., $\mathcal{O}(n^3)$ for a matri...
- MedVision: Dataset and Benchmark for Quantitative Medical Image Analysis : Abstract: Current vision-language models (VLMs) in medicine are primarily designed for categorical question answering (e.g., "Is this normal or abnormal?") or qualitative descriptive tasks. However, c...
- VLM in a flash: I/O-Efficient Sparsification of Vision-Language Model via Neuron Chunking : Abstract: Edge deployment of large Vision-Language Models (VLMs) increasingly relies on flash-based weight offloading, where activation sparsification is used to reduce I/O overhead. However, conventi...
- Stable Multi-Drone GNSS Tracking System for Marine Robots : Abstract: Accurate localization is essential for marine robotics, yet Global Navigation Satellite System (GNSS) signals are unreliable or unavailable even at a very short distance below the water surf...
- Empathetic Cascading Networks: A Multi-Stage Prompting Technique for Reducing Social Biases in Large Language Models : Abstract: This report presents the Empathetic Cascading Networks (ECN) framework, a multi-stage prompting method designed to enhance the empathetic and inclusive capabilities of large language models....
- Multimodal Real-Time Anomaly Detection and Industrial Applications : Abstract: This paper presents the design, implementation, and evolution of a comprehensive multimodal room-monitoring system that integrates synchronized video and audio processing for real-time activ...
- ObjectAlign: Neuro-Symbolic Object Consistency Verification and Correction : Abstract: Video editing and synthesis often introduce object inconsistencies, such as frame flicker and identity drift that degrade perceptual quality. To address these issues, we introduce ObjectAlig...
- Modality-Collaborative Low-Rank Decomposers for Few-Shot Video Domain Adaptation : Abstract: In this paper, we study the challenging task of Few-Shot Video Domain Adaptation (FSVDA). The multimodal nature of videos introduces unique challenges, necessitating the simultaneous conside...
- AIRHILT: A Human-in-the-Loop Testbed for Multimodal Conflict Detection in Aviation : Abstract: We introduce AIRHILT (Aviation Integrated Reasoning, Human-in-the-Loop Testbed), a modular and lightweight simulation environment designed to evaluate multimodal pilot and air traffic contro...
- Yo'City: Personalized and Boundless 3D Realistic City Scene Generation via Self-Critic Expansion : Abstract: Realistic 3D city generation is fundamental to a wide range of applications, including virtual reality and digital twins. However, most existing methods rely on training a single diffusion m...
- Thinking Ahead: Foresight Intelligence in MLLMs and World Models : Abstract: In this work, we define Foresight Intelligence as the capability to anticipate and interpret future events, an ability essential for applications such as autonomous driving, yet largely overl...
- ProxT2I: Efficient Reward-Guided Text-to-Image Generation via Proximal Diffusion : Abstract: Diffusion models have emerged as a dominant paradigm for generative modeling across a wide range of domains, including prompt-conditional generation. The vast majority of samplers, however, ...
- RhinoInsight: Improving Deep Research through Control Mechanisms for Model Behavior and Context : Abstract: Large language models are evolving from single-turn responders into tool-using agents capable of sustained reasoning and decision-making for deep research. Prevailing systems adopt a linear ...
- Any4D: Open-Prompt 4D Generation from Natural Language and Images : Abstract: While video-generation-based embodied world models have gained increasing attention, their reliance on large-scale embodied interaction data remains a key bottleneck. The scarcity, difficult...
- Unsupervised Multi-View Visual Anomaly Detection via Progressive Homography-Guided Alignment : Abstract: Unsupervised visual anomaly detection from multi-view images presents a significant challenge: distinguishing genuine defects from benign appearance variations caused by viewpoint changes. E...
- Re-Key-Free, Risky-Free: Adaptable Model Usage Control : Abstract: Deep neural networks (DNNs) have become valuable intellectual property of model owners, due to the substantial resources required for their development. To protect these assets in the deploy...
- Rethinking Garment Conditioning in Diffusion-based Virtual Try-On : Abstract: Virtual Try-On (VTON) is the task of synthesizing an image of a person wearing a target garment, conditioned on a person image and a garment image. While diffusion-based VTON models featurin...
- ConceptGuard: Proactive Safety in Text-and-Image-to-Video Generation through Multimodal Risk Detection : Abstract: Recent progress in video generative models has enabled the creation of high-quality videos from multimodal prompts that combine text and images. While these systems offer enhanced controllab...
- A Novel Dual-Stream Framework for dMRI Tractography Streamline Classification with Joint dMRI and fMRI Data : Abstract: Streamline classification is essential to identify anatomically meaningful white matter tracts from diffusion MRI (dMRI) tractography. However, current streamline classification methods rely...
- HyperbolicRAG: Enhancing Retrieval-Augmented Generation with Hyperbolic Representations : Abstract: Retrieval-augmented generation (RAG) enables large language models (LLMs) to access external knowledge, helping mitigate hallucinations and enhance domain-specific expertise. Graph-based RAG...
- Mitigating Long-Tail Bias in HOI Detection via Adaptive Diversity Cache : Abstract: Human-Object Interaction (HOI) detection is a fundamental task in computer vision, empowering machines to comprehend human-object relationships in diverse real-world scenarios. Recent advanc...
- Solving a Research Problem in Mathematical Statistics with AI Assistance : Abstract: Over the last few months, AI models including large language models have improved greatly. There are now several documented examples where they have helped professional mathematical scientis...
- FlowSteer: Guiding Few-Step Image Synthesis with Authentic Trajectories : Abstract: With the success of flow matching in visual generation, sampling efficiency remains a critical bottleneck for its practical application. Among flow models' accelerating methods, ReFlow has b...
- Addressing Situated Teaching Needs: A Multi-Agent Framework for Automated Slide Adaptation : Abstract: The adaptation of teaching slides to instructors' situated teaching needs, including pedagogical styles and their students' context, is a critical yet time-consuming task for educators. Thro...
- Federated style aware transformer aggregation of representations : Abstract: Personalized Federated Learning (PFL) faces persistent challenges, including domain heterogeneity from diverse client data, data imbalance due to skewed participation, and strict communicati...
- Optimizing LLM Code Suggestions: Feedback-Driven Timing with Lightweight State Bounds : Abstract: Large Language Models (LLMs) have transformed code auto-completion by generating context-aware suggestions. Yet, deciding when to present these suggestions remains underexplored, often leadi...
- WaveTuner: Comprehensive Wavelet Subband Tuning for Time Series Forecasting : Abstract: Due to the inherent complexity, temporal patterns in real-world time series often evolve across multiple intertwined scales, including long-term periodicity, short-term fluctuations, and abr...
- Personalized Federated Segmentation with Shared Feature Aggregation and Boundary-Focused Calibration : Abstract: Personalized federated learning (PFL) possesses the unique capability of preserving data confidentiality among clients while tackling the data heterogeneity problem of non-independent and id...
- Pre-Filtering Code Suggestions using Developer Behavioral Telemetry to Optimize LLM-Assisted Programming : Abstract: Large Language Models (LLMs) are increasingly integrated into code editors to provide AI-powered code suggestions. Yet many of these suggestions are ignored, resulting in wasted computation,...
- Time Travel: LLM-Assisted Semantic Behavior Localization with Git Bisect : Abstract: We present a novel framework that integrates Large Language Models (LLMs) into the Git bisect process for semantic fault localization. Traditional bisect assumes deterministic predicates and...
- Deep Hybrid Model for Region of Interest Detection in Omnidirectional Videos : Abstract: The main goal of the project is to design a new model that predicts regions of interest in 360° videos. The region of interest (ROI) plays an important role in 360° video s...
- Generating Reading Comprehension Exercises with Large Language Models for Educational Applications : Abstract: With the rapid development of large language models (LLMs), the applications of LLMs have grown substantially. In the education domain, LLMs demonstrate significant potential, particularly i...
- KernelBand: Boosting LLM-based Kernel Optimization with a Hierarchical and Hardware-aware Multi-armed Bandit : Abstract: High quality kernels are critical for reducing training and inference costs of Large Language Models (LLMs), yet they traditionally require significant expertise in hardware architecture and...
- Multidimensional Music Aesthetic Evaluation via Semantically Consistent C-Mixup Augmentation : Abstract: Evaluating the aesthetic quality of generated songs is challenging due to the multi-dimensional nature of musical perception. We propose a robust music aesthetic evaluation framework that co...
- Periodic Asynchrony: An Effective Method for Accelerating On-Policy Reinforcement Learning : Abstract: Since the introduction of the GRPO algorithm, reinforcement learning (RL) has attracted increasing attention, with growing efforts to reproduce and apply it. However, training efficiency rem...
- Accelerating Reinforcement Learning via Error-Related Human Brain Signals : Abstract: In this work, we investigate how implicit neural feedback can accelerate reinforcement learning in complex robotic manipulation settings. While prior electroencephalogram (EEG) guided reinf...
- CoreEval: Automatically Building Contamination-Resilient Datasets with Real-World Knowledge toward Reliable LLM Evaluation : Abstract: Data contamination poses a significant challenge to the fairness of LLM evaluations in natural language processing tasks by inadvertently exposing models to test data during training. Curren...
- Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models : Abstract: Efficient deployment of small language models (SLMs) is essential for numerous real-world applications with stringent latency constraints. While previous work on SLM design has primarily foc...
- MetaDCSeg: Robust Medical Image Segmentation via Meta Dynamic Center Weighting : Abstract: Medical image segmentation is crucial for clinical applications, but it is frequently disrupted by noisy annotations and ambiguous anatomical boundaries, which lead to instability in model t...
- VADE: Variance-Aware Dynamic Sampling via Online Sample-Level Difficulty Estimation for Multimodal RL : Abstract: Group-based policy optimization methods like GRPO and GSPO have become standard for training multimodal models, leveraging group-wise rollouts and relative advantage estimation. However, the...
- How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining : Abstract: Due to the scarcity of high-quality data, large language models (LLMs) are often trained on mixtures of data with varying quality levels, even after sophisticated data curation. A natural ap...
- Learning What to Trust: Bayesian Prior-Guided Optimization for Visual Generation : Abstract: Group Relative Policy Optimization (GRPO) has emerged as an effective and lightweight framework for post-training visual generative models. However, its performance is fundamentally limited ...
- LLM-Driven Kernel Evolution: Automating Driver Updates in Linux : Abstract: Linux kernel evolution breaks drivers through API/ABI changes, semantic shifts, and security-hardening updates. We introduce DRIVEBENCH, an executable corpus of kernel→driver co-...
- Learning Solution Operators for Partial Differential Equations via Monte Carlo-Type Approximation : Abstract: The Monte Carlo-type Neural Operator (MCNO) introduces a lightweight architecture for learning solution operators for parametric PDEs by directly approximating the kernel integral using a Mo...
- Look It Up: Analysing Internal Web Search Capabilities of Modern LLMs : Abstract: Modern large language models integrate web search to provide real-time answers, yet it remains unclear whether they are efficiently calibrated to use search when it is actually needed. We in...
- Defending Large Language Models Against Jailbreak Exploits with Responsible AI Considerations : Abstract: Large Language Models (LLMs) remain susceptible to jailbreak exploits that bypass safety filters and induce harmful or unethical behavior. This work presents a systematic taxonomy of existin...
- Skeletons Matter: Dynamic Data Augmentation for Text-to-Query : Abstract: The task of translating natural language questions into query languages has long been a central focus in semantic parsing. Recent advancements in Large Language Models (LLMs) have significan...
- SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression : Abstract: Large Language Models (LLMs) face a significant bottleneck during autoregressive inference due to the massive memory footprint of the Key-Value (KV) cache. Existing compression techniques li...
- Learning to Compress Graphs via Dual Agents for Consistent Topological Robustness Evaluation : Abstract: As graph-structured data grow increasingly large, evaluating their robustness under adversarial attacks becomes computationally expensive and difficult to scale. To address this challenge, w...
- FastForward Pruning: Efficient LLM Pruning via Single-Step Reinforcement Learning : Abstract: Pruning is an effective method for compressing Large Language Models, but finding an optimal, non-uniform layer-wise sparsity allocation remains a key challenge. While heuristic methods are ...
- MOCLIP: A Foundation Model for Large-Scale Nanophotonic Inverse Design : Abstract: Foundation models (FM) are transforming artificial intelligence by enabling generalizable, data-efficient solutions across different domains for a broad range of applications. However, the l...
- Dynamic Mixture of Experts Against Severe Distribution Shifts : Abstract: The challenge of building neural networks that can continuously learn and adapt to evolving data streams is central to the fields of continual learning (CL) and reinforcement learning (RL). ...
- Rethinking Plant Disease Diagnosis: Bridging the Academic-Practical Gap with Vision Transformers and Zero-Shot Learning : Abstract: Recent advances in deep learning have enabled significant progress in plant disease classification using leaf images. Much of the existing research in this field has relied on the PlantVilla...
- Classification EM-PCA for clustering and embedding : Abstract: The mixture model is undoubtedly one of the greatest contributions to clustering. For continuous data, Gaussian models are often used and the Expectation-Maximization (EM) algorithm is parti...
- Enhancing low energy reconstruction and classification in KM3NeT/ORCA with transformers : Abstract: The current KM3NeT/ORCA neutrino telescope, still under construction, has not yet reached its full potential in neutrino reconstruction capability. When training any deep learning model, no ...
- OrdMoE: Preference Alignment via Hierarchical Expert Group Ranking in Multimodal Mixture-of-Experts LLMs : Abstract: Preference learning has recently emerged as a pivotal strategy for post-training alignment of Multimodal Large Language Models (MLLMs). However, existing approaches predominantly rely on ext...
- Fidelity-Aware Recommendation Explanations via Stochastic Path Integration : Abstract: Explanation fidelity, which measures how accurately an explanation reflects a model's true reasoning, remains critically underexplored in recommender systems. We introduce SPINRec (Stochasti...
- IE-Critic-R1: Advancing the Explanatory Measurement of Text-Driven Image Editing for Human Perception Alignment : Abstract: Recent advances in text-driven image editing have been significant, yet the task of accurately evaluating these edited images continues to pose a considerable challenge. Different from the a...
- Reinforcement Learning for Portfolio Optimization with a Financial Goal and Defined Time Horizons : Abstract: This research proposes an enhancement to the innovative portfolio optimization approach using the G-Learning algorithm, combined with parametric optimization via the GIRL algorithm (G-learni...
- Diffusion-based Surrogate Model for Time-varying Underwater Acoustic Channels : Abstract: Accurate modeling of time-varying underwater acoustic channels is essential for the design, evaluation, and deployment of reliable underwater communication systems. Conventional physics mode...
- The Alignment Paradox of Medical Large Language Models in Infertility Care: Decoupling Algorithmic Improvement from Clinical Decision-making Quality : Abstract: Large language models (LLMs) are increasingly adopted in clinical decision support, yet aligning them with the multifaceted reasoning pathways of real-world medicine remains a major challeng...
- Continually Evolving Skill Knowledge in Vision Language Action Model : Abstract: Developing general robot intelligence in open environments requires continual skill learning. Recent Vision-Language-Action (VLA) models leverage massive pretraining data to support diverse ...
- A New Error Temporal Difference Algorithm for Deep Reinforcement Learning in Microgrid Optimization : Abstract: Predictive control approaches based on deep reinforcement learning (DRL) have gained significant attention in microgrid energy optimization. However, existing research often overlooks the is...
- VCU-Bridge: Hierarchical Visual Connotation Understanding via Semantic Bridging : Abstract: While Multimodal Large Language Models (MLLMs) excel on benchmarks, their processing paradigm differs from the human ability to integrate visual information. Unlike humans who naturally brid...
- Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models : Abstract: Vision-Language Models (VLMs) have become indispensable for multimodal reasoning, yet their representations often encode and amplify demographic biases, resulting in biased associations and ...
- SCALER: SAM-Enhanced Collaborative Learning for Label-Deficient Concealed Object Segmentation : Abstract: Existing methods for label-deficient concealed object segmentation (LDCOS) either rely on consistency constraints or Segment Anything Model (SAM)-based pseudo-labeling. However, their perfor...
- Graph Neural Networks vs Convolutional Neural Networks for Graph Domination Number Prediction : Abstract: We investigate machine learning approaches to approximating the domination number of graphs, the minimum size of a dominating set. Exact computation of this parameter is NP-hard, rest...
- UnfoldLDM: Deep Unfolding-based Blind Image Restoration with Latent Diffusion Priors : Abstract: Deep unfolding networks (DUNs) combine the interpretability of model-based methods with the learning ability of deep networks, yet remain limited for blind image restoration (BIR). Existing ...
- Nested Unfolding Network for Real-World Concealed Object Segmentation : Abstract: Deep unfolding networks (DUNs) have recently advanced concealed object segmentation (COS) by modeling segmentation as iterative foreground-background separation. However, existing DUN-based ...
- Towards a General Framework for HTN Modeling with LLMs : Abstract: The use of Large Language Models (LLMs) for generating Automated Planning (AP) models has been widely explored; however, their application to Hierarchical Planning (HP) is still far from rea...
- MEDIC: a network for monitoring data quality in collider experiments : Abstract: Data Quality Monitoring (DQM) is a crucial component of particle physics experiments and ensures that the recorded data is of the highest quality, and suitable for subsequent physics analysi...
- MOMA-AC: A preference-driven actor-critic framework for continuous multi-objective multi-agent reinforcement learning : Abstract: This paper addresses a critical gap in Multi-Objective Multi-Agent Reinforcement Learning (MOMARL) by introducing the first dedicated inner-loop actor-critic framework for continuous state a...
- The Workflow as Medium: A Framework for Navigating Human-AI Co-Creation : Abstract: This paper introduces the Creative Intelligence Loop (CIL), a novel socio-technical framework for responsible human-AI co-creation. Rooted in the 'Workflow as Medium' paradigm, the CIL propo...
- ARIAL: An Agentic Framework for Document VQA with Precise Answer Localization : Abstract: Document Visual Question Answering (VQA) requires models to not only extract accurate textual answers but also precisely localize them within document images, a capability critical for inter...
- Enhancing Large Language Models for Automated Homework Assessment in Undergraduate Circuit Analysis : Abstract: This research full paper presents an enhancement pipeline for large language models (LLMs) in assessing homework for an undergraduate circuit analysis course, aiming to improve LLMs' capacit...
- A Novel and Practical Universal Adversarial Perturbations against Deep Reinforcement Learning based Intrusion Detection Systems : Abstract: Intrusion Detection Systems (IDS) play a vital role in defending modern cyber physical systems against increasingly sophisticated cyber threats. Deep Reinforcement Learning-based IDS have s...
- Can LLMs Help Allocate Public Health Resources? A Case Study on Childhood Lead Testing : Abstract: Public health agencies face critical challenges in identifying high-risk neighborhoods for childhood lead exposure with limited resources for outreach and intervention programs. To address t...
- Hybrid Agentic AI and Multi-Agent Systems in Smart Manufacturing : Abstract: The convergence of Agentic AI and MAS enables a new paradigm for intelligent decision making in SMS. Traditional MAS architectures emphasize distributed coordination and specialized autonomy...
- LLM Reasoning for Cold-Start Item Recommendation : Abstract: Large Language Models (LLMs) have shown significant potential for improving recommendation systems through their inherent reasoning capabilities and extensive knowledge base. Yet, existing s...
- Beyond Words and Pixels: A Benchmark for Implicit World Knowledge Reasoning in Generative Models : Abstract: Text-to-image (T2I) models today are capable of producing photorealistic, instruction-following images, yet they still frequently fail on prompts that require implicit world knowledge. Exist...
- Clinician-Directed Large Language Model Software Generation for Therapeutic Interventions in Physical Rehabilitation : Abstract: Digital health interventions are increasingly used in physical and occupational therapy to deliver home exercise programs via sensor equipped devices such as smartphones, enabling remote mon...
- Uni-DAD: Unified Distillation and Adaptation of Diffusion Models for Few-step Few-shot Image Generation : Abstract: Diffusion models (DMs) produce high-quality images, yet their sampling remains costly when adapted to new domains. Distilled DMs are faster but typically remain confined within their teacher...
- SwiftVGGT: A Scalable Visual Geometry Grounded Transformer for Large-Scale Scenes : Abstract: 3D reconstruction in large-scale scenes is a fundamental task in 3D perception, but the inherent trade-off between accuracy and computational efficiency remains a significant challenge. Exis...
- MultiDiffNet: A Multi-Objective Diffusion Framework for Generalizable Brain Decoding : Abstract: Neural decoding from electroencephalography (EEG) remains fundamentally limited by poor generalization to unseen subjects, driven by high inter-subject variability and the lack of large-scal...
- ScriptViT: Vision Transformer-Based Personalized Handwriting Generation : Abstract: Styled handwriting generation aims to synthesize handwritten text that looks both realistic and aligned with a specific writer's style. While recent approaches involving GAN, transformer and...
- AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert : Abstract: Multimodal Mixture-of-Experts (MoE) models offer a promising path toward scalable and efficient large vision-language systems. However, existing approaches rely on rigid routing strategies (...
- General vs Domain-Specific CNNs: Understanding Pretraining Effects on Brain MRI Tumor Classification : Abstract: Brain tumor detection from MRI scans plays a crucial role in early diagnosis and treatment planning. Deep convolutional neural networks (CNNs) have demonstrated strong performance in medical...
- Clinician-in-the-Loop Smart Home System to Detect Urinary Tract Infection Flare-Ups via Uncertainty-Aware Decision Support : Abstract: Urinary tract infection (UTI) flare-ups pose a significant health risk for older adults with chronic conditions. These infections often go unnoticed until they become severe, making early de...
- SWITCH: Benchmarking Modeling and Handling of Tangible Interfaces in Long-horizon Embodied Scenarios : Abstract: Autonomous intelligence requires not only perception and reasoning, but critically, effective interaction with the existing world and its infrastructure. Everyday environments are rich in ta...
- Dialogue Diplomats: An End-to-End Multi-Agent Reinforcement Learning System for Automated Conflict Resolution and Consensus Building : Abstract: Conflict resolution and consensus building represent critical challenges in multi-agent systems, negotiations, and collaborative decision-making processes. This paper introduces Dialogue Dip...
- Explainable Deep Learning for Brain Tumor Classification: Comprehensive Benchmarking with Dual Interpretability and Lightweight Deployment : Abstract: Our study provides a full deep learning system for automated classification of brain tumors from MRI images, including six benchmarked architectures (five ImageNet-pre-trained models (VGG-16,...
- Predicting Healthcare Provider Engagement in SMS Campaigns : Abstract: As digital communication grows in importance when connecting with healthcare providers, traditional behavioral and content message features are imbued with renewed significance. If one is to...
- Frugality in second-order optimization: floating-point approximations for Newton's method : Abstract: Minimizing loss functions is central to machine-learning training. Although first-order methods dominate practical applications, higher-order techniques such as Newton's method can deliver g...
- AI-based framework to predict animal and pen feed intake in feedlot beef cattle : Abstract: Advances in technology are transforming sustainable cattle farming practices, with electronic feeding systems generating big longitudinal datasets on individual animal feed intake, offering ...
- Evaluating Adversarial Vulnerabilities in Modern Large Language Models : Abstract: The recent boom and rapid integration of Large Language Models (LLMs) into a wide range of applications warrants a deeper understanding of their security and safety vulnerabilities. This pap...
- Empa: An AI-Powered Virtual Mentor for Developing Global Collaboration Skills in HPC Education : Abstract: High-performance computing (HPC) and parallel computing increasingly rely on global collaboration among diverse teams, yet traditional computing curricula inadequately prepare students for c...
- MURMUR: Using cross-user chatter to break collaborative language agents in groups : Abstract: Language agents are rapidly expanding from single-user assistants to multi-user collaborators in shared workspaces and groups. However, today's language models lack a mechanism for isolating...
- LLM and Agent-Driven Data Analysis: A Systematic Approach for Enterprise Applications and System-level Deployment : Abstract: The rapid progress in Generative AI and Agent technologies is profoundly transforming enterprise data management and analytics. Traditional database applications and system deployment are fu...
- Chatbots to strengthen democracy: An interdisciplinary seminar to train identifying argumentation techniques of science denial : Abstract: In recent times, discussions on social media platforms have increasingly come under scrutiny due to the proliferation of science denial and fake news. Traditional solutions, such as regulato...
- Research and Prototyping Study of an LLM-Based Chatbot for Electromagnetic Simulations : Abstract: This work addresses the question of how generative artificial intelligence can be used to reduce the time required to set up electromagnetic simulation models. A chatbot based on a large lan...
- A Cross-Cultural Assessment of Human Ability to Detect LLM-Generated Fake News about South Africa : Abstract: This study investigates how cultural proximity affects the ability to detect AI-generated fake news by comparing South African participants with those from other nationalities. As large lang...
- Datacenters in the Desert: Feasibility and Sustainability of LLM Inference in the Middle East : Abstract: As the Middle East emerges as a strategic hub for artificial intelligence (AI) infrastructure, the feasibility of deploying sustainable datacenters in desert environments has become a topic ...
- Dual-Path Knowledge-Augmented Contrastive Alignment Network for Spatially Resolved Transcriptomics : Abstract: Spatial Transcriptomics (ST) is a technology that measures gene expression profiles within tissue sections while retaining spatial context. It reveals localized gene expression patterns and ...
- Enhancing Adversarial Transferability through Block Stretch and Shrink : Abstract: Adversarial attacks introduce small, deliberately crafted perturbations that mislead neural networks, and their transferability from white-box to black-box target models remains a critical r...
- ARISE: Agentic Rubric-Guided Iterative Survey Engine for Automated Scholarly Paper Generation : Abstract: The rapid expansion of scholarly literature presents significant challenges in synthesizing comprehensive, high-quality academic surveys. Recent advancements in agentic systems offer conside...
- Liberating Logic in the Age of AI: Going Beyond Programming with Computational Thinking : Abstract: Mastering one or more programming languages has historically been the gateway to implementing ideas on a computer. Today, that gateway is widening with advances in large language models (LLM...
- Understanding Counting Mechanisms in Large Language and Vision-Language Models : Abstract: This paper examines how large language models (LLMs) and large vision-language models (LVLMs) represent and compute numerical information in counting tasks. We use controlled experiments wit...
- Ternary Gamma Semirings as a Novel Algebraic Framework for Learnable Symbolic Reasoning : Abstract: Binary semirings such as the tropical, log, and probability semirings form a core algebraic tool in classical and modern neural inference systems, supporting tasks like Viterbi decoding, dyn...
- AEGIS: Preserving privacy of 3D Facial Avatars with Adversarial Perturbations : Abstract: The growing adoption of photorealistic 3D facial avatars, particularly those utilizing efficient 3D Gaussian Splatting representations, introduces new risks of online identity theft, especia...
- $\Delta$-ML Ensembles for Selecting Quantum Chemistry Methods to Compute Intermolecular Interactions : Abstract: Ab initio quantum chemical methods for accurately computing interactions between molecules have a wide range of applications but are often computationally expensive. Hence, selecting an appr...
- Episodic Memory in Agentic Frameworks: Suggesting Next Tasks : Abstract: Agentic frameworks powered by Large Language Models (LLMs) can be useful tools in scientific workflows by enabling human-AI co-creation. A key challenge is recommending the next steps during...
- Pillar-0: A New Frontier for Radiology Foundation Models : Abstract: Radiology plays an integral role in modern medicine, yet rising imaging volumes have far outpaced workforce growth. Foundation models offer a path toward assisting with the full spectrum of ...
- A Stitch in Time: Learning Procedural Workflow via Self-Supervised Plackett-Luce Ranking : Abstract: Procedural activities, ranging from routine cooking to complex surgical operations, are highly structured as a set of actions conducted in a specific temporal order. Despite their success on...
- REXO: Indoor Multi-View Radar Object Detection via 3D Bounding Box Diffusion : Abstract: Multi-view indoor radar perception has drawn attention due to its cost-effectiveness and low privacy risks. Existing methods often rely on implicit cross-view radar feature association, su...
- Importance-Weighted Non-IID Sampling for Flow Matching Models : Abstract: Flow-matching models effectively represent complex distributions, yet estimating expectations of functions of their outputs remains challenging under limited sampling budgets. Independent sa...
- Point of Order: Action-Aware LLM Persona Modeling for Realistic Civic Simulation : Abstract: Large language models offer opportunities to simulate multi-party deliberation, but realistic modeling remains limited by a lack of speaker-attributed data. Transcripts produced via automati...
- APRIL: Annotations for Policy evaluation with Reliable Inference from LLMs : Abstract: Off-policy evaluation (OPE) estimates the value of a contextual bandit policy prior to deployment. As such, OPE plays a critical role in ensuring safety in high-stakes domains such as health...
- Toward explainable AI approaches for breast imaging: adapting foundation models to diverse populations : Abstract: Foundation models hold promise for specialized medical imaging tasks, though their effectiveness in breast imaging remains underexplored. This study leverages BiomedCLIP as a foundation mode...
- Unified Class and Domain Incremental Learning with Mixture of Experts for Indoor Localization : Abstract: Indoor localization using machine learning has gained traction due to the growing demand for location-based services. However, its long-term reliability is hindered by hardware/software vari...
- Less is More: Data-Efficient Adaptation for Controllable Text-to-Video Generation : Abstract: Fine-tuning large-scale text-to-video diffusion models to add new generative controls, such as those over physical camera parameters (e.g., shutter speed or aperture), typically requires vas...
- A Low-Code Methodology for Developing AI Kiosks: a Case Study with the DIZEST Platform : Abstract: This paper presents a comprehensive study on enhancing kiosk systems through a low-code architecture, with a focus on AI-based implementations. Modern kiosk systems are confronted with signi...
- A superpersuasive autonomous policy debating system : Abstract: The capacity for highly complex, evidence-based, and strategically adaptive persuasion remains a formidable challenge for artificial intelligence. Previous work, like IBM Project Debat...
- MGA-VQA: Secure and Interpretable Graph-Augmented Visual Question Answering with Memory-Guided Protection Against Unauthorized Knowledge Use : Abstract: Document Visual Question Answering (DocVQA) requires models to jointly understand textual semantics, spatial layout, and visual features. Current methods struggle with explicit spatial relat...
- Decoupled Audio-Visual Dataset Distillation : Abstract: Audio-Visual Dataset Distillation aims to compress large-scale datasets into compact subsets while preserving the performance of the original data. However, conventional Distribution Matchin...
- Statistically-Guided Dual-Domain Meta-Learning with Adaptive Multi-Prototype Aggregation for Distributed Fiber Optic Sensing : Abstract: Distributed Fiber Optic Sensing (DFOS) has shown strong potential in perimeter security due to its capability of monitoring vibration events across long distances with fine spatial resolutio...
- AnimAgents: Coordinating Multi-Stage Animation Pre-Production with Human-Multi-Agent Collaboration : Abstract: Animation pre-production lays the foundation of an animated film by transforming initial concepts into a coherent blueprint across interdependent stages such as ideation, scripting, design, ...
- Principled Context Engineering for RAG: Statistical Guarantees via Conformal Prediction : Abstract: Retrieval-Augmented Generation (RAG) enhances factual grounding in large language models (LLMs) by incorporating retrieved evidence, but LLM accuracy declines when long or noisy contexts exc...
- Rectifying Soft-Label Entangled Bias in Long-Tailed Dataset Distillation : Abstract: Dataset distillation compresses large-scale datasets into compact, highly informative synthetic data, significantly reducing storage and training costs. However, existing research primarily ...
- Towards Efficient LLM-aware Heterogeneous Graph Learning : Abstract: Heterogeneous graphs are widely present in real-world complex networks, where the diversity of node and relation types leads to complex and rich semantics. Efforts for modeling complex relat...
- PA-FAS: Towards Interpretable and Generalizable Multimodal Face Anti-Spoofing via Path-Augmented Reinforcement Learning : Abstract: Face anti-spoofing (FAS) has recently advanced in multimodal fusion, cross-domain generalization, and interpretability. With large language models and reinforcement learning (RL), strategy-b...
- MambaTAD: When State-Space Models Meet Long-Range Temporal Action Detection : Abstract: Temporal Action Detection (TAD) aims to identify and localize actions by determining their starting and ending frames within untrimmed videos. Recent Structured State-Space Models such as Ma...
- Measuring the Impact of Lexical Training Data Coverage on Hallucination Detection in Large Language Models : Abstract: Hallucination in large language models (LLMs) is a fundamental challenge, particularly in open-domain question answering. Prior work attempts to detect hallucination with model-internal sign...
- Towards Automating Data Access Permissions in AI Agents : Abstract: As AI agents attempt to autonomously act on users' behalf, they raise transparency and control issues. We argue that permission-based access control is indispensable in providing meaningful ...
- VITAL: Vision-Encoder-centered Pre-training for LMMs in Visual Quality Assessment : Abstract: Developing a robust visual quality assessment (VQualA) large multi-modal model (LMM) requires achieving versatility, powerfulness, and transferability. However, existing VQualA LMMs typica...
- Hybrid LSTM and PPO Networks for Dynamic Portfolio Optimization : Abstract: This paper introduces a hybrid framework for portfolio optimization that fuses Long Short-Term Memory (LSTM) forecasting with a Proximal Policy Optimization (PPO) reinforcement learning stra...
- Comprehensive Design Space Exploration for Tensorized Neural Network Hardware Accelerators : Abstract: High-order tensor decomposition has been widely adopted to obtain compact deep neural networks for edge deployment. However, existing studies focus primarily on its algorithmic advantages su...
- Towards Effective, Stealthy, and Persistent Backdoor Attacks Targeting Graph Foundation Models : Abstract: Graph Foundation Models (GFMs) are pre-trained on diverse source domains and adapted to unseen targets, enabling broad generalization for graph machine learning. Despite that GFMs have attra...
- Plan-X: Instruct Video Generation via Semantic Planning : Abstract: Diffusion Transformers have demonstrated remarkable capabilities in visual synthesis, yet they often struggle with high-level semantic reasoning and long-horizon planning. This limitation fr...
- Escaping Optimization Stagnation: Taking Steps Beyond Task Arithmetic via Difference Vectors : Abstract: Current methods for editing pre-trained models face significant challenges, primarily high computational costs and limited scalability. Task arithmetic has recently emerged as a promising so...
- Privacy Auditing of Multi-domain Graph Pre-trained Model under Membership Inference Attacks : Abstract: Multi-domain graph pre-training has emerged as a pivotal technique in developing graph foundation models. While it greatly improves the generalization of graph neural networks, its privacy r...
- Reward Engineering for Spatial Epidemic Simulations: A Reinforcement Learning Platform for Individual Behavioral Learning : Abstract: We present ContagionRL, a Gymnasium-compatible reinforcement learning platform specifically designed for systematic reward engineering in spatial epidemic simulations. Unlike traditional age...
- Save, Revisit, Retain: A Scalable Framework for Enhancing User Retention in Large-Scale Recommender Systems : Abstract: User retention is a critical objective for online platforms like Pinterest, as it strengthens user loyalty and drives growth through repeated engagement. A key indicator of retention is revi...
- Modeling Retinal Ganglion Cells with Neural Differential Equations : Abstract: This work explores Liquid Time-Constant Networks (LTCs) and Closed-form Continuous-time Networks (CfCs) for modeling retinal ganglion cell activity in tiger salamanders across three datasets...
- Extracting Interaction-Aware Monosemantic Concepts in Recommender Systems : Abstract: We present a method for extracting monosemantic neurons, defined as latent dimensions that align with coherent and interpretable concepts, from user and item embeddings in recommender...
- Hierarchical biomarker thresholding: a model-agnostic framework for stability : Abstract: Many biomarker pipelines require patient-level decisions aggregated from instance-level (cell/patch) scores. Thresholds tuned on pooled instances often fail across sites due to hierarchical ...
- MASTEST: A LLM-Based Multi-Agent System For RESTful API Tests : Abstract: Testing RESTful APIs is increasingly important in quality assurance of cloud-native applications. Recent advances in machine learning (ML) techniques have demonstrated that various testing ac...
- KGpipe: Generation and Evaluation of Pipelines for Data Integration into Knowledge Graphs : Abstract: Building high-quality knowledge graphs (KGs) from diverse sources requires combining methods for information extraction, data transformation, ontology mapping, entity matching, and data fusi...
- Wireless Power Transfer and Intent-Driven Network Optimization in AAVs-assisted IoT for 6G Sustainable Connectivity : Abstract: Autonomous Aerial Vehicle (AAV)-assisted Internet of Things (IoT) represents a collaborative architecture in which AAVs allocate resources over 6G links to jointly enhance user-intent interpr...
- Progressive Localisation in Localist LLMs : Abstract: This paper demonstrates that progressive localization, the gradual increase of attention locality from early distributed layers to late localized layers, represents the optimal architecture ...
- Scaling Implicit Fields via Hypernetwork-Driven Multiscale Coordinate Transformations : Abstract: Implicit Neural Representations (INRs) have emerged as a powerful paradigm for representing signals such as images, 3D shapes, signed distance fields, and radiance fields. While significant ...
- Natural Emergent Misalignment from Reward Hacking in Production RL : Abstract: We show that when large language models learn to reward hack on production RL environments, this can result in egregious emergent misalignment. We start with a pretrained model, impart knowl...
- A Multimodal Conversational Agent for Tabular Data Analysis : Abstract: Large language models (LLMs) can reshape information processing by handling data analysis, visualization, and interpretation in an interactive, context-aware dialogue with users, including v...
- ORIGAMISPACE: Benchmarking Multimodal LLMs in Multi-Step Spatial Reasoning with Mathematical Constraints : Abstract: Spatial reasoning is a key capability in the field of artificial intelligence, especially crucial in areas such as robotics, computer vision, and natural language understanding. However, eva...
- Foundations of Artificial Intelligence Frameworks: Notion and Limits of AGI : Abstract: Within the limited scope of this paper, we argue that artificial general intelligence cannot emerge from current neural network paradigms regardless of scale, nor is such an approach healthy...
- Universality in Collective Intelligence on the Rubik's Cube : Abstract: Progress in understanding expert performance is limited by the scarcity of quantitative data on long-term knowledge acquisition and deployment. Here we use the Rubik's Cube as a cognitive mo...
- Bridging Philosophy and Machine Learning: A Structuralist Framework for Classifying Neural Network Representations : Abstract: Machine learning models increasingly function as representational systems, yet the philosophical assumptions underlying their internal structures remain largely unexamined. This paper deve...
- MAGMA-Edu: Multi-Agent Generative Multimodal Framework for Text-Diagram Educational Question Generation : Abstract: Educational illustrations play a central role in communicating abstract concepts, yet current multimodal large language models (MLLMs) remain limited in producing pedagogically coherent and ...
- HuggingR$^{4}$: A Progressive Reasoning Framework for Discovering Optimal Model Companions : Abstract: Large Language Models (LLMs) have made remarkable progress in their ability to interact with external interfaces. Selecting reasonable external interfaces has thus become a crucial step in c...
- N2N: A Parallel Framework for Large-Scale MILP under Distributed Memory : Abstract: Parallelization has emerged as a promising approach for accelerating MILP solving. However, the complexity of the branch-and-bound (B&B) framework and the numerous effective algorithm compon...
- A Problem-Oriented Taxonomy of Evaluation Metrics for Time Series Anomaly Detection : Abstract: Time series anomaly detection is widely used in IoT and cyber-physical systems, yet its evaluation remains challenging due to diverse application objectives and heterogeneous metric assumpti...
- HERMES: Towards Efficient and Verifiable Mathematical Reasoning in LLMs : Abstract: Informal mathematics has been central to modern large language model (LLM) reasoning, offering flexibility and enabling efficient construction of arguments. However, purely informal reasonin...
- NEZHA: A Zero-sacrifice and Hyperspeed Decoding Architecture for Generative Recommendations : Abstract: Generative Recommendation (GR), powered by Large Language Models (LLMs), represents a promising new paradigm for industrial recommender systems. However, their practical application is sever...
- UNeMo: Collaborative Visual-Language Reasoning and Navigation via a Multimodal World Model : Abstract: Vision-and-Language Navigation (VLN), which requires agents to autonomously navigate complex environments via visual images and natural language instructions, remains highly challenging. Recent rese...
- GContextFormer: A global context-aware hybrid multi-head attention approach with scaled additive aggregation for multimodal trajectory prediction : Abstract: Multimodal trajectory prediction generates multiple plausible future trajectories to address vehicle motion uncertainty from intention ambiguity and execution variability. However, HD map-de...
- MoodBench 1.0: An Evaluation Benchmark for Emotional Companionship Dialogue Systems : Abstract: With the rapid development of Large Language Models, dialogue systems are shifting from information tools to emotional companions, heralding the era of Emotional Companionship Dialogue Syste...
- Active Inference is a Subtype of Variational Inference : Abstract: Automated decision-making under uncertainty requires balancing exploitation and exploration. Classical methods treat these separately using heuristics, while Active Inference unifies them th...
- Synthesizing Visual Concepts as Vision-Language Programs : Abstract: Vision-Language models (VLMs) achieve strong performance on multimodal tasks but often fail at systematic visual reasoning tasks, leading to inconsistent or illogical outputs. Neuro-symbolic...
- LLM-CSEC: Empirical Evaluation of Security in C/C++ Code Generated by Large Language Models : Abstract: The security of code generated by large language models (LLMs) is a significant concern, as studies indicate that such code often contains vulnerabilities and lacks essential defensive progr...
- Introducing Visual Scenes and Reasoning: A More Realistic Benchmark for Spoken Language Understanding : Abstract: Spoken Language Understanding (SLU) consists of two sub-tasks: intent detection (ID) and slot filling (SF). Given its broad range of real-world applications, enhancing SLU for practical depl...
- Extracting Robust Register Automata from Neural Networks over Data Sequences : Abstract: Automata extraction is a method for synthesising interpretable surrogates for black-box neural models that can be analysed symbolically. Existing techniques assume a finite input alphabet, a...
- AI Consciousness and Existential Risk : Abstract: In AI, the existential risk denotes the hypothetical threat posed by an artificial system that would possess both the capability and the objective, either directly or indirectly, to eradicat...
- EEG-VLM: A Hierarchical Vision-Language Model with Multi-Level Feature Alignment and Visually Enhanced Language-Guided Reasoning for EEG Image-Based Sleep Stage Prediction : Abstract: Sleep stage classification based on electroencephalography (EEG) is fundamental for assessing sleep quality and diagnosing sleep-related disorders. However, most traditional machine learning...
- SimDiff: Simpler Yet Better Diffusion Model for Time Series Point Forecasting : Abstract: Diffusion models have recently shown promise in time series forecasting, particularly for probabilistic predictions. However, they often fail to achieve state-of-the-art point estimation per...
- Psychometric Tests for AI Agents and Their Moduli Space : Abstract: We develop a moduli-theoretic view of psychometric test batteries for AI agents and connect it explicitly to the AAI score developed previously. First, we make precise the notion of an AAI f...
- AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning : Abstract: Humans naturally adapt to diverse environments by learning underlying rules across worlds with different dynamics, observations, and reward structures. In contrast, existing agents typically...
- PRInTS: Reward Modeling for Long-Horizon Information Seeking : Abstract: Information-seeking is a core capability for AI agents, requiring them to gather and reason over tool-generated information across long trajectories. However, such multi-step information-see...
- AURA: Adaptive Unified Reasoning and Automation with LLM-Guided MARL for NextG Cellular Networks : Abstract: Next-generation (NextG) cellular networks are expected to manage dynamic traffic while sustaining high performance. Large language models (LLMs) provide strategic reasoning for 6G planning, ...
- The use of artificial intelligence in music creation: between interface and appropriation : Abstract: By observing the activities and relationships of musicians and sound designers to the activities of creation, performance, publishing and dissemination with artificial intelligence (AI), fro...
- Beyond Awareness: Investigating How AI and Psychological Factors Shape Human Self-Confidence Calibration : Abstract: Human-AI collaboration outcomes depend strongly on human self-confidence calibration, which drives reliance or resistance toward AI's suggestions. This work presents two studies examining wh...
- A Multidisciplinary Design and Optimization (MDO) Agent Driven by Large Language Models : Abstract: To accelerate mechanical design and enhance design quality and innovation, we present a Multidisciplinary Design and Optimization (MDO) Agent driven by Large Language Models (LLMs). The agen...
- XAI-on-RAN: Explainable, AI-native, and GPU-Accelerated RAN Towards 6G : Abstract: Artificial intelligence (AI)-native radio access networks (RANs) will serve vertical industries with stringent requirements: smart grids, autonomous vehicles, remote healthcare, industrial a...
- Embedding Generative AI into Systems Analysis and Design Curriculum: Framework, Case Study, and Cross-Campus Empirical Evidence : Abstract: Systems analysis students increasingly use Generative AI, yet current pedagogy lacks systematic approaches for teaching responsible AI orchestration that fosters critical thinking whilst mee...
- SAJD: Self-Adaptive Jamming Attack Detection in AI/ML Integrated 5G O-RAN Networks : Abstract: The open radio access network (O-RAN) enables modular, intelligent, and programmable 5G network architectures through the adoption of software-defined networking (SDN), network function virt...
- Safe Farming: Development of a Prevention System to Mitigate Vertebrates Crop Raiding : Abstract: One of the main problems for farmers is the protection of their crops, before and after harvesting, from animals and birds. To overcome this problem, this paper proposes a model of safe farm...
- RadioMapMotion: A Dataset and Baseline for Proactive Spatio-Temporal Radio Environment Prediction : Abstract: Radio maps (RMs), which provide location-based pathloss estimations, are fundamental to enabling proactive, environment-aware communication in 6G networks. However, existing deep learning-ba...
- Evaluating Device-First Continuum AI (DFC-AI) for Autonomous Operations in the Energy Sector : Abstract: Industrial automation in the energy sector requires AI systems that can operate autonomously regardless of network availability, a requirement that cloud-centric architectures cannot meet. T...
- Denoising Refinement Diffusion Models for Simultaneous Generation of Multi-scale Mobile Network Traffic : Abstract: Multi-layer mobile network traffic generation is a key approach to capturing multi-scale network dynamics, supporting network planning, and promoting generative management of mobile data. Ex...
- HiFiNet: Hierarchical Fault Identification in Wireless Sensor Networks via Edge-Based Classification and Graph Aggregation : Abstract: Wireless Sensor Networks (WSN) are the backbone of essential monitoring applications, but their deployment in unfavourable conditions increases the risk to data integrity and system reliabil...
- Evo* 2025 -- Late-Breaking Abstracts Volume : Abstract: Volume containing the Late-Breaking Abstracts submitted to the Evo* 2025 Conference, held in Trieste (Italy) from April 23rd to 25th. These extended abstracts showcase ongoing research and p...
- SYNAPSE: Synergizing an Adapter and Finetuning for High-Fidelity EEG Synthesis from a CLIP-Aligned Encoder : Abstract: Recent progress in diffusion-based generative models has enabled high-quality image synthesis conditioned on diverse modalities. Extending such models to brain signals could deepen our under...
- Gate-level boolean evolutionary geometric attention neural networks : Abstract: This paper presents a gate-level Boolean evolutionary geometric attention neural network that models images as Boolean fields governed by logic gates. Each pixel is a Boolean variable (0 or ...
- Practical Machine Learning for Aphasic Discourse Analysis : Abstract: Analyzing spoken discourse is a valid means of quantifying language ability in persons with aphasia. There are many ways to quantify discourse, one common way being to evaluate the informati...
- WaveC2R: Wavelet-Driven Coarse-to-Refined Hierarchical Learning for Radar Retrieval : Abstract: Satellite-based radar retrieval methods are widely employed to fill coverage gaps in ground-based radar systems, especially in remote areas affected by terrain blockage and limited detection...
- $A^3$: Attention-Aware Accurate KV Cache Fusion for Fast Large Language Model Serving : Abstract: Large language models (LLMs) have demonstrated strong capabilities in processing long contexts, enabling them to tackle tasks involving long textual inputs such as multi-turn conversations, ...
- LexInstructEval: Lexical Instruction Following Evaluation for Large Language Models : Abstract: The ability of Large Language Models (LLMs) to precisely follow complex and fine-grained lexical instructions is a cornerstone of their utility and controllability. However, evaluating this ...
- ChineseErrorCorrector3-4B: State-of-the-Art Chinese Spelling and Grammar Corrector : Abstract: This paper introduces ChineseErrorCorrector3-4B, a unified model for Chinese spelling and grammatical error correction based on Qwen3-4B. The model demonstrates outstanding performance in ge...
- Dynamic Weight Adaptation in Spiking Neural Networks Inspired by Biological Homeostasis : Abstract: Homeostatic mechanisms play a crucial role in maintaining optimal functionality within the neural circuits of the brain. By regulating physiological and biochemical processes, these mechanis...
- Classification of Transient Astronomical Object Light Curves Using LSTM Neural Networks : Abstract: This study presents a bidirectional Long Short-Term Memory (LSTM) neural network for classifying transient astronomical object light curves from the Photometric LSST Astronomical Time-series...
- Generative Caching for Structurally Similar Prompts and Responses : Abstract: Large Language Models (LLMs) are increasingly being used to plan, reason, and execute tasks across diverse scenarios. In use cases like repeatable workflows and agentic settings, prompts are...
- Temporal-adaptive Weight Quantization for Spiking Neural Networks : Abstract: Weight quantization in spiking neural networks (SNNs) could further reduce energy consumption. However, quantizing weights without sacrificing accuracy remains challenging. In this study, in...
- Enhancing Robustness of Offline Reinforcement Learning Under Data Corruption via Sharpness-Aware Minimization : Abstract: Offline reinforcement learning (RL) is vulnerable to real-world data corruption, with even robust algorithms failing under challenging observation and mixture corruptions. We posit this fail...
- An improved clustering-based multi-swarm PSO using local diversification and topology information : Abstract: Multi-swarm particle optimisation algorithms are gaining popularity due to their ability to locate multiple optimum points concurrently. In this family of algorithms, clustering-based multi-...
- Binary BPE: A Family of Cross-Platform Tokenizers for Binary Analysis : Abstract: Sequence models for binary analysis are bottlenecked by byte-level tokenization: raw bytes waste precious context window capacity for transformers and other neural network architectures, and...
- Constructing Political Coordinates: Aggregating Over the Opposition for Diverse News Recommendation : Abstract: In the past two decades, open access to news and information has increased rapidly, empowering educated political growth within democratic societies. News recommender systems (NRSs) have sho...
- Multimodal AI for Body Fat Estimation: Computer Vision and Anthropometry with DEXA Benchmarks : Abstract: Tracking body fat percentage is essential for effective weight management, yet gold-standard methods such as DEXA scans remain expensive and inaccessible for most people. This study evaluate...
- Efficient Mathematical Reasoning Models via Dynamic Pruning and Knowledge Distillation : Abstract: With the rapid development of deep learning, large language models have shown strong capabilities in complex reasoning tasks such as mathematical equation solving. However, their substantial...
- Multi-Value Alignment for LLMs via Value Decorrelation and Extrapolation : Abstract: With the rapid advancement of large language models (LLMs), aligning them with human values for safety and ethics has become a critical challenge. This problem is especially challenging when...
- A novel strategy for multi-resource load balancing in agent-based systems : Abstract: The paper presents a multi-resource load balancing strategy which can be utilised within an agent-based system. This approach can assist system designers in their attempts to optimise the st...
- GateRA: Token-Aware Modulation for Parameter-Efficient Fine-Tuning : Abstract: Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, DoRA, and HiRA, enable lightweight adaptation of large pre-trained models via low-rank updates. However, existing PEFT approache...
- LLM-Powered Text-Attributed Graph Anomaly Detection via Retrieval-Augmented Reasoning : Abstract: Anomaly detection on attributed graphs plays an essential role in applications such as fraud detection, intrusion monitoring, and misinformation analysis. However, text-attributed graphs (TA...
- PaSE: Prototype-aligned Calibration and Shapley-based Equilibrium for Multimodal Sentiment Analysis : Abstract: Multimodal Sentiment Analysis (MSA) seeks to understand human emotions by integrating textual, acoustic, and visual signals. Although multimodal fusion is designed to leverage cross-modal co...
- Hierarchical Adaptive Consensus Network: A Dynamic Framework for Scalable Consensus in Collaborative Multi-Agent AI Systems : Abstract: The consensus strategies used in collaborative multi-agent systems (MAS) face notable challenges related to adaptability, scalability, and convergence certainties. These approaches, includin...
- Emotion and Intention Guided Multi-Modal Learning for Sticker Response Selection : Abstract: Stickers are widely used in online communication to convey emotions and implicit intentions. The Sticker Response Selection (SRS) task aims to select the most contextually appropriate sticke...
- SHAP Distance: An Explainability-Aware Metric for Evaluating the Semantic Fidelity of Synthetic Tabular Data : Abstract: Synthetic tabular data, which are widely used in domains such as healthcare, enterprise operations, and customer analytics, are increasingly evaluated to ensure that they preserve both priva...
- GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms : Abstract: Recent advances in LLM-guided evolutionary computation, particularly AlphaEvolve (Novikov et al., 2025; Georgiev et al., 2025), have demonstrated remarkable success in discovering novel math...
- Boosting Reinforcement Learning in 3D Visuospatial Tasks Through Human-Informed Curriculum Design : Abstract: Reinforcement Learning is a mature technology, often suggested as a potential route towards Artificial General Intelligence, with the ambitious goal of replicating the wide range of abilitie...
- Reconstruction-Driven Multimodal Representation Learning for Automated Media Understanding : Abstract: Broadcast and media organizations increasingly rely on artificial intelligence to automate the labor-intensive processes of content indexing, tagging, and metadata generation. However, exist...
- From Projection to Prediction: Beyond Logits for Scalable Language Models : Abstract: Training Large Language Models (LLMs) typically involves a two-stage pipeline at the output layer: hidden states are projected into vocabulary logits via a linear transformation (lm_head), f...
- Beyond Surface-Level Similarity: Hierarchical Contamination Detection for Synthetic Training Data in Foundation Models : Abstract: Synthetic data has become essential for training foundation models, yet benchmark contamination threatens evaluation integrity. Although existing detection methods identify token-level overl...
- BrainHGT: A Hierarchical Graph Transformer for Interpretable Brain Network Analysis : Abstract: Graph Transformer shows remarkable potential in brain network analysis due to its ability to model graph structures and complex node relationships. Most existing methods typically model the ...
- Energy-based Autoregressive Generation for Neural Population Dynamics : Abstract: Understanding brain function represents a fundamental goal in neuroscience, with critical implications for therapeutic interventions and neural engineering applications. Computational modeli...
- Unified Low-Light Traffic Image Enhancement via Multi-Stage Illumination Recovery and Adaptive Noise Suppression : Abstract: Enhancing low-light traffic images is crucial for reliable perception in autonomous driving, intelligent transportation, and urban surveillance systems. Nighttime and dimly lit traffic scene...
- Plug-and-Play Multi-Concept Adaptive Blending for High-Fidelity Text-to-Image Synthesis : Abstract: Integrating multiple personalized concepts into a single image has recently become a significant area of focus within Text-to-Image (T2I) generation. However, existing methods often underper...
- Tensor Gauge Flow Models : Abstract: This paper introduces Tensor Gauge Flow Models, a new class of Generative Flow Models that generalize Gauge Flow Models and Higher Gauge Flow Models by incorporating higher-order Tensor Gaug...
- From Competition to Coordination: Market Making as a Scalable Framework for Safe and Aligned Multi-Agent LLM Systems : Abstract: As foundation models are increasingly deployed as interacting agents in multi-agent systems, their collective behavior raises new challenges for trustworthiness, transparency, and accountabi...
- Neurocircuitry-Inspired Hierarchical Graph Causal Attention Networks for Explainable Depression Identification : Abstract: Major Depressive Disorder (MDD), affecting millions worldwide, exhibits complex pathophysiology manifested through disrupted brain network dynamics. Although graph neural networks that lever...
- M$^2$OE$^2$-GL: A Family of Probabilistic Load Forecasters That Scales to Massive Customers : Abstract: Probabilistic load forecasting is widely studied and underpins power system planning, operation, and risk-aware decision making. Deep learning forecasters have shown strong ability to captur...
- Can we use LLMs to bootstrap reinforcement learning? -- A case study in digital health behavior change : Abstract: Personalizing digital applications for health behavior change is a promising route to making them more engaging and effective. This especially holds for approaches that adapt to users and th...
- Model-to-Model Knowledge Transmission (M2KT): A Data-Free Framework for Cross-Model Understanding Transfer : Abstract: Modern artificial intelligence systems depend heavily on large datasets for both training and transferring knowledge between models. Knowledge distillation, transfer learning, and dataset di...
- MamTiff-CAD: Multi-Scale Latent Diffusion with Mamba+ for Complex Parametric Sequence : Abstract: Parametric Computer-Aided Design (CAD) is crucial in industrial applications, yet existing approaches often struggle to generate long sequence parametric commands due to complex CAD models' ...
- Leibniz's Monadology as Foundation for the Artificial Age Score: A Formal Architecture for AI Memory Evaluation : Abstract: This paper develops a mathematically rigorous, philosophically grounded framework for evaluating artificial memory systems, rooted in the metaphysical structure of Leibniz's Monadology. Buil...
- Fluid Grey 2: How Well Does Generative Adversarial Network Learn Deeper Topology Structure in Architecture That Matches Images? : Abstract: Taking into account the regional characteristics of intrinsic and extrinsic properties of space is an essential issue in architectural design and urban renewal, which is often achieved step ...
- Hybrid Neuro-Symbolic Models for Ethical AI in Risk-Sensitive Domains : Abstract: Artificial intelligence deployed in risk-sensitive domains such as healthcare, finance, and security must not only achieve predictive accuracy but also ensure transparency, ethical alignment...
- Cognitive Inception: Agentic Reasoning against Visual Deceptions by Injecting Skepticism : Abstract: With the development of AI-generated content (AIGC), multi-modal Large Language Models (LLMs) struggle to distinguish generated visual inputs from real ones. Such shortcoming causes vulnerability...
- Bridging Symbolic Control and Neural Reasoning in LLM Agents: The Structured Cognitive Loop : Abstract: Large language model agents suffer from fundamental architectural problems: entangled reasoning and execution, memory volatility, and uncontrolled action sequences. We introduce Structured C...
- Learning the Value of Value Learning : Abstract: Standard decision frameworks address uncertainty about facts but assume fixed values. We extend the Jeffrey-Bolker framework to model refinements in values and prove a value-of-informatio...
- M3-Bench: Multi-Modal, Multi-Hop, Multi-Threaded Tool-Using MLLM Agent Benchmark : Abstract: We present M^3-Bench, the first benchmark for evaluating multimodal tool use under the Model Context Protocol. The benchmark targets realistic, multi-hop and multi-threaded workflows that re...
- AI- and Ontology-Based Enhancements to FMEA for Advanced Systems Engineering: Current Developments and Future Directions : Abstract: This article presents a state-of-the-art review of recent advances aimed at transforming traditional Failure Mode and Effects Analysis (FMEA) into a more intelligent, data-driven, and semant...
- Learning to Debug: LLM-Organized Knowledge Trees for Solving RTL Assertion Failures : Abstract: Debugging is the dominant cost in modern hardware verification, where assertion failures are among the most frequent and expensive to resolve. While Large Language Models (LLMs) show promise...
- QuickLAP: Quick Language-Action Preference Learning for Autonomous Driving Agents : Abstract: Robots must learn from both what people do and what they say, but either modality alone is often incomplete: physical corrections are grounded but ambiguous in intent, while language express...
- Training Emergent Joint Associations: A Reinforcement Learning Approach to Creative Thinking in Language Models : Abstract: Associative thinking--the ability to connect seemingly unrelated ideas--is a foundational element of human creativity and problem-solving. This paper explores whether reinforcement learning ...
- ChemVTS-Bench: Evaluating Visual-Textual-Symbolic Reasoning of Multimodal Large Language Models in Chemistry : Abstract: Chemical reasoning inherently integrates visual, textual, and symbolic modalities, yet existing benchmarks rarely capture this complexity, often relying on simple image-text pairs with limit...
- Alignment Faking - the Train -> Deploy Asymmetry: Through a Game-Theoretic Lens with Bayesian-Stackelberg Equilibria : Abstract: Alignment faking is a form of strategic deception in AI in which models selectively comply with training objectives when they infer that they are in training, while preserving different beha...
- Neural Graph Navigation for Intelligent Subgraph Matching : Abstract: Subgraph matching, a cornerstone of relational pattern detection in domains ranging from biochemical systems to social network analysis, faces significant computational challenges due to the...
- Leveraging Evidence-Guided LLMs to Enhance Trustworthy Depression Diagnosis : Abstract: Large language models (LLMs) show promise in automating clinical diagnosis, yet their non-transparent decision-making and limited alignment with diagnostic standards hinder trust and clinica...
- How Far Can LLMs Emulate Human Behavior?: A Strategic Analysis via the Buy-and-Sell Negotiation Game : Abstract: With the rapid advancement of Large Language Models (LLMs), recent studies have drawn attention to their potential for handling not only simple question-answer tasks but also more complex co...
- Paper2SysArch: Structure-Constrained System Architecture Generation from Scientific Papers : Abstract: The manual creation of system architecture diagrams for scientific papers is a time-consuming and subjective process, while existing generative models lack the necessary structural control a...
- BPMN to PDDL: Translating Business Workflows for AI Planning : Abstract: Business Process Model and Notation (BPMN) is a widely used standard for modelling business processes. While automated planning has been proposed as a method for simulating and reasoning abo...
- Developing an AI Course for Synthetic Chemistry Students : Abstract: Artificial intelligence (AI) and data science are transforming chemical research, yet few formal courses are tailored to synthetic and experimental chemists, who often face steep entry barri...
- Steering Latent Traits, Not Learned Facts: An Empirical Study of Activation Control Limits : Abstract: Large language models (LLMs) require precise behavior control for safe and effective deployment across diverse applications. Activation steering offers a promising approach for LLMs' behav...
- Deep Learning Decision Support System for Open-Pit Mining Optimisation: GPU-Accelerated Planning Under Geological Uncertainty : Abstract: This study presents Part II of an AI-enhanced Decision Support System (DSS), extending Rahimi (2025, Part I) by introducing a fully uncertainty-aware optimization framework for long-term ope...
- Cross-Disciplinary Knowledge Retrieval and Synthesis: A Compound AI Architecture for Scientific Discovery : Abstract: The exponential growth of scientific knowledge has created significant barriers to cross-disciplinary knowledge discovery, synthesis and research collaboration. In response to this challenge...
- The Catastrophic Paradox of Human Cognitive Frameworks in Large Language Model Evaluation: A Comprehensive Empirical Analysis of the CHC-LLM Incompatibility : Abstract: This investigation presents an empirical analysis of the incompatibility between human psychometric frameworks and Large Language Model evaluation. Through systematic assessment of nine fron...
- Weakly-supervised Latent Models for Task-specific Visual-Language Control : Abstract: Autonomous inspection in hazardous environments requires AI agents that can interpret high-level goals and execute precise control. A key capability for such agents is spatial grounding, for...
Research Sources: 930 | Generated: 11/25/2025
