AI Research News Feeds for November 18th, 2025

AI RESEARCH PAPERS & ACADEMIC SOURCES

TR-Gaussians: High-fidelity Real-time Rendering of Planar Transmission and Reflection with 3D Gaussian Splatting : Abstract: We propose Transmission-Reflection Gaussians (TR-Gaussians), a novel 3D-Gaussian-based representation for high-fidelity rendering of planar transmission and reflection, which are ubiquitous ...
MEGA-GUI: Multi-stage Enhanced Grounding Agents for GUI Elements : Abstract: Graphical User Interface (GUI) grounding - the task of mapping natural language instructions to screen coordinates - is essential for autonomous agents and accessibility technologies. Existi...
MM-Telco: Benchmarks and Multimodal Large Language Models for Telecom Applications : Abstract: Large Language Models (LLMs) have emerged as powerful tools for automating complex reasoning and decision-making tasks. In telecommunications, they hold the potential to transform network op...
PIGEON: VLM-Driven Object Navigation via Points of Interest Selection : Abstract: Navigating to a specified object in an unknown environment is a fundamental yet challenging capability of embodied intelligence. However, current methods struggle to balance decision frequen...
DAP: A Discrete-token Autoregressive Planner for Autonomous Driving : Abstract: Gaining sustainable performance improvement with scaling data and model budget remains a pivotal yet unresolved challenge in autonomous driving. While autoregressive models exhibited promisi...
Trust in Vision-Language Models: Insights from a Participatory User Workshop : Abstract: With the growing deployment of Vision-Language Models (VLMs), pre-trained on large image-text and video-text datasets, it is critical to equip users with the tools to discern when to trust t...
HIBMatch: Hypergraph Information Bottleneck for Semi-supervised Alzheimer's Progression : Abstract: Alzheimer's disease progression prediction is critical for patients with early Mild Cognitive Impairment (MCI) to enable timely intervention and improve their quality of life. While existing...
DiffProtect: Generate Adversarial Examples with Diffusion Models for Facial Privacy Protection : Abstract: The increasingly pervasive facial recognition (FR) systems raise serious concerns about personal privacy, especially for billions of users who have publicly shared their photos on social med...
Lane Graph Extraction from Aerial Imagery via Lane Segmentation Refinement with Diffusion Models : Abstract: The lane graph is critical for applications such as autonomous driving and lane-level route planning. While previous research has focused on extracting lane-level graphs from aerial imagery ...
3D-free meets 3D priors: Novel View Synthesis from a Single Image with Pretrained Diffusion Guidance : Abstract: Recent 3D novel view synthesis (NVS) methods often require extensive 3D data for training, and also typically lack generalization beyond the training distribution. Moreover, they tend to be ...
BadVim: Unveiling Backdoor Threats in Visual State Space Model : Abstract: Visual State Space Models (VSSM) have shown remarkable performance in various computer vision tasks. However, backdoor attacks pose significant security challenges, causing compromised model...
An Efficient Watermarking Method for Latent Diffusion Models via Low-Rank Adaptation and Dynamic Loss Weighting : Abstract: The rapid proliferation of Deep Neural Networks (DNNs) is driving a surge in model watermarking technologies, as the trained models themselves constitute valuable intellectual property. Exis...
Revisiting Long-Tailed Learning: Insights from an Architectural Perspective : Abstract: Long-Tailed (LT) recognition has been widely studied to tackle the challenge of imbalanced data distributions in real-world applications. However, the design of neural architectures for LT s...
Density-aware global-local attention network for point cloud segmentation : Abstract: 3D point cloud segmentation has a wide range of applications in areas such as autonomous driving, augmented reality, virtual reality and digital twins. The point cloud data collected in real...
Towards Collective Intelligence: Uncertainty-aware SAM Adaptation for Ambiguous Medical Image Segmentation : Abstract: Collective intelligence from multiple medical experts consistently surpasses individual expertise in clinical diagnosis, particularly for ambiguous medical image segmentation tasks involving...
Subjective and Objective Quality Evaluation of Super-Resolution Enhanced Broadcast Images on a Novel SR-IQA Dataset : Abstract: Super-Resolution (SR) is essential for displaying low-quality broadcast content on high-resolution screens. Recently, SR methods have been developed that not only increase resolution while p...
MeshCone: Second-Order Cone Programming for Geometrically-Constrained Mesh Enhancement : Abstract: Modern geometric generation methods rely heavily on deep learning methods that, while powerful, often lack interpretability and require extensive training data. This work introduces MeshCone...
FGNet: Leveraging Feature-Guided Attention to Refine SAM2 for 3D EM Neuron Segmentation : Abstract: Accurate segmentation of neural structures in Electron Microscopy (EM) images is paramount for neuroscience. However, this task is challenged by intricate morphologies, low signal-to-noise r...
RobustGait: Robustness Analysis for Appearance Based Gait Recognition : Abstract: Appearance-based gait recognition have achieved strong performance on controlled datasets, yet systematic evaluation of its robustness to real-world corruptions and silhouette variability re...
Decoupling Scene Perception and Ego Status: A Multi-Context Fusion Approach for Enhanced Generalization in End-to-End Autonomous Driving : Abstract: Modular design of planning-oriented autonomous driving has markedly advanced end-to-end systems. However, existing architectures remain constrained by an over-reliance on ego status, hinderi...
MergeSlide: Continual Model Merging and Task-to-Class Prompt-Aligned Inference for Lifelong Learning on Whole Slide Images : Abstract: Lifelong learning on Whole Slide Images (WSIs) aims to train or fine-tune a unified model sequentially on cancer-related tasks, reducing the resources and effort required for data transfer a...
CapeNext: Rethinking and refining dynamic support information for category-agnostic pose estimation : Abstract: Recent research in Category-Agnostic Pose Estimation (CAPE) has adopted fixed textual keypoint description as semantic prior for two-stage pose matching frameworks. While this paradigm enhan...
PlugTrack: Multi-Perceptive Motion Analysis for Adaptive Fusion in Multi-Object Tracking : Abstract: Multi-object tracking (MOT) predominantly follows the tracking-by-detection paradigm, where Kalman filters serve as the standard motion predictor due to computational efficiency but inherent...
Low-Level Dataset Distillation for Medical Image Enhancement : Abstract: Medical image enhancement is clinically valuable, but existing methods require large-scale datasets to learn complex pixel-level mappings. However, the substantial training and storage costs...
DGS-Net: Distillation-Guided Gradient Surgery for CLIP Fine-Tuning in AI-Generated Image Detection : Abstract: The rapid progress of generative models such as GANs and diffusion models has led to the widespread proliferation of AI-generated images, raising concerns about misinformation, privacy viola...
Learning Implicit Neural Degradation Representation for Unpaired Image Dehazing : Abstract: Image dehazing is an important task in the field of computer vision, aiming at restoring clear and detail-rich visual content from haze-affected images. However, when dealing with complex sc...
Semantics and Content Matter: Towards Multi-Prior Hierarchical Mamba for Image Deraining : Abstract: Rain significantly degrades the performance of computer vision systems, particularly in applications like autonomous driving and video surveillance. While existing deraining methods have mad...
A Lightweight 3D Anomaly Detection Method with Rotationally Invariant Features : Abstract: 3D anomaly detection (AD) is a crucial task in computer vision, aiming to identify anomalous points or regions from point cloud data. However, existing methods may encounter challenges when ...
CloseUpShot: Close-up Novel View Synthesis from Sparse-views via Point-conditioned Diffusion Model : Abstract: Reconstructing 3D scenes and synthesizing novel views from sparse input views is a highly challenging task. Recent advances in video diffusion models have demonstrated strong temporal reason...
VEIL: Jailbreaking Text-to-Video Models via Visual Exploitation from Implicit Language : Abstract: Jailbreak attacks can circumvent model safety guardrails and reveal critical blind spots. Prior attacks on text-to-video (T2V) models typically add adversarial perturbations to obviously uns...
Shedding Light on VLN Robustness: A Black-box Framework for Indoor Lighting-based Adversarial Attack : Abstract: Vision-and-Language Navigation (VLN) agents have made remarkable progress, but their robustness remains insufficiently studied. Existing adversarial evaluations often rely on perturbations t...
MedGEN-Bench: Contextually entangled benchmark for open-ended multimodal medical generation : Abstract: As Vision-Language Models (VLMs) increasingly gain traction in medical applications, clinicians are progressively expecting AI systems not only to generate textual diagnoses but also to prod...
WinMamba: Multi-Scale Shifted Windows in State Space Model for 3D Object Detection : Abstract: 3D object detection is critical for autonomous driving, yet it remains fundamentally challenging to simultaneously maximize computational efficiency and capture long-range spatial dependenci...
Automated Road Distress Detection Using Vision Transformersand Generative Adversarial Networks : Abstract: The American Society of Civil Engineers has graded Americas infrastructure condition as a C, with the road system receiving a dismal D. Roads are vital to regional economic viability, yet th...
Skeletons Speak Louder than Text: A Motion-Aware Pretraining Paradigm for Video-Based Person Re-Identification : Abstract: Multimodal pretraining has revolutionized visual understanding, but its impact on video-based person re-identification (ReID) remains underexplored. Existing approaches often rely on video-t...
SOMA: Feature Gradient Enhanced Affine-Flow Matching for SAR-Optical Registration : Abstract: Achieving pixel-level registration between SAR and optical images remains a challenging task due to their fundamentally different imaging mechanisms and visual characteristics. Although deep...
THIR: Topological Histopathological Image Retrieval : Abstract: According to the World Health Organization, breast cancer claimed the lives of approximately 685,000 women in 2020. Early diagnosis and accurate clinical decision making are critical in redu...
HDW-SR: High-Frequency Guided Diffusion Model based on Wavelet Decomposition for Image Super-Resolution : Abstract: Diffusion-based methods have shown great promise in single image super-resolution (SISR); however, existing approaches often produce blurred fine details due to insufficient guidance in the ...
GenTract: Generative Global Tractography : Abstract: Tractography is the process of inferring the trajectories of white-matter pathways in the brain from diffusion magnetic resonance imaging (dMRI). Local tractography methods, which construct ...
Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework : Abstract: Foundation models have revolutionized artificial intelligence across numerous domains, yet their transformative potential remains largely untapped in Extreme Multi-label Classification (XMC)...
Video Spatial Reasoning with Object-Centric 3D Rollout : Abstract: Recent advances in Multi-modal Large Language Models (MLLMs) have showcased remarkable capabilities in vision-language understanding. However, enabling robust video spatial reasoning-the abi...
Birth of a Painting: Differentiable Brushstroke Reconstruction : Abstract: Painting embodies a unique form of visual storytelling, where the creation process is as significant as the final artwork. Although recent advances in generative models have enabled visually...
Difficulty-Aware Label-Guided Denoising for Monocular 3D Object Detection : Abstract: Monocular 3D object detection is a cost-effective solution for applications like autonomous driving and robotics, but remains fundamentally ill-posed due to inherently ambiguous depth cues. ...
Self-Supervised Ultrasound Screen Detection : Abstract: Ultrasound (US) machines display images on a built-in monitor, but routine transfer to hospital systems relies on DICOM. We propose a self-supervised pipeline to extract the US image from a ...
RefineVAD: Semantic-Guided Feature Recalibration for Weakly Supervised Video Anomaly Detection : Abstract: Weakly-Supervised Video Anomaly Detection aims to identify anomalous events using only video-level labels, balancing annotation efficiency with practical applicability. However, existing met...
End-to-End Multi-Person Pose Estimation with Pose-Aware Video Transformer : Abstract: Existing multi-person video pose estimation methods typically adopt a two-stage pipeline: detecting individuals in each frame, followed by temporal modeling for single-person pose estimation...
3DAlign-DAER: Dynamic Attention Policy and Efficient Retrieval Strategy for Fine-grained 3D-Text Alignment at Scale : Abstract: Despite recent advancements in 3D-text cross-modal alignment, existing state-of-the-art methods still struggle to align fine-grained textual semantics with detailed geometric structures, and...
Hybrid-Domain Adaptative Representation Learning for Gaze Estimation : Abstract: Appearance-based gaze estimation, aiming to predict accurate 3D gaze direction from a single facial image, has made promising progress in recent years. However, most methods suffer significa...
MRIQT: Physics-Aware Diffusion Model for Image Quality Transfer in Neonatal Ultra-Low-Field MRI : Abstract: Portable ultra-low-field MRI (uLF-MRI, 0.064 T) offers accessible neuroimaging for neonatal care but suffers from low signal-to-noise ratio and poor diagnostic quality compared to high-field...
MMD-Thinker: Adaptive Multi-Dimensional Thinking for Multimodal Misinformation Detection : Abstract: Multimodal misinformation floods on various social media, and continues to evolve in the era of AI-generated content (AIGC). The emerged misinformation with low creation cost and high decept...
Referring Camouflaged Object Detection With Multi-Context Overlapped Windows Cross-Attention : Abstract: Referring camouflaged object detection (Ref-COD) aims to identify hidden objects by incorporating reference information such as images and text descriptions. Previous research has transforme...
GeoX-Bench: Benchmarking Cross-View Geo-Localization and Pose Estimation Capabilities of Large Multimodal Models : Abstract: Large multimodal models (LMMs) have demonstrated remarkable capabilities across a wide range of tasks, however their knowledge and abilities in the cross-view geo-localization and pose estim...
Building Egocentric Procedural AI Assistant: Methods, Benchmarks, and Challenges : Abstract: Driven by recent advances in vision language models (VLMs) and egocentric perception research, we introduce the concept of an egocentric procedural AI assistant (EgoProceAssist) tailored to ...
SymGS : Leveraging Local Symmetries for 3D Gaussian Splatting Compression : Abstract: 3D Gaussian Splatting has emerged as a transformative technique in novel view synthesis, primarily due to its high rendering speed and photorealistic fidelity. However, its memory footprint ...
Is your VLM Sky-Ready? A Comprehensive Spatial Intelligence Benchmark for UAV Navigation : Abstract: Vision-Language Models (VLMs), leveraging their powerful visual perception and reasoning capabilities, have been widely applied in Unmanned Aerial Vehicle (UAV) tasks. However, the spatial i...
Recognition of Abnormal Events in Surveillance Videos using Weakly Supervised Dual-Encoder Models : Abstract: We address the challenge of detecting rare and diverse anomalies in surveillance videos using only video-level supervision. Our dual-backbone framework combines convolutional and transformer...
SF-Recon: Simplification-Free Lightweight Building Reconstruction via 3D Gaussian Splatting : Abstract: Lightweight building surface models are crucial for digital city, navigation, and fast geospatial analytics, yet conventional multi-view geometry pipelines remain cumbersome and quality-sens...
Towards Metric-Aware Multi-Person Mesh Recovery by Jointly Optimizing Human Crowd in Camera Space : Abstract: Multi-person human mesh recovery from a single image is a challenging task, hindered by the scarcity of in-the-wild training data. Prevailing in-the-wild human mesh pseudo-ground-truth (pGT)...
TabFlash: Efficient Table Understanding with Progressive Question Conditioning and Token Focusing : Abstract: Table images present unique challenges for effective and efficient understanding due to the need for question-specific focus and the presence of redundant background regions. Existing Multim...
SkyReels-Text: Fine-grained Font-Controllable Text Editing for Poster Design : Abstract: Artistic design such as poster design often demands rapid yet precise modification of textual content while preserving visual harmony and typographic intent, especially across diverse font s...
CorrectAD: A Self-Correcting Agentic System to Improve End-to-end Planning in Autonomous Driving : Abstract: End-to-end planning methods are the de facto standard of the current autonomous driving system, while the robustness of the data-driven approaches suffers due to the notorious long-tail prob...
DriveLiDAR4D: Sequential and Controllable LiDAR Scene Generation for Autonomous Driving : Abstract: The generation of realistic LiDAR point clouds plays a crucial role in the development and evaluation of autonomous driving systems. Although recent methods for 3D LiDAR point cloud generati...
Computer Vision based group activity detection and action spotting : Abstract: Group activity detection in multi-person scenes is challenging due to complex human interactions, occlusions, and variations in appearance over time. This work presents a computer vision bas...
YOLO Meets Mixture-of-Experts: Adaptive Expert Routing for Robust Object Detection : Abstract: This paper presents a novel Mixture-of-Experts framework for object detection, incorporating adaptive routing among multiple YOLOv9-T experts to enable dynamic feature specialization and ach...
Semi-Supervised Multi-Task Learning for Interpretable Quality As- sessment of Fundus Images : Abstract: Retinal image quality assessment (RIQA) supports computer-aided diagnosis of eye diseases. However, most tools classify only overall image quality, without indicating acquisition defects to ...
Generalized Denoising Diffusion Codebook Models (gDDCM): Tokenizing images using a pre-trained diffusion model : Abstract: Recently, the Denoising Diffusion Codebook Models (DDCM) was proposed. DDCM leverages the Denoising Diffusion Probabilistic Model (DDPM) and replaces the random noise in the backward process...
Descriptor: Distance-Annotated Traffic Perception Question Answering (DTPQA) : Abstract: The remarkable progress of Vision-Language Models (VLMs) on a variety of tasks has raised interest in their application to automated driving. However, for these models to be trusted in such ...
TripleFDS: Triple Feature Disentanglement and Synthesis for Scene Text Editing : Abstract: Scene Text Editing (STE) aims to naturally modify text in images while preserving visual consistency, the decisive factors of which can be divided into three parts, i.e., text style, text co...
What Color Is It? A Text-Interference Multimodal Hallucination Benchmark : Abstract: With the rapid advancement of Large Models, numerous text-and-vision-fused Multimodal Large Models (MLMs) have emerged. However, these MLMs remain susceptible to informational interference i...
Delineate Anything Flow: Fast, Country-Level Field Boundary Detection from Any Source : Abstract: Accurate delineation of agricultural field boundaries from satellite imagery is essential for land management and crop monitoring, yet existing methods often produce incomplete boundaries, m...
VOPE: Revisiting Hallucination of Vision-Language Models in Voluntary Imagination Task : Abstract: Most research on hallucinations in Large Vision-Language Models (LVLMs) focuses on factual description tasks that prohibit any output absent from the image. However, little attention has bee...
FUSE: A Flow-based Mapping Between Shapes : Abstract: We introduce a novel neural representation for maps between 3D shapes based on flow-matching models, which is computationally efficient and supports cross-representation shape matching witho...
Unlocking the Forgery Detection Potential of Vanilla MLLMs: A Novel Training-Free Pipeline : Abstract: With the rapid advancement of artificial intelligence-generated content (AIGC) technologies, including multimodal large language models (MLLMs) and diffusion models, image generation and man...
InterMoE: Individual-Specific 3D Human Interaction Generation via Dynamic Temporal-Selective MoE : Abstract: Generating high-quality human interactions holds significant value for applications like virtual reality and robotics. However, existing methods often fail to preserve unique individual char...
Language-Guided Invariance Probing of Vision-Language Models : Abstract: Recent vision-language models (VLMs) such as CLIP, OpenCLIP, EVA02-CLIP and SigLIP achieve strong zero-shot performance, but it is unclear how reliably they respond to controlled linguistic ...
Mapping the Vanishing and Transformation of Urban Villages in China : Abstract: Urban villages (UVs), informal settlements embedded within China's urban fabric, have undergone widespread demolition and redevelopment in recent decades. However, there remains a lack of sy...
Minimax Multi-Target Conformal Prediction with Applications to Imaging Inverse Problems : Abstract: In ill-posed imaging inverse problems, uncertainty quantification remains a fundamental challenge, especially in safety-critical applications. Recently, conformal prediction has been used to...
Accuracy is Not Enough: Poisoning Interpretability in Federated Learning via Color Skew : Abstract: As machine learning models are increasingly deployed in safety-critical domains, visual explanation techniques have become essential tools for supporting transparency. In this work, we revea...
Robust Defense Strategies for Multimodal Contrastive Learning: Efficient Fine-tuning Against Backdoor Attacks : Abstract: The advent of multimodal deep learning models, such as CLIP, has unlocked new frontiers in a wide range of applications, from image-text understanding to classification tasks. However, these...
TSE-Net: Semi-supervised Monocular Height Estimation from Single Remote Sensing Images : Abstract: Monocular height estimation plays a critical role in 3D perception for remote sensing, offering a cost-effective alternative to multi-view or LiDAR-based methods. While deep learning has sig...
Opt3DGS: Optimizing 3D Gaussian Splatting with Adaptive Exploration and Curvature-Aware Exploitation : Abstract: 3D Gaussian Splatting (3DGS) has emerged as a leading framework for novel view synthesis, yet its core optimization challenges remain underexplored. We identify two key issues in 3DGS optimi...
Hierarchical Prompt Learning for Image- and Text-Based Person Re-Identification : Abstract: Person re-identification (ReID) aims to retrieve target pedestrian images given either visual queries (image-to-image, I2I) or textual descriptions (text-to-image, T2I). Although both tasks ...
Adaptive Multi-Scale Integration Unlocks Robust Cell Annotation in Histopathology Images : Abstract: Identifying cell types and subtypes from routine histopathology images is essential for improving the computational understanding of human disease. Existing tile-based models can capture det...
VVS: Accelerating Speculative Decoding for Visual Autoregressive Generation via Partial Verification Skipping : Abstract: Visual autoregressive (AR) generation models have demonstrated strong potential for image generation, yet their next-token-prediction paradigm introduces considerable inference latency. Alth...
ICLR: Inter-Chrominance and Luminance Interaction for Natural Color Restoration in Low-Light Image Enhancement : Abstract: Low-Light Image Enhancement (LLIE) task aims at improving contrast while restoring details and textures for images captured in low-light conditions. HVI color space has made significant prog...
Tissue Aware Nuclei Detection and Classification Model for Histopathology Images : Abstract: Accurate nuclei detection and classification are fundamental to computational pathology, yet existing approaches are hindered by reliance on detailed expert annotations and insufficient use ...
A Real-Time Driver Drowsiness Detection System Using MediaPipe and Eye Aspect Ratio : Abstract: One of the major causes of road accidents is driver fatigue that causes thousands of fatalities and injuries every year. This study shows development of a Driver Drowsiness Detection System ...
Alpha Divergence Losses for Biometric Verification : Abstract: Performance in face and speaker verification is largely driven by margin based softmax losses like CosFace and ArcFace. Recently introduced $α$-divergence loss functions offer a compelling a...
CacheFlow: Compressive Streaming Memory for Efficient Long-Form Video Understanding : Abstract: Long-form video question answering (VQA) overwhelms current vision-language models (VLMs) because attention and key-value (KV) caches grow with runtime, forcing either expensive inference or...
Part-X-MLLM: Part-aware 3D Multimodal Large Language Model : Abstract: We introduce Part-X-MLLM, a native 3D multimodal large language model that unifies diverse 3D tasks by formulating them as programs in a structured, executable grammar. Given an RGB point cl...
PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image : Abstract: 3D modeling is shifting from static visual representations toward physical, articulated assets that can be directly used in simulation and interaction. However, most existing 3D generation m...
Distribution Matching Distillation Meets Reinforcement Learning : Abstract: Distribution Matching Distillation (DMD) distills a pre-trained multi-step diffusion model to a few-step one to improve inference efficiency. However, the performance of the latter is often ...
TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models : Abstract: The rapid evolution of video generative models has shifted their focus from producing visually plausible outputs to tackling tasks requiring physical plausibility and logical consistency. Ho...
Free-Form Scene Editor: Enabling Multi-Round Object Manipulation like in a 3D Engine : Abstract: Recent advances in text-to-image (T2I) diffusion models have significantly improved semantic image editing, yet most methods fall short in performing 3D-aware object manipulation. In this wo...
Segment Anything Across Shots: A Method and Benchmark : Abstract: This work focuses on multi-shot semi-supervised video object segmentation (MVOS), which aims at segmenting the target object indicated by an initial mask throughout a video with multiple sho...
Back to Basics: Let Denoising Generative Models Denoise : Abstract: Today's denoising diffusion models do not "denoise" in the classical sense, i.e., they do not directly predict clean images. Rather, the neural networks predict noise or a noised quantity. I...
Image-based Morphological Characterization of Filamentous Biological Structures with Non-constant Curvature Shape Feature : Abstract: Tendrils coil their shape to anchor the plant to supporting structures, allowing vertical growth toward light. Although climbing plants have been studied for a long time, extracting informat...
Slow - Motion Video Synthesis for Basketball Using Frame Interpolation : Abstract: Basketball broadcast footage is traditionally captured at 30-60 fps, limiting viewers' ability to appreciate rapid plays such as dunks and crossovers. We present a real-time slow-motion synt...
Range Asymmetric Numeral Systems-Based Lightweight Intermediate Feature Compression for Split Computing of Deep Neural Networks : Abstract: Split computing distributes deep neural network inference between resource-constrained edge devices and cloud servers but faces significant communication bottlenecks when transmitting interm...
Understanding the Representation of Older Adults in Motion Capture Locomotion Datasets : Abstract: The Internet of Things (IoT) sensors have been widely employed to capture human locomotions to enable applications such as activity recognition, human pose estimation, and fall detection. Mo...
Large Language Models and 3D Vision for Intelligent Robotic Perception and Autonomy: A Review : Abstract: With the rapid advancement of artificial intelligence and robotics, the integration of Large Language Models (LLMs) with 3D vision is emerging as a transformative approach to enhancing robot...
End to End AI System for Surgical Gesture Sequence Recognition and Clinical Outcome Prediction : Abstract: Fine-grained analysis of intraoperative behavior and its impact on patient outcomes remain a longstanding challenge. We present Frame-to-Outcome (F2O), an end-to-end system that translates t...
TIMERIPPLE: Accelerating vDiTs by Understanding the Spatio-Temporal Correlations in Latent Space : Abstract: The recent surge in video generation has shown the growing demand for high-quality video synthesis using large vision models. Existing video generation models are predominantly based on the ...
AttackVLA: Benchmarking Adversarial and Backdoor Attacks on Vision-Language-Action Models : Abstract: Vision-Language-Action (VLA) models enable robots to interpret natural-language instructions and perform diverse tasks, yet their integration of perception, language, and control introduces ...
Recursive Threshold Median Filter and Autoencoder for Salt-and-Pepper Denoising: SSIM analysis of Images and Entropy Maps : Abstract: This paper studies the removal of salt-and-pepper noise from images using median filter (MF) and simple three-layer autoencoder (AE) within recursive threshold algorithm. The performance of ...
AURA: Development and Validation of an Augmented Unplanned Removal Alert System using Synthetic ICU Videos : Abstract: Unplanned extubation (UE) remains a critical patient safety concern in intensive care units (ICUs), often leading to severe complications or death. Real-time UE detection has been limited, l...
Deep Unfolded BM3D: Unrolling Non-local Collaborative Filtering into a Trainable Neural Network : Abstract: Block-Matching and 3D Filtering (BM3D) exploits non-local self-similarity priors for denoising but relies on fixed parameters. Deep models such as U-Net are more flexible but often lack inte...
Bregman geometry-aware split Gibbs sampling for Bayesian Poisson inverse problems : Abstract: This paper proposes a novel Bayesian framework for solving Poisson inverse problems by devising a Monte Carlo sampling algorithm which accounts for the underlying non-Euclidean geometry. To ...
Multimodal RGB-HSI Feature Fusion with Patient-Aware Incremental Heuristic Meta-Learning for Oral Lesion Classification : Abstract: Early detection of oral cancer and potentially malignant disorders is challenging in low-resource settings due to limited annotated data. We present a unified four-class oral lesion classifi...
RAA-MIL: A Novel Framework for Classification of Oral Cytology : Abstract: Cytology is a valuable tool for early detection of oral squamous cell carcinoma (OSCC). However, manual examination of cytology whole slide images (WSIs) is slow, subjective, and depends hea...
MTMed3D: A Multi-Task Transformer-Based Model for 3D Medical Imaging : Abstract: In the field of medical imaging, AI-assisted techniques such as object detection, segmentation, and classification are widely employed to alleviate the workload of physicians and doctors. Ho...
DEMIST: \underline{DE}coupled \underline{M}ulti-stream latent d\underline{I}ffusion for Quantitative Myelin Map \underline{S}yn\underline{T}hesis : Abstract: Quantitative magnetization transfer (qMT) imaging provides myelin-sensitive biomarkers, such as the pool size ratio (PSR), which is valuable for multiple sclerosis (MS) assessment. However, ...
Predicting upcoming visual features during eye movements yields scene representations aligned with human visual cortex : Abstract: Scenes are complex, yet structured collections of parts, including objects and surfaces, that exhibit spatial and semantic relations to one another. An effective visual system therefore need...
Improving the Generalisation of Learned Reconstruction Frameworks : Abstract: Ensuring proper generalization is a critical challenge in applying data-driven methods for solving inverse problems in imaging, as neural networks reconstructing an image must perform well a...
BrainNormalizer: Anatomy-Informed Pseudo-Healthy Brain Reconstruction from Tumor MRI via Edge-Guided ControlNet : Abstract: Brain tumors are among the most clinically significant neurological diseases and remain a major cause of morbidity and mortality due to their aggressive growth and structural heterogeneity. ...
Neo: Real-Time On-Device 3D Gaussian Splatting with Reuse-and-Update Sorting Acceleration : Abstract: 3D Gaussian Splatting (3DGS) rendering in real-time on resource-constrained devices is essential for delivering immersive augmented and virtual reality (AR/VR) experiences. However, existing...
Yanyun-3: Enabling Cross-Platform Strategy Game Operation with Vision-Language Models : Abstract: Automated operation in cross-platform strategy games demands agents with robust generalization across diverse user interfaces and dynamic battlefield conditions. While vision-language models...
Inertia-Informed Orientation Priors for Event-Based Optical Flow Estimation : Abstract: Event cameras, by virtue of their working principle, directly encode motion within a scene. Many learning-based and model-based methods exist that estimate event-based optical flow, however ...
SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization : Abstract: Multimodal large language models (MLLMs) have demonstrated impressive reasoning and instruction-following capabilities, yet their expanded modality space introduces new compositional safety ...
Scalable Vision-Guided Crop Yield Estimation : Abstract: Precise estimation and uncertainty quantification for average crop yields are critical for agricultural monitoring and decision making. Existing data collection methods, such as crop cuts in...
ZoomEarth: Active Perception for Ultra-High-Resolution Geospatial Vision-Language Tasks : Abstract: Ultra-high-resolution (UHR) remote sensing (RS) images offer rich fine-grained information but also present challenges in effective processing. Existing dynamic resolution and token pruning ...
TM-UNet: Token-Memory Enhanced Sequential Modeling for Efficient Medical Image Segmentation : Abstract: Medical image segmentation is essential for clinical diagnosis and treatment planning. Although transformer-based methods have achieved remarkable results, their high computational cost hind...
One target to align them all: LiDAR, RGB and event cameras extrinsic calibration for Autonomous Driving : Abstract: We present a novel multi-modal extrinsic calibration framework designed to simultaneously estimate the relative poses between event cameras, LiDARs, and RGB cameras, with particular focus on...
Rethinking Bias in Generative Data Augmentation for Medical AI: a Frequency Recalibration Method : Abstract: Developing Medical AI relies on large datasets and easily suffers from data scarcity. Generative data augmentation (GDA) using AI generative models offers a solution to synthesize realistic ...
LiDAR-GS++:Improving LiDAR Gaussian Reconstruction via Diffusion Priors : Abstract: Recent GS-based rendering has made significant progress for LiDAR, surpassing Neural Radiance Fields (NeRF) in both quality and speed. However, these methods exhibit artifacts in extrapolate...
SpaceVLM: Sub-Space Modeling of Negation in Vision-Language Models : Abstract: Vision-Language Models (VLMs) struggle with negation. Given a prompt like "retrieve (or generate) a street scene without pedestrians," they often fail to respect the "not." Existing methods ...
Explainable AI-Generated Image Detection RewardBench : Abstract: Conventional, classification-based AI-generated image detection methods cannot explain why an image is considered real or AI-generated in a way a human expert would, which reduces the trustw...
Constructing and Interpreting Digital Twin Representations for Visual Reasoning via Reinforcement Learning : Abstract: Visual reasoning may require models to interpret images and videos and respond to implicit text queries across diverse output formats, from pixel-level segmentation masks to natural language...
Fast Reasoning Segmentation for Images and Videos : Abstract: Reasoning segmentation enables open-set object segmentation via implicit text queries, therefore serving as a foundation for embodied agents that should operate autonomously in real-world en...
Changes in Real Time: Online Scene Change Detection with Multi-View Fusion : Abstract: Online Scene Change Detection (SCD) is an extremely challenging problem that requires an agent to detect relevant changes on the fly while observing the scene from unconstrained viewpoints. ...
Reasoning Text-to-Video Retrieval via Digital Twin Video Representations and Large Language Models : Abstract: The goal of text-to-video retrieval is to search large databases for relevant videos based on text queries. Existing methods have progressed to handling explicit queries where the visual con...
Leveraging Quantum-Based Architectures for Robust Diagnostics : Abstract: The objective of this study is to diagnose and differentiate kidney stones, cysts, and tumors using Computed Tomography (CT) images of the kidney. This study leverages a hybrid quantum-class...
Calibrated Decomposition of Aleatoric and Epistemic Uncertainty in Deep Features for Inference-Time Adaptation : Abstract: Most estimators collapse all uncertainty modes into a single confidence score, preventing reliable reasoning about when to allocate more compute or adjust inference. We introduce Uncertainty...
MSLoRA: Multi-Scale Low-Rank Adaptation via Attention Reweighting : Abstract: We introduce MSLoRA, a backbone-agnostic, parameter-efficient adapter that reweights feature responses rather than re-tuning the underlying backbone. Existing low-rank adaptation methods a...
VLA-R: Vision-Language Action Retrieval toward Open-World End-to-End Autonomous Driving : Abstract: Exploring open-world situations in an end-to-end manner is a promising yet challenging task due to the need for strong generalization capabilities. In particular, end-to-end autonomous drivi...
Self-Supervised Visual Prompting for Cross-Domain Road Damage Detection : Abstract: The deployment of automated pavement defect detection is often hindered by poor cross-domain generalization. Supervised detectors achieve strong in-domain accuracy but require costly re-anno...
Towards Rotation-only Imaging Geometry: Rotation Estimation : Abstract: Structure from Motion (SfM) is a critical task in computer vision, aiming to recover the 3D scene structure and camera motion from a sequence of 2D images. The recent pose-only imaging geome...
Seeing Through the Rain: Resolving High-Frequency Conflicts in Deraining and Super-Resolution via Diffusion Guidance : Abstract: Clean images are crucial for visual tasks such as small object detection, especially at high resolutions. However, real-world images are often degraded by adverse weather, and weather restor...
MFI-ResNet: Efficient ResNet Architecture Optimization via MeanFlow Compression and Selective Incubation : Abstract: ResNet has achieved tremendous success in computer vision through its residual connection mechanism. ResNet can be viewed as a discretized form of ordinary differential equations (ODEs). Fro...
RedVTP: Training-Free Acceleration of Diffusion Vision-Language Models Inference via Masked Token-Guided Visual Token Pruning : Abstract: Vision-Language Models (VLMs) have achieved remarkable progress in multimodal reasoning and generation, yet their high computational demands remain a major challenge. Diffusion Vision-Langua...
Text-Guided Channel Perturbation and Pretrained Knowledge Integration for Unified Multi-Modality Image Fusion : Abstract: Multi-modality image fusion enhances scene perception by combining complementary information. Unified models aim to share parameters across modalities for multi-modality image fusion, but la...
CoTBox-TTT: Grounding Medical VQA with Visual Chain-of-Thought Boxes During Test-time Training : Abstract: Medical visual question answering could support clinical decision making, yet current systems often fail under domain shift and produce answers that are weakly grounded in image evidence. Th...
MaskAnyNet: Rethinking Masked Image Regions as Valuable Information in Supervised Learning : Abstract: In supervised learning, traditional image masking faces two key issues: (i) discarded pixels are underutilized, leading to a loss of valuable contextual information; (ii) masking may remove ...
Towards Temporal Fusion Beyond the Field of View for Camera-based Semantic Scene Completion : Abstract: Recent camera-based 3D semantic scene completion (SSC) methods have increasingly explored leveraging temporal cues to enrich the features of the current frame. However, while these approache...
Visible Structure Retrieval for Lightweight Image-Based Relocalisation : Abstract: Accurate camera pose estimation from an image observation in a previously mapped environment is commonly done through structure-based methods: by finding correspondences between 2D keypoints...
MdaIF: Robust One-Stop Multi-Degradation-Aware Image Fusion with Language-Driven Semantics : Abstract: Infrared and visible image fusion aims to integrate complementary multi-modal information into a single fused result. However, existing methods 1) fail to account for the degradation visible...
D$^{2}$-VPR: A Parameter-efficient Visual-foundation-model-based Visual Place Recognition Method via Knowledge Distillation and Deformable Aggregation : Abstract: Visual Place Recognition (VPR) aims to determine the geographic location of a query image by retrieving its most visually similar counterpart from a geo-tagged reference database. Recently, ...
ReaSon: Reinforced Causal Search with Information Bottleneck for Video Understanding : Abstract: Keyframe selection has become essential for video understanding with vision-language models (VLMs) due to limited input tokens and the temporal sparsity of relevant information across video ...
HiGFA: Hierarchical Guidance for Fine-grained Data Augmentation with Diffusion Models : Abstract: Generative diffusion models show promise for data augmentation. However, applying them to fine-grained tasks presents a significant challenge: ensuring synthetic images accurately capture th...
EmoVerse: A MLLMs-Driven Emotion Representation Dataset for Interpretable Visual Emotion Analysis : Abstract: Visual Emotion Analysis (VEA) aims to bridge the affective gap between visual content and human emotional responses. Despite its promise, progress in this field remains limited by the lack o...
SEMC: Structure-Enhanced Mixture-of-Experts Contrastive Learning for Ultrasound Standard Plane Recognition : Abstract: Ultrasound standard plane recognition is essential for clinical tasks such as disease screening, organ evaluation, and biometric measurement. However, existing methods fail to effectively ex...
Through-Foliage Surface-Temperature Reconstruction for early Wildfire Detection : Abstract: We introduce a novel method for reconstructing surface temperatures through occluding forest vegetation by combining signal processing and machine learning. Our goal is to enable fully autom...
Beyond Pixels: Semantic-aware Typographic Attack for Geo-Privacy Protection : Abstract: Large Visual Language Models (LVLMs) now pose a serious yet overlooked privacy threat, as they can infer a social media user's geolocation directly from shared images, leading to unintended ...
TempoMaster: Efficient Long Video Generation via Next-Frame-Rate Prediction : Abstract: We present TempoMaster, a novel framework that formulates long video generation as next-frame-rate prediction. Specifically, we first generate a low-frame-rate clip that serves as a coarse b...
Rank-Aware Agglomeration of Foundation Models for Immunohistochemistry Image Cell Counting : Abstract: Accurate cell counting in immunohistochemistry (IHC) images is critical for quantifying protein expression and aiding cancer diagnosis. However, the task remains challenging due to the chrom...
Fine-Grained Representation for Lane Topology Reasoning : Abstract: Precise modeling of lane topology is essential for autonomous driving, as it directly impacts navigation and control decisions.Existing methods typically represent each lane with a single qu...
Seg-VAR: Image Segmentation with Visual Autoregressive Modeling : Abstract: While visual autoregressive modeling (VAR) strategies have shed light on image generation with the autoregressive models, their potential for segmentation, a task that requires precise low-l...
LoRA-Enhanced Vision Transformer for Single Image based Morphing Attack Detection via Knowledge Distillation from EfficientNet : Abstract: Face Recognition Systems (FRS) are critical for security but remain vulnerable to morphing attacks, where synthetic images blend biometric features from multiple individuals. We propose a no...
Pixels or Positions? Benchmarking Modalities in Group Activity Recognition : Abstract: Group Activity Recognition (GAR) is well studied on the video modality for surveillance and indoor team sports (e.g., volleyball, basketball). Yet, other modalities such as agent positions a...
Open-World Test-Time Adaptation with Hierarchical Feature Aggregation and Attention Affine : Abstract: Test-time adaptation (TTA) refers to adjusting the model during the testing phase to cope with changes in sample distribution and enhance the model's adaptability to new environments. In rea...
C3Net: Context-Contrast Network for Camouflaged Object Detection : Abstract: Camouflaged object detection identifies objects that blend seamlessly with their surroundings through similar colors, textures, and patterns. This task challenges both traditional segmentati...
Multivariate Diffusion Transformer with Decoupled Attention for High-Fidelity Mask-Text Collaborative Facial Generation : Abstract: While significant progress has been achieved in multimodal facial generation using semantic masks and textual descriptions, conventional feature fusion approaches often fail to enable effect...
Denoising Vision Transformer Autoencoder with Spectral Self-Regularization : Abstract: Variational autoencoders (VAEs) typically encode images into a compact latent space, reducing computational cost but introducing an optimization dilemma: a higher-dimensional latent space im...
Medical Knowledge Intervention Prompt Tuning for Medical Image Classification : Abstract: Vision-language foundation models (VLMs) have shown great potential in feature transfer and generalization across a wide spectrum of medical-related downstream tasks. However, fine-tuning th...
DPVO-QAT++: Heterogeneous QAT and CUDA Kernel Fusion for High-Performance Deep Patch Visual Odometry : Abstract: Deep learning-based Visual SLAM (vSLAM) systems exhibit exceptional geometric reasoning capabilities, yet their prohibitive computational overhead severely restricts deployment on resource-c...
Toward Real-world Text Image Forgery Localization: Structured and Interpretable Data Synthesis : Abstract: Existing Text Image Forgery Localization (T-IFL) methods often suffer from poor generalization due to the limited scale of real-world datasets and the distribution gap caused by synthetic da...
Hi-Reco: High-Fidelity Real-Time Conversational Digital Humans : Abstract: High-fidelity digital humans are increasingly used in interactive applications, yet achieving both visual realism and real-time responsiveness remains a major challenge. We present a high-fi...
DensePercept-NCSSD: Vision Mamba towards Real-time Dense Visual Perception with Non-Causal State Space Duality : Abstract: In this work, we propose an accurate and real-time optical flow and disparity estimation model by fusing pairwise input images in the proposed non-causal selective state space for dense perc...
Appreciate the View: A Task-Aware Evaluation Framework for Novel View Synthesis : Abstract: The goal of Novel View Synthesis (NVS) is to generate realistic images of a given content from unseen viewpoints. But how can we trust that a generated image truly reflects the intended tran...
BridgeEQA: Virtual Embodied Agents for Real Bridge Inspections : Abstract: Deploying embodied agents that can answer questions about their surroundings in realistic real-world settings remains difficult, partly due to the scarcity of benchmarks that faithfully capt...
R$^{2}$Seg: Training-Free OOD Medical Tumor Segmentation via Anatomical Reasoning and Statistical Rejection : Abstract: Foundation models for medical image segmentation struggle under out-of-distribution (OOD) shifts, often producing fragmented false positives on OOD tumors. We introduce R$^{2}$Seg, a trainin...
HEDGE: Hallucination Estimation via Dense Geometric Entropy for VQA with Vision-Language Models : Abstract: Vision-language models (VLMs) enable open-ended visual question answering but remain prone to hallucinations. We present HEDGE, a unified framework for hallucination detection that combines ...
Counting Through Occlusion: Framework for Open World Amodal Counting : Abstract: Object counting has achieved remarkable success on visible instances, yet state-of-the-art (SOTA) methods fail under occlusion, a pervasive challenge in real world deployment. This failure s...
FSDAM: Few-Shot Driving Attention Modeling via Vision-Language Coupling : Abstract: Understanding where drivers look and why they shift their attention is essential for autonomous systems that read human intent and justify their actions. Most existing models rely on large-s...
Backdoor Attacks on Open Vocabulary Object Detectors via Multi-Modal Prompt Tuning : Abstract: Open-vocabulary object detectors (OVODs) unify vision and language to detect arbitrary object categories based on text prompts, enabling strong zero-shot generalization to novel concepts. As...
Direct Visual Grounding by Directing Attention of Visual Tokens : Abstract: Vision Language Models (VLMs) mix visual tokens and text tokens. A puzzling issue is the fact that visual tokens most related to the query receive little to no attention in the final layers ...
Deep Imbalanced Multi-Target Regression: 3D Point Cloud Voxel Content Estimation in Simulated Forests : Abstract: Voxelization is an effective approach to reduce the computational cost of processing Light Detection and Ranging (LiDAR) data, yet it results in a loss of fine-scale structural information. ...
SAGE: Saliency-Guided Contrastive Embeddings : Abstract: Integrating human perceptual priors into the training of neural networks has been shown to raise model generalization, serve as an effective regularizer, and align models with human expertis...
Which Way from B to A: The role of embedding geometry in image interpolation for Stable Diffusion : Abstract: It can be shown that Stable Diffusion has a permutation-invariance property with respect to the rows of Contrastive Language-Image Pretraining (CLIP) embedding matrices. This inspired the no...
Lightweight Optimal-Transport Harmonization on Edge Devices : Abstract: Color harmonization adjusts the colors of an inserted object so that it perceptually matches the surrounding image, resulting in a seamless composite. The harmonization problem naturally ari...
Enhancing Neuro-Oncology Through Self-Assessing Deep Learning Models for Brain Tumor Unified Model for MRI Segmentation : Abstract: Accurate segmentation of brain tumors is vital for diagnosis, surgical planning, and treatment monitoring. Deep learning has advanced on benchmarks, but two issues limit clinical use: no unc...
MSRNet: A Multi-Scale Recursive Network for Camouflaged Object Detection : Abstract: Camouflaged object detection is an emerging and challenging computer vision task that requires identifying and segmenting objects that blend seamlessly into their environments due to high si...
SAGA: Source Attribution of Generative AI Videos : Abstract: The proliferation of generative AI has led to hyper-realistic synthetic videos, escalating misuse risks and outstripping binary real/fake detectors. We introduce SAGA (Source Attribution of ...
Video Finetuning Improves Reasoning Between Frames : Abstract: Multimodal large language models (LLMs) have made rapid progress in visual understanding, yet their extension from images to videos often reduces to a naive concatenation of frame tokens. In...
View-aware Cross-modal Distillation for Multi-view Action Recognition : Abstract: The widespread use of multi-sensor systems has increased research in multi-view action recognition. While existing approaches in multi-view setups with fully overlapping sensors benefit from...
Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views : Abstract: Analyzing hand-object interaction in egocentric vision facilitates VR/AR applications and human-robot policy transfer. Existing research has mostly focused on modeling the behavior paradigm ...
Simple Lines, Big Ideas: Towards Interpretable Assessment of Human Creativity from Drawings : Abstract: Assessing human creativity through visual outputs, such as drawings, plays a critical role in fields including psychology, education, and cognitive science. However, current assessment pract...
ActVAR: Activating Mixtures of Weights and Tokens for Efficient Visual Autoregressive Generation : Abstract: Visual Autoregressive (VAR) models enable efficient image generation via next-scale prediction but face escalating computational costs as sequence length grows. Existing static pruning metho...
Reconstructing 3D Scenes in Native High Dynamic Range : Abstract: High Dynamic Range (HDR) imaging is essential for professional digital media creation, e.g., filmmaking, virtual production, and photorealistic rendering. However, 3D scene reconstruction ha...
FDP: A Frequency-Decomposition Preprocessing Pipeline for Unsupervised Anomaly Detection in Brain MRI : Abstract: Due to the diversity of brain anatomy and the scarcity of annotated data, supervised anomaly detection for brain MRI remains challenging, driving the development of unsupervised anomaly dete...
DeepSport: A Multimodal Large Language Model for Comprehensive Sports Video Reasoning via Agentic Reinforcement Learning : Abstract: Sports video understanding presents unique challenges, requiring models to perceive high-speed dynamics, comprehend complex rules, and reason over long temporal contexts. While Multimodal La...
CASL: Curvature-Augmented Self-supervised Learning for 3D Anomaly Detection : Abstract: Deep learning-based 3D anomaly detection methods have demonstrated significant potential in industrial manufacturing. However, many approaches are specifically designed for anomaly detection...
Explore How to Inject Beneficial Noise in MLLMs : Abstract: Multimodal Large Language Models (MLLMs) have played an increasingly important role in multimodal intelligence. However, the existing fine-tuning methods often ignore cross-modal heterogenei...
CoordAR: One-Reference 6D Pose Estimation of Novel Objects via Autoregressive Coordinate Map Generation : Abstract: Object 6D pose estimation, a crucial task for robotics and augmented reality applications, becomes particularly challenging when dealing with novel objects whose 3D models are not readily av...
Generative Photographic Control for Scene-Consistent Video Cinematic Editing : Abstract: Cinematic storytelling is profoundly shaped by the artful manipulation of photographic elements such as depth of field and exposure. These effects are crucial in conveying mood and creating ...
Text2Traffic: A Text-to-Image Generation and Editing Method for Traffic Scenes : Abstract: With the rapid advancement of intelligent transportation systems, text-driven image generation and editing techniques have demonstrated significant potential in providing rich, controllable ...
PFAvatar: Pose-Fusion 3D Personalized Avatar Reconstruction from Real-World Outfit-of-the-Day Photos : Abstract: We propose PFAvatar (Pose-Fusion Avatar), a new method that reconstructs high-quality 3D avatars from ``Outfit of the Day'' (OOTD) photos, which exhibit diverse poses, occlusions, and comple...
ProtoAnomalyNCD: Prototype Learning for Multi-class Novel Anomaly Discovery in Industrial Scenarios : Abstract: Existing industrial anomaly detection methods mainly determine whether an anomaly is present. However, real-world applications also require discovering and classifying multiple anomaly types...
Semi-Supervised High Dynamic Range Image Reconstructing via Bi-Level Uncertain Area Masking : Abstract: Reconstructing high dynamic range (HDR) images from low dynamic range (LDR) bursts plays an essential role in the computational photography. Impressive progress has been achieved by learning...
Recurrent Autoregressive Diffusion: Global Memory Meets Local Attention : Abstract: Recent advancements in video generation have demonstrated the potential of using video diffusion models as world models, with autoregressive generation of infinitely long videos through mask...
T2I-Based Physical-World Appearance Attack against Traffic Sign Recognition Systems in Autonomous Driving : Abstract: Traffic Sign Recognition (TSR) systems play a critical role in Autonomous Driving (AD) systems, enabling real-time detection of road signs, such as STOP and speed limit signs. While these sy...
EndoSight AI: Deep Learning-Driven Real-Time Gastrointestinal Polyp Detection and Segmentation for Enhanced Endoscopic Diagnostics : Abstract: Precise and real-time detection of gastrointestinal polyps during endoscopic procedures is crucial for early diagnosis and prevention of colorectal cancer. This work presents EndoSight AI, a...
CalibrateMix: Guided-Mixup Calibration of Image Semi-Supervised Models : Abstract: Semi-supervised learning (SSL) has demonstrated high performance in image classification tasks by effectively utilizing both labeled and unlabeled data. However, existing SSL methods often s...
GrOCE:Graph-Guided Online Concept Erasure for Text-to-Image Diffusion Models : Abstract: Concept erasure aims to remove harmful, inappropriate, or copyrighted content from text-to-image diffusion models while preserving non-target semantics. However, existing methods either rely...
HiFusion: Hierarchical Intra-Spot Alignment and Regional Context Fusion for Spatial Gene Expression Prediction from Histopathology : Abstract: Spatial transcriptomics (ST) bridges gene expression and tissue morphology but faces clinical adoption barriers due to technical complexity and prohibitive costs. While computational methods...
ArtiWorld: LLM-Driven Articulation of 3D Objects in Scenes : Abstract: Building interactive simulators and scalable robot-learning environments requires a large number of articulated assets. However, most existing 3D assets in simulation are rigid, and manually...
Concept Regions Matter: Benchmarking CLIP with a New Cluster-Importance Approach : Abstract: Contrastive vision-language models (VLMs) such as CLIP achieve strong zero-shot recognition yet remain vulnerable to spurious correlations, particularly background over-reliance. We introduc...
UNSEEN: Enhancing Dataset Pruning from a Generalization Perspective : Abstract: The growing scale of datasets in deep learning has introduced significant computational challenges. Dataset pruning addresses this challenge by constructing a compact but informative coreset...
Semantic Prioritization in Visual Counterfactual Explanations with Weighted Segmentation and Auto-Adaptive Region Selection : Abstract: In the domain of non-generative visual counterfactual explanations (CE), traditional techniques frequently involve the substitution of sections within a query image with corresponding sectio...
PerTouch: VLM-Driven Agent for Personalized and Semantic Image Retouching : Abstract: Image retouching aims to enhance visual quality while aligning with users' personalized aesthetic preferences. To address the challenge of balancing controllability and subjectivity, we prop...
Medal S: Spatio-Textual Prompt Model for Medical Segmentation : Abstract: We introduce Medal S, a medical segmentation foundation model that supports native-resolution spatial and textual prompts within an end-to-end trainable framework. Unlike text-only methods l...
Infinite-Story: A Training-Free Consistent Text-to-Image Generation : Abstract: We present Infinite-Story, a training-free framework for consistent text-to-image (T2I) generation tailored for multi-prompt storytelling scenarios. Built upon a scale-wise autoregressive mo...
SAGE: Spuriousness-Aware Guided Prompt Exploration for Mitigating Multimodal Bias : Abstract: Large vision-language models, such as CLIP, have shown strong zero-shot classification performance by aligning images and text in a shared embedding space. However, CLIP models often develop...
Beyond Darkness: Thermal-Supervised 3D Gaussian Splatting for Low-Light Novel View Synthesis : Abstract: Under extremely low-light conditions, novel view synthesis (NVS) faces severe degradation in terms of geometry, color consistency, and radiometric stability. Standard 3D Gaussian Splatting (...
You Only Look Omni Gradient Backpropagation for Moving Infrared Small Target Detection : Abstract: Moving infrared small target detection is a key component of infrared search and tracking systems, yet it remains extremely challenging due to low signal-to-clutter ratios, severe target-bac...
Geometry Meets Light: Leveraging Geometric Priors for Universal Photometric Stereo under Limited Multi-Illumination Cues : Abstract: Universal Photometric Stereo is a promising approach for recovering surface normals without strict lighting assumptions. However, it struggles when multi-illumination cues are unreliable, su...
SpectralAdapt: Semi-Supervised Domain Adaptation with Spectral Priors for Human-Centered Hyperspectral Image Reconstruction : Abstract: Hyperspectral imaging (HSI) holds great potential for healthcare due to its rich spectral information. However, acquiring HSI data remains costly and technically demanding. Hyperspectral ima...
REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding : Abstract: Self-reflection mechanisms that rely on purely text-based rethinking processes perform well in most multimodal tasks. However, when directly applied to long-form video understanding scenario...
Towards 3D Object-Centric Feature Learning for Semantic Scene Completion : Abstract: Vision-based 3D Semantic Scene Completion (SSC) has received growing attention due to its potential in autonomous driving. While most existing approaches follow an ego-centric paradigm by ag...
Uni-Inter: Unifying 3D Human Motion Synthesis Across Diverse Interaction Contexts : Abstract: We present Uni-Inter, a unified framework for human motion generation that supports a wide range of interaction scenarios: including human-human, human-object, and human-scene-within a singl...
uCLIP: Parameter-Efficient Multilingual Extension of Vision-Language Models with Unpaired Data : Abstract: Contrastive Language-Image Pre-training (CLIP) has demonstrated strong generalization across a wide range of visual tasks by leveraging large-scale English-image pairs. However, its extensio...
MGCA-Net: Multi-Grained Category-Aware Network for Open-Vocabulary Temporal Action Localization : Abstract: Open-Vocabulary Temporal Action Localization (OV-TAL) aims to recognize and localize instances of any desired action categories in videos without explicitly curating training data for all ca...
DiffPixelFormer: Differential Pixel-Aware Transformer for RGB-D Indoor Scene Segmentation : Abstract: Indoor semantic segmentation is fundamental to computer vision and robotics, supporting applications such as autonomous navigation, augmented reality, and smart environments. Although RGB-D ...
ViSS-R1: Self-Supervised Reinforcement Video Reasoning : Abstract: Complex video reasoning remains a significant challenge for Multimodal Large Language Models (MLLMs), as current R1-based methodologies often prioritize text-centric reasoning derived from t...
Monocular 3D Lane Detection via Structure Uncertainty-Aware Network with Curve-Point Queries : Abstract: Monocular 3D lane detection is challenged by aleatoric uncertainty arising from inherent observation noise. Existing methods rely on simplified geometric assumptions, such as independent poi...
LLM-Driven Robots Risk Enacting Discrimination, Violence, and Unlawful Actions : Abstract: Members of the Human-Robot Interaction (HRI) and Machine Learning (ML) communities have proposed Large Language Models (LLMs) as a promising resource for robotics tasks such as natural langu...
Psychological stress during Examination and its estimation by handwriting in answer script : Abstract: This research explores the fusion of graphology and artificial intelligence to quantify psychological stress levels in students by analyzing their handwritten examination scripts. By leverag...
Real-time pothole detection with onboard sensors and camera on vehicles : Abstract: Road conditions play an important role in our everyday commute. With the proliferating number of vehicles on the road each year, it has become necessary to access the road conditions very fr...
A Method for Identifying Farmland System Habitat Types Based on the Dynamic-Weighted Feature Fusion Network Model : Abstract: Addressing the current lack of a standardized habitat classification system for cultivated land ecosystems, incomplete coverage of habitat types, and the inability of existing models to effe...
AGENet: Adaptive Edge-aware Geodesic Distance Learning for Few-Shot Medical Image Segmentation : Abstract: Medical image segmentation requires large annotated datasets, creating a significant bottleneck for clinical applications. While few-shot segmentation methods can learn from minimal examples...
EPSegFZ: Efficient Point Cloud Semantic Segmentation for Few- and Zero-Shot Scenarios with Language Guidance : Abstract: Recent approaches for few-shot 3D point cloud semantic segmentation typically require a two-stage learning process, i.e., a pre-training stage followed by a few-shot training stage. While ef...
Task-Aware 3D Affordance Segmentation via 2D Guidance and Geometric Refinement : Abstract: Understanding 3D scene-level affordances from natural language instructions is essential for enabling embodied agents to interact meaningfully in complex environments. However, this task rem...
LE-CapsNet: A Light and Enhanced Capsule Network : Abstract: Capsule Network (CapsNet) classifier has several advantages over CNNs, including better detection of images containing overlapping categories and higher accuracy on transformed images. Despi...
Target-Balanced Score Distillation : Abstract: Score Distillation Sampling (SDS) enables 3D asset generation by distilling priors from pretrained 2D text-to-image diffusion models, but vanilla SDS suffers from over-saturation and over-sm...
CompressNAS : A Fast and Efficient Technique for Model Compression using Decomposition : Abstract: Deep Convolutional Neural Networks (CNNs) are increasingly difficult to deploy on microcontrollers (MCUs) and lightweight NPUs (Neural Processing Units) due to their growing size and compute...
AdaptFly: Prompt-Guided Adaptation of Foundation Models for Low-Altitude UAV Networks : Abstract: Low-altitude Unmanned Aerial Vehicle (UAV) networks rely on robust semantic segmentation as a foundational enabler for distributed sensing-communication-control co-design across heterogeneou...
Do Blind Spots Matter for Word-Referent Mapping? A Computational Study with Infant Egocentric Video : Abstract: Typically, children start to learn their first words between 6 and 9 months, linking spoken utterances to their visual referents. Without prior knowledge, a word encountered for the first ti...
GROVER: Graph-guided Representation of Omics and Vision with Expert Regulation for Adaptive Spatial Multi-omics Fusion : Abstract: Effectively modeling multimodal spatial omics data is critical for understanding tissue complexity and underlying biological mechanisms. While spatial transcriptomics, proteomics, and epigen...
Exposing DeepFakes via Hyperspectral Domain Mapping : Abstract: Modern generative and diffusion models produce highly realistic images that can mislead human perception and even sophisticated automated detection systems. Most detection methods operate in...
Toward bilipshiz geometric models : Abstract: Many neural networks for point clouds are, by design, invariant to the symmetries of this datatype: permutations and rigid motions. The purpose of this paper is to examine whether such netwo...
Concept-RuleNet: Grounded Multi-Agent Neurosymbolic Reasoning in Vision Language Models : Abstract: Modern vision-language models (VLMs) deliver impressive predictive accuracy yet offer little insight into 'why' a decision is reached, frequently hallucinating facts, particularly when encou...
Batch Transformer Architecture: Case of Synthetic Image Generation for Emotion Expression Facial Recognition : Abstract: A novel Transformer variation architecture is proposed in the implicit sparse style. Unlike "traditional" Transformers, instead of attention to sequential or batch entities in their entirety...
Image-POSER: Reflective RL for Multi-Expert Image Generation and Editing : Abstract: Recent advances in text-to-image generation have produced strong single-shot models, yet no individual system reliably executes the long, compositional prompts typical of creative workflows....
SOTFormer: A Minimal Transformer for Unified Object Tracking and Trajectory Prediction : Abstract: Accurate single-object tracking and short-term motion forecasting remain challenging under occlusion, scale variation, and temporal drift, which disrupt the temporal coherence required for r...
Defending Unauthorized Model Merging via Dual-Stage Weight Protection : Abstract: The rapid proliferation of pretrained models and open repositories has made model merging a convenient yet risky practice, allowing free-riders to combine fine-tuned models into a new multi-...
FocusSDF: Boundary-Aware Learning for Medical Image Segmentation via Signed Distance Supervision : Abstract: Segmentation of medical images constitutes an essential component of medical image analysis, providing the foundation for precise diagnosis and efficient therapeutic interventions in clinica...
Lacking Data? No worries! How synthetic images can alleviate image scarcity in wildlife surveys: a case study with muskox (Ovibos moschatus) : Abstract: Accurate population estimates are essential for wildlife management, providing critical insights into species abundance and distribution. Traditional survey methods, including visual aerial ...
Advancing Annotat3D with Harpia: A CUDA-Accelerated Library For Large-Scale Volumetric Data Segmentation : Abstract: High-resolution volumetric imaging techniques, such as X-ray tomography and advanced microscopy, generate increasingly large datasets that challenge existing tools for efficient processing, ...
Prompt Triage: Structured Optimization Enhances Vision-Language Model Performance on Medical Imaging Benchmarks : Abstract: Vision-language foundation models (VLMs) show promise for diverse imaging tasks but often underperform on medical benchmarks. Prior efforts to improve performance include model finetuning, w...
PI-NAIM: Path-Integrated Neural Adaptive Imputation Model : Abstract: Medical imaging and multi-modal clinical settings often face the challange of missing modality in their diagnostic pipelines. Existing imputation methods either lack representational capacit...
Seeing the Forest and the Trees: Query-Aware Tokenizer for Long-Video Multimodal Language Models : Abstract: Despite the recent advances in the video understanding ability of multimodal large language models (MLLMs), long video understanding remains a challenge. One of the main issues is that the n...
From Events to Clarity: The Event-Guided Diffusion Framework for Dehazing : Abstract: Clear imaging under hazy conditions is a critical task. Prior-based and neural methods have improved results. However, they operate on RGB frames, which suffer from limited dynamic range. Th...
Evaluation of Attention Mechanisms in U-Net Architectures for Semantic Segmentation of Brazilian Rock Art Petroglyphs : Abstract: This study presents a comparative analysis of three U-Net-based architectures for semantic segmentation of rock art petroglyphs from Brazilian archaeological sites. The investigated architec...
From Classification to Cross-Modal Understanding: Leveraging Vision-Language Models for Fine-Grained Renal Pathology : Abstract: Fine-grained glomerular subtyping is central to kidney biopsy interpretation, but clinically valuable labels are scarce and difficult to obtain. Existing computational pathology approaches i...
BeyondFacial: Identity-Preserving Personalized Generation Beyond Facial Close-ups : Abstract: Identity-Preserving Personalized Generation (IPPG) has advanced film production and artistic creation, yet existing approaches overemphasize facial regions, resulting in outputs dominated by...
LithoSeg: A Coarse-to-Fine Framework for High-Precision Lithography Segmentation : Abstract: Accurate segmentation and measurement of lithography scanning electron microscope (SEM) images are crucial for ensuring precise process control, optimizing device performance, and advancing ...
LIHE: Linguistic Instance-Split Hyperbolic-Euclidean Framework for Generalized Weakly-Supervised Referring Expression Comprehension : Abstract: Existing Weakly-Supervised Referring Expression Comprehension (WREC) methods, while effective, are fundamentally limited by a one-to-one mapping assumption, hindering their ability to handle...
Null-Space Diffusion Distillation for Efficient Photorealistic Lensless Imaging : Abstract: State-of-the-art photorealistic reconstructions for lensless cameras often rely on paired lensless-lensed supervision, which can bias models due to lens-lensless domain mismatch. To avoid th...
Bridging Vision and Language for Robust Context-Aware Surgical Point Tracking: The VL-SurgPT Dataset and Benchmark : Abstract: Accurate point tracking in surgical environments remains challenging due to complex visual conditions, including smoke occlusion, specular reflections, and tissue deformation. While existing...
GCAgent: Long-Video Understanding via Schematic and Narrative Episodic Memory : Abstract: Long-video understanding remains a significant challenge for Multimodal Large Language Models (MLLMs) due to inherent token limitations and the complexity of capturing long-term temporal dep...
VPHO: Joint Visual-Physical Cue Learning and Aggregation for Hand-Object Pose Estimation : Abstract: Estimating the 3D poses of hands and objects from a single RGB image is a fundamental yet challenging problem, with broad applications in augmented reality and human-computer interaction. Ex...
Improved Masked Image Generation with Knowledge-Augmented Token Representations : Abstract: Masked image generation (MIG) has demonstrated remarkable efficiency and high-fidelity images by enabling parallel token prediction. Existing methods typically rely solely on the model itsel...
SRSplat: Feed-Forward Super-Resolution Gaussian Splatting from Sparse Multi-View Images : Abstract: Feed-forward 3D reconstruction from sparse, low-resolution (LR) images is a crucial capability for real-world applications, such as autonomous driving and embodied AI. However, existing meth...
FedSDA: Federated Stain Distribution Alignment for Non-IID Histopathological Image Classification : Abstract: Federated learning (FL) has shown success in collaboratively training a model among decentralized data resources without directly sharing privacy-sensitive training data. Despite recent adva...
DCMM-Transformer: Degree-Corrected Mixed-Membership Attention for Medical Imaging : Abstract: Medical images exhibit latent anatomical groupings, such as organs, tissues, and pathological regions, that standard Vision Transformers (ViTs) fail to exploit. While recent work like SBM-Tr...
DeiTFake: Deepfake Detection Model using DeiT Multi-Stage Training : Abstract: Deepfakes are major threats to the integrity of digital media. We propose DeiTFake, a DeiT-based transformer and a novel two-stage progressive training strategy with increasing augmentation ...
UniABG: Unified Adversarial View Bridging and Graph Correspondence for Unsupervised Cross-View Geo-Localization : Abstract: Cross-view geo-localization (CVGL) matches query images ($\textit{e.g.}$, drone) to geographically corresponding opposite-view imagery ($\textit{e.g.}$, satellite). While supervised methods ...
PipeDiT: Accelerating Diffusion Transformers in Video Generation with Task Pipelining and Model Decoupling : Abstract: Video generation has been advancing rapidly, and diffusion transformer (DiT) based models have demonstrated remark- able capabilities. However, their practical deployment is of- ten hindered...
MovSemCL: Movement-Semantics Contrastive Learning for Trajectory Similarity : Abstract: Trajectory similarity computation is fundamental functionality that is used for, e.g., clustering, prediction, and anomaly detection. However, existing learning-based methods exhibit three k...
DCA-LUT: Deep Chromatic Alignment with 5D LUT for Purple Fringing Removal : Abstract: Purple fringing, a persistent artifact caused by Longitudinal Chromatic Aberration (LCA) in camera lenses, has long degraded the clarity and realism of digital imaging. Traditional solutions...
Learning to Hear by Seeing: It's Time for Vision Language Models to Understand Artistic Emotion from Sight and Sound : Abstract: Emotion understanding is critical for making Large Language Models (LLMs) more general, reliable, and aligned with humans. Art conveys emotion through the joint design of visual and auditory...
Point Cloud Quantization through Multimodal Prompting for 3D Understanding : Abstract: Vector quantization has emerged as a powerful tool in large-scale multimodal models, unifying heterogeneous representations through discrete token encoding. However, its effectiveness hinges...
Supervised Multilabel Image Classification Using Residual Networks with Probabilistic Reasoning : Abstract: Multilabel image categorization has drawn interest recently because of its numerous computer vision applications. The proposed work introduces a novel method for classifying multilabel image...
SemanticStitch: Enhancing Image Coherence through Foreground-Aware Seam Carving : Abstract: Image stitching often faces challenges due to varying capture angles, positional differences, and object movements, leading to misalignments and visual discrepancies. Traditional seam carvin...
Teaching Prompts to Coordinate: Hierarchical Layer-Grouped Prompt Tuning for Continual Learning : Abstract: Prompt-based continual learning methods fine-tune only a small set of additional learnable parameters while keeping the pre-trained model's parameters frozen. It enables efficient adaptation...
Learning from Dense Events: Towards Fast Spiking Neural Networks Training via Event Dataset Distillatio : Abstract: Event cameras sense brightness changes and output binary asynchronous event streams, attracting increasing attention. Their bio-inspired dynamics align well with spiking neural networks (SNN...
Sparse by Rule: Probability-Based N:M Pruning for Spiking Neural Networks : Abstract: Brain-inspired Spiking neural networks (SNNs) promise energy-efficient intelligence via event-driven, sparse computation, but deeper architectures inflate parameters and computational cost, ...
DINOv3-Guided Cross Fusion Framework for Semantic-aware CT generation from MRI and CBCT : Abstract: Generating synthetic CT images from CBCT or MRI has a potential for efficient radiation dose planning and adaptive radiotherapy. However, existing CNN-based models lack global semantic under...
Adaptive Begin-of-Video Tokens for Autoregressive Video Diffusion Models : Abstract: Recent advancements in diffusion-based video generation have produced impressive and high-fidelity short videos. To extend these successes to generate coherent long videos, most video diffus...
Did Models Sufficient Learn? Attribution-Guided Training via Subset-Selected Counterfactual Augmentation : Abstract: In current visual model training, models often rely on only limited sufficient causes for their predictions, which makes them sensitive to distribution shifts or the absence of key features....
BdSL-SPOTER: A Transformer-Based Framework for Bengali Sign Language Recognition with Cultural Adaptation : Abstract: We introduce BdSL-SPOTER, a pose-based transformer framework for accurate and efficient recognition of Bengali Sign Language (BdSL). BdSL-SPOTER extends the SPOTER paradigm with cultural spe...
Fine-Grained DINO Tuning with Dual Supervision for Face Forgery Detection : Abstract: The proliferation of sophisticated deepfakes poses significant threats to information integrity. While DINOv2 shows promise for detection, existing fine-tuning approaches treat it as generic...
MediRound: Multi-Round Entity-Level Reasoning Segmentation in Medical Images : Abstract: Despite the progress in medical image segmentation, most existing methods remain task-specific and lack interactivity. Although recent text-prompt-based segmentation approaches enhance user-...
RadarMP: Motion Perception for 4D mmWave Radar in Autonomous Driving : Abstract: Accurate 3D scene motion perception significantly enhances the safety and reliability of an autonomous driving system. Benefiting from its all-weather operational capability and unique perce...
OAD-Promoter: Enhancing Zero-shot VQA using Large Language Models with Object Attribute Description : Abstract: Large Language Models (LLMs) have become a crucial tool in Visual Question Answering (VQA) for handling knowledge-intensive questions in few-shot or zero-shot scenarios. However, their relia...
Compression and Inference of Spiking Neural Networks on Resource-Constrained Hardware : Abstract: Spiking neural networks (SNNs) communicate via discrete spikes in time rather than continuous activations. Their event-driven nature offers advantages for temporal processing and energy effi...
MAVIS: A Benchmark for Multimodal Source Attribution in Long-form Visual Question Answering : Abstract: Source attribution aims to enhance the reliability of AI-generated answers by including references for each statement, helping users validate the provided answers. However, existing work has...
Breaking the Modality Wall: Time-step Mixup for Efficient Spiking Knowledge Transfer from Static to Event Domain : Abstract: The integration of event cameras and spiking neural networks (SNNs) promises energy-efficient visual intelligence, yet scarce event data and the sparsity of DVS outputs hinder effective trai...
FIA-Edit: Frequency-Interactive Attention for Efficient and High-Fidelity Inversion-Free Text-Guided Image Editing : Abstract: Text-guided image editing has advanced rapidly with the rise of diffusion models. While flow-based inversion-free methods offer high efficiency by avoiding latent inversion, they often fail ...
Rethinking Multimodal Point Cloud Completion: A Completion-by-Correction Perspective : Abstract: Point cloud completion aims to reconstruct complete 3D shapes from partial observations, which is a challenging problem due to severe occlusions and missing geometry. Despite recent advances...
MMRINet: Efficient Mamba-Based Segmentation with Dual-Path Refinement for Low-Resource MRI Analysis : Abstract: Automated brain tumor segmentation in multi-parametric MRI remains challenging in resource-constrained settings where deep 3D networks are computationally prohibitive. We propose MMRINet, a ...
Cross-View Cross-Modal Unsupervised Domain Adaptation for Driver Monitoring System : Abstract: Driver distraction remains a leading cause of road traffic accidents, contributing to thousands of fatalities annually across the globe. While deep learning-based driver activity recognition...
Bridging Granularity Gaps: Hierarchical Semantic Learning for Cross-domain Few-shot Segmentation : Abstract: Cross-domain Few-shot Segmentation (CD-FSS) aims to segment novel classes from target domains that are not involved in training and have significantly different data distributions from the s...
OmniSparse: Training-Aware Fine-Grained Sparse Attention for Long-Video MLLMs : Abstract: Existing sparse attention methods primarily target inference-time acceleration by selecting critical tokens under predefined sparsity patterns. However, they often fail to bridge the trainin...
LSS3D: Learnable Spatial Shifting for Consistent and High-Quality 3D Generation from Single-Image : Abstract: Recently, multi-view diffusion-based 3D generation methods have gained significant attention. However, these methods often suffer from shape and texture misalignment across generated multi-v...
GeoMVD: Geometry-Enhanced Multi-View Generation Model Based on Geometric Information Extraction : Abstract: Multi-view image generation holds significant application value in computer vision, particularly in domains like 3D reconstruction, virtual reality, and augmented reality. Most existing meth...
A Novel AI-Driven System for Real-Time Detection of Mirror Absence, Helmet Non-Compliance, and License Plates Using YOLOv8 and OCR : Abstract: Road safety is a critical global concern, with manual enforcement of helmet laws and vehicle safety standards (e.g., rear-view mirror presence) being resource-intensive and inconsistent. Thi...
Mixture of States: Routing Token-Level Dynamics for Multimodal Generation : Abstract: We introduce MoS (Mixture of States), a novel fusion paradigm for multimodal diffusion models that merges modalities using flexible, state-based interactions. The core of MoS is a learnable,...
FaNe: Towards Fine-Grained Cross-Modal Contrast with False-Negative Reduction and Text-Conditioned Sparse Attention : Abstract: Medical vision-language pre-training (VLP) offers significant potential for advancing medical image understanding by leveraging paired image-report data. However, existing methods are limite...
Model Inversion Attack Against Deep Hashing : Abstract: Deep hashing improves retrieval efficiency through compact binary codes, yet it introduces severe and often overlooked privacy risks. The ability to reconstruct original training data from h...
Fusionista2.0: Efficiency Retrieval System for Large-Scale Datasets : Abstract: The Video Browser Showdown (VBS) challenges systems to deliver accurate results under strict time constraints. To meet this demand, we present Fusionista2.0, a streamlined video retrieval sy...
Prompt-Conditioned FiLM and Multi-Scale Fusion on MedSigLIP for Low-Dose CT Quality Assessment : Abstract: We propose a prompt-conditioned framework built on MedSigLIP that injects textual priors via Feature-wise Linear Modulation (FiLM) and multi-scale pooling. Text prompts condition patch-token...
A Disease-Aware Dual-Stage Framework for Chest X-ray Report Generation : Abstract: Radiology report generation from chest X-rays is an important task in artificial intelligence with the potential to greatly reduce radiologists' workload and shorten patient wait times. Desp...
CrossVid: A Comprehensive Benchmark for Evaluating Cross-Video Reasoning in Multimodal Large Language Models : Abstract: Cross-Video Reasoning (CVR) presents a significant challenge in video understanding, which requires simultaneous understanding of multiple videos to aggregate and compare information across ...
Critical or Compliant? The Double-Edged Sword of Reasoning in Chain-of-Thought Explanations : Abstract: Explanations are often promoted as tools for transparency, but they can also foster confirmation bias; users may assume reasoning is correct whenever outputs appear acceptable. We study this...
CURE: Cultural Understanding and Reasoning Evaluation - A Framework for "Thick" Culture Alignment Evaluation in LLMs : Abstract: Large language models (LLMs) are increasingly deployed in culturally diverse environments, yet existing evaluations of cultural competence remain limited. Existing methods focus on de-contex...
Exploring Parameter-Efficient Fine-Tuning and Backtranslation for the WMT 25 General Translation Task : Abstract: In this paper, we explore the effectiveness of combining fine-tuning and backtranslation on a small Japanese corpus for neural machine translation. Starting from a baseline English{\textrigh...
LLMLagBench: Identifying Temporal Training Boundaries in Large Language Models : Abstract: Large Language Models (LLMs) are pretrained on textual data up to a specific temporal cutoff. This creates a strict knowledge boundary beyond which models cannot provide accurate information...
PRISM of Opinions: A Persona-Reasoned Multimodal Framework for User-centric Conversational Stance Detection : Abstract: The rapid proliferation of multimodal social media content has driven research in Multimodal Conversational Stance Detection (MCSD), which aims to interpret users' attitudes toward specific ...
AI-Salesman: Towards Reliable Large Language Model Driven Telemarketing : Abstract: Goal-driven persuasive dialogue, exemplified by applications like telemarketing, requires sophisticated multi-turn planning and strict factual faithfulness, which remains a significant chall...
Seeing is Believing: Rich-Context Hallucination Detection for MLLMs via Backward Visual Grounding : Abstract: Multimodal Large Language Models (MLLMs) have unlocked powerful cross-modal capabilities, but still significantly suffer from hallucinations. As such, accurate detection of hallucinations in...
CriticSearch: Fine-Grained Credit Assignment for Search Agents via a Retrospective Critic : Abstract: Tool-Integrated Reasoning (TIR) with search engines enables large language models to iteratively retrieve up-to-date external knowledge, enhancing adaptability and generalization in complex ...
MME-RAG: Multi-Manager-Expert Retrieval-Augmented Generation for Fine-Grained Entity Recognition in Task-Oriented Dialogues : Abstract: Fine-grained entity recognition is crucial for reasoning and decision-making in task-oriented dialogues, yet current large language models (LLMs) continue to face challenges in domain adapta...
ViConBERT: Context-Gloss Aligned Vietnamese Word Embedding for Polysemous and Sense-Aware Representations : Abstract: Recent advances in contextualized word embeddings have greatly improved semantic tasks such as Word Sense Disambiguation (WSD) and contextual similarity, but most progress has been limited t...
AugAbEx : Way Forward for Extractive Case Summarization : Abstract: Summarization of legal judgments poses a heavy cognitive burden on law practitioners due to the complexity of the language, context-sensitive legal jargon, and the length of the document. Th...
Do LLMs and Humans Find the Same Questions Difficult? A Case Study on Japanese Quiz Answering : Abstract: LLMs have achieved performance that surpasses humans in many NLP tasks. However, it remains unclear whether problems that are difficult for humans are also difficult for LLMs. This study inv...
Don't Think of the White Bear: Ironic Negation in Transformer Models Under Cognitive Load : Abstract: Negation instructions such as 'do not mention $X$' can paradoxically increase the accessibility of $X$ in human thought, a phenomenon known as ironic rebound. Large language models (LLMs) fa...
From Phonemes to Meaning: Evaluating Large Language Models on Tamil : Abstract: Large Language Models (LLMs) have shown strong generalization across tasks in high-resource languages; however, their linguistic competence in low-resource and morphologically rich languages...
Probing Preference Representations: A Multi-Dimensional Evaluation and Analysis Method for Reward Models : Abstract: Previous methods evaluate reward models by testing them on a fixed pairwise ranking test set, but they typically do not provide performance information on each preference dimension. In this ...
Assessing LLMs for Serendipity Discovery in Knowledge Graphs: A Case for Drug Repurposing : Abstract: Large Language Models (LLMs) have greatly advanced knowledge graph question answering (KGQA), yet existing systems are typically optimized for returning highly relevant but predictable answe...
SGuard-v1: Safety Guardrail for Large Language Models : Abstract: We present SGuard-v1, a lightweight safety guardrail for Large Language Models (LLMs), which comprises two specialized models to detect harmful content and screen adversarial prompts in huma...
QA-Noun: Representing Nominal Semantics via Natural Language Question-Answer Pairs : Abstract: Decomposing sentences into fine-grained meaning units is increasingly used to model semantic alignment. While QA-based semantic approaches have shown effectiveness for representing predicate...
TAdaRAG: Task Adaptive Retrieval-Augmented Generation via On-the-Fly Knowledge Graph Construction : Abstract: Retrieval-Augmented Generation (RAG) improves large language models by retrieving external knowledge, often truncated into smaller chunks due to the input context window, which leads to info...
Mitigating Length Bias in RLHF through a Causal Lens : Abstract: Reinforcement learning from human feedback (RLHF) is widely used to align large language models (LLMs) with human preferences. However, RLHF-trained reward models often exhibit length bias -...
MMWOZ: Building Multimodal Agent for Task-oriented Dialogue : Abstract: Task-oriented dialogue systems have garnered significant attention due to their conversational ability to accomplish goals, such as booking airline tickets for users. Traditionally, task-ori...
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data : Abstract: We present Uni-MoE 2.0 from the Lychee family. As a fully open-source omnimodal large model (OLM), it substantially advances Lychee's Uni-MoE series in language-centric multimodal understand...
Knots: A Large-Scale Multi-Agent Enhanced Expert-Annotated Dataset and LLM Prompt Optimization for NOTAM Semantic Parsing : Abstract: Notice to Air Missions (NOTAMs) serve as a critical channel for disseminating key flight safety information, yet their complex linguistic structures and implicit reasoning pose significant c...
Reason-KE++: Aligning the Process, Not Just the Outcome, for Faithful LLM Knowledge Editing : Abstract: Aligning Large Language Models (LLMs) to be faithful to new knowledge in complex, multi-hop reasoning tasks is a critical, yet unsolved, challenge. We find that SFT-based methods, e.g., Reas...
Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs : Abstract: Automated red teaming frameworks for Large Language Models (LLMs) have become increasingly sophisticated, yet they share a fundamental limitation: their jailbreak logic is confined to select...
Adaptive Focus Memory for Language Models : Abstract: Large language models (LLMs) are increasingly deployed in multi-turn dialogue settings, but their behavior is still bottlenecked by fixed context windows and naive memory strategies. Replayi...
On the Brittleness of LLMs: A Journey around Set Membership : Abstract: Large language models (LLMs) achieve superhuman performance on complex reasoning tasks, yet often fail on much simpler problems, raising concerns about their reliability and interpretability...
Evidence of Phase Transitions in Small Transformer-Based Language Models : Abstract: Phase transitions have been proposed as the origin of emergent abilities in large language models (LLMs), where new capabilities appear abruptly once models surpass critical thresholds of sc...
LLM Reinforcement in Context : Abstract: Current Large Language Model alignment research mostly focuses on improving model robustness against adversarial attacks and misbehavior by training on examples and prompting. Research has s...
Evaluating Autoformalization Robustness via Semantically Similar Paraphrasing : Abstract: Large Language Models (LLMs) have recently emerged as powerful tools for autoformalization. Despite their impressive performance, these models can still struggle to produce grounded and veri...
BioMedJImpact: A Comprehensive Dataset and LLM Pipeline for AI Engagement and Scientific Impact Analysis of Biomedical Journals : Abstract: Assessing journal impact is central to scholarly communication, yet existing open resources rarely capture how collaboration structures and artificial intelligence (AI) research jointly shap...
From Passive to Persuasive: Steering Emotional Nuance in Human-AI Negotiation : Abstract: Large Language Models (LLMs) demonstrate increasing conversational fluency, yet instilling them with nuanced, human-like emotional expression remains a significant challenge. Current alignme...
Quantifying consistency and accuracy of Latent Dirichlet Allocation : Abstract: Topic modelling in Natural Language Processing uncovers hidden topics in large, unlabelled text datasets. It is widely applied in fields such as information retrieval, content summarisation,...
NeuroLex: A Lightweight Domain Language Model for EEG Report Understanding and Generation : Abstract: Clinical electroencephalogram (EEG) reports encode domain-specific linguistic conventions that general-purpose language models (LMs) fail to capture. We introduce NeuroLex, a lightweight dom...
From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models : Abstract: With the remarkable success of Multimodal Large Language Models (MLLMs) in perception tasks, enhancing their complex reasoning capabilities has emerged as a critical research focus. Existing...
Auditing Google's AI Overviews and Featured Snippets: A Case Study on Baby Care and Pregnancy : Abstract: Google Search increasingly surfaces AI-generated content through features like AI Overviews (AIO) and Featured Snippets (FS), which users frequently rely on despite having no control over th...
Visual Room 2.0: Seeing is Not Understanding for MLLMs : Abstract: Can multi-modal large language models (MLLMs) truly understand what they can see? Extending Searle's Chinese Room into the multi-modal domain, this paper proposes the Visual Room argument: M...
Fine-Tuned LLMs Know They Don't Know: A Parameter-Efficient Approach to Recovering Honesty : Abstract: The honesty of Large Language Models (LLMs) is increasingly important for safe deployment in high-stakes domains. However, this crucial trait is severely undermined by supervised fine-tuning...
AA-Omniscience: Evaluating Cross-Domain Knowledge Reliability in Large Language Models : Abstract: Existing language model evaluations primarily measure general capabilities, yet reliable use of these models across a range of domains demands factual accuracy and recognition of knowledge g...
How Good is BLI as an Alignment Measure: A Study in Word Embedding Paradigm : Abstract: Sans a dwindling number of monolingual embedding studies originating predominantly from the low-resource domains, it is evident that multilingual embedding has become the de facto choice due...
Spark-Prover-X1: Formal Theorem Proving Through Diverse Data Training : Abstract: Large Language Models (LLMs) have shown significant promise in automated theorem proving, yet progress is often constrained by the scarcity of diverse and high-quality formal language data. ...
BeDiscovER: The Benchmark of Discourse Understanding in the Era of Reasoning Language Models : Abstract: We introduce BeDiscovER (Benchmark of Discourse Understanding in the Era of Reasoning Language Models), an up-to-date, comprehensive suite for evaluating the discourse-level knowledge of mod...
Evaluating the Ability of Large Language Models to Identify Adherence to CONSORT Reporting Guidelines in Randomized Controlled Trials: A Methodological Evaluation Study : Abstract: The Consolidated Standards of Reporting Trials statement is the global benchmark for transparent and high-quality reporting of randomized controlled trials. Manual verification of CONSORT ad...
Extracting Events Like Code: A Multi-Agent Programming Framework for Zero-Shot Event Extraction : Abstract: Zero-shot event extraction (ZSEE) remains a significant challenge for large language models (LLMs) due to the need for complex reasoning and domain-specific understanding. Direct prompting o...
A Comparative Analysis of Recurrent and Attention Architectures for Isolated Sign Language Recognition : Abstract: This study presents a systematic comparative analysis of recurrent and attention-based neural architectures for isolated sign language recognition. We implement and evaluate two representati...
Zero-Shot Grammar Competency Estimation Using Large Language Model Generated Pseudo Labels : Abstract: Grammar competency estimation is essential for assessing linguistic proficiency in both written and spoken language; however, the spoken modality presents additional challenges due to its sp...
Distinguishing Repetition Disfluency from Morphological Reduplication in Bangla ASR Transcripts: A Novel Corpus and Benchmarking Analysis : Abstract: Automatic Speech Recognition (ASR) transcripts, especially in low-resource languages like Bangla, contain a critical ambiguity: word-word repetitions can be either Repetition Disfluency (uni...
TCM-5CEval: Extended Deep Evaluation Benchmark for LLM's Comprehensive Clinical Research Competence in Traditional Chinese Medicine : Abstract: Large language models (LLMs) have demonstrated exceptional capabilities in general domains, yet their application in highly specialized and culturally-rich fields like Traditional Chinese Me...
Translation Entropy: A Statistical Framework for Evaluating Translation Systems : Abstract: The translation of written language has been known since the 3rd century BC; however, its necessity has become increasingly common in the information age. Today, many translators exist, base...
Evaluating Large Language Models for Diacritic Restoration in Romanian Texts: A Comparative Study : Abstract: Automatic diacritic restoration is crucial for text processing in languages with rich diacritical marks, such as Romanian. This study evaluates the performance of several large language mode...
Seeing isn't Hearing: Benchmarking Vision Language Models at Interpreting Spectrograms : Abstract: With the rise of Large Language Models (LLMs) and their vision-enabled counterparts (VLMs), numerous works have investigated their capabilities in tasks that fuse the modalities of vision an...
Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance : Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse domains, but their training remains resource- and time-intensive, requiring massive compute power and ca...
RegionMarker: A Region-Triggered Semantic Watermarking Framework for Embedding-as-a-Service Copyright Protection : Abstract: Embedding-as-a-Service (EaaS) is an effective and convenient deployment solution for addressing various NLP tasks. Nevertheless, recent research has shown that EaaS is vulnerable to model ex...
AHaSIS: Shared Task on Sentiment Analysis for Arabic Dialects : Abstract: The hospitality industry in the Arab world increasingly relies on customer feedback to shape services, driving the need for advanced Arabic sentiment analysis tools. To address this challeng...
Donors and Recipients: On Asymmetric Transfer Across Tasks and Languages with Parameter-Efficient Fine-Tuning : Abstract: Large language models (LLMs) perform strongly across tasks and languages, yet how improvements in one task or language affect other tasks and languages and their combinations remains poorly ...
Can Large Language Models Function as Qualified Pediatricians? A Systematic Evaluation in Real-World Clinical Contexts : Abstract: With the rapid rise of large language models (LLMs) in medicine, a key question is whether they can function as competent pediatricians in real-world clinical settings. We developed PEDIASBe...
Mem-PAL: Towards Memory-based Personalized Dialogue Assistants for Long-term User-Agent Interaction : Abstract: With the rise of smart personal devices, service-oriented human-agent interactions have become increasingly prevalent. This trend highlights the need for personalized dialogue assistants tha...
Non-Linear Scoring Model for Translation Quality Evaluation : Abstract: Analytic Translation Quality Evaluation (TQE), based on Multidimensional Quality Metrics (MQM), traditionally uses a linear error-to-penalty scale calibrated to a reference sample of 1000-20...
Aspect-Level Obfuscated Sentiment in Thai Financial Disclosures and Its Impact on Abnormal Returns : Abstract: Understanding sentiment in financial documents is crucial for gaining insights into market behavior. These reports often contain obfuscated language designed to present a positive or neutral...
Applying Large Language Models to Characterize Public Narratives : Abstract: Public Narratives (PNs) are key tools for leadership development and civic mobilization, yet their systematic analysis remains challenging due to their subjective interpretation and the high...
Toward Conversational Hungarian Speech Recognition: Introducing the BEA-Large and BEA-Dialogue Datasets : Abstract: The advancement of automatic speech recognition (ASR) has been largely enhanced by extensive datasets in high-resource languages, while languages such as Hungarian remain underrepresented du...
Beyond SELECT: A Comprehensive Taxonomy-Guided Benchmark for Real-World Text-to-SQL Translation : Abstract: Text-to-SQL datasets are essential for training and evaluating text-to-SQL models, but existing datasets often suffer from limited coverage and fail to capture the diversity of real-world ap...
Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents : Abstract: Recent advancements in LLM-powered agents have demonstrated significant potential in generating human-like responses; however, they continue to face challenges in maintaining long-term inter...
Crossing Borders: A Multimodal Challenge for Indian Poetry Translation and Image Generation : Abstract: Indian poetry, known for its linguistic complexity and deep cultural resonance, has a rich and varied heritage spanning thousands of years. However, its layered meanings, cultural allusions,...
LLM-Generated Negative News Headlines Dataset: Creation and Benchmarking Against Real Journalism : Abstract: This research examines the potential of datasets generated by Large Language Models (LLMs) to support Natural Language Processing (NLP) tasks, aiming to overcome challenges related to data a...
CLINB: A Climate Intelligence Benchmark for Foundational Models : Abstract: Evaluating how Large Language Models (LLMs) handle complex, specialized knowledge remains a critical challenge. We address this through the lens of climate change by introducing CLINB, a ben...
SynBullying: A Multi LLM Synthetic Conversational Dataset for Cyberbullying Detectio : Abstract: We introduce SynBullying, a synthetic multi-LLM conversational dataset for studying and detecting cyberbullying (CB). SynBullying provides a scalable and ethically safe alternative to human ...
EduAgentQG: A Multi-Agent Workflow Framework for Personalized Question Generation : Abstract: High-quality personalized question banks are crucial for supporting adaptive learning and individualized assessment. Manually designing questions is time-consuming and often fails to meet di...
Automatic generation of DRI Statements : Abstract: Assessing the quality of group deliberation is essential for improving our understanding of deliberative processes. The Deliberative Reason Index (DRI) offers a sophisticated metric for eval...
Generative AI as a Linguistic Equalizer in Global Science : Abstract: For decades, the dominance of English has created a substantial barrier in global science, disadvantaging non-native speakers. The recent rise of generative AI (GenAI) offers a potential tec...
Do LLMs Really Struggle at NL-FOL Translation? Revealing their Strengths via a Novel Benchmarking Strategy : Abstract: Due to its expressiveness and unambiguous nature, First-Order Logic (FOL) is a powerful formalism for representing concepts expressed in natural language (NL). This is useful, e.g., for spec...
Leveraging Large Language Models for Career Mobility Analysis: A Study of Gender, Race, and Job Change Using U.S. Online Resume Profiles : Abstract: We present a large-scale analysis of career mobility of college-educated U.S. workers using online resume profiles to investigate how gender, race, and job change options are associated with...
How Far Do SSL Speech Models Listen for Tone? Temporal Focus of Tone Representation under Low-resource Transfer : Abstract: Lexical tone is central to many languages but remains underexplored in self-supervised learning (SSL) speech models, especially beyond Mandarin. We study four languages with complex and dive...
VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing : Abstract: We introduce VoiceCraft-X, an autoregressive neural codec language model which unifies multilingual speech editing and zero-shot Text-to-Speech (TTS) synthesis across 11 languages: English, ...
DenseAnnotate: Enabling Scalable Dense Caption Collection for Images and 3D Scenes via Spoken Descriptions : Abstract: With the rapid adoption of multimodal large language models (MLLMs) across diverse applications, there is a pressing need for task-centered, high-quality training data. A key limitation of c...
Co-Layout: LLM-driven Co-optimization for Interior Layout : Abstract: We present a novel framework for automated interior design that combines large language models (LLMs) with grid-based integer programming to jointly optimize room layout and furniture placem...
Evolving Prompts for Toxicity Search in Large Language Models : Abstract: Large Language Models remain vulnerable to adversarial prompts that elicit toxic content even after safety alignment. We present ToxSearch, a black-box evolutionary framework that tests mode...
Accepted with Minor Revisions: Value of AI-Assisted Scientific Writing : Abstract: Large Language Models have seen expanding application across domains, yet their effectiveness as assistive tools for scientific writing -- an endeavor requiring precision, multimodal synthes...
A Content-Preserving Secure Linguistic Steganography : Abstract: Existing linguistic steganography methods primarily rely on content transformations to conceal secret messages. However, they often cause subtle yet looking-innocent deviations between norma...
WebCoach: Self-Evolving Web Agents with Cross-Session Memory Guidance : Abstract: Multimodal LLM-powered agents have recently demonstrated impressive capabilities in web navigation, enabling agents to complete complex browsing tasks across diverse domains. However, curren...
PragWorld: A Benchmark Evaluating LLMs' Local World Model under Minimal Linguistic Alterations and Conversational Dynamics : Abstract: Real-world conversations are rich with pragmatic elements, such as entity mentions, references, and implicatures. Understanding such nuances is a requirement for successful natural communica...
Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment : Abstract: Humans display significant uncertainty when confronted with moral dilemmas, yet the extent of such uncertainty in machines and AI agents remains underexplored. Recent studies have confirmed ...
Attention Grounded Enhancement for Visual Document Retrieval : Abstract: Visual document retrieval requires understanding heterogeneous and multi-modal content to satisfy information needs. Recent advances use screenshot-based document encoding with fine-grained ...
ForgeDAN: An Evolutionary Framework for Jailbreaking Aligned Large Language Models : Abstract: The rapid adoption of large language models (LLMs) has brought both transformative applications and new security risks, including jailbreak attacks that bypass alignment safeguards to elicit...
Historical/temporal necessities/possibilities, and a logical theory of them in branching time : Abstract: In this paper, we do three kinds of work. First, we recognize four notions of necessity and two notions of possibility related to time flow, namely strong/weak historical/temporal necessitie...
Simultaneous Machine Translation with Large Language Models : Abstract: Real-world simultaneous machine translation (SimulMT) systems face more challenges than just the quality-latency trade-off. They also need to address issues related to robustness with noisy ...
Vashantor: A Large-scale Multilingual Benchmark Dataset for Automated Translation of Bangla Regional Dialects to Bangla Language : Abstract: The Bangla linguistic variety is a fascinating mix of regional dialects that contributes to the cultural diversity of the Bangla-speaking community. Despite extensive study into translating ...
Conversational SimulMT: Efficient Simultaneous Translation with Large Language Models : Abstract: Simultaneous machine translation (SimulMT) presents a challenging trade-off between translation quality and latency. Recent studies have shown that LLMs can achieve good performance in Simul...
ProFuser: Progressive Fusion of Large Language Models : Abstract: While fusing the capacities and advantages of various large language models offers a pathway to construct more powerful and versatile models, a fundamental challenge is to properly select ad...
Contextual Breach: Assessing the Robustness of Transformer-based QA Models : Abstract: Contextual question-answering models are susceptible to adversarial perturbations to input context, commonly observed in real-world scenarios. These adversarial noises are designed to degrad...
Is deeper always better? Replacing linear mappings with deep learning networks in the Discriminative Lexicon Model : Abstract: Recently, deep learning models have increasingly been used in cognitive modelling of language. This study asks whether deep learning can help us to better understand the learning problem tha...
Uncovering Factor Level Preferences to Improve Human-Model Alignment : Abstract: Large language models (LLMs) often exhibit tendencies that diverge from human preferences, such as favoring certain writing styles or producing overly verbose outputs. While crucial for impr...
Is Our Chatbot Telling Lies? Assessing Correctness of an LLM-based Dutch Support Chatbot : Abstract: Companies support their customers using live chats and chatbots to gain their loyalty. AFAS is a Dutch company aiming to leverage the opportunity large language models (LLMs) offer to answer...
DeepMIDE: A Multi-Output Spatio-Temporal Method for Ultra-Scale Offshore Wind Energy Forecasting : Abstract: To unlock access to stronger winds, the offshore wind industry is advancing towards significantly larger and taller wind turbines. This massive upscaling motivates a departure from wind fore...
EXAGREE: Mitigating Explanation Disagreement with Stakeholder-Aligned Models : Abstract: Conflicting explanations, arising from different attribution methods or model internals, limit the adoption of machine learning models in safety-critical domains. We turn this disagreement i...
Fair In-Context Learning via Latent Concept Variables : Abstract: The emerging in-context learning (ICL) ability of large language models (LLMs) has prompted their use for predictive tasks in various domains with different data types, including tabular dat...
Competence-Aware AI Agents with Metacognition for Unknown Situations and Environments (MUSE) : Abstract: Metacognition, defined as the awareness and regulation of one's cognitive processes, is central to human adaptability in unknown situations. In contrast, current autonomous agents often stru...
Toward Explainable Offline RL: Analyzing Representations in Intrinsically Motivated Decision Transformers : Abstract: Elastic Decision Transformers (EDTs) have proved to be particularly successful in offline reinforcement learning, offering a flexible framework that unifies sequence modeling with decision-m...
Neutron Reflectometry by Gradient Descent : Abstract: Neutron reflectometry (NR) is a powerful technique to probe surfaces and interfaces. NR is inherently an indirect measurement technique, access to the physical quantities of interest (layer ...
A Comparative Benchmark of Federated Learning Strategies for Mortality Prediction on Heterogeneous and Imbalanced Clinical Data : Abstract: Machine learning models hold significant potential for predicting in-hospital mortality, yet data privacy constraints and the statistical heterogeneity of real-world clinical data often hamp...
Learning at the Speed of Physics: Equilibrium Propagation on Oscillator Ising Machines : Abstract: Physical systems that naturally perform energy descent offer a direct route to accelerating machine learning. Oscillator Ising Machines (OIMs) exemplify this idea: their GHz-frequency dynami...
Using Self-Supervised Auxiliary Tasks to Improve Fine-Grained Facial Representation : Abstract: Facial emotion recognition (FER) is a fine-grained problem where the value of transfer learning is often assumed. We first quantify this assumption and show that, on AffectNet, training from...
FinGPT: Open-Source Financial Large Language Models : Abstract: Large language models (LLMs) have shown the potential of revolutionizing natural language processing tasks in diverse domains, sparking great interest in finance. Accessing high-quality fina...
Foundations of Structural Causal Models with Latent Selection : Abstract: Three distinct phenomena complicate statistical causal analysis: latent common causes, causal cycles, and latent selection. Foundational works on Structural Causal Models (SCMs), e.g., Bonge...
A comprehensive and easy-to-use multi-domain multi-task medical imaging meta-dataset : Abstract: While the field of medical image analysis has undergone a transformative shift with the integration of machine learning techniques, the main challenge of these techniques is often the scarci...
Architectures and random properties of symplectic quantum circuits : Abstract: Parametrized and random unitary (or orthogonal) $n$-qubit circuits play a central role in quantum information. As such, one could naturally assume that circuits implementing symplectic trans...
Learning Optimal Distributionally Robust Stochastic Control in Continuous State Spaces : Abstract: We study data-driven learning of robust stochastic control for infinite-horizon systems with potentially continuous state and action spaces. In many managerial settings--supply chains, finan...
Emulation with uncertainty quantification of regional sea-level change caused by the Antarctic Ice Sheet : Abstract: Projecting sea-level change in various climate-change scenarios typically involves running forward simulations of the Earth's gravitational, rotational and deformational (GRD) response to ic...
MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents : Abstract: Autonomous machine learning research has gained significant attention recently. We present MLR-COPILOT, an autonomous Machine Learning Research framework powered by large language model agen...
On the Limitations of Language Targeted Pruning: Investigating the Calibration Language Impact in Multilingual LLM Pruning : Abstract: Recent advances in large language model (LLM) pruning have shown state-of-the-art (SotA) compression results in post-training and retraining-free settings while maintaining high predictive p...
Identify As A Human Does: A Pathfinder of Next-Generation Anti-Cheat Framework for First-Person Shooter Games : Abstract: The gaming industry has experienced substantial growth, but cheating in online games poses a significant threat to the integrity of the gaming experience. Cheating, particularly in first-per...
A Framework for Real-Time Volcano-Seismic Event Recognition Based on Multi-Station Seismograms and Semantic Segmentation Models : Abstract: In volcano monitoring, effective recognition of seismic events is essential for understanding volcanic activity and raising timely warning alerts. Traditional methods rely on manual analysis...
Time-Series-Informed Closed-loop Learning for Sequential Decision Making and Control : Abstract: Closed-loop performance of sequential decision making algorithms, such as model predictive control, depends strongly on the choice of controller parameters. Bayesian optimization allows lear...
NoLBERT: A No Lookahead(back) Foundational Language Model : Abstract: We present NoLBERT, a lightweight, timestamped foundational language model for empirical research -- particularly for forecasting in economics, finance, and the social sciences. By pretraini...
Evaluating Multiple Instance Learning Strategies for Automated Sebocyte Droplet Counting : Abstract: Sebocytes are lipid-secreting cells whose differentiation is marked by the accumulation of intracellular lipid droplets, making their quantification a key readout in sebocyte biology. Manual...
qc-kmeans: A Quantum Compressive K-Means Algorithm for NISQ Devices : Abstract: Clustering on NISQ hardware is constrained by data loading and limited qubits. We present \textbf{qc-kmeans}, a hybrid compressive $k$-means that summarizes a dataset with a constant-size Fo...
TimeStampEval: A Simple LLM Eval and a Little Fuzzy Matching Trick to Improve Search Accuracy : Abstract: Traditional fuzzy matching often fails when searching for quotes that are semantically identical but syntactically different across documents-a common issue when aligning official written re...
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling : Abstract: We present MiroThinker v1.0, an open-source research agent designed to advance tool-augmented reasoning and information-seeking capabilities. Unlike previous agents that only scale up model ...
On the Notion that Language Models Reason : Abstract: Language models (LMs) are said to be exhibiting reasoning, but what does this entail? We assess definitions of reasoning and how key papers in the field of natural language processing (NLP) ...
Scaling Open-Weight Large Language Models for Hydropower Regulatory Information Extraction: A Systematic Analysis : Abstract: Information extraction from regulatory documents using large language models presents critical trade-offs between performance and computational resources. We evaluated seven open-weight mode...
Towards Autoformalization of LLM-generated Outputs for Requirement Verification : Abstract: Autoformalization, the process of translating informal statements into formal logic, has gained renewed interest with the emergence of powerful Large Language Models (LLMs). While LLMs show ...
Three Stage Narrative Analysis; Plot-Sentiment Breakdown, Structure Learning and Concept Detection : Abstract: Story understanding and analysis have long been challenging areas within Natural Language Understanding. Automated narrative analysis requires deep computational semantic representations alo...
Identifying Imaging Follow-Up in Radiology Reports: A Comparative Analysis of Traditional ML and LLM Approaches : Abstract: Large language models (LLMs) have shown considerable promise in clinical natural language processing, yet few domain-specific datasets exist to rigorously evaluate their performance on radio...
MedPT: A Massive Medical Question Answering Dataset for Brazilian-Portuguese Speakers : Abstract: While large language models (LLMs) show transformative potential in healthcare, their development remains focused on high-resource languages, creating a critical barrier for others as simple...
Context-Emotion Aware Therapeutic Dialogue Generation: A Multi-component Reinforcement Learning Approach to Language Models for Mental Health Support : Abstract: Mental health illness represents a substantial global socioeconomic burden, with COVID-19 further exacerbating accessibility challenges and driving increased demand for telehealth mental hea...
A Reasoning Paradigm for Named Entity Recognition : Abstract: Generative LLMs typically improve Named Entity Recognition (NER) performance through instruction tuning. They excel at generating entities by semantic pattern matching but lack an explicit, ...
Ground Plane Projection for Improved Traffic Analytics at Intersections : Abstract: Accurate turning movement counts at intersections are important for signal control, traffic management and urban planning. Computer vision systems for automatic turning movement counts typic...
CLAReSNet: When Convolution Meets Latent Attention for Hyperspectral Image Classification : Abstract: Hyperspectral image (HSI) classification faces critical challenges, including high spectral dimensionality, complex spectral-spatial correlations, and limited training samples with severe cl...
More Than Irrational: Modeling Belief-Biased Agents : Abstract: Despite the explosive growth of AI and the technologies built upon it, predicting and inferring the sub-optimal behavior of users or human collaborators remains a critical challenge. In many...
AGGRNet: Selective Feature Extraction and Aggregation for Enhanced Medical Image Classification : Abstract: Medical image analysis for complex tasks such as severity grading and disease subtype classification poses significant challenges due to intricate and similar visual patterns among classes, ...
Multi-Domain EEG Representation Learning with Orthogonal Mapping and Attention-based Fusion for Cognitive Load Classification : Abstract: We propose a new representation learning solution for the classification of cognitive load based on Electroencephalogram (EEG). Our method integrates both time and frequency domains by first...
Stochastic Predictive Analytics for Stocks in the Newsvendor Problem : Abstract: This work addresses a key challenge in inventory management by developing a stochastic model that describes the dynamic distribution of inventory stock over time without assuming a specific ...
From Black Box to Bijection: Interpreting Machine Learning to Build a Zeta Map Algorithm : Abstract: There is a large class of problems in algebraic combinatorics which can be distilled into the same challenge: construct an explicit combinatorial bijection. Traditionally, researchers have s...
GRAPHTEXTACK: A Realistic Black-Box Node Injection Attack on LLM-Enhanced GNNs : Abstract: Text-attributed graphs (TAGs), which combine structural and textual node information, are ubiquitous across many domains. Recent work integrates Large Language Models (LLMs) with Graph Neura...
Real-Time Drivers' Drowsiness Detection and Analysis through Deep Learning : Abstract: A long road trip is fun for drivers. However, a long drive for days can be tedious for a driver to accommodate stringent deadlines to reach distant destinations. Such a scenario forces drive...
MOON2.0: Dynamic Modality-balanced Multimodal Representation Learning for E-commerce Product Understanding : Abstract: The rapid growth of e-commerce calls for multimodal models that comprehend rich visual and textual product information. Although recent multimodal large language models (MLLMs) for product u...
A Multicollinearity-Aware Signal-Processing Framework for Cross-$\beta$ Identification via X-ray Scattering of Alzheimer's Tissue : Abstract: X-ray scattering measurements of in situ human brain tissue encode structural signatures of pathological cross-$β$ inclusions, yet systematic exploitation of these data for automated detecti...
Discovering autonomous quantum error correction via deep reinforcement learning : Abstract: Quantum error correction is essential for fault-tolerant quantum computing. However, standard methods relying on active measurements may introduce additional errors. Autonomous quantum error...
Iris: First-Class Multi-GPU Programming Experience in Triton : Abstract: Multi-GPU programming traditionally requires developers to navigate complex trade-offs between performance and programmability. High-performance implementations typically rely on low-level H...
DINO-Detect: A Simple yet Effective Framework for Blur-Robust AI-Generated Image Detection : Abstract: With growing concerns over image authenticity and digital safety, the field of AI-generated image (AIGI) detection has progressed rapidly. Yet, most AIGI detectors still struggle under real-...
FERMI-ML: A Flexible and Resource-Efficient Memory-In-Situ SRAM Macro for TinyML acceleration : Abstract: The growing demand for low-power and area-efficient TinyML inference on AIoT devices necessitates memory architectures that minimise data movement while sustaining high computational efficie...
DLMMPR:Deep Learning-based Measurement Matrix for Phase Retrieval : Abstract: This paper pioneers the integration of learning optimization into measurement matrix design for phase retrieval. We introduce the Deep Learning-based Measurement Matrix for Phase Retrieval (...
Group-Aware Reinforcement Learning for Output Diversity in Large Language Models : Abstract: Large Language Models (LLMs) often suffer from mode collapse, repeatedly generating the same few completions even when many valid answers exist, limiting their diversity across a wide range ...
OPFormer: Object Pose Estimation leveraging foundation model with geometric encoding : Abstract: We introduce a unified, end-to-end framework that seamlessly integrates object detection and pose estimation with a versatile onboarding process. Our pipeline begins with an onboarding stage...
LLM4SCREENLIT: Recommendations on Assessing the Performance of Large Language Models for Screening Literature in Systematic Reviews : Abstract: Context: Large language models (LLMs) are released faster than users' ability to evaluate them rigorously. When LLMs underpin research, such as identifying relevant literature for systematic...
Auto-encoder model for faster generation of effective one-body gravitational waveform approximations : Abstract: Upgrades to current gravitational wave detectors for the next observation run and upcoming third-generation observatories, like the Einstein telescope, are expected to have enormous improvem...
Adaptive Dual-Layer Web Application Firewall (ADL-WAF) Leveraging Machine Learning for Enhanced Anomaly and Threat Detection : Abstract: Web Application Firewalls are crucial for protecting web applications against a wide range of cyber threats. Traditional Web Application Firewalls often struggle to effectively distinguish b...
Scalable Hierarchical AI-Blockchain Framework for Real-Time Anomaly Detection in Large-Scale Autonomous Vehicle Networks : Abstract: The security of autonomous vehicle networks is facing major challenges, owing to the complexity of sensor integration, real-time performance demands, and distributed communication protocols ...
AI Bill of Materials and Beyond: Systematizing Security Assurance through the AI Risk Scanning (AIRS) Framework : Abstract: Assurance for artificial intelligence (AI) systems remains fragmented across software supply-chain security, adversarial machine learning, and governance documentation. Existing transparency...
Accelerated Distributional Temporal Difference Learning with Linear Function Approximation : Abstract: In this paper, we study the finite-sample statistical rates of distributional temporal difference (TD) learning with linear function approximation. The purpose of distributional TD learning ...
Improving Direct Persian-English Speech-to-Speech Translation with Discrete Units and Synthetic Parallel Data : Abstract: Direct speech-to-speech translation (S2ST), in which all components are trained jointly, is an attractive alternative to cascaded systems because it offers a simpler pipeline and lower infer...
X-VMamba: Explainable Vision Mamba : Abstract: State Space Models (SSMs), particularly the Mamba architecture, have recently emerged as powerful alternatives to Transformers for sequence modeling, offering linear computational complexity...
An Evaluation Framework for Network IDS/IPS Datasets: Leveraging MITRE ATT&CK and Industry Relevance Metrics : Abstract: The performance of Machine Learning (ML) and Deep Learning (DL)-based Intrusion Detection and Prevention Systems (IDS/IPS) is critically dependent on the relevance and quality of the dataset...
TSB-HB: A Hierarchical Bayesian Extension of the TSB Model for Intermittent Demand Forecasting : Abstract: Intermittent demand forecasting poses unique challenges due to sparse observations, cold-start items, and obsolescence. Classical models such as Croston, SBA, and the Teunter-Syntetos-Babai ...
Adaptively Coordinating with Novel Partners via Learned Latent Strategies : Abstract: Adaptation is the cornerstone of effective collaboration among heterogeneous team members. In human-agent teams, artificial agents need to adapt to their human partners in real time, as indi...
Prompt-Driven Domain Adaptation for End-to-End Autonomous Driving via In-Context RL : Abstract: Despite significant progress and advances in autonomous driving, many end-to-end systems still struggle with domain adaptation (DA), such as transferring a policy trained under clear weather...
RoCoISLR: A Romanian Corpus for Isolated Sign Language Recognition : Abstract: Automatic sign language recognition plays a crucial role in bridging the communication gap between deaf communities and hearing individuals; however, most available datasets focus on America...
Event-CausNet: Unlocking Causal Knowledge from Text with Large Language Models for Reliable Spatio-Temporal Forecasting : Abstract: While spatio-temporal Graph Neural Networks (GNNs) excel at modeling recurring traffic patterns, their reliability plummets during non-recurring events like accidents. This failure occurs be...
Function-on-Function Bayesian Optimization : Abstract: Bayesian optimization (BO) has been widely used to optimize expensive and gradient-free objective functions across various domains. However, existing BO methods have not addressed the object...
Neuro-Logic Lifelong Learning : Abstract: Solving Inductive Logic Programming (ILP) problems with neural networks is a key challenge in Neural-Symbolic Ar- tificial Intelligence (AI). While most research has focused on designing nov...
Practical Causal Evaluation Metrics for Biological Networks : Abstract: Estimating causal networks from biological data is a critical step in systems biology. When evaluating the inferred network, assessing the networks based on their intervention effects is par...
Enhancing LLM Code Generation Capabilities through Test-Driven Development and Code Interpreter : Abstract: Over the past few years, improving LLM code generation capabilities has been a key focus in NLP research. Despite Bengali having 242 million native speakers worldwide, it receives little att...
Efficient Adversarial Malware Defense via Trust-Based Raw Override and Confidence-Adaptive Bit-Depth Reduction : Abstract: The deployment of robust malware detection systems in big data environments requires careful consideration of both security effectiveness and computational efficiency. While recent advances ...
DIGing--SGLD: Decentralized and Scalable Langevin Sampling over Time--Varying Networks : Abstract: Sampling from a target distribution induced by training data is central to Bayesian learning, with Stochastic Gradient Langevin Dynamics (SGLD) serving as a key tool for scalable posterior s...
Benign Overfitting in Linear Classifiers with a Bias Term : Abstract: Modern machine learning models with a large number of parameters often generalize well despite perfectly interpolating noisy training data - a phenomenon known as benign overfitting. A found...
Scalable learning of macroscopic stochastic dynamics : Abstract: Macroscopic dynamical descriptions of complex physical systems are crucial for understanding and controlling material behavior. With the growing availability of data and compute, machine lea...
Mapping fNIRS Signals to Agent Performance: Toward Reinforcement Learning from Neural Feedback : Abstract: Reinforcement Learning from Human Feedback (RLHF) is a methodology that aligns agent behavior with human preferences by integrating human feedback into the agent's training process. We intro...
Structured Imitation Learning of Interactive Policies through Inverse Games : Abstract: Generative model-based imitation learning methods have recently achieved strong results in learning high-complexity motor skills from human demonstrations. However, imitation learning of int...
Bootstrapping LLMs via Preference-Based Policy Optimization : Abstract: Bootstrapping large language models (LLMs) through preference-based policy optimization offers a promising direction for aligning model behavior with human preferences without relying on ext...
Classification of Hope in Textual Data using Transformer-Based Models : Abstract: This paper presents a transformer-based approach for classifying hope expressions in text. We developed and compared three architectures (BERT, GPT-2, and DeBERTa) for both binary classifica...
Tokenize Once, Recommend Anywhere: Unified Item Tokenization for Multi-domain LLM-based Recommendation : Abstract: Large language model (LLM)-based recommender systems have achieved high-quality performance by bridging the discrepancy between the item space and the language space through item tokenizatio...
MCAQ-YOLO: Morphological Complexity-Aware Quantization for Efficient Object Detection with Curriculum Learning : Abstract: Most neural network quantization methods apply uniform bit precision across spatial regions, ignoring the heterogeneous structural and textural complexity of visual data. This paper introduc...
Revealing the dynamic responses of Pb under shock loading based on DFT-accuracy machine learning potential : Abstract: Lead (Pb) is a typical low-melting-point ductile metal and serves as an important model material in the study of dynamic responses. Under shock-wave loading, its dynamic mechanical behavior ...
GEM: Generative Entropy-Guided Preference Modeling for Few-shot Alignment of LLMs : Abstract: Alignment of large language models (LLMs) with human preferences typically relies on supervised reward models or external judges that demand abundant annotations. However, in fields that rel...
MeanFlow Transformers with Representation Autoencoders : Abstract: MeanFlow (MF) is a diffusion-motivated generative model that enables efficient few-step generation by learning long jumps directly from noise to data. In practice, it is often used as a late...
Reconstruction of Manifold Distances from Noisy Observations : Abstract: We consider the problem of reconstructing the intrinsic geometry of a manifold from noisy pairwise distance observations. Specifically, let $M$ denote a diameter 1 d-dimensional manifold and...
Orientation-Free Neural Network-Based Bias Estimation for Low-Cost Stationary Accelerometers : Abstract: Low-cost micro-electromechanical accelerometers are widely used in navigation, robotics, and consumer devices for motion sensing and position estimation. However, their performance is often ...
Rethinking Saliency Maps: A Cognitive Human Aligned Taxonomy and Evaluation Framework for Explanations : Abstract: Saliency maps are widely used for visual explanations in deep learning, but a fundamental lack of consensus persists regarding their intended purpose and alignment with diverse user queries....
STEP: Success-Rate-Aware Trajectory-Efficient Policy Optimization : Abstract: Multi-turn interaction remains challenging for online reinforcement learning. A common solution is trajectory-level optimization, which treats each trajectory as a single training sample. Ho...
NuBench: An Open Benchmark for Deep Learning-Based Event Reconstruction in Neutrino Telescopes : Abstract: Neutrino telescopes are large-scale detectors designed to observe Cherenkov radiation produced from neutrino interactions in water or ice. They exist to identify extraterrestrial neutrino so...
Region-Point Joint Representation for Effective Trajectory Similarity Learning : Abstract: Recent learning-based methods have reduced the computational complexity of traditional trajectory similarity computation, but state-of-the-art (SOTA) methods still fail to leverage the compr...
InteractiveGNNExplainer: A Visual Analytics Framework for Multi-Faceted Understanding and Probing of Graph Neural Network Predictions : Abstract: Graph Neural Networks (GNNs) excel in graph-based learning tasks, but their complex, non-linear operations often render them as opaque "black boxes". This opacity hinders user trust, complic...
Learning to Solve Resource-Constrained Project Scheduling Problems with Duration Uncertainty using Graph Neural Networks : Abstract: The Resource-Constrained Project Scheduling Problem (RCPSP) is a classical scheduling problem that has received significant attention due to of its numerous applications in industry. However...
Likelihood-guided Regularization in Attention Based Models : Abstract: The transformer architecture has demonstrated strong performance in classification tasks involving structured and high-dimensional data. However, its success often hinges on large- scale tra...
Case study of a differentiable heterogeneous multiphysics solver for a nuclear fusion application : Abstract: This work presents a case study of a heterogeneous multiphysics solver from the nuclear fusion domain. At the macroscopic scale, an auto-differentiable ODE solver in JAX computes the evoluti...
Causal Inference, Biomarker Discovery, Graph Neural Network, Feature Selection : Abstract: Biomarker discovery from high-throughput transcriptomic data is crucial for advancing precision medicine. However, existing methods often neglect gene-gene regulatory relationships and lack ...
EL3DD: Extended Latent 3D Diffusion for Language Conditioned Multitask Manipulation : Abstract: Acting in human environments is a crucial capability for general-purpose robots, necessitating a robust understanding of natural language and its application to physical tasks. This paper se...
AutoMalDesc: Large-Scale Script Analysis for Cyber Threat Research : Abstract: Generating thorough natural language explanations for threat detections remains an open problem in cybersecurity research, despite significant advances in automated malware detection systems...
Moving Pictures of Thought: Extracting Visual Knowledge in Charles S. Peirce's Manuscripts with Vision-Language Models : Abstract: Diagrams are crucial yet underexplored tools in many disciplines, demonstrating the close connection between visual representation and scholarly reasoning. However, their iconic form poses o...
Uncovering Causal Drivers of Energy Efficiency for Industrial Process in Foundry via Time-Series Causal Inference : Abstract: Improving energy efficiency in industrial foundry processes is a critical challenge, as these operations are highly energy-intensive and marked by complex interdependencies among process var...
Taming Barren Plateaus in Arbitrary Parameterized Quantum Circuits Without Sacrificing Expressibility : Abstract: Quantum algorithms based on parameterized quantum circuits (PQCs) have enabled a wide range of applications on near-term quantum devices. However, existing PQC architectures face several cha...
Exploring Multi-Table Retrieval Through Iterative Search : Abstract: Open-domain question answering over datalakes requires retrieving and composing information from multiple tables, a challenging subtask that demands semantic relevance and structural coheren...
Semantic Document Derendering: SVG Reconstruction via Vision-Language Modeling : Abstract: Multimedia documents such as slide presentations and posters are designed to be interactive and easy to modify. Yet, they are often distributed in a static raster format, which limits editin...
Systematic evaluation of time-frequency features for binaural sound source localization : Abstract: This study presents a systematic evaluation of time-frequency feature design for binaural sound source localization (SSL), focusing on how feature selection influences model performance acro...
The Shape of Data: Topology Meets Analytics. A Practical Introduction to Topological Analytics and the Stability Index (TSI) in Business : Abstract: Modern business and economic datasets often exhibit nonlinear, multi-scale structures that traditional linear tools under-represent. Topological Data Analysis (TDA) offers a geometric lens f...
AI Fairness Beyond Complete Demographics: Current Achievements and Future Directions : Abstract: Fairness in artificial intelligence (AI) has become a growing concern due to discriminatory outcomes in AI-based decision-making systems. While various methods have been proposed to mitigate...
BootOOD: Self-Supervised Out-of-Distribution Detection via Synthetic Sample Exposure under Neural Collapse : Abstract: Out-of-distribution (OOD) detection is critical for deploying image classifiers in safety-sensitive environments, yet existing detectors often struggle when OOD samples are semantically simi...
Power Homotopy for Zeroth-Order Non-Convex Optimizations : Abstract: We introduce GS-PowerHP, a novel zeroth-order method for non-convex optimization problems of the form $\max_{x \in \mathbb{R}^d} f(x)$. Our approach leverages two key components: a power-tra...
A Gentle Introduction to Conformal Time Series Forecasting : Abstract: Conformal prediction is a powerful post-hoc framework for uncertainty quantification that provides distribution-free coverage guarantees. However, these guarantees crucially rely on the assu...
AtlasMorph: Learning conditional deformable templates for brain MRI : Abstract: Deformable templates, or atlases, are images that represent a prototypical anatomy for a population, and are often enhanced with probabilistic anatomical label maps. They are commonly used i...
Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly? : Abstract: Large Language Models (LLMs) are reshaping almost all industries, including software engineering. In recent years, a number of LLM agents have been proposed to solve real-world software prob...
OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation : Abstract: Earth observation data presents a unique challenge: it is spatial like images, sequential like video or text, and highly multimodal. We present OlmoEarth: a multimodal, spatio-temporal found...
Why is "Chicago" Predictive of Deceptive Reviews? Using LLMs to Discover Language Phenomena from Lexical Cues : Abstract: Deceptive reviews mislead consumers, harm businesses, and undermine trust in online marketplaces. Machine learning classifiers can learn from large amounts of training examples to effectivel...
Cost-Driven Synthesis of Sound Abstract Interpreters : Abstract: Constructing abstract interpreters that provide global soundness guarantees remains a major obstacle in abstract interpretation. We investigate whether modern LLMs can reduce this burden by ...
T-SAR: A Full-Stack Co-design for CPU-Only Ternary LLM Inference via In-Place SIMD ALU Reorganization : Abstract: Recent advances in LLMs have outpaced the computational and memory capacities of edge platforms that primarily employ CPUs, thereby challenging efficient and scalable deployment. While terna...
QUILL: An Algorithm-Architecture Co-Design for Cache-Local Deformable Attention : Abstract: Deformable transformers deliver state-of-the-art detection but map poorly to hardware due to irregular memory access and low arithmetic intensity. We introduce QUILL, a schedule-aware accele...
Training-Free Multi-View Extension of IC-Light for Textual Position-Aware Scene Relighting : Abstract: We introduce GS-Light, an efficient, textual position-aware pipeline for text-guided relighting of 3D scenes represented via Gaussian Splatting (3DGS). GS-Light implements a training-free ex...
Generalist Foundation Models Are Not Clinical Enough for Hospital Operations : Abstract: Hospitals and healthcare systems rely on operational decisions that determine patient flow, cost, and quality of care. Despite strong performance on medical knowledge and conversational benc...
From Power to Precision: Learning Fine-grained Dexterity for Multi-fingered Robotic Hands : Abstract: Human grasps can be roughly categorized into two types: power grasps and precision grasps. Precision grasping enables tool use and is believed to have influenced human evolution. Today's mul...
UnSAMv2: Self-Supervised Learning Enables Segment Anything at Any Granularity : Abstract: The Segment Anything Model (SAM) family has become a widely adopted vision foundation model, but its ability to control segmentation granularity remains limited. Users often need to refine r...
Scaling Spatial Intelligence with Multimodal Foundation Models : Abstract: Despite remarkable progress, multimodal foundation models still exhibit surprising deficiencies in spatial intelligence. In this work, we explore scaling up multimodal foundation models to c...
Loss Patterns of Neural Networks : Abstract: We present multi-point optimization: an optimization technique that allows to train several models simultaneously without the need to keep the parameters of each one individually. The propos...
Achieving Fairness with a Simple Ridge Penalty : Abstract: In this paper we present a general framework for estimating regression models subject to a user-defined level of fairness. We enforce fairness as a model selection step in which we choose th...
State-Space Constraints Can Improve the Generalisation of the Differentiable Neural Computer to Input Sequences With Unseen Length : Abstract: Memory-augmented neural networks (MANNs) can perform algorithmic tasks such as sorting. However, they often fail to generalise to input sequence lengths not encountered during training. We i...
Beyond Statistical Similarity: Rethinking Metrics for Deep Generative Models in Engineering Design : Abstract: Deep generative models such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Diffusion Models, and Transformers, have shown great promise in a variety of applicati...
CG-FedLLM: How to Compress Gradients in Federated Fune-tuning for Large Language Models : Abstract: The success of current Large-Language Models (LLMs) hinges on extensive training data that is collected and stored centrally, called Centralized Learning (CL). However, such a collection man...
GLANCE: Global Actions in a Nutshell for Counterfactual Explainability : Abstract: The widespread deployment of machine learning systems in critical real-world decision-making applications has highlighted the urgent need for counterfactual explainability methods that opera...
Uncertainty Quantification for Deep Learning : Abstract: We present a critical survey on the consistency of uncertainty quantification used in deep learning and highlight partial uncertainty coverage and many inconsistencies. We then provide a com...
Finite basis Kolmogorov-Arnold networks: domain decomposition for data-driven and physics-informed problems : Abstract: Kolmogorov-Arnold networks (KANs) have attracted attention recently as an alternative to multilayer perceptrons (MLPs) for scientific machine learning. However, KANs can be expensive to trai...
Deep deterministic policy gradient with symmetric data augmentation for lateral attitude tracking control of a fixed-wing aircraft : Abstract: The symmetry of dynamical systems can be exploited for state-transition prediction and to facilitate control policy optimization. This paper leverages system symmetry to develop sample-effic...
Temporal Test-Time Adaptation with State-Space Models : Abstract: Distribution shifts between training and test data are inevitable over the lifecycle of a deployed model, leading to performance decay. Adapting a model on test samples can help mitigate thi...
Efficiently Computing Compact Formal Explanations : Abstract: Building on VeriX (Verified eXplainability, arXiv:2212.01051), a system for producing optimal verified explanations for machine learning models, we present VeriX+, which significantly improv...
Exploiting Missing Data Remediation Strategies using Adversarial Missingness Attacks : Abstract: Adversarial Missingness (AM) attacks aim to manipulate model fitting by carefully engineering a missing data problem to achieve a specific malicious objective. AM attacks are significantly d...
Communication-Efficient Federated Low-Rank Update Algorithm and its Connection to Implicit Regularization : Abstract: Federated Learning (FL) faces significant challenges related to communication efficiency and performance reduction when scaling to many clients. To address these issues, we explore the poten...
Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint? : Abstract: Low-rank training has emerged as a promising approach for reducing memory usage in training Large Language Models (LLMs). Previous methods either rely on decomposing weight matrices (e.g., L...
Explanation-Preserving Augmentation for Semi-Supervised Graph Representation Learning : Abstract: Self-supervised graph representation learning (GRL) typically generates paired graph augmentations from each graph to infer similar representations for augmentations of the same graph, but d...
Finding Kissing Numbers with Game-theoretic Reinforcement Learning : Abstract: Since Isaac Newton first studied the Kissing Number Problem in 1694, determining the maximal number of non-overlapping spheres around a central sphere has remained a fundamental challenge. T...
Fast and Robust Simulation-Based Inference With Optimization Monte Carlo : Abstract: Bayesian parameter inference for complex stochastic simulators is challenging due to intractable likelihood functions. Existing simulation-based inference methods often require large number ...
PAST: A Primary-Auxiliary Spatio-Temporal Network for Traffic Time Series Imputation : Abstract: Traffic time series imputation is crucial for the safety and reliability of intelligent transportation systems, while diverse types of missing data, including random, fiber, and block missin...
MMWSTM-ADRAN+: A Novel Hybrid Deep Learning Architecture for Enhanced Climate Time Series Forecasting and Extreme Event Prediction : Abstract: Accurate short-range prediction of extreme air temperature events remains a fundamental challenge in operational climate-risk management. We present Multi-Modal Weather State Transition Mode...
Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression : Abstract: While data scaling laws of large language models (LLMs) have been widely examined in the one-pass regime with massive corpora, their form under limited data and repeated epochs remains large...
Discovering Operational Patterns Using Image-Based Convolutional Clustering and Composite Evaluation: A Case Study in Foundry Melting Processes : Abstract: Industrial process monitoring increasingly relies on sensor-generated time-series data, yet the lack of labels, high variability, and operational noise make it difficult to extract meaningfu...
Hardware optimization on Android for inference of AI models : Abstract: The pervasive integration of Artificial Intelligence models into contemporary mobile computing is notable across numerous use cases, from virtual assistants to advanced image processing. Opt...
Artificial Intelligence-Enabled Spirometry for Early Detection of Right Heart Failure : Abstract: Right heart failure (RHF) is a disease characterized by abnormalities in the structure or function of the right ventricle (RV), which is associated with high morbidity and mortality. Lung di...
Multi-task GINN-LP for Multi-target Symbolic Regression : Abstract: In the area of explainable artificial intelligence, Symbolic Regression (SR) has emerged as a promising approach by discovering interpretable mathematical expressions that fit data. However,...
AdamX: An Adam improvement algorithm based on a novel exponential decay mechanism for the second-order moment estimate : Abstract: Since the 21st century, artificial intelligence has been leading a new round of industrial revolution. Under the training framework, the optimization algorithm aims to stably converge high-d...
GREAT: Generalizable Representation Enhancement via Auxiliary Transformations for Zero-Shot Environmental Prediction : Abstract: Environmental modeling faces critical challenges in predicting ecosystem dynamics across unmonitored regions due to limited and geographically imbalanced observation data. This challenge is ...
Quantum Machine Learning via Contrastive Training : Abstract: Quantum machine learning (QML) has attracted growing interest with the rapid parallel advances in large-scale classical machine learning and quantum technologies. Similar to classical machin...
Naga: Vedic Encoding for Deep State Space Models : Abstract: This paper presents Naga, a deep State Space Model (SSM) encoding approach inspired by structural concepts from Vedic mathematics. The proposed method introduces a bidirectional representati...
A Quantum Tensor Network-Based Viewpoint for Modeling and Analysis of Time Series Data : Abstract: Accurate uncertainty quantification is a critical challenge in machine learning. While neural networks are highly versatile and capable of learning complex patterns, they often lack interpre...
Mitigating Spurious Correlations in Patch-wise Tumor Classification on High-Resolution Multimodal Images : Abstract: Patch-wise multi-label classification provides an efficient alternative to full pixel-wise segmentation on high-resolution images, particularly when the objective is to determine the presenc...
Fairness-Aware Graph Representation Learning with Limited Demographic Information : Abstract: Ensuring fairness in Graph Neural Networks is fundamental to promoting trustworthy and socially responsible machine learning systems. In response, numerous fair graph learning methods have b...
Graph Out-of-Distribution Detection via Test-Time Calibration with Dual Dynamic Dictionaries : Abstract: A key challenge in graph out-of-distribution (OOD) detection lies in the absence of ground-truth OOD samples during training. Existing methods are typically optimized to capture features wit...
RAC-DMVC: Reliability-Aware Contrastive Deep Multi-View Clustering under Multi-Source Noise : Abstract: Multi-view clustering (MVC), which aims to separate the multi-view data into distinct clusters in an unsupervised manner, is a fundamental yet challenging task. To enhance its applicability ...
P1: Mastering Physics Olympiads with Reinforcement Learning : Abstract: Recent progress in large language models (LLMs) has moved the frontier from puzzle-solving to science-grade reasoning-the kind needed to tackle problems whose answers must stand against natu...
Batch Acquisition Function Evaluations and Decouple Optimizer Updates for Faster Bayesian Optimization : Abstract: Bayesian optimization (BO) efficiently finds high-performing parameters by maximizing an acquisition function, which models the promise of parameters. A major computational bottleneck arises...
Towards Multimodal Representation Learning in Paediatric Kidney Disease : Abstract: Paediatric kidney disease varies widely in its presentation and progression, which calls for continuous monitoring of renal function. Using electronic health records collected between 2019 a...
Data Value in the Age of Scaling: Understanding LLM Scaling Dynamics Under Real-Synthetic Data Mixtures : Abstract: The rapid progress of large language models (LLMs) is fueled by the growing reliance on datasets that blend real and synthetic data. While synthetic data offers scalability and cost-efficien...
FuseSampleAgg: Fused Neighbor Sampling and Aggregation for Mini-batch GNNs : Abstract: We present FuseSampleAgg, a CUDA operator that fuses neighbor sampling and mean aggregation into a single pass for one and two hop GraphSAGE. By eliminating block materialization and extra k...
Weight-sparse transformers have interpretable circuits : Abstract: Finding human-understandable circuits in language models is a central goal of the field of mechanistic interpretability. We train models to have more understandable circuits by constraining ...
Tuning for Two Adversaries: Enhancing the Robustness Against Transfer and Query-Based Attacks using Hyperparameter Tuning : Abstract: In this paper, we present the first detailed analysis of how optimization hyperparameters -- such as learning rate, weight decay, momentum, and batch size -- influence robustness against bot...
Scientific Data Compression and Super-Resolution Sampling : Abstract: Modern scientific simulations, observations, and large-scale experiments generate data at volumes that often exceed the limits of storage, processing, and analysis. This challenge drives the...
Cross-Learning from Scarce Data via Multi-Task Constrained Optimization : Abstract: A learning task, understood as the problem of fitting a parametric model from supervised data, fundamentally requires the dataset to be large enough to be representative of the underlying di...
Protein Secondary Structure Prediction Using 3D Graphs and Relation-Aware Message Passing Transformers : Abstract: In this study, we tackle the challenging task of predicting secondary structures from protein primary sequences, a pivotal initial stride towards predicting tertiary structures, while yieldi...
Efficient Calibration for Decision Making : Abstract: A decision-theoretic characterization of perfect calibration is that an agent seeking to minimize a proper loss in expectation cannot improve their outcome by post-processing a perfectly cal...
Learning stochasticity: a nonparametric framework for intrinsic noise estimation : Abstract: Understanding the principles that govern dynamical systems is a central challenge across many scientific domains, including biology and ecology. Incomplete knowledge of nonlinear interaction...
ST-ProC: A Graph-Prototypical Framework for Robust Semi-Supervised Travel Mode Identification : Abstract: Travel mode identification (TMI) from GPS trajectories is critical for urban intelligence, but is hampered by the high cost of annotation, leading to severe label scarcity. Prevailing semi-s...
Rare Genomic Subtype Discovery from RNA-seq via Autoencoder Embeddings and Stability-Aware Clustering : Abstract: Unsupervised learning on high-dimensional RNA-seq data can reveal molecular subtypes beyond standard labels. We combine an autoencoder-based representation with clustering and stability anal...
From Black Box to Insight: Explainable AI for Extreme Event Preparedness : Abstract: As climate change accelerates the frequency and severity of extreme events such as wildfires, the need for accurate, explainable, and actionable forecasting becomes increasingly urgent. Whil...
Limitations of Quantum Advantage in Unsupervised Machine Learning : Abstract: Machine learning models are used for pattern recognition analysis of big data, without direct human intervention. The task of unsupervised learning is to find the probability distribution th...
LLM Architecture, Scaling Laws, and Economics: A Quick Summary : Abstract: The current standard architecture of Large Language Models (LLMs) with QKV self-attention is briefly summarized, including the architecture of a typical Transformer. Scaling laws for compute...
Social and Physical Attributes-Defined Trust Evaluation for Effective Collaborator Selection in Human-Device Coexistence Systems : Abstract: In human-device coexistence systems, collaborations among devices are determined by not only physical attributes such as network topology but also social attributes among human users. Conseq...
Mind the Gap: Revealing Inconsistencies Across Heterogeneous AI Accelerators : Abstract: While NVIDIA remains the dominant provider of AI accelerators within cloud data center, emerging vendors such as AMD, Intel, Mac, and Huawei offer cost-effective alternatives with claims of ...
Quantifying Skill and Chance: A Unified Framework for the Geometry of Games : Abstract: We introduce a quantitative framework for separating skill and chance in games by modeling them as complementary sources of control over stochastic decision trees. We define the Skill-Luck I...
Physics-Informed Neural Network-based Reliability Analysis of Buried Pipelines : Abstract: Buried pipelines transporting oil and gas across geohazard-prone regions are exposed to potential ground movement, leading to the risk of significant strain demand and structural failure. Re...
Lightweight Hopfield Neural Networks for Bioacoustic Detection and Call Monitoring of Captive Primates : Abstract: Passive acoustic monitoring is a sustainable method of monitoring wildlife and environments that leads to the generation of large datasets and, currently, a processing backlog. Academic rese...
Hierarchical Federated Graph Attention Networks for Scalable and Resilient UAV Collision Avoidance : Abstract: The real-time performance, adversarial resiliency, and privacy preservation are the most important metrics that need to be balanced to practice collision avoidance in large-scale multi-UAV (...
Characterizing and Understanding Energy Footprint and Efficiency of Small Language Model on Edges : Abstract: Cloud-based large language models (LLMs) and their variants have significantly influenced real-world applications. Deploying smaller models (i.e., small language models (SLMs)) on edge devic...
Omics-scale polymer computational database transferable to real-world artificial intelligence applications : Abstract: Developing large-scale foundational datasets is a critical milestone in advancing artificial intelligence (AI)-driven scientific innovation. However, unlike AI-mature fields such as natural ...
Tactile Data Recording System for Clothing with Motion-Controlled Robotic Sliding : Abstract: The tactile sensation of clothing is critical to wearer comfort. To reveal physical properties that make clothing comfortable, systematic collection of tactile data during sliding motion is ...
Exploring Parallelism in FPGA-Based Accelerators for Machine Learning Applications : Abstract: Speculative backpropagation has emerged as a promising technique to accelerate the training of neural networks by overlapping the forward and backward passes. Leveraging speculative weight u...
The Environmental Impact of Ensemble Techniques in Recommender Systems : Abstract: Ensemble techniques in recommender systems have demonstrated accuracy improvements of 10-30%, yet their environmental impact remains unmeasured. While deep learning recommendation algorithms...
GroupRank: A Groupwise Reranking Paradigm Driven by Reinforcement Learning : Abstract: Large Language Models have shown strong potential as rerankers to enhance the overall performance of RAG systems. However, existing reranking paradigms are constrained by a core theoretical ...
A Structure-Agnostic Co-Tuning Framework for LLMs and SLMs in Cloud-Edge Systems : Abstract: The surge in intelligent applications driven by large language models (LLMs) has made it increasingly difficult for bandwidth-limited cloud servers to process extensive LLM workloads in real...
Generalized Inequality-based Approach for Probabilistic WCET Estimation : Abstract: Estimating the probabilistic Worst-Case Execution Time (pWCET) is essential for ensuring the timing correctness of real-time applications, such as in robot IoT systems and autonomous driving...
Value-Aligned Prompt Moderation via Zero-Shot Agentic Rewriting for Safe Image Generation : Abstract: Generative vision-language models like Stable Diffusion demonstrate remarkable capabilities in creative media synthesis, but they also pose substantial risks of producing unsafe, offensive, ...
Harli: Harvest Underutilized Resources in LLM Serving with Finetuning Tasks : Abstract: Large language models (LLMs) are increasingly deployed under the Model-as-a-Service (MaaS) paradigm. To meet stringent quality-of-service (QoS) requirements, existing LLM serving systems dis...
Noise-Aware Optimization in Nominally Identical Manufacturing and Measuring Systems for High-Throughput Parallel Workflows : Abstract: Device-to-device variability in experimental noise critically impacts reproducibility, especially in automated, high-throughput systems like additive manufacturing farms. While manageable in...
Socrates-Mol: Self-Oriented Cognitive Reasoning through Autonomous Trial-and-Error with Empirical-Bayesian Screening for Molecules : Abstract: Molecular property prediction is fundamental to chemical engineering applications such as solvent screening. We present Socrates-Mol, a framework that transforms language models into empiric...
Learning to Refine: An Agentic RL Approach for Iterative SPARQL Query Construction : Abstract: Generating complex, logically-sound SPARQL queries for multi-hop questions remains a critical bottleneck for Knowledge Graph Question Answering, as the brittle nature of one-shot generation ...
On the Measure of a Model: From Intelligence to Generality : Abstract: Benchmarks such as ARC, Raven-inspired tests, and the Blackbird Task are widely used to evaluate the intelligence of large language models (LLMs). Yet, the concept of intelligence remains el...
Towards Mitigating Systematics in Large-Scale Surveys via Few-Shot Optimal Transport-Based Feature Alignment : Abstract: Systematics contaminate observables, leading to distribution shifts relative to theoretically simulated signals-posing a major challenge for using pre-trained models to label such observable...
FreDN: Spectral Disentanglement for Time Series Forecasting via Learnable Frequency Decomposition : Abstract: Time series forecasting is essential in a wide range of real world applications. Recently, frequency-domain methods have attracted increasing interest for their ability to capture global dep...
A Computational Method for Solving the Stochastic Joint Replenishment Problem in High Dimensions : Abstract: We consider a discrete-time formulation for a class of high-dimensional stochastic joint replenishment problems. First, we approximate the problem by a continuous-time impulse control proble...
TopoPerception: A Shortcut-Free Evaluation of Global Visual Perception in Large Vision-Language Models : Abstract: Large Vision-Language Models (LVLMs) typically align visual features from an encoder with a pre-trained Large Language Model (LLM). However, this makes the visual perception module a bottlen...
MP-GFormer: A 3D-Geometry-Aware Dynamic Graph Transformer Approach for Machining Process Planning : Abstract: Machining process planning (MP) is inherently complex due to structural and geometrical dependencies among part features and machining operations. A key challenge lies in capturing dynamic i...
Modeling X-ray photon pile-up with a normalizing flow : Abstract: The dynamic range of imaging detectors flown on-board X-ray observatories often only covers a limited flux range of extrasolar X-ray sources. The analysis of bright X-ray sources is complica...
ClinStructor: AI-Powered Structuring of Unstructured Clinical Texts : Abstract: Clinical notes contain valuable, context-rich information, but their unstructured format introduces several challenges, including unintended biases (e.g., gender or racial bias), and poor ge...
Forgetting-MarI: LLM Unlearning via Marginal Information Regularization : Abstract: As AI models are trained on ever-expanding datasets, the ability to remove the influence of specific data from trained models has become essential for privacy protection and regulatory compl...
Additive Large Language Models for Semi-Structured Text : Abstract: Large Language Models have advanced clinical text classification, but their opaque predictions remain a critical barrier to practical adoption in research and clinical settings where investi...
PCA recovery thresholds in low-rank matrix inference with sparse noise : Abstract: We study the high-dimensional inference of a rank-one signal corrupted by sparse noise. The noise is modelled as the adjacency matrix of a weighted undirected graph with finite average conne...
Enhancing XR Auditory Realism via Multimodal Scene-Aware Acoustic Rendering : Abstract: In Extended Reality (XR), rendering sound that accurately simulates real-world acoustics is pivotal in creating lifelike and believable virtual experiences. However, existing XR spatial audi...
InData: Towards Secure Multi-Step, Tool-Based Data Analysis : Abstract: Large language model agents for data analysis typically generate and execute code directly on databases. However, when applied to sensitive data, this approach poses significant security ris...
A Deep Learning Framework for Thyroid Nodule Segmentation and Malignancy Classification from Ultrasound Images : Abstract: Ultrasound-based risk stratification of thyroid nodules is a critical clinical task, but it suffers from high inter-observer variability. While many deep learning (DL) models function as "bl...
Improving Neutrino Oscillation Measurements through Event Classification : Abstract: Precise neutrino energy reconstruction is essential for next-generation long-baseline oscillation experiments, yet current methods remain limited by large uncertainties in neutrino-nucleus i...
Augmenting The Weather: A Hybrid Counterfactual-SMOTE Algorithm for Improving Crop Growth Prediction When Climate Changes : Abstract: In recent years, humanity has begun to experience the catastrophic effects of climate change as economic sectors (such as agriculture) struggle with unpredictable and extreme weather events....
Improving LLM's Attachment to External Knowledge In Dialogue Generation Tasks Through Entity Anonymization : Abstract: Knowledge graph-based dialogue generation (KG-DG) is a challenging task requiring models to effectively incorporate external knowledge into conversational responses. While large language mod...
Temporal Micro-Doppler Spectrogram-based ViT Multiclass Target Classification : Abstract: In this paper, we propose a new Temporal MDS-Vision Transformer (T-MDS-ViT) for multiclass target classification using millimeter-wave FMCW radar micro-Doppler spectrograms. Specifically, we...
On the Entropy Calibration of Language Models : Abstract: We study the problem of entropy calibration, which asks whether a language model's entropy over generations matches its log loss on human text. Past work found that models are miscalibrated,...
Bayesian--AI Fusion for Epidemiological Decision Making: Calibrated Risk, Honest Uncertainty, and Hyperparameter Intelligence : Abstract: Modern epidemiological analytics increasingly use machine learning models that offer strong prediction but often lack calibrated uncertainty. Bayesian methods provide principled uncertainty ...
Goal-Oriented Multi-Agent Reinforcement Learning for Decentralized Agent Teams : Abstract: Connected and autonomous vehicles across land, water, and air must often operate in dynamic, unpredictable environments with limited communication, no centralized control, and partial observ...
Dynamic Parameter Optimization for Highly Transferable Transformation-Based Attacks : Abstract: Despite their wide application, the vulnerabilities of deep neural networks raise societal concerns. Among them, transformation-based attacks have demonstrated notable success in transfer at...
Uncertainty-Guided Selective Adaptation Enables Cross-Platform Predictive Fluorescence Microscopy : Abstract: Deep learning is transforming microscopy, yet models often fail when applied to images from new instruments or acquisition settings. Conventional adversarial domain adaptation (ADDA) retrain...
Adaptive Diagnostic Reasoning Framework for Pathology with Multimodal Large Language Models : Abstract: AI tools in pathology have improved screening throughput, standardized quantification, and revealed prognostic patterns that inform treatment. However, adoption remains limited because most ...
Enhancing Road Safety Through Multi-Camera Image Segmentation with Post-Encroachment Time Analysis : Abstract: Traffic safety analysis at signalized intersections is vital for reducing vehicle and pedestrian collisions, yet traditional crash-based studies are limited by data sparsity and latency. Thi...
Calibrated Multimodal Representation Learning with Missing Modalities : Abstract: Multimodal representation learning harmonizes distinct modalities by aligning them into a unified latent space. Recent research generalizes traditional cross-modal alignment to produce enhan...
Preference Learning from Physics-Based Feedback: Tuning Language Models to Design BCC/B2 Superalloys : Abstract: We apply preference learning to the task of language model-guided design of novel structural alloys. In contrast to prior work that focuses on generating stable inorganic crystals, our appro...
BackWeak: Backdooring Knowledge Distillation Simply with Weak Triggers and Fine-tuning : Abstract: Knowledge Distillation (KD) is essential for compressing large models, yet relying on pre-trained "teacher" models downloaded from third-party repositories introduces serious security risks ...
Aggregating Conformal Prediction Sets via {\alpha}-Allocation : Abstract: Conformal prediction offers a distribution-free framework for constructing prediction sets with finite-sample coverage. Yet, efficiently leveraging multiple conformity scores to reduce predi...
Informed Bootstrap Augmentation Improves EEG Decoding : Abstract: Electroencephalography (EEG) offers detailed access to neural dynamics but remains constrained by noise and trial-by-trial variability, limiting decoding performance in data-restricted or co...
From Scaling to Structured Expressivity: Rethinking Transformers for CTR Prediction : Abstract: Despite massive investments in scale, deep models for click-through rate (CTR) prediction often exhibit rapidly diminishing returns - a stark contrast to the smooth, predictable gains seen i...
Explainable Transformer-Based Email Phishing Classification with Adversarial Robustness : Abstract: Phishing and related cyber threats are becoming more varied and technologically advanced. Among these, email-based phishing remains the most dominant and persistent threat. These attacks exp...
Decoupled Action Head: Confining Task Knowledge to Conditioning Layers : Abstract: Behavior Cloning (BC) is a data-driven supervised learning approach that has gained increasing attention with the success of scaling laws in language and vision domains. Among its implementa...
TEMPO: Global Temporal Building Density and Height Estimation from Satellite Imagery : Abstract: We present TEMPO, a global, temporally resolved dataset of building density and height derived from high-resolution satellite imagery using deep learning models. We pair building footprint a...
Codebook-Centric Deep Hashing: End-to-End Joint Learning of Semantic Hash Centers and Neural Hash Function : Abstract: Hash center-based deep hashing methods improve upon pairwise or triplet-based approaches by assigning fixed hash centers to each class as learning targets, thereby avoiding the inefficiency ...
Rapid Machine Learning-Driven Detection of Pesticides and Dyes Using Raman Spectroscopy : Abstract: The extensive use of pesticides and synthetic dyes poses critical threats to food safety, human health, and environmental sustainability, necessitating rapid and reliable detection methods. ...
MixAR: Mixture Autoregressive Image Generation : Abstract: Autoregressive (AR) approaches, which represent images as sequences of discrete tokens from a finite codebook, have achieved remarkable success in image generation. However, the quantization...
Chemistry-Enhanced Diffusion-Based Framework for Small-to-Large Molecular Conformation Generation : Abstract: Obtaining 3D conformations of realistic polyatomic molecules at the quantum chemistry level remains challenging, and although recent machine learning advances offer promise, predicting large...
Suppressing VLM Hallucinations with Spectral Representation Filtering : Abstract: Vision-language models (VLMs) frequently produce hallucinations in the form of descriptions of objects, attributes, or relations that do not exist in the image due to over-reliance on langua...
Consistency Is the Key: Detecting Hallucinations in LLM Generated Text By Checking Inconsistencies About Key Facts : Abstract: Large language models (LLMs), despite their remarkable text generation capabilities, often hallucinate and generate text that is factually incorrect and not grounded in real-world knowledge....
Reinforcement Learning for Chemical Ordering in Alloy Nanoparticles : Abstract: We approach the search for optimal element ordering in bimetallic alloy nanoparticles (NPs) as a reinforcement learning (RL) problem, and have built an RL agent that learns to perform such g...
PCA++: How Uniformity Induces Robustness to Background Noise in Contrastive Learning : Abstract: High-dimensional data often contain low-dimensional signals obscured by structured background noise, which limits the effectiveness of standard PCA. Motivated by contrastive learning, we add...
D$^{3}$ToM: Decider-Guided Dynamic Token Merging for Accelerating Diffusion MLLMs : Abstract: Diffusion-based multimodal large language models (Diffusion MLLMs) have recently demonstrated impressive non-autoregressive generative capabilities across vision-and-language tasks. However,...
Cmprsr: Abstractive Token-Level Question-Agnostic Prompt Compressor : Abstract: Motivated by the high costs of using black-box Large Language Models (LLMs), we introduce a novel prompt compression paradigm, under which we use smaller LLMs to compress inputs for the larg...
Learning Time in Static Classifiers : Abstract: Real-world visual data rarely presents as isolated, static instances. Instead, it often evolves gradually over time through variations in pose, lighting, object state, or scene context. Howe...
Linear time small coresets for k-mean clustering of segments with applications : Abstract: We study the $k$-means problem for a set $\mathcal{S} \subseteq \mathbb{R}^d$ of $n$ segments, aiming to find $k$ centers $X \subseteq \mathbb{R}^d$ that minimize $D(\mathcal{S},X) := \sum...
Enhancing Machine Learning Model Efficiency through Quantization and Bit Depth Optimization: A Performance Analysis on Healthcare Data : Abstract: This research aims to optimize intricate learning models by implementing quantization and bit-depth optimization techniques. The objective is to significantly cut time complexity while prese...
LMM-IR: Large-Scale Netlist-Aware Multimodal Framework for Static IR-Drop Prediction : Abstract: Static IR drop analysis is a fundamental and critical task in the field of chip design. Nevertheless, this process can be quite time-consuming, potentially requiring several hours. Moreover,...
Symmetry-Aware Graph Metanetwork Autoencoders: Model Merging through Parameter Canonicalization : Abstract: Neural network parameterizations exhibit inherent symmetries that yield multiple equivalent minima within the loss landscape. Scale Graph Metanetworks (ScaleGMNs) explicitly leverage these s...
PID-controlled Langevin Dynamics for Faster Sampling of Generative Models : Abstract: Langevin dynamics sampling suffers from extremely low generation speed, fundamentally limited by numerous fine-grained iterations to converge to the target distribution. We introduce PID-con...
FedTopo: Topology-Informed Representation Alignment in Federated Learning under Non-I.I.D. Conditions : Abstract: Current federated-learning models deteriorate under heterogeneous (non-I.I.D.) client data, as their feature representations diverge and pixel- or patch-level objectives fail to capture the ...
NFQ2.0: The CartPole Benchmark Revisited : Abstract: This article revisits the 20-year-old neural fitted Q-iteration (NFQ) algorithm on its classical CartPole benchmark. NFQ was a pioneering approach towards modern Deep Reinforcement Learning ...
Sample Complexity of Agnostic Multiclass Classification: Natarajan Dimension Strikes Back : Abstract: The fundamental theorem of statistical learning states that binary PAC learning is governed by a single parameter -- the Vapnik-Chervonenkis (VC) dimension -- which determines both learnabil...
FLClear: Visually Verifiable Multi-Client Watermarking for Federated Learning : Abstract: Federated learning (FL) enables multiple clients to collaboratively train a shared global model while preserving the privacy of their local data. Within this paradigm, the intellectual prope...
Attention-Enhanced Convolutional Autoencoder and Structured Delay Embeddings for Weather Prediction : Abstract: Weather prediction is a quintessential problem involving the forecasting of a complex, nonlinear, and chaotic high-dimensional dynamical system. This work introduces an efficient reduced-ord...
A Closer Look at Personalized Fine-Tuning in Heterogeneous Federated Learning : Abstract: Federated Learning (FL) enables decentralized, privacy-preserving model training but struggles to balance global generalization and local personalization due to non-identical data distributi...
Beyond Fixed Tasks: Unsupervised Environment Design for Task-Level Pairs : Abstract: Training general agents to follow complex instructions (tasks) in intricate environments (levels) remains a core challenge in reinforcement learning. Random sampling of task-level pairs ofte...
Adaptive Graph Rewiring to Mitigate Over-Squashing in Mesh-Based GNNs for Fluid Dynamics Simulations : Abstract: Mesh-based simulation using Graph Neural Networks (GNNs) has been recognized as a promising approach for modeling fluid dynamics. However, the mesh refinement techniques which allocate finer...
Oxytrees: Model Trees for Bipartite Learning : Abstract: Bipartite learning is a machine learning task that aims to predict interactions between pairs of instances. It has been applied to various domains, including drug-target interactions, RNA-di...
On Robustness of Linear Classifiers to Targeted Data Poisoning : Abstract: Data poisoning is a training-time attack that undermines the trustworthiness of learned models. In a targeted data poisoning attack, an adversary manipulates the training dataset to alter th...
LAYA: Layer-wise Attention Aggregation for Interpretable Depth-Aware Neural Networks : Abstract: Deep neural networks typically rely on the representation produced by their final hidden layer to make predictions, implicitly assuming that this single vector fully captures the semantics e...
Convolutional Model Trees : Abstract: A method for creating a forest of model trees to fit samples of a function defined on images is described in several steps: down-sampling the images, determining a tree's hyperplanes, applyi...
Stabilizing Self-Consuming Diffusion Models with Latent Space Filtering : Abstract: As synthetic data proliferates across the Internet, it is often reused to train successive generations of generative models. This creates a ``self-consuming loop" that can lead to training i...
DIVIDE: A Framework for Learning from Independent Multi-Mechanism Data Using Deep Encoders and Gaussian Processes : Abstract: Scientific datasets often arise from multiple independent mechanisms such as spatial, categorical or structural effects, whose combined influence obscures their individual contributions. We ...
Are LLMs The Way Forward? A Case Study on LLM-Guided Reinforcement Learning for Decentralized Autonomous Driving : Abstract: Autonomous vehicle navigation in complex environments such as dense and fast-moving highways and merging scenarios remains an active area of research. A key limitation of RL is its reliance ...
Conformal Online Learning of Deep Koopman Linear Embeddings : Abstract: We introduce Conformal Online Learning of Koopman embeddings (COLoKe), a novel framework for adaptively updating Koopman-invariant representations of nonlinear dynamical systems from streami...
INC: An Indirect Neural Corrector for Auto-Regressive Hybrid PDE Solvers : Abstract: When simulating partial differential equations, hybrid solvers combine coarse numerical solvers with learned correctors. They promise accelerated simulations while adhering to physical const...
MolEdit: Knowledge Editing for Multimodal Molecule Language Models : Abstract: Understanding and continuously refining multimodal molecular knowledge is crucial for advancing biomedicine, chemistry, and materials science. Molecule language models (MoLMs) have become po...
Scalable Multi-Objective and Meta Reinforcement Learning via Gradient Estimation : Abstract: We study the problem of efficiently estimating policies that simultaneously optimize multiple objectives in reinforcement learning (RL). Given $n$ objectives (or tasks), we seek the optimal ...
Physics-Constrained Adaptive Neural Networks Enable Real-Time Semiconductor Manufacturing Optimization with Minimal Training Data : Abstract: The semiconductor industry faces a computational crisis in extreme ultraviolet (EUV) lithography optimization, where traditional methods consume billions of CPU hours while failing to achiev...
Optimal Look-back Horizon for Time Series Forecasting in Federated Learning : Abstract: Selecting an appropriate look-back horizon remains a fundamental challenge in time series forecasting (TSF), particularly in the federated learning scenarios where data is decentralized, het...
Genomic Next-Token Predictors are In-Context Learners : Abstract: In-context learning (ICL) -- the capacity of a model to infer and apply abstract patterns from examples provided within its input -- has been extensively studied in large language models tra...
The Alignment Game: A Theory of Long-Horizon Alignment Through Recursive Curation : Abstract: In self-consuming generative models that train on their own outputs, alignment with user preferences becomes a recursive rather than one-time process. We provide the first formal foundation ...
Expressive Temporal Specifications for Reward Monitoring : Abstract: Specifying informative and dense reward functions remains a pivotal challenge in Reinforcement Learning, as it directly affects the efficiency of agent training. In this work, we harness the...
Assessing Automated Fact-Checking for Medical LLM Responses with Knowledge Graphs : Abstract: The recent proliferation of large language models (LLMs) holds the potential to revolutionize healthcare, with strong capabilities in diverse medical tasks. Yet, deploying LLMs in high-stake...
Catastrophic Forgetting in Kolmogorov-Arnold Networks : Abstract: Catastrophic forgetting is a longstanding challenge in continual learning, where models lose knowledge from earlier tasks when learning new ones. While various mitigation strategies have bee...
An Evaluation of Representation Learning Methods in Particle Physics Foundation Models : Abstract: We present a systematic evaluation of representation learning objectives for particle physics within a unified framework. Our study employs a shared transformer-based particle-cloud encoder ...
Connectivity-Guided Sparsification of 2-FWL GNNs: Preserving Full Expressivity with Improved Efficiency : Abstract: Higher-order Graph Neural Networks (HOGNNs) based on the 2-FWL test achieve superior expressivity by modeling 2- and 3-node interactions, but at $\mathcal{O}(n^3)$ computational cost. Howeve...
RoS-Guard: Robust and Scalable Online Change Detection with Delay-Optimal Guarantees : Abstract: Online change detection (OCD) aims to rapidly identify change points in streaming data and is critical in applications such as power system monitoring, wireless network sensing, and financia...
From Black-Box to White-Box: Control-Theoretic Neural Network Interpretability : Abstract: Deep neural networks achieve state of the art performance but remain difficult to interpret mechanistically. In this work, we propose a control theoretic framework that treats a trained neur...
An approach of deep reinforcement learning for maximizing the net present value of stochastic projects : Abstract: This paper investigates a project with stochastic activity durations and cash flows under discrete scenarios, where activities must satisfy precedence constraints generating cash inflows and...
On the Fundamental Limits of LLMs at Scale : Abstract: Large Language Models (LLMs) have benefited enormously from scaling, yet these gains are bounded by five fundamental limitations: (1) hallucination, (2) context compression, (3) reasoning de...
On the Information Processing of One-Dimensional Wasserstein Distances with Finite Samples : Abstract: Leveraging the Wasserstein distance -- a summation of sample-wise transport distances in data space -- is advantageous in many applications for measuring support differences between two unde...
Method of Manufactured Learning for Solver-free Training of Neural Operators : Abstract: Training neural operators to approximate mappings between infinite-dimensional function spaces often requires extensive datasets generated by either demanding experimental setups or computat...
Functional Mean Flow in Hilbert Space : Abstract: We present Functional Mean Flow (FMF) as a one-step generative model defined in infinite-dimensional Hilbert space. FMF extends the one-step Mean Flow framework to functional domains by prov...
Contrastive Entropy Bounds for Density and Conditional Density Decomposition : Abstract: This paper studies the interpretability of neural network features from a Bayesian Gaussian view, where optimizing a cost is reaching a probabilistic bound; learning a model approximates a d...
LinkedIn Profile Characteristics and Professional Success Indicators : Abstract: This study explores the relationship between LinkedIn profile characteristics and professional success, focusing on the indicators of promotions, follower count, and career progression rate....
AIF: Asynchronous Inference Framework for Cost-Effective Pre-Ranking : Abstract: In industrial recommendation systems, pre-ranking models based on deep neural networks (DNNs) commonly adopt a sequential execution framework: feature fetching and model forward computation ...
APT: Affine Prototype-Timestamp For Time Series Forecasting Under Distribution Shift : Abstract: Time series forecasting under distribution shift remains challenging, as existing deep learning models often rely on local statistical normalization (e.g., mean and variance) that fails to c...
A FEDformer-Based Hybrid Framework for Anomaly Detection and Risk Forecasting in Financial Time Series : Abstract: Financial markets are inherently volatile and prone to sudden disruptions such as market crashes, flash collapses, and liquidity crises. Accurate anomaly detection and early risk forecasting...
Global Cross-Time Attention Fusion for Enhanced Solar Flare Prediction from Multivariate Time Series : Abstract: Multivariate time series classification is increasingly investigated in space weather research as a means to predict intense solar flare events, which can cause widespread disruptions across...
RAGPulse: An Open-Source RAG Workload Trace to Optimize RAG Serving Systems : Abstract: Retrieval-Augmented Generation (RAG) is a critical paradigm for building reliable, knowledge-intensive Large Language Model (LLM) applications. However, the multi-stage pipeline (retrieve, g...
Angular Gradient Sign Method: Uncovering Vulnerabilities in Hyperbolic Networks : Abstract: Adversarial examples in neural networks have been extensively studied in Euclidean geometry, but recent advances in \textit{hyperbolic networks} call for a reevaluation of attack strategies ...
Learning Branching Policies for MILPs with Proximal Policy Optimization : Abstract: Branch-and-Bound (B\&B) is the dominant exact solution method for Mixed Integer Linear Programs (MILP), yet its exponential time complexity poses significant challenges for large-scale insta...
Are Graph Transformers Necessary? Efficient Long-Range Message Passing with Fractal Nodes in MPNNs : Abstract: Graph Neural Networks (GNNs) have emerged as powerful tools for learning on graph-structured data, but often struggle to balance local and global information. While graph Transformers aim to...
The Good, The Bad, and The Hybrid: A Reward Structure Showdown in Reasoning Models Training : Abstract: Reward design is central to reinforcement learning from human feedback (RLHF) and alignment research. In this work, we propose a unified framework to study hard, continuous, and hybrid rewar...
The Final-Stage Bottleneck: A Systematic Dissection of the R-Learner for Network Causal Inference : Abstract: The R-Learner is a powerful, theoretically-grounded framework for estimating heterogeneous treatment effects, prized for its robustness to nuisance model errors. However, its application to ...
Learning Time-Scale Invariant Population-Level Neural Representations : Abstract: General-purpose foundation models for neural time series can help accelerate neuroscientific discoveries and enable applications such as brain computer interfaces (BCIs). A key component in ...
SLMQuant:Benchmarking Small Language Model Quantization for Practical Deployment : Abstract: Despite the growing interest in Small Language Models (SLMs) as resource-efficient alternatives to Large Language Models (LLMs), their deployment on edge devices remains challenging due to u...
One-Step Generative Policies with Q-Learning: A Reformulation of MeanFlow : Abstract: We introduce a one-step generative policy for offline reinforcement learning that maps noise directly to actions via a residual reformulation of MeanFlow, making it compatible with Q-learnin...
Bi-View Embedding Fusion: A Hybrid Learning Approach for Knowledge Graph's Nodes Classification Addressing Problems with Limited Data : Abstract: Traditional Machine Learning (ML) methods require large amounts of data to perform well, limiting their applicability in sparse or incomplete scenarios and forcing the usage of additional sy...
Generalization Bounds for Semi-supervised Matrix Completion with Distributional Side Information : Abstract: We study a matrix completion problem where both the ground truth $R$ matrix and the unknown sampling distribution $P$ over observed entries are low-rank matrices, and \textit{share a common ...
Learning from the Undesirable: Robust Adaptation of Language Models without Forgetting : Abstract: Language models (LMs) are often adapted through supervised fine-tuning (SFT) to specialize their capabilities for downstream tasks. However, in typical scenarios where the fine-tuning data i...
Self-Organization of Attractor Landscapes in High-Capacity Kernel Logistic Regression Hopfield Networks : Abstract: Kernel-based learning methods can dramatically increase the storage capacity of Hopfield networks, yet the dynamical mechanism behind this enhancement remains poorly understood. We address t...
Latency and Ordering Effects in Online Decisions : Abstract: Online decision systems routinely operate under delayed feedback and order-sensitive (noncommutative) dynamics: actions affect which observations arrive, and in what sequence. Taking a Bregm...
MACKO: Sparse Matrix-Vector Multiplication for Low Sparsity : Abstract: Sparse Matrix-Vector Multiplication (SpMV) is a fundamental operation in the inference of sparse Large Language Models (LLMs). Because existing SpMV methods perform poorly under the low and ...
Self-Adaptive Graph Mixture of Models : Abstract: Graph Neural Networks (GNNs) have emerged as powerful tools for learning over graph-structured data, yet recent studies have shown that their performance gains are beginning to plateau. In m...
A Smart-Glasses for Emergency Medical Services via Multimodal Multitask Learning : Abstract: Emergency Medical Technicians (EMTs) operate in high-pressure environments, making rapid, life-critical decisions under heavy cognitive and operational loads. We present EMSGlass, a smart-gl...
Real-time prediction of breast cancer sites using deformation-aware graph neural network : Abstract: Early diagnosis of breast cancer is crucial, enabling the establishment of appropriate treatment plans and markedly enhancing patient prognosis. While direct magnetic resonance imaging-guide...
Transformer-Based Scalable Multi-Agent Reinforcement Learning for Networked Systems with Long-Range Interactions : Abstract: Multi-agent reinforcement learning (MARL) has shown promise for large-scale network control, yet existing methods face two major limitations. First, they typically rely on assumptions leadin...
Synthetic Forgetting without Access: A Few-shot Zero-glance Framework for Machine Unlearning : Abstract: Machine unlearning aims to eliminate the influence of specific data from trained models to ensure privacy compliance. However, most existing methods assume full access to the original traini...
Departures: Distributional Transport for Single-Cell Perturbation Prediction with Neural Schr\"odinger Bridges : Abstract: Predicting single-cell perturbation outcomes directly advances gene function analysis and facilitates drug candidate selection, making it a key driver of both basic and translational biomedi...
Soft Conflict-Resolution Decision Transformer for Offline Multi-Task Reinforcement Learning : Abstract: Multi-task reinforcement learning (MTRL) seeks to learn a unified policy for diverse tasks, but often suffers from gradient conflicts across tasks. Existing masking-based methods attempt to ...
Personalized Federated Learning with Bidirectional Communication Compression via One-Bit Random Sketching : Abstract: Federated Learning (FL) enables collaborative training across decentralized data, but faces key challenges of bidirectional communication overhead and client-side data heterogeneity. To addr...
OTARo: Once Tuning for All Precisions toward Robust On-Device LLMs : Abstract: Large Language Models (LLMs) fine-tuning techniques not only improve the adaptability to diverse downstream tasks, but also mitigate adverse effects of model quantization. Despite this, conv...
Warm-starting active-set solvers using graph neural networks : Abstract: Quadratic programming (QP) solvers are widely used in real-time control and optimization, but their computational cost often limits applicability in time-critical settings. We propose a lear...
Real-time distortion prediction in metallic additive manufacturing via a physics-informed neural operator approach : Abstract: With the development of digital twins and smart manufacturing systems, there is an urgent need for real-time distortion field prediction to control defects in metal Additive Manufacturing (A...
Uncertainty-aware Physics-informed Neural Networks for Robust CARS-to-Raman Signal Reconstruction : Abstract: Coherent anti-Stokes Raman scattering (CARS) spectroscopy is a powerful and rapid technique widely used in medicine, material science, and chemical analyses. However, its effectiveness is hi...
DiffFP: Learning Behaviors from Scratch via Diffusion-based Fictitious Play : Abstract: Self-play reinforcement learning has demonstrated significant success in learning complex strategic and interactive behaviors in competitive multi-agent games. However, achieving such behavi...
ParaDySe: A Parallel-Strategy Switching Framework for Dynamic Sequence Lengths in Transformer : Abstract: Dynamic sequences with varying lengths have been widely used in the training of Transformer-based large language models (LLMs). However, current training frameworks adopt a pre-defined stati...
TokenSqueeze: Performance-Preserving Compression for Reasoning LLMs : Abstract: Emerging reasoning LLMs such as OpenAI-o1 and DeepSeek-R1 have achieved strong performance on complex reasoning tasks by generating long chain-of-thought (CoT) traces. However, these long Co...
Laplace Learning in Wasserstein Space : Abstract: The manifold hypothesis posits that high-dimensional data typically resides on low-dimensional sub spaces. In this paper, we assume manifold hypothesis to investigate graph-based semi-superv...
MorphBoost: Self-Organizing Universal Gradient Boosting with Adaptive Tree Morphing : Abstract: Traditional gradient boosting algorithms employ static tree structures with fixed splitting criteria that remain unchanged throughout training, limiting their ability to adapt to evolving gr...
Counterfactual Explainable AI (XAI) Method for Deep Learning-Based Multivariate Time Series Classification : Abstract: Recent advances in deep learning have improved multivariate time series (MTS) classification and regression by capturing complex patterns, but their lack of transparency hinders decision-mak...
Computational Measurement of Political Positions: A Review of Text-Based Ideal Point Estimation Algorithms : Abstract: This article presents the first systematic review of unsupervised and semi-supervised computational text-based ideal point estimation (CT-IPE) algorithms, methods designed to infer latent po...
Incoherent Beliefs & Inconsistent Actions in Large Language Models : Abstract: Real-world tasks and environments exhibit differences from the static datasets that large language models (LLMs) are typically evaluated on. Such tasks can involve sequential interaction, re...
Uncovering and Mitigating Transient Blindness in Multimodal Model Editing : Abstract: Multimodal Model Editing (MMED) aims to correct erroneous knowledge in multimodal models. Existing evaluation methods, adapted from textual model editing, overstate success by relying on low...
Seek and You Shall Fold : Abstract: Accurate protein structures are essential for understanding biological function, yet incorporating experimental data into protein generative models remains a major challenge. Most predictors...
Edge-aware baselines for ogbn-proteins in PyTorch Geometric: species-wise normalization, post-hoc calibration, and cost-accuracy trade-offs : Abstract: We present reproducible, edge-aware baselines for ogbn-proteins in PyTorch Geometric (PyG). We study two system choices that dominate practice: (i) how 8-dimensional edge evidence is aggrega...
KForge: Program Synthesis for Diverse AI Hardware Accelerators : Abstract: GPU kernels are critical for ML performance but difficult to optimize across diverse accelerators. We present KForge, a platform-agnostic framework built on two collaborative LLM-based agent...
Explainable RL Policies by Distilling to Locally-Specialized Linear Policies with Voronoi State Partitioning : Abstract: Deep Reinforcement Learning is one of the state-of-the-art methods for producing near-optimal system controllers. However, deep RL algorithms train a deep neural network, that lacks transpar...
Tab-PET: Graph-Based Positional Encodings for Tabular Transformers : Abstract: Supervised learning with tabular data presents unique challenges, including low data sizes, the absence of structural cues, and heterogeneous features spanning both categorical and continuou...
Statistically Accurate and Robust Generative Prediction of Rock Discontinuities with A Tabular Foundation Model : Abstract: Rock discontinuities critically govern the mechanical behavior and stability of rock masses. Their internal distributions remain largely unobservable and are typically inferred from surface-...
Dual-LoRA and Quality-Enhanced Pseudo Replay for Multimodal Continual Food Learning : Abstract: Food analysis has become increasingly critical for health-related tasks such as personalized nutrition and chronic disease prevention. However, existing large multimodal models (LMMs) in foo...
A Novel Hierarchical Integration Method for Efficient Model Merging in Medical LLMs : Abstract: Large Language Models (LLMs) face significant challenges in distributed healthcare, including consolidating specialized domain knowledge across institutions while maintaining privacy, reduci...
Federated Learning for Pediatric Pneumonia Detection: Enabling Collaborative Diagnosis Without Sharing Patient Data : Abstract: Early and accurate pneumonia detection from chest X-rays (CXRs) is clinically critical to expedite treatment and isolation, reduce complications, and curb unnecessary antibiotic use. Althoug...
Multiscale Grassmann Manifolds for Single-Cell Data Analysis : Abstract: Single-cell data analysis seeks to characterize cellular heterogeneity based on high-dimensional gene expression profiles. Conventional approaches represent each cell as a vector in Euclidea...
Fast 3D Surrogate Modeling for Data Center Thermal Management : Abstract: Reducing energy consumption and carbon emissions in data centers by enabling real-time temperature prediction is critical for sustainability and operational efficiency. Achieving this requir...
Optimizing Input of Denoising Score Matching is Biased Towards Higher Score Norm : Abstract: Many recent works utilize denoising score matching to optimize the conditional input of diffusion models. In this workshop paper, we demonstrate that such optimization breaks the equivalence...
Physics-Informed Neural ODEs with Scale-Aware Residuals for Learning Stiff Biophysical Dynamics : Abstract: Neural differential equations offer a powerful framework for modeling continuous-time dynamics, but forecasting stiff biophysical systems remains unreliable. Standard Neural ODEs and physics...
KAN/H: Kolmogorov-Arnold Network using Haar-like bases : Abstract: This paper proposes KAN/H, a variant of Kolmogorov-Arnold Network (KAN) that uses a Haar-variant basis system having both global and local bases instead of B-spline. The resulting algorithm ...
DK-Root: A Joint Data-and-Knowledge-Driven Framework for Root Cause Analysis of QoE Degradations in Mobile Networks : Abstract: Diagnosing the root causes of Quality of Experience (QoE) degradations in operational mobile networks is challenging due to complex cross-layer interactions among kernel performance indicato...
Uncertainty Makes It Stable: Curiosity-Driven Quantized Mixture-of-Experts : Abstract: Deploying deep neural networks on resource-constrained devices faces two critical challenges: maintaining accuracy under aggressive quantization while ensuring predictable inference latency....
Diffusion Models: A Mathematical Introduction : Abstract: We present a concise, self-contained derivation of diffusion-based generative models. Starting from basic properties of Gaussian distributions (densities, quadratic expectations, re-paramete...
IDOL: Meeting Diverse Distribution Shifts with Prior Physics for Tropical Cyclone Multi-Task Estimation : Abstract: Tropical Cyclone (TC) estimation aims to accurately estimate various TC attributes in real time. However, distribution shifts arising from the complex and dynamic nature of TC environmental ...
Improving a Hybrid Graphsage Deep Network for Automatic Multi-objective Logistics Management in Supply Chain : Abstract: Systematic logistics, conveyance amenities and facilities as well as warehousing information play a key role in fostering profitable development in a supply chain. The aim of transformation ...
Sumudu Neural Operator for ODEs and PDEs : Abstract: We introduce the Sumudu Neural Operator (SNO), a neural operator rooted in the properties of the Sumudu Transform. We leverage the relationship between the polynomial expansions of transform...
Learning Fair Representations with Kolmogorov-Arnold Networks : Abstract: Despite recent advances in fairness-aware machine learning, predictive models often exhibit discriminatory behavior towards marginalized groups. Such unfairness might arise from biased train...
CATCHFed: Efficient Unlabeled Data Utilization for Semi-Supervised Federated Learning in Limited Labels Environments : Abstract: Federated learning is a promising paradigm that utilizes distributed client resources while preserving data privacy. Most existing FL approaches assume clients possess labeled data, however,...
Coordinate Descent for Network Linearization : Abstract: ReLU activations are the main bottleneck in Private Inference that is based on ResNet networks. This is because they incur significant inference latency. Reducing ReLU count is a discrete op...
Simplicial covering dimension of extremal concept classes : Abstract: Dimension theory is a branch of topology concerned with defining and analyzing dimensions of geometric and topological spaces in purely topological terms. In this work, we adapt the classica...
Conformal Constrained Policy Optimization for Cost-Effective LLM Agents : Abstract: While large language models (LLMs) have recently made tremendous progress towards solving challenging AI problems, they have done so at increasingly steep computational and API costs. We pro...
Volatility in Certainty (VC): A Metric for Detecting Adversarial Perturbations During Inference in Neural Network Classifiers : Abstract: Adversarial robustness remains a critical challenge in deploying neural network classifiers, particularly in real-time systems where ground-truth labels are unavailable during inference. Thi...
On the Trade-Off Between Transparency and Security in Adversarial Machine Learning : Abstract: Transparency and security are both central to Responsible AI, but they may conflict in adversarial settings. We investigate the strategic effect of transparency for agents through the lens o...
Leveraging Exogenous Signals for Hydrology Time Series Forecasting : Abstract: Recent advances in time series research facilitate the development of foundation models. While many state-of-the-art time series foundation models have been introduced, few studies examine t...
Transformers vs. Recurrent Models for Estimating Forest Gross Primary Production : Abstract: Monitoring the spatiotemporal dynamics of forest CO$_2$ uptake (Gross Primary Production, GPP), remains a central challenge in terrestrial ecosystem research. While Eddy Covariance (EC) towe...
Better LLM Reasoning via Dual-Play : Abstract: Large Language Models (LLMs) have achieved remarkable progress through Reinforcement Learning with Verifiable Rewards (RLVR), yet still rely heavily on external supervision (e.g., curated la...
FLEX: Feature Importance from Layered Counterfactual Explanations : Abstract: Machine learning models achieve state-of-the-art performance across domains, yet their lack of interpretability limits safe deployment in high-stakes settings. Counterfactual explanations ar...
Chain-of-Generation: Progressive Latent Diffusion for Text-Guided Molecular Design : Abstract: Text-conditioned molecular generation aims to translate natural-language descriptions into chemical structures, enabling scientists to specify functional groups, scaffolds, and physicochemic...
Robust Bidirectional Associative Memory via Regularization Inspired by the Subspace Rotation Algorithm : Abstract: Bidirectional Associative Memory (BAM) trained with Bidirectional Backpropagation (B-BP) often suffers from poor robustness and high sensitivity to noise and adversarial attacks. To address ...
A Systematic Study of Model Extraction Attacks on Graph Foundation Models : Abstract: Graph machine learning has advanced rapidly in tasks such as link prediction, anomaly detection, and node classification. As models scale up, pretrained graph models have become valuable int...
Batch Matrix-form Equations and Implementation of Multilayer Perceptrons : Abstract: Multilayer perceptrons (MLPs) remain fundamental to modern deep learning, yet their algorithmic details are rarely presented in complete, explicit \emph{batch matrix-form}. Rather, most refe...
Beyond the Laplacian: Interpolated Spectral Augmentation for Graph Neural Networks : Abstract: Graph neural networks (GNNs) are fundamental tools in graph machine learning. The performance of GNNs relies crucially on the availability of informative node features, which can be limited ...
A Systematic Analysis of Out-of-Distribution Detection Under Representation and Training Paradigm Shifts : Abstract: We present a systematic comparison of out-of-distribution (OOD) detection methods across CLIP-stratified regimes using AURC and AUGRC as primary metrics. Experiments cover two representation...
SurvBench: A Standardised Preprocessing Pipeline for Multi-Modal Electronic Health Record Survival Analysis : Abstract: Electronic health record (EHR) data present tremendous opportunities for advancing survival analysis through deep learning, yet reproducibility remains severely constrained by inconsistent p...
Learning the relative composition of EEG signals using pairwise relative shift pretraining : Abstract: Self-supervised learning (SSL) offers a promising approach for learning electroencephalography (EEG) representations from unlabeled data, reducing the need for expensive annotations for clin...
Computation-aware Energy-harvesting Federated Learning: Cyclic Scheduling with Selective Participation : Abstract: Federated Learning (FL) is a powerful paradigm for distributed learning, but its increasing complexity leads to significant energy consumption from client-side computations for training mode...
Quantile Q-Learning: Revisiting Offline Extreme Q-Learning with Quantile Regression : Abstract: Offline reinforcement learning (RL) enables policy learning from fixed datasets without further environment interaction, making it particularly valuable in high-risk or costly domains. Extre...
ReCast: Reliability-aware Codebook Assisted Lightweight Time Series Forecasting : Abstract: Time series forecasting is crucial for applications in various domains. Conventional methods often rely on global decomposition into trend, seasonal, and residual components, which become in...
Selecting Fine-Tuning Examples by Quizzing VLMs : Abstract: A challenge in fine-tuning text-to-image diffusion models for specific topics is to select good examples. Fine-tuning from image sets of varying quality, such as Wikipedia Commons, will ofte...
EARL: Entropy-Aware RL Alignment of LLMs for Reliable RTL Code Generation : Abstract: Recent advances in large language models (LLMs) have demonstrated significant potential in hardware design automation, particularly in using natural language to synthesize Register-Transfer ...
Mesh-based Super-resolution of Detonation Flows with Multiscale Graph Transformers : Abstract: Super-resolution flow reconstruction using state-of-the-art data-driven techniques is valuable for a variety of applications, such as subgrid/subfilter closure modeling, accelerating spatiot...
Improving Graph Embeddings in Machine Learning Using Knowledge Completion with Validation in a Case Study on COVID-19 Spread : Abstract: The rise of graph-structured data has driven major advances in Graph Machine Learning (GML), where graph embeddings (GEs) map features from Knowledge Graphs (KGs) into vector spaces, enablin...
Treatment Stitching with Schr\"odinger Bridge for Enhancing Offline Reinforcement Learning in Adaptive Treatment Strategies : Abstract: Adaptive treatment strategies (ATS) are sequential decision-making processes that enable personalized care by dynamically adjusting treatment decisions in response to evolving patient sympto...
SenseRay-3D: Generalizable and Physics-Informed Framework for End-to-End Indoor Propagation Modeling : Abstract: Modeling indoor radio propagation is crucial for wireless network planning and optimization. However, existing approaches often rely on labor-intensive manual modeling of geometry and materi...
To Align or Not to Align: Strategic Multimodal Representation Alignment for Optimal Performance : Abstract: Multimodal learning often relies on aligning representations across modalities to enable effective information integration, an approach traditionally assumed to be universally beneficial. Ho...
Dynamic Anomaly Identification in Accounting Transactions via Multi-Head Self-Attention Networks : Abstract: This study addresses the problem of dynamic anomaly detection in accounting transactions and proposes a real-time detection method based on a Transformer to tackle the challenges of hidden a...
HCPO: Hierarchical Conductor-Based Policy Optimization in Multi-Agent Reinforcement Learning : Abstract: In cooperative Multi-Agent Reinforcement Learning (MARL), efficient exploration is crucial for optimizing the performance of joint policy. However, existing methods often update joint polici...
FairGSE: Fairness-Aware Graph Neural Network without High False Positive Rates : Abstract: Graph neural networks (GNNs) have emerged as the mainstream paradigm for graph representation learning due to their effective message aggregation. However, this advantage also amplifies bias...
Fusion-ResNet: A Lightweight multi-label NILM Model Using PCA-ICA Feature Fusion : Abstract: Non-intrusive load monitoring (NILM) is an advanced load monitoring technique that uses data-driven algorithms to disaggregate the total power consumption of a household into the consumption...
Variation-Bounded Loss for Noise-Tolerant Learning : Abstract: Mitigating the negative impact of noisy labels has been aperennial issue in supervised learning. Robust loss functions have emerged as a prevalent solution to this problem. In this work, we ...
Finding Time Series Anomalies using Granular-ball Vector Data Description : Abstract: Modeling normal behavior in dynamic, nonlinear time series data is challenging for effective anomaly detection. Traditional methods, such as nearest neighbor and clustering approaches, often...
Open Banking Foundational Model: Learning Language Representations from Few Financial Transactions : Abstract: We introduced a multimodal foundational model for financial transactions that integrates both structured attributes and unstructured textual descriptions into a unified representation. By ad...
Rethinking Deep Alignment Through The Lens Of Incomplete Learning : Abstract: Large language models exhibit systematic vulnerabilities to adversarial attacks despite extensive safety alignment. We provide a mechanistic analysis revealing that position-dependent gradie...
Data-Efficient Self-Supervised Algorithms for Fine-Grained Birdsong Analysis : Abstract: Many bioacoustics, neuroscience, and linguistics research utilize birdsongs as proxy models to acquire knowledge in diverse areas. Developing models generally requires precisely annotated da...
FGM optimization in complex domains using Gaussian process regression based profile generation algorithm : Abstract: This manuscript addresses the challenge of designing functionally graded materials (FGMs) for arbitrary-shaped domains. Towards this goal, the present work proposes a generic volume fraction...
TSGDiff: Rethinking Synthetic Time Series Generation from a Pure Graph Perspective : Abstract: Diffusion models have shown great promise in data generation, yet generating time series data remains challenging due to the need to capture complex temporal dependencies and structural patt...
Understanding InfoNCE: Transition Probability Matrix Induced Feature Clustering : Abstract: Contrastive learning has emerged as a cornerstone of unsupervised representation learning across vision, language, and graph domains, with InfoNCE as its dominant objective. Despite its empi...
Scaling Law Analysis in Federated Learning: How to Select the Optimal Model Size? : Abstract: The recent success of large language models (LLMs) has sparked a growing interest in training large-scale models. As the model size continues to scale, concerns are growing about the depleti...
Evaluation of Multi- and Single-objective Learning Algorithms for Imbalanced Data : Abstract: Many machine learning tasks aim to find models that work well not for a single, but for a group of criteria, often opposing ones. One such example is imbalanced data classification, where, o...
MPD-SGR: Robust Spiking Neural Networks with Membrane Potential Distribution-Driven Surrogate Gradient Regularization : Abstract: The surrogate gradient (SG) method has shown significant promise in enhancing the performance of deep spiking neural networks (SNNs), but it also introduces vulnerabilities to adversarial at...
AlignTree: Efficient Defense Against LLM Jailbreak Attacks : Abstract: Large Language Models (LLMs) are vulnerable to adversarial attacks that bypass safety guidelines and generate harmful content. Mitigating these vulnerabilities requires defense mechanisms th...
Chicken Swarm Kernel Particle Filter: A Structured Rejuvenation Approach with KLD-Efficient Sampling : Abstract: Particle filters (PFs) are often combined with swarm intelligence (SI) algorithms, such as Chicken Swarm Optimization (CSO), for particle rejuvenation. Separately, Kullback--Leibler divergen...
SCI: An Equilibrium for Signal Intelligence : Abstract: We present SCI, a closed-loop, control-theoretic framework that models interpretability as a regulated state. SCI formalizes the interpretive error Delta SP and actively drives SP(t) in [0, ...
Cross-view Joint Learning for Mixed-Missing Multi-view Unsupervised Feature Selection : Abstract: Incomplete multi-view unsupervised feature selection (IMUFS), which aims to identify representative features from unlabeled multi-view data containing missing values, has received growing at...
Calibrated Adversarial Sampling: Multi-Armed Bandit-Guided Generalization Against Unforeseen Attacks : Abstract: Deep Neural Networks (DNNs) are known to be vulnerable to various adversarial perturbations. To address the safety concerns arising from these vulnerabilities, adversarial training (AT) has ...
MMSense: Adapting Vision-based Foundation Model for Multi-task Multi-modal Wireless Sensing : Abstract: Large AI models have been widely adopted in wireless communications for channel modeling, beamforming, and resource optimization. However, most existing efforts remain limited to single-moda...
Optimal Self-Consistency for Efficient Reasoning with Large Language Models : Abstract: Self-consistency (SC) is a widely used test-time inference technique for improving performance in chain-of-thought reasoning. It involves generating multiple responses, or samples from a lar...
Active Learning of Symbolic Automata Over Rational Numbers : Abstract: Automata learning has many applications in artificial intelligence and software engineering. Central to these applications is the $L^*$ algorithm, introduced by Angluin. The $L^*$ algorithm ...
BlinDNO: A Distributional Neural Operator for Dynamical System Reconstruction from Time-Label-Free data : Abstract: We study an inverse problem for stochastic and quantum dynamical systems in a time-label-free setting, where only unordered density snapshots sampled at unknown times drawn from an observati...
LILogic Net: Compact Logic Gate Networks with Learnable Connectivity for Efficient Hardware Deployment : Abstract: Efficient deployment of machine learning models ultimately requires taking hardware constraints into account. The binary logic gate is the fundamental building block of all digital chips. De...
Dynamic Reward Scaling for Multivariate Time Series Anomaly Detection: A VAE-Enhanced Reinforcement Learning Approach : Abstract: Detecting anomalies in multivariate time series is essential for monitoring complex industrial systems, where high dimensionality, limited labeled data, and subtle dependencies between senso...
BitSnap: Checkpoint Sparsification and Quantization in LLM Training : Abstract: As large language models (LLMs) continue to grow in size and complexity, efficient checkpoint saving\&loading has become crucial for managing storage, memory usage, and fault tolerance in LL...
CEDL: Centre-Enhanced Discriminative Learning for Anomaly Detection : Abstract: Supervised anomaly detection methods perform well in identifying known anomalies that are well represented in the training set. However, they often struggle to generalise beyond the training...
On the Dimension-Free Approximation of Deep Neural Networks for Symmetric Korobov Functions : Abstract: Deep neural networks have been widely used as universal approximators for functions with inherent physical structures, including permutation symmetry. In this paper, we construct symmetric d...
Interpretable Fine-Gray Deep Survival Model for Competing Risks: Predicting Post-Discharge Foot Complications for Diabetic Patients in Ontario : Abstract: Model interpretability is crucial for establishing AI safety and clinician trust in medical applications for example, in survival modelling with competing risks. Recent deep learning models ...
The 'Sure' Trap: Multi-Scale Poisoning Analysis of Stealthy Compliance-Only Backdoors in Fine-Tuned Large Language Models : Abstract: Backdoor attacks on large language models (LLMs) typically couple a secret trigger to an explicit malicious output. We show that this explicit association is unnecessary for common LLMs. We ...
Integrating Neural Differential Forecasting with Safe Reinforcement Learning for Blood Glucose Regulation : Abstract: Automated insulin delivery for Type 1 Diabetes must balance glucose control and safety under uncertain meals and physiological variability. While reinforcement learning (RL) enables adaptive...
Tailored Primitive Initialization is the Secret Key to Reinforcement Learning : Abstract: Reinforcement learning (RL) has emerged as a powerful paradigm for enhancing the reasoning capabilities of large language models (LLMs). While RL has demonstrated substantial performance gai...
VISAGNN: Versatile Staleness-Aware Efficient Training on Large-Scale Graphs : Abstract: Graph Neural Networks (GNNs) have shown exceptional success in graph representation learning and a wide range of real-world applications. However, scaling deeper GNNs poses challenges due to...
Global-Lens Transformers: Adaptive Token Mixing for Dynamic Link Prediction : Abstract: Dynamic graph learning plays a pivotal role in modeling evolving relationships over time, especially for temporal link prediction tasks in domains such as traffic systems, social networks, a...
Personality-guided Public-Private Domain Disentangled Hypergraph-Former Network for Multimodal Depression Detection : Abstract: Depression represents a global mental health challenge requiring efficient and reliable automated detection methods. Current Transformer- or Graph Neural Networks (GNNs)-based multimodal dep...
Redundancy-optimized Multi-head Attention Networks for Multi-View Multi-Label Feature Selection : Abstract: Multi-view multi-label data offers richer perspectives for artificial intelligence, but simultaneously presents significant challenges for feature selection due to the inherent complexity of...
Logarithmic Regret and Polynomial Scaling in Online Multi-step-ahead Prediction : Abstract: This letter studies the problem of online multi-step-ahead prediction for unknown linear stochastic systems. Using conditional distribution theory, we derive an optimal parameterization of t...
Diffusion Model Based Signal Recovery Under 1-Bit Quantization : Abstract: Diffusion models (DMs) have demonstrated to be powerful priors for signal recovery, but their application to 1-bit quantization tasks, such as 1-bit compressed sensing and logistic regressio...
SculptDrug : A Spatial Condition-Aware Bayesian Flow Model for Structure-based Drug Design : Abstract: Structure-Based drug design (SBDD) has emerged as a popular approach in drug discovery, leveraging three-dimensional protein structures to generate drug ligands. However, existing generative...
Uncover and Unlearn Nuisances: Agnostic Fully Test-Time Adaptation : Abstract: Fully Test-Time Adaptation (FTTA) addresses domain shifts without access to source data and training protocols of the pre-trained models. Traditional strategies that align source and target ...
Towards Better IncomLDL: We Are Unaware of Hidden Labels in Advance : Abstract: Label distribution learning (LDL) is a novel paradigm that describe the samples by label distribution of a sample. However, acquiring LDL dataset is costly and time-consuming, which leads to...
BSO: Binary Spiking Online Optimization Algorithm : Abstract: Binary Spiking Neural Networks (BSNNs) offer promising efficiency advantages for resource-constrained computing. However, their training algorithms often require substantial memory overhead ...
Hierarchical Frequency-Decomposition Graph Neural Networks for Road Network Representation Learning : Abstract: Road networks are critical infrastructures underpinning intelligent transportation systems and their related applications. Effective representation learning of road networks remains challeng...
Spectral Bias Mitigation via xLSTM-PINN: Memory-Gated Representation Refinement for Physics-Informed Learning : Abstract: Physics-informed learning for PDEs is surging across scientific computing and industrial simulation, yet prevailing methods face spectral bias, residual-data imbalance, and weak extrapolatio...
Regret Guarantees for Linear Contextual Stochastic Shortest Path : Abstract: We define the problem of linear Contextual Stochastic Shortest Path (CSSP), where at the beginning of each episode, the learner observes an adversarially chosen context that determines the M...
Center-Outward q-Dominance: A Sample-Computable Proxy for Strong Stochastic Dominance in Multi-Objective Optimisation : Abstract: Stochastic multi-objective optimization (SMOOP) requires ranking multivariate distributions; yet, most empirical studies perform scalarization, which loses information and is unreliable. Bas...
CAO: Curvature-Adaptive Optimization via Periodic Low-Rank Hessian Sketching : Abstract: First-order optimizers are reliable but slow in sharp, anisotropic regions. We study a curvature-adaptive method that periodically sketches a low-rank Hessian subspace via Hessian--vector pr...
Training Instabilities Induce Flatness Bias in Gradient Descent : Abstract: Classical analyses of gradient descent (GD) define a stability threshold based on the largest eigenvalue of the loss Hessian, often termed sharpness. When the learning rate lies below this t...
Softmax as a Lagrangian-Legendrian Seam : Abstract: This note offers a first bridge from machine learning to modern differential geometry. We show that the logits-to-probabilities step implemented by softmax can be modeled as a geometric inte...
LLM on a Budget: Active Knowledge Distillation for Efficient Classification of Large Text Corpora : Abstract: Large Language Models (LLMs) are highly accurate in classification tasks, however, substantial computational and financial costs hinder their large-scale deployment in dynamic environments. ...
Detecting Statistically Significant Fairness Violations in Recidivism Forecasting Algorithms : Abstract: Machine learning algorithms are increasingly deployed in critical domains such as finance, healthcare, and criminal justice [1]. The increasing popularity of algorithmic decision-making has ...
DAOpt: Modeling and Evaluation of Data-Driven Optimization under Uncertainty with LLMs : Abstract: Recent advances in large language models (LLMs) have accelerated research on automated optimization modeling. While real-world decision-making is inherently uncertain, most existing work has...
Decoupling Positional and Symbolic Attention Behavior in Transformers : Abstract: An important aspect subtending language understanding and production is the ability to independently encode positional and symbolic information of the words within a sentence. In Transformer...
The Anatomy of a Triton Attention Kernel : Abstract: A long-standing goal in both industry and academia is to develop an LLM inference platform that is portable across hardware architectures, eliminates the need for low-level hand-tuning, and ...
Parallel and Multi-Stage Knowledge Graph Retrieval for Behaviorally Aligned Financial Asset Recommendations : Abstract: Large language models (LLMs) show promise for personalized financial recommendations but are hampered by context limits, hallucinations, and a lack of behavioral grounding. Our prior work, F...
Output Supervision Can Obfuscate the Chain of Thought : Abstract: OpenAI (2025) showed that training against a chain of thought (CoT) monitor can cause obfuscated CoTs, which contain bad behavior the monitor cannot detect. They proposed to keep CoTs monito...
Parameter-Efficient and Personalized Federated Training of Generative Models at the Edge : Abstract: Large generative models (for example, language and diffusion models) enable high-quality text and image synthesis but are hard to train or adapt in cross-device federated settings due to hea...
WildfireGenome: Interpretable Machine Learning Reveals Local Drivers of Wildfire Risk and Their Cross-County Variation : Abstract: Current wildfire risk assessments rely on coarse hazard maps and opaque machine learning models that optimize regional accuracy while sacrificing interpretability at the decision scale. Wild...
Mind Your Entropy: From Maximum Entropy to Trajectory Entropy-Constrained RL : Abstract: Maximum entropy has become a mainstream off-policy reinforcement learning (RL) framework for balancing exploitation and exploration. However, two bottlenecks still limit further performance ...
Sound Logical Explanations for Mean Aggregation Graph Neural Networks : Abstract: Graph neural networks (GNNs) are frequently used for knowledge graph completion. Their black-box nature has motivated work that uses sound logical rules to explain predictions and characteri...
Loss Given Default Prediction Under Measurement-Induced Mixture Distributions: An Information-Theoretic Approach : Abstract: Loss Given Default (LGD) modeling faces a fundamental data quality constraint: 90% of available training data consists of proxy estimates based on pre-distress balance sheets rather than act...
Aspiration-based Perturbed Learning Automata in Games with Noisy Utility Measurements. Part A: Stochastic Stability in Non-zero-Sum Games : Abstract: Reinforcement-based learning has attracted considerable attention both in modeling human behavior as well as in engineering, for designing measurement- or payoff-based optimization schemes. ...
Enhancing failure prediction in nuclear industry: Hybridization of knowledge- and data-driven techniques : Abstract: The convergence of the Internet of Things (IoT) and Industry 4.0 has significantly enhanced data-driven methodologies within the nuclear industry, notably enhancing safety and economic effic...
Clustering-Based Weight Orthogonalization for Stabilizing Deep Reinforcement Learning : Abstract: Reinforcement learning (RL) has made significant advancements, achieving superhuman performance in various tasks. However, RL agents often operate under the assumption of environmental stati...
Small Vocabularies, Big Gains: Pretraining and Tokenization in Time Series Models : Abstract: Tokenization and transfer learning are two critical components in building state of the art time series foundation models for forecasting. In this work, we systematically study the effect of...
Early GVHD Prediction in Liver Transplantation via Multi-Modal Deep Learning on Imbalanced EHR Data : Abstract: Graft-versus-host disease (GVHD) is a rare but often fatal complication in liver transplantation, with a very high mortality rate. By harnessing multi-modal deep learning methods to integrat...
MedFedPure: A Medical Federated Framework with MAE-based Detection and Diffusion Purification for Inference-Time Attacks : Abstract: Artificial intelligence (AI) has shown great potential in medical imaging, particularly for brain tumor detection using Magnetic Resonance Imaging (MRI). However, the models remain vulnerabl...
SA-EMO: Structure-Aligned Encoder Mixture of Operators for Generalizable Full-waveform Inversion : Abstract: Full-waveform inversion (FWI) can produce high-resolution subsurface models, yet it remains inherently ill-posed, highly nonlinear, and computationally intensive. Although recent deep learni...
Global Feature Enhancing and Fusion Framework for Strain Gauge Time Series Classification : Abstract: Strain Gauge Status (SGS) recognition is crucial in the field of intelligent manufacturing based on the Internet of Things, as accurate identification helps timely detection of failed mechan...
Predicting Grain Growth in Polycrystalline Materials Using Deep Learning Time Series Models : Abstract: Grain Growth strongly influences the mechanical behavior of materials, making its prediction a key objective in microstructural engineering. In this study, several deep learning approaches w...
Toward Better Generalization in Few-Shot Learning through the Meta-Component Combination : Abstract: In few-shot learning, classifiers are expected to generalize to unseen classes given only a small number of instances of each new class. One of the popular solutions to few-shot learning is ...
An Explainable and Fair AI Tool for PCOS Risk Assessment: Calibration, Subgroup Equity, and Interactive Clinical Deployment : Abstract: This paper presents a fairness-audited and interpretable machine learning framework for predicting polycystic ovary syndrome (PCOS), designed to evaluate model performance and identify diagn...
Enhancing PINN Accuracy for the RLW Equation: Adaptive and Conservative Approaches : Abstract: Standard physics-informed neural network implementations have produced large error rates when using these models to solve the regularized long wave (RLW) equation. Two improved PINN approach...
EcoSpa: Efficient Transformer Training with Coupled Sparsity : Abstract: Transformers have become the backbone of modern AI, yet their high computational demands pose critical system challenges. While sparse training offers efficiency gains, existing methods fail...
A Deep Learning Model to Predicting Changes in Consumer Attributes for New Line-extended Products : Abstract: Product line extension is a marketing strategy that enhances a company's sphere of influence. Because excessive line extensions disrupt brand image, only appropriate line extensions based on...
Environment-Aware Transfer Reinforcement Learning for Sustainable Beam Selection : Abstract: This paper presents a novel and sustainable approach for improving beam selection in 5G and beyond networks using transfer learning and Reinforcement Learning (RL). Traditional RL-based beam...
Lightweight Time Series Data Valuation on Time Series Foundation Models via In-Context Finetuning : Abstract: Time series foundation models (TSFMs) have demonstrated increasing capabilities due to their extensive pretraining on large volumes of diverse time series data. Consequently, the quality of ...
Enhanced Water Leak Detection with Convolutional Neural Networks and One-Class Support Vector Machine : Abstract: Water is a critical resource that must be managed efficiently. However, a substantial amount of water is lost each year due to leaks in Water Distribution Networks (WDNs). This underscores t...
Incomplete Depression Feature Selection with Missing EEG Channels : Abstract: As a critical mental health disorder, depression has severe effects on both human physical and mental well-being. Recent developments in EEG-based depression analysis have shown promise in i...
How many stations are sufficient? Exploring the effect of urban weather station density reduction on imputation accuracy of air temperature and humidity : Abstract: Urban weather station networks (WSNs) are widely used to monitor urban weather and climate patterns and aid urban planning. However, maintaining WSNs is expensive and labor-intensive. Here, ...
Convergence of Multiagent Learning Systems for Traffic control : Abstract: Rapid urbanization in cities like Bangalore has led to severe traffic congestion, making efficient Traffic Signal Control (TSC) essential. Multi-Agent Reinforcement Learning (MARL), often mo...
On the Probabilistic Learnability of Compact Neural Network Preimage Bounds : Abstract: Although recent provable methods have been developed to compute preimage bounds for neural networks, their scalability is fundamentally limited by the #P-hardness of the problem. In this wor...
SpecQuant: Spectral Decomposition and Adaptive Truncation for Ultra-Low-Bit LLMs Quantization : Abstract: The emergence of accurate open large language models (LLMs) has sparked a push for advanced quantization techniques to enable efficient deployment on end-user devices. In this paper, we revi...
Clifford Algebraic Rotor Embeddings : Maybe embeddings should start to CARE : Abstract: Rotary Positional Embeddings (RoPE) have demonstrated exceptional performance as a positional encoding method, consistently outperforming their baselines. While recent work has sought to ext...
Adaptive Stepsizing for Stochastic Gradient Langevin Dynamics in Bayesian Neural Networks : Abstract: Bayesian neural networks (BNNs) require scalable sampling algorithms to approximate posterior distributions over parameters. Existing stochastic gradient Markov Chain Monte Carlo (SGMCMC) me...
Beyond Superficial Forgetting: Thorough Unlearning through Knowledge Density Estimation and Block Re-insertion : Abstract: Machine unlearning, which selectively removes harmful knowledge from a pre-trained model without retraining from scratch, is crucial for addressing privacy, regulatory compliance, and ethica...
Do traveling waves make good positional encodings? : Abstract: Transformers rely on positional encoding to compensate for the inherent permutation invariance of self-attention. Traditional approaches use absolute sinusoidal embeddings or learned positio...
H-Model: Dynamic Neural Architectures for Adaptive Processing : Abstract: This article explores the design and experimentation of a neural network architecture capable of dynamically adjusting its internal structure based on the input data. The proposed model intr...
Evaluation of LLM-based Explanations for a Learning Analytics Dashboard : Abstract: Learning Analytics Dashboards can be a powerful tool to support self-regulated learning in Digital Learning Environments and promote development of meta-cognitive skills, such as reflection....
Synergistic Feature Fusion for Latent Lyrical Classification: A Gated Deep Learning Architecture : Abstract: This study addresses the challenge of integrating complex, high-dimensional deep semantic features with simple, interpretable structural cues for lyrical content classification. We introduce...
Beyond One-Way Pruning: Bidirectional Pruning-Regrowth for Extreme Accuracy-Sparsity Tradeoff : Abstract: As a widely adopted model compression technique, model pruning has demonstrated strong effectiveness across various architectures. However, we observe that when sparsity exceeds a certain th...
Learning with Preserving for Continual Multitask Learning : Abstract: Artificial intelligence systems in critical fields like autonomous driving and medical imaging analysis often continually learn new tasks using a shared stream of input data. For instance, a...
Homotopy-Guided Self-Supervised Learning of Parametric Solutions for AC Optimal Power Flow : Abstract: Learning to optimize (L2O) parametric approximations of AC optimal power flow (AC-OPF) solutions offers the potential for fast, reusable decision-making in real-time power system operations....
A neural optimization framework for free-boundary diffeomorphic mapping problems and its applications : Abstract: Free-boundary diffeomorphism optimization is a core ingredient in the surface mapping problem but remains notoriously difficult because the boundary is unconstrained and local bijectivity mu...
Probabilistic Wildfire Susceptibility from Remote Sensing Using Random Forests and SHAP : Abstract: Wildfires pose a significant global threat to ecosystems worldwide, with California experiencing recurring fires due to various factors, including climate, topographical features, vegetation...
MPCM-Net: Multi-scale network integrates partial attention convolution with Mamba for ground-based cloud image segmentation : Abstract: Ground-based cloud image segmentation is a critical research domain for photovoltaic power forecasting. Current deep learning approaches primarily focus on encoder-decoder architectural refi...
Stratified Knowledge-Density Super-Network for Scalable Vision Transformers : Abstract: Training and deploying multiple vision transformer (ViT) models for different resource constraints is costly and inefficient. To address this, we propose transforming a pre-trained ViT into ...
A Bayesian Model for Multi-stage Censoring : Abstract: Many sequential decision settings in healthcare feature funnel structures characterized by a series of stages, such as screenings or evaluations, where the number of patients who advance to ...
R-Tuning: Wavelet-Decomposed Replay and Semantic Alignment for Continual Adaptation of Pretrained Time-Series Models : Abstract: Pre-trained models have demonstrated exceptional generalization capabilities in time-series forecasting; however, adapting them to evolving data distributions remains a significant challenge...
Regularized Schr\"odinger: Alleviating Distortion and Exposure Bias in Solving Inverse Problems : Abstract: Diffusion models serve as a powerful generative framework for solving inverse problems. However, they still face two key challenges: 1) the distortion-perception tradeoff, where improving pe...
Hierarchical Schedule Optimization for Fast and Robust Diffusion Model Sampling : Abstract: Diffusion probabilistic models have set a new standard for generative fidelity but are hindered by a slow iterative sampling process. A powerful training-free strategy to accelerate this pro...
Doubly Debiased Test-Time Prompt Tuning for Vision-Language Models : Abstract: Test-time prompt tuning for vision-language models has demonstrated impressive generalization capabilities under zero-shot settings. However, tuning the learnable prompts solely based on unl...
Beyond saliency: enhancing explanation of speech emotion recognition with expert-referenced acoustic cues : Abstract: Explainable AI (XAI) for Speech Emotion Recognition (SER) is critical for building transparent, trustworthy models. Current saliency-based methods, adapted from vision, highlight spectrogram...
AnchorDS: Anchoring Dynamic Sources for Semantically Consistent Text-to-3D Generation : Abstract: Optimization-based text-to-3D methods distill guidance from 2D generative models via Score Distillation Sampling (SDS), but implicitly treat this guidance as static. This work shows that ign...
Toward Dignity-Aware AI: Next-Generation Elderly Monitoring from Fall Detection to ADL : Abstract: This position paper envisions a next-generation elderly monitoring system that moves beyond fall detection toward the broader goal of Activities of Daily Living (ADL) recognition. Our ultima...
Benchmarking GNNs for OOD Materials Property Prediction with Uncertainty Quantification : Abstract: We present MatUQ, a benchmark framework for evaluating graph neural networks (GNNs) on out-of-distribution (OOD) materials property prediction with uncertainty quantification (UQ). MatUQ com...
Moirai 2.0: When Less Is More for Time Series Forecasting : Abstract: We introduce Moirai 2.0, a decoder-only time-series foundation model trained on a new corpus of 36M series. The model adopts quantile forecasting and multi-token prediction, improving both p...
Tighter Truncated Rectangular Prism Approximation for RNN Robustness Verification : Abstract: Robustness verification is a promising technique for rigorously proving Recurrent Neural Networks (RNNs) robustly. A key challenge is to over-approximate the nonlinear activation functions w...
Bayesian Neural Networks with Monte Carlo Dropout for Probabilistic Electricity Price Forecasting : Abstract: Accurate electricity price forecasting is critical for strategic decision-making in deregulated electricity markets, where volatility stems from complex supply-demand dynamics and external f...
Enhancing Reinforcement Learning in 3D Environments through Semantic Segmentation: A Case Study in ViZDoom : Abstract: Reinforcement learning (RL) in 3D environments with high-dimensional sensory input poses two major challenges: (1) the high memory consumption induced by memory buffers required to stabilise...
Simple Vision-Language Math Reasoning via Rendered Text : Abstract: We present a lightweight yet effective pipeline for training vision-language models to solve math problems by rendering LaTeX encoded equations into images and pairing them with structured c...
Multimodal ML: Quantifying the Improvement of Calorie Estimation Through Image-Text Pairs : Abstract: This paper determines the extent to which short textual inputs (in this case, names of dishes) can improve calorie estimation compared to an image-only baseline model and whether any improve...
Context-Aware Multimodal Representation Learning for Spatio-Temporally Explicit Environmental modelling : Abstract: Earth observation (EO) foundation models have emerged as an effective approach to derive latent representations of the Earth system from various remote sensing sensors. These models produce ...
FSC-Net: Fast-Slow Consolidation Networks for Continual Learning : Abstract: Continual learning remains challenging due to catastrophic forgetting, where neural networks lose previously acquired knowledge when learning new tasks. Inspired by memory consolidation in n...
Which Sparse Autoencoder Features Are Real? Model-X Knockoffs for False Discovery Rate Control : Abstract: Although sparse autoencoders (SAEs) are crucial for identifying interpretable features in neural networks, it is still challenging to distinguish between real computational patterns and erro...
Reasoning: From Reflection to Solution : Abstract: What is reasoning? This question has driven centuries of philosophical inquiry, from Aristotle's syllogisms to modern computational complexity theory. In the age of large language models ach...

Research Sources: 877 | Generated: 11/18/2025