AI RESEARCH PAPERS & ACADEMIC SOURCES
- FastVGGT: Training-Free Acceleration of Visual Geometry Transformer : Abstract: Foundation models for 3D vision have recently demonstrated remarkable capabilities in 3D perception. However, scaling these models to long-sequence image inputs remains a significant challen...
- UNO: Unifying One-stage Video Scene Graph Generation via Object-Centric Visual Representation Learning : Abstract: Video Scene Graph Generation (VidSGG) aims to represent dynamic visual content by detecting objects and modeling their temporal interactions as structured graphs. Prior studies typically tar...
- Evaluating BM3D and NBNet: A Comprehensive Study of Image Denoising Across Multiple Datasets : Abstract: This paper investigates image denoising, comparing traditional non-learning-based techniques, represented by Block-Matching 3D (BM3D), with modern learning-based methods, exemplified by NBNe...
- MGSO: Monocular Real-time Photometric SLAM with Efficient 3D Gaussian Splatting : Abstract: Real-time SLAM with dense 3D mapping is computationally challenging, especially on resource-limited devices. The recent development of 3D Gaussian Splatting (3DGS) offers a promising approac...
- Enhancing Multimodal Medical Image Classification using Cross-Graph Modal Contrastive Learning : Abstract: The classification of medical images is a pivotal aspect of disease diagnosis, often enhanced by deep learning techniques. However, traditional approaches typically focus on unimodal medical...
- Physics-informed DeepCT: Sinogram Wavelet Decomposition Meets Masked Diffusion : Abstract: Diffusion models show remarkable potential in sparse-view computed tomography (SVCT) reconstruction. However, when a network is trained on a limited sample space, its generalization capabili...
- MACS: Multi-source Audio-to-image Generation with Contextual Significance and Semantic Alignment : Abstract: Propelled by the breakthrough in deep generative models, audio-to-image generation has emerged as a pivotal cross-modal task that converts complex auditory signals into rich visual represent...
- Distilling Diversity and Control in Diffusion Models : Abstract: Distilled diffusion models generate images in far fewer timesteps but suffer from reduced sample diversity when generating multiple outputs from the same prompt. To understand this phenomeno...
- ACT-R: Adaptive Camera Trajectories for Single View 3D Reconstruction : Abstract: We introduce the simple idea of adaptive view planning to multi-view synthesis, aiming to improve both occlusion revelation and 3D consistency for single-view 3D reconstruction. Instead of p...
- A Lightweight Complex-Valued Deformable CNN for High-Quality Computer-Generated Holography : Abstract: Holographic displays have significant potential in virtual reality and augmented reality owing to their ability to provide all the depth cues. Deep learning-based methods play an important r...
- Onboard Hyperspectral Super-Resolution with Deep Pushbroom Neural Network : Abstract: Hyperspectral imagers on satellites obtain the fine spectral signatures essential for distinguishing one material from another at the expense of limited spatial resolution. Enhancing the lat...
- HarmoQ: Harmonized Post-Training Quantization for High-Fidelity Image : Abstract: Post-training quantization offers an efficient pathway to deploy super-resolution models, yet existing methods treat weight and activation quantization independently, missing their critical ...
- EndoIR: Degradation-Agnostic All-in-One Endoscopic Image Restoration via Noise-Aware Routing Diffusion : Abstract: Endoscopic images often suffer from diverse and co-occurring degradations such as low lighting, smoke, and bleeding, which obscure critical clinical details. Existing restoration methods are...
- Towards a Humanized Social-Media Ecosystem: AI-Augmented HCI Design Patterns for Safety, Agency & Well-Being : Abstract: Social platforms connect billions of people, yet their engagement-first algorithms often work on users rather than with them, amplifying stress, misinformation, and a loss of control. We pro...
- Pinching Visuo-haptic Display: Investigating Cross-Modal Effects of Visual Textures on Electrostatic Cloth Tactile Sensations : Abstract: This paper investigates how visual texture presentation influences tactile perception when interacting with electrostatic cloth displays. We propose a visuo-haptic system that allows users t...
- Identity Card Presentation Attack Detection: A Systematic Review : Abstract: Remote identity verification is essential for modern digital security; however, it remains highly vulnerable to sophisticated Presentation Attacks (PAs) that utilise forged or manipulated id...
- ArtReg: Visuo-Tactile based Pose Tracking and Manipulation of Unseen Articulated Objects : Abstract: Robots operating in real-world environments frequently encounter unknown objects with complex structures and articulated components, such as doors, drawers, cabinets, and tools. The ability ...
- Turbo-DDCM: Fast and Flexible Zero-Shot Diffusion-Based Image Compression : Abstract: While zero-shot diffusion-based compression methods have seen significant progress in recent years, they remain notoriously slow and computationally demanding. This paper presents an efficie...
- A Low-Rank Method for Vision Language Model Hallucination Mitigation in Autonomous Driving : Abstract: Vision Language Models (VLMs) are increasingly used in autonomous driving to help understand traffic scenes, but they sometimes produce hallucinations, which are false details not grounded i...
- Semi-distributed Cross-modal Air-Ground Relative Localization : Abstract: Efficient, accurate, and flexible relative localization is crucial in air-ground collaborative tasks. However, current approaches for robot relative localization are primarily realized in th...
- Hierarchical Spatial-Frequency Aggregation for Spectral Deconvolution Imaging : Abstract: Computational spectral imaging (CSI) achieves real-time hyperspectral imaging through co-designed optics and algorithms, but typical CSI methods suffer from a bulky footprint and limited fid...
- SlotVLA: Towards Modeling of Object-Relation Representations in Robotic Manipulation : Abstract: Inspired by how humans reason over discrete objects and their relationships, we explore whether compact object-centric and object-relation representations can form a foundation for multitask...
- RRTS Dataset: A Benchmark Colonoscopy Dataset from Resource-Limited Settings for Computer-Aided Diagnosis Research : Abstract: Background and Objective: Colorectal cancer prevention relies on early detection of polyps during colonoscopy. Existing public datasets, such as CVC-ClinicDB and Kvasir-SEG, provide valuable...
- Vision-Based System Identification of a Quadrotor : Abstract: This paper explores the application of vision-based system identification techniques in quadrotor modeling and control. Through experiments and analysis, we address the complexities and limi...
- TauFlow: Dynamic Causal Constraint for Complexity-Adaptive Lightweight Segmentation : Abstract: Deploying lightweight medical image segmentation models on edge devices presents two major challenges: 1) efficiently handling the stark contrast between lesion boundaries and background reg...
- Achieving Effective Virtual Reality Interactions via Acoustic Gesture Recognition based on Large Language Models : Abstract: Natural and efficient interaction remains a critical challenge for virtual reality and augmented reality (VR/AR) systems. Vision-based gesture recognition suffers from high computational cos...
- Task-Adaptive Low-Dose CT Reconstruction : Abstract: Deep learning-based low-dose computed tomography reconstruction methods already achieve high performance on standard image quality metrics like peak signal-to-noise ratio and structural simi...
- Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models : Abstract: Large language models (LLMs) have recently achieved impressive results in speech recognition across multiple modalities, including Auditory Speech Recognition (ASR), Visual Speech Recognitio...
- CAMP-VQA: Caption-Embedded Multimodal Perception for No-Reference Quality Assessment of Compressed Video : Abstract: The prevalence of user-generated content (UGC) on platforms such as YouTube and TikTok has rendered no-reference (NR) perceptual video quality assessment (VQA) vital for optimizing video del...
- PlanT 2.0: Exposing Biases and Structural Flaws in Closed-Loop Driving : Abstract: Most recent work in autonomous driving has prioritized benchmark performance and methodological innovation over in-depth analysis of model failures, biases, and shortcut learning. This has l...
- Verifying rich robustness properties for neural networks : Abstract: Robustness is an important problem in AI alignment and safety, with models such as neural networks being increasingly used in safety-critical systems. In the last decade, a large body of work...
- Robot Learning from a Physical World Model : Abstract: We introduce PhysWorld, a framework that enables robot learning from video generation through physical world modeling. Recent video generation models can synthesize photorealistic visual dem...
- Lightning Grasp: High Performance Procedural Grasp Synthesis with Contact Fields : Abstract: Despite years of research, real-time diverse grasp synthesis for dexterous hands remains an unsolved core challenge in robotics and computer graphics. We present Lightning Grasp, a novel hig...
- Intelligent Sampling Consensus for Homography Estimation in Football Videos Using Featureless Unpaired Points : Abstract: Estimating the homography matrix between images captured under radically different camera poses and zoom factors is a complex challenge. Traditional methods rely on the Random Sample Consens...
- HyCTAS: Multi-Objective Hybrid Convolution-Transformer Architecture Search for Real-Time Image Segmentation : Abstract: Real-time image segmentation demands architectures that preserve fine spatial detail while capturing global context under tight latency and memory budgets. Image segmentation is one of the m...
- SkinCaRe: A Multimodal Dermatology Dataset Annotated with Medical Caption and Chain-of-Thought Reasoning : Abstract: With the widespread application of artificial intelligence (AI), particularly deep learning (DL) and vision large language models (VLLMs), in skin disease diagnosis, the need for interpretab...
- The Wisdom of a Crowd of Brains: A Universal Brain Encoder : Abstract: Image-to-fMRI encoding is important for both neuroscience research and practical applications. However, such "Brain-Encoders" have been typically trained per-subject and per fMRI-dataset, th...
- DeNAS-ViT: Data Efficient NAS-Optimized Vision Transformer for Ultrasound Image Segmentation : Abstract: Accurate segmentation of ultrasound images is essential for reliable medical diagnoses but is challenged by poor image quality and scarce labeled data. Prior approaches have relied on manual...
- LMSeg: An end-to-end geometric message-passing network on barycentric dual graphs for large-scale landscape mesh segmentation : Abstract: Semantic segmentation of large-scale 3D landscape meshes is critical for geospatial analysis in complex environments, yet existing approaches face persistent challenges of scalability, end-t...
- STARS: Self-supervised Tuning for 3D Action Recognition in Skeleton Sequences : Abstract: Self-supervised pretraining methods with masked prediction demonstrate remarkable within-dataset performance in skeleton-based action recognition. However, we show that, unlike contrastive l...
- Real-time Multi-view Omnidirectional Depth Estimation for Real Scenarios based on Teacher-Student Learning with Unlabeled Data : Abstract: Omnidirectional depth estimation enables efficient 3D perception over a full 360-degree range. However, in real-world applications such as autonomous driving and robotics, achieving real-tim...
- Improving Contactless Fingerprint Recognition with Robust 3D Feature Extraction and Graph Embedding : Abstract: Contactless fingerprint has gained lots of attention in recent fingerprint studies. However, most existing contactless fingerprint algorithms treat contactless fingerprints as 2D plain finge...
- Multi-Scale Fusion for Object Representation : Abstract: Representing images or videos as object-level feature vectors, rather than pixel-level feature maps, facilitates advanced visual tasks. Object-Centric Learning (OCL) primarily achieves this ...
- Evaluating Cell AI Foundation Models in Kidney Pathology with Human-in-the-Loop Enrichment : Abstract: Training AI foundation models has emerged as a promising large-scale learning approach for addressing real-world healthcare challenges, including digital pathology. While many of these model...
- Incomplete Multi-view Multi-label Classification via a Dual-level Contrastive Learning Framework : Abstract: Recently, multi-view and multi-label classification have become significant domains for comprehensive data analysis and exploration. However, incompleteness both in views and labels is still...
- Diffusion Implicit Policy for Unpaired Scene-aware Motion Synthesis : Abstract: Scene-aware motion synthesis has been widely researched recently due to its numerous applications. Prevailing methods rely heavily on paired motion-scene data, while it is difficult to gener...
- MutualVPR: A Mutual Learning Framework for Resolving Supervision Inconsistencies via Adaptive Clustering : Abstract: Visual Place Recognition (VPR) enables robust localization through image retrieval based on learned descriptors. However, drastic appearance variations of images at the same place caused b...
- EVLM: Self-Reflective Multimodal Reasoning for Cross-Dimensional Visual Editing : Abstract: Editing complex visual content from ambiguous or partially specified instructions remains a core challenge in vision-language modeling. Existing models can contextualize content but often fa...
- Temporal Inconsistency Guidance for Super-resolution Video Quality Assessment : Abstract: As super-resolution (SR) techniques introduce unique distortions that fundamentally differ from those caused by traditional degradation processes (e.g., compression), there is an increasing ...
- Towards Visual Grounding: A Survey : Abstract: Visual Grounding, also known as Referring Expression Comprehension and Phrase Grounding, aims to ground the specific region(s) within the image(s) based on the given expression text. This ta...
- SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection : Abstract: With the rapid advancement of remote sensing technology, high-resolution multi-modal imagery is now more widely accessible. Conventional object detection models are trained on a single datas...
- LWGANet: Addressing Spatial and Channel Redundancy in Remote Sensing Visual Tasks with Light-Weight Grouped Attention : Abstract: Light-weight neural networks for remote sensing (RS) visual analysis must overcome two inherent redundancies: spatial redundancy from vast, homogeneous backgrounds, and channel redundancy, w...
- Free-T2M: Robust Text-to-Motion Generation for Humanoid Robots via Frequency-Domain : Abstract: Enabling humanoid robots to synthesize complex, physically coherent motions from natural language commands is a cornerstone of autonomous robotics and human-robot interaction. While diffusio...
- Environment-Driven Online LiDAR-Camera Extrinsic Calibration : Abstract: LiDAR-camera extrinsic calibration (LCEC) is crucial for multi-modal data fusion in autonomous robotic systems. Existing methods, whether target-based or target-free, typically rely on custo...
- FreeBlend: Advancing Concept Blending with Staged Feedback-Driven Interpolation Diffusion : Abstract: Concept blending is a promising yet underexplored area in generative models. While recent approaches, such as embedding mixing and latent modification based on structural sketches, have been...
- Articulate That Object Part (ATOP): 3D Part Articulation via Text and Motion Personalization : Abstract: We present ATOP (Articulate That Object Part), a novel few-shot method based on motion personalization to articulate a static 3D object with respect to a part and its motion as prescribed in...
- Learning Temporal 3D Semantic Scene Completion via Optical Flow Guidance : Abstract: 3D Semantic Scene Completion (SSC) provides comprehensive scene geometry and semantics for autonomous driving perception, which is crucial for enabling accurate and reliable decision-making....
- Role Bias in Diffusion Models: Diagnosing and Mitigating through Intermediate Decomposition : Abstract: Text-to-image (T2I) diffusion models exhibit impressive photorealistic image generation capabilities, yet they struggle in compositional image generation. In this work, we introduce RoleBenc...
- Distilling 3D distinctive local descriptors for 6D pose estimation : Abstract: Three-dimensional local descriptors are crucial for encoding geometric surface properties, making them essential for various point cloud understanding tasks. Among these descriptors, GeDi ha...
- Stack Transformer Based Spatial-Temporal Attention Model for Dynamic Sign Language and Fingerspelling Recognition : Abstract: Hand gesture-based Sign Language Recognition (SLR) serves as a crucial communication bridge between deaf and non-deaf individuals. While Graph Convolutional Networks (GCNs) are common, they ...
- MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation : Abstract: Reasoning segmentation aims to segment target objects in complex scenes based on human intent and spatial reasoning. While recent multimodal large language models (MLLMs) have demonstrated i...
- LangBridge: Interpreting Image as a Combination of Language Embeddings : Abstract: Recent years have witnessed remarkable advances in Large Vision-Language Models (LVLMs), which have achieved human-level performance across various complex vision-language tasks. Following L...
- Parameter-Free Fine-tuning via Redundancy Elimination for Vision Foundation Models : Abstract: Vision foundation models (VFMs) have demonstrated remarkable capabilities in learning universal visual representations. However, adapting these models to downstream tasks conventionally requ...
- AGO: Adaptive Grounding for Open World 3D Occupancy Prediction : Abstract: Open-world 3D semantic occupancy prediction aims to generate a voxelized 3D representation from sensor inputs while recognizing both known and unknown objects. Transferring open-vocabulary k...
- Enhanced Partially Relevant Video Retrieval through Inter- and Intra-Sample Analysis with Coherence Prediction : Abstract: Partially Relevant Video Retrieval (PRVR) aims to retrieve the target video that is partially relevant to the text query. The primary challenge in PRVR arises from the semantic asymmetry bet...
- DeepAndes: A Self-Supervised Vision Foundation Model for Multi-Spectral Remote Sensing Imagery of the Andes : Abstract: By mapping sites at large scales using remotely sensed data, archaeologists can generate unique insights into long-term demographic trends, inter-regional social networks, and past adaptatio...
- Descriptive Image-Text Matching with Graded Contextual Similarity : Abstract: Image-text matching aims to build correspondences between visual and textual data by learning their pairwise similarities. Most existing approaches have adopted sparse binary supervision, in...
- PSDiffusion: Harmonized Multi-Layer Image Generation via Layout and Appearance Alignment : Abstract: Transparent image layer generation plays a significant role in digital art and design workflows. Existing methods typically decompose transparent layers from a single RGB image using a set o...
- DNOI-4DRO: Deep 4D Radar Odometry with Differentiable Neural-Optimization Iterations : Abstract: A novel learning-optimization-combined 4D radar odometry model, named DNOI-4DRO, is proposed in this paper. The proposed model seamlessly integrates traditional geometric optimization with e...
- TextDiffuser-RL: Efficient and Robust Text Layout Optimization for High-Fidelity Text-to-Image Synthesis : Abstract: Text-embedded image generation plays a critical role in industries such as graphic design, advertising, and digital content creation. Text-to-Image generation methods leveraging diffusion mo...
- Test-Time Adaptation of Vision-Language Models for Open-Vocabulary Semantic Segmentation : Abstract: Recently, test-time adaptation has attracted wide interest in the context of vision-language models for image classification. However, to the best of our knowledge, the problem is completely...
- VideoCAD: A Dataset and Model for Learning Long-Horizon 3D CAD UI Interactions from Video : Abstract: Computer-Aided Design (CAD) is a time-consuming and complex process, requiring precise, long-horizon user interactions with intricate 3D interfaces. While recent advances in AI-driven user i...
- FaceSleuth-R: Adaptive Orientation-Aware Attention for Robust Micro-Expression Recognition : Abstract: Micro-expression recognition (MER) has achieved impressive accuracy in controlled laboratory settings. However, its real-world applicability faces a significant generalization cliff, severel...
- Bridging Weakly-Supervised Learning and VLM Distillation: Noisy Partial Label Learning for Efficient Downstream Adaptation : Abstract: In the context of noisy partial label learning (NPLL), each training sample is associated with a set of candidate labels annotated by multiple noisy annotators. With the emergence of high-pe...
- LGM-Pose: A Lightweight Global Modeling Network for Real-time Human Pose Estimation : Abstract: Most current lightweight top-down multi-person pose estimation methods are based on multi-branch parallel pure-CNN architectures, which often struggle to capture the global con...
- Bidirectional Image-Event Guided Fusion Framework for Low-Light Image Enhancement : Abstract: Under extreme low-light conditions, frame-based cameras suffer from severe detail loss due to limited dynamic range. Recent studies have introduced event cameras for event-guided low-light i...
- Sekai: A Video Dataset towards World Exploration : Abstract: Video generation techniques have made remarkable progress, promising to be the foundation of interactive world exploration. However, existing video generation datasets are not well-suited fo...
- Fine-grained Image Retrieval via Dual-Vision Adaptation : Abstract: Fine-Grained Image Retrieval (FGIR) faces challenges in learning discriminative visual representations to retrieve images with similar fine-grained features. Current leading FGIR solutions t...
- StereoDiff: Stereo-Diffusion Synergy for Video Depth Estimation : Abstract: Recent video depth estimation methods achieve great performance by following the paradigm of image depth estimation, i.e., typically fine-tuning pre-trained video diffusion models with massi...
- High-Frequency Semantics and Geometric Priors for End-to-End Detection Transformers in Challenging UAV Imagery : Abstract: Object detection in Unmanned Aerial Vehicle (UAV) imagery is fundamentally challenged by a prevalence of small, densely packed, and occluded objects within cluttered backgrounds. Conventiona...
- ChestGPT: Integrating Large Language Models and Vision Transformers for Disease Detection and Localization in Chest X-Rays : Abstract: The global demand for radiologists is increasing rapidly due to a growing reliance on medical imaging services, while the supply of radiologists is not keeping pace. Advances in computer vis...
- Visual Hand Gesture Recognition with Deep Learning: A Comprehensive Review of Methods, Datasets, Challenges and Future Research Directions : Abstract: The rapid evolution of deep learning (DL) models and the ever-increasing size of available datasets have raised the interest of the research community in the always important field of visual...
- FedVLM: Scalable Personalized Vision-Language Models through Federated Learning : Abstract: Vision-language models (VLMs) demonstrate impressive zero-shot and few-shot learning capabilities, making them essential for several downstream tasks. However, fine-tuning these models at sc...
- NOAH: Benchmarking Narrative Prior driven Hallucination and Omission in Video Large Language Models : Abstract: Video large language models (Video LLMs) have recently achieved strong performance on tasks such as captioning, summarization, and question answering. Many models and training methods explic...
- Zooming into Comics: Region-Aware RL Improves Fine-Grained Comic Understanding in Vision-Language Models : Abstract: Complex visual narratives, such as comics, present a significant challenge to Vision-Language Models (VLMs). Despite excelling on natural images, VLMs often struggle with stylized line art, ...
- SportR: A Benchmark for Multimodal Large Language Model Reasoning in Sports : Abstract: Deeply understanding sports requires an intricate blend of fine-grained visual perception and rule-based reasoning - a challenge that pushes the limits of current multimodal models. To succe...
- Video Dataset for Surgical Phase, Keypoint, and Instrument Recognition in Laparoscopic Surgery (PhaKIR) : Abstract: Robotic- and computer-assisted minimally invasive surgery (RAMIS) is increasingly relying on computer vision methods for reliable instrument recognition and surgical workflow understanding. ...
- Spatial-Frequency Enhanced Mamba for Multi-Modal Image Fusion : Abstract: Multi-Modal Image Fusion (MMIF) aims to integrate complementary image information from different modalities to produce informative images. Previous deep learning-based MMIF methods generally...
- On Accurate and Robust Estimation of 3D and 2D Circular Center: Method and Application to Camera-Lidar Calibration : Abstract: Circular targets are widely used in LiDAR-camera extrinsic calibration due to their geometric consistency and ease of detection. However, achieving accurate 3D-2D circular center corresponde...
- DIAL-GS: Dynamic Instance Aware Reconstruction for Label-free Street Scenes with 4D Gaussian Splatting : Abstract: Urban scene reconstruction is critical for autonomous driving, enabling structured 3D representations for data synthesis and closed-loop testing. Supervised approaches rely on costly human a...
- UniADC: A Unified Framework for Anomaly Detection and Classification : Abstract: In this paper, we introduce the task of unified anomaly detection and classification, which aims to simultaneously detect anomalous regions in images and identify their specific categories. ...
- FreqGRL: Suppressing Low-Frequency Bias and Mining High-Frequency Knowledge for Cross-Domain Few-Shot Learning : Abstract: Cross-domain few-shot learning (CD-FSL) aims to recognize novel classes with only a few labeled examples under significant domain shifts. While recent approaches leverage a limited amount of...
- NOVO: Bridging LLaVA and SAM with Visual-only Prompts for Reasoning Segmentation : Abstract: In this study, we propose NOVO (NO text, Visual-Only prompts), a novel framework that bridges vision-language models (VLMs) and segmentation models through visual-only prompts. Unlike prior ...
- Active Learning for Animal Re-Identification with Ambiguity-Aware Sampling : Abstract: Animal Re-ID has recently gained substantial attention in the AI research community due to its high impact on biodiversity monitoring and unique research challenges arising from environmenta...
- Sim4Seg: Boosting Multimodal Multi-disease Medical Diagnosis Segmentation with Region-Aware Vision-Language Similarity Masks : Abstract: Despite significant progress in pixel-level medical image analysis, existing medical image segmentation models rarely explore medical segmentation and diagnosis tasks jointly. However, it is...
- REOcc: Camera-Radar Fusion with Radar Feature Enrichment for 3D Occupancy Prediction : Abstract: Vision-based 3D occupancy prediction has made significant advancements, but its reliance on cameras alone struggles in challenging environments. This limitation has driven the adoption of se...
- AnoStyler: Text-Driven Localized Anomaly Generation via Lightweight Style Transfer : Abstract: Anomaly generation has been widely explored to address the scarcity of anomaly images in real-world data. However, existing methods typically suffer from at least one of the following limita...
- SPAN: Spatial-Projection Alignment for Monocular 3D Object Detection : Abstract: Existing monocular 3D detectors typically tame the pronounced nonlinear regression of 3D bounding boxes through a decoupled prediction paradigm, which employs multiple branches to estimate geome...
- K-Stain: Keypoint-Driven Correspondence for H&E-to-IHC Virtual Staining : Abstract: Virtual staining offers a promising method for converting Hematoxylin and Eosin (H&E) images into Immunohistochemical (IHC) images, eliminating the need for costly chemical processes. Howeve...
- MirrorMamba: Towards Scalable and Robust Mirror Detection in Videos : Abstract: Video mirror detection has received significant research attention, yet existing methods suffer from limited performance and robustness. These approaches often over-rely on single, unreliabl...
- MRT: Learning Compact Representations with Mixed RWKV-Transformer for Extreme Image Compression : Abstract: Recent advances in extreme image compression have revealed that mapping pixel data into highly compact latent representations can significantly improve coding efficiency. However, most exist...
- Relative Energy Learning for LiDAR Out-of-Distribution Detection : Abstract: Out-of-distribution (OOD) detection is a critical requirement for reliable autonomous driving, where safety depends on recognizing road obstacles and unexpected objects beyond the training d...
- AvatarTex: High-Fidelity Facial Texture Reconstruction from Single-Image Stylized Avatars : Abstract: We present AvatarTex, a high-fidelity facial texture reconstruction framework capable of generating both stylized and photorealistic textures from a single image. Existing methods struggle w...
- Argus: Quality-Aware High-Throughput Text-to-Image Inference Serving System : Abstract: Text-to-image (T2I) models have gained significant popularity. Most of these are diffusion models with unique computational characteristics, distinct from both traditional small-scale ML mod...
- Rethinking Rainy 3D Scene Reconstruction via Perspective Transforming and Brightness Tuning : Abstract: Rain degrades the visual quality of multi-view images, which are essential for 3D scene reconstruction, resulting in inaccurate and incomplete reconstruction results. Existing datasets often...
- SinSEMI: A One-Shot Image Generation Model and Data-Efficient Evaluation Framework for Semiconductor Inspection Equipment : Abstract: In the early stages of semiconductor equipment development, obtaining large quantities of raw optical images poses a significant challenge. This data scarcity hinders the advancement of AI-po...
- Otter: Mitigating Background Distractions of Wide-Angle Few-Shot Action Recognition with Enhanced RWKV : Abstract: Wide-angle videos in few-shot action recognition (FSAR) effectively express actions within specific scenarios. However, without a global understanding of both subjects and background, recogn...
- PointCubeNet: 3D Part-level Reasoning with 3x3x3 Point Cloud Blocks : Abstract: In this paper, we propose PointCubeNet, a novel multi-modal 3D understanding framework that achieves part-level reasoning without requiring any part annotations. PointCubeNet comprises globa...
- Image Restoration via Primal Dual Hybrid Gradient and Flow Generative Model : Abstract: Regularized optimization has been a classical approach to solving imaging inverse problems, where the regularization term enforces desirable properties of the unknown image. Recently, the in...
- Med-SORA: Symptom to Organ Reasoning in Abdomen CT Images : Abstract: Understanding symptom-image associations is crucial for clinical reasoning. However, existing medical multimodal models often rely on simple one-to-one hard labeling, oversimplifying clinica...
- CAST-LUT: Tokenizer-Guided HSV Look-Up Tables for Purple Flare Removal : Abstract: Purple flare, a diffuse chromatic aberration artifact commonly found around highlight areas, severely degrades the tone transition and color of the image. Existing traditional methods are ba...
- Robust and High-Fidelity 3D Gaussian Splatting: Fusing Pose Priors and Geometry Constraints for Texture-Deficient Outdoor Scenes : Abstract: 3D Gaussian Splatting (3DGS) has emerged as a key rendering pipeline for digital asset creation due to its balance between efficiency and visual quality. To address the issues of unstable po...
- ConeGS: Error-Guided Densification Using Pixel Cones for Improved Reconstruction with Fewer Primitives : Abstract: 3D Gaussian Splatting (3DGS) achieves state-of-the-art image quality and real-time performance in novel view synthesis but often suffers from a suboptimal spatial distribution of primitives....
- TiS-TSL: Image-Label Supervised Surgical Video Stereo Matching via Time-Switchable Teacher-Student Learning : Abstract: Stereo matching in minimally invasive surgery (MIS) is essential for next-generation navigation and augmented reality. Yet, dense disparity supervision is nearly impossible due to anatomical...
- Integrating Reweighted Least Squares with Plug-and-Play Diffusion Priors for Noisy Image Restoration : Abstract: Existing plug-and-play image restoration methods typically employ off-the-shelf Gaussian denoisers as proximal operators within classical optimization frameworks based on variable splitting....
- MUGSQA: Novel Multi-Uncertainty-Based Gaussian Splatting Quality Assessment Method, Dataset, and Benchmarks : Abstract: Gaussian Splatting (GS) has recently emerged as a promising technique for 3D object reconstruction, delivering high-quality rendering results with significantly improved reconstruction speed...
- ConsistTalk: Intensity Controllable Temporally Consistent Talking Head Generation with Diffusion Noise Search : Abstract: Recent advancements in video diffusion models have significantly enhanced audio-driven portrait animation. However, current methods still suffer from flickering, identity drift, and poor aud...
- NeuroBridge: Bio-Inspired Self-Supervised EEG-to-Image Decoding via Cognitive Priors and Bidirectional Semantic Alignment : Abstract: Visual neural decoding seeks to reconstruct or infer perceived visual stimuli from brain activity patterns, providing critical insights into human cognition and enabling transformative appli...
- PanoNav: Mapless Zero-Shot Object Navigation with Panoramic Scene Parsing and Dynamic Memory : Abstract: Zero-shot object navigation (ZSON) in unseen environments remains a challenging problem for household robots, requiring strong perceptual understanding and decision-making capabilities. Whil...
- Aerial Image Stitching Using IMU Data from a UAV : Abstract: Unmanned Aerial Vehicles (UAVs) are widely used for aerial photography and remote sensing applications. One of the main challenges is to stitch together multiple images into a single high-re...
- Gaussian-Augmented Physics Simulation and System Identification with Complex Colliders : Abstract: System identification involving the geometry, appearance, and physical properties from video observations is a challenging task with applications in robotics and graphics. Recent approaches ...
- Distillation Dynamics: Towards Understanding Feature-Based Distillation in Vision Transformers : Abstract: While feature-based knowledge distillation has proven highly effective for compressing CNNs, these techniques unexpectedly fail when applied to Vision Transformers (ViTs), often performing w...
- Ambiguity-aware Truncated Flow Matching for Ambiguous Medical Image Segmentation : Abstract: A simultaneous enhancement of accuracy and diversity of predictions remains a challenge in ambiguous medical image segmentation (AMIS) due to the inherent trade-offs. While truncated diffusi...
- VAEVQ: Enhancing Discrete Visual Tokenization through Variational Modeling : Abstract: Vector quantization (VQ) transforms continuous image features into discrete representations, providing compressed, tokenized inputs for generative models. However, VQ-based frameworks suffer...
- Generating an Image From 1,000 Words: Enhancing Text-to-Image With Structured Captions : Abstract: Text-to-image models have rapidly evolved from casual creative tools to professional-grade systems, achieving unprecedented levels of image quality and realism. Yet, most models are trained ...
- A Two-Stage System for Layout-Controlled Image Generation using Large Language Models and Diffusion Models : Abstract: Text-to-image diffusion models exhibit remarkable generative capabilities, but lack precise control over object counts and spatial arrangements. This work introduces a two-stage system to ad...
- Adaptive Morph-Patch Transformer for Arotic Vessel Segmentation : Abstract: Accurate segmentation of aortic vascular structures is critical for diagnosing and treating cardiovascular diseases. Traditional Transformer-based models have shown promise in this domain by ...
- Classification of Microplastic Particles in Water using Polarized Light Scattering and Machine Learning Methods : Abstract: Facing the critical need for continuous, large-scale microplastic monitoring, which is hindered by the limitations of gold-standard methods in aquatic environments, this paper introduces and...
- Mono3DVG-EnSD: Enhanced Spatial-aware and Dimension-decoupled Text Encoding for Monocular 3D Visual Grounding : Abstract: Monocular 3D Visual Grounding (Mono3DVG) is an emerging task that locates 3D objects in RGB images using text descriptions with geometric cues. However, existing methods face two key limitat...
- DTTNet: Improving Video Shadow Detection via Dark-Aware Guidance and Tokenized Temporal Modeling : Abstract: Video shadow detection confronts two entwined difficulties: distinguishing shadows from complex backgrounds and modeling dynamic shadow deformations under varying illumination. To address sh...
- PlantTraitNet: An Uncertainty-Aware Multimodal Framework for Global-Scale Plant Trait Inference from Citizen Science Data : Abstract: Global plant maps of plant traits, such as leaf nitrogen or plant height, are essential for understanding ecosystem processes, including the carbon and energy cycles of the Earth system. How...
- From Attribution to Action: Jointly ALIGNing Predictions and Explanations : Abstract: Explanation-guided learning (EGL) has shown promise in aligning model predictions with interpretable reasoning, particularly in computer vision tasks. However, most approaches rely on extern...
- FoCLIP: A Feature-Space Misalignment Framework for CLIP-Based Image Manipulation and Detection : Abstract: The well-aligned attribute of CLIP-based models enables its effective application like CLIPscore as a widely adopted image quality assessment metric. However, such a CLIP-based metric is vul...
- PADM: A Physics-aware Diffusion Model for Attenuation Correction : Abstract: Attenuation artifacts remain a significant challenge in cardiac Myocardial Perfusion Imaging (MPI) using Single-Photon Emission Computed Tomography (SPECT), often compromising diagnostic acc...
- GFix: Perceptually Enhanced Gaussian Splatting Video Compression : Abstract: 3D Gaussian Splatting (3DGS) enhances 3D scene reconstruction through explicit representation and fast rendering, demonstrating potential benefits for various low-level vision tasks, includi...
- Learning from the Right Patches: A Two-Stage Wavelet-Driven Masked Autoencoder for Histopathology Representation Learning : Abstract: Whole-slide images are central to digital pathology, yet their extreme size and scarce annotations make self-supervised learning essential. Masked Autoencoders (MAEs) with Vision Transformer...
- Exploring the "Great Unseen" in Medieval Manuscripts: Instance-Level Labeling of Legacy Image Collections with Zero-Shot Models : Abstract: We aim to theorize the medieval manuscript page and its contents more holistically, using state-of-the-art techniques to segment and describe the entire manuscript folio, for the purpose of ...
- Performance Decay in Deepfake Detection: The Limitations of Training on Outdated Data : Abstract: The continually advancing quality of deepfake technology exacerbates the threats of disinformation, fraud, and harassment by making maliciously-generated synthetic content increasingly diffi...
- Certified L2-Norm Robustness of 3D Point Cloud Recognition in the Frequency Domain : Abstract: 3D point cloud classification is a fundamental task in safety-critical applications such as autonomous driving, robotics, and augmented reality. However, recent studies reveal that point clo...
- 3D-ANC: Adaptive Neural Collapse for Robust 3D Point Cloud Recognition : Abstract: Deep neural networks have recently achieved notable progress in 3D point cloud recognition, yet their vulnerability to adversarial perturbations poses critical security challenges in practic...
- From Pretrain to Pain: Adversarial Vulnerability of Video Foundation Models Without Task Knowledge : Abstract: Large-scale Video Foundation Models (VFMs) have significantly advanced various video-related tasks, either through task-specific models or Multi-modal Large Language Models (MLLMs). However, ...
- Improving Deepfake Detection with Reinforcement Learning-Based Adaptive Data Augmentation : Abstract: The generalization capability of deepfake detectors is critical for real-world use. Data augmentation via synthetic fake face generation effectively enhances generalization, yet current SoTA...
- RaLD: Generating High-Resolution 3D Radar Point Clouds with Latent Diffusion : Abstract: Millimeter-wave radar offers a promising sensing modality for autonomous systems thanks to its robustness in adverse conditions and low cost. However, its utility is significantly limited by...
- LeCoT: revisiting network architecture for two-view correspondence pruning : Abstract: Two-view correspondence pruning aims to accurately remove incorrect correspondences (outliers) from initial ones and is widely applied to various computer vision tasks. Current popular strat...
- Pandar128 dataset for lane line detection : Abstract: We present Pandar128, the largest public dataset for lane line detection using a 128-beam LiDAR. It contains over 52,000 camera frames and 34,000 LiDAR scans, captured in diverse real-world ...
- How Bias Binds: Measuring Hidden Associations for Bias Control in Text-to-Image Compositions : Abstract: Text-to-image generative models often exhibit bias related to sensitive attributes. However, current research tends to focus narrowly on single-object prompts with limited contextual diversi...
- GEWDiff: Geometric Enhanced Wavelet-based Diffusion Model for Hyperspectral Image Super-resolution : Abstract: Improving the quality of hyperspectral images (HSIs), such as through super-resolution, is a crucial research area. However, generative modeling for HSIs presents several challenges. Due to ...
- HENet++: Hybrid Encoding and Multi-task Learning for 3D Perception and End-to-end Autonomous Driving : Abstract: Three-dimensional feature extraction is a critical component of autonomous driving systems, where perception tasks such as 3D object detection, bird's-eye-view (BEV) semantic segmentation, a...
- Sparse4DGS: 4D Gaussian Splatting for Sparse-Frame Dynamic Scene Reconstruction : Abstract: Dynamic Gaussian Splatting approaches have achieved remarkable performance for 4D scene reconstruction. However, these approaches rely on dense-frame video sequences for photorealistic recon...
- MPJudge: Towards Perceptual Assessment of Music-Induced Paintings : Abstract: Music-induced painting is a unique artistic practice, where visual artworks are created under the influence of music. Evaluating whether a painting faithfully reflects the music that inspire...
- ProcGen3D: Learning Neural Procedural Graph Representations for Image-to-3D Reconstruction : Abstract: We introduce ProcGen3D, a new approach for 3D content creation by generating procedural graph abstractions of 3D objects, which can then be decoded into rich, complex 3D assets. Inspired by ...
- LiteUpdate: A Lightweight Framework for Updating AI-Generated Image Detectors : Abstract: The rapid progress of generative AI has led to the emergence of new generative models, while existing detection methods struggle to keep pace, resulting in significant degradation in the det...
- Automated Estimation of Anatomical Risk Metrics for Endoscopic Sinus Surgery Using Deep Learning : Abstract: Endoscopic sinus surgery requires careful preoperative assessment of the skull base anatomy to minimize risks such as cerebrospinal fluid leakage. Anatomical risk scores like the Keros, Gera...
- Geometric implicit neural representations for signed distance functions : Abstract: Implicit neural representations (INRs) have emerged as a promising framework for representing signals in low-dimensional spaces. This survey reviews the existing literature on the s...
- Omni-View: Unlocking How Generation Facilitates Understanding in Unified 3D Model based on Multiview images : Abstract: This paper presents Omni-View, which extends the unified multimodal understanding and generation to 3D scenes based on multiview images, exploring the principle that "generation facilitates ...
- Mapping Reduced Accessibility to WASH Facilities in Rohingya Refugee Camps with Sub-Meter Imagery : Abstract: Access to Water, Sanitation, and Hygiene (WASH) services remains a major public health concern in refugee camps. This study introduces a remote sensing-driven framework to quantify WASH acce...
- Leveraging Text-Driven Semantic Variation for Robust OOD Segmentation : Abstract: In autonomous driving and robotics, ensuring road safety and reliable decision-making critically depends on out-of-distribution (OOD) segmentation. While numerous methods have been proposed ...
- 4DSTR: Advancing Generative 4D Gaussians with Spatial-Temporal Rectification for High-Quality and Consistent 4D Generation : Abstract: Remarkable advances in recent 2D image and 3D shape generation have induced a significant focus on dynamic 4D content generation. However, previous 4D generation methods commonly struggle to...
- MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs : Abstract: The advent of Multimodal Large Language Models (MLLMs) has expanded AI capabilities to visual modalities, yet existing evaluation benchmarks remain limited to single-video understanding, ove...
- StreamKV: Streaming Video Question-Answering with Segment-based KV Cache Retrieval and Compression : Abstract: Video Large Language Models (Video-LLMs) have demonstrated significant potential in the areas of video captioning, search, and summarization. However, current Video-LLMs still face challenge...
- Segmentation of Ischemic Stroke Lesions using Transfer Learning on Multi-sequence MRI : Abstract: The accurate understanding of ischemic stroke lesions is critical for efficient therapy and prognosis of stroke patients. Magnetic resonance imaging (MRI) is sensitive to acute ischemic stro...
- Glioma C6: A Novel Dataset for Training and Benchmarking Cell Segmentation : Abstract: We present Glioma C6, a new open dataset for instance segmentation of glioma C6 cells, designed as both a benchmark and a training resource for deep learning models. The dataset comprises 75...
- LMM-IQA: Image Quality Assessment for Low-Dose CT Imaging : Abstract: Low-dose computed tomography (CT) represents a significant improvement in patient safety through lower radiation doses, but increased noise, blur, and contrast loss can diminish diagnostic q...
- VADER: Towards Causal Video Anomaly Understanding with Relation-Aware Large Language Models : Abstract: Video anomaly understanding (VAU) aims to provide detailed interpretation and semantic comprehension of anomalous events within videos, addressing limitations of traditional methods that foc...
- Beyond Boundaries: Leveraging Vision Foundation Models for Source-Free Object Detection : Abstract: Source-Free Object Detection (SFOD) aims to adapt a source-pretrained object detector to a target domain without access to source data. However, existing SFOD methods predominantly rely on i...
- YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting : Abstract: Fast and flexible 3D scene reconstruction from unstructured image collections remains a significant challenge. We present YoNoSplat, a feedforward model that reconstructs high-quality 3D Gau...
- Real-Time LiDAR Super-Resolution via Frequency-Aware Multi-Scale Fusion : Abstract: LiDAR super-resolution addresses the challenge of achieving high-quality 3D perception from cost-effective, low-resolution sensors. While recent transformer-based approaches like TULIP show ...
- DIMO: Diverse 3D Motion Generation for Arbitrary Objects : Abstract: We present DIMO, a generative approach capable of generating diverse 3D motions for arbitrary objects from a single image. The core idea of our work is to leverage the rich priors in well-tr...
- TwinOR: Photorealistic Digital Twins of Dynamic Operating Rooms for Embodied AI Research : Abstract: Developing embodied AI for intelligent surgical systems requires safe, controllable environments for continual learning and evaluation. However, safety regulations and operational constraint...
- sMRI-based Brain Age Estimation in MCI using Persistent Homology : Abstract: In this study, we propose the use of persistent homology, specifically Betti curves, for brain age prediction and for distinguishing between healthy and pathological aging. The proposed frame...
- Selective Diabetic Retinopathy Screening with Accuracy-Weighted Deep Ensembles and Entropy-Guided Abstention : Abstract: Diabetic retinopathy (DR), a microvascular complication of diabetes and a leading cause of preventable blindness, is projected to affect more than 130 million individuals worldwide by 2030. ...
- Lite VLA: Efficient Vision-Language-Action Control on CPU-Bound Edge Robots : Abstract: The deployment of artificial intelligence models at the edge is increasingly critical for autonomous robots operating in GPS-denied environments where local, resource-efficient reasoning is ...
- Training-Free Adaptive Quantization for Variable Rate Image Coding for Machines : Abstract: Image Coding for Machines (ICM) has become increasingly important with the rapid integration of computer vision into real-world applications. However, most ICM frameworks utilize learned ima...
- StreamSTGS: Streaming Spatial and Temporal Gaussian Grids for Real-Time Free-Viewpoint Video : Abstract: Streaming free-viewpoint video (FVV) in real-time still faces significant challenges, particularly in training, rendering, and transmission efficiency. Harnessing superior performance of 3D ...
- Neodragon: Mobile Video Generation using Diffusion Transformer : Abstract: We introduce Neodragon, a text-to-video system capable of generating 2s (49 frames @24 fps) videos at the 640x1024 resolution directly on a Qualcomm Hexagon NPU in a record 6.7s (7 FPS). Dif...
- LoopExpose: An Unsupervised Framework for Arbitrary-Length Exposure Correction : Abstract: Exposure correction is essential for enhancing image quality under challenging lighting conditions. While supervised learning has achieved significant progress in this area, it relies heavil...
- An Artificial Intelligence-based Assistant for the Visually Impaired : Abstract: This paper describes an artificial intelligence-based assistant application, AIDEN, developed during 2023 and 2024, aimed at improving the quality of life for visually impaired individuals. ...
- Hybrid CNN-ViT Framework for Motion-Blurred Scene Text Restoration : Abstract: Motion blur in scene text images severely impairs readability and hinders the reliability of computer vision tasks, including autonomous driving, document digitization, and visual informatio...
- DiLO: Disentangled Latent Optimization for Learning Shape and Deformation in Grouped Deforming 3D Objects : Abstract: In this work, we propose a disentangled latent optimization-based method for parameterizing grouped deforming 3D objects into shape and deformation factors in an unsupervised manner. Our app...
- Latent Refinement via Flow Matching for Training-free Linear Inverse Problem Solving : Abstract: Recent advances in inverse problem solving have increasingly adopted flow priors over diffusion models due to their ability to construct straight probability paths from noise to data, thereb...
- Real-Time Bundle Adjustment for Ultra-High-Resolution UAV Imagery Using Adaptive Patch-Based Feature Tracking : Abstract: Real-time processing of UAV imagery is crucial for applications requiring urgent geospatial information, such as disaster response, where rapid decision-making and accurate spatial data are ...
- MambaOVSR: Multiscale Fusion with Global Motion Modeling for Chinese Opera Video Super-Resolution : Abstract: Chinese opera is celebrated for preserving classical art. However, early filming equipment limitations have degraded videos of last-century performances by renowned artists (e.g., low frame ...
- NURBGen: High-Fidelity Text-to-CAD Generation through LLM-Driven NURBS Modeling : Abstract: Generating editable 3D CAD models from natural language remains challenging, as existing text-to-CAD systems either produce meshes or rely on scarce design-history data. We present NURBGen, ...
- Scene-Aware Urban Design: A Human-AI Recommendation Framework Using Co-Occurrence Embeddings and Vision-Language Models : Abstract: This paper introduces a human-in-the-loop computer vision framework that uses generative AI to propose micro-scale design interventions in public space and support more continuous, local par...
- MoRA: Missing Modality Low-Rank Adaptation for Visual Recognition : Abstract: Pre-trained vision language models have shown remarkable performance on visual recognition tasks, but they typically assume the availability of complete multimodal inputs during both trainin...
- Temporal-Guided Visual Foundation Models for Event-Based Vision : Abstract: Event cameras offer unique advantages for vision tasks in challenging environments, yet processing asynchronous event streams remains an open challenge. While existing methods rely on specia...
- Physics-Informed Image Restoration via Progressive PDE Integration : Abstract: Motion blur, caused by relative movement between camera and scene during exposure, significantly degrades image quality and impairs downstream computer vision tasks such as object detection,...
- Gait Recognition via Collaborating Discriminative and Generative Diffusion Models : Abstract: Gait recognition offers a non-intrusive biometric solution by identifying individuals through their walking patterns. Although discriminative models have achieved notable success in this dom...
- AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving : Abstract: Effectively integrating Large Language Models (LLMs) into autonomous driving requires a balance between leveraging high-level reasoning and maintaining real-time efficiency. Existing approac...
- VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving : Abstract: Recent advancements in language-grounded autonomous driving have been significantly promoted by the sophisticated cognition and reasoning capabilities of large language models (LLMs). Howeve...
- Robust Nearest Neighbour Retrieval Using Targeted Manifold Manipulation : Abstract: Nearest-neighbour retrieval is central to classification and explainable-AI pipelines, but current practice relies on hand-tuning feature layers and distance metrics. We propose Targeted Man...
- A Mixture-of-Experts Framework with Log-Logistic Components for Survival Analysis on Histopathology Images : Abstract: We propose a modular framework for predicting cancer-specific survival from whole slide pathology images (WSIs). The method integrates four components: (i) Quantile Gated Patch Selection via...
- LLM-Driven Completeness and Consistency Evaluation for Cultural Heritage Data Augmentation in Cross-Modal Retrieval : Abstract: Cross-modal retrieval is essential for interpreting cultural heritage data, but its effectiveness is often limited by incomplete or inconsistent textual descriptions, caused by historical da...
- RelightMaster: Precise Video Relighting with Multi-plane Light Images : Abstract: Recent advances in diffusion models enable high-quality video generation and editing, but precise relighting with consistent video contents, which is critical for shaping scene atmosphere an...
- LaneDiffusion: Improving Centerline Graph Learning via Prior Injected BEV Feature Generation : Abstract: Centerline graphs, crucial for path planning in autonomous driving, are traditionally learned using deterministic methods. However, these methods often lack spatial reasoning and struggle wi...
- VideoSSR: Video Self-Supervised Reinforcement Learning : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has substantially advanced the video understanding capabilities of Multimodal Large Language Models (MLLMs). However, the rapid progress...
- From ACR O-RADS 2022 to Explainable Deep Learning: Comparative Performance of Expert Radiologists, Convolutional Neural Networks, Vision Transformers, and Fusion Models in Ovarian Masses : Abstract: Background: The 2022 update of the Ovarian-Adnexal Reporting and Data System (O-RADS) ultrasound classification refines risk stratification for adnexal lesions, yet human interpretation rema...
- TinyChemVL: Advancing Chemical Vision-Language Models via Efficient Visual Token Reduction and Complex Reaction Tasks : Abstract: While Vision Language Models (VLMs) have demonstrated remarkable capabilities in general visual understanding, their application in the chemical domain has been limited, with previous works ...
- Learning-Based Vision Systems for Semi-Autonomous Forklift Operation in Industrial Warehouse Environments : Abstract: The automation of material handling in warehouses increasingly relies on robust, low cost perception systems for forklifts and Automated Guided Vehicles (AGVs). This work presents a vision b...
- SFFR: Spatial-Frequency Feature Reconstruction for Multispectral Aerial Object Detection : Abstract: Recent multispectral object detection methods have primarily focused on spatial-domain feature fusion based on CNNs or Transformers, while the potential of frequency-domain feature remains u...
- Physics-Informed Deformable Gaussian Splatting: Towards Unified Constitutive Laws for Time-Evolving Material Field : Abstract: Recently, 3D Gaussian Splatting (3DGS), an explicit scene representation technique, has shown significant promise for dynamic novel-view synthesis from monocular video input. However, purely...
- Adaptive 3D Reconstruction via Diffusion Priors and Forward Curvature-Matching Likelihood Updates : Abstract: Reconstructing high-quality point clouds from images remains challenging in computer vision. Existing generative-model-based approaches, particularly diffusion-model approaches that directly...
- Seq2Seq Models Reconstruct Visual Jigsaw Puzzles without Seeing Them : Abstract: Jigsaw puzzles are primarily visual objects, whose algorithmic solutions have traditionally been framed from a visual perspective. In this work, however, we explore a fundamentally different...
- CINEMAE: Leveraging Frozen Masked Autoencoders for Cross-Generator AI Image Detection : Abstract: While context-based detectors have achieved strong generalization for AI-generated text by measuring distributional inconsistencies, image-based detectors still struggle with overfitting to ...
- Improving Multimodal Sentiment Analysis via Modality Optimization and Dynamic Primary Modality Selection : Abstract: Multimodal Sentiment Analysis (MSA) aims to predict sentiment from language, acoustic, and visual data in videos. However, imbalanced unimodal performance often leads to suboptimal fused rep...
- Label-Efficient 3D Forest Mapping: Self-Supervised and Transfer Learning for Individual, Structural, and Species Analysis : Abstract: Detailed structural and species information on individual tree level is increasingly important to support precision forestry, biodiversity conservation, and provide reference data for biomas...
- BuildingWorld: A Structured 3D Building Dataset for Urban Foundation Models : Abstract: As digital twins become central to the transformation of modern cities, accurate and structured 3D building models emerge as a key enabler of high-fidelity, updatable urban representations. ...
- GazeVLM: A Vision-Language Model for Multi-Task Gaze Understanding : Abstract: Gaze understanding unifies the detection of people, their gaze targets, and objects of interest into a single framework, offering critical insight into visual attention and intent estimation...
- AesTest: Measuring Aesthetic Intelligence from Perception to Production : Abstract: Perceiving and producing aesthetic judgments is a fundamental yet underexplored capability for multimodal large language models (MLLMs). However, existing benchmarks for image aesthetic asse...
- V-Shuffle: Zero-Shot Style Transfer via Value Shuffle : Abstract: Attention injection-based style transfer has achieved remarkable progress in recent years. However, existing methods often suffer from content leakage, where the undesired semantic content o...
- InfoAffect: A Dataset for Affective Analysis of Infographics : Abstract: Infographics are widely used to convey complex information, yet their affective dimensions remain underexplored due to the scarcity of data resources. We introduce a 3.5k-sample affect-annot...
- On Modality Incomplete Infrared-Visible Object Detection: An Architecture Compatibility Perspective : Abstract: Infrared and visible object detection (IVOD) is essential for numerous around-the-clock applications. Despite notable advancements, current IVOD models exhibit notable performance declines w...
- VDNeRF: Vision-only Dynamic Neural Radiance Field for Urban Scenes : Abstract: Neural Radiance Fields (NeRFs) implicitly model continuous three-dimensional scenes using a set of images with known camera poses, enabling the rendering of photorealistic novel views. Howev...
- DiffusionUavLoc: Visually Prompted Diffusion for Cross-View UAV Localization : Abstract: With the rapid growth of the low-altitude economy, unmanned aerial vehicles (UAVs) have become key platforms for measurement and tracking in intelligent patrol systems. However, in GNSS-deni...
- Diagnose Like A REAL Pathologist: An Uncertainty-Focused Approach for Trustworthy Multi-Resolution Multiple Instance Learning : Abstract: With the increasing demand for histopathological specimen examination and diagnostic reporting, Multiple Instance Learning (MIL) has received heightened research focus as a viable solution f...
- EIDSeg: A Pixel-Level Semantic Segmentation Dataset for Post-Earthquake Damage Assessment from Social Media Images : Abstract: Rapid post-earthquake damage assessment is crucial for rescue and resource planning. Still, existing remote sensing methods depend on costly aerial images, expert labeling, and produce only ...
- Inpaint360GS: Efficient Object-Aware 3D Inpainting via Gaussian Splatting for 360° Scenes : Abstract: Despite recent advances in single-object front-facing inpainting using NeRF and 3D Gaussian Splatting (3DGS), inpainting in complex 360° scenes remains largely underexplored. This is pr...
- Surgical Agent Orchestration Platform for Voice-directed Patient Data Interaction : Abstract: In da Vinci robotic surgery, surgeons' hands and eyes are fully engaged in the procedure, making it difficult to access and manipulate multimodal patient data without interruption. We propos...
- ConvFill: Model Collaboration for Responsive Conversational Voice Agents : Abstract: Deploying conversational voice agents with large language models faces a critical challenge: cloud-based foundation models provide deep reasoning and domain knowledge but introduce latency t...
- SPOT: An Annotated French Corpus and Benchmark for Detecting Critical Interventions in Online Conversations : Abstract: We introduce SPOT (Stopping Points in Online Threads), the first annotated corpus translating the sociological concept of stopping point into a reproducible NLP task. Stopping points are ord...
- The Role of High-Performance GPU Resources in Large Language Model Based Radiology Imaging Diagnosis : Abstract: Large-language models (LLMs) are rapidly being applied to radiology, enabling automated image interpretation and report generation tasks. Their deployment in clinical practice requires both ...
- Predicting Oscar-Nominated Screenplays with Sentence Embeddings : Abstract: Oscar nominations are an important factor in the movie industry because they can boost both the visibility and the commercial success. This work explores whether it is possible to predict Os...
- Approximating the Mathematical Structure of Psychodynamics : Abstract: The complexity of human cognition has meant that psychology makes more use of theory and conceptual models than perhaps any other biomedical field. To enable precise quantitative study of th...
- A Representation Sharpening Framework for Zero Shot Dense Retrieval : Abstract: Zero-shot dense retrieval is a challenging setting where a document corpus is provided without relevant queries, necessitating a reliance on pretrained dense retrievers (DRs). However, since...
- Long Grounded Thoughts: Distilling Compositional Visual Reasoning Chains at Scale : Abstract: Recent progress in multimodal reasoning has been driven largely by undisclosed datasets and proprietary data synthesis recipes, leaving open questions about how to systematically build large...
- Persian Musical Instruments Classification Using Polyphonic Data Augmentation : Abstract: Musical instrument classification is essential for music information retrieval (MIR) and generative music systems. However, research on non-Western traditions, particularly Persian music, re...
- Anchors in the Machine: Behavioral and Attributional Evidence of Anchoring Bias in LLMs : Abstract: Large language models (LLMs) are increasingly examined as both behavioral subjects and decision systems, yet it remains unclear whether observed cognitive biases reflect surface imitation or...
- MCP-RiskCue: Can LLM infer risk information from MCP server System Logs? : Abstract: Large language models (LLMs) demonstrate strong capabilities in solving complex tasks when integrated with external tools. The Model Context Protocol (MCP) has become a standard interface fo...
- The Imperfect Learner: Incorporating Developmental Trajectories in Memory-based Student Simulation : Abstract: User simulation is important for developing and evaluating human-centered AI, yet current student simulation in educational applications has significant limitations. Existing approaches focu...
- Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs : Abstract: LLMs are now an integral part of information retrieval. As such, their role as question answering chatbots raises significant concerns due to their shown vulnerability to adversarial man-in-...
- ScRPO: From Errors to Insights : Abstract: We propose Self-correction Relative Policy Optimization (ScRPO), a novel reinforcement learning framework designed to enhance large language models on challenging mathematical problems by ...
- Simulating Students with Large Language Models: A Review of Architecture, Mechanisms, and Role Modelling in Education with Generative AI : Abstract: Simulated Students offer a valuable methodological framework for evaluating pedagogical approaches and modelling diverse learner profiles, tasks which are otherwise challenging to undertake ...
- Large Language Models Develop Novel Social Biases Through Adaptive Exploration : Abstract: As large language models (LLMs) are adopted into frameworks that grant them the capacity to make real decisions, it is increasingly important to ensure that they are unbiased. In this paper,...
- Evaluating Implicit Biases in LLM Reasoning through Logic Grid Puzzles : Abstract: While recent safety guardrails effectively suppress overtly biased outputs, subtler forms of social bias emerge during complex logical reasoning tasks that evade current evaluation benchmark...
- Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads : Abstract: Solving complex tasks usually requires LLMs to generate long multi-step reasoning chains. Previous work has shown that verifying the correctness of individual reasoning steps can further imp...
- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B : Abstract: Challenging the prevailing consensus that small models inherently lack robust reasoning, this report introduces VibeThinker-1.5B, a 1.5B-parameter dense model developed via our Spectrum-to-S...
- Enhancing Multimodal Misinformation Detection by Replaying the Whole Story from Image Modality Perspective : Abstract: Multimodal Misinformation Detection (MMD) refers to the task of detecting social media posts involving misinformation, where the post often contains text and image modalities. However, by ob...
- ELEGANCE: Efficient LLM Guidance for Audio-Visual Target Speech Extraction : Abstract: Audio-visual target speaker extraction (AV-TSE) models primarily rely on visual cues from the target speaker. However, humans also leverage linguistic knowledge, such as syntactic constraint...
- LPFQA: A Long-Tail Professional Forum-based Benchmark for LLM Evaluation : Abstract: Large Language Models (LLMs) have made rapid progress in reasoning, question answering, and professional applications; however, their true capabilities remain difficult to evaluate using exi...
- MONICA: Real-Time Monitoring and Calibration of Chain-of-Thought Sycophancy in Large Reasoning Models : Abstract: Large Reasoning Models (LRMs) suffer from sycophantic behavior, where models tend to agree with users' incorrect beliefs and follow misinformation rather than maintain independent reasoning....
- When AI Agents Collude Online: Financial Fraud Risks by Collaborative LLM Agents on Social Platforms : Abstract: In this work, we study the risks of collective financial fraud in large-scale multi-agent systems powered by large language model (LLM) agents. We investigate whether agents can collaborate ...
- On the Analogy between Human Brain and LLMs: Spotting Key Neurons in Grammar Perception : Abstract: Artificial Neural Networks, the building blocks of AI, were inspired by the human brain's network of neurons. Over the years, these networks have evolved to replicate the complex capabilitie...
- FPGA or GPU? Analyzing comparative research for application-specific guidance : Abstract: The growing complexity of computational workloads has amplified the need for efficient and specialized hardware accelerators. Field Programmable Gate Arrays (FPGAs) and Graphics Processing U...
- HiMo-CLIP: Modeling Semantic Hierarchy and Monotonicity in Vision-Language Alignment : Abstract: Contrastive vision-language models like CLIP have achieved impressive results in image-text retrieval by aligning image and text representations in a shared embedding space. However, these m...
- Place Matters: Comparing LLM Hallucination Rates for Place-Based Legal Queries : Abstract: How do we make a meaningful comparison of a large language model's knowledge of the law in one place compared to another? Quantifying these differences is critical to understanding if the qu...
- Revisiting the Data Sampling in Multimodal Post-training from a Difficulty-Distinguish View : Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have spurred significant progress in Chain-of-Thought (CoT) reasoning. Building on the success of DeepSeek-R1, researchers extende...
- MENTOR: A Metacognition-Driven Self-Evolution Framework for Uncovering and Mitigating Implicit Risks in LLMs on Domain Tasks : Abstract: Ensuring the safety and value alignment of large language models (LLMs) is critical for their deployment. Current alignment efforts primarily target explicit risks such as bias, hate speech,...
- Graph Representation-based Model Poisoning on the Heterogeneous Internet of Agents : Abstract: Internet of Agents (IoA) envisions a unified, agent-centric paradigm where heterogeneous large language model (LLM) agents can interconnect and collaborate at scale. Within this paradigm, fe...
- IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction : Abstract: Recent advances in deep-research agents have shown promise for autonomous knowledge construction through dynamic reasoning over external sources. However, existing approaches rely on a mono-...
- DiLA: Enhancing LLM Tool Learning with Differential Logic Layer : Abstract: Considering the challenges faced by large language models (LLMs) in logical reasoning and planning, prior efforts have sought to augment LLMs with access to external solvers. While progress ...
- Likelihood-based Mitigation of Evaluation Bias in Large Language Models : Abstract: Large Language Models (LLMs) are widely used to evaluate natural language generation tasks as automated metrics. However, the likelihood, a measure of LLM's plausibility for a sentence, can ...
- FedCoT: Federated Chain-of-Thought Distillation for Large Language Models : Abstract: Large Language Models (LLMs) have emerged as a transformative force in artificial intelligence, demonstrating exceptional proficiency across various tasks. However, their deployment in resou...
- BLADE: Benchmarking Language Model Agents for Data-Driven Science : Abstract: Data-driven scientific discovery requires the iterative integration of scientific domain knowledge, statistical expertise, and an understanding of data semantics to make nuanced analytical d...
- Mufu: Multilingual Fused Learning for Low-Resource Translation with LLM : Abstract: Multilingual large language models (LLMs) are great translators, but this is largely limited to high-resource languages. For many LLMs, translating in and out of low-resource languages remai...
- Skill Path: Unveiling Language Skills from Circuit Graphs : Abstract: Circuit graph discovery has emerged as a fundamental approach to elucidating the skill mechanistic of language models. Despite the output faithfulness of circuit graphs, they suffer from ato...
- Auto-PRE: An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation : Abstract: The rapid development of large language models (LLMs) has highlighted the need for efficient and reliable methods to evaluate their performance. Traditional evaluation methods often face cha...
- All Entities are Not Created Equal: Examining the Long Tail for Ultra-Fine Entity Typing : Abstract: Due to their capacity to acquire world knowledge from large corpora, pre-trained language models (PLMs) are extensively used in ultra-fine entity typing tasks where the space of labels is ex...
- Shared Heritage, Distinct Writing: Rethinking Resource Selection for East Asian Historical Documents : Abstract: Historical documents in the Sinosphere are known to share common formats and practices, particularly in veritable records compiled by court historians. This shared linguistic heritage has le...
- LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification : Abstract: With the ever-increasing number of news stories available online, classifying them by topic, regardless of the language they are written in, has become crucial for enhancing readers' access ...
- RareAgents: Autonomous Multi-disciplinary Team for Rare Disease Diagnosis and Treatment : Abstract: Rare diseases, despite their low individual incidence, collectively impact around 300 million people worldwide due to the vast number of diseases. The involvement of multiple organs and syst...
- Revealing emergent human-like conceptual representations from language prediction : Abstract: People acquire concepts through rich physical and social experiences and use them to understand and navigate the world. In contrast, large language models (LLMs), trained solely through next...
- Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language Models : Abstract: The tendency of users to anthropomorphise large language models (LLMs) is of growing interest to AI developers, researchers, and policy-makers. Here, we present a novel method for empiricall...
- KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse : Abstract: We describe KVLink, an approach for efficient key-value (KV) cache reuse in large language models (LLMs). In many LLM applications, different inputs can share overlapping context, such as th...
- Order Doesn't Matter, But Reasoning Does: Training LLMs with Order-Centric Augmentation : Abstract: Logical reasoning is essential for large language models (LLMs) to ensure accurate and coherent inference. However, LLMs struggle with reasoning order variations and fail to generalize acros...
- ECLeKTic: a Novel Challenge Set for Evaluation of Cross-Lingual Knowledge Transfer : Abstract: To achieve equitable performance across languages, large language models (LLMs) must be able to abstract knowledge beyond the language in which it was learnt. However, the current literature...
- Evaluating Test-Time Scaling LLMs for Legal Reasoning: OpenAI o1, DeepSeek-R1, and Beyond : Abstract: Recent advances in test-time scaling of large language models (LLMs), exemplified by DeepSeek-R1 and OpenAI's o1, show that extending the chain of thought during inference can significantly ...
- Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering : Abstract: The collaborative paradigm of large and small language models (LMs) effectively balances performance and cost, yet its pivotal challenge lies in precisely pinpointing the moment of invocatio...
- Atomic Consistency Preference Optimization for Long-Form Question Answering : Abstract: Large Language Models (LLMs) often produce factoid hallucinations - plausible yet incorrect answers. A common mitigation strategy is model alignment, which improves factual accuracy by train...
- Mixed Signals: Understanding Model Disagreement in Multimodal Empathy Detection : Abstract: Multimodal models play a key role in empathy detection, but their performance can suffer when modalities provide conflicting cues. To understand these failures, we examine cases where unimod...
- Enhancing Large Language Models for Detecting Mental Manipulation via Annotation-Free Data Augmentation and Anti-Curriculum Distillation : Abstract: Mental manipulation is a subtle yet pervasive form of psychological abuse that poses serious threats to mental health. Nevertheless, detecting mental manipulation remains a largely underexpl...
- Language Model Distillation: A Temporal Difference Imitation Learning Perspective : Abstract: Large language models have led to significant progress across many NLP tasks, although their massive sizes often incur substantial computational costs. Distillation has become a common pract...
- Rethinking Text-based Protein Understanding: Retrieval or LLM? : Abstract: In recent years, protein-text models have gained significant attention for their potential in protein generation and understanding. Current approaches focus on integrating protein-related kn...
- DeepDiver: Adaptive Search Intensity Scaling via Open-Web Reinforcement Learning : Abstract: Information seeking demands iterative evidence gathering and reflective reasoning, yet large language models (LLMs) still struggle with it in open-web question answering. Existing prompting ...
- When Language Shapes Thought: Cross-Lingual Transfer of Factual Knowledge in Question Answering : Abstract: Multilingual large language models (LLMs) offer promising opportunities for cross-lingual information access, yet their use of factual knowledge remains highly sensitive to the input languag...
- LegalEval-Q: A New Benchmark for The Quality Evaluation of LLM-Generated Legal Text : Abstract: As large language models (LLMs) are increasingly used in legal applications, current evaluation benchmarks tend to focus mainly on factual accuracy while largely neglecting important linguis...
- OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics : Abstract: Robust unlearning is crucial for safely deploying large language models (LLMs) in environments where data privacy, model safety, and regulatory compliance must be ensured. Yet the task is in...
- Mechanisms vs. Outcomes: Probing for Syntax Fails to Explain Performance on Targeted Syntactic Evaluations : Abstract: Large Language Models (LLMs) exhibit a robust mastery of syntax when processing and generating text. While this suggests internalized understanding of hierarchical syntax and dependency rela...
- ECom-Bench: Can LLM Agent Resolve Real-World E-commerce Customer Support Issues? : Abstract: In this paper, we introduce ECom-Bench, the first benchmark framework for evaluating LLM agent with multimodal capabilities in the e-commerce customer support domain. ECom-Bench features dyn...
- Evaluating the Ability of Large Language Models to Reason about Cardinal Directions, Revisited : Abstract: We investigate the abilities of 28 Large Language Models (LLMs) to reason about cardinal directions (CDs) using a benchmark generated from a set of templates, extensively testing an LLM's ab...
- Normality and the Turing Test : Abstract: This paper proposes to revisit the Turing test through the concept of normality. Its core argument is that the Turing test is a test of normal intelligence as assessed by a normal judge. Fir...
- SEAGraph: Unveiling the Whole Story of Paper Review Comments : Abstract: Peer review, as a cornerstone of scientific research, ensures the integrity and quality of scholarly work by providing authors with objective feedback for refinement. However, in the traditi...
- Text-to-Pipeline: Bridging Natural Language and Data Preparation Pipelines : Abstract: Data preparation (DP) transforms raw data into a form suitable for downstream applications, typically by composing operations into executable pipelines. Building such pipelines is time-consu...
- Randomized-MLP Regularization Improves Domain Adaptation and Interpretability in DINOv2 : Abstract: Vision Transformers (ViTs), such as DINOv2, achieve strong performance across domains but often repurpose low-informative patch tokens in ways that reduce the interpretability of attention a...
- Automated Invoice Data Extraction: Using LLM and OCR : Abstract: Conventional Optical Character Recognition (OCR) systems are challenged by variant invoice layouts, handwritten text, and low-quality scans, which are often caused by strong template depend...
- In-Context-Learning-Assisted Quality Assessment Vision-Language Models for Metal Additive Manufacturing : Abstract: Vision-based quality assessment in additive manufacturing often requires dedicated machine learning models and application-specific datasets. However, data collection and model training can ...
- EVLP: Learning Unified Embodied Vision-Language Planner with Reinforced Supervised Fine-Tuning : Abstract: In complex embodied long-horizon manipulation tasks, effective task decomposition and execution require synergistic integration of textual logical reasoning and visual-spatial imagination to...
- Compressing Multi-Task Model for Autonomous Driving via Pruning and Knowledge Distillation : Abstract: Autonomous driving systems rely on panoptic perception to jointly handle object detection, drivable area segmentation, and lane line segmentation. Although multi-task learning is an effectiv...
- FilletRec: A Lightweight Graph Neural Network with Intrinsic Features for Automated Fillet Recognition : Abstract: Automated recognition and simplification of fillet features in CAD models is critical for CAE analysis, yet it remains an open challenge. Traditional rule-based methods lack robustness, whil...
- M2S2L: Mamba-based Multi-Scale Spatial-temporal Learning for Video Anomaly Detection : Abstract: Video anomaly detection (VAD) is an essential task in the image processing community with prospects in video surveillance, which faces fundamental challenges in balancing detection accuracy ...
- In-Context Adaptation of VLMs for Few-Shot Cell Detection in Optical Microscopy : Abstract: Foundation vision-language models (VLMs) excel on natural images, but their utility for biomedical microscopy remains underexplored. In this paper, we investigate how in-context learning ena...
- Efficient Online Continual Learning in Sensor-Based Human Activity Recognition : Abstract: Machine learning models for sensor-based human activity recognition (HAR) are expected to adapt post-deployment to recognize new activities and different ways of performing existing ones. To...
- C3-Diff: Super-resolving Spatial Transcriptomics via Cross-modal Cross-content Contrastive Diffusion Modelling : Abstract: The rapid advancement of spatial transcriptomics (ST), i.e., spatial gene expressions, has made it possible to measure gene expression within original tissue, enabling us to discover molecul...
- Video Text Preservation with Synthetic Text-Rich Videos : Abstract: While Text-To-Video (T2V) models have advanced rapidly, they continue to struggle with generating legible and coherent text within videos. In particular, existing models often fail to render...
- Elements of Active Continuous Learning and Uncertainty Self-Awareness: a Narrow Implementation for Face and Facial Expression Recognition : Abstract: Reflection on one's thought process and making corrections to it if there exists dissatisfaction in its performance is, perhaps, one of the essential traits of intelligence. However, such hi...
- DiffSwap++: 3D Latent-Controlled Diffusion for Identity-Preserving Face Swapping : Abstract: Diffusion-based approaches have recently achieved strong results in face swapping, offering improved visual quality over traditional GAN-based methods. However, even state-of-the-art models ...
- Google-MedGemma Based Abnormality Detection in Musculoskeletal radiographs : Abstract: This paper proposes a MedGemma-based framework for automatic abnormality detection in musculoskeletal radiographs. Departing from conventional autoencoder and neural network pipelines, the p...
- In-process 3D Deviation Mapping and Defect Monitoring (3D-DM2) in High Production-rate Robotic Additive Manufacturing : Abstract: Additive manufacturing (AM) is an emerging digital manufacturing technology to produce complex and freeform objects through a layer-wise deposition. High deposition rate robotic AM (HDRRAM) ...
- Walking the Schrödinger Bridge: A Direct Trajectory for Text-to-3D Generation : Abstract: Recent advancements in optimization-based text-to-3D generation heavily rely on distilling knowledge from pre-trained text-to-image diffusion models using techniques like Score Distillation ...
- Pose-Aware Multi-Level Motion Parsing for Action Quality Assessment : Abstract: Human pose serves as a cornerstone of action quality assessment (AQA), where subtle spatial-temporal variations in pose often distinguish excellence from mediocrity. In high-level competitio...
- Personalized Image Editing in Text-to-Image Diffusion Models via Collaborative Direct Preference Optimization : Abstract: Text-to-image (T2I) diffusion models have made remarkable strides in generating and editing high-fidelity images from text. Yet, these models remain fundamentally generic, failing to adapt t...
- Convolutional Fully-Connected Capsule Network (CFC-CapsNet): A Novel and Fast Capsule Network : Abstract: A Capsule Network (CapsNet) is a relatively new classifier and one of the possible successors of Convolutional Neural Networks (CNNs). CapsNet maintains the spatial hierarchies between the f...
- Culture in Action: Evaluating Text-to-Image Models through Social Activities : Abstract: Text-to-image (T2I) diffusion models achieve impressive photorealism by training on large-scale web data, but models inherit cultural biases and fail to depict underrepresented regions faith...
- Pedicle Screw Pairing and Registration for Screw Pose Estimation from Dual C-arm Images Using CAD Models : Abstract: Accurate matching of pedicle screws in both anteroposterior (AP) and lateral (LAT) images is critical for successful spinal decompression and stabilization during surgery. However, establish...
- Towards Better Ultrasound Video Segmentation Foundation Model: An Empirical study on SAM2 Finetuning from Data Perspective : Abstract: Ultrasound (US) video segmentation remains a challenging problem due to strong inter- and intra-dataset variability, motion artifacts, and limited annotated data. Although foundation models ...
- A Second-Order Attention Mechanism For Prostate Cancer Segmentation and Detection in Bi-Parametric MRI : Abstract: The detection of clinically significant prostate cancer lesions (csPCa) from biparametric magnetic resonance imaging (bp-MRI) has emerged as a noninvasive imaging technique for improving acc...
- TCSA-UDA: Text-Driven Cross-Semantic Alignment for Unsupervised Domain Adaptation in Medical Image Segmentation : Abstract: Unsupervised domain adaptation for medical image segmentation remains a significant challenge due to substantial domain shifts across imaging modalities, such as CT and MRI. While recent vis...
- Position-Prior-Guided Network for System Matrix Super-Resolution in Magnetic Particle Imaging : Abstract: Magnetic Particle Imaging (MPI) is a novel medical imaging modality. One of the established methods for MPI reconstruction is based on the System Matrix (SM). However, the calibration of the...
- MACMD: Multi-dilated Contextual Attention and Channel Mixer Decoding for Medical Image Segmentation : Abstract: Medical image segmentation faces challenges due to variations in anatomical structures. While convolutional neural networks (CNNs) effectively capture local features, they struggle with mode...
- LRANet++: Low-Rank Approximation Network for Accurate and Efficient Text Spotting : Abstract: End-to-end text spotting aims to jointly optimize text detection and recognition within a unified framework. Despite significant progress, designing an accurate and efficient end-to-end text...
- Hilbert-Guided Block-Sparse Local Attention : Abstract: The quadratic compute and memory costs of global self-attention severely limit its use in high-resolution images. Local attention reduces complexity by restricting attention to neighborhoods...
- TYrPPG: Uncomplicated and Enhanced Learning Capability rPPG for Remote Heart Rate Estimation : Abstract: Remote photoplethysmography (rPPG) can remotely extract physiological signals from RGB video, which has many advantages in detecting heart rate, such as low cost and no invasion to patients....
- Understanding Cross Task Generalization in Handwriting-Based Alzheimer's Screening via Vision Language Adaptation : Abstract: Alzheimer's disease is a prevalent neurodegenerative disorder for which early detection is critical. Handwriting, often disrupted in prodromal AD, provides a non-invasive and cost-effective wi...
- Point Cloud Segmentation of Integrated Circuits Package Substrates Surface Defects Using Causal Inference: Dataset Construction and Methodology : Abstract: The effective segmentation of 3D data is crucial for a wide range of industrial applications, especially for detecting subtle defects in the field of integrated circuits (IC). Ceramic packag...
- CGCE: Classifier-Guided Concept Erasure in Generative Models : Abstract: Recent advancements in large-scale generative models have enabled the creation of high-quality images and videos, but have also raised significant safety concerns regarding the generation of...
- Light-Field Dataset for Disparity Based Depth Estimation : Abstract: A Light Field (LF) camera consists of an additional two-dimensional array of micro-lenses placed between the main lens and sensor, compared to a conventional camera. The sensor pixels under ...
- Towards Frequency-Adaptive Learning for SAR Despeckling : Abstract: Synthetic Aperture Radar (SAR) images are inherently corrupted by speckle noise, limiting their utility in high-precision applications. While deep learning methods have shown promise in SAR ...
- Hybrid second-order gradient histogram based global low-rank sparse regression for robust face recognition : Abstract: Low-rank sparse regression models have been widely applied in the field of face recognition. To further address the challenges caused by complex occlusions and illumination variations, this ...
- Open-World 3D Scene Graph Generation for Retrieval-Augmented Reasoning : Abstract: Understanding 3D scenes in open-world settings poses fundamental challenges for vision and robotics, particularly due to the limitations of closed-vocabulary supervision and static annotatio...
- GABFusion: Rethinking Feature Fusion for Low-Bit Quantization of Multi-Task Networks : Abstract: Despite the effectiveness of quantization-aware training (QAT) in compressing deep neural networks, its performance on multi-task architectures often degrades significantly due to task-speci...
- Causal Tracing of Object Representations in Large Vision Language Models: Mechanistic Interpretability and Hallucination Mitigation : Abstract: Despite the remarkable advancements of Large Vision-Language Models (LVLMs), the mechanistic interpretability remains underexplored. Existing analyses are insufficiently comprehensive and la...
- CoMA: Complementary Masking and Hierarchical Dynamic Multi-Window Self-Attention in a Unified Pre-training Framework : Abstract: Masked Autoencoders (MAE) achieve self-supervised learning of image representations by randomly removing a portion of visual tokens and reconstructing the original image as a pretext task, t...
- AD-DAE: Unsupervised Modeling of Longitudinal Alzheimer's Disease Progression with Diffusion Auto-Encoder : Abstract: Generative modeling frameworks have emerged as an effective approach to capture high-dimensional image distributions from large datasets without requiring domain-specific knowledge, a capabi...
- Interaction-Centric Knowledge Infusion and Transfer for Open-Vocabulary Scene Graph Generation : Abstract: Open-vocabulary scene graph generation (OVSGG) extends traditional SGG by recognizing novel objects and relationships beyond predefined categories, leveraging the knowledge from pre-trained ...
- Global Multiple Extraction Network for Low-Resolution Facial Expression Recognition : Abstract: Facial expression recognition, as a vital computer vision task, is garnering significant attention and undergoing extensive research. Although facial expression recognition algorithms demons...
- Polymap: generating high definition map based on rasterized polygons : Abstract: The perception of high-definition maps is an integral component of environmental perception in autonomous driving systems. Existing research has often focused on online construction of high...
- Reperio-rPPG: Relational Temporal Graph Neural Networks for Periodicity Learning in Remote Physiological Measurement : Abstract: Remote photoplethysmography (rPPG) is an emerging contactless physiological sensing technique that leverages subtle color variations in facial videos to estimate vital signs such as heart ra...
- U(PM)$^2$: Unsupervised polygon matching with pre-trained models for challenging stereo images : Abstract: Stereo image matching is a fundamental task in computer vision, photogrammetry and remote sensing, but there is an almost unexplored field, i.e., polygon matching, which faces the following ...
- Adaptive Agent Selection and Interaction Network for Image-to-point cloud Registration : Abstract: Typical detection-free methods for image-to-point cloud registration leverage transformer-based architectures to aggregate cross-modal features and establish correspondences. However, they o...
- Commonality in Few: Few-Shot Multimodal Anomaly Detection via Hypergraph-Enhanced Memory : Abstract: Few-shot multimodal industrial anomaly detection is a critical yet underexplored task, offering the ability to quickly adapt to complex industrial scenarios. In few-shot settings, insufficie...
- Adapted Foundation Models for Breast MRI Triaging in Contrast-Enhanced and Non-Contrast Enhanced Protocols : Abstract: Background: Magnetic resonance imaging (MRI) has high sensitivity for breast cancer detection, but interpretation is time-consuming. Artificial intelligence may aid in pre-screening. Purpose...
- A Dual-Mode ViT-Conditioned Diffusion Framework with an Adaptive Conditioning Bridge for Breast Cancer Segmentation : Abstract: In breast ultrasound images, precise lesion segmentation is essential for early diagnosis; however, low contrast, speckle noise, and unclear boundaries make this difficult. Even though deep ...
- Exploring Category-level Articulated Object Pose Tracking on SE(3) Manifolds : Abstract: Articulated objects are prevalent in daily life and robotic manipulation tasks. However, compared to rigid objects, pose tracking for articulated objects remains an underexplored problem due...
- MALeR: Improving Compositional Fidelity in Layout-Guided Generation : Abstract: Recent advances in text-to-image models have enabled a new era of creative and controllable image generation. However, generating compositional scenes with multiple subjects and attributes r...
- How Reasoning Influences Intersectional Biases in Vision Language Models : Abstract: Vision Language Models (VLMs) are increasingly deployed across downstream tasks, yet their training data often encode social biases that surface in outputs. Unlike humans, who interpret imag...
- Distributed Deep Learning for Medical Image Denoising with Data Obfuscation : Abstract: Medical image denoising is essential for improving image quality while minimizing the exposure of sensitive information, particularly when working with large-scale clinical datasets. This st...
- One-Shot Knowledge Transfer for Scalable Person Re-Identification : Abstract: Edge computing in person re-identification (ReID) is crucial for reducing the load on central cloud servers and ensuring user privacy. Conventional compression methods for obtaining compact ...
- MiVID: Multi-Strategic Self-Supervision for Video Frame Interpolation using Diffusion Model : Abstract: Video Frame Interpolation (VFI) remains a cornerstone in video enhancement, enabling temporal upscaling for tasks like slow-motion rendering, frame rate conversion, and video restoration. Wh...
- Towards Implicit Aggregation: Robust Image Representation for Place Recognition in the Transformer Era : Abstract: Visual place recognition (VPR) is typically regarded as a specific image retrieval task, whose core lies in representing images as global descriptors. Over the past decade, dominant VPR meth...
- S2ML: Spatio-Spectral Mutual Learning for Depth Completion : Abstract: The raw depth images captured by RGB-D cameras using Time-of-Flight (TOF) or structured light often suffer from incomplete depth values due to weak reflections, boundary shadows, and artifac...
- ACE-ICD: Acronym Expansion As Data Augmentation For Automated ICD Coding : Abstract: Automatic ICD coding, the task of assigning disease and procedure codes to electronic medical records, is crucial for clinical documentation and billing. While existing methods primarily enh...
- FinRpt: Dataset, Evaluation System and LLM-based Multi-agent Framework for Equity Research Report Generation : Abstract: While LLMs have shown great success in financial tasks like stock prediction and question answering, their application in fully automating Equity Research Report generation remains uncharted...
- Selecting Auxiliary Data via Neural Tangent Kernels for Low-Resource Domains : Abstract: Large language models (LLMs) have achieved remarkable success across widespread tasks, yet their application in low-resource domains remains a significant challenge due to data scarcity and ...
- Retriv at BLP-2025 Task 2: Test-Driven Feedback-Guided Framework for Bangla-to-Python Code Generation : Abstract: Large Language Models (LLMs) have advanced the automated generation of code from natural language prompts. However, low-resource languages (LRLs) like Bangla remain underrepresented due to t...
- Mixed Semi-Supervised Generalized-Linear-Regression with Applications to Deep-Learning and Interpolators : Abstract: We present a methodology for using unlabeled data to design semi-supervised learning (SSL) methods that improve the predictive performance of supervised learning for regression tasks. The ma...
- Last-Iterate Convergence of Adaptive Riemannian Gradient Descent for Equilibrium Computation : Abstract: Equilibrium computation on Riemannian manifolds provides a unifying framework for numerous problems in machine learning and data analytics. One of the simplest yet most fundamental methods i...
- Causal Dynamic Variational Autoencoder for Counterfactual Regression in Longitudinal Data : Abstract: Accurately estimating treatment effects over time is crucial in fields such as precision medicine, epidemiology, economics, and marketing. Many current methods for estimating treatment effec...
- GPU Cluster Scheduling for Network-Sensitive Deep Learning : Abstract: We propose a novel GPU-cluster scheduler for distributed DL (DDL) workloads that enables proximity based consolidation of GPU resources based on the DDL jobs' sensitivities to the anticipate...
- Sample-Efficient "Clustering and Conquer" Procedures for Parallel Large-Scale Ranking and Selection : Abstract: This work aims to improve the sample efficiency of parallel large-scale ranking and selection (R&S) problems by leveraging correlation information. We modify the commonly used "divide and co...
- Large Language Model Empowered Next-Generation MIMO Networks: Fundamentals, Challenges, and Visions : Abstract: Next-generation Multiple-Input Multiple-Output (MIMO) is expected to be intelligent and scalable. In this paper, we study Large Language Model (LLM)-enabled next-generation MIMO networks. Fi...
- Contextual Linear Optimization with Partial Feedback : Abstract: Contextual linear optimization (CLO) uses predictive contextual features to reduce uncertainty in random cost coefficients in the objective and thereby improve decision-making performance. A...
- Quriosity: Analyzing Human Questioning Behavior and Causal Inquiry through Curiosity-Driven Queries : Abstract: Recent progress in Large Language Model (LLM) technology has changed our role in interacting with these models. Instead of primarily testing these models with questions we already know answe...
- Uniform Convergence of Adversarially Robust Classifiers : Abstract: In recent years there has been significant interest in the effect of different types of adversarial perturbations in data classification problems. Many of these models incorporate the advers...
- JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models : Abstract: The rapid evolution of artificial intelligence (AI) through developments in Large Language Models (LLMs) and Vision-Language Models (VLMs) has brought significant advancements across various...
- Employing Sentence Space Embedding for Classification of Data Stream from Fake News Domain : Abstract: Tabular data is considered the last unconquered castle of deep learning, yet the task of data stream classification is stated to be an equally important and demanding research area. Due to t...
- Describe Where You Are: Improving Noise-Robustness for Speech Emotion Recognition with Text Description of the Environment : Abstract: Speech emotion recognition (SER) systems often struggle in real-world environments, where ambient noise severely degrades their performance. This paper explores a novel approach that exploit...
- Disturbance-based Discretization, Differentiable IDS Channel, and an IDS-Correcting Code for DNA-based Storage : Abstract: With recent advancements in next-generation data storage, especially in biological molecule-based storage, insertion, deletion, and substitution (IDS) error-correcting codes have garnered in...
- Finite sample learning of moving targets : Abstract: We consider a moving target that we seek to learn from samples. Our results extend randomized techniques developed in control and optimization for a constant target to the case where the tar...
- Gradient descent with adaptive stepsize converges (nearly) linearly under fourth-order growth : Abstract: A prevalent belief among optimization specialists is that linear convergence of gradient descent is contingent on the function growing quadratically away from its minimizers. In this work, w...
- Conceptual Belief-Informed Reinforcement Learning : Abstract: Reinforcement learning (RL) has achieved significant success but is hindered by inefficiency and instability, relying on large amounts of trial-and-error data and failing to efficiently use ...
- Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning : Abstract: As text-to-image diffusion models gain widespread commercial applications, there are increasing concerns about unethical or harmful use, including the unauthorized generation of copyrighted ...
- Grouped Discrete Representation for Object-Centric Learning : Abstract: Object-Centric Learning (OCL) aims to discover objects in images or videos by reconstructing the input. Representative methods achieve this by reconstructing the input as its Variational Aut...
- SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models : Abstract: Diffusion models can effectively generate high-quality images. However, as they scale, rising memory demands and higher latency pose substantial deployment challenges. In this work, we aim t...
- How do Machine Learning Models Change? : Abstract: The proliferation of Machine Learning (ML) models and their open-source implementations has transformed Artificial Intelligence research and applications. Platforms like Hugging Face (HF) en...
- Sparsifying Suprema of Gaussian Processes : Abstract: We give a dimension-independent sparsification result for suprema of centered Gaussian processes: Let $T$ be any (possibly infinite) bounded set of vectors in $\mathbb{R}^n$, and let $\{\bol...
- Stochastic interior-point methods for smooth conic optimization with applications : Abstract: Conic optimization plays a crucial role in many machine learning (ML) problems. However, practical algorithms for conic constrained ML problems with large datasets are often limited to speci...
- Mitigating Sexual Content Generation via Embedding Distortion in Text-conditioned Diffusion Models : Abstract: Diffusion models show remarkable image generation performance following text prompts, but risk generating sexual content. Existing approaches, such as prompt filtering, concept removal, and...
- On (Approximate) Pareto Optimality for the Multinomial Logistic Bandit : Abstract: We provide a new online learning algorithm for tackling the Multinomial Logit Bandit (MNL-Bandit) problem. Despite the challenges posed by the combinatorial nature of the MNL model, we devel...
- Learning Task Representations from In-Context Learning : Abstract: Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning (ICL), where models adapt to new tasks through example-based prompts without requiring parameter ...
- PCS: Perceived Confidence Scoring of Black Box LLMs with Metamorphic Relations : Abstract: Zero-shot LLMs are now also used for textual classification tasks, e.g., sentiment and bias detection in a sentence or article. However, their performance can be suboptimal in such data anno...
- ImitDiff: Transferring Foundation-Model Priors for Distraction Robust Visuomotor Policy : Abstract: Visuomotor imitation learning policies enable robots to efficiently acquire manipulation skills from visual demonstrations. However, as scene complexity and visual distractions increase, pol...
- PPC-GPT: Federated Task-Specific Compression of Large Language Models via Pruning and Chain-of-Thought Distillation : Abstract: Compressing Large Language Models (LLMs) into task-specific Small Language Models (SLMs) encounters two significant challenges: safeguarding domain-specific knowledge privacy and managing li...
- Forecasting intermittent time series with Gaussian Processes and Tweedie likelihood : Abstract: We adopt Gaussian Processes (GPs) as latent functions for probabilistic forecasting of intermittent time series. The model is trained in a Bayesian framework that accounts for the uncertaint...
- Topology-Aware Conformal Prediction for Stream Networks : Abstract: Stream networks, a unique class of spatiotemporal graphs, exhibit complex directional flow constraints and evolving dependencies, making uncertainty quantification a critical yet challenging...
- Seismic inversion using hybrid quantum neural networks : Abstract: Seismic inversion, including post-stack, pre-stack, and full waveform inversion, is compute- and memory-intensive. Recently, several approaches, including physics-informed machine learning, hav...
- Quantitative Evaluation of Quantum/Classical Neural Network Using a Game Solver Metric : Abstract: To evaluate the performance of quantum computing systems relative to classical counterparts and explore the potential, we propose a game-solving benchmark based on Elo ratings in the game of...
- Stacking Variational Bayesian Monte Carlo : Abstract: Approximate Bayesian inference for models with computationally expensive, black-box likelihoods poses a significant challenge, especially when the posterior distribution is complex. Many inf...
- ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness : Abstract: Color plays an important role in human perception and usually provides critical clues in visual reasoning. However, it is unclear whether and how vision-language models (VLMs) can perceive, ...
- Environment-Aware Indoor LoRaWAN Ranging Using Path Loss Model Inversion and Adaptive RSSI Filtering : Abstract: Achieving sub-10 m indoor ranging with LoRaWAN is difficult because multipath, human blockage, and micro-climate dynamics induce non-stationary attenuation in received signal strength indica...
- Probabilistic Wind Power Modelling via Heteroscedastic Non-Stationary Gaussian Processes : Abstract: Accurate probabilistic prediction of wind power is crucial for maintaining grid stability and facilitating the efficient integration of renewable energy sources. Gaussian process (GP) models...
- Revisiting Stochastic Approximation and Stochastic Gradient Descent : Abstract: In this paper, we introduce a new approach to proving the convergence of the Stochastic Approximation (SA) and the Stochastic Gradient Descent (SGD) algorithms. The new approach is based on ...
- Data-assimilated model-informed reinforcement learning : Abstract: The control of spatio-temporal chaos is challenging because of high dimensionality and unpredictability. Model-free reinforcement learning (RL) discovers optimal control policies by intera...
- From Average-Iterate to Last-Iterate Convergence in Games: A Reduction and Its Applications : Abstract: The convergence of online learning algorithms in games under self-play is a fundamental question in game theory and machine learning. Among various notions of convergence, last-iterate conve...
- Differentially Private Distribution Release of Gaussian Mixture Models via KL-Divergence Minimization : Abstract: Gaussian Mixture Models (GMMs) are widely used statistical models for representing multi-modal data distributions, with numerous applications in data mining, pattern recognition, data simula...
- FlashMoE: Fast Distributed MoE in a Single Kernel : Abstract: The computational sparsity of Mixture-of-Experts (MoE) models enables sub-linear growth in compute cost as model size increases, thus offering a scalable path to training massive neural netw...
- Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion : Abstract: We introduce Self Forcing, a novel training paradigm for autoregressive video diffusion models. It addresses the longstanding issue of exposure bias, where models trained on ground-truth con...
- CT Radiomics-Based Explainable Machine Learning Model for Accurate Differentiation of Malignant and Benign Endometrial Tumors: A Two-Center Study : Abstract: This study aimed to develop and validate a CT radiomics-based explainable machine learning model for precisely diagnosing malignancy and benignity specifically in endometrial cancer (EC) patients. A tota...
- ReCode: Updating Code API Knowledge with Reinforcement Learning : Abstract: Large Language Models (LLMs) exhibit remarkable code generation capabilities but falter when adapting to frequent updates in external library APIs. This critical limitation, stemming from re...
- Visual Structures Helps Visual Reasoning: Addressing the Binding Problem in VLMs : Abstract: Despite progress in Large Vision-Language Models (LVLMs), their capacity for visual reasoning is often limited by the binding problem: the failure to reliably associate perceptual features w...
- When Person Re-Identification Meets Event Camera: A Benchmark Dataset and An Attribute-guided Re-Identification Framework : Abstract: Recent researchers have proposed using event cameras for person re-identification (ReID) due to their promising performance and better balance in terms of privacy protection, event camera-ba...
- Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation : Abstract: Existing speech models suffer from competing requirements on token representations by understanding and generation tasks. This discrepancy in representation prevents speech language models f...
- Retracing the Past: LLMs Emit Training Data When They Get Lost : Abstract: The memorization of training data in large language models (LLMs) poses significant privacy and copyright concerns. Existing data extraction methods, particularly heuristic-based divergence ...
- Beyond One-Size-Fits-All: Personalized Harmful Content Detection with In-Context Learning : Abstract: The proliferation of harmful online content (e.g., toxicity, spam, and negative sentiment) demands robust and adaptable moderation systems. However, prevailing moderation systems are central...
- MCP4IFC: IFC-Based Building Design Using Large Language Models : Abstract: Bringing generative AI into the architecture, engineering and construction (AEC) field requires systems that can translate natural language instructions into actions on standardized data mod...
- FlowMM: Cross-Modal Information Flow Guided KV Cache Merging for Efficient Multimodal Context Inference : Abstract: Traditional KV cache eviction strategies, which discard less critical KV-pairs based on attention scores, often degrade generation quality, causing context loss or hallucinations. Recent eff...
- Future of AI Models: A Computational perspective on Model collapse : Abstract: Artificial Intelligence, especially Large Language Models (LLMs), has transformed domains such as software engineering, journalism, creative writing, academia, and media (Naveed et al. 2025;...
- Sample-Efficient Language Modeling with Linear Attention and Lightweight Enhancements : Abstract: We study architectural and optimization techniques for sample-efficient language modeling under the constraints of the BabyLM 2025 shared task. Our model, BLaLM, replaces self-attention wi...
- UTF-8 Plumbing: Byte-level Tokenizers Unavoidably Enable LLMs to Generate Ill-formed UTF-8 : Abstract: Subword tokenization segments input text according to a pre-defined vocabulary to feed it into a language model; the language model, in turn, generates a sequence made from this same vocabul...
- OckBench: Measuring the Efficiency of LLM Reasoning : Abstract: Large language models such as GPT-4, Claude 3, and the Gemini series have improved automated reasoning and code generation. However, existing benchmarks mainly focus on accuracy and output q...
- In-Context Learning Without Copying : Abstract: Induction heads are attention heads that perform inductive copying by matching patterns from earlier context and copying their continuations verbatim. As models develop induction heads, they...
- Multi-Scale Feature Fusion and Graph Neural Network Integration for Text Classification with Large Language Models : Abstract: This study investigates a hybrid method for text classification that integrates deep feature extraction from large language models, multi-scale fusion through feature pyramids, and structure...
- Quantifying Edits Decay in Fine-tuned LLMs : Abstract: Knowledge editing has emerged as a lightweight alternative to retraining for correcting or injecting specific facts in large language models (LLMs). Meanwhile, fine-tuning remains the defaul...
- Retrieval-Augmented Generation in Medicine: A Scoping Review of Technical Implementations, Clinical Applications, and Ethical Considerations : Abstract: The rapid growth of medical knowledge and increasing complexity of clinical practice pose challenges. In this context, large language models (LLMs) have demonstrated value; however, inherent...
- NILC: Discovering New Intents with LLM-assisted Clustering : Abstract: New intent discovery (NID) seeks to recognize both new and known intents from unlabeled user utterances, which finds prevalent use in practical dialogue systems. Existing works towards NID m...
- IDALC: A Semi-Supervised Framework for Intent Detection and Active Learning based Correction : Abstract: Voice-controlled dialog systems have become immensely popular due to their ability to perform a wide range of actions in response to diverse user queries. These agents possess a predefined s...
- Reinforcement Learning Improves Traversal of Hierarchical Knowledge in LLMs : Abstract: Reinforcement learning (RL) is often credited with improving language model reasoning and generalization at the expense of degrading memorized knowledge. We challenge this narrative by obser...
- LLMs Do Not See Age: Assessing Demographic Bias in Automated Systematic Review Synthesis : Abstract: Clinical interventions often hinge on age: medications and procedures safe for adults may be harmful to children or ineffective for older adults. However, as language models are increasingly...
- Multi-Reward GRPO Fine-Tuning for De-biasing Large Language Models: A Study Based on Chinese-Context Discrimination Data : Abstract: Large Language Models (LLMs) often exhibit implicit biases and discriminatory tendencies that reflect underlying social stereotypes. While recent alignment techniques such as RLHF and DPO ha...
- Efficient Hate Speech Detection: A Three-Layer LoRA-Tuned BERTweet Framework : Abstract: This paper addresses the critical challenge of developing computationally efficient hate speech detection systems that maintain competitive performance while being practical for real-time de...
- ReMoD: Rethinking Modality Contribution in Multimodal Stance Detection via Dual Reasoning : Abstract: Multimodal Stance Detection (MSD) is a crucial task for understanding public opinion on social media. Existing work simply fuses information from various modalities to learn stance represent...
- Automating Hardware Design and Verification from Architectural Papers via a Neural-Symbolic Graph Framework : Abstract: The reproduction of hardware architectures from academic papers remains a significant challenge due to the lack of publicly available source code and the complexity of hardware description l...
- Evaluation of retrieval-based QA on QUEST-LOFT : Abstract: Despite the popularity of retrieval-augmented generation (RAG) as a solution for grounded QA in both academia and industry, current RAG methods struggle with questions where the necessary in...
- Referring Expressions as a Lens into Spatial Language Grounding in Vision-Language Models : Abstract: Spatial Reasoning is an important component of human cognition and is an area in which the latest Vision-language models (VLMs) show signs of difficulty. The current analysis works use image...
- BookAsSumQA: An Evaluation Framework for Aspect-Based Book Summarization via Question Answering : Abstract: Aspect-based summarization aims to generate summaries that highlight specific aspects of a text, enabling more personalized and targeted summaries. However, its application to books remains ...
- Confidence-Guided Stepwise Model Routing for Cost-Efficient Reasoning : Abstract: Recent advances in Large Language Models (LLMs), particularly model scaling and test-time techniques, have greatly enhanced the reasoning capabilities of language models at the expense of ...
- Explicit Knowledge-Guided In-Context Learning for Early Detection of Alzheimer's Disease : Abstract: Detecting Alzheimer's Disease (AD) from narrative transcripts remains a challenging task for large language models (LLMs), particularly under out-of-distribution (OOD) and data-scarce condit...
- SPA: Achieving Consensus in LLM Alignment via Self-Priority Optimization : Abstract: In high-stakes scenarios, such as self-harm, legal, or medical queries, LLMs must be both trustworthy and helpful. However, these goals often conflict. We propose priority alignment, a new ali...
- Overview of CHIP 2025 Shared Task 2: Discharge Medication Recommendation for Metabolic Diseases Based on Chinese Electronic Health Records : Abstract: Discharge medication recommendation plays a critical role in ensuring treatment continuity, preventing readmission, and improving long-term management for patients with chronic metabolic dis...
- Analyzing and Mitigating Negation Artifacts using Data Augmentation for Improving ELECTRA-Small Model Accuracy : Abstract: Pre-trained models for natural language inference (NLI) often achieve high performance on benchmark datasets by using spurious correlations, or dataset artifacts, rather than understanding l...
- TimeSense: Making Large Language Models Proficient in Time-Series Analysis : Abstract: In the time-series domain, an increasing number of works combine text with temporal data to leverage the reasoning capabilities of large language models (LLMs) for various downstream time-se...
- HatePrototypes: Interpretable and Transferable Representations for Implicit and Explicit Hate Speech Detection : Abstract: Optimization of offensive content moderation models for different types of hateful messages is typically achieved through continued pre-training or fine-tuning on new hate speech benchmarks....
- SugarTextNet: A Transformer-Based Framework for Detecting Sugar Dating-Related Content on Social Media with Context-Aware Focal Loss : Abstract: Sugar dating-related content has rapidly proliferated on mainstream social media platforms, giving rise to serious societal and regulatory concerns, including commercialization of intimate r...
- How Well Do LLMs Understand Drug Mechanisms? A Knowledge + Reasoning Evaluation Dataset : Abstract: Two scientific fields showing increasing interest in pre-trained large language models (LLMs) are drug development / repurposing, and personalized medicine. For both, LLMs have to demonstrat...
- Dutch Metaphor Extraction from Cancer Patients' Interviews and Forum Data using LLMs and Human in the Loop : Abstract: Metaphors and metaphorical language (MLs) play an important role in healthcare communication between clinicians, patients, and patients' family members. In this work, we focus on Dutch langu...
- SR-KI: Scalable and Real-Time Knowledge Integration into LLMs via Supervised Attention : Abstract: This paper proposes SR-KI, a novel approach for integrating real-time and large-scale structured knowledge bases (KBs) into large language models (LLMs). SR-KI begins by encoding KBs into ke...
- Rethinking what Matters: Effective and Robust Multilingual Realignment for Low-Resource Languages : Abstract: Realignment is a promising strategy to improve cross-lingual transfer in multilingual language models. However, empirical results are mixed and often unreliable, particularly for typological...
- You Had One Job: Per-Task Quantization Using LLMs' Hidden Representations : Abstract: Large Language Models (LLMs) excel across diverse tasks, yet many applications require only limited capabilities, making large variants inefficient in memory and latency. Existing approaches...
- Better Datasets Start From RefineLab: Automatic Optimization for High-Quality Dataset Refinement : Abstract: High-quality Question-Answer (QA) datasets are foundational for reliable Large Language Model (LLM) evaluation, yet even expert-crafted datasets exhibit persistent gaps in domain coverage, m...
- Ibom NLP: A Step Toward Inclusive Natural Language Processing for Nigeria's Minority Languages : Abstract: Nigeria is the most populous country in Africa with a population of more than 200 million people. More than 500 languages are spoken in Nigeria and it is one of the most linguistically diver...
- MedVoiceBias: A Controlled Study of Audio LLM Behavior in Clinical Decision-Making : Abstract: As large language models transition from text-based interfaces to audio interactions in clinical settings, they might introduce new vulnerabilities through paralinguistic cues in audio. We e...
- Duality-based Mode Operations and Pyramid Multilayer Mapping for Rhetorical Modes : Abstract: Rhetorical modes are useful in both academic and non-academic writing, and can be subjects to be studied within linguistic research and computational modeling. Establishing a conceptual brid...
- How AI Fails: An Interactive Pedagogical Tool for Demonstrating Dialectal Bias in Automated Toxicity Models : Abstract: Now that AI-driven moderation has become pervasive in everyday life, we often hear claims that "the AI is biased". While this is often said jokingly, the light-hearted remark reflects a deep...
- Steering LLMs toward Korean Local Speech: Iterative Refinement Framework for Faithful Dialect Translation : Abstract: Standard-to-dialect machine translation remains challenging due to a persistent dialect gap in large language models and evaluation distortions inherent in n-gram metrics, which favor source...
- Textual Self-attention Network: Test-Time Preference Optimization through Textual Gradient-based Attention : Abstract: Large Language Models (LLMs) have demonstrated remarkable generalization capabilities, but aligning their outputs with human preferences typically requires expensive supervised fine-tuning. ...
- Sentiment Analysis On YouTube Comments Using Machine Learning Techniques Based On Video Games Content : Abstract: The rapid evolution of the gaming industry, driven by technological advancements and a burgeoning community, necessitates a deeper understanding of user sentiments, especially as expressed o...
- Rethinking Retrieval-Augmented Generation for Medicine: A Large-Scale, Systematic Expert Evaluation and Practical Insights : Abstract: Large language models (LLMs) are transforming the landscape of medicine, yet two fundamental challenges persist: keeping up with rapidly evolving medical knowledge and providing verifiable, ...
- Sensitivity of Small Language Models to Fine-tuning Data Contamination : Abstract: Small Language Models (SLMs) are increasingly being deployed in resource-constrained environments, yet their behavioral robustness to data contamination during instruction tuning remains poo...
- SAFENLIDB: A Privacy-Preserving Safety Alignment Framework for LLM-based Natural Language Database Interfaces : Abstract: The rapid advancement of Large Language Models (LLMs) has driven significant progress in Natural Language Interface to Database (NLIDB). However, the widespread adoption of LLMs has raised c...
- Beyond Plain Demos: A Demo-centric Anchoring Paradigm for In-Context Learning in Alzheimer's Disease Detection : Abstract: Detecting Alzheimer's disease (AD) from narrative transcripts challenges large language models (LLMs): pre-training rarely covers this out-of-distribution task, and all transcript demos desc...
- CLiFT-ASR: A Cross-Lingual Fine-Tuning Framework for Low-Resource Taiwanese Hokkien Speech Recognition : Abstract: Automatic speech recognition (ASR) for low-resource languages such as Taiwanese Hokkien is difficult due to the scarcity of annotated data. However, direct fine-tuning on Han-character trans...
- EduGuardBench: A Holistic Benchmark for Evaluating the Pedagogical Fidelity and Adversarial Safety of LLMs as Simulated Teachers : Abstract: Large Language Models for Simulating Professions (SP-LLMs), particularly as teachers, are pivotal for personalized education. However, ensuring their professional competence and ethical safe...
- RPTS: Tree-Structured Reasoning Process Scoring for Faithful Multimodal Evaluation : Abstract: Large Vision-Language Models (LVLMs) excel in multimodal reasoning and have shown impressive performance on various multimodal benchmarks. However, most of these benchmarks evaluate models p...
- HLPD: Aligning LLMs to Human Language Preference for Machine-Revised Text Detection : Abstract: To prevent misinformation and social issues arising from trustworthy-looking content generated by LLMs, it is crucial to develop efficient and reliable methods for identifying the source of ...
- SCOPE: Intrinsic Semantic Space Control for Mitigating Copyright Infringement in LLMs : Abstract: Large language models sometimes inadvertently reproduce passages that are copyrighted, exposing downstream applications to legal risk. Most existing studies for inference-time defences focus...
- Automated Circuit Interpretation via Probe Prompting : Abstract: Mechanistic interpretability aims to understand neural networks by identifying which learned features mediate specific behaviors. Attribution graphs reveal these feature pathways, but interp...
- Beyond English: Toward Inclusive and Scalable Multilingual Machine Translation with LLMs : Abstract: Large language models have significantly advanced Multilingual Machine Translation (MMT), yet the broad language coverage, consistent translation quality, and English-centric bias remain ope...
- A Picture is Worth a Thousand (Correct) Captions: A Vision-Guided Judge-Corrector System for Multimodal Machine Translation : Abstract: In this paper, we describe our system under the team name BLEU Monday for the English-to-Indic Multimodal Translation Task at WAT 2025. We participate in the text-only translation tasks for ...
- Llama-Embed-Nemotron-8B: A Universal Text Embedding Model for Multilingual and Cross-Lingual Tasks : Abstract: We introduce llama-embed-nemotron-8b, an open-weights text embedding model that achieves state-of-the-art performance on the Multilingual Massive Text Embedding Benchmark (MMTEB) leaderboard...
- Evaluating Large Language Models for Anxiety, Depression, and Stress Detection: Insights into Prompting Strategies and Synthetic Data : Abstract: Mental health disorders affect over one-fifth of adults globally, yet detecting such conditions from text remains challenging due to the subtle and varied nature of symptom expression. This ...
- Importance-Aware Data Selection for Efficient LLM Instruction Tuning : Abstract: Instruction tuning plays a critical role in enhancing the performance and efficiency of Large Language Models (LLMs). Its success depends not only on the quality of the instruction data but ...
- EmoBang: Detecting Emotion From Bengali Texts : Abstract: Emotion detection from text seeks to identify an individual's emotional or mental state - positive, negative, or neutral - based on linguistic cues. While significant progress has been made ...
- Wasm: A Pipeline for Constructing Structured Arabic Interleaved Multimodal Corpora : Abstract: The performance of large language models (LLMs) and large multimodal models (LMMs) depends heavily on the quality and scale of their pre-training datasets. Recent research shows that large m...
- More Agents Helps but Adversarial Robustness Gap Persists : Abstract: When LLM agents work together, they seem to be more powerful than a single LLM in mathematical question answering. However, are they also more robust to adversarial inputs? We investigate th...
- TCM-Eval: An Expert-Level Dynamic and Extensible Benchmark for Traditional Chinese Medicine : Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in modern medicine, yet their application in Traditional Chinese Medicine (TCM) remains severely limited by the absence...
- Categorical Emotions or Appraisals - Which Emotion Model Explains Argument Convincingness Better? : Abstract: The convincingness of an argument does not only depend on its structure (logos), the person who makes the argument (ethos), but also on the emotion that it causes in the recipient (pathos). ...
- AdaRec: Adaptive Recommendation with LLMs via Narrative Profiling and Dual-Channel Reasoning : Abstract: We propose AdaRec, a few-shot in-context learning framework that leverages large language models for an adaptive personalized recommendation. AdaRec introduces narrative profiling, transform...
- EMODIS: A Benchmark for Context-Dependent Emoji Disambiguation in Large Language Models : Abstract: Large language models (LLMs) are increasingly deployed in real-world communication settings, yet their ability to resolve context-dependent ambiguity remains underexplored. In this work, we ...
- Discourse Graph Guided Document Translation with Large Language Models : Abstract: Adapting large language models to full document translation remains challenging due to the difficulty of capturing long-range dependencies and preserving discourse coherence throughout exten...
- Who Is the Story About? Protagonist Entity Recognition in News : Abstract: News articles often reference numerous organizations, but traditional Named Entity Recognition (NER) treats all mentions equally, obscuring which entities genuinely drive the narrative. This...
- Retriv at BLP-2025 Task 1: A Transformer Ensemble and Multi-Task Learning Approach for Bangla Hate Speech Identification : Abstract: This paper addresses the problem of Bangla hate speech identification, a socially impactful yet linguistically challenging task. As part of the "Bangla Multi-task Hate Speech Identification"...
- Deeper or Wider: A Perspective from Optimal Generalization Error with Sobolev Loss : Abstract: Constructing the architecture of a neural network is a challenging pursuit for the machine learning community, and the dilemma of whether to go deeper or wider remains a persistent question....
- Diffusion Posterior Sampling is Computationally Intractable : Abstract: Diffusion models are a remarkably effective way of learning and sampling from a distribution $p(x)$. In posterior sampling, one is also given a measurement model $p(y \mid x)$ and a measurem...
- Optimization without Retraction on the Random Generalized Stiefel Manifold : Abstract: Optimization over the set of matrices $X$ that satisfy $X^\top B X = I_p$, referred to as the generalized Stiefel manifold, appears in many applications involving sampled covariance matrices...
- $\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers : Abstract: Learned optimizers (LOs) have the potential to significantly reduce the wall-clock training time of neural networks. However, they can struggle to optimize unseen tasks (\emph{meta-generaliz...
- GC4NC: A Benchmark Framework for Graph Condensation on Node Classification with New Insights : Abstract: Graph condensation (GC) is an emerging technique designed to learn a significantly smaller graph that retains the essential information of the original graph. This condensed graph has shown ...
- Addressing Polarization and Unfairness in Performative Prediction : Abstract: In many real-world applications of machine learning such as recommendations, hiring, and lending, deployed models influence the data they are trained on, leading to feedback loops between pr...
- Preference-Guided Reinforcement Learning for Efficient Exploration : Abstract: In this paper, we investigate preference-based reinforcement learning (PbRL), which enables reinforcement learning (RL) agents to learn from human feedback. This is particularly valuable whe...
- Hybrid Training for Enhanced Multi-task Generalization in Multi-agent Reinforcement Learning : Abstract: In multi-agent reinforcement learning (MARL), achieving multi-task generalization to diverse agents and objectives presents significant challenges. Existing online MARL algorithms primarily ...
- Condensed Data Expansion Using Model Inversion for Knowledge Distillation : Abstract: Condensed datasets offer a compact representation of larger datasets, but training models directly on them or using them to enhance model performance through knowledge distillation (KD) can ...
- MENSA: A Multi-Event Network for Survival Analysis with Trajectory-based Likelihood Estimation : Abstract: Most existing time-to-event methods focus on either single-event or competing-risks settings, leaving multi-event scenarios relatively underexplored. In many healthcare applications, for exa...
- The Dark Side of Rich Rewards: Understanding and Mitigating Noise in VLM Rewards : Abstract: While Vision-Language Models (VLMs) are increasingly used to generate reward signals for training embodied agents to follow instructions, our research reveals that agents guided by VLM rewar...
- FedMAC: Tackling Partial-Modality Missing in Federated Learning with Cross-Modal Aggregation and Contrastive Regularization : Abstract: Federated Learning (FL) is a method for training machine learning models using distributed data sources. It ensures privacy by allowing clients to collaboratively learn a shared global model...
- Understanding Forgetting in LLM Supervised Fine-Tuning and Preference Learning - A Convex Optimization Perspective : Abstract: The post-training of LLMs, which typically consists of the supervised fine-tuning (SFT) stage and the preference learning stage (RLHF or DPO), is crucial to effective and safe LLM applicatio...
- Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training : Abstract: Network pruning focuses on algorithms that aim to reduce a given model's computational cost by removing a subset of its parameters while having minimal impact on performance. Throughout the ...
- On the Convergence of Continual Federated Learning Using Incrementally Aggregated Gradients : Abstract: The holy grail of machine learning is to enable Continual Federated Learning (CFL) to enhance the efficiency, privacy, and scalability of AI systems while learning from streaming data. The p...
- Adaptive Group Robust Ensemble Knowledge Distillation : Abstract: Neural networks can learn spurious correlations in the data, often leading to performance degradation for underrepresented subgroups. Studies have demonstrated that the disparity is amplifie...
- Using Machine Learning to Discover Parsimonious and Physically-Interpretable Representations of Catchment-Scale Rainfall-Runoff Dynamics : Abstract: Due largely to challenges associated with physical interpretability of machine learning (ML) methods, and because model interpretability is key to credibility in management applications, man...
- Understanding and Mitigating Memorization in Diffusion Models for Tabular Data : Abstract: Tabular data generation has attracted significant research interest in recent years, with the tabular diffusion models greatly improving the quality of synthetic data. However, while memoriz...
- A solvable model of learning generative diffusion: theory and insights : Abstract: In this manuscript, we consider the problem of learning a flow or diffusion-based generative model parametrized by a two-layer auto-encoder, trained with online stochastic gradient descent, ...
- pMixFed: Efficient Personalized Federated Learning through Adaptive Layer-Wise Mixup : Abstract: Traditional Federated Learning (FL) methods encounter significant challenges when dealing with heterogeneous data and providing personalized solutions for non-IID scenarios. Personalized Fed...
- HumorReject: Decoupling LLM Safety from Refusal Prefix via A Little Humor : Abstract: Large Language Models (LLMs) commonly rely on explicit refusal prefixes for safety, making them vulnerable to prefix injection attacks. We introduce HumorReject, a novel data-driven approach...
- Distributional Surgery for Language Model Activations : Abstract: Language models, while capable of generating remarkably coherent and seemingly accurate text, can occasionally produce undesirable content, including harmful or toxic outputs. In this paper,...
- HyperSHAP: Shapley Values and Interactions for Explaining Hyperparameter Optimization : Abstract: Hyperparameter optimization (HPO) is a crucial step in achieving strong predictive performance. Yet, the impact of individual hyperparameters on model generalization is highly context-depend...
- Choose Your Model Size: Any Compression of Large Language Models Without Re-Computation : Abstract: The adoption of Foundation Models in resource-constrained environments remains challenging due to their large size and inference costs. A promising way to overcome these limitations is post-...
- Tight Bounds for Jensen's Gap with Applications to Variational Inference : Abstract: Since its original formulation, Jensen's inequality has played a fundamental role across mathematics, statistics, and machine learning, with its probabilistic version highlighting the nonneg...
- Understanding Representation Dynamics of Diffusion Models via Low-Dimensional Modeling : Abstract: Diffusion models, though originally designed for generative tasks, have demonstrated impressive self-supervised representation learning capabilities. A particularly intriguing phenomenon in ...
- SPIRIT: Short-term Prediction of solar IRradIance for zero-shot Transfer learning using Foundation Models : Abstract: Traditional solar forecasting models are based on several years of site-specific historical irradiance data, often spanning five or more years, which are unavailable for newer photovoltaic f...
- TrustChain: A Blockchain Framework for Auditing and Verifying Aggregators in Decentralized Federated Learning : Abstract: The server-less nature of Decentralized Federated Learning (DFL) requires allocating the aggregation role to specific participants in each federated round. Current DFL architectures ensure t...
- A Novel Loss Function for Deep Learning Based Daily Stock Trading System : Abstract: Making consistently profitable financial decisions in a continuously evolving and volatile stock market has always been a difficult task. Professionals from different disciplines have develo...
- Continual Pre-training of MoEs: How robust is your router? : Abstract: Sparsely-activated Mixture of Experts (MoE) transformers are promising architectures for foundation models. Compared to dense transformers that require the same amount of floating-point oper...
- Bayesian Network Structural Consensus via Greedy Min-Cut Analysis : Abstract: This paper presents the Min-Cut Bayesian Network Consensus (MCBNC) algorithm, a greedy method for structural consensus of Bayesian Networks (BNs), with applications in federated learning and...
- Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching : Abstract: Large Language Models (LLMs) exhibit pronounced memory-bound characteristics during inference due to High Bandwidth Memory (HBM) bandwidth constraints. In this paper, we propose an L2 Cache-...
- Robust Hallucination Detection in LLMs via Adaptive Token Selection : Abstract: Hallucinations in large language models (LLMs) pose significant safety concerns that impede their broader deployment. Recent research in hallucination detection has demonstrated that LLMs' i...
- Quantum Doubly Stochastic Transformers : Abstract: At the core of the Transformer, the softmax normalizes the attention matrix to be right stochastic. Previous research has shown that this often de-stabilizes training and that enforcing the ...
- When Bias Helps Learning: Bridging Initial Prejudice and Trainability : Abstract: Understanding the statistical properties of deep neural networks (DNNs) at initialization is crucial for elucidating both their trainability and the intrinsic architectural biases they encod...
- Private Statistical Estimation via Truncation : Abstract: We introduce a novel framework for differentially private (DP) statistical estimation via data truncation, addressing a key challenge in DP estimation when the data support is unbounded. Tra...
- Joint Velocity-Growth Flow Matching for Single-Cell Dynamics Modeling : Abstract: Learning the underlying dynamics of single cells from snapshot data has gained increasing attention in scientific and machine learning research. The destructive measurement technique and cel...
- The Energy Cost of Reasoning: Analyzing Energy Usage in LLMs with Test-time Compute : Abstract: Scaling large language models (LLMs) has driven significant advancements, yet it faces diminishing returns and escalating energy demands. This work explores how test-time compute (TTC) can s...
- Loss-Guided Auxiliary Agents for Overcoming Mode Collapse in GFlowNets : Abstract: Although Generative Flow Networks (GFlowNets) are designed to capture multiple modes of a reward function, they often suffer from mode collapse in practice, getting trapped in early-discover...
- Graph-Conditional Flow Matching for Relational Data Generation : Abstract: Data synthesis is gaining momentum as a privacy-enhancing technology. While single-table tabular data generation has seen considerable progress, current methods for multi-table data often la...
- Guided Diffusion Sampling on Function Spaces with Applications to PDEs : Abstract: We propose a general framework for conditional sampling in PDE-based inverse problems, targeting the recovery of whole solutions from extremely sparse or noisy measurements. This is accompli...
- On the Relation between Rectified Flows and Optimal Transport : Abstract: This paper investigates the connections between rectified flows, flow matching, and optimal transport. Flow matching is a recent approach to learning generative models by estimating velocity...
- Rotary Masked Autoencoders are Versatile Learners : Abstract: Applying Transformers to irregular time-series typically requires specializations to their baseline architecture, which can result in additional computational overhead and increased method c...
- Graph Flow Matching: Enhancing Image Generation with Neighbor-Aware Flow Fields : Abstract: Flow matching casts sample generation as learning a continuous-time velocity field that transports noise to data. Existing flow matching networks typically predict each point's velocity inde...
- From Invariant Representations to Invariant Data: Provable Robustness to Spurious Correlations via Noisy Counterfactual Matching : Abstract: Models that learn spurious correlations from training data often fail when deployed in new environments. While many methods aim to learn invariant representations to address this, they often...
- Differential privacy for medical deep learning: methods, tradeoffs, and deployment implications : Abstract: Differential privacy (DP) is a key technique for protecting sensitive patient data in medical deep learning (DL). As clinical models grow more data-dependent, balancing privacy with utility ...
- Causal Discovery in Dynamic Fading Wireless Networks : Abstract: Dynamic causal discovery in wireless networks is essential due to evolving interference, fading, and mobility, which complicate traditional static causal models. This paper addresses causal ...
- Dissecting Long-Chain-of-Thought Reasoning Models: An Empirical Study : Abstract: Despite recent progress in training long-chain-of-thought reasoning models via scaling reinforcement learning (RL), its underlying training dynamics remain poorly understood, and several cou...
- PyLO: Towards Accessible Learned Optimizers in PyTorch : Abstract: Learned optimizers have been an active research topic over the past decade, with increasing progress toward practical, general-purpose optimizers that can serve as drop-in replacements for w...
- Multiple Streams of Knowledge Retrieval: Enriching and Recalling in Transformers : Abstract: When an LLM learns a new fact during finetuning (e.g., new movie releases, newly elected pope, etc.), where does this information go? Are entities enriched with relation information, or do m...
- Learning Stochastic Multiscale Models : Abstract: The physical sciences are replete with dynamical systems that require the resolution of a wide range of length and time scales. This presents significant computational challenges since direc...
- Mirror Descent Policy Optimisation for Robust Constrained Markov Decision Processes : Abstract: Safety is an essential requirement for reinforcement learning systems. The newly emerging framework of robust constrained Markov decision processes allows learning policies that satisfy long...
- GPT, But Backwards: Exactly Inverting Language Model Outputs : Abstract: The task of reconstructing unknown textual inputs to language models is a fundamental auditing primitive that allows us to assess the model's vulnerability to a range of security issues, inc...
- Beyond Parallelism: Synergistic Computational Graph Effects in Multi-Head Attention : Abstract: Multi-head attention powers Transformer networks, the primary deep learning architecture behind the success of large language models (LLMs). Yet, the theoretical advantages of multi-head ver...
- Bridging the Plausibility-Validity Gap by Fine-Tuning a Reasoning-Enhanced LLM for Chemical Synthesis and Discovery : Abstract: Large Language Models frequently generate outputs that appear scientifically reasonable yet violate fundamental principles--a phenomenon we characterize as the "plausibility-validity gap." T...
- Monitoring Risks in Test-Time Adaptation : Abstract: Encountering shifted data at test time is a ubiquitous challenge when deploying predictive models. Test-time adaptation (TTA) methods address this issue by continuously adapting a deployed m...
- MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling : Abstract: Recent years have witnessed a growing interest for time series foundation models, with a strong emphasis on the forecasting task. Yet, the crucial task of out-of-domain imputation of missing...
- Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning : Abstract: Fine-tuning large language models (LLMs) can lead to unintended out-of-distribution generalization. Standard approaches to this problem rely on modifying training data, for example by adding...
- Dual-Branch Convolutional Framework for Spatial and Frequency-Based Image Forgery Detection : Abstract: With a very rapid increase in deepfakes and digital image forgeries, ensuring the authenticity of images is becoming increasingly challenging. This report introduces a forgery detection fram...
- An upper bound of the silhouette validation metric for clustering : Abstract: The silhouette coefficient quantifies, for each observation, the balance between within-cluster cohesion and between-cluster separation, taking values in [-1, 1]. The average silhouette widt...
- EASE: Practical and Efficient Safety Alignment for Small Language Models : Abstract: Small language models (SLMs) are increasingly deployed on edge devices, making their safety alignment crucial yet challenging. Current shallow alignment methods that rely on direct refusal o...
- FractalBench: Diagnosing Visual-Mathematical Reasoning Through Recursive Program Synthesis : Abstract: Mathematical reasoning requires abstracting symbolic rules from visual patterns -- inferring the infinite from the finite. We investigate whether multimodal AI systems possess this capabilit...
- Rep2Text: Decoding Full Text from a Single LLM Token Representation : Abstract: Large language models (LLMs) have achieved remarkable progress across diverse tasks, yet their internal mechanisms remain largely opaque. In this work, we address a fundamental question: to ...
- CoFineLLM: Conformal Finetuning of LLMs for Language-Instructed Robot Planning : Abstract: Large Language Models (LLMs) have recently emerged as planners for language-instructed agents, generating sequences of actions to accomplish natural language tasks. However, their reliabilit...
- TabRAG: Tabular Document Retrieval via Structured Language Representations : Abstract: Ingesting data for Retrieval-Augmented Generation (RAG) involves either fine-tuning the embedding model directly on the target corpus or parsing documents for embedding model encoding. The f...
- Learning Biomolecular Motion: The Physics-Informed Machine Learning Paradigm : Abstract: The convergence of statistical learning and molecular physics is transforming our approach to modeling biomolecular systems. Physics-informed machine learning (PIML) offers a systematic fram...
- GRAPH-GRPO-LEX: Contract Graph Modeling and Reinforcement Learning with Group Relative Policy Optimization : Abstract: Contracts are complex documents featuring detailed formal structures, explicit and implicit dependencies and rich semantic content. Given these document properties, contract drafting and man...
- Explainable Cross-Disease Reasoning for Cardiovascular Risk Assessment from LDCT : Abstract: Low-dose chest computed tomography (LDCT) inherently captures both pulmonary and cardiac structures, offering a unique opportunity for joint assessment of lung and cardiovascular health. How...
- Adaptive Testing for Segmenting Watermarked Texts From Language Models : Abstract: The rapid adoption of large language models (LLMs), such as GPT-4 and Claude 3.5, underscores the need to distinguish LLM-generated text from human-written content to mitigate the spread of ...
- GNN-Enabled Robust Hybrid Beamforming with Score-Based CSI Generation and Denoising : Abstract: Accurate Channel State Information (CSI) is critical for Hybrid Beamforming (HBF) tasks. However, obtaining high-resolution CSI remains challenging in practical wireless communication system...
- When Evidence Contradicts: Toward Safer Retrieval-Augmented Generation in Healthcare : Abstract: In high-stakes information domains such as healthcare, where large language models (LLMs) can produce hallucinations or misinformation, retrieval-augmented generation (RAG) has been proposed...
- Adam symmetry theorem: characterization of the convergence of the stochastic Adam optimizer : Abstract: Beside the standard stochastic gradient descent (SGD) method, the Adam optimizer due to Kingma & Ba (2014) is currently probably the best-known optimization method for the training of deep n...
- Flexible Concept Bottleneck Model : Abstract: Concept bottleneck models (CBMs) improve neural network interpretability by introducing an intermediate layer that maps human-understandable concepts to predictions. Recent work has explored...
- Lassoed Forests: Random Forests with Adaptive Lasso Post-selection : Abstract: Random forests are a statistical learning technique that use bootstrap aggregation to average high-variance and low-bias trees. Improvements to random forests, such as applying Lasso regress...
- The Wisdom of the Crowd: High-Fidelity Classification of Cyber-Attacks and Faults in Power Systems Using Ensemble and Machine Learning : Abstract: This paper presents a high-fidelity evaluation framework for machine learning (ML)-based classification of cyber-attacks and physical faults using electromagnetic transient simulations with ...
- SRNN: Spatiotemporal Relational Neural Network for Intuitive Physics Understanding : Abstract: Human prowess in intuitive physics remains unmatched by machines. To bridge this gap, we argue for a fundamental shift towards brain-inspired computational principles. This paper introduces ...
- Bilevel Learning via Inexact Stochastic Gradient Descent : Abstract: Bilevel optimization is a central tool in machine learning for high-dimensional hyperparameter tuning. Its applications are vast; for instance, in imaging it can be used for learning data-ad...
- OntoTune: Ontology-Driven Learning for Query Optimization with Convolutional Models : Abstract: Query optimization has been studied using machine learning, reinforcement learning, and, more recently, graph-based convolutional networks. Ontology, as a structured, information-rich knowle...
- HEDN: A Hard-Easy Dual Network with Task Difficulty Assessment for EEG Emotion Recognition : Abstract: Multi-source domain adaptation represents an effective approach to addressing individual differences in cross-subject EEG emotion recognition. However, existing methods treat all source doma...
- Learning to Fast Unrank in Collaborative Filtering Recommendation : Abstract: Modern data-driven recommendation systems risk memorizing sensitive user behavioral patterns, raising privacy concerns. Existing recommendation unlearning methods, while capable of removing ...
- MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning : Abstract: Multimodal large language models (MLLMs) have demonstrated remarkable capabilities in vision-language answering tasks. Despite their strengths, these models often encounter challenges in ach...
- Convergence of Actor-Critic Learning for Mean Field Games and Mean Field Control in Continuous Spaces : Abstract: We establish the convergence of the deep actor-critic reinforcement learning algorithm presented in [Angiuli et al., 2023a] in the setting of continuous state and action spaces with an infin...
- Learning to Focus: Focal Attention for Selective and Scalable Transformers : Abstract: Attention is a core component of transformer architecture, whether encoder-only, decoder-only, or encoder-decoder model. However, the standard softmax attention often produces noisy probabil...
- Dimensionality reduction and width of deep neural networks based on topological degree theory : Abstract: In this paper we present a mathematical framework on linking of embeddings of compact topological spaces into Euclidean spaces and separability of linked embeddings under a specific class of...
- P3-LLM: An Integrated NPU-PIM Accelerator for LLM Inference Using Hybrid Numerical Formats : Abstract: The substantial memory bandwidth and computational demand of large language models (LLMs) present critical challenges for efficient inference. To tackle this, the literature has explored het...
- Differentiated Directional Intervention: A Framework for Evading LLM Safety Alignment : Abstract: Safety alignment instills in Large Language Models (LLMs) a critical capacity to refuse malicious requests. Prior works have modeled this refusal mechanism as a single linear direction in th...
- Inclusion of Role into Named Entity Recognition and Ranking : Abstract: Most of the Natural Language Processing systems are involved in entity-based processing for several tasks like Information Extraction, Question-Answering, Text-Summarization and so on. A n...
- Fine-Tuning Diffusion-Based Recommender Systems via Reinforcement Learning with Reward Function Optimization : Abstract: Diffusion models recently emerged as a powerful paradigm for recommender systems, offering state-of-the-art performance by modeling the generative process of user-item interactions. However,...
- TrueCity: Real and Simulated Urban Data for Cross-Domain 3D Scene Understanding : Abstract: 3D semantic scene understanding remains a long-standing challenge in the 3D computer vision community. One of the key issues pertains to limited real-world annotated data to facilitate gener...
- Multilingual Lexical Feature Analysis of Spoken Language for Predicting Major Depression Symptom Severity : Abstract: Background: Captured between clinical appointments using mobile devices, spoken language has potential for objective, more regular assessment of symptom severity and earlier detection of rel...
- Anatomy-Aware Lymphoma Lesion Detection in Whole-Body PET/CT : Abstract: Early cancer detection is crucial for improving patient outcomes, and 18F FDG PET/CT imaging plays a vital role by combining metabolic and anatomical information. Accurate lesion detection r...
- When Sufficient is not Enough: Utilizing the Rashomon Effect for Complete Evidence Extraction : Abstract: Feature attribution methods typically provide minimal sufficient evidence justifying a model decision. However, in many applications this is inadequate. For compliance and cataloging, the fu...
- Aligning Attention with Human Rationales for Self-Explaining Hate Speech Detection : Abstract: The opaque nature of deep learning models presents significant challenges for the ethical deployment of hate speech detection systems. To address this limitation, we introduce Supervised Rat...
- ClusterMine: Robust Label-Free Visual Out-Of-Distribution Detection via Concept Mining from Text Corpora : Abstract: Large-scale visual out-of-distribution (OOD) detection has witnessed remarkable progress by leveraging vision-language models such as CLIP. However, a significant limitation of current metho...
- RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services : Abstract: As a key medium for human interaction and information exchange, social networking services (SNS) pose unique challenges for large language models (LLMs): heterogeneous workloads, fast-shifti...
- Sample-efficient quantum error mitigation via classical learning surrogates : Abstract: The pursuit of practical quantum utility on near-term quantum processors is critically challenged by their inherent noise. Quantum error mitigation (QEM) techniques are leading solutions to ...
- E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis : Abstract: Recent advancements in speech synthesis technology have enriched our daily lives, with high-quality and human-like audio widely adopted across real-world applications. However, malicious exp...
- A Provably-Correct and Robust Convex Model for Smooth Separable NMF : Abstract: Nonnegative matrix factorization (NMF) is a linear dimensionality reduction technique for nonnegative data, with applications such as hyperspectral unmixing and topic modeling. NMF is a diff...
- Think Consistently, Reason Efficiently: Energy-Based Calibration for Implicit Chain-of-Thought : Abstract: Large Language Models (LLMs) have demonstrated strong reasoning capabilities through Chain-of-Thought (CoT) prompting, which enables step-by-step intermediate reasoning. However, expl...
- LoRA on the Go: Instance-level Dynamic LoRA Selection and Merging : Abstract: Low-Rank Adaptation (LoRA) has emerged as a parameter-efficient approach for fine-tuning large language models. However, conventional LoRA adapters are typically trained for a single task, li...
- Trading Vector Data in Vector Databases : Abstract: Vector data trading is essential for cross-domain learning with vector databases, yet it remains largely unexplored. We study this problem under online learning, where sellers face uncertain...
- Dynamics-Decoupled Trajectory Alignment for Sim-to-Real Transfer in Reinforcement Learning for Autonomous Driving : Abstract: Reinforcement learning (RL) has shown promise in robotics, but deploying RL on real vehicles remains challenging due to the complexity of vehicle dynamics and the mismatch between simulation...
- Federated Learning for Video Violence Detection: Complementary Roles of Lightweight CNNs and Vision-Language Models for Energy-Efficient Use : Abstract: Deep learning-based video surveillance increasingly demands privacy-preserving architectures with low computational and environmental overhead. Federated learning preserves privacy but deplo...
- Simulation-based Methods for Optimal Sampling Design in Systems Biology : Abstract: In many areas of systems biology, including virology, pharmacokinetics, and population biology, dynamical systems are commonly used to describe biological processes. These systems can be cha...
- Breaking the Stealth-Potency Trade-off in Clean-Image Backdoors with Generative Trigger Optimization : Abstract: Clean-image backdoor attacks, which use only label manipulation in training datasets to compromise deep neural networks, pose a significant threat to security-critical applications. A critic...
- Noise & pattern: identity-anchored Tikhonov regularization for robust structural anomaly detection : Abstract: Anomaly detection plays a pivotal role in automated industrial inspection, aiming to identify subtle or rare defects in otherwise uniform visual patterns. As collecting representative exampl...
- A Fully Polynomial-Time Algorithm for Robustly Learning Halfspaces over the Hypercube : Abstract: We give the first fully polynomial-time algorithm for learning halfspaces with respect to the uniform distribution on the hypercube in the presence of contamination, where an adversary may c...
- PADiff: Predictive and Adaptive Diffusion Policies for Ad Hoc Teamwork : Abstract: Ad hoc teamwork (AHT) requires agents to collaborate with previously unseen teammates, which is crucial for many real-world applications. The core challenge of AHT is to develop an ego agent...
- AgenticSciML: Collaborative Multi-Agent Systems for Emergent Discovery in Scientific Machine Learning : Abstract: Scientific Machine Learning (SciML) integrates data-driven inference with physical modeling to solve complex problems in science and engineering. However, the design of SciML architectures, ...
- High-Dimensional Asymptotics of Differentially Private PCA : Abstract: In differential privacy, statistics of a sensitive dataset are privatized by introducing random noise. Most privacy analyses provide privacy bounds specifying a noise level sufficient to ach...
- The Value of Personalized Recommendations: Evidence from Netflix : Abstract: Personalized recommendation systems shape much of user choice online, yet their targeted nature makes separating out the value of recommendation and the underlying goods challenging. We buil...
- De-Individualizing fMRI Signals via Mahalanobis Whitening and Bures Geometry : Abstract: Functional connectivity has been widely investigated to understand brain disease in clinical studies and imaging-based neuroscience, and analyzing changes in functional connectivity has prov...
- RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments : Abstract: We introduce Reinforcement Learning (RL) with Adaptive Verifiable Environments (RLVE), an approach using verifiable environments that procedurally generate problems and provide algorithmical...
- When Bias Pretends to Be Truth: How Spurious Correlations Undermine Hallucination Detection in LLMs : Abstract: Despite substantial advances, large language models (LLMs) continue to exhibit hallucinations, generating plausible yet incorrect responses. In this paper, we highlight a critical yet previo...
- Garbage Vulnerable Point Monitoring using IoT and Computer Vision : Abstract: This paper proposes a smart way to manage municipal solid waste by using the Internet of Things (IoT) and computer vision (CV) to monitor illegal waste dumping at garbage vulnerable points (...
- DeepPersona: A Generative Engine for Scaling Deep Synthetic Personas : Abstract: Simulating human profiles by instilling personas into large language models (LLMs) is rapidly transforming research in agentic behavioral simulation, LLM personalization, and human-AI alignm...
- Walsh-Hadamard Neural Operators for Solving PDEs with Discontinuous Coefficients : Abstract: Neural operators have emerged as powerful tools for learning solution operators of partial differential equations (PDEs). However, standard spectral methods based on Fourier transforms strug...
- Inference-Time Scaling of Diffusion Models for Infrared Data Generation : Abstract: Infrared imagery enables temperature-based scene understanding using passive sensors, particularly under conditions of low visibility where traditional RGB imaging fails. Yet, developing dow...
- UAV-Assisted Resilience in 6G and Beyond Network Energy Saving: A Multi-Agent DRL Approach : Abstract: This paper investigates the unmanned aerial vehicle (UAV)-assisted resilience perspective in the 6G network energy saving (NES) scenario. More specifically, we consider multiple ground base ...
- Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence : Abstract: Recent advances in depth-recurrent language models show that recurrence can decouple train-time compute and parameter count from test-time compute. In this work, we study how to convert exis...
- Solving bilevel optimization via sequential minimax optimization : Abstract: In this paper we propose a sequential minimax optimization (SMO) method for solving a class of constrained bilevel optimization problems in which the lower-level part is a possibly nonsmooth...
- StreamDiffusionV2: A Streaming System for Dynamic and Interactive Video Generation : Abstract: Generative models are reshaping the live-streaming industry by redefining how content is created, styled, and delivered. Previous image-based streaming diffusion models have powered efficien...
- SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards : Abstract: Multimodal large language models (MLLMs) have achieved remarkable progress in vision-language tasks, but they continue to struggle with spatial understanding. Existing spatial MLLMs often re...
- DigiData: Training and Evaluating General-Purpose Mobile Control Agents : Abstract: AI agents capable of controlling user interfaces have the potential to transform human interaction with digital devices. To accelerate this transformation, two fundamental building blocks ar...
- Language Generation with Infinite Contamination : Abstract: We study language generation in the limit, where an algorithm observes an adversarial enumeration of strings from an unknown target language $K$ and must eventually generate new, unseen stri...
- A CNN-LSTM Quantifier for Single Access Point CSI Indoor Localization : Abstract: This paper proposes a combined network structure between convolutional neural network (CNN) and long-short term memory (LSTM) quantifier for WiFi fingerprinting indoor localization. In contr...
- Revenue Maximization and Learning in Products Ranking : Abstract: We consider the revenue maximization problem for an online retailer who plans to display in order a set of products differing in their prices and qualities. Consumers have attention spans, i...
- Explaining Bayesian Neural Networks : Abstract: To advance the transparency of learning machines such as Deep Neural Networks (DNNs), the field of Explainable AI (XAI) was established to provide interpretations of DNNs' predictions. While...
- Impacts of Individual Fairness on Group Fairness from the Perspective of Generalized Entropy : Abstract: This paper investigates how the degree of group fairness changes when the degree of individual fairness is actively controlled. As a metric quantifying individual fairness, we consider gener...
- Corruptions of Supervised Learning Problems: Typology and Mitigations : Abstract: Corruption is notoriously widespread in data collection. Despite extensive research, the existing literature predominantly focuses on specific settings and learning scenarios, lacking a unif...
- Weight-Entanglement Meets Gradient-Based Neural Architecture Search : Abstract: Weight sharing is a fundamental concept in neural architecture search (NAS), enabling gradient-based methods to explore cell-based architectural spaces significantly faster than traditional ...
- Oh That Looks Familiar: A Novel Similarity Measure for Spreadsheet Template Discovery : Abstract: Traditional methods for identifying structurally similar spreadsheets fail to capture the spatial layouts and type patterns defining templates. To quantify spreadsheet similarity, we introdu...
- Rethinking Crystal Symmetry Prediction: A Decoupled Perspective : Abstract: Efficiently and accurately determining the symmetry is a crucial step in the structural analysis of crystalline materials. Existing methods usually mindlessly apply deep learning models whil...
- Fast Bayesian Updates via Harmonic Representations : Abstract: Bayesian inference, while foundational to probabilistic reasoning, is often hampered by the computational intractability of posterior distributions, particularly through the challenging evid...
- Breaking the Gradient Barrier: Unveiling Large Language Models for Strategic Classification : Abstract: Strategic classification (SC) explores how individuals or entities modify their features strategically to achieve favorable classification outcomes. However, existing SC methods, which are l...
- HCFSLN: Adaptive Hyperbolic Few-Shot Learning for Multimodal Anxiety Detection : Abstract: Anxiety disorders impact millions globally, yet traditional diagnosis relies on clinical interviews, while machine learning models struggle with overfitting due to limited data. Large-scale ...
- CoLM: Collaborative Large Models via A Client-Server Paradigm : Abstract: Large models have achieved remarkable performance across a range of reasoning and understanding tasks. Prior work often utilizes model ensembles or multi-agent systems to collaboratively gen...
- S$^2$Drug: Bridging Protein Sequence and 3D Structure in Contrastive Representation Learning for Virtual Screening : Abstract: Virtual screening (VS) is an essential task in drug discovery, focusing on the identification of small-molecule ligands that bind to specific protein pockets. Existing deep learning methods,...
- Correcting False Alarms from Unseen: Adapting Graph Anomaly Detectors at Test Time : Abstract: Graph anomaly detection (GAD), which aims to detect outliers in graph-structured data, has received increasing research attention recently. However, existing GAD methods assume identical tra...
- Fair Bayesian Data Selection via Generalized Discrepancy Measures : Abstract: Fairness concerns are increasingly critical as machine learning models are deployed in high-stakes applications. While existing fairness-aware methods typically intervene at the model level,...
- Learning Quantized Continuous Controllers for Integer Hardware : Abstract: Deploying continuous-control reinforcement learning policies on embedded hardware requires meeting tight latency and power budgets. Small FPGAs can deliver these, but only if costly floating...
- Breaking Privacy in Federated Clustering: Perfect Input Reconstruction via Temporal Correlations : Abstract: Federated clustering allows multiple parties to discover patterns in distributed data without sharing raw samples. To reduce overhead, many protocols disclose intermediate centroids during t...
- Direct Molecular Polarizability Prediction with SO(3) Equivariant Local Frame GNNs : Abstract: We introduce a novel equivariant graph neural network (GNN) architecture designed to predict the tensorial response properties of molecules. Unlike traditional frameworks that focus on regre...
- On the Joint Minimization of Regularization Loss Functions in Deep Variational Bayesian Methods for Attribute-Controlled Symbolic Music Generation : Abstract: Explicit latent variable models provide a flexible yet powerful framework for data synthesis, enabling controlled manipulation of generative factors. With latent variables drawn from a tract...
- REACT-LLM: A Benchmark for Evaluating LLM Integration with Causal Features in Clinical Prognostic Tasks : Abstract: Large Language Models (LLMs) and causal learning each hold strong potential for clinical decision making (CDM). However, their synergy remains poorly understood, largely due to the lack of s...
- Conditional Diffusion as Latent Constraints for Controllable Symbolic Music Generation : Abstract: Recent advances in latent diffusion models have demonstrated state-of-the-art performance in high-dimensional time-series data synthesis while providing flexible control through conditioning...
- Guiding Generative Models to Uncover Diverse and Novel Crystals via Reinforcement Learning : Abstract: Discovering functional crystalline materials entails navigating an immense combinatorial design space. While recent advances in generative artificial intelligence have enabled the sampling o...
- LLMscape : Abstract: LLMscape is an interactive installation that investigates how humans and AI construct meaning under shared conditions of uncertainty. Within a mutable, projection-mapped landscape, human par...
- Combining digital data streams and epidemic networks for real time outbreak detection : Abstract: Responding to disease outbreaks requires close surveillance of their trajectories, but outbreak detection is hindered by the high noise in epidemic time series. Aggregating information acros...
- Fuzzy Label: From Concept to Its Application in Label Learning : Abstract: Label learning is a fundamental task in machine learning that aims to construct intelligent models using labeled data, encompassing traditional single-label and multi-label classification mo...
- On Stealing Graph Neural Network Models : Abstract: Current graph neural network (GNN) model-stealing methods rely heavily on queries to the victim model, assuming no hard query limits. However, in reality, the number of allowed queries can b...
- Synergy over Discrepancy: A Partition-Based Approach to Multi-Domain LLM Fine-Tuning : Abstract: Large language models (LLMs) demonstrate impressive generalization abilities, yet adapting them effectively across multiple heterogeneous domains remains challenging due to inter-domain inte...
- SMiLE: Provably Enforcing Global Relational Properties in Neural Networks : Abstract: Artificial Intelligence systems are increasingly deployed in settings where ensuring robustness, fairness, or domain-specific properties is essential for regulation compliance and alignment ...
- DETECT: Data-Driven Evaluation of Treatments Enabled by Classification Transformers : Abstract: Chronic pain is a global health challenge affecting millions of individuals, making it essential for physicians to have reliable and objective methods to measure the functional impact of cli...
- Deep Neural Operator Learning for Probabilistic Models : Abstract: We propose a deep neural-operator framework for a general class of probability models. Under global Lipschitz conditions on the operator over the entire Euclidean space-and for a broad class...
- Does TabPFN Understand Causal Structures? : Abstract: Causal discovery is fundamental for multiple scientific domains, yet extracting causal information from real world data remains a significant challenge. Given the recent success on real data...
- The Few Govern the Many: Unveiling Few-Layer Dominance for Time Series Models : Abstract: Large-scale models are at the forefront of time series (TS) forecasting, dominated by two paradigms: fine-tuning text-based Large Language Models (LLM4TS) and training Time Series Foundation...
- Understanding the role of depth in the neural tangent kernel for overparameterized neural networks : Abstract: Overparameterized fully-connected neural networks have been shown to behave like kernel models when trained with gradient descent, under mild conditions on the width, the learning rate, and ...
- Multi-modal Dynamic Proxy Learning for Personalized Multiple Clustering : Abstract: Multiple clustering aims to discover diverse latent structures from different perspectives, yet existing methods generate exhaustive clusterings without discerning user interest, necessitati...
- RobustA: Robust Anomaly Detection in Multimodal Data : Abstract: In recent years, multimodal anomaly detection methods have demonstrated remarkable performance improvements over video-only models. However, real-world multimodal data is often corrupted due...
- MG-HGNN: A Heterogeneous GNN Framework for Indoor Wi-Fi Fingerprint-Based Localization : Abstract: Received signal strength indicator (RSSI) is the primary representation of Wi-Fi fingerprints and serves as a crucial tool for indoor localization. However, existing RSSI-based positioning m...
- Enabling Off-Policy Imitation Learning with Deep Actor Critic Stabilization : Abstract: Learning complex policies with Reinforcement Learning (RL) is often hindered by instability and slow convergence, a problem exacerbated by the difficulty of reward engineering. Imitation Lea...
- Can Training Dynamics of Scale-Invariant Neural Networks Be Explained by the Thermodynamics of an Ideal Gas? : Abstract: Understanding the training dynamics of deep neural networks remains a major open problem, with physics-inspired approaches offering promising insights. Building on this perspective, we devel...
- Superhuman AI for Stratego Using Self-Play Reinforcement Learning and Test-Time Search : Abstract: Few classical games have been regarded as such significant benchmarks of artificial intelligence as to have justified training costs in the millions of dollars. Among these, Stratego -- a bo...
- Q-RAG: Long Context Multi-step Retrieval via Value-based Embedder Training : Abstract: Retrieval-Augmented Generation (RAG) methods enhance LLM performance by efficiently filtering relevant context for LLMs, reducing hallucinations and inference cost. However, most existing RA...
- Preparation of Fractal-Inspired Computational Architectures for Advanced Large Language Model Analysis : Abstract: It introduces FractalNet, a fractal-inspired computational architecture for advanced large language model analysis that mainly challenges model diversity on a large scale in an efficient ma...
- Grounding Computer Use Agents on Human Demonstrations : Abstract: Building reliable computer-use agents requires grounding: accurately connecting natural language instructions to the correct on-screen elements. While large datasets exist for web and mobile...
- TNT: Improving Chunkwise Training for Test-Time Memorization : Abstract: Recurrent neural networks (RNNs) with deep test-time memorization modules, such as Titans and TTT, represent a promising, linearly-scaling paradigm distinct from Transformers. While these ex...
- Self-Evaluating LLMs for Multi-Step Tasks: Stepwise Confidence Estimation for Failure Detection : Abstract: Reliability and failure detection of large language models (LLMs) is critical for their deployment in high-stakes, multi-step reasoning tasks. Prior work explores confidence estimation for s...
- Private Sketches for Linear Regression : Abstract: Linear regression is frequently applied in a variety of domains. In order to improve the efficiency of these methods, various methods have been developed that compute summaries or sket...
- Consistency Is Not Always Correct: Towards Understanding the Role of Exploration in Post-Training Reasoning : Abstract: Foundation models exhibit broad knowledge but limited task-specific reasoning, motivating post-training strategies such as RLVR and inference scaling with outcome or process reward models (O...
- Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training : Abstract: Recent curriculum techniques in the post-training stage of LLMs have been widely observed to outperform non-curriculum approaches in enhancing reasoning performance, yet a principled underst...
- Transformers Provably Learn Chain-of-Thought Reasoning with Length Generalization : Abstract: The ability to reason lies at the core of artificial intelligence (AI), and challenging problems usually call for deeper and longer reasoning to tackle. A crucial question about AI reasoning...
- LoReTTA: A Low Resource Framework To Poison Continuous Time Dynamic Graphs : Abstract: Temporal Graph Neural Networks (TGNNs) are increasingly used in high-stakes domains, such as financial forecasting, recommendation systems, and fraud detection. However, their susceptibility...
- A Diffusion Model to Shrink Proteins While Maintaining Their Function : Abstract: Many proteins useful in modern medicine or bioengineering are challenging to make in the lab, fuse with other proteins in cells, or deliver to tissues in the body, because their sequences ar...
- C3PO: Optimized Large Language Model Cascades with Probabilistic Cost Constraints for Reasoning : Abstract: Large language models (LLMs) have achieved impressive results on complex reasoning tasks, but their high inference cost remains a major barrier to real-world deployment. A promising solution...
- Entangled Schrödinger Bridge Matching : Abstract: Simulating trajectories of multi-particle systems on complex energy landscapes is a central task in molecular dynamics (MD) and drug discovery, but remains challenging at scale due to comput...
- Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs : Abstract: Sparse Mixture-of-Experts (MoE) have been widely adopted in recent large language models since it can efficiently scale up the model capability without increasing the inference cost. However...
- Socially Aware Music Recommendation: A Multi-Modal Graph Neural Networks for Collaborative Music Consumption and Community-Based Engagement : Abstract: This study presents a novel Multi-Modal Graph Neural Network (MM-GNN) framework for socially aware music recommendation, designed to enhance personalization and foster community-based engage...
- Weightless Neural Networks for Continuously Trainable Personalized Recommendation Systems : Abstract: Given that conventional recommenders, while deeply effective, rely on large distributed systems pre-trained on aggregate user data, incorporating new data necessitates large training cycles,...
- iEEG Seizure Detection with a Sparse Hyperdimensional Computing Accelerator : Abstract: Implantable devices for reliable intracranial electroencephalography (iEEG) require efficient, accurate, and real-time detection of seizures. Dense hyperdimensional computing (HDC) proves to...
- Gravity-Awareness: Deep Learning Models and LLM Simulation of Human Awareness in Altered Gravity : Abstract: Earth's gravity has fundamentally shaped human development by guiding the brain's integration of vestibular, visual, and proprioceptive inputs into an internal model of gravity: a dynamic ne...
- Bridging Accuracy and Explainability in EEG-based Graph Attention Network for Depression Detection : Abstract: Depression is a major cause of global mental illness and significantly influences suicide rates. Timely and accurate diagnosis is essential for effective intervention. Electroencephalography...
- Token Is All You Need: Cognitive Planning through Sparse Intent Alignment : Abstract: We challenge the long-standing assumption that exhaustive scene modeling is required for high-performance end-to-end autonomous driving (E2EAD). Unlike world-model approaches that rely on co...
- Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability : Abstract: Translating the internal representations and computations of models into concepts that humans can understand is a key goal of interpretability. While recent dictionary learning methods such ...
- ConnectomeBench: Can LLMs Proofread the Connectome? : Abstract: Connectomics - the mapping of neural connections in an organism's brain - currently requires extraordinary human effort to proofread the data collected from imaging and machine-learning assi...
- Factual and Musical Evaluation Metrics for Music Language Models : Abstract: Music language models (Music LMs), like vision language models, leverage multimodal representations to answer natural language queries about musical audio recordings. Although Music LMs are ...
- MCFCN: Multi-View Clustering via a Fusion-Consensus Graph Convolutional Network : Abstract: Existing Multi-view Clustering (MVC) methods based on subspace learning focus on consensus representation learning while neglecting the inherent topological structure of data. Despite the in...
- Automatic Extraction of Road Networks by using Teacher-Student Adaptive Structural Deep Belief Network and Its Application to Landslide Disaster : Abstract: An adaptive structural learning method of Restricted Boltzmann Machine (RBM) and Deep Belief Network (DBN) has been developed as one of prominent deep learning models. The neuron generation-...
- Do Street View Imagery and Public Participation GIS align: Comparative Analysis of Urban Attractiveness : Abstract: As digital tools increasingly shape spatial planning practices, understanding how different data sources reflect human experiences of urban environments is essential. Street View Imagery (SV...
- Beyond Resolution: Multi-Scale Weather and Climate Data for Alpine Renewable Energy in the Digital Twin Era - First Evaluations and Recommendations : Abstract: When Austrian hydropower production plummeted by 44% in early 2025 due to reduced snowpack, it exposed a critical vulnerability: standard meteorological and climatological datasets systemati...
- Beyond Softmax: Dual-Branch Sigmoid Architecture for Accurate Class Activation Maps : Abstract: Class Activation Mapping (CAM) and its extensions have become indispensable tools for visualizing the evidence behind deep network predictions. However, by relying on a final softmax classif...
- From Prompts to Power: Measuring the Energy Footprint of LLM Inference : Abstract: The rapid expansion of Large Language Models (LLMs) has introduced unprecedented energy demands, extending beyond training to large-scale inference workloads that often dominate total lifecy...
- Who Evaluates AI's Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations : Abstract: Foundation models are increasingly central to high-stakes AI systems, and governance frameworks now depend on evaluations to assess their risks and capabilities. Although general capability ...
- Grounding Foundational Vision Models with 3D Human Poses for Robust Action Recognition : Abstract: For embodied agents to effectively understand and interact within the world around them, they require a nuanced comprehension of human actions grounded in physical space. Current action reco...
- Registration-Free Monitoring of Unstructured Point Cloud Data via Intrinsic Geometrical Properties : Abstract: Modern sensing technologies have enabled the collection of unstructured point cloud data (PCD) of varying sizes, which are used to monitor the geometric accuracy of 3D objects. PCD are widel...
- Optimizing Diversity and Quality through Base-Aligned Model Collaboration : Abstract: Alignment has greatly improved large language models (LLMs)' output quality at the cost of diversity, yielding highly similar outputs across generations. We propose Base-Aligned Model Collab...
- VMDT: Decoding the Trustworthiness of Video Foundation Models : Abstract: As foundation models become more sophisticated, ensuring their trustworthiness becomes increasingly critical; yet, unlike text and image, the video modality still lacks comprehensive trustwo...
- Zero-Shot Function Encoder-Based Differentiable Predictive Control : Abstract: We introduce a differentiable framework for zero-shot adaptive control over parametric families of nonlinear dynamical systems. Our approach integrates a function encoder-based neural ODE (F...
- Language Generation: Complexity Barriers and Implications for Learning : Abstract: Kleinberg and Mullainathan showed that, in principle, language generation is always possible: with sufficiently many positive examples, a learner can eventually produce sentences indistingui...
- Sign language recognition from skeletal data using graph and recurrent neural networks : Abstract: This work presents an approach for recognizing isolated sign language gestures using skeleton-based pose data extracted from video sequences. A Graph-GRU temporal network is proposed to mode...
- DRAGON: Guard LLM Unlearning in Context via Negative Detection and Reasoning : Abstract: Unlearning in Large Language Models (LLMs) is crucial for protecting private data and removing harmful knowledge. Most existing approaches rely on fine-tuning to balance unlearning efficienc...
- VLAD-Grasp: Zero-shot Grasp Detection via Vision-Language Models : Abstract: Robotic grasping is a fundamental capability for autonomous manipulation; however, most existing methods rely on large-scale expert annotations and necessitate retraining to handle new objec...
- DiagnoLLM: A Hybrid Bayesian Neural Language Framework for Interpretable Disease Diagnosis : Abstract: Building trustworthy clinical AI systems requires not only accurate predictions but also transparent, biologically grounded explanations. We present DiagnoLLM, a hybrid framework th...
- Enhancing Diffusion Model Guidance through Calibration and Regularization : Abstract: Classifier-guided diffusion models have emerged as a powerful approach for conditional image generation, but they suffer from overconfident predictions during early denoising steps, causing ...
- EGG-SR: Embedding Symbolic Equivalence into Symbolic Regression via Equality Graph : Abstract: Symbolic regression seeks to uncover physical laws from experimental data by searching for closed-form expressions, which is an important task in AI-driven scientific discovery. Yet the expo...
- MoEGCL: Mixture of Ego-Graphs Contrastive Representation Learning for Multi-View Clustering : Abstract: In recent years, the advancement of Graph Neural Networks (GNNs) has significantly propelled progress in Multi-View Clustering (MVC). However, existing methods face the problem of coarse-gra...
- CSGaze: Context-aware Social Gaze Prediction : Abstract: A person's gaze offers valuable insights into their focus of attention, level of social engagement, and confidence. In this work, we investigate how contextual cues combined with visual scen...
- DiA-gnostic VLVAE: Disentangled Alignment-Constrained Vision Language Variational AutoEncoder for Robust Radiology Reporting with Missing Modalities : Abstract: The integration of medical images with clinical context is essential for generating accurate and clinically interpretable radiology reports. However, current automated methods often rely on ...
- Interpretable Recognition of Cognitive Distortions in Natural Language Texts : Abstract: We propose a new approach to multi-factor classification of natural language texts based on weighted structured patterns such as N-grams, taking into account the heterarchical relationships ...
- Runtime Safety Monitoring of Deep Neural Networks for Perception: A Survey : Abstract: Deep neural networks (DNNs) are widely used in perception systems for safety-critical applications, such as autonomous driving and robotics. However, DNNs remain vulnerable to various safety...
- Benchmarking of Clustering Validity Measures Revisited : Abstract: Validation plays a crucial role in the clustering process. Many different internal validity indexes exist for the purpose of determining the best clustering solution(s) from a given collecti...
- Learning solutions of parameterized stiff ODEs using Gaussian processes : Abstract: Stiff ordinary differential equations (ODEs) play an important role in many scientific and engineering applications. Often, the dependence of the solution of the ODE on additional parameters...
- Revisiting Entropy in Reinforcement Learning for Large Reasoning Models : Abstract: Reinforcement learning with verifiable rewards (RLVR) has emerged as a predominant approach for enhancing the reasoning capabilities of large language models (LLMs). However, the entropy of ...
- The Algorithmic Phase Transition in Symmetric Correlated Spiked Wigner Model : Abstract: We study the computational task of detecting and estimating correlated signals in a pair of spiked Wigner matrices. Our model consists of observations $$ X = \tfrac{\lambda}{\sqrt{n}} xx...
- Visual Exploration of Feature Relationships in Sparse Autoencoders with Curated Concepts : Abstract: Sparse autoencoders (SAEs) have emerged as a powerful tool for uncovering interpretable features in large language models (LLMs) through the sparse directions they learn. However, the sheer ...
- Stemming Hallucination in Language Models Using a Licensing Oracle : Abstract: Language models exhibit remarkable natural language generation capabilities but remain prone to hallucinations, generating factually incorrect information despite producing syntactically coh...
- MuonAll: Muon Variant for Efficient Finetuning of Large Language Models : Abstract: Muon optimizer has demonstrated robust results in pretraining of language models but its performance in finetuning of existing public pretrained models is not yet explored. Currently, Muon i...
- Forecasting Thermospheric Density with Transformers for Multi-Satellite Orbit Management : Abstract: Accurate thermospheric density prediction is crucial for reliable satellite operations in Low Earth Orbits, especially at high solar and geomagnetic activity. Physics-based models such as TI...
- A Deep Learning Model for Predicting Transformation Legality : Abstract: Compilers must check the legality of code transformations to guarantee the correctness of applying a sequence of code transformations to a given code. While such a legality check needs to be...
- Cross-Modal Fine-Tuning of 3D Convolutional Foundation Models for ADHD Classification with Low-Rank Adaptation : Abstract: Early diagnosis of attention-deficit/hyperactivity disorder (ADHD) in children plays a crucial role in improving outcomes in education and mental health. Diagnosing ADHD using neuroimaging d...
- Enhancing Adversarial Robustness of IoT Intrusion Detection via SHAP-Based Attribution Fingerprinting : Abstract: The rapid proliferation of Internet of Things (IoT) devices has transformed numerous industries by enabling seamless connectivity and data-driven automation. However, this expansion has also...
- Time Matters: A Novel Real-Time Long- and Short-term User Interest Model for Click-Through Rate Prediction : Abstract: Click-Through Rate (CTR) prediction is a core task in online personalization platforms. A key step for CTR prediction is to learn accurate user representations to capture their interests. Gene...
- Sparsity via Hyperpriors: A Theoretical and Algorithmic Study under Empirical Bayes Framework : Abstract: This paper presents a comprehensive analysis of hyperparameter estimation within the empirical Bayes framework (EBF) for sparse learning. By studying the influence of hyperpriors on the solu...
- Functional Adjoint Sampler: Scalable Sampling on Infinite Dimensional Spaces : Abstract: Learning-based methods for sampling from the Gibbs distribution in finite-dimensional spaces have progressed quickly, yet theory and algorithmic design for infinite-dimensional function spac...
- Setting $\varepsilon$ is not the Issue in Differential Privacy : Abstract: This position paper argues that setting the privacy budget in differential privacy should not be viewed as an important limitation of differential privacy compared to alternative methods for...
- Precision-Scalable Microscaling Datapaths with Optimized Reduction Tree for Efficient NPU Integration : Abstract: Emerging continual learning applications necessitate next-generation neural processing unit (NPU) platforms to support both training and inference operations. The promising Microscaling (MX)...
- What Makes Reasoning Invalid: Echo Reflection Mitigation for Large Language Models : Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of reasoning tasks. Recent methods have further improved LLM performance in complex mathematical rea...
- Learning the Inverse Ryu-Takayanagi Formula with Transformers : Abstract: We study the inverse problem of holographic entanglement entropy in AdS$_3$ using a data-driven generative model. Training data consist of randomly generated geometries and their holographic...
- Fast Riemannian-manifold Hamiltonian Monte Carlo for hierarchical Gaussian-process models : Abstract: Hierarchical Bayesian models based on Gaussian processes are considered useful for describing complex nonlinear statistical dependencies among variables in real-world data. However, effectiv...
- SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization : Abstract: The soft-thinking paradigm for Large Language Model (LLM) reasoning can outperform the conventional discrete-token Chain-of-Thought (CoT) reasoning in some scenarios, underscoring its resear...
- Non-Negative Stiefel Approximating Flow: Orthogonalish Matrix Optimization for Interpretable Embeddings : Abstract: Interpretable representation learning is a central challenge in modern machine learning, particularly in high-dimensional settings such as neuroimaging, genomics, and text analysis. Current ...
- Optimizing Chain-of-Thought Confidence via Topological and Dirichlet Risk Analysis : Abstract: Chain-of-thought (CoT) prompting enables Large Language Models to solve complex problems, but deploying these models safely requires reliable confidence estimates, a capability where existin...
- Towards Resource-Efficient Multimodal Intelligence: Learned Routing among Specialized Expert Models : Abstract: As AI moves beyond text, large language models (LLMs) increasingly power vision, audio, and document understanding; however, their high inference costs hinder real-time, scalable deployment....
- Countering Multi-modal Representation Collapse through Rank-targeted Fusion : Abstract: Multi-modal fusion methods often suffer from two types of representation collapse: feature collapse where individual dimensions lose their discriminative power (as measured by eigenspectra),...
- EchoMark: Perceptual Acoustic Environment Transfer with Watermark-Embedded Room Impulse Response : Abstract: Acoustic Environment Matching (AEM) is the task of transferring clean audio into a target acoustic environment, enabling engaging applications such as audio dubbing and auditory immersive vi...
- Brain-Inspired Planning for Better Generalization in Reinforcement Learning : Abstract: Existing Reinforcement Learning (RL) systems encounter significant challenges when applied to real-world scenarios, primarily due to poor generalization across environments that differ from ...
- Bridging Theory and Practice: A Stochastic Learning-Optimization Model for Resilient Automotive Supply Chains : Abstract: Supply chain disruptions and volatile demand pose significant challenges to the UK automotive industry, which relies heavily on Just-In-Time (JIT) manufacturing. While qualitative studies hi...
- QUARK: Quantization-Enabled Circuit Sharing for Transformer Acceleration by Exploiting Common Patterns in Nonlinear Operations : Abstract: Transformer-based models have revolutionized computer vision (CV) and natural language processing (NLP) by achieving state-of-the-art performance across a range of benchmarks. However, nonli...
- Data Trajectory Alignment for LLM Domain Adaptation: A Two-Phase Synthesis Framework for Telecommunications Mathematics : Abstract: General-purpose large language models (LLMs) are increasingly deployed in verticals such as telecommunications, where adaptation is hindered by scarce, low-information-density corpora and ti...
- On the Mechanisms of Collaborative Learning in VAE Recommenders : Abstract: Variational Autoencoders (VAEs) are a powerful alternative to matrix factorization for recommendation. A common technique in VAE-based collaborative filtering (CF) consists in applying binar...
- Resource Efficient Sleep Staging via Multi-Level Masking and Prompt Learning : Abstract: Automatic sleep staging plays a vital role in assessing sleep quality and diagnosing sleep disorders. Most existing methods rely heavily on long and continuous EEG recordings, which poses si...
- Rethinking Parameter Sharing as Graph Coloring for Structured Compression : Abstract: Modern deep models have massive parameter sizes, leading to high inference-time memory usage that limits practical deployment. Parameter sharing, a form of structured compression, effectivel...
- Robust Causal Discovery under Imperfect Structural Constraints : Abstract: Robust causal discovery from observational data under imperfect prior knowledge remains a significant and largely unresolved challenge. Existing methods typically presuppose perfect priors o...
- Coupling Agent-based Modeling and Life Cycle Assessment to Analyze Trade-offs in Resilient Energy Transitions : Abstract: Transitioning to sustainable and resilient energy systems requires navigating complex and interdependent trade-offs across environmental, social, and resource dimensions. Neglecting these tr...
- Cross-Modal Unlearning via Influential Neuron Path Editing in Multimodal Large Language Models : Abstract: Multimodal Large Language Models (MLLMs) extend foundation models to real-world applications by integrating inputs such as text and vision. However, their broad knowledge capacity raises gro...
- Beyond Uniform Deletion: A Data Value-Weighted Framework for Certified Machine Unlearning : Abstract: As the right to be forgotten becomes legislated worldwide, machine unlearning mechanisms have emerged to efficiently update models for data deletion and enhance user privacy protection. Howe...
- FedNET: Federated Learning for Proactive Traffic Management and Network Capacity Planning : Abstract: We propose FedNET, a proactive and privacy-preserving framework for early identification of high-risk links in large-scale communication networks, that leverages a distributed multi-step tra...
- Recursive Dynamics in Fast-Weights Homeostatic Reentry Networks: Toward Reflective Intelligence : Abstract: This study introduces the Fast-Weights Homeostatic Reentry Layer (FH-RL), a neural mechanism that integrates fast-weight associative memory, homeostatic regularization, and learned reentrant...
- Neural-Initialized Newton: Accelerating Nonlinear Finite Elements via Operator Learning : Abstract: We propose a Newton-based scheme, initialized by neural operator predictions, to accelerate the parametric solution of nonlinear problems in computational solid mechanics. First, a physics i...
- Controllable Flow Matching for Online Reinforcement Learning : Abstract: Model-based reinforcement learning (MBRL) typically relies on modeling environment dynamics for data efficiency. However, due to the accumulation of model errors over long-horizon rollouts, ...
- DeepRWCap: Neural-Guided Random-Walk Capacitance Solver for IC Design : Abstract: Monte Carlo random walk methods are widely used in capacitance extraction for their mesh-free formulation and inherent parallelism. However, modern semiconductor technologies with densely pa...
- Minimum Width of Deep Narrow Networks for Universal Approximation : Abstract: Determining the minimum width of fully connected neural networks has become a fundamental problem in recent theoretical studies of deep neural networks. In this paper, we study the lower bou...
- MI-to-Mid Distilled Compression (M2M-DC): An Hybrid-Information-Guided-Block Pruning with Progressive Inner Slicing Approach to Model Compression : Abstract: We introduce MI-to-Mid Distilled Compression (M2M-DC), a two-scale, shape-safe compression framework that interleaves information-guided block pruning with progressive inner slicing and stag...
- Beyond Observations: Reconstruction Error-Guided Irregularly Sampled Time Series Representation Learning : Abstract: Irregularly sampled time series (ISTS), characterized by non-uniform time intervals with natural missingness, are prevalent in real-world applications. Existing approaches for ISTS modeling ...
- Contact Wasserstein Geodesics for Non-Conservative Schrodinger Bridges : Abstract: The Schrödinger Bridge provides a principled framework for modeling stochastic processes between distributions; however, existing methods are limited by energy-conservation assumptions, wh...
- TuckA: Hierarchical Compact Tensor Experts for Efficient Fine-Tuning : Abstract: Efficiently fine-tuning pre-trained models for downstream tasks is a key challenge in the era of foundation models. Parameter-efficient fine-tuning (PEFT) presents a promising solution, achi...
- DeepBooTS: Dual-Stream Residual Boosting for Drift-Resilient Time-Series Forecasting : Abstract: Time-Series (TS) exhibits pronounced non-stationarity. Consequently, most forecasting methods display compromised robustness to concept drift, despite the prevalent application of instance n...
- COGNOS: Universal Enhancement for Time Series Anomaly Detection via Constrained Gaussian-Noise Optimization and Smoothing : Abstract: Reconstruction-based methods are a dominant paradigm in time series anomaly detection (TSAD); however, their near-universal reliance on Mean Squared Error (MSE) loss results in statistically...
- On The Presence of Double-Descent in Deep Reinforcement Learning : Abstract: The double descent (DD) paradox, where over-parameterized models see generalization improve past the interpolation point, remains largely unexplored in the non-stationary domain of Deep Rein...
- A Hybrid Autoencoder-Transformer Model for Robust Day-Ahead Electricity Price Forecasting under Extreme Conditions : Abstract: Accurate day-ahead electricity price forecasting (DAEPF) is critical for the efficient operation of power systems, but extreme conditions and market anomalies pose significant challenges to e...
- A Closer Look at Knowledge Distillation in Spiking Neural Network Training : Abstract: Spiking Neural Networks (SNNs) have become popular due to their excellent energy efficiency, yet they face challenges in effective model training. Recent works improve this by introducing knowledge disti...
- Counterfactual Explanation for Multivariate Time Series Forecasting with Exogenous Variables : Abstract: Currently, machine learning is widely used across various domains, including time series data analysis. However, some machine learning models function as black boxes, making interpretability...
- Sampling and Loss Weights in Multi-Domain Training : Abstract: In the training of large deep neural networks, there is a need for vast amounts of training data. To meet this need, data is collected from multiple domains, such as Wikipedia and GitHub. Th...
- Learning to Focus: Prioritizing Informative Histories with Structured Attention Mechanisms in Partially Observable Reinforcement Learning : Abstract: Transformers have shown strong ability to model long-term dependencies and are increasingly adopted as world models in model-based reinforcement learning (RL) under partial observability. Ho...
- Hybrid Autoencoders for Tabular Data: Leveraging Model-Based Augmentation in Low-Label Settings : Abstract: Deep neural networks often under-perform on tabular data due to their sensitivity to irrelevant features and a spectral bias toward smooth, low-frequency functions. These limitations hinder ...
- AutoHood3D: A Multi-Modal Benchmark for Automotive Hood Design and Fluid-Structure Interaction : Abstract: This study presents a new high-fidelity multi-modal dataset containing 16000+ geometric variants of automotive hoods useful for machine learning (ML) applications such as engineering compone...
- FiCABU: A Fisher-Based, Context-Adaptive Machine Unlearning Processor for Edge AI : Abstract: Machine unlearning, driven by privacy regulations and the "right to be forgotten", is increasingly needed at the edge, yet server-centric or retraining-heavy methods are impractical under ti...
- Conformal Prediction-Driven Adaptive Sampling for Digital Twins of Water Distribution Networks : Abstract: Digital Twins (DTs) for Water Distribution Networks (WDNs) require accurate state estimation with limited sensors. Uniform sampling often wastes resources across nodes with different uncerta...
- An MLCommons Scientific Benchmarks Ontology : Abstract: Scientific machine learning research spans diverse domains and data modalities, yet existing benchmark efforts remain siloed and lack standardization. This makes novel and transformative app...
- wa-hls4ml: A Benchmark and Surrogate Models for hls4ml Resource and Latency Estimation : Abstract: As machine learning (ML) is increasingly implemented in hardware to address real-time challenges in scientific applications, the development of advanced toolchains has significantly reduced ...
- Frequency Matters: When Time Series Foundation Models Fail Under Spectral Shift : Abstract: Time series foundation models (TSFMs) have shown strong results on public benchmarks, prompting comparisons to a "BERT moment" for time series. Their effectiveness in industrial settings, ho...
- Fooling Algorithms in Non-Stationary Bandits using Belief Inertia : Abstract: We study the problem of worst case regret in piecewise stationary multi armed bandits. While the minimax theory for stationary bandits is well established, understanding analogous limits in ...
- Unveiling the Training Dynamics of ReLU Networks through a Linear Lens : Abstract: Deep neural networks, particularly those employing Rectified Linear Units (ReLU), are often perceived as complex, high-dimensional, non-linear systems. This complexity poses a significant ch...
- SSTODE: Ocean-Atmosphere Physics-Informed Neural ODEs for Sea Surface Temperature Prediction : Abstract: Sea Surface Temperature (SST) is crucial for understanding upper-ocean thermal dynamics and ocean-atmosphere interactions, which have profound economic and social impacts. While data-driven ...
- Physics-Guided Machine Learning for Uncertainty Quantification in Turbulence Models : Abstract: Predicting the evolution of turbulent flows is central across science and engineering. Most studies rely on simulations with turbulence models, whose empirical simplifications introduce epis...
- Blind Inverse Game Theory: Jointly Decoding Rewards and Rationality in Entropy-Regularized Competitive Games : Abstract: Inverse Game Theory (IGT) methods based on the entropy-regularized Quantal Response Equilibrium (QRE) offer a tractable approach for competitive settings, but critically assume the agents' r...
- KLASS: KL-Guided Fast Inference in Masked Diffusion Models : Abstract: Masked diffusion models have demonstrated competitive results on various tasks including language generation. However, due to their iterative refinement process, the inference is often bottlen...
- Distributionally Robust Self Paced Curriculum Reinforcement Learning : Abstract: A central challenge in reinforcement learning is that policies trained in controlled environments often fail under distribution shifts at deployment into real-world environments. Distributio...
- AI-assisted workflow enables rapid, high-fidelity breast cancer clinical trial eligibility prescreening : Abstract: Clinical trials play an important role in cancer care and research, yet participation rates remain low. We developed MSK-MATCH (Memorial Sloan Kettering Multi-Agent Trial Coordination Hub), ...
- TabDistill: Distilling Transformers into Neural Nets for Few-Shot Tabular Classification : Abstract: Transformer-based models have shown promising performance on tabular data compared to their classical counterparts such as neural networks and Gradient Boosted Decision Trees (GBDTs) in scen...
- Distributionally Robust Multimodal Machine Learning : Abstract: We consider the problem of distributionally robust multimodal machine learning. Existing approaches often rely on merging modalities on the feature level (early fusion) or heuristic uncertai...
- GastroDL-Fusion: A Dual-Modal Deep Learning Framework Integrating Protein-Ligand Complexes and Gene Sequences for Gastrointestinal Disease Drug Discovery : Abstract: Accurate prediction of protein-ligand binding affinity plays a pivotal role in accelerating the discovery of novel drugs and vaccines, particularly for gastrointestinal (GI) diseases such as...
- Compressing Chemistry Reveals Functional Groups : Abstract: We introduce the first formal large-scale assessment of the utility of traditional chemical functional groups as used in chemical explanations. Our assessment employs a fundamental principle...
- QiVC-Net: Quantum-Inspired Variational Convolutional Network, with Application to Biosignal Classification : Abstract: This work introduces the quantum-inspired variational convolution (QiVC) framework, a novel learning paradigm that integrates principles of probabilistic inference, variational optimization,...
- Near-Exponential Savings for Mean Estimation with Active Learning : Abstract: We study the problem of efficiently estimating the mean of a $k$-class random variable, $Y$, using a limited number of labels, $N$, in settings where the analyst has access to auxiliary info...
- Beyond Redundancy: Diverse and Specialized Multi-Expert Sparse Autoencoder : Abstract: Sparse autoencoders (SAEs) have emerged as a powerful tool for interpreting large language models (LLMs) by decomposing token activations into combinations of human-understandable features. ...
- Primal-Only Actor Critic Algorithm for Robust Constrained Average Cost MDPs : Abstract: In this work, we study the problem of finding robust and safe policies in Robust Constrained Average-Cost Markov Decision Processes (RCMDPs). A key challenge in this setting is the lack of s...
- An Efficient Gradient-Aware Error-Bounded Lossy Compressor for Federated Learning : Abstract: Federated learning (FL) enables collaborative model training without exposing clients' private data, but its deployment is often constrained by the communication cost of transmitting gradien...
- MARAuder's Map: Motion-Aware Real-time Activity Recognition with Layout-Based Trajectories : Abstract: Ambient sensor-based human activity recognition (HAR) in smart homes remains challenging due to the need for real-time inference, spatially grounded reasoning, and context-aware temporal mod...
- SymLight: Exploring Interpretable and Deployable Symbolic Policies for Traffic Signal Control : Abstract: Deep Reinforcement Learning has achieved significant success in automatically devising effective traffic signal control (TSC) policies. Neural policies, however, tend to be over-parameteriz...
- Beyond the Lower Bound: Bridging Regret Minimization and Best Arm Identification in Lexicographic Bandits : Abstract: In multi-objective decision-making with hierarchical preferences, lexicographic bandits provide a natural framework for optimizing multiple objectives in a prioritized order. In this setting...
- Catching Contamination Before Generation: Spectral Kill Switches for Agents : Abstract: Agentic language models compose multi step reasoning chains, yet intermediate steps can be corrupted by inconsistent context, retrieval errors, or adversarial inputs, which makes post hoc ev...
- Measuring Model Performance in the Presence of an Intervention : Abstract: AI models are often evaluated based on their ability to predict the outcome of interest. However, in many AI for social impact applications, the presence of an intervention that affects the ...
- MOSS: Efficient and Accurate FP8 LLM Training with Microscaling and Automatic Scaling : Abstract: Training large language models with FP8 formats offers significant efficiency gains. However, the reduced numerical precision of FP8 poses challenges for stable and accurate training. Curren...
- In-depth Analysis on Caching and Pre-fetching in Mixture of Experts Offloading : Abstract: In today's landscape, Mixture of Experts (MoE) is a crucial architecture that has been used by many of the most advanced models. One of the major challenges of MoE models is that they usuall...
- AiEDA: An Open-Source AI-Aided Design Library for Design-to-Vector : Abstract: Recent research has demonstrated that artificial intelligence (AI) can assist electronic design automation (EDA) in improving both the quality and efficiency of chip design. But current AI f...
- CADM: Cluster-customized Adaptive Distance Metric for Categorical Data Clustering : Abstract: An appropriate distance metric is crucial for categorical data clustering, as the distance between categorical data cannot be directly calculated. However, the distances between attribute va...
- Predicting the Future by Retrieving the Past : Abstract: Deep learning models such as MLP, Transformer, and TCN have achieved remarkable success in univariate time series forecasting, typically relying on sliding window samples from historical dat...
- EMOD: A Unified EEG Emotion Representation Framework Leveraging V-A Guided Contrastive Learning : Abstract: Emotion recognition from EEG signals is essential for affective computing and has been widely explored using deep learning. While recent deep learning approaches have achieved strong perform...
- Adaptation and Fine-tuning with TabPFN for Travelling Salesman Problem : Abstract: Tabular Prior-Data Fitted Network (TabPFN) is a foundation model designed for small to medium-sized tabular data, which has attracted much attention recently. This paper investigates the app...
- FusionLog: Cross-System Log-based Anomaly Detection via Fusion of General and Proprietary Knowledge : Abstract: Log-based anomaly detection is critical for ensuring the stability and reliability of web systems. One of the key problems in this task is the lack of sufficient labeled logs, which limits t...
- Physics-Informed Neural Networks for Real-Time Gas Crossover Prediction in PEM Electrolyzers: First Application with Multi-Membrane Validation : Abstract: Green hydrogen production via polymer electrolyte membrane (PEM) water electrolysis is pivotal for energy transition, yet hydrogen crossover through membranes threatens safety and economic v...
- From Kernels to Attention: A Transformer Framework for Density and Score Estimation : Abstract: We introduce a unified attention-based framework for joint score and density estimation. Framing the problem as a sequence-to-sequence task, we develop a permutation- and affine-equivariant ...
- Deep Survival Analysis of Longitudinal EHR Data for Joint Prediction of Hospitalization and Death in COPD Patients : Abstract: Patients with chronic obstructive pulmonary disease (COPD) have an increased risk of hospitalizations, strongly associated with decreased survival, yet predicting the timing of these events ...
- Next-Latent Prediction Transformers Learn Compact World Models : Abstract: Transformers replace recurrence with a memory that grows with sequence length and self-attention that enables ad-hoc look ups over past tokens. Consequently, they lack an inherent incentive ...
- Explainable Deep Learning-based Classification of Wolff-Parkinson-White Electrocardiographic Signals : Abstract: Wolff-Parkinson-White (WPW) syndrome is a cardiac electrophysiology (EP) disorder caused by the presence of an accessory pathway (AP) that bypasses the atrioventricular node, faster ventricu...
- Kunlun Anomaly Troubleshooter: Enabling Kernel-Level Anomaly Detection and Causal Reasoning for Large Model Distributed Inference : Abstract: Anomaly troubleshooting for large model distributed inference (LMDI) remains a critical challenge. Resolving anomalies such as inference performance degradation or latency jitter in distribu...
- Are Time-Indexed Foundation Models the Future of Time Series Imputation? : Abstract: Foundation models for time series imputation remain largely unexplored. Recently, two such models, TabPFN-TS and MoTM, have emerged. These models share a common philosophy that places them w...
- Bespoke Co-processor for Energy-Efficient Health Monitoring on RISC-V-based Flexible Wearables : Abstract: Flexible electronics offer unique advantages for conformable, lightweight, and disposable healthcare wearables. However, their limited gate count, large feature sizes, and high static power ...
- MoSKA: Mixture of Shared KV Attention for Efficient Long-Sequence LLM Inference : Abstract: The escalating context length in Large Language Models (LLMs) creates a severe performance bottleneck around the Key-Value (KV) cache, whose memory-bound nature leads to significant GPU unde...
- Lethe: Layer- and Time-Adaptive KV Cache Pruning for Reasoning-Intensive LLM Serving : Abstract: Generative reasoning with large language models (LLMs) often involves long decoding sequences, leading to substantial memory and latency overheads from accumulating key-value (KV) caches. Wh...
- ITPP: Learning Disentangled Event Dynamics in Marked Temporal Point Processes : Abstract: Marked Temporal Point Processes (MTPPs) provide a principled framework for modeling asynchronous event sequences by conditioning on the history of past events. However, most existing MTPP mo...
- Advancing Ocean State Estimation with efficient and scalable AI : Abstract: Accurate and efficient global ocean state estimation remains a grand challenge for Earth system science, hindered by the dual bottlenecks of computational scalability and degraded data fidel...
- Physics-Informed Design of Input Convex Neural Networks for Consistency Optimal Transport Flow Matching : Abstract: We propose a consistency model based on the optimal-transport flow. A physics-informed design of partially input-convex neural networks (PICNN) plays a central role in constructing the flow ...
- How Particle-System Random Batch Methods Enhance Graph Transformer: Memory Efficiency and Parallel Computing Strategy : Abstract: Attention mechanism is a significant part of Transformer models. It helps extract features from embedded vectors by adding global information and its expressivity has been proved to be power...
- Function Based Isolation Forest (FuBIF): A Unifying Framework for Interpretable Isolation-Based Anomaly Detection : Abstract: Anomaly Detection (AD) is evolving through algorithms capable of identifying outliers in complex datasets. The Isolation Forest (IF), a pivotal AD technique, exhibits adaptability limitation...
- CatBack: Universal Backdoor Attacks on Tabular Data via Categorical Encoding : Abstract: Backdoor attacks in machine learning have drawn significant attention for their potential to compromise models stealthily, yet most research has focused on homogeneous data such as images. I...
- Make It Long, Keep It Fast: End-to-End 10k-Sequence Modeling at Billion Scale on Douyin : Abstract: Short-video recommenders such as Douyin must exploit extremely long user histories without breaking latency or cost budgets. We present an end-to-end system that scales long-sequence modelin...
- Event-driven physics-informed operator learning for reliability analysis : Abstract: Reliability analysis of engineering systems under uncertainty poses significant computational challenges, particularly for problems involving high-dimensional stochastic inputs, nonlinear sy...
- Approximating Shapley Explanations in Reinforcement Learning : Abstract: Reinforcement learning has achieved remarkable success in complex decision-making environments, yet its lack of transparency limits its deployment in practice, especially in safety-critical ...
- Adapting Web Agents with Synthetic Supervision : Abstract: Web agents struggle to adapt to new websites due to the scarcity of environment specific tasks and demonstrations. Recent works have explored synthetic data generation to address this challe...
- Guardian-regularized Safe Offline Reinforcement Learning for Smart Weaning of Mechanical Circulatory Devices : Abstract: We study the sequential decision-making problem for automated weaning of mechanical circulatory support (MCS) devices in cardiogenic shock patients. MCS devices are percutaneous micro-axial ...
- On the Convergence and Stability of Distributed Sub-model Training : Abstract: As learning models continue to grow in size, enabling on-device local training of these models has emerged as a critical challenge in federated learning. A popular solution is sub-model trai...
- Enhancing Robustness of Graph Neural Networks through p-Laplacian : Abstract: With the increase of data in day-to-day life, businesses and different stakeholders need to analyze the data for better predictions. Traditionally, relational data has been a source of var...
- Models Got Talent: Identifying High Performing Wearable Human Activity Recognition Models Without Training : Abstract: A promising alternative to the computationally expensive Neural Architecture Search (NAS) involves the development of Zero Cost Proxies (ZCPs), which correlate well to trained perfo...
- LLM Attention Transplant for Transfer Learning of Tabular Data Across Disparate Domains : Abstract: Transfer learning of tabular data is non-trivial due to heterogeneity in the feature space across disparate domains. The limited success of traditional deep learning in tabular knowledge tra...
- Learning Gaussian DAG Models without Condition Number Bounds : Abstract: We study the problem of learning the topology of a directed Gaussian Graphical Model under the equal-variance assumption, where the graph has $n$ nodes and maximum in-degree $d$. Prior work ...
- Local K-Similarity Constraint for Federated Learning with Label Noise : Abstract: Federated learning on clients with noisy labels is a challenging problem, as such clients can infiltrate the global model, impacting the overall generalizability of the system. Existing meth...
- Resilience Inference for Supply Chains with Hypergraph Neural Network : Abstract: Supply chains are integral to global economic stability, yet disruptions can swiftly propagate through interconnected networks, resulting in substantial economic impacts. Accurate and timely...
- Sparse Linear Regression is Easy on Random Supports : Abstract: Sparse linear regression is one of the most basic questions in machine learning and statistics. Here, we are given as input a design matrix $X \in \mathbb{R}^{N \times d}$ and measurements o...
- Adaptive Multi-view Graph Contrastive Learning via Fractional-order Neural Diffusion Networks : Abstract: Graph contrastive learning (GCL) learns node and graph representations by contrasting multiple views of the same graph. Existing methods typically rely on fixed, handcrafted views, usually a ...
- Deep Reinforcement Learning for Dynamic Origin-Destination Matrix Estimation in Microscopic Traffic Simulations Considering Credit Assignment : Abstract: This paper focuses on dynamic origin-destination matrix estimation (DODE), a crucial calibration process necessary for the effective application of microscopic traffic simulations. The funda...
- Synheart Emotion: Privacy-Preserving On-Device Emotion Recognition from Biosignals : Abstract: Human-computer interaction increasingly demands systems that recognize not only explicit user inputs but also implicit emotional states. While substantial progress has been made in affective...
- Scaling Laws and In-Context Learning: A Unified Theoretical Framework : Abstract: In-context learning (ICL) enables large language models to adapt to new tasks from demonstrations without parameter updates. Despite extensive empirical studies, a principled understanding o...
- Mixtures of SubExperts for Large Language Continual Learning : Abstract: Adapting Large Language Models (LLMs) to a continuous stream of tasks is a critical yet challenging endeavor. While Parameter-Efficient Fine-Tuning (PEFT) methods have become a standard for ...
- Constraint-Informed Active Learning for End-to-End ACOPF Optimization Proxies : Abstract: This paper studies optimization proxies, machine learning (ML) models trained to efficiently predict optimal solutions for AC Optimal Power Flow (ACOPF) problems. While promising, optimizati...
- Test-Time Iterative Error Correction for Efficient Diffusion Models : Abstract: With the growing demand for high-quality image generation on resource-constrained devices, efficient diffusion models have received increasing attention. However, such models suffer from app...
- MrCoM: A Meta-Regularized World-Model Generalizing Across Multi-Scenarios : Abstract: Model-based reinforcement learning (MBRL) is a crucial approach to enhance the generalization capabilities and improve the sample efficiency of RL algorithms. However, current MBRL methods f...
- Breaking the Modality Barrier: Generative Modeling for Accurate Molecule Retrieval from Mass Spectra : Abstract: Retrieving molecular structures from tandem mass spectra is a crucial step in rapid compound identification. Existing retrieval methods, such as traditional mass spectral library matching, s...
- CAMP-HiVe: Cyclic Pair Merging based Efficient DNN Pruning with Hessian-Vector Approximation for Resource-Constrained Systems : Abstract: Deep learning algorithms are becoming an essential component of many artificial intelligence (AI) driven applications, many of which run on resource-constrained and energy-constrained system...
- LLM$^3$-DTI: A Large Language Model and Multi-modal data co-powered framework for Drug-Target Interaction prediction : Abstract: Drug-target interaction (DTI) prediction is of great significance for drug discovery and drug repurposing. With the accumulation of a large volume of valuable data, data-driven methods have ...
- COTN: A Chaotic Oscillatory Transformer Network for Complex Volatile Systems under Extreme Conditions : Abstract: Accurate prediction of financial and electricity markets, especially under extreme conditions, remains a significant challenge due to their intrinsic nonlinearity, rapid fluctuations, and ch...
- Achieving Fairness Without Harm via Selective Demographic Experts : Abstract: As machine learning systems become increasingly integrated into human-centered domains such as healthcare, ensuring fairness while maintaining high predictive performance is critical. Existi...
- Transolver is a Linear Transformer: Revisiting Physics-Attention through the Lens of Linear Attention : Abstract: Recent advances in Transformer-based Neural Operators have enabled significant progress in data-driven solvers for Partial Differential Equations (PDEs). Most current research has focused on...
- 3dSAGER: Geospatial Entity Resolution over 3D Objects (Technical Report) : Abstract: Urban environments are continuously mapped and modeled by various data collection platforms, including satellites, unmanned aerial vehicles and street cameras. The growing availability of 3D...
- Kaggle Chronicles: 15 Years of Competitions, Community and Data Science Innovation : Abstract: Since 2010, Kaggle has been a platform where data scientists from around the world come together to compete, collaborate, and push the boundaries of Data Science. Over these 15 years, it has...
- DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation : Abstract: Recent reasoning-first models (e.g., OpenAI o1, DeepSeek R1) have spurred a resurgence of interest in RLVR. Nevertheless, advances are dominated by mathematics (e.g., AIME), with competitive...
- Scalable Verification of Neural Control Barrier Functions Using Linear Bound Propagation : Abstract: Control barrier functions (CBFs) are a popular tool for safety certification of nonlinear dynamical control systems. Recently, CBFs represented as neural networks have shown great promise du...
- Reaction Prediction via Interaction Modeling of Symmetric Difference Shingle Sets : Abstract: Chemical reaction prediction remains a fundamental challenge in organic chemistry, where existing machine learning models face two critical limitations: sensitivity to input permutations (mo...
- Privacy-Preserving Federated Learning for Fair and Efficient Urban Traffic Optimization : Abstract: The optimization of urban traffic is threatened by the complexity of achieving a balance between transport efficiency and the maintenance of privacy, as well as the equitable distribution of...
- Adaptive Regularization for Large-Scale Sparse Feature Embedding Models : Abstract: The one-epoch overfitting problem has drawn widespread attention, especially in CTR and CVR estimation models in search, advertising, and recommendation domains. These models which rely heav...
- Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding : Abstract: Numerous studies have demonstrated that the Transformer architecture possesses the capability for in-context learning (ICL). In scenarios involving function approximation, context can serve ...
- CG-TTRL: Context-Guided Test-Time Reinforcement Learning for On-Device Large Language Models : Abstract: Test-time Reinforcement Learning (TTRL) has shown promise in adapting foundation models for complex tasks at test-time, resulting in large performance improvements. TTRL leverages an elegant...
- How Wide and How Deep? Mitigating Over-Squashing of GNNs via Channel Capacity Constrained Estimation : Abstract: Existing graph neural networks typically rely on heuristic choices for hidden dimensions and propagation depths, which often lead to severe information loss during propagation, known as over...
- FLEX: Continuous Agent Evolution via Forward Learning from Experience : Abstract: Autonomous agents driven by Large Language Models (LLMs) have revolutionized reasoning and problem-solving but remain static after training, unable to grow with experience as intelligent bei...
- A Risk-Neutral Neural Operator for Arbitrage-Free SPX-VIX Term Structures : Abstract: We propose ARBITER, a risk-neutral neural operator for learning joint SPX-VIX term structures under no-arbitrage constraints. ARBITER maps market states to an operator that outputs implied v...
- MULTIBENCH++: A Unified and Comprehensive Multimodal Fusion Benchmarking Across Specialized Domains : Abstract: Although multimodal fusion has made significant progress, its advancement is severely hindered by the lack of adequate evaluation benchmarks. Current fusion methods are typically evaluated o...
- Reconstruction and Secrecy under Approximate Distance Queries : Abstract: Consider the task of locating an unknown target point using approximate distance queries: in each round, a reconstructor selects a query point and receives a noisy version of its distance to...
- Error Estimate and Convergence Analysis for Data Valuation : Abstract: Data valuation quantifies data importance, but existing methods cannot ensure validity in a single training process. The neural dynamic data valuation (NDDV) method [3] addresses this limita...
- DyKAF: Dynamical Kronecker Approximation of the Fisher Information Matrix for Gradient Preconditioning : Abstract: Recently, optimizers that explicitly treat weights as matrices, rather than flattened vectors, have demonstrated their effectiveness. This perspective naturally leads to structured approxima...
- Explainable AI For Early Detection Of Sepsis : Abstract: Sepsis is a life-threatening condition that requires rapid detection and treatment to prevent progression to severe sepsis, septic shock, or multi-organ failure. Despite advances in medical ...
- Learning Time-Varying Graph Signals via Koopman : Abstract: A wide variety of real-world data, such as sea measurements, e.g., temperatures collected by distributed sensors and multiple unmanned aerial vehicles (UAV) trajectories, can be naturally re...
- Route Experts by Sequence, not by Token : Abstract: Mixture-of-Experts (MoE) architectures scale large language models (LLMs) by activating only a subset of experts per token, but the standard TopK routing assigns the same fixed number of exp...
- Probably Approximately Global Robustness Certification : Abstract: We propose and investigate probabilistic guarantees for the adversarial robustness of classification algorithms. While traditional formal verification approaches for robustness are intractab...
- Efficient Approximation of Volterra Series for High-Dimensional Systems : Abstract: The identification of high-dimensional nonlinear dynamical systems via the Volterra series has significant potential, but has been severely hindered by the curse of dimensionality. Tensor Ne...
- TriShGAN: Enhancing Sparsity and Robustness in Multivariate Time Series Counterfactuals Explanation : Abstract: In decision-making processes, stakeholders often rely on counterfactual explanations, which provide suggestions about what should be changed in the queried instance to alter the outcome of a...
- Bayesian Uncertainty Quantification with Anchored Ensembles for Robust EV Power Consumption Prediction : Abstract: Accurate EV power estimation underpins range prediction and energy management, yet practitioners need both point accuracy and trustworthy uncertainty. We propose an anchored-ensemble Long Sh...
- Practical Policy Distillation for Reinforcement Learning in Radio Access Networks : Abstract: Adopting artificial intelligence (AI) in radio access networks (RANs) presents several challenges, including limited availability of link-level measurements (e.g., CQI reports), stringent re...
- Breaking the Dyadic Barrier: Rethinking Fairness in Link Prediction Beyond Demographic Parity : Abstract: Link prediction is a fundamental task in graph machine learning with applications ranging from social recommendation to knowledge graph completion. Fairness in this setting is critical, as ...
- Optimistic Online-to-Batch Conversions for Accelerated Convergence and Universality : Abstract: In this work, we study offline convex optimization with smooth objectives, where the classical Nesterov's Accelerated Gradient (NAG) method achieves the optimal accelerated convergence. Exte...
- Adaptive Initial Residual Connections for GNNs with Theoretical Guarantees : Abstract: Message passing is the core operation in graph neural networks, where each node updates its embeddings by aggregating information from its neighbors. However, in deep architectures, this pro...
- Explainable Probabilistic Machine Learning for Predicting Drilling Fluid Loss of Circulation in Marun Oil Field : Abstract: Lost circulation remains a major and costly challenge in drilling operations, often resulting in wellbore instability, stuck pipe, and extended non-productive time. Accurate prediction of fl...
- Beyond Fixed Depth: Adaptive Graph Neural Networks for Node Classification Under Varying Homophily : Abstract: Graph Neural Networks (GNNs) have achieved significant success in addressing node classification tasks. However, the effectiveness of traditional GNNs degrades on heterophilic graphs, where ...
- A Weak Penalty Neural ODE for Learning Chaotic Dynamics from Noisy Time Series : Abstract: Accurate forecasting of complex high-dimensional dynamical systems from observational data is essential for several applications across science and engineering. A key challenge, however, is ...
- Non-Rival Data as Rival Products: An Encapsulation-Forging Approach for Data Synthesis : Abstract: The non-rival nature of data creates a dilemma for firms: sharing data unlocks value but risks eroding competitive advantage. Existing data synthesis methods often exacerbate this problem by...
- Dual-branch Spatial-Temporal Self-supervised Representation for Enhanced Road Network Learning : Abstract: Road network representation learning (RNRL) has attracted increasing attention from both researchers and practitioners as various spatiotemporal tasks are emerging. Recent advanced methods l...
- CaberNet: Causal Representation Learning for Cross-Domain HVAC Energy Prediction : Abstract: Cross-domain HVAC energy prediction is essential for scalable building energy management, particularly because collecting extensive labeled data for every new building is both costly and imp...
- Neyman-Pearson Classification under Both Null and Alternative Distributions Shift : Abstract: We consider the problem of transfer learning in Neyman-Pearson classification, where the objective is to minimize the error w.r.t. a distribution $\mu_1$, subject to the constraint that the ...
- Improving Asset Allocation in a Fast Moving Consumer Goods B2B Company: An Interpretable Machine Learning Framework for Commercial Cooler Assignment Based on Multi-Tier Growth Targets : Abstract: In the fast-moving consumer goods (FMCG) industry, deciding where to place physical assets, such as commercial beverage coolers, can directly impact revenue growth and execution efficiency. ...
- Dual-Pathway Fusion of EHRs and Knowledge Graphs for Predicting Unseen Drug-Drug Interactions : Abstract: Drug-drug interactions (DDIs) remain a major source of preventable harm, and many clinically important mechanisms are still unknown. Existing models either rely on pharmacologic knowledge gr...
- An Adaptive Machine Learning Triage Framework for Predicting Alzheimer's Disease Progression : Abstract: Accurate predictions of conversion from mild cognitive impairment (MCI) to Alzheimer's disease (AD) can enable effective personalized therapy. While cognitive tests and clinical data are rou...
- Mitigating Modality Imbalance in Multi-modal Learning via Multi-objective Optimization : Abstract: Multi-modal learning (MML) aims to integrate information from multiple modalities, which is expected to lead to superior performance over single-modality learning. However, recent studies ha...
- Peeling Context from Cause for Multimodal Molecular Property Prediction : Abstract: Deep models are used for molecular property prediction, yet they are often difficult to interpret and may rely on spurious context rather than causal structure, which reduces reliability und...
- ML-EcoLyzer: Quantifying the Environmental Cost of Machine Learning Inference Across Frameworks and Hardware : Abstract: Machine learning inference occurs at a massive scale, yet its environmental impact remains poorly quantified, especially on low-resource hardware. We present ML-EcoLyzer, a cross-framework t...
- Magnitude-Modulated Equivariant Adapter for Parameter-Efficient Fine-Tuning of Equivariant Graph Neural Networks : Abstract: Pretrained equivariant graph neural networks based on spherical harmonics offer efficient and accurate alternatives to computationally expensive ab-initio methods, yet adapting them to new t...
- Sensor Calibration Model Balancing Accuracy, Real-time, and Efficiency : Abstract: Most on-device sensor calibration studies benchmark models only against three macroscopic requirements (i.e., accuracy, real-time, and resource efficiency), thereby hiding deployment bottlen...
- MobileLLM-Pro Technical Report : Abstract: Efficient on-device language models around 1 billion parameters are essential for powering low-latency AI applications on mobile and wearable devices. However, achieving strong performance i...
- Multi-Modal Continual Learning via Cross-Modality Adapters and Representation Alignment with Knowledge Preservation : Abstract: Continual learning is essential for adapting models to new tasks while retaining previously acquired knowledge. While existing approaches predominantly focus on uni-modal data, multi-modal l...
- Rank-1 LoRAs Encode Interpretable Reasoning Signals : Abstract: Reasoning models leverage inference-time compute to significantly enhance the performance of language models on difficult logical tasks, and have become a dominating paradigm in frontier LLM...
- Dual Mamba for Node-Specific Representation Learning: Tackling Over-Smoothing with Selective State Space Modeling : Abstract: Over-smoothing remains a fundamental challenge in deep Graph Neural Networks (GNNs), where repeated message passing causes node representations to become indistinguishable. While existing so...
- Implicit Federated In-context Learning For Task-Specific LLM Fine-Tuning : Abstract: As large language models continue to develop and expand, the extensive public data they rely on faces the risk of depletion. Consequently, leveraging private data within organizations to enh...
- AGRAG: Advanced Graph-based Retrieval-Augmented Generation for LLMs : Abstract: Graph-based retrieval-augmented generation (Graph-based RAG) has demonstrated significant potential in enhancing Large Language Models (LLMs) with structured knowledge. However, existing met...
- Deep one-gate per layer networks with skip connections are universal classifiers : Abstract: This paper shows how a multilayer perceptron with two hidden layers, which has been designed to classify two classes of data points, can easily be transformed into a deep neural network with...
- Daily Forecasting for Annual Time Series Datasets Using Similarity-Based Machine Learning Methods: A Case Study in the Energy Market : Abstract: The policy environment of countries changes rapidly, influencing macro-level indicators such as the Energy Security Index. However, this index is only reported annually, limiting its respons...
- Diversified Flow Matching with Translation Identifiability : Abstract: Diversified distribution matching (DDM) finds a unified translation function mapping a diverse collection of conditional source distributions to their target counterparts. DDM was proposed t...
- Effective Test-Time Scaling of Discrete Diffusion through Iterative Refinement : Abstract: Test-time scaling through reward-guided generation remains largely unexplored for discrete diffusion models despite its potential as a promising alternative. In this work, we introduce Itera...
- Lookahead Unmasking Elicits Accurate Decoding in Diffusion Language Models : Abstract: Masked Diffusion Models (MDMs) as language models generate by iteratively unmasking tokens, yet their performance crucially depends on the inference time order of unmasking. Prevailing heuri...
- Adaptive Sample-Level Framework Motivated by Distributionally Robust Optimization with Variance-Based Radius Assignment for Enhanced Neural Network Generalization Under Distribution Shift : Abstract: Distribution shifts and minority subpopulations frequently undermine the reliability of deep neural networks trained using Empirical Risk Minimization (ERM). Distributionally Robust Optimiza...
- Data-driven jet fuel demand forecasting: A case study of Copenhagen Airport : Abstract: Accurate forecasting of jet fuel demand is crucial for optimizing supply chain operations in the aviation market. Fuel distributors specifically require precise estimates to avoid inventory ...
- Fine-Tuning Vision-Language Models for Multimodal Polymer Property Prediction : Abstract: Vision-Language Models (VLMs) have shown strong performance in tasks like visual question answering and multimodal text generation, but their effectiveness in scientific domains such as mate...
- Distillation-Accelerated Uncertainty Modeling for Multi-Objective RTA Interception : Abstract: Real-Time Auction (RTA) Interception aims to filter out invalid or irrelevant traffic to enhance the integrity and reliability of downstream data. However, two key challenges remain: (i) the...
- Depth-induced NTK: Bridging Over-parameterized Neural Networks and Deep Neural Kernels : Abstract: While deep learning has achieved remarkable success across a wide range of applications, its theoretical understanding of representation learning remains limited. Deep neural kernels provide...
- Prompting Neural-Guided Equation Discovery Based on Residuals : Abstract: Neural-guided equation discovery systems use a data set as prompt and predict an equation that describes the data set without extensive search. However, if the equation does not meet the use...
- CoPRIS: Efficient and Stable Reinforcement Learning via Concurrency-Controlled Partial Rollout with Importance Sampling : Abstract: Reinforcement learning (RL) post-training has become a trending paradigm for enhancing the capabilities of large language models (LLMs). Most existing RL systems for LLMs operate in a fully ...
- FedSparQ: Adaptive Sparse Quantization with Error Feedback for Robust & Efficient Federated Learning : Abstract: Federated Learning (FL) enables collaborative model training across decentralized clients while preserving data privacy by keeping raw data local. However, FL suffers from significant commun...
- GRAVER: Generative Graph Vocabularies for Robust Graph Foundation Models Fine-tuning : Abstract: Inspired by the remarkable success of foundation models in language and vision, Graph Foundation Models (GFMs) hold significant promise for broad applicability across diverse graph tasks and...
- Gradient Projection onto Historical Descent Directions for Communication-Efficient Federated Learning : Abstract: Federated Learning (FL) enables decentralized model training across multiple clients while optionally preserving data privacy. However, communication efficiency remains a critical bottleneck...
- Optimizing Predictive Maintenance in Intelligent Manufacturing: An Integrated FNO-DAE-GNN-PPO MDP Framework : Abstract: In the era of smart manufacturing, predictive maintenance (PdM) plays a pivotal role in improving equipment reliability and reducing operating costs. In this paper, we propose a novel Markov...
- FlowNet: Modeling Dynamic Spatio-Temporal Systems via Flow Propagation : Abstract: Accurately modeling complex dynamic spatio-temporal systems requires capturing flow-mediated interdependencies and context-sensitive interaction dynamics. Existing methods, predominantly gra...
Research Sources: 869 | Generated: 11/11/2025
