AI RESEARCH PAPERS & ACADEMIC SOURCES
- Tau Anomaly Detection in PET Imaging via Bilateral-Guided Deterministic Diffusion Model : Abstract: The emergence of tau PET imaging over the last decade has enabled Alzheimer's disease (AD) researchers to examine tau pathology in vivo and more effectively characterize the disease trajecto...
- Counting Hallucinations in Diffusion Models : Abstract: Diffusion probabilistic models (DPMs) have demonstrated remarkable progress in generative tasks, such as image and video synthesis. However, they still often produce hallucinated samples (ha...
- VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning : Abstract: Recent progress in diffusion models significantly advances various image generation tasks. However, the current mainstream approach remains focused on building task-specific models, which ha...
- Establishing Reality-Virtuality Interconnections in Urban Digital Twins for Superior Intelligent Road Inspection and Simulation : Abstract: Road inspection is crucial for maintaining road serviceability and ensuring traffic safety, as road defects gradually develop and compromise functionality. Traditional inspection methods, wh...
- GeoTexDensifier: Geometry-Texture-Aware Densification for High-Quality Photorealistic 3D Gaussian Splatting : Abstract: 3D Gaussian Splatting (3DGS) has recently attracted wide attention in various areas such as 3D navigation, Virtual Reality (VR) and 3D simulation, due to its photorealistic and efficient re...
- Deep priors for satellite image restoration with accurate uncertainties : Abstract: Satellite optical images, upon their on-ground receipt, offer a distorted view of the observed scene. Their restoration, including denoising, deblurring, and sometimes super-resolution, is r...
- TimeWalker: Personalized Neural Space for Lifelong Head Avatars : Abstract: We present TimeWalker, a novel framework that models realistic, full-scale 3D head avatars of a person on lifelong scale. Unlike current human head avatar pipelines that capture identity at ...
- An Efficient and Harmonized Framework for Balanced Cross-Domain Feature Integration : Abstract: Despite significant advancements in image generation using advanced generative frameworks, cross-image integration of content and style remains a key challenge. Current generative models, wh...
- ExReg: Wide-range Photo Exposure Correction via a Multi-dimensional Regressor with Attention : Abstract: Photo exposure correction is widely investigated, but fewer studies focus on correcting under- and over-exposed images simultaneously. Three issues remain open to handle and correct both und...
- We Can Always Catch You: Detecting Adversarial Patched Objects WITH or WITHOUT Signature : Abstract: Recently, object detection has proven vulnerable to adversarial patch attacks. The attackers holding a specially crafted patch can hide themselves from state-of-the-art detectors, e.g., YOLO...
- RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics : Abstract: Spatial tracing, as a fundamental embodied interaction ability for robots, is inherently challenging as it requires multi-step metric-grounded reasoning compounded with complex spatial refer...
- Self-Supervised Ultrasound Representation Learning for Renal Anomaly Prediction in Prenatal Imaging : Abstract: Prenatal ultrasound is the cornerstone for detecting congenital anomalies of the kidneys and urinary tract, but diagnosis is limited by operator dependence and suboptimal imaging conditions....
- Post-Training and Test-Time Scaling of Generative Agent Behavior Models for Interactive Autonomous Driving : Abstract: Learning interactive motion behaviors among multiple agents is a core challenge in autonomous driving. While imitation learning models generate realistic trajectories, they often inherit bia...
- Leveraging Compression to Construct Transferable Bitrate Ladders : Abstract: Over the past few years, per-title and per-shot video encoding techniques have demonstrated significant gains as compared to conventional techniques such as constant CRF encoding and the fix...
- SLIM-VDB: A Real-Time 3D Probabilistic Semantic Mapping Framework : Abstract: This paper introduces SLIM-VDB, a new lightweight semantic mapping system with probabilistic semantic fusion for closed-set or open-set dictionaries. Advances in data structures from the com...
- JointAVBench: A Benchmark for Joint Audio-Visual Reasoning Evaluation : Abstract: Understanding videos inherently requires reasoning over both visual and auditory information. To properly evaluate Omni-Large Language Models (Omni-LLMs), which are capable of processing mul...
- Hybrid Retrieval-Augmented Generation for Robust Multilingual Document Question Answering : Abstract: Large-scale digitization initiatives have unlocked massive collections of historical newspapers, yet effective computational access remains hindered by OCR corruption, multilingual orthograp...
- JPEG-Inspired Cloud-Edge Holography : Abstract: Computer-generated holography (CGH) presents a transformative solution for near-eye displays in augmented and virtual reality. Recent advances in deep learning have greatly improved CGH in r...
- Resolution-Independent Neural Operators for Multi-Rate Sparse-View CT : Abstract: Sparse-view Computed Tomography (CT) reconstructs images from a limited number of X-ray projections to reduce radiation and scanning time, which makes reconstruction an ill-posed inverse pro...
- Navigation Around Unknown Space Objects Using Visible-Thermal Image Fusion : Abstract: As the popularity of on-orbit operations grows, so does the need for precise navigation around unknown resident space objects (RSOs) such as other spacecraft, orbital debris, and asteroids. ...
- AutoMV: An Automatic Multi-Agent System for Music Video Generation : Abstract: Music-to-Video (M2V) generation for full-length songs faces significant challenges. Existing methods produce short, disjointed clips, failing to align visuals with musical structure, beats, ...
- Pre-training vision models for the classification of alerts from wide-field time-domain surveys : Abstract: Modern wide-field time-domain surveys facilitate the study of transient, variable and moving phenomena by conducting image differencing and relaying alerts to their communities. Machine lear...
- Aion: Towards Hierarchical 4D Scene Graphs with Temporal Flow Dynamics : Abstract: Autonomous navigation in dynamic environments requires spatial representations that capture both semantic structure and temporal evolution. 3D Scene Graphs (3DSGs) provide hierarchical multi...
- ReGlove: A Soft Pneumatic Glove for Activities of Daily Living Assistance via Wrist-Mounted Vision : Abstract: This paper presents ReGlove, a system that converts low-cost commercial pneumatic rehabilitation gloves into vision-guided assistive orthoses. Chronic upper-limb impairment affects millions ...
- A Reproducible Workflow for Scraping, Structuring, and Segmenting Legacy Archaeological Artifact Images : Abstract: This technical note presents a reproducible workflow for converting a legacy archaeological image collection into a structured and segmentation ready dataset. The case study focuses on the L...
- Benchmarking Tesla's Traffic Light and Stop Sign Control: Field Dataset and Behavior Insights : Abstract: Understanding how Advanced Driver-Assistance Systems (ADAS) interact with Traffic Control Devices (TCDs) is critical for assessing their influence on traffic operations, yet this interaction...
- LitePT: Lighter Yet Stronger Point Transformer : Abstract: Modern neural architectures for 3D point cloud processing contain both convolutional layers and attention blocks, but the best way to assemble them remains unclear. We analyse the role of di...
- Towards Scalable Pre-training of Visual Tokenizers for Generation : Abstract: The quality of the latent space in visual tokenizers (e.g., VAEs) is crucial for modern generative models. However, the standard reconstruction-based training paradigm produces a latent spac...
- Recurrent Video Masked Autoencoders : Abstract: We present Recurrent Video Masked-Autoencoders (RVM): a novel video representation learning approach that uses a transformer-based recurrent neural network to aggregate dense image features ...
- I-Scene: 3D Instance Models are Implicit Generalizable Spatial Learners : Abstract: Generalization remains the central challenge for interactive 3D scene generation. Existing learning-based approaches ground spatial understanding in limited scene datasets, restricting genera...
- LASER: Layer-wise Scale Alignment for Training-Free Streaming 4D Reconstruction : Abstract: Recent feed-forward reconstruction models like VGGT and π³ achieve impressive reconstruction quality but cannot process streaming videos due to quadratic memory complexity, limiting their...
- JoVA: Unified Multimodal Learning for Joint Video-Audio Generation : Abstract: In this paper, we present JoVA, a unified framework for joint video-audio generation. Despite recent encouraging advances, existing methods face two critical limitations. First, most existin...
- AgentIAD: Tool-Augmented Single-Agent for Industrial Anomaly Detection : Abstract: Industrial anomaly detection (IAD) is difficult due to the scarcity of normal reference samples and the subtle, localized nature of many defects. Single-pass vision-language models (VLMs) of...
- Grab-3D: Detecting AI-Generated Videos from 3D Geometric Temporal Consistency : Abstract: Recent advances in diffusion-based generation techniques enable AI models to produce highly realistic videos, heightening the need for reliable detection mechanisms. However, existing detect...
- Charge: A Comprehensive Novel View Synthesis Benchmark and Dataset to Bind Them All : Abstract: This paper presents a new dataset for Novel View Synthesis, generated from a high-quality, animated film with stunning realism and intricate detail. Our dataset captures a variety of dynamic...
- MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning : Abstract: Current Vision-Language-Action (VLA) paradigms in autonomous driving primarily rely on Imitation Learning (IL), which introduces inherent challenges such as distribution shift and causal con...
- SCR2-ST: Combine Single Cell with Spatial Transcriptomics for Efficient Active Sampling via Reinforcement Learning : Abstract: Spatial transcriptomics (ST) is an emerging technology that enables researchers to investigate the molecular relationships underlying tissue morphology. However, acquiring ST data remains pr...
- DBT-DINO: Towards Foundation model based analysis of Digital Breast Tomosynthesis : Abstract: Foundation models have shown promise in medical imaging but remain underexplored for three-dimensional imaging modalities. No foundation model currently exists for Digital Breast Tomosynthes...
- LongVie 2: Multimodal Controllable Ultra-Long Video World Model : Abstract: Building video world models upon pretrained video generation systems represents an important yet challenging step toward general spatiotemporal intelligence. A world model should possess thr...
- Lighting in Motion: Spatiotemporal HDR Lighting Estimation : Abstract: We present Lighting in Motion (LiMo), a diffusion-based approach to spatiotemporal lighting estimation. LiMo targets both realistic high-frequency detail prediction and accurate illuminance ...
- MMhops-R1: Multimodal Multi-hop Reasoning : Abstract: The ability to perform multi-modal multi-hop reasoning by iteratively integrating information across various modalities and external knowledge is critical for addressing complex real-world c...
- 3D Human-Human Interaction Anomaly Detection : Abstract: Human-centric anomaly detection (AD) has been primarily studied to specify anomalous behaviors in a single person. However, as humans by nature tend to act in a collaborative manner, behavio...
- TARA: Simple and Efficient Time Aware Retrieval Adaptation of MLLMs for Video Understanding : Abstract: Our objective is to build a general time-aware video-text embedding model for retrieval. To that end, we propose a simple and efficient recipe, dubbed TARA (Time Aware Retrieval Adaptation),...
- Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model : Abstract: Recent strides in video generation have paved the way for unified audio-visual generation. In this work, we present Seedance 1.5 pro, a foundational model engineered specifically for native,...
- Soul: Breathe Life into Digital Human for High-fidelity Long-term Multimodal Animation : Abstract: We propose a multimodal-driven framework for high-fidelity long-term digital human animation termed Soul, which generates semantically coherent videos from a single-frame portrait...
- Transform Trained Transformer: Accelerating Naive 4K Video Generation Over 10× : Abstract: Native 4K (2160×3840) video generation remains a critical challenge due to the quadratic computational explosion of full-attention as spatiotemporal resolution increases, making it di...
- PoseAnything: Universal Pose-guided Video Generation with Part-aware Temporal Coherence : Abstract: Pose-guided video generation refers to controlling the motion of subjects in generated video through a sequence of poses. It enables precise control over subject motion and has important app...
- Test-Time Modification: Inverse Domain Transformation for Robust Perception : Abstract: Generative foundation models contain broad visual knowledge and can produce diverse image variations, making them particularly promising for advancing domain generalization tasks. While they...
- IMILIA: interpretable multiple instance learning for inflammation prediction in IBD from H&E whole slide images : Abstract: As the therapeutic target for Inflammatory Bowel Disease (IBD) shifts toward histologic remission, the accurate assessment of microscopic inflammation has become increasingly central for eva...
- A Domain-Adapted Lightweight Ensemble for Resource-Efficient Few-Shot Plant Disease Classification : Abstract: Accurate and timely identification of plant leaf diseases is essential for resilient and sustainable agriculture, yet most deep learning approaches rely on large annotated datasets and compu...
- RecTok: Reconstruction Distillation along Rectified Flow : Abstract: Visual tokenizers play a crucial role in diffusion models. The dimensionality of latent space governs both reconstruction fidelity and the semantic expressiveness of the latent feature. Howe...
- Learning to Generate Cross-Task Unexploitable Examples : Abstract: Unexploitable example generation aims to transform personal images into their unexploitable (unlearnable) versions before they are uploaded online, thereby preventing unauthorized exploitati...
- USTM: Unified Spatial and Temporal Modeling for Continuous Sign Language Recognition : Abstract: Continuous sign language recognition (CSLR) requires precise spatio-temporal modeling to accurately recognize sequences of gestures in videos. Existing frameworks often rely on CNN-based spa...
- Computer vision training dataset generation for robotic environments using Gaussian splatting : Abstract: This paper introduces a novel pipeline for generating large-scale, highly realistic, and automatically labeled datasets for computer vision tasks in robotic environments. Our approach addres...
- Beyond the Visible: Disocclusion-Aware Editing via Proxy Dynamic Graphs : Abstract: We address image-to-video generation with explicit user control over the final frame's disoccluded regions. Current image-to-video pipelines produce plausible motion but struggle to generate...
- Unlocking Generalization in Polyp Segmentation with DINO Self-Attention "keys" : Abstract: Automatic polyp segmentation is crucial for improving the clinical identification of colorectal cancer (CRC). While Deep Learning (DL) techniques have been extensively researched for this pr...
- Automated User Identification from Facial Thermograms with Siamese Networks : Abstract: The article analyzes the use of thermal imaging technologies for biometric identification based on facial thermograms. It presents a comparative analysis of infrared spectral ranges (NIR, SW...
- KlingAvatar 2.0 Technical Report : Abstract: Avatar video generation models have achieved remarkable progress in recent years. However, prior work exhibits limited efficiency in generating long-duration high-resolution videos, sufferin...
- ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement : Abstract: While existing generation and unified models excel at general image generation, they struggle with tasks requiring deep reasoning, planning, and precise data-to-visual mapping abilities beyo...
- CausalCLIP: Causally-Informed Feature Disentanglement and Filtering for Generalizable Detection of Generated Images : Abstract: The rapid advancement of generative models has increased the demand for generated image detectors capable of generalizing across diverse and evolving generation techniques. However, existing...
- Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans? : Abstract: Recent advances in video generation have produced vivid content that is often indistinguishable from real videos, making AI-generated video detection an emerging societal challenge. Prior A...
- CogniEdit: Dense Gradient Flow Optimization for Fine-Grained Image Editing : Abstract: Instruction-based image editing with diffusion models has achieved impressive results, yet existing methods struggle with fine-grained instructions specifying precise attributes such as colo...
- Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection : Abstract: Vision Language Models (VLMs) excel at visual question answering (VQA) but remain limited to snapshot vision, reasoning from static images. In contrast, embodied agents require ambulatory vi...
- STARCaster: Spatio-Temporal AutoRegressive Video Diffusion for Identity- and View-Aware Talking Portraits : Abstract: This paper presents STARCaster, an identity-aware spatio-temporal video diffusion model that addresses both speech-driven portrait animation and free-viewpoint talking portrait synthesis, gi...
- Ego-EXTRA: video-language Egocentric Dataset for EXpert-TRAinee assistance : Abstract: We present Ego-EXTRA, a video-language Egocentric Dataset for EXpert-TRAinee assistance. Ego-EXTRA features 50 hours of unscripted egocentric videos of subjects performing procedural activit...
- POLAR: A Portrait OLAT Dataset and Generative Framework for Illumination-Aware Face Modeling : Abstract: Face relighting aims to synthesize realistic portraits under novel illumination while preserving identity and geometry. However, progress remains constrained by the limited availability of l...
- CoRA: A Collaborative Robust Architecture with Hybrid Fusion for Efficient Perception : Abstract: Collaborative perception has garnered significant attention as a crucial technology to overcome the perceptual limitations of single-agent systems. Many state-of-the-art (SOTA) methods have ...
- MMDrive: Interactive Scene Understanding Beyond Vision with Multi-representational Fusion : Abstract: Vision-language models enable the understanding and reasoning of complex traffic scenarios through multi-source information fusion, establishing it as a core technology for autonomous drivin...
- Seeing the Whole Picture: Distribution-Guided Data-Free Distillation for Semantic Segmentation : Abstract: Semantic segmentation requires a holistic understanding of the physical world, as it assigns semantic labels to spatially continuous and structurally coherent objects rather than to isolated...
- StarryGazer: Leveraging Monocular Depth Estimation Models for Domain-Agnostic Single Depth Image Completion : Abstract: The problem of depth completion involves predicting a dense depth image from a single sparse depth map and an RGB image. Unsupervised depth completion methods have been proposed for various ...
- LeafTrackNet: A Deep Learning Framework for Robust Leaf Tracking in Top-Down Plant Phenotyping : Abstract: High resolution phenotyping at the level of individual leaves offers fine-grained insights into plant development and stress responses. However, the full potential of accurate leaf tracking ...
- FID-Net: A Feature-Enhanced Deep Learning Network for Forest Infestation Detection : Abstract: Forest pests threaten ecosystem stability, requiring efficient monitoring. To overcome the limitations of traditional methods in large-scale, fine-grained detection, this study focuses on ac...
- Forging a Dynamic Memory: Retrieval-Guided Continual Learning for Generalist Medical Foundation Models : Abstract: Multimodal biomedical Vision-Language Models (VLMs) exhibit immense potential in the field of Continual Learning (CL). However, they confront a core dilemma: how to preserve fine-grained int...
- Towards Test-time Efficient Visual Place Recognition via Asymmetric Query Processing : Abstract: Visual Place Recognition (VPR) has advanced significantly with high-capacity foundation models like DINOv2, achieving remarkable performance. Nonetheless, their substantial computational cos...
- Bi-Erasing: A Bidirectional Framework for Concept Removal in Diffusion Models : Abstract: Concept erasure, which fine-tunes diffusion models to remove undesired or harmful visual concepts, has become a mainstream approach to mitigating unsafe or illegal image generation in text-t...
- Comprehensive Evaluation of Rule-Based, Machine Learning, and Deep Learning in Human Estimation Using Radio Wave Sensing: Accuracy, Spatial Generalization, and Output Granularity Trade-offs : Abstract: This study presents the first comprehensive comparison of rule-based methods, traditional machine learning models, and deep learning models in radio wave sensing with frequency modulated con...
- SneakPeek: Future-Guided Instructional Streaming Video Generation : Abstract: Instructional video generation is an emerging task that aims to synthesize coherent demonstrations of procedural activities from textual descriptions. Such capability has broad implications ...
- What Happens Next? Next Scene Prediction with a Unified Video Model : Abstract: Recent unified models for joint understanding and generation have significantly advanced visual generation capabilities. However, their focus on conventional tasks like text-to-video generat...
- JoDiffusion: Jointly Diffusing Image with Pixel-Level Annotations for Semantic Segmentation Promotion : Abstract: Given the inherently costly and time-intensive nature of pixel-level annotation, the generation of synthetic datasets comprising sufficiently diverse synthetic images paired with ground-trut...
- TWLR: Text-Guided Weakly-Supervised Lesion Localization and Severity Regression for Explainable Diabetic Retinopathy Grading : Abstract: Accurate medical image analysis can greatly assist clinical diagnosis, but its effectiveness relies on high-quality expert annotations. Obtaining pixel-level labels for medical images, partic...
- Light Field Based 6DoF Tracking of Previously Unobserved Objects : Abstract: Object tracking is an important step in robotics and autonomous driving pipelines, which has to generalize to previously unseen and complex objects. Existing high-performing methods often ...
- Few-Step Distillation for Text-to-Image Generation: A Practical Guide : Abstract: Diffusion distillation has dramatically accelerated class-conditional image synthesis, but its applicability to open-ended text-to-image (T2I) generation is still unclear. We present the fir...
- Scaling Up AI-Generated Image Detection via Generator-Aware Prototypes : Abstract: The pursuit of a universal AI-generated image (AIGI) detector often relies on aggregating data from numerous generators to improve generalization. However, this paper identifies a paradoxica...
- VLCache: Computing 2% Vision Tokens and Reusing 98% for Vision-Language Inference : Abstract: This paper presents VLCache, a cache reuse framework that exploits both Key-Value (KV) cache and encoder cache from prior multimodal inputs to eliminate costly recomputation when the same mu...
- SCAdapter: Content-Style Disentanglement for Diffusion Style Transfer : Abstract: Diffusion models have emerged as the leading approach for style transfer, yet they struggle with photo-realistic transfers, often producing painting-like results or missing detailed stylisti...
- UAGLNet: Uncertainty-Aggregated Global-Local Fusion Network with Cooperative CNN-Transformer for Building Extraction : Abstract: Building extraction from remote sensing images is a challenging task due to the complex structure variations of the buildings. Existing methods employ convolutional or self-attention blocks ...
- Sharpness-aware Dynamic Anchor Selection for Generalized Category Discovery : Abstract: Generalized category discovery (GCD) is an important and challenging task in open-world learning. Specifically, given some labeled data of known classes, GCD aims to cluster unlabeled data t...
- Predictive Sample Assignment for Semantically Coherent Out-of-Distribution Detection : Abstract: Semantically coherent out-of-distribution detection (SCOOD) is a recently proposed realistic OOD detection setting: given labeled in-distribution (ID) data and mixed in-distribution and out-...
- Revisiting 2D Foundation Models for Scalable 3D Medical Image Classification : Abstract: 3D medical image classification is essential for modern clinical workflows. Medical foundation models (FMs) have emerged as a promising approach for scaling to new tasks, yet current researc...
- Cross-Level Sensor Fusion with Object Lists via Transformer for 3D Object Detection : Abstract: In automotive sensor fusion systems, smart sensors and Vehicle-to-Everything (V2X) modules are commonly utilized. Sensor data from these systems are typically available only as processed obj...
- Schrodinger Audio-Visual Editor: Object-Level Audiovisual Removal : Abstract: Joint editing of audio and visual content is crucial for precise and controllable content creation. This new task poses challenges due to the limitations of paired audio-visual data before a...
- Learning Common and Salient Generative Factors Between Two Image Datasets : Abstract: Recent advancements in image synthesis have enabled high-quality image generation and manipulation. Most works focus on: 1) conditional manipulation, where an image is modified conditioned o...
- DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning : Abstract: Although multi-modal large language models (MLLMs) have shown strong capabilities across diverse domains, their application in generating fine-grained 3D perception and prediction outputs in...
- L-STEC: Learned Video Compression with Long-term Spatio-Temporal Enhanced Context : Abstract: Neural Video Compression has emerged in recent years, with condition-based frameworks outperforming traditional codecs. However, most existing methods rely solely on the previous frame's fea...
- Fast 2DGS: Efficient Image Representation with Deep Gaussian Prior : Abstract: As generative models become increasingly capable of producing high-fidelity visual content, the demand for efficient, interpretable, and editable image representations has grown substantiall...
- FysicsWorld: A Unified Full-Modality Benchmark for Any-to-Any Understanding, Generation, and Reasoning : Abstract: Despite rapid progress in multimodal large language models (MLLMs) and emerging omni-modal architectures, current benchmarks remain limited in scope and integration, suffering from incomplet...
- GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation : Abstract: A physics-aware driving world model is essential for drive planning, out-of-distribution data synthesis, and closed-loop evaluation. However, existing methods often rely on a single diffusion ...
- Spinal Line Detection for Posture Evaluation through Training-free 3D Human Body Reconstruction with 2D Depth Images : Abstract: The spinal angle is an important indicator of body balance. It is important to restore the 3D shape of the human body and estimate the spine center line. Existing multi-image-based body res...
- β-CLIP: Text-Conditioned Contrastive Learning for Multi-Granular Vision-Language Alignment : Abstract: CLIP achieves strong zero-shot image-text retrieval by aligning global vision and text representations, yet it falls behind on fine-grained tasks even when fine-tuned on long, detailed capti...
- Progressive Conditioned Scale-Shift Recalibration of Self-Attention for Online Test-time Adaptation : Abstract: Online test-time adaptation aims to dynamically adjust a network model in real-time based on sequential input samples during the inference stage. In this work, we find that, when applying a ...
- Open-World Deepfake Attribution via Confidence-Aware Asymmetric Learning : Abstract: The proliferation of synthetic facial imagery has intensified the need for robust Open-World DeepFake Attribution (OW-DFA), which aims to attribute both known and unknown forgeries using lab...
- InteracTalker: Prompt-Based Human-Object Interaction with Co-Speech Gesture Generation : Abstract: Generating realistic human motions that naturally respond to both spoken language and physical objects is crucial for interactive digital experiences. Current methods, however, address speec...
- CogDoc: Towards Unified thinking in Documents : Abstract: Current document reasoning paradigms are constrained by a fundamental trade-off between scalability (processing long-context documents) and fidelity (capturing fine-grained, multimodal detai...
- Cross-modal Fundus Image Registration under Large FoV Disparity : Abstract: Previous work on cross-modal fundus image registration (CMFIR) assumes small cross-modal Field-of-View (FoV) disparity. By contrast, this paper is targeted at a more challenging scenario wit...
- D3D-VLP: Dynamic 3D Vision-Language-Planning Model for Embodied Grounding and Navigation : Abstract: Embodied agents face a critical dilemma that end-to-end models lack interpretability and explicit 3D reasoning, while modular systems ignore cross-component interdependencies and synergies. ...
- Patch-wise Retrieval: A Bag of Practical Techniques for Instance-level Matching : Abstract: Instance-level image retrieval aims to find images containing the same object as a given query, despite variations in size, position, or appearance. To address this challenging task, we prop...
- No Cache Left Idle: Accelerating diffusion model via Extreme-slimming Caching : Abstract: Diffusion models achieve remarkable generative quality, but computational overhead scales with step count, model depth, and sequence length. Feature caching is effective since adjacent times...
- Geometry-Aware Scene-Consistent Image Generation : Abstract: We study geometry-aware scene-consistent image generation: given a reference scene image and a text condition specifying an entity to be generated in the scene and its spatial relation to th...
- Vision-Enhanced Large Language Models for High-Resolution Image Synthesis and Multimodal Data Interpretation : Abstract: This research introduces a transformative framework for integrating Vision-Enhanced Large Language Models (LLMs) with advanced transformer-based architectures to tackle challenges in high-re...
- Automatic Wire-Harness Color Sequence Detector : Abstract: Wire harness inspection remains a labor-intensive process prone to errors in the modern Electronics Manufacturing Services (EMS) industry. This paper introduces a semiautomated machi...
- StegaVAR: Privacy-Preserving Video Action Recognition via Steganographic Domain Analysis : Abstract: Despite the rapid progress of deep learning in video action recognition (VAR) in recent years, privacy leakage in videos remains a critical concern. Current state-of-the-art privacy-preservi...
- From Tokens to Photons: Test-Time Physical Prompting for Vision-Language Models : Abstract: To extend the application of vision-language models (VLMs) from web images to sensor-mediated physical environments, we propose Multi-View Physical-prompt for Test-Time Adaptation (MVP), a f...
- Anatomy Guided Coronary Artery Segmentation from CCTA Using Spatial Frequency Joint Modeling : Abstract: Accurate coronary artery segmentation from coronary computed tomography angiography is essential for quantitative coronary analysis and clinical decision support. Nevertheless, reliable segm...
- Advancing Cache-Based Few-Shot Classification via Patch-Driven Relational Gated Graph Attention : Abstract: Few-shot image classification remains difficult under limited supervision and visual domain shift. Recent cache-based adaptation approaches (e.g., Tip-Adapter) address this challenge to some...
- More Than the Final Answer: Improving Visual Extraction and Logical Consistency in Vision-Language Models : Abstract: Reinforcement learning from verifiable rewards (RLVR) has recently been extended from text-only LLMs to vision-language models (VLMs) to elicit long-chain multimodal reasoning. However, RLVR...
- From Particles to Fields: Reframing Photon Mapping with Continuous Gaussian Photon Fields : Abstract: Accurately modeling light transport is essential for realistic image synthesis. Photon mapping provides physically grounded estimates of complex global illumination effects such as caustics ...
- Endless World: Real-Time 3D-Aware Long Video Generation : Abstract: Producing long, coherent video sequences with stable 3D structure remains a major challenge, particularly in streaming scenarios. Motivated by this, we introduce Endless World, a real-time f...
- BokehDepth: Enhancing Monocular Depth Estimation through Bokeh Generation : Abstract: Bokeh and monocular depth estimation are tightly coupled through the same lens imaging geometry, yet current methods exploit this connection in incomplete ways. High-quality bokeh rendering ...
- ArtGen: Conditional Generative Modeling of Articulated Objects in Arbitrary Part-Level States : Abstract: Generating articulated assets is crucial for robotics, digital twins, and embodied intelligence. Existing generative models often rely on single-view inputs representing closed states, resul...
- Speedrunning ImageNet Diffusion : Abstract: Recent advances have significantly improved the training efficiency of diffusion transformers. However, these techniques have largely been studied in isolation, leaving unexplored the potent...
- M4Human: A Large-Scale Multimodal mmWave Radar Benchmark for Human Mesh Reconstruction : Abstract: Human mesh reconstruction (HMR) provides direct insights into body-environment interaction, which enables various immersive applications. While existing large-scale HMR datasets rely heavily...
- V-Warper: Appearance-Consistent Video Diffusion Personalization via Value Warping : Abstract: Video personalization aims to generate videos that faithfully reflect a user-provided subject while following a text prompt. However, existing approaches often rely on heavy video-based fine...
- STAGE: Storyboard-Anchored Generation for Cinematic Multi-shot Narrative : Abstract: While recent advancements in generative models have achieved remarkable visual fidelity in video synthesis, creating coherent multi-shot narratives remains a significant challenge. To addres...
- TCLeaf-Net: a transformer-convolution framework with global-local attention for robust in-field lesion-level plant leaf disease detection : Abstract: Timely and accurate detection of foliar diseases is vital for safeguarding crop growth and reducing yield losses. Yet, in real-field conditions, cluttered backgrounds, domain shifts, and lim...
- WeDetect: Fast Open-Vocabulary Object Detection as Retrieval : Abstract: Open-vocabulary object detection aims to detect arbitrary classes via text prompts. Methods without cross-modal fusion layers (non-fusion) offer faster inference by treating recognition as a...
- MRD: Using Physically Based Differentiable Rendering to Probe Vision Models for 3D Scene Understanding : Abstract: While deep learning methods have achieved impressive success in many vision benchmarks, it remains difficult to understand and explain the representations and decisions of these models. Thou...
- OMUDA: Omni-level Masking for Unsupervised Domain Adaptation in Semantic Segmentation : Abstract: Unsupervised domain adaptation (UDA) enables semantic segmentation models to generalize from a labeled source domain to an unlabeled target domain. However, existing UDA methods still strugg...
- RealDrag: The First Dragging Benchmark with Real Target Image : Abstract: The evaluation of drag based image editing models is unreliable due to a lack of standardized benchmarks and metrics. This ambiguity stems from inconsistent evaluation protocols and, critica...
- Cognitive-YOLO: LLM-Driven Architecture Synthesis from First Principles of Data for Object Detection : Abstract: Designing high-performance object detection architectures is a complex task, where traditional manual design is time-consuming and labor-intensive, and Neural Architecture Search (NAS) is co...
- Feature Aggregation for Efficient Continual Learning of Complex Facial Expressions : Abstract: As artificial intelligence (AI) systems become increasingly embedded in our daily life, the ability to recognize and adapt to human emotions is essential for effective human-computer interac...
- MetaTPT: Meta Test-time Prompt Tuning for Vision-Language Models : Abstract: Vision-language models (VLMs) such as CLIP exhibit strong zero-shot generalization but remain sensitive to domain shifts at test time. Test-time prompt tuning (TPT) mitigates this issue by a...
- Moment and Highlight Detection via MLLM Frame Segmentation : Abstract: Detecting video moments and highlights from natural-language queries has been unified by transformer-based methods. Other works use generative Multimodal LLM (MLLM) to predict moments and/o...
- Ultra-Low Bitrate Perceptual Image Compression with Shallow Encoder : Abstract: Ultra-low bitrate image compression (below 0.05 bits per pixel) is increasingly critical for bandwidth-constrained and computation-limited encoding scenarios such as edge devices. Existing f...
- ProImage-Bench: Rubric-Based Evaluation for Professional Image Generation : Abstract: We study professional image generation, where a model must synthesize information-dense, scientifically precise illustrations from technical descriptions rather than merely produce visually ...
- Fine-Grained Zero-Shot Learning with Attribute-Centric Representations : Abstract: Recognizing unseen fine-grained categories demands a model that can distinguish subtle visual differences. This is typically achieved by transferring visual-attribute relationships from seen...
- CineLOG: A Training Free Approach for Cinematic Long Video Generation : Abstract: Controllable video synthesis is a central challenge in computer vision, yet current models struggle with fine grained control beyond textual prompts, particularly for cinematic attributes li...
- A Hybrid Deep Learning Framework for Emotion Recognition in Children with Autism During NAO Robot-Mediated Interaction : Abstract: Understanding emotional responses in children with Autism Spectrum Disorder (ASD) during social interaction remains a critical challenge in both developmental psychology and human-robot inte...
- A Multi-Year Urban Streetlight Imagery Dataset for Visual Monitoring and Spatio-Temporal Drift Detection : Abstract: We present a large-scale, longitudinal visual dataset of urban streetlights captured by 22 fixed-angle cameras deployed across Bristol, U.K., from 2021 to 2025. The dataset contains over 526...
- SMRABooth: Subject and Motion Representation Alignment for Customized Video Generation : Abstract: Customized video generation aims to produce videos that faithfully preserve the subject's appearance from reference images while maintaining temporally consistent motion from reference video...
- Audio-Visual Camera Pose Estimation with Passive Scene Sounds and In-the-Wild Video : Abstract: Understanding camera motion is a fundamental problem in embodied perception and 3D scene understanding. While visual methods have advanced rapidly, they often struggle under visually degrade...
- Open Horizons: Evaluating Deep Models in the Wild : Abstract: Open-world deployment requires models to recognize both known categories and remain reliable when novel classes appear. We present a unified experimental study spanning open-set recognition ...
- EchoVLM: Measurement-Grounded Multimodal Learning for Echocardiography : Abstract: Echocardiography is the most widely used imaging modality in cardiology, yet its interpretation remains labor-intensive and inherently multimodal, requiring view recognition, quantitative me...
- RePack: Representation Packing of Vision Foundation Model Features Enhances Diffusion Transformer : Abstract: The superior representation capability of pre-trained vision foundation models (VFMs) has been harnessed for enhancing latent diffusion models (LDMs). These approaches inject the rich semant...
- Enhancing deep learning performance on burned area delineation from SPOT-6/7 imagery for emergency management : Abstract: After a wildfire, delineating burned areas (BAs) is crucial for quantifying damages and supporting ecosystem recovery. Current BA mapping approaches rely on computer vision models trained on...
- Adaptive federated learning for ship detection across diverse satellite imagery sources : Abstract: We investigate the application of Federated Learning (FL) for ship detection across diverse satellite datasets, offering a privacy-preserving solution that eliminates the need for data shari...
- CARI4D: Category Agnostic 4D Reconstruction of Human-Object Interaction : Abstract: Accurate capture of human-object interaction from ubiquitous sensors like RGB cameras is important for applications in human understanding, gaming, and robot learning. However, inferring 4D ...
- A Comparative Analysis of Semiconductor Wafer Map Defect Detection with Image Transformer : Abstract: Predictive maintenance is an important sector in modern industries which improves fault detection and cost reduction processes. By using machine learning algorithms in the whole process, the...
- Contextual Peano Scan and Fast Image Segmentation Using Hidden and Evidential Markov Chains : Abstract: Transforming bi-dimensional sets of image pixels into mono-dimensional sequences with a Peano scan (PS) is an established technique enabling the use of hidden Markov chains (HMCs) for unsupe...
- TransBridge: Boost 3D Object Detection by Scene-Level Completion with Transformer Decoder : Abstract: 3D object detection is essential in autonomous driving, providing vital information about moving objects and obstacles. Detecting objects in distant regions with only a few LiDAR points is s...
- Smartphone monitoring of smiling as a behavioral proxy of well-being in everyday life : Abstract: Subjective well-being is a cornerstone of individual and societal health, yet its scientific measurement has traditionally relied on self-report methods prone to recall bias and high partici...
- Read or Ignore? A Unified Benchmark for Typographic-Attack Robustness and Text Recognition in Vision-Language Models : Abstract: Large vision-language models (LVLMs) are vulnerable to typographic attacks, where misleading text within an image overrides visual understanding. Existing evaluation protocols and defenses, ...
- Microscopic Vehicle Trajectory Datasets from UAV-collected Video for Heterogeneous, Area-Based Urban Traffic : Abstract: This paper offers openly available microscopic vehicle trajectory (MVT) datasets collected using unmanned aerial vehicles (UAVs) in heterogeneous, area-based urban traffic conditions. Tradit...
- Hot Hém: Sài Gòn Giũa Cái Nóng Hông Còng Bàng -- Saigon in Unequal Heat : Abstract: Pedestrian heat exposure is a critical health risk in dense tropical cities, yet standard routing algorithms often ignore micro-scale thermal variation. Hot Hém is a GeoAI workflow that esti...
- Generalization vs. Specialization: Evaluating Segment Anything Model (SAM3) Zero-Shot Segmentation Against Fine-Tuned YOLO Detectors : Abstract: Deep learning has advanced two fundamentally different paradigms for instance segmentation: specialized models optimized through task-specific fine-tuning and generalist foundation models ca...
- Pseudo-Label Refinement for Robust Wheat Head Segmentation via Two-Stage Hybrid Training : Abstract: This extended abstract details our solution for the Global Wheat Full Semantic Segmentation Competition. We developed a systematic self-training framework. This framework combines a two-stag...
- Temporal-Anchor3DLane: Enhanced 3D Lane Detection with Multi-Task Losses and LSTM Fusion : Abstract: Monocular 3D lane detection remains challenging due to depth ambiguity, occlusion, and temporal instability across frames. Anchor-based approaches such as Anchor3DLane have demonstrated stro...
- DeBERTa-KC: A Transformer-Based Classifier for Knowledge Construction in Online Learning Discourse : Abstract: The rapid expansion of online courses and social media has generated large volumes of unstructured learner-generated text. Understanding how learners construct knowledge in these spaces is c...
- Safety Alignment of Large Language Models via Contrasting Safe and Harmful Distributions : Abstract: With the widespread application of Large Language Models (LLMs), it has become a significant concern to ensure their safety and prevent harmful responses. While current safe-alignment method...
- DABL: Detecting Semantic Anomalies in Business Processes Using Large Language Models : Abstract: Detecting anomalies in business processes is crucial for ensuring operational success. While many existing methods rely on statistical frequency to detect anomalies, it's important to note t...
- Towards Interactive Intelligence for Digital Humans : Abstract: We introduce Interactive Intelligence, a novel paradigm of digital human that is capable of personality-aligned expression, adaptive interaction, and self-evolution. To realize this, we pres...
- Fine-tuned LLM-based Code Migration Framework : Abstract: The study presents the outcomes of research and experimental validation in the domain of automated codebase migration, with a focus on addressing challenges in transitioning SQL-based system...
- SIGMA: An AI-Empowered Training Stack on Early-Life Hardware : Abstract: An increasing variety of AI accelerators is being considered for large-scale training. However, enabling large-scale training on early-life AI accelerators faces three core challenges: frequ...
- Heart Disease Prediction using Case Based Reasoning (CBR) : Abstract: This study provides an overview of heart disease prediction using an intelligent system. Predicting disease accurately is crucial in the medical field, but traditional methods relying solely...
- ERA-IT: Aligning Semantic Models with Revealed Economic Preference for Real-Time and Explainable Patent Valuation : Abstract: Valuing intangible assets under uncertainty remains a critical challenge in the strategic management of technological innovation due to the information asymmetry inherent in high-dimensional...
- Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space : Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have significantly enhanced cross-modal understanding and reasoning by incorporating Chain-of-Thought (CoT) reasoning in the s...
- Adaptive Detector-Verifier Framework for Zero-Shot Polyp Detection in Open-World Settings : Abstract: Polyp detectors trained on clean datasets often underperform in real-world endoscopy, where illumination changes, motion blur, and occlusions degrade image quality. Existing approaches strug...
- The Morphemic Origin of Zipf's Law: A Factorized Combinatorial Framework : Abstract: We present a simple structure based model of how words are formed from morphemes. The model explains two major empirical facts: the typical distribution of word lengths and the appearance of...
- VideoARM: Agentic Reasoning over Hierarchical Memory for Long-Form Video Understanding : Abstract: Long-form video understanding remains challenging due to the extended temporal structure and dense multimodal cues. Despite recent progress, many existing approaches still rely on hand-craft...
- From Human Intention to Action Prediction: A Comprehensive Benchmark for Intention-driven End-to-End Autonomous Driving : Abstract: Current end-to-end autonomous driving systems operate at a level of intelligence akin to following simple steering commands. However, achieving genuinely intelligent autonomy requires a para...
- VEGAS: Mitigating Hallucinations in Large Vision-Language Models via Vision-Encoder Attention Guided Adaptive Steering : Abstract: Large vision-language models (LVLMs) exhibit impressive ability to jointly reason over visual and textual inputs. However, they often produce outputs that are linguistically fluent but factu...
- Beyond surface form: A pipeline for semantic analysis in Alzheimer's Disease detection from spontaneous speech : Abstract: Alzheimer's Disease (AD) is a progressive neurodegenerative condition that adversely affects cognitive abilities. Language-related changes can be automatically identified through the analysi...
- Towards Effective Model Editing for LLM Personalization : Abstract: Personalization is becoming indispensable for LLMs to align with individual user preferences and needs. Yet current approaches are often computationally expensive, data-intensive, susceptibl...
- A stylometric analysis of speaker attribution from speech transcripts : Abstract: Forensic scientists often need to identify an unknown speaker or writer in cases such as ransom calls, covert recordings, alleged suicide notes, or anonymous online communications, among man...
- Comparative Analysis of LLM Abliteration Methods: A Cross-Architecture Evaluation : Abstract: Safety alignment mechanisms in large language models prevent responses to harmful queries through learned refusal behavior, yet these same mechanisms impede legitimate research applications ...
- PrahokBART: A Pre-trained Sequence-to-Sequence Model for Khmer Natural Language Generation : Abstract: This work introduces PrahokBART, a compact pre-trained sequence-to-sequence model trained from scratch for Khmer using carefully curated Khmer and English corpora. We focus on improvin...
- Advancing Bangla Machine Translation Through Informal Datasets : Abstract: Bangla is the sixth most widely spoken language globally, with approximately 234 million native speakers. However, progress in open-source Bangla machine translation remains limited. Most on...
- Scaling Laws for Code: Every Programming Language Matters : Abstract: Code large language models (Code LLMs) are powerful but costly to train, with scaling laws predicting performance from model size, data, and compute. However, different programming languages...
- Large language models are not about language : Abstract: Large Language Models are useless for linguistics, as they are probabilistic models that require a vast amount of data to analyse externalized strings of words. In contrast, human language i...
- Integrating Causal Reasoning into Automated Fact-Checking : Abstract: In fact-checking applications, a common reason to reject a claim is to detect the presence of erroneous cause-effect relationships between the events at play. However, current automated fact...
- AIR: Post-training Data Selection for Reasoning via Attention Head Influence : Abstract: LLMs achieve remarkable multi-step reasoning capabilities, yet effectively transferring these skills via post-training distillation remains challenging. Existing data selection methods, rang...
- An Open and Reproducible Deep Research Agent for Long-Form Question Answering : Abstract: We present an open deep research system for long-form question answering, selected as a winning system in the text-to-text track of the MMU-RAG competition at NeurIPS 2025. The system combin...
- Authors Should Annotate : Abstract: The status quo for labeling text is third-party annotation, but there are many cases where information directly from the document's source would be preferable over a third-person proxy, espe...
- QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management : Abstract: We introduce QwenLong-L1.5, a model that achieves superior long-context reasoning capabilities through systematic post-training innovations. The key technical breakthroughs of QwenLong-L1.5 ...
- What Matters in Evaluating Book-Length Stories? A Systematic Study of Long Story Evaluation : Abstract: In this work, we conduct systematic research in a challenging area: the automatic evaluation of book-length stories (>100K tokens). Our study focuses on two key questions: (1) understanding ...
- Persistent Personas? Role-Playing, Instruction Following, and Safety in Extended Interactions : Abstract: Persona-assigned large language models (LLMs) are used in domains such as education, healthcare, and sociodemographic simulation. Yet, they are typically evaluated only in short, single-roun...
- Curió-Edu 7B: Examining Data Selection Impacts in LLM Continued Pretraining : Abstract: Continued pretraining extends a language model's capabilities by further exposing it to additional data, often tailored to a specific linguistic or domain context. This strategy has emerged ...
- NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents : Abstract: Recent advances in coding agents suggest rapid progress toward autonomous software development, yet existing benchmarks fail to rigorously evaluate the long-horizon capabilities required to ...
- CoDA: A Context-Decoupled Hierarchical Agent with Reinforcement Learning : Abstract: Large Language Model (LLM) agents trained with reinforcement learning (RL) show great promise for solving complex, multi-step tasks. However, their performance is often crippled by "Context ...
- LexRel: Benchmarking Legal Relation Extraction for Chinese Civil Cases : Abstract: Legal relations form a highly consequential analytical framework of civil law system, serving as a crucial foundation for resolving disputes and realizing values of the rule of law in judici...
- Which Pieces Does Unigram Tokenization Really Need? : Abstract: The Unigram tokenization algorithm offers a probabilistic alternative to the greedy heuristics of Byte-Pair Encoding. Despite its theoretical elegance, its implementation in practice is comp...
- StruProKGR: A Structural and Probabilistic Framework for Sparse Knowledge Graph Reasoning : Abstract: Sparse Knowledge Graphs (KGs) are commonly encountered in real-world applications, where knowledge is often incomplete or limited. Sparse KG reasoning, the task of inferring missing knowledg...
- NagaNLP: Bootstrapping NLP for Low-Resource Nagamese Creole with Human-in-the-Loop Synthetic Data : Abstract: The vast majority of the world's languages, particularly creoles like Nagamese, remain severely under-resourced in Natural Language Processing (NLP), creating a significant barrier to their ...
- The American Ghost in the Machine: How language models align culturally and the effects of cultural prompting : Abstract: Culture is the bedrock of human interaction; it dictates how we perceive and respond to everyday interactions. As the field of human-computer interaction grows via the rise of generative Lar...
- Large language models have learned to use language : Abstract: Acknowledging that large language models have learned to use language can open doors to breakthrough language science. Achieving these breakthroughs may require abandoning some long-held ide...
- Can GPT replace human raters? Validity and reliability of machine-generated norms for metaphors : Abstract: As Large Language Models (LLMs) are increasingly being used in scientific research, the issue of their trustworthiness becomes crucial. In psycholinguistics, LLMs have been recently employed...
- F5-TTS-RO: Extending F5-TTS to Romanian TTS via Lightweight Input Adaptation : Abstract: This work introduces a lightweight input-level adapter for the F5-TTS model that enables Romanian Language support. To preserve the existing capabilities of the model (voice cloning, English...
- Market-Bench: Evaluating Large Language Models on Introductory Quantitative Trading and Market Dynamics : Abstract: We introduce MARKET-BENCH, a benchmark that evaluates large language models (LLMs) on introductory quantitative trading tasks by asking them to construct executable backtesters from natural-...
- BLASST: Dynamic BLocked Attention Sparsity via Softmax Thresholding : Abstract: The growing demand for long-context inference capabilities in Large Language Models (LLMs) has intensified the computational and memory bottlenecks inherent to the standard attention mechani...
- Benchmarking Contextual Understanding for In-Car Conversational Systems : Abstract: In-Car Conversational Question Answering (ConvQA) systems significantly enhance user experience by enabling seamless voice interactions. However, assessing their accuracy and reliability rem...
- Direct Confidence Alignment: Aligning Verbalized Confidence with Internal Confidence In Large Language Models : Abstract: Producing trustworthy and reliable Large Language Models (LLMs) has become increasingly important as their usage becomes more widespread. Calibration seeks to achieve this by improving the a...
- AQCat25: Unlocking spin-aware, high-fidelity machine learning potentials for heterogeneous catalysis : Abstract: Large-scale datasets have enabled highly accurate machine learning interatomic potentials (MLIPs) for general-purpose heterogeneous catalysis modeling. There are, however, some limitations i...
- A PyTorch Framework for Scalable Non-Crossing Quantile Regression : Abstract: Quantile regression is fundamental to distributional modeling, yet independent estimation of multiple quantiles frequently produces crossing -- where estimated quantile functions violate mon...
- Multipole Attention for Efficient Long Context Reasoning : Abstract: Large Reasoning Models (LRMs) have shown promising accuracy improvements on complex problem-solving tasks. While these models have attained high accuracy by leveraging additional computation...
- Compact Neural Network Algorithm for Electrocardiogram Classification : Abstract: In this paper, we present a powerful, compact electrocardiogram (ECG) classification algorithm for cardiac arrhythmia diagnosis that addresses the current reliance on deep learning and convo...
- A Physics-Embedded Dual-Learning Imaging Framework for Electrical Impedance Tomography : Abstract: Electrical Impedance Tomography (EIT) is a promising noninvasive imaging technique that reconstructs the spatial conductivity distribution from boundary voltage measurements. However, it pos...
- Navigating AI to Unpack Youth Privacy Concerns: An In-Depth Exploration and Systematic Review : Abstract: This systematic literature review investigates perceptions, concerns, and expectations of young digital citizens regarding privacy in artificial intelligence (AI) systems, focusing on social...
- Self-test loss functions for learning weak-form operators and gradient flows : Abstract: The construction of loss functions presents a major challenge in data-driven modeling involving weak-form operators in PDEs and gradient flows, particularly due to the need to select test fu...
- QUOTA: Quantifying Objects with Text-to-Image Models for Any Domain : Abstract: We tackle the problem of quantifying the number of objects by a generative text-to-image model. Rather than retraining such a model for each new image domain of interest, which leads to high...
- On the physics of nested Markov models: a generalized probabilistic theory perspective : Abstract: Determining potential probability distributions with a given causal graph is vital for causality studies. To bypass the difficulty in characterizing latent variables in a Bayesian network, t...
- WALINET: A water and lipid identification convolutional Neural Network for nuisance signal removal in 1H MR Spectroscopic Imaging : Abstract: Purpose. Proton Magnetic Resonance Spectroscopic Imaging (1H-MRSI) provides non-invasive spectral-spatial mapping of metabolism. However, long-standing problems in whole-brain 1H-MRSI are sp...
- Deep-ER: Deep Learning ECCENTRIC Reconstruction for fast high-resolution neurometabolic imaging : Abstract: Introduction: Altered neurometabolism is an important pathological mechanism in many neurological diseases and brain cancer, which can be mapped non-invasively by Magnetic Resonance Spectros...
- CIC: Circular Image Compression : Abstract: Learned image compression (LIC) is currently the cutting-edge method. However, the inherent difference between testing and training images of LIC results in performance degradation to some e...
- "All of Me": Mining Users' Attributes from their Public Spotify Playlists : Abstract: In the age of digital music streaming, playlists on platforms like Spotify have become an integral part of individuals' musical experiences. People create and publicly share their own playli...
- An Anytime Algorithm for Good Arm Identification : Abstract: In good arm identification (GAI), the goal is to identify one arm whose average performance exceeds a given threshold, referred to as a good arm, if it exists. Few works have studied GAI in ...
- The prediction of the quality of results in Logic Synthesis using Transformer and Graph Neural Networks : Abstract: In the logic synthesis stage, structure transformations in the synthesis tool need to be combined into optimization sequences and act on the circuit to meet the specified circuit area and de...
- CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer : Abstract: Interval and large invasive breast cancers, which are associated with worse prognosis than other cancers, are usually detected at a late stage due to false negative assessments of screening ...
- Comparative Analysis of Wave Scattering Numerical Modeling Using the Boundary Element Method and Physics-Informed Neural Networks : Abstract: This study compares the Boundary Element Method (BEM) and Physics-Informed Neural Networks (PINNs) for solving the two-dimensional Helmholtz equation in wave scattering problems. The objecti...
- Bilevel ZOFO: Efficient LLM Fine-Tuning and Meta-Training : Abstract: Fine-tuning pre-trained Large Language Models (LLMs) for downstream tasks using First-Order (FO) optimizers presents significant computational challenges. Parameter-Efficient Fine-Tuning (PE...
- Defending Collaborative Filtering Recommenders via Adversarial Robustness Based Edge Reweighting : Abstract: User based collaborative filtering (CF) relies on a user and user similarity graph, making it vulnerable to profile injection (shilling) attacks that manipulate neighborhood relations to pro...
- The Implicit Bias of Structured State Space Models Can Be Poisoned With Clean Labels : Abstract: Neural networks are powered by an implicit bias: a tendency of gradient descent to fit training data in a way that generalizes to unseen data. A recent class of neural network models gaining...
- Dynamic Fraud Detection: Integrating Reinforcement Learning into Graph Neural Networks : Abstract: Financial fraud refers to the act of obtaining financial benefits through dishonest means. Such behavior not only disrupts the order of the financial market but also harms economic and socia...
- Certifying Robustness of Graph Convolutional Networks for Node Perturbation with Polyhedra Abstract Interpretation : Abstract: Graph convolutional neural networks (GCNs) are powerful tools for learning graph-based knowledge representations from training data. However, they are vulnerable to small perturbations in th...
- Vertical Semi-Federated Learning for Efficient Online Advertising : Abstract: Traditional vertical federated learning schema suffers from two main issues: 1) restricted applicable scope to overlapped samples and 2) high system challenge of real-time federated serving,...
- Adaptive Risk Mitigation in Demand Learning : Abstract: We study dynamic pricing of a product with an unknown demand distribution over a finite horizon. Departing from the standard no-regret learning environment in which prices can be adjusted at...
- SEDULity: A Proof-of-Learning Framework for Distributed and Secure Blockchains with Efficient Useful Work : Abstract: The security and decentralization of Proof-of-Work (PoW) have been well-tested in existing blockchain systems. However, its tremendous energy waste has raised concerns about sustainability. ...
- Universality of high-dimensional scaling limits of stochastic gradient descent : Abstract: We consider statistical tasks in high dimensions whose loss depends on the data only through its projection into a fixed-dimensional subspace spanned by the parameter vectors and certain gro...
- Temporal Tokenization Strategies for Event Sequence Modeling with Large Language Models : Abstract: Representing continuous time is a critical and under-explored challenge in modeling temporal event sequences with large language models (LLMs). Various strategies like byte-level representat...
- Do-Undo: Generating and Reversing Physical Actions in Vision-Language Models : Abstract: We introduce the Do-Undo task and benchmark to address a critical gap in vision-language models: understanding and generating physically plausible scene transformations driven by real-world ...
- Textual Gradients are a Flawed Metaphor for Automatic Prompt Optimization : Abstract: A well-engineered prompt can increase the performance of large language models; automatic prompt optimization techniques aim to increase performance without requiring human effort to tune th...
- A Nonparametric Statistics Approach to Feature Selection in Deep Neural Networks with Theoretical Guarantees : Abstract: This paper tackles the problem of feature selection in a highly challenging setting: $\mathbb{E}(y | \boldsymbol{x}) = G(\boldsymbol{x}_{\mathcal{S}_0})$, where $\mathcal{S}_0$ is the set of...
- Pancakes: Consistent Multi-Protocol Image Segmentation Across Biomedical Domains : Abstract: A single biomedical image can be meaningfully segmented in multiple ways, depending on the desired application. For instance, a brain MRI can be segmented according to tissue types, vascular...
- Adaptive Sampling for Hydrodynamic Stability : Abstract: An adaptive sampling approach for efficient detection of bifurcation boundaries in parametrized fluid flow problems is presented herein. The study extends the machine-learning approach of Si...
- Actively Learning Joint Contours of Multiple Computer Experiments : Abstract: Contour location -- the process of sequentially training a surrogate model to identify the design inputs that result in a pre-specified response value from a single computer exp...
- Enhancing lithological interpretation from petrophysical well log of IODP expedition 390/393 using machine learning : Abstract: Enhanced lithological interpretation from well logs plays a key role in geological resource exploration and mapping, as well as in geo-environmental modeling studies. Core and cutting inform...
- A Deep Learning Model of Mental Rotation Informed by Interactive VR Experiments : Abstract: Mental rotation -- the ability to compare objects seen from different viewpoints -- is a fundamental example of mental simulation and spatial world modelling in humans. Here we propose a mec...
- From Zipf's Law to Neural Scaling through Heaps' Law and Hilberg's Hypothesis : Abstract: We inspect the deductive connection between the neural scaling law and Zipf's law -- two statements discussed in machine learning and quantitative linguistics. The neural scaling law describ...
- Real-Time AI-Driven Milling Digital Twin Towards Extreme Low-Latency : Abstract: Digital twin (DT) enables smart manufacturing by leveraging real-time data, AI models, and intelligent control systems. This paper presents a state-of-the-art analysis on the emerging field ...
- MineTheGap: Automatic Mining of Biases in Text-to-Image Models : Abstract: Text-to-Image (TTI) models generate images based on text prompts, which often leave certain aspects of the desired image ambiguous. When faced with these ambiguities, TTI models have been sh...
- rNCA: Self-Repairing Segmentation Masks : Abstract: Accurately predicting topologically correct masks remains a difficult task for general segmentation models, which often produce fragmented or disconnected outputs. Fixing these artifacts typ...
- Fast Policy Learning for 6-DOF Position Control of Underwater Vehicles : Abstract: Autonomous Underwater Vehicles (AUVs) require reliable six-degree-of-freedom (6-DOF) position control to operate effectively in complex and dynamic marine environments. Traditional controlle...
- AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning : Abstract: Agentic reinforcement learning has advanced large language models (LLMs) to reason through long chain-of-thought trajectories while interleaving external tool use. Existing approaches assume...
- Better LMO-based Momentum Methods with Second-Order Information : Abstract: The use of momentum in stochastic optimization algorithms has shown empirical success across a range of machine learning tasks. Recently, a new class of stochastic momentum algorithms has em...
- Rethinking Physics-Informed Regression Beyond Training Loops and Bespoke Architectures : Abstract: We revisit the problem of physics-informed regression, and propose a method that directly computes the state at the prediction point, simultaneously with the derivative and curvature informa...
- MicroPhaseNO: Adapting an Earthquake-Trained Phase Neural Operator for Microseismic Phase Picking : Abstract: Seismic phase picking is very often used for microseismic monitoring and subsurface imaging. Traditional manual processing is not feasible for either real-time applications or large arrays. ...
- Iterative Tuning of Nonlinear Model Predictive Control for Robotic Manufacturing Tasks : Abstract: Manufacturing processes are often perturbed by drifts in the environment and wear in the system, requiring control re-tuning even in the presence of repetitive operations. This paper present...
- Weight Space Correlation Analysis: Quantifying Feature Utilization in Deep Learning Models : Abstract: Deep learning models in medical imaging are susceptible to shortcut learning, relying on confounding metadata (e.g., scanner model) that is often encoded in image embeddings. The crucial que...
- Stopping Rules for Stochastic Gradient Descent via Anytime-Valid Confidence Sequences : Abstract: We study stopping rules for stochastic gradient descent (SGD) for convex optimization from the perspective of anytime-valid confidence sequences. Classical analyses of SGD provide convergenc...
- Towards Practical Large-scale Dynamical Heterogeneous Graph Embedding: Cold-start Resilient Recommendation : Abstract: Deploying dynamic heterogeneous graph embeddings in production faces key challenges of scalability, data freshness, and cold-start. This paper introduces a practical, two-stage solution that...
- ADHint: Adaptive Hints with Difficulty Priors for Reinforcement Learning : Abstract: To combine the advantages of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), recent methods have integrated "hints" into post-training, which are prefix segments of complete ...
- PvP: Data-Efficient Humanoid Robot Learning with Proprioceptive-Privileged Contrastive Representations : Abstract: Achieving efficient and robust whole-body control (WBC) is essential for enabling humanoid robots to perform complex tasks in dynamic environments. Despite the success of reinforcement learn...
- DiRe: Diversity-promoting Regularization for Dataset Condensation : Abstract: In Dataset Condensation, the goal is to synthesize a small dataset that replicates the training utility of a large original dataset. Existing condensation methods synthesize datasets with si...
- Progressive Refinement of E-commerce Search Ranking Based on Short-Term Activities of the Buyer : Abstract: In e-commerce shopping, aligning search results with a buyer's immediate needs and preferences presents a significant challenge, particularly in adapting search results throughout the buyer'...
- Motus: A Unified Latent Action World Model : Abstract: While a general embodied agent must function as a unified system, current methods are built on isolated models for understanding, world modeling, and control. This fragmentation prevents uni...
- Comprehensive Deployment-Oriented Assessment for Cross-Environment Generalization in Deep Learning-Based mmWave Radar Sensing : Abstract: This study presents the first comprehensive evaluation of spatial generalization techniques, which are essential for the practical deployment of deep learning-based radio-frequency (RF) sens...
- General OOD Detection via Model-aware and Subspace-aware Variable Priority : Abstract: Out-of-distribution (OOD) detection is essential for determining when a supervised model encounters inputs that differ meaningfully from its training distribution. While widely studied in cl...
- VoroLight: Learning Quality Volumetric Voronoi Meshes from General Inputs : Abstract: We present VoroLight, a differentiable framework for 3D shape reconstruction based on Voronoi meshing. Our approach generates smooth, watertight surfaces and topologically consistent volumet...
- Continuous Edit Distance, Geodesics and Barycenters of Time-varying Persistence Diagrams : Abstract: We introduce the Continuous Edit Distance (CED), a geodesic and elastic distance for time-varying persistence diagrams (TVPDs). The CED extends edit-distance ideas to TVPDs by combining loca...
- Evaluating Singular Value Thresholds for DNN Weight Matrices based on Random Matrix Theory : Abstract: This study evaluates thresholds for removing singular values from singular value decomposition-based low-rank approximations of deep neural network weight matrices. Each weight matrix is mod...
- PAC-Bayes Bounds for Multivariate Linear Regression and Linear Autoencoders : Abstract: Linear Autoencoders (LAEs) have shown strong performance in state-of-the-art recommender systems. However, this success remains largely empirical, with limited theoretical understanding. In ...
- Qonvolution: Towards Learning High-Frequency Signals with Queried Convolution : Abstract: Accurately learning high-frequency signals is a challenge in computer vision and graphics, as neural networks often struggle with these signals due to spectral bias or optimization difficult...
- KANELÉ: Kolmogorov-Arnold Networks for Efficient LUT-based Evaluation : Abstract: Low-latency, resource-efficient neural network inference on FPGAs is essential for applications demanding real-time capability and low power. Lookup table (LUT)-based neural networks are a c...
- HaShiFlex: A High-Throughput Hardened Shifter DNN Accelerator with Fine-Tuning Flexibility : Abstract: We introduce a high-throughput neural network accelerator that embeds most network layers directly in hardware, minimizing data transfer and memory usage while preserving a degree of flexibi...
- An End-to-End Approach for Microgrid Probabilistic Forecasting and Robust Operation via Decision-focused Learning : Abstract: High penetration of renewable energy sources (RES) introduces significant uncertainty and intermittency into microgrid operations, posing challenges to economic and reliable scheduling. To a...
- Flow-matching Operators for Residual-Augmented Probabilistic Learning of Partial Differential Equations : Abstract: Learning probabilistic surrogates for PDEs remains challenging in data-scarce regimes: neural operators require large amounts of high-fidelity data, while generative approaches typically sac...
- Transport Reversible Jump Markov Chain Monte Carlo with proposals generated by Variational Inference with Normalizing Flows : Abstract: We present a framework using variational inference with normalizing flows (VI-NFs) to generate proposals of reversible jump Markov chain Monte Carlo (RJMCMC) for efficient trans-dimensional ...
- Limits To (Machine) Learning : Abstract: Machine learning (ML) methods are highly flexible, but their ability to approximate the true data-generating process is fundamentally constrained by finite samples. We characterize a univers...
- Self-Motivated Growing Neural Network for Adaptive Architecture via Local Structural Plasticity : Abstract: Control policies in deep reinforcement learning are often implemented with fixed-capacity multilayer perceptrons trained by backpropagation, which lack structural plasticity and depend on gl...
- Practical Hybrid Quantum Language Models with Observable Readout on Real Hardware : Abstract: Hybrid quantum-classical models represent a crucial step toward leveraging near-term quantum devices for sequential data processing. We present Quantum Recurrent Neural Networks (QRNNs) and ...
- Efficient Vision-Language Reasoning via Adaptive Token Pruning : Abstract: Real-world deployment of Vision-Language Models (VLMs) is hindered by high computational demands, as existing architectures inefficiently process all tokens uniformly. We introduce Adaptive ...
- Robust Variational Bayes by Min-Max Median Aggregation : Abstract: We propose a robust and scalable variational Bayes (VB) framework designed to effectively handle contamination and outliers in datasets. Our approach partitions the data into $m$ disjoint sub...
- Modeling Authorial Style in Urdu Novels Using Character Interaction Graphs and Graph Neural Networks : Abstract: Authorship analysis has traditionally focused on lexical and stylistic cues within text, while higher-level narrative structure remains underexplored, particularly for low-resource languages...
- CoLSE: A Lightweight and Robust Hybrid Learned Model for Single-Table Cardinality Estimation using Joint CDF : Abstract: Cardinality estimation (CE), the task of predicting the result size of queries, is a critical component of query optimization. Accurate estimates are essential for generating efficient query ...
- ceLLMate: Sandboxing Browser AI Agents : Abstract: Browser-using agents (BUAs) are an emerging class of autonomous agents that interact with web browsers in human-like ways, including clicking, scrolling, filling forms, and navigating across...
- Scalable Quantum Error Mitigation with Neighbor-Informed Learning : Abstract: Noise in quantum hardware is the primary obstacle to realizing the transformative potential of quantum computing. Quantum error mitigation (QEM) offers a promising pathway to enhance computa...
- Mind the Jumps: A Scalable Robust Local Gaussian Process for Multidimensional Response Surfaces with Discontinuities : Abstract: Modeling response surfaces with abrupt jumps and discontinuities remains a major challenge across scientific and engineering domains. Although Gaussian process models excel at capturing smoo...
- Iterative Sampling Methods for Sinkhorn Distributionally Robust Optimization : Abstract: Distributionally robust optimization (DRO) has emerged as a powerful paradigm for reliable decision-making under uncertainty. This paper focuses on DRO with ambiguity sets defined via the Si...
- Supervised Contrastive Frame Aggregation for Video Representation Learning : Abstract: We propose a supervised contrastive learning framework for video representation learning that leverages temporally global context. We introduce a video to image aggregation strategy that spa...
- HyperEdit: Unlocking Instruction-based Text Editing in LLMs via Hypernetworks : Abstract: Instruction-based text editing is increasingly critical for real-world applications such as code editors (e.g., Cursor), but Large Language Models (LLMs) continue to struggle with this task....
- Animus3D: Text-driven 3D Animation via Motion Score Distillation : Abstract: We present Animus3D, a text-driven 3D animation framework that generates motion field given a static 3D asset and text prompt. Previous methods mostly leverage the vanilla Score Distillation...
- Generative Spatiotemporal Data Augmentation : Abstract: We explore spatiotemporal data augmentation using video foundation models to diversify both camera viewpoints and scene dynamics. Unlike existing approaches based on simple geometric transfo...
- Understanding Overparametrization in Survival Models through Double-Descent : Abstract: Classical statistical learning theory predicts a U-shaped relationship between test loss and model capacity, driven by the bias-variance trade-off. Recent advances in modern machine learning...
- Breaking the Curse of Dimensionality: On the Stability of Modern Vector Retrieval : Abstract: Modern vector databases enable efficient retrieval over high-dimensional neural embeddings, powering applications from web search to retrieval-augmented generation. However, classical theory...
- Efficient Level-Crossing Probability Calculation for Gaussian Process Modeled Data : Abstract: Almost all scientific data have uncertainties originating from different sources. Gaussian process regression (GPR) models are a natural way to model data with Gaussian-distributed uncertain...
- Co-Hub Node Based Multiview Graph Learning with Theoretical Guarantees : Abstract: Identifying the graphical structure underlying the observed multivariate data is essential in numerous applications. Current methodologies are predominantly confined to deducing a singular g...
- Data-driven modelling of autonomous and forced dynamical systems : Abstract: The paper demonstrates that invariant foliations are accurate, data-efficient and practical tools for data-driven modelling of physical systems. Invariant foliations can be fitted to data th...
- ViInfographicVQA: A Benchmark for Single and Multi-image Visual Question Answering on Vietnamese Infographics : Abstract: Infographic Visual Question Answering (InfographicVQA) evaluates a model's ability to read and reason over data-rich, layout-heavy visuals that combine text, charts, icons, and design elemen...
- ElasticVR: Elastic Task Computing in Multi-User Multi-Connectivity Wireless Virtual Reality (VR) Systems : Abstract: Diverse emerging VR applications integrate streaming of high fidelity 360 video content that requires ample amounts of computation and data rate. Scalable 360 video tiling enables having ela...
- Towards a pretrained deep learning estimator of the Linfoot informational correlation : Abstract: We develop a supervised deep-learning approach to estimate mutual information between two continuous random variables. As labels, we use the Linfoot informational correlation, a transformati...
- Unified Control for Inference-Time Guidance of Denoising Diffusion Models : Abstract: Aligning diffusion model outputs with downstream objectives is essential for improving task-specific performance. Broadly, inference-time training-free approaches for aligning diffusion mode...
- Extending the application of dynamic Bayesian networks in calculating market risk: Standard and stressed expected shortfall : Abstract: In the last five years, expected shortfall (ES) and stressed ES (SES) have become key required regulatory measures of market risk in the banking sector, especially following events such as t...
- GrowTAS: Progressive Expansion from Small to Large Subnets for Efficient ViT Architecture Search : Abstract: Transformer architecture search (TAS) aims to automatically discover efficient vision transformers (ViTs), reducing the need for manual design. Existing TAS methods typically train an over-p...
- Near-Zero-Overhead Freshness for Recommendation Systems via Inference-Side Model Updates : Abstract: Deep Learning Recommendation Models (DLRMs) underpin personalized services but face a critical freshness-accuracy tradeoff due to massive parameter synchronization overheads. Production DLRM...
- Robust Outlier Detection and Low-Latency Concept Drift Adaptation for Data Stream Regression: A Dual-Channel Architecture : Abstract: Outlier detection and concept drift detection represent two challenges in data analysis. Most studies address these issues separately. However, joint detection mechanisms in regression remai...
- Hellinger loss function for Generative Adversarial Networks : Abstract: We propose Hellinger-type loss functions for training Generative Adversarial Networks (GANs), motivated by the boundedness, symmetry, and robustness properties of the Hellinger distance. We ...
- Learning to Get Up Across Morphologies: Zero-Shot Recovery with a Unified Humanoid Policy : Abstract: Fall recovery is a critical skill for humanoid robots in dynamic environments such as RoboCup, where prolonged downtime often decides the match. Recent techniques using deep reinforcement le...
- Journey Before Destination: On the importance of Visual Faithfulness in Slow Thinking : Abstract: Reasoning-augmented vision language models (VLMs) generate explicit chains of thought that promise greater capability and transparency but also introduce new failure modes: models may reach ...
- Keep the Lights On, Keep the Lengths in Check: Plug-In Adversarial Detection for Time-Series LLMs in Energy Forecasting : Abstract: Accurate time-series forecasting is increasingly critical for planning and operations in low-carbon power systems. Emerging time-series large language models (TS-LLMs) now deliver this capab...
- Modeling Dabrafenib Response Using Multi-Omics Modality Fusion and Protein Network Embeddings Based on Graph Convolutional Networks : Abstract: Cancer cell response to targeted therapy arises from complex molecular interactions, making single omics insufficient for accurate prediction. This study develops a model to predict Dabrafen...
- Citation-Grounded Code Comprehension: Preventing LLM Hallucination Through Hybrid Retrieval and Graph-Augmented Context : Abstract: Large language models have become essential tools for code comprehension, enabling developers to query unfamiliar codebases through natural language interfaces. However, LLM hallucination, g...
- A Novel Patch-Based TDA Approach for Computed Tomography : Abstract: The development of machine learning (ML) models based on computed tomography (CT) imaging modality has been a major focus of recent research in the medical imaging domain. Incorporating robu...
- AI-Augmented Pollen Recognition in Optical and Holographic Microscopy for Veterinary Imaging : Abstract: We present a comprehensive study on fully automated pollen recognition across both conventional optical and digital in-line holographic microscopy (DIHM) images of sample slides. Visually re...
- SPDMark: Selective Parameter Displacement for Robust Video Watermarking : Abstract: The advent of high-quality video generation models has amplified the need for robust watermarking schemes that can be used to reliably detect and track the provenance of generated videos. Ex...
- BAgger: Backwards Aggregation for Mitigating Drift in Autoregressive Video Diffusion Models : Abstract: Autoregressive video models are promising for world modeling via next-frame prediction, but they suffer from exposure bias: a mismatch between training on clean contexts and inference on sel...
- VOYAGER: A Training Free Approach for Generating Diverse Datasets using LLMs : Abstract: Large language models (LLMs) are increasingly being used to generate synthetic datasets for the evaluation and training of downstream models. However, prior work has noted that such generate...
- CreativeVR: Diffusion-Prior-Guided Approach for Structure and Motion Restoration in Generative and Real Videos : Abstract: Modern text-to-video (T2V) diffusion models can synthesize visually compelling clips, yet they remain brittle at fine-scale structure: even state-of-the-art generators often produce distorte...
- Exploring Spatial-Temporal Representation via Star Graph for mmWave Radar-based Human Activity Recognition : Abstract: Human activity recognition (HAR) requires extracting accurate spatial-temporal features with human movements. A mmWave radar point cloud-based HAR system suffers from sparsity and variable-s...
- Adversarial Attacks Against Deep Learning-Based Radio Frequency Fingerprint Identification : Abstract: Radio frequency fingerprint identification (RFFI) is an emerging technique for the lightweight authentication of wireless Internet of things (IoT) devices. RFFI exploits deep learning models...
- Policy Gradient Algorithms for Age-of-Information Cost Minimization : Abstract: Recent developments in cyber-physical systems have increased the importance of maximizing the freshness of the information about the physical environment. However, optimizing the access poli...
- Interval Fisher's Discriminant Analysis and Visualisation : Abstract: In Data Science, entities are typically represented by single valued measurements. Symbolic Data Analysis extends this framework to more complex structures, such as intervals and histograms,...
- MPath: Multimodal Pathology Report Generation from Whole Slide Images : Abstract: Automated generation of diagnostic pathology reports directly from whole slide images (WSIs) is an emerging direction in computational pathology. Translating high-resolution tissue patterns ...
- CLARGA: Multimodal Graph Representation Learning over Arbitrary Sets of Modalities : Abstract: We introduce CLARGA, a general-purpose multimodal fusion architecture for multimodal representation learning that works with any number and type of modalities without changing the underlying...
- mmWEAVER: Environment-Specific mmWave Signal Synthesis from a Photo and Activity Description : Abstract: Realistic signal generation and dataset augmentation are essential for advancing mmWave radar applications such as activity recognition and pose estimation, which rely heavily on diverse, an...
- The Art of Storytelling in Authoritarian Regimes: Crafting State Narratives on Chinese Social Media : Abstract: This article examines how authoritarian regimes construct state narratives about politically consequential events. Building on the narrative policy framework and existing research on authori...
- Evolving Deep Learning Optimizers : Abstract: We present a genetic algorithm framework for automatically discovering deep learning optimization algorithms. Our approach encodes optimizers as genomes that specify combinations of primitiv...
- Love First, Know Later: Persona-Based Romantic Compatibility Through LLM Text World Engines : Abstract: We propose Love First, Know Later: a paradigm shift in computational matching that simulates interactions first, then assesses compatibility. Instead of comparing static profiles, our framew...
- Reinforcement Learning for Latent-Space Thinking in LLMs : Abstract: Chain-of-Thought (CoT) reasoning typically utilizes the discrete language space for thinking, which is inherently inefficient, as many generated tokens only enforce linguistic rules that are...
- Directional Textual Inversion for Personalized Text-to-Image Generation : Abstract: Textual Inversion (TI) is an efficient approach to text-to-image personalization but often fails on complex prompts. We trace these failures to embedding norm inflation: learned tokens drift...
- A Scientific Reasoning Model for Organic Synthesis Procedure Generation : Abstract: Solving computer-aided synthesis planning is essential for enabling fully automated, robot-assisted synthesis workflows and improving the efficiency of drug discovery. A key challenge, howev...
- StutterFuse: Mitigating Modality Collapse in Stuttering Detection with Jaccard-Weighted Metric Learning and Gated Fusion : Abstract: Stuttering detection breaks down when disfluencies overlap. Existing parametric models struggle to distinguish complex, simultaneous disfluencies (e.g., a 'block' with a 'prolongation') due ...
- LightTopoGAT: Enhancing Graph Attention Networks with Topological Features for Efficient Graph Classification : Abstract: Graph Neural Networks have demonstrated significant success in graph classification tasks, yet they often require substantial computational resources and struggle to capture global graph pro...
- Scalable Formal Verification via Autoencoder Latent Space Abstraction : Abstract: Finite Abstraction methods provide a powerful formal framework for proving that systems satisfy their specifications. However, these techniques face scalability challenges for high-dimension...
- Image Diffusion Preview with Consistency Solver : Abstract: The slow inference process of image diffusion models significantly degrades interactive user experiences. To address this, we introduce Diffusion Preview, a novel paradigm employing rapid, l...
- Async Control: Stress-testing Asynchronous Control Measures for LLM Agents : Abstract: LLM-based software engineering agents are increasingly used in real-world development tasks, often with access to sensitive data or security-critical codebases. Such agents could intentional...
- Learning under Distributional Drift: Reproducibility as an Intrinsic Statistical Resource : Abstract: Statistical learning under distributional drift remains insufficiently characterized: when each observation alters the data-generating law, classical generalization bounds can collapse. We i...
- On-Device Continual Learning for Unsupervised Visual Anomaly Detection in Dynamic Manufacturing : Abstract: In modern manufacturing, Visual Anomaly Detection (VAD) is essential for automated inspection and consistent product quality. Yet, increasingly dynamic and flexible production environments i...
- Element-wise Modulation of Random Matrices for Efficient Neural Layers : Abstract: Fully connected layers are a primary source of memory and computational overhead in deep neural networks due to their dense, often redundant parameterization. While various compression techn...
- DP-EMAR: A Differentially Private Framework for Autonomous Model Weight Repair in Federated IoT Systems : Abstract: Federated Learning (FL) enables decentralized model training without sharing raw data, but model weight distortion remains a major challenge in resource constrained IoT networks. In multi ti...
- XNNTab -- Interpretable Neural Networks for Tabular Data using Sparse Autoencoders : Abstract: In data-driven applications relying on tabular data, where interpretability is key, machine learning models such as decision trees and linear regression are applied. Although neural networks...
- Multiclass Graph-Based Large Margin Classifiers: Unified Approach for Support Vectors and Neural Networks : Abstract: While large margin classifiers are originally an outcome of an optimization framework, support vectors (SVs) can be obtained from geometric approaches. This article presents advances in the ...
- Dual-Phase Federated Deep Unlearning via Weight-Aware Rollback and Reconstruction : Abstract: Federated Unlearning (FUL) focuses on client data and computing power to offer a privacy-preserving solution. However, high computational demands, complex incentive mechanisms, and dispariti...
- On the Effectiveness of Membership Inference in Targeted Data Extraction from Large Language Models : Abstract: Large Language Models (LLMs) are prone to memorizing training data, which poses serious privacy risks. Two of the most prominent concerns are training data extraction and Membership Inferenc...
- Link-Aware Energy-Frugal Continual Learning for Fault Detection in IoT Networks : Abstract: The use of lightweight machine learning (ML) models in internet of things (IoT) networks enables resource constrained IoT devices to perform on-device inference for several critical applicat...
- FROC: A Unified Framework with Risk-Optimized Control for Machine Unlearning in LLMs : Abstract: Machine unlearning (MU) seeks to eliminate the influence of specific training examples from deployed models. As large language models (LLMs) become widely used, managing risks arising from i...
- KD-PINN: Knowledge-Distilled PINNs for ultra-low-latency real-time neural PDE solvers : Abstract: This work introduces Knowledge-Distilled Physics-Informed Neural Networks (KD-PINN), a framework that transfers the predictive accuracy of a high-capacity teacher model to a compact student ...
- BézierFlow: Bézier Stochastic Interpolant Schedulers for Few-Step Generation : Abstract: We introduce BézierFlow, a lightweight training approach for few-step generation with pretrained diffusion and flow models. BézierFlow achieves a 2-3x performance improvement for sampling wi...
- Learning to Retrieve with Weakened Labels: Robust Training under Label Noise : Abstract: Neural Encoders are frequently used in the NLP domain to perform dense retrieval tasks, for instance, to generate the candidate documents for a given query in question-answering tasks. Howev...
- ModSSC: A Modular Framework for Semi-Supervised Classification on Heterogeneous Data : Abstract: Semi-supervised classification leverages both labeled and unlabeled data to improve predictive performance, but existing software support is fragmented across methods and modalities. We intr...
- Evaluating Adversarial Attacks on Federated Learning for Temperature Forecasting : Abstract: Deep learning and federated learning (FL) are becoming powerful partners for next-generation weather forecasting. Deep learning enables high-resolution spatiotemporal forecasts that can surp...
- Noise-Resilient Quantum Aggregation on NISQ for Federated ADAS Learning : Abstract: Advanced Driver Assistance Systems (ADAS) increasingly employ Federated Learning (FL) to collaboratively train models across distributed vehicular nodes while preserving data privacy. Yet, c...
- Enhancing Node-Level Graph Domain Adaptation by Alleviating Local Dependency : Abstract: Recent years have witnessed significant advancements in machine learning methods on graphs. However, transferring knowledge effectively from one graph to another remains a critical challenge...
- Quanvolutional Neural Networks for Spectrum Peak-Finding : Abstract: The analysis of spectra, such as Nuclear Magnetic Resonance (NMR) spectra, for the comprehensive characterization of peaks is a challenging task for both experts and machines, especially wit...
- LikeBench: Evaluating Subjective Likability in LLMs for Personalization : Abstract: A personalized LLM should remember user facts, apply them correctly, and adapt over time to provide responses that the user prefers. Existing LLM personalization benchmarks are largely cente...
- Multi-fidelity aerodynamic data fusion by autoencoder transfer learning : Abstract: Accurate aerodynamic prediction often relies on high-fidelity simulations; however, their prohibitive computational costs severely limit their applicability in data-driven modeling. This lim...
- Deep Q-Learning-Based Intelligent Scheduling for ETL Optimization in Heterogeneous Data Environments : Abstract: This paper addresses the challenges of low scheduling efficiency, unbalanced resource allocation, and poor adaptability in ETL (Extract-Transform-Load) processes under heterogeneous data env...
- Understanding Structured Financial Data with LLMs: A Case Study on Fraud Detection : Abstract: Detecting fraud in financial transactions typically relies on tabular models that demand heavy feature engineering to handle high-dimensional data and offer limited interpretability, making ...
- Alada: Alternating Adaptation of Momentum Method for Memory-Efficient Matrix Optimization : Abstract: This work proposes Alada, an adaptive momentum method for stochastic optimization over large-scale matrices. Alada employs a rank-one factorization approach to estimate the second moment of ...
- Deep Learning-Driven Inversion Framework for Shear Modulus Estimation in Magnetic Resonance Elastography (DIME) : Abstract: The Multimodal Direct Inversion (MMDI) algorithm is widely used in Magnetic Resonance Elastography (MRE) to estimate tissue shear stiffness. However, MMDI relies on the Helmholtz equation, w...
- CoDeQ: End-to-End Joint Model Compression with Dead-Zone Quantizer for High-Sparsity and Low-Precision Networks : Abstract: While joint pruning-quantization is theoretically superior to sequential application, current joint methods rely on auxiliary procedures outside the training loop for finding compression pa...
- Application of Deep Learning in Biological Data Compression : Abstract: Cryogenic electron microscopy (Cryo-EM) has become an essential tool for capturing high-resolution biological structures. Despite its advantage in visualizations, the large storage size of C...
- Understanding When Graph Convolutional Networks Help: A Diagnostic Study on Label Scarcity and Structural Properties : Abstract: Graph Convolutional Networks (GCNs) have become a standard approach for semi-supervised node classification, yet practitioners lack clear guidance on when GCNs provide meaningful improvement...
- SeVeDo: A Heterogeneous Transformer Accelerator for Low-Bit Inference via Hierarchical Group Quantization and SVD-Guided Mixed Precision : Abstract: Low-bit quantization is a promising technique for efficient transformer inference by reducing computational and memory overhead. However, aggressive bitwidth reduction remains challenging du...
- LLM-based Personalized Portfolio Recommender: Integrating Large Language Models and Reinforcement Learning for Intelligent Investment Strategy Optimization : Abstract: In modern financial markets, investors increasingly seek personalized and adaptive portfolio strategies that reflect their individual risk preferences and respond to dynamic market condition...
- Machine Learning Architectures for the Estimation of Predicted Occupancy Grids in Road Traffic : Abstract: This paper introduces a novel machine learning architecture for an efficient estimation of the probabilistic space-time representation of complex traffic scenarios. A detailed representation...
- Next-generation reservoir computing validated by classification task : Abstract: An emerging computing paradigm, so-called next-generation reservoir computing (NG-RC) is investigated. True to its namesake, NG-RC requires no actual reservoirs for input data mixing but rat...
- Predicted-occupancy grids for vehicle safety applications based on autoencoders and the Random Forest algorithm : Abstract: In this paper, a probabilistic space-time representation of complex traffic scenarios is predicted using machine learning algorithms. Such a representation is significant for all active vehi...
- Probability Estimation for Predicted-Occupancy Grids in Vehicle Safety Applications Based on Machine Learning : Abstract: This paper presents a method to predict the evolution of a complex traffic scenario with multiple objects. The current state of the scenario is assumed to be known from sensors and the predi...
- Wait, Wait, Wait... Why Do Reasoning Models Loop? : Abstract: Reasoning models (e.g., DeepSeek-R1) generate long chains of thought to solve harder problems, but they often loop, repeating the same text at low temperatures or with greedy decoding. We st...
- Distillation of Discrete Diffusion by Exact Conditional Distribution Matching : Abstract: Discrete diffusion models (DDMs) are a powerful class of generative models for categorical data, but they typically require many function evaluations for a single sample, making inference ex...
- Unsupervised learning of multiscale switching dynamical system models from multimodal neural data : Abstract: Neural population activity often exhibits regime-dependent non-stationarity in the form of switching dynamics. Learning accurate switching dynamical system models can reveal how behavior is ...
- Improving Recursive Transformers with Mixture of LoRAs : Abstract: Parameter sharing in recursive transformers reduces model size but collapses layer-wise expressivity. We propose Mixture of LoRAs (MoL), a lightweight conditional-computation mechanism that ...
- GradID: Adversarial Detection via Intrinsic Dimensionality of Gradients : Abstract: Despite their remarkable performance, deep neural networks exhibit a critical vulnerability: small, often imperceptible, adversarial perturbations can lead to drastically altered model predi...
- Optimal Resource Allocation for ML Model Training and Deployment under Concept Drift : Abstract: We study how to allocate resources for training and deployment of machine learning (ML) models under concept drift and limited budgets. We consider a setting in which a model provider distri...
- TRACER: Transfer Learning based Real-time Adaptation for Clinical Evolving Risk : Abstract: Clinical decision support tools built on electronic health records often experience performance drift due to temporal population shifts, particularly when changes in the clinical environment...
- Credit Risk Estimation with Non-Financial Features: Evidence from a Synthetic Istanbul Dataset : Abstract: Financial exclusion constrains entrepreneurship, increases income volatility, and widens wealth gaps. Underbanked consumers in Istanbul often have no bureau file because their earnings and p...
- OLR-WAA: Adaptive and Drift-Resilient Online Regression with Dynamic Weighted Averaging : Abstract: Real-world datasets frequently exhibit evolving data distributions, reflecting temporal variations and underlying shifts. Overlooking this phenomenon, known as concept drift, can substantial...
- Resting Neurons, Active Insights: Improving Input Sparsification for Large Language Models : Abstract: Large Language Models (LLMs) achieve state-of-the-art performance across a wide range of applications, but their massive scale poses significant challenges for both efficiency and interpreta...
- SPARK: Igniting Communication-Efficient Decentralized Learning via Stage-wise Projected NTK and Accelerated Regularization : Abstract: Decentralized federated learning (DFL) faces critical challenges from statistical heterogeneity and communication overhead. While NTK-based methods achieve faster convergence, transmitting f...
- Solving a Machine Learning Regression Problem Based on the Theory of Random Functions : Abstract: This paper studies a machine learning regression problem as a multivariate approximation problem using the framework of the theory of random functions. An ab initio derivation of a regressio...
- Multi-Trajectory Physics-Informed Neural Networks for HJB Equations with Hard-Zero Terminal Inventory: Optimal Execution on Synthetic & SPY Data : Abstract: We study optimal trade execution with a hard-zero terminal inventory constraint, modeled via Hamilton-Jacobi-Bellman (HJB) equations. Vanilla PINNs often under-enforce this constraint and pr...
- Reassessing the Role of Supervised Fine-Tuning: An Empirical Study in VLM Reasoning : Abstract: Recent advances in vision-language model (VLM) reasoning have been largely attributed to the rise of reinforcement learning (RL), which has shifted the community's focus away from the supe...
- On Approaches to Building Surrogate ODE Models for Diffusion Bridges : Abstract: Diffusion and Schrödinger Bridge models have established state-of-the-art performance in generative modeling but are often hampered by significant computational costs and complex training pr...
- Torch Geometric Pool: the Pytorch library for pooling in Graph Neural Networks : Abstract: We introduce Torch Geometric Pool (tgp), a library for hierarchical pooling in Graph Neural Networks. Built upon PyTorch Geometric, Torch Geometric Pool (tgp) provides a wide variety of pool...
- Spectral Sentinel: Scalable Byzantine-Robust Decentralized Federated Learning via Sketched Random Matrix Theory on Blockchain : Abstract: Decentralized federated learning (DFL) enables collaborative model training without centralized trust, but it remains vulnerable to Byzantine clients that poison gradients under heterogeneou...
- Causal inference and model explainability tools for retail : Abstract: Most major retailers today have multiple divisions focused on various aspects, such as marketing, supply chain, online customer experience, store customer experience, employee productivity, ...
- Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics : Abstract: Linear-time attention and State Space Models (SSMs) promise to solve the quadratic cost bottleneck in long-context language models employing softmax attention. We introduce Error-Free Linear...
- Differentiable Energy-Based Regularization in GANs: A Simulator-Based Exploration of VQE-Inspired Auxiliary Losses : Abstract: This paper presents an exploratory, simulator-based proof of concept investigating whether differentiable energy terms derived from parameterized quantum circuits can serve as auxiliary regu...
- On the Accuracy of Newton Step and Influence Function Data Attributions : Abstract: Data attribution aims to explain model predictions by estimating how they would change if certain training points were removed, and is used in a wide range of applications, from interpretabi...
- Optimal Mistake Bounds for Transductive Online Learning : Abstract: We resolve a 30-year-old open problem concerning the power of unlabeled data in online learning by tightly quantifying the gap between transductive and standard online learning. In the stand...
- Effective Fine-Tuning with Eigenvector Centrality Based Pruning : Abstract: In social media networks a small number of highly influential users can drive large scale changes in discourse across multiple communities. Small shifts in the behavior of these users are of...
- Empirical Mode Decomposition and Graph Transformation of the MSCI World Index: A Multiscale Topological Analysis for Graph Neural Network Modeling : Abstract: This study applies Empirical Mode Decomposition (EMD) to the MSCI World index and converts the resulting intrinsic mode functions (IMFs) into graph representations to enable modeling with gr...
- Policy Optimization for Dynamic Heart Transplant Allocation : Abstract: Heart transplantation is a viable path for patients suffering from advanced heart failure, but this lifesaving option is severely limited due to donor shortage. Although the current allocati...
- AI-Driven Early Warning Systems for Student Success: Discovering Static Feature Dominance in Temporal Prediction Models : Abstract: Early identification of at-risk students is critical for effective intervention in online learning environments. This study extends temporal prediction analysis to Week 20 (50% of course dur...
- GoMS: Graph of Molecule Substructure Network for Molecule Property Prediction : Abstract: While graph neural networks have shown remarkable success in molecular property prediction, current approaches like the Equivariant Subgraph Aggregation Networks (ESAN) treat molecules as ba...
- Sparse Concept Anchoring for Interpretable and Controllable Neural Representations : Abstract: We introduce Sparse Concept Anchoring, a method that biases latent space to position a targeted subset of concepts while allowing others to self-organize, using only minimal supervision (lab...
- Optimized Architectures for Kolmogorov-Arnold Networks : Abstract: Efforts to improve Kolmogorov-Arnold networks (KANs) with architectural enhancements have been stymied by the complexity those enhancements bring, undermining the interpretability that makes...
- Knowledge-Guided Masked Autoencoder with Linear Spectral Mixing and Spectral-Angle-Aware Reconstruction : Abstract: Integrating domain knowledge into deep learning has emerged as a promising direction for improving model interpretability, generalization, and data efficiency. In this work, we present a nov...
- Learning Dynamics in Memristor-Based Equilibrium Propagation : Abstract: Memristor-based in-memory computing has emerged as a promising paradigm to overcome the constraints of the von Neumann bottleneck and the memory wall by enabling fully parallelisable and ene...
- Can Graphs Improve Tabular Foundation Models? : Abstract: Tabular data are central to many real-world systems. While recent tabular transformers and in-context learners such as SAINT, TP-BERTa, TabPFN, TabICL, and MITRA incorporate limited inter-ro...
- DeepVekua: Geometric-Spectral Representation Learning for Physics-Informed Fields : Abstract: We present DeepVekua, a hybrid architecture that unifies geometric deep learning with spectral analysis to solve partial differential equations (PDEs) in sparse data regimes. By learning a d...
- Anchoring Values in Temporal and Group Dimensions for Flow Matching Model Alignment : Abstract: Group Relative Policy Optimization (GRPO) has proven highly effective in enhancing the alignment capabilities of Large Language Models (LLMs). However, current adaptations of GRPO for the fl...
- The Data Efficiency Frontier of Financial Foundation Models: Scaling Laws from Continued Pretraining : Abstract: Domain-adaptive pretraining (DAPT) offers a practical path to specializing large language models for high-value domains without full retraining. We conduct an early-stage scaling-law analysi...
- Synthetic Swarm Mosquito Dataset for Acoustic Classification: A Proof of Concept : Abstract: Mosquito-borne diseases pose a serious global health threat, causing over 700,000 deaths annually. This work introduces a proof-of-concept Synthetic Swarm Mosquito Dataset for Acoustic Class...
- Uncertainty Quantification for Machine Learning: One Size Does Not Fit All : Abstract: Proper quantification of predictive uncertainty is essential for the use of machine learning in safety-critical applications. Various uncertainty measures have been proposed for this purpose...
- Eventually LIL Regret: Almost Sure $\ln\ln T$ Regret for a sub-Gaussian Mixture on Unbounded Data : Abstract: We prove that a classic sub-Gaussian mixture proposed by Robbins in a stochastic setting actually satisfies a path-wise (deterministic) regret bound. For every path in a natural "Ville even...
- TwinFormer: A Dual-Level Transformer for Long-Sequence Time-Series Forecasting : Abstract: TwinFormer is a hierarchical Transformer for long-sequence time-series forecasting. It divides the input into non-overlapping temporal patches and processes them in two stages: (1) a Local I...
- Balancing Accuracy and Speed: A Multi-Fidelity Ensemble Kalman Filter with a Machine Learning Surrogate Model : Abstract: Currently, more and more machine learning (ML) surrogates are being developed for computationally expensive physical models. In this work we investigate the use of a Multi-Fidelity Ensemble ...
- Optimized Learned Count-Min Sketch : Abstract: Count-Min Sketch (CMS) is a memory-efficient data structure for estimating the frequency of elements in a multiset. Learned Count-Min Sketch (LCMS) enhances CMS with a machine learning model...
- EEG-DLite: Dataset Distillation for Efficient Large EEG Model Training : Abstract: Large-scale EEG foundation models have shown strong generalization across a range of downstream tasks, but their training remains resource-intensive due to the volume and variable quality of...
- MolGuidance: Advanced Guidance Strategies for Conditional Molecular Generation with Flow Matching : Abstract: Key objectives in conditional molecular generation include ensuring chemical validity, aligning generated molecules with target properties, promoting structural diversity, and enabling effic...
- HydroDiffusion: Diffusion-Based Probabilistic Streamflow Forecasting with a State Space Backbone : Abstract: Recent advances have introduced diffusion models for probabilistic streamflow forecasting, demonstrating strong early flood-warning skill. However, current implementations rely on recurrent ...
- On the Approximation Power of SiLU Networks: Exponential Rates and Depth Efficiency : Abstract: This article establishes a comprehensive theoretical framework demonstrating that SiLU (Sigmoid Linear Unit) activation networks achieve exponential approximation rates for smooth functions ...
- BOOST: BOttleneck-Optimized Scalable Training Framework for Low-Rank Large Language Models : Abstract: The scale of transformer model pre-training is constrained by the increasing computation and communication cost. Low-rank bottleneck architectures offer a promising solution to significantly...
- High-Dimensional Tensor Discriminant Analysis: Low-Rank Discriminant Structure, Representation Synergy, and Theoretical Guarantees : Abstract: High-dimensional tensor-valued predictors arise in modern applications, increasingly as learned representations from neural networks. Existing tensor classification methods rely on sparsity ...
- Neural CDEs as Correctors for Learned Time Series Models : Abstract: Learned time-series models, whether continuous- or discrete-time, are widely used to forecast the states of a dynamical system. Such models generate multi-step forecasts either directly, by ...
- GraphPerf-RT: A Graph-Driven Performance Model for Hardware-Aware Scheduling of OpenMP Codes : Abstract: Performance prediction for OpenMP workloads on heterogeneous embedded SoCs is challenging due to complex interactions between task DAG structure, control-flow irregularity, cache and branc...
- CLOAK: Contrastive Guidance for Latent Diffusion-Based Data Obfuscation : Abstract: Data obfuscation is a promising technique for mitigating attribute inference attacks by semi-trusted parties with access to time-series data emitted by sensors. Recent advances leverage cond...
- SigTime: Learning and Visually Explaining Time Series Signatures : Abstract: Understanding and distinguishing temporal patterns in time series data is essential for scientific discovery and decision-making. For example, in biomedical research, uncovering meaningful p...
- Physics-informed neural networks to solve inverse problems in unbounded domains : Abstract: Inverse problems are extensively studied in applied mathematics, with applications ranging from acoustic tomography for medical diagnosis to geophysical exploration. Physics-informed neural ...
- Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning : Abstract: Goal-Conditioned Reinforcement Learning (GCRL) mitigates the difficulty of reward design by framing tasks as goal reaching rather than maximizing hand-crafted reward signals. In this setting...
- DFedReweighting: A Unified Framework for Objective-Oriented Reweighting in Decentralized Federated Learning : Abstract: Decentralized federated learning (DFL) has recently emerged as a promising paradigm that enables multiple clients to collaboratively train a machine learning model through iterative rounds of ...
- EnviroLLM: Resource Tracking and Optimization for Local AI : Abstract: Large language models (LLMs) are increasingly deployed locally for privacy and accessibility, yet users lack tools to measure their resource usage, environmental impact, and efficiency metri...
- Learning to Extract Context for Context-Aware LLM Inference : Abstract: User prompts to large language models (LLMs) are often ambiguous or under-specified, and subtle contextual cues shaped by user intentions, prior knowledge, and risk factors strongly influenc...
- Neural Chameleons: Language Models Can Learn to Hide Their Thoughts from Unseen Activation Monitors : Abstract: Activation monitoring, which probes a model's internal states using lightweight classifiers, is an emerging tool for AI safety. However, its worst-case robustness under a misalignment threat...
- Phase transitions reveal hierarchical structure in deep neural networks : Abstract: Training Deep Neural Networks relies on the model converging on a high-dimensional, non-convex loss landscape toward a good minimum. Yet, much of the phenomenology of training remains ill un...
- Tiny Recursive Models on ARC-AGI-1: Inductive Biases, Identity Conditioning, and Test-Time Compute : Abstract: Tiny Recursive Models (TRM) were proposed as a parameter-efficient alternative to large language models for solving Abstraction and Reasoning Corpus (ARC) style tasks. The original work repo...
- Exploring Topological Bias in Heterogeneous Graph Neural Networks : Abstract: Graph Neural Networks (GNNs) are characterized by their capacity of processing graph-structured data. However, due to the sparsity of labels under semi-supervised learning, they have been fo...
- Meta-Continual Mobility Forecasting for Proactive Handover Prediction : Abstract: Short-term mobility forecasting is a core requirement for proactive handover (HO) in cellular networks. Real-world mobility is highly non-stationary: abrupt turns, rapid speed changes, and u...
- Amortized Causal Discovery with Prior-Fitted Networks : Abstract: In recent years, differentiable penalized likelihood methods have gained popularity, optimizing the causal structure by maximizing its likelihood with respect to the data. However, recent re...
- Large Language Models as Generalist Policies for Network Optimization : Abstract: Designing control policies to ensure robust network services is essential to modern digital infrastructure. However, the dominant paradigm for network optimization relies on designing specia...
- D-STEER - Preference Alignment Techniques Learn to Behave, not to Believe -- Beneath the Surface, DPO as Steering Vector Perturbation in Activation Space : Abstract: Direct Preference Optimization (DPO) has become a standard recipe for aligning large language models, yet it is still unclear what kind of change it actually induces inside the network. This...
- Hybrid twinning using PBDW and DeepONet for the effective state estimation and prediction on partially known systems : Abstract: The accurate estimation of the state of complex uncertain physical systems requires reconciling theoretical models, with inherent imperfections, with noisy experimental data. In this work, w...
- On the Design of One-step Diffusion via Shortcutting Flow Paths : Abstract: Recent advances in few-step diffusion models have demonstrated their efficiency and effectiveness by shortcutting the probabilistic paths of diffusion models, especially in training one-step...
- Human-computer interactions predict mental health : Abstract: Scalable assessments of mental illness, the leading driver of disability worldwide, remain a critical roadblock toward accessible and equitable care. Here, we show that human-computer intera...
- KNN-MMD: Cross Domain Wireless Sensing via Local Distribution Alignment : Abstract: Wireless sensing has recently found widespread applications in diverse environments, including homes, offices, and public spaces. By analyzing patterns in channel state information (CSI), it...
- MAISI: Medical AI for Synthetic Imaging : Abstract: Medical imaging analysis faces challenges such as data scarcity, high annotation costs, and privacy concerns. This paper introduces the Medical AI for Synthetic Imaging (MAISI), an innovativ...
- Layer-aware TDNN: Speaker Recognition Using Multi-Layer Features from Pre-Trained Models : Abstract: Recent advances in self-supervised learning (SSL) on Transformers have significantly improved speaker verification (SV) by providing domain-general speech representations. However, existing ...
- No Screening is More Efficient with Multiple Objects : Abstract: We study efficient mechanism design for allocating multiple heterogeneous objects. The aim is to maximize the residual surplus, the total value generated from an allocation minus the costs o...
- Fully Bayesian Differential Gaussian Processes through Stochastic Differential Equations : Abstract: Deep Gaussian process models typically employ discrete hierarchies, but recent advancements in differential Gaussian processes (DiffGPs) have extended these models to infinite depths. Howeve...
- Dy-mer: An Explainable DNA Sequence Representation Scheme using Dictionary Learning : Abstract: DNA sequences encode critical genetic information, yet their variable length and discrete nature impede direct utilization in deep learning models. Existing DNA representation schemes conver...
- Efficient Neural Common Neighbor for Temporal Graph Link Prediction : Abstract: Temporal graphs are widespread in real-world applications such as social networks, as well as trade and transportation networks. Predicting dynamic links within these evolving graphs is a ke...
- Fast Wrong-way Cycling Detection in CCTV Videos: Sparse Sampling is All You Need : Abstract: Effective monitoring of unusual transportation behaviors, such as wrong-way cycling (i.e., riding a bicycle or e-bike against designated traffic flow), is crucial for optimizing law enforcem...
- A Comprehensive Survey on Self-Supervised Learning for Recommendation : Abstract: Recommender systems play a crucial role in tackling the challenge of information overload by delivering personalized recommendations based on individual user preferences. Deep learning techn...
- PADS: Plug-and-Play 3D Human Pose Analysis via Diffusion Generative Modeling : Abstract: Diffusion models have demonstrated impressive capabilities in modeling complex data distributions and are increasingly applied in various generative tasks. In this work, we propose Pose Anal...
- Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach : Abstract: In this paper we present a neurosymbolic architecture for coupling language-guided visual reasoning with robot manipulation. A non-expert human user can prompt the robot using unconstrained ...
- Think, Speak, Decide: Language-Augmented Multi-Agent Reinforcement Learning for Economic Decision-Making : Abstract: Economic decision-making depends not only on structured signals such as prices and taxes, but also on unstructured language, including peer dialogue and media narratives. While multi-agent r...
- AI Copilots for Reproducibility in Science: A Case Study : Abstract: Open science initiatives seek to make research outputs more transparent, accessible, and reusable, but ensuring that published findings can be independently reproduced remains a persistent c...
- DiffusionBrowser: Interactive Diffusion Previews via Multi-Branch Decoders : Abstract: Video diffusion models have revolutionized generative video synthesis, but they are imprecise, slow, and can be opaque during generation -- keeping users in the dark for a prolonged period. ...
- Feedforward 3D Editing via Text-Steerable Image-to-3D : Abstract: Recent progress in image-to-3D has opened up immense possibilities for design, AR/VR, and robotics. However, to use AI-generated 3D assets in real applications, a critical requirement is the...
- Embedding-Based Rankings of Educational Resources based on Learning Outcome Alignment: Benchmarking, Expert Validation, and Learner Performance : Abstract: As the online learning landscape evolves, the need for personalization is increasingly evident. Although educational resources are burgeoning, educators face challenges selecting materials t...
- Large-Language Memorization During the Classification of United States Supreme Court Cases : Abstract: Large-language models (LLMs) have been shown to respond in a variety of ways for classification tasks outside of question-answering. LLM responses are sometimes called "hallucinations" since...
- World Models Can Leverage Human Videos for Dexterous Manipulation : Abstract: Dexterous manipulation is challenging because it requires understanding how subtle hand motion influences the environment through contact with objects. We introduce DexWM, a Dexterous Manipu...
- From Code to Field: Evaluating the Robustness of Convolutional Neural Networks for Disease Diagnosis in Mango Leaves : Abstract: The validation and verification of artificial intelligence (AI) models through robustness assessment are essential to guarantee the reliable performance of intelligent systems facing real-wo...
- Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models : Abstract: Building general-purpose reasoning models with reinforcement learning (RL) entails substantial cross-domain heterogeneity, including large variation in inference-time response lengths and ve...
- DA-SSL: self-supervised domain adaptor to leverage foundational models in turbt histopathology slides : Abstract: Recent deep learning frameworks in histopathology, particularly multiple instance learning (MIL) combined with pathology foundational models (PFMs), have shown strong performance. However, P...
- ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding : Abstract: Autoregressive models (ARMs) are hindered by slow sequential inference. While masked diffusion models (MDMs) offer a parallel alternative, they suffer from critical drawbacks: high computati...
- DP-CSGP: Differentially Private Stochastic Gradient Push with Compressed Communication : Abstract: In this paper, we propose a Differentially Private Stochastic Gradient Push with Compressed communication (termed DP-CSGP) for decentralized learning over directed graphs. Different from exi...
- Superposition as Lossy Compression: Measure with Sparse Autoencoders and Connect to Adversarial Vulnerability : Abstract: Neural networks achieve remarkable performance through superposition: encoding multiple features as overlapping directions in activation space rather than dedicating individual neurons to ea...
- Memory in the Age of AI Agents : Abstract: Memory has emerged, and will continue to remain, a core capability of foundation model-based agents. As research on agent memory rapidly expands and attracts unprecedented attention, the fie...
- Verifying Rumors via Stance-Aware Structural Modeling : Abstract: Verifying rumors on social media is critical for mitigating the spread of false information. The stances of conversation replies often provide important cues to determine a rumor's veracity....
- Behavior-Aware and Generalizable Defense Against Black-Box Adversarial Attacks for ML-Based IDS : Abstract: Machine learning based intrusion detection systems are increasingly targeted by black box adversarial attacks, where attackers craft evasive inputs using indirect feedback such as binary out...
- SkipCat: Rank-Maximized Low-Rank Compression of Large Language Models via Shared Projection and Block Skipping : Abstract: Large language models (LLMs) have achieved remarkable performance across a wide range of tasks. However, their substantial parameter sizes pose significant challenges for deployment on edge d...
- Non-Resolution Reasoning: A Framework for Preserving Semantic Ambiguity in Language Models : Abstract: Premature semantic collapse -- the forced early commitment to a single meaning -- remains a core architectural limitation of current language models. Softmax-driven competition and greedy de...
- SSAS: Cross-subject EEG-based Emotion Recognition through Source Selection with Adversarial Strategy : Abstract: Electroencephalographic (EEG) signals have long been applied in the field of affective brain-computer interfaces (aBCIs). Cross-subject EEG-based emotion recognition has demonstrated signifi...
- From User Interface to Agent Interface: Efficiency Optimization of UI Representations for LLM Agents : Abstract: While Large Language Model (LLM) agents show great potential for automated UI navigation such as automated UI testing and AI assistants, their efficiency has been largely overlooked. Our mot...
- End2Reg: Learning Task-Specific Segmentation for Markerless Registration in Spine Surgery : Abstract: Purpose: Intraoperative navigation in spine surgery demands millimeter-level accuracy. Current systems based on intraoperative radiographic imaging and bone-anchored markers are invasive, ra...
- Detecting Emotion Drift in Mental Health Text Using Pre-Trained Transformers : Abstract: This study investigates emotion drift: the change in emotional state across a single text, within mental health-related messages. While sentiment analysis typically classifies an entire mess...
- Control of a Twin Rotor using Twin Delayed Deep Deterministic Policy Gradient (TD3) : Abstract: This paper proposes a reinforcement learning (RL) framework for controlling and stabilizing the Twin Rotor Aerodynamic System (TRAS) at specific pitch and azimuth angles and tracking a given...
- FIN-bench-v2: A Unified and Robust Benchmark Suite for Evaluating Finnish Large Language Models : Abstract: We introduce FIN-bench-v2, a unified benchmark suite for evaluating large language models in Finnish. FIN-bench-v2 consolidates Finnish versions of widely used benchmarks together with an up...
- Security and Detectability Analysis of Unicode Text Watermarking Methods Against Large Language Models : Abstract: Securing digital text is becoming increasingly relevant due to the widespread use of large language models. Individuals' fear of losing control over data when it is being used to train such ...
- Face Identity Unlearning for Retrieval via Embedding Dispersion : Abstract: Face recognition systems rely on learning highly discriminative and compact identity clusters to enable accurate retrieval. However, as with other surveillance-oriented technologies, such sy...
- ALIGN-FL: Architecture-independent Learning through Invariant Generative component sharing in Federated Learning : Abstract: We present ALIGN-FL, a novel approach to distributed learning that addresses the challenge of learning from highly disjoint data distributions through selective sharing of generative compone...
- No One Left Behind: How to Exploit the Incomplete and Skewed Multi-Label Data for Conversion Rate Prediction : Abstract: In most real-world online advertising systems, advertisers typically have diverse customer acquisition goals. A common solution is to use multi-task learning (MTL) to train a unified model o...
- MiniLingua: A Small Open-Source LLM for European Languages : Abstract: Large language models are powerful but often limited by high computational cost, privacy concerns, and English-centric training. Recent progress demonstrates that small, efficient models wit...
- Intrinsic-Motivation Multi-Robot Social Formation Navigation with Coordinated Exploration : Abstract: This paper investigates the application of reinforcement learning (RL) to multi-robot social formation navigation, a critical capability for enabling seamless human-robot coexistence. While ...
- LINA: Learning INterventions Adaptively for Physical Alignment and Generalization in Diffusion Models : Abstract: Diffusion models (DMs) have achieved remarkable success in image and video generation. However, they still struggle with (1) physical alignment and (2) out-of-distribution (OOD) instruction ...
- CORE: Contrastive Masked Feature Reconstruction on Graphs : Abstract: In the rapidly evolving field of self-supervised learning on graphs, generative and contrastive methodologies have emerged as two dominant approaches. Our study focuses on masked feature rec...
- Efficient Adaptive Rejection Sampling for Accelerating Speculative Decoding in Large Language Models : Abstract: Speculative Decoding is a prominent technique for accelerating the autoregressive inference of large language models (LLMs) by employing a fast draft model to propose candidate token sequenc...
- WAY: Estimation of Vessel Destination in Worldwide AIS Trajectory : Abstract: The Automatic Identification System (AIS) enables data-driven maritime surveillance but suffers from reliability issues and irregular intervals. We address vessel destination estimation usin...
- PolySet: Restoring the Statistical Ensemble Nature of Polymers for Machine Learning : Abstract: Machine-learning (ML) models in polymer science typically treat a polymer as a single, perfectly defined molecular graph, even though real materials consist of stochastic ensembles of chains...
- Carrot, stick, or both? Price incentives for sustainable food choice in competitive environments : Abstract: Meat consumption is a major driver of global greenhouse gas emissions. While pricing interventions have shown potential to reduce meat intake, previous studies have focused on highly constra...
- SACn: Soft Actor-Critic with n-step Returns : Abstract: Soft Actor-Critic (SAC) is widely used in practical applications and is now one of the most relevant off-policy online model-free reinforcement learning (RL) methods. The technique of n-step...
- A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis : Abstract: The development of clinical-grade artificial intelligence in pathology is limited by the scarcity of diverse, high-quality annotated datasets. Generative models offer a potential solution bu...
- Intrinsic Image Fusion for Multi-View 3D Material Reconstruction : Abstract: We introduce Intrinsic Image Fusion, a method that reconstructs high-quality physically based materials from multi-view images. Material reconstruction is highly underconstrained and typical...
- DePT3R: Joint Dense Point Tracking and 3D Reconstruction of Dynamic Scenes in a Single Forward Pass : Abstract: Current methods for dense 3D point tracking in dynamic scenes typically rely on pairwise processing, require known camera poses, or assume a temporal ordering to input frames, constraining t...
- From Overfitting to Reliability: Introducing the Hierarchical Approximate Bayesian Neural Network : Abstract: In recent years, neural networks have revolutionized various domains, yet challenges such as hyperparameter tuning and overfitting remain significant hurdles. Bayesian neural networks offer ...
- Uncovering the Role of Initial Saliency in U-Shaped Attention Bias: Scaling Initial Token Weight for Enhanced Long-Text Processing : Abstract: Large language models (LLMs) have demonstrated strong performance on a variety of natural language processing (NLP) tasks. However, they often struggle with long-text sequences due to the ``...
- Diffusion-Based Restoration for Multi-Modal 3D Object Detection in Adverse Weather : Abstract: Multi-modal 3D object detection is important for reliable perception in robotics and autonomous driving. However, its effectiveness remains limited under adverse weather conditions due to we...
- TraPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning : Abstract: Reinforcement learning with verifiable rewards (RLVR) has proven effective in training large reasoning models (LRMs) by leveraging answer-verifiable signals to guide policy optimization, whi...
- Harmonizing Generalization and Specialization: Uncertainty-Informed Collaborative Learning for Semi-supervised Medical Image Segmentation : Abstract: Vision foundation models have demonstrated strong generalization in medical image segmentation by leveraging large-scale, heterogeneous pretraining. However, they often struggle to generaliz...
- OXE-AugE: A Large-Scale Robot Augmentation of OXE for Scaling Cross-Embodiment Policy Learning : Abstract: Large and diverse datasets are needed for training generalist robot policies that have potential to control a variety of robot embodiments -- robot arm and gripper combinations -- across div...
- Sequence of Expert: Boosting Imitation Planners for Autonomous Driving through Temporal Alternation : Abstract: Imitation learning (IL) has emerged as a central paradigm in autonomous driving. While IL excels in matching expert behavior in open-loop settings by minimizing per-step prediction errors, i...
- UniVCD: A New Method for Unsupervised Change Detection in the Open-Vocabulary Era : Abstract: Change detection (CD) identifies scene changes from multi-temporal observations and is widely used in urban development and environmental monitoring. Most existing CD methods rely on supervi...
- A Simple and Effective Framework for Symmetric Consistent Indexing in Large-Scale Dense Retrieval : Abstract: Dense retrieval has become the industry standard in large-scale information retrieval systems due to its high efficiency and competitive accuracy. Its core relies on a coarse-to-fine hierarc...
- LLM Rationalis? Measuring Bargaining Capabilities of AI Negotiators : Abstract: Bilateral negotiation is a complex, context-sensitive task in which human negotiators dynamically adjust anchors, pacing, and flexibility to exploit power asymmetries and informal cues. We i...
- GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training : Abstract: Multi-turn reinforcement learning (RL) for multi-modal agents built upon vision-language models (VLMs) is hampered by sparse rewards and long-horizon credit assignment. Recent methods densif...
- Scaling Bidirectional Spans and Span Violations in Attention Mechanism : Abstract: The canonical $O(N^2)$ Transformer remains the empirical performance frontier in sequence modeling, and its training can be further optimized by addressing geometric inefficiency. We propose...
- Calibrating Uncertainty for Zero-Shot Adversarial CLIP : Abstract: CLIP delivers strong zero-shot classification but remains highly vulnerable to adversarial attacks. Previous work on adversarial fine-tuning largely focuses on matching the predicted logits ...
- Tackling Snow-Induced Challenges: Safe Autonomous Lane-Keeping with Robust Reinforcement Learning : Abstract: This paper proposes two new algorithms for the lane keeping system (LKS) in autonomous vehicles (AVs) operating under snowy road conditions. These algorithms use deep reinforcement learning ...
- Building from Scratch: A Multi-Agent Framework with Human-in-the-Loop for Multilingual Legal Terminology Mapping : Abstract: Accurately mapping legal terminology across languages remains a significant challenge, especially for language pairs like Chinese and Japanese, which share a large number of homographs with ...
- Content Adaptive based Motion Alignment Framework for Learned Video Compression : Abstract: Recent advances in end-to-end video compression have shown promising results owing to their unified end-to-end learning optimization. However, such generalized frameworks often lack content-...
- Unified Interactive Multimodal Moment Retrieval via Cascaded Embedding-Reranking and Temporal-Aware Score Fusion : Abstract: The exponential growth of video content has created an urgent need for efficient multimodal moment retrieval systems. However, existing approaches face three critical challenges: (1) fixed-w...
- Investigating Data Pruning for Pretraining Biological Foundation Models at Scale : Abstract: Biological foundation models (BioFMs), pretrained on large-scale biological sequences, have recently shown strong potential in providing meaningful representations for diverse downstream bio...
- MADTempo: An Interactive System for Multi-Event Temporal Video Retrieval with Query Augmentation : Abstract: The rapid expansion of video content across online platforms has accelerated the need for retrieval systems capable of understanding not only isolated visual moments but also the temporal st...
- Cisco Integrated AI Security and Safety Framework Report : Abstract: Artificial intelligence (AI) systems are being readily and rapidly adopted, increasingly permeating critical domains: from consumer platforms and enterprise software to networked systems wit...
- CTIGuardian: A Few-Shot Framework for Mitigating Privacy Leakage in Fine-Tuned LLMs : Abstract: Large Language Models (LLMs) are often fine-tuned to adapt their general-purpose knowledge to specific tasks and domains such as cyber threat intelligence (CTI). Fine-tuning is mostly done t...
- Meta-GPT: Decoding the Metasurface Genome with Generative Artificial Intelligence : Abstract: Advancing artificial intelligence for physical sciences requires representations that are both interpretable and compatible with the underlying laws of nature. We introduce METASTRINGS, a sy...
- SignRAG: A Retrieval-Augmented System for Scalable Zero-Shot Road Sign Recognition : Abstract: Automated road sign recognition is a critical task for intelligent transportation systems, but traditional deep learning methods struggle with the sheer number of sign classes and the imprac...
- Optimal Labeler Assignment and Sampling for Active Learning in the Presence of Imperfect Labels : Abstract: Active Learning (AL) has garnered significant interest across various application domains where labeling training data is costly. AL provides a framework that helps practitioners query infor...
- Counting Clues: A Lightweight Probabilistic Baseline Can Match an LLM : Abstract: Large language models (LLMs) excel on multiple-choice clinical diagnosis benchmarks, yet it is unclear how much of this performance reflects underlying probabilistic reasoning. We study this...
- Information-Consistent Language Model Recommendations through Group Relative Policy Optimization : Abstract: Large Language Models (LLMs) are increasingly deployed in business-critical domains such as finance, education, healthcare, and customer support, where users expect consistent and reliable r...
- Selective Conformal Risk Control : Abstract: Reliable uncertainty quantification is essential for deploying machine learning systems in high-stakes domains. Conformal prediction provides distribution-free coverage guarantees but often ...
- SAGA: Open-World Mobile Manipulation via Structured Affordance Grounding : Abstract: We present SAGA, a versatile and adaptive framework for visuomotor control that can generalize across various environments, task objectives, and user specifications. To efficiently learn suc...
- PRIVEE: Privacy-Preserving Vertical Federated Learning Against Feature Inference Attacks : Abstract: Vertical Federated Learning (VFL) enables collaborative model training across organizations that share common user samples but hold disjoint feature spaces. Despite its potential, VFL is sus...
- Network Level Evaluation of Hangup Susceptibility of HRGCs using Deep Learning and Sensing Techniques: A Goal Towards Safer Future : Abstract: Steep profiled Highway Railway Grade Crossings (HRGCs) pose safety hazards to vehicles with low ground clearance, which may become stranded on the tracks, creating risks of train vehicle col...
- Adapting Multimodal Foundation Models for Few-Shot Learning: A Comprehensive Study on Contrastive Captioners : Abstract: Large-scale multimodal foundation models, particularly Contrastive Captioners (CoCa), have achieved state-of-the-art results by unifying contrastive alignment with generative captioning. Whi...
- Lemon: A Unified and Scalable 3D Multimodal Model for Universal Spatial Understanding : Abstract: Scaling large multimodal models (LMMs) to 3D understanding poses unique challenges: point cloud data is sparse and irregular, existing models rely on fragmented architectures with modality-s...
- On the continuity of flows : Abstract: Flow matching has emerged as a powerful framework for generative modeling through continuous normalizing flows. We investigate a potential topological constraint: when the prior distribution...
- Hindsight is 20/20: Building Agent Memory that Retains, Recalls, and Reflects : Abstract: Agent memory has been touted as a dimension of growth for LLM-based applications, enabling agents that can accumulate experience, adapt across sessions, and move beyond single-shot question ...
- Decoding Human and AI Persuasion in National College Debate: Analyzing Prepared Arguments Through Aristotle's Rhetorical Principles : Abstract: Debate has been widely adopted as a strategy to enhance critical thinking skills in English Language Arts (ELA). One important skill in debate is forming effective argumentation, which requi...
- Does Tone Change the Answer? Evaluating Prompt Politeness Effects on Modern LLMs: GPT, Gemini, LLaMA : Abstract: Prompt engineering has emerged as a critical factor influencing large language model (LLM) performance, yet the impact of pragmatic elements such as linguistic tone and politeness remains un...
- OPAL: Operator-Programmed Algorithms for Landscape-Aware Black-Box Optimization : Abstract: Black-box optimization often relies on evolutionary and swarm algorithms whose performance is highly problem dependent. We view an optimizer as a short program over a small vocabulary of sea...
- From Small to Large: Generalization Bounds for Transformers on Variable-Size Inputs : Abstract: Transformers exhibit a notable property of size generalization, demonstrating an ability to extrapolate from smaller token sets to significantly longer ones. This behavior has been do...
- A Disproof of Large Language Model Consciousness: The Necessity of Continual Learning for Consciousness : Abstract: The requirements for a falsifiable and non-trivial theory of consciousness significantly constrain such theories. Specifically, recent research on the Unfolding Argument and the Substitution...
- Liquid Reasoning Transformers: A Sudoku-Based Prototype for Chess-Scale Algorithmic Tasks : Abstract: The Liquid Reasoning Transformer (LRT) is a transformer architecture designed for inference with adaptive depths using iterative changes, discard-based correction, and a learned stopping mec...
- Beyond Task Completion: An Assessment Framework for Evaluating Agentic AI Systems : Abstract: Recent advances in agentic AI have shifted the focus from standalone Large Language Models (LLMs) to integrated systems that combine LLMs with tools, memory, and other agents to perform comp...
- Unveiling Statistical Significance of Online Regression over Multiple Datasets : Abstract: Despite extensive focus on techniques for evaluating the performance of two learning algorithms on a single dataset, the critical challenge of developing statistical tests to compare multipl...
- OLC-WA: Drift Aware Tuning-Free Online Classification with Weighted Average : Abstract: Real-world data sets often exhibit temporal dynamics characterized by evolving data distributions. Disregarding this phenomenon, commonly referred to as concept drift, can significantly dimi...
- State over Tokens: Characterizing the Role of Reasoning Tokens : Abstract: Large Language Models (LLMs) can generate reasoning tokens before their final answer to boost performance on complex tasks. While these sequences seem like human thought processes, empirical...
- Designing The Drive: Enhancing User Experience through Adaptive Interfaces in Autonomous Vehicles : Abstract: With the recent development and integration of autonomous vehicles (AVs) in transportation systems of the modern world, the emphasis on customizing user interfaces to optimize the overall us...
- Adaptive Edge-Cloud Inference for Speech-to-Action Systems Using ASR and Large Language Models (ASTA) : Abstract: Voice-based interaction has emerged as a natural and intuitive modality for controlling IoT devices. However, speech-driven edge devices face a fundamental trade-off between cloud-based solu...
- CoRe3D: Collaborative Reasoning as a Foundation for 3D Intelligence : Abstract: Recent advances in large multimodal models suggest that explicit reasoning mechanisms play a critical role in improving model reliability, interpretability, and cross-modal alignment. While ...
- Federated Learning with Feedback Alignment : Abstract: Federated Learning (FL) enables collaborative training across multiple clients while preserving data privacy, yet it struggles with data heterogeneity, where clients' data are not distribute...
- Intelligent Scientific Literature Explorer using Machine Learning (ISLE) : Abstract: The rapid acceleration of scientific publishing has created substantial challenges for researchers attempting to discover, contextualize, and interpret relevant literature. Traditional keywo...
- Robust Motion Generation using Part-level Reliable Data from Videos : Abstract: Extracting human motion from large-scale web videos offers a scalable solution to the data scarcity issue in character animation. However, some human parts in many video frames cannot be see...
- Co-Exploration and Co-Exploitation via Shared Structure in Multi-Task Bandits : Abstract: We propose a novel Bayesian framework for efficient exploration in contextual multi-task multi-armed bandit settings, where the context is only observed partially and dependencies between re...
- Theoretical Foundations of Prompt Engineering: From Heuristics to Expressivity : Abstract: Prompts can switch a model's behavior even when the weights are fixed, yet this phenomenon is rarely treated as a clean theoretical object rather than a heuristic. We study the family of fun...
- Quantum Implicit Neural Representations for 3D Scene Reconstruction and Novel View Synthesis : Abstract: Implicit neural representations (INRs) have become a powerful paradigm for continuous signal modeling and 3D scene reconstruction, yet classical networks suffer from a well-known spectral bi...
- Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches : Abstract: We explore efficient strategies to fine-tune decoder-only Large Language Models (LLMs) for downstream text classification under resource constraints. Two approaches are investigated: (1) att...
- Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling : Abstract: Subject-driven image generation has advanced from single- to multi-subject composition, while neglecting distinction, the ability to identify and generate the correct subject when inputs con...
- DynaGen: Unifying Temporal Knowledge Graph Reasoning with Dynamic Subgraphs and Generative Regularization : Abstract: Temporal Knowledge Graph Reasoning (TKGR) aims to complete missing factual elements along the timeline. Depending on the temporal position of the query, the task is categorized into interpol...
- PerNodeDrop: A Method Balancing Specialized Subnets and Regularization in Deep Neural Networks : Abstract: Deep neural networks possess strong representational capacity yet remain vulnerable to overfitting, primarily because neurons tend to co-adapt in ways that, while capturing complex and fine-...
- Anatomy-Guided Representation Learning Using a Transformer-Based Network for Thyroid Nodule Segmentation in Ultrasound Images : Abstract: Accurate thyroid nodule segmentation in ultrasound images is critical for diagnosis and treatment planning. However, ambiguous boundaries between nodules and surrounding tissues, size variat...
- DiG: Differential Grounding for Enhancing Fine-Grained Perception in Multimodal Large Language Model : Abstract: Multimodal Large Language Models have achieved impressive performance on a variety of vision-language tasks, yet their fine-grained visual perception and precise spatial reasoning remain lim...
- ORIBA: Exploring LLM-Driven Role-Play Chatbot as a Creativity Support Tool for Original Character Artists : Abstract: Recent advances in Generative AI (GAI) have led to new opportunities for creativity support. However, this technology has raised ethical concerns in the visual artists community. This paper ...
- Understanding Syllogistic Reasoning in LLMs from Formal and Natural Language Perspectives : Abstract: We study syllogistic reasoning in LLMs from the logical and natural language perspectives. In the process, we explore fundamental reasoning capabilities of the LLMs and the direction this resear...
- Human-Inspired Learning for Large Language Models via Obvious Record and Maximum-Entropy Method Discovery : Abstract: Large Language Models (LLMs) excel at extracting common patterns from large-scale corpora, yet they struggle with rare, low-resource, or previously unseen scenarios, such as niche hardware de...
- Content-Aware Ad Banner Layout Generation with Two-Stage Chain-of-Thought in Vision Language Models : Abstract: In this paper, we propose a method for generating layouts for image-based advertisements by leveraging a Vision-Language Model (VLM). Conventional advertisement layout techniques have predom...
- Detecting Prompt Injection Attacks Against Application Using Classifiers : Abstract: Prompt injection attacks can compromise the security and stability of critical systems, from infrastructure to large web applications. This work curates and augments a prompt injection datas...
- Coupled Variational Reinforcement Learning for Language Model General Reasoning : Abstract: While reinforcement learning has achieved impressive progress in language model reasoning, it is constrained by the requirement for verifiable rewards. Recent verifier-free RL methods ad...
- StreamingAssistant: Efficient Visual Token Pruning for Accelerating Online Video Understanding : Abstract: Online video understanding is essential for applications like public surveillance and AI glasses. However, applying Multimodal Large Language Models (MLLMs) to this domain is challenging due...
- Skillful Subseasonal-to-Seasonal Forecasting of Extreme Events with a Multi-Sphere Coupled Probabilistic Model : Abstract: Accurate subseasonal-to-seasonal (S2S) prediction of extreme events is critical for resource planning and disaster mitigation under accelerating climate change. However, such predictions rem...
- Diverse LLMs vs. Vulnerabilities: Who Detects and Fixes Them Better? : Abstract: Large Language Models (LLMs) are increasingly being studied for Software Vulnerability Detection (SVD) and Repair (SVR). Individual LLMs have demonstrated code understanding abilities, but t...
- Noise-robust Contrastive Learning for Critical Transition Detection in Dynamical Systems : Abstract: Detecting critical transitions in complex, noisy time-series data is a fundamental challenge across science and engineering. Such transitions may be anticipated by the emergence of a low-dim...
- Can You Keep a Secret? Exploring AI for Care Coordination in Cognitive Decline : Abstract: The increasing number of older adults who experience cognitive decline places a burden on informal caregivers, whose support with tasks of daily living determines whether older adults can re...
- Explainable Artificial Intelligence for Economic Time Series: A Comprehensive Review and a Systematic Taxonomy of Methods and Concepts : Abstract: Explainable Artificial Intelligence (XAI) is increasingly required in computational economics, where machine-learning forecasters can outperform classical econometric models but remain diffi...
- Explainable AI as a Double-Edged Sword in Dermatology: The Impact on Clinicians versus The Public : Abstract: Artificial intelligence (AI) is increasingly permeating healthcare, from physician assistants to consumer applications. Since AI algorithms' opacity challenges human interaction, explainable...
- Mage: Cracking Elliptic Curve Cryptography with Cross-Axis Transformers : Abstract: With the advent of machine learning and quantum computing, the 21st century has gone from a place of relative algorithmic security, to one of speculative unease and possibly, cyber catastrop...
- AI-Driven Real-Time Kick Classification in Olympic Taekwondo Using Sensor Fusion : Abstract: Olympic Taekwondo has faced challenges in spectator engagement due to static, defensive gameplay and contentious scoring. Current Protector and Scoring Systems (PSS) rely on impact sensors a...
- Exploring the Design Space of Transition Matching : Abstract: Transition Matching (TM) is an emerging paradigm for generative modeling that generalizes diffusion and flow-matching models as well as continuous-state autoregressive models. TM, similar to...
- Dynamical modeling of nonlinear latent factors in multiscale neural activity with real-time inference : Abstract: Real-time decoding of target variables from multiple simultaneously recorded neural time-series modalities, such as discrete spiking activity and continuous field potentials, is important ac...
- Cross-Modal Representational Knowledge Distillation for Enhanced Spike-Informed LFP Modeling : Abstract: Local field potentials (LFPs) can be routinely recorded alongside spiking activity in intracortical neural experiments, measure a larger complementary spatiotemporal scale of brain activity ...
- Rough Sets for Explainability of Spectral Graph Clustering : Abstract: Graph Spectral Clustering methods (GSC) allow representing clusters of diverse shapes, densities, etc. However, the results of such algorithms, when applied e.g. to text documents, are hard ...
- A Graph Attention Network-Based Framework for Reconstructing Missing LiDAR Beams : Abstract: Vertical beam dropout in spinning LiDAR sensors triggered by hardware aging, dust, snow, fog, or bright reflections removes entire vertical slices from the point cloud and severely degrades ...
- SCIR: A Self-Correcting Iterative Refinement Framework for Enhanced Information Extraction Based on Schema : Abstract: Although Large Language Model (LLM)-powered information extraction (IE) systems have shown impressive capabilities, current fine-tuning paradigms face two major limitations: high training co...
- Dynamic Homophily with Imperfect Recall: Modeling Resilience in Adversarial Networks : Abstract: The purpose of this study is to investigate how homophily, memory constraints, and adversarial disruptions collectively shape the resilience and adaptability of complex networks. To achieve ...
- UniMark: Artificial Intelligence Generated Content Identification Toolkit : Abstract: The rapid proliferation of Artificial Intelligence Generated Content has precipitated a crisis of trust and urgent regulatory demands. However, existing identification tools suffer from frag...
- Fractional Differential Equation Physics-Informed Neural Network and Its Application in Battery State Estimation : Abstract: Accurate estimation of the State of Charge (SOC) is critical for ensuring the safety, reliability, and performance optimization of lithium-ion battery systems. Conventional data-driven neura...
- V-Rex: Real-Time Streaming Video LLM Acceleration via Dynamic KV Cache Retrieval : Abstract: Streaming video large language models (LLMs) are increasingly used for real-time multimodal tasks such as video captioning, question answering, conversational agents, and augmented reality. ...
- GRC-Net: Gram Residual Co-attention Net for epilepsy prediction : Abstract: Prediction of epilepsy based on electroencephalogram (EEG) signals is a rapidly evolving field. Previous studies have traditionally applied 1D processing to the entire EEG signal. However, w...
- Accurate de novo sequencing of the modified proteome with OmniNovo : Abstract: Post-translational modifications (PTMs) serve as a dynamic chemical language regulating protein function, yet current proteomic methods remain blind to a vast portion of the modified proteom...
- Stochastic Volatility Modelling with LSTM Networks: A Hybrid Approach for S&P 500 Index Volatility Forecasting : Abstract: Accurate volatility forecasting is essential in banking, investment, and risk management, because expectations about future market movements directly influence current decisions. This study ...
- Adversarially Probing Cross-Family Sound Symbolism in 27 Languages : Abstract: The phenomenon of sound symbolism, the non-arbitrary mapping between word sounds and meanings, has long been demonstrated through anecdotal experiments like Bouba Kiki, but rarely tested at ...
- Semantic Distance Measurement based on Multi-Kernel Gaussian Processes : Abstract: Semantic distance measurement is a fundamental problem in computational linguistics, providing a quantitative characterization of similarity or relatedness between text segments, and underpi...
- Comparison of different segmentation algorithms on brain volume and fractal dimension in infant brain MRIs : Abstract: Accurate segmentation of infant brain MRI is essential for quantifying developmental changes in structure and complexity. However, ongoing myelination and reduced tissue contrast make automa...
- Training Versatile Coding Agents in Synthetic Environments : Abstract: Prior works on training software engineering agents have explored utilizing existing resources such as issues on GitHub repositories to construct software engineering tasks and corresponding...
- Measuring What Matters: Scenario-Driven Evaluation for Trajectory Predictors in Autonomous Driving : Abstract: Being able to anticipate the motion of surrounding agents is essential for the safe operation of autonomous driving systems in dynamic situations. While various methods have been proposed fo...
- Not All Transparency Is Equal: Source Presentation Effects on Attention, Interaction, and Persuasion in Conversational Search : Abstract: Conversational search systems increasingly provide source citations, yet how citation or source presentation formats influence user engagement remains unclear. We conducted a crowdsourcing u...
- ALERT Open Dataset and Input-Size-Agnostic Vision Transformer for Driver Activity Recognition using IR-UWB : Abstract: Distracted driving contributes to fatal crashes worldwide. To address this, researchers are using driver activity recognition (DAR) with impulse radio ultra-wideband (IR-UWB) radar, which of...
- Epistemoverse: Toward an AI-Driven Knowledge Metaverse for Intellectual Heritage Preservation : Abstract: Large language models (LLMs) have often been characterized as "stochastic parrots" that merely reproduce fragments of their training data. This study challenges that assumption by demonstrat...
- Thermal RGB Fusion for Micro-UAV Wildfire Perimeter Tracking with Minimal Comms : Abstract: This study introduces a lightweight perimeter tracking method designed for micro UAV teams operating over wildfire environments under limited bandwidth conditions. Thermal image frames gener...
- Diffusion Language Model Inference with Monte Carlo Tree Search : Abstract: Diffusion language models (DLMs) have recently emerged as a compelling alternative to autoregressive generation, offering parallel generation and improved global coherence. During inference,...
- Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings : Abstract: So far, expensive finetuning beyond the pretraining sequence length has been a requirement for effectively extending the context of language models (LM). In this work, we break this key bott...
- MeltwaterBench: Deep learning for spatiotemporal downscaling of surface meltwater : Abstract: The Greenland ice sheet is melting at an accelerated rate due to processes that are not fully understood and hard to measure. The distribution of surface meltwater can help understand these ...
- BaRISTA: Brain Scale Informed Spatiotemporal Representation of Human Intracranial Neural Activity : Abstract: Intracranial recordings have opened a unique opportunity to simultaneously measure activity across multiregional networks in the human brain. Recent works have focused on developing transfor...
- A Benchmark Dataset for Spatially Aligned Road Damage Assessment in Small Uncrewed Aerial Systems Disaster Imagery : Abstract: This paper presents the largest known benchmark dataset for road damage assessment and road alignment, and provides 18 baseline models trained on the CRASAR-U-DRIODs dataset's post-disaster ...
- MixtureKit: A General Framework for Composing, Training, and Visualizing Mixture-of-Experts Models : Abstract: We introduce MixtureKit, a modular open-source framework for constructing, training, and analyzing Mixture-of-Experts (MoE) models from arbitrary pre-trained or fine-tuned models. MixtureKit...
- A neuro-symbolic framework for accountability in public-sector AI : Abstract: Automated eligibility systems increasingly determine access to essential public benefits, but the explanations they generate often fail to reflect the legal rules that authorize those decisi...
- Congestion Reduction in EV Charger Placement Using Traffic Equilibrium Models : Abstract: Growing EV adoption can worsen traffic conditions if chargers are sited without regard to their impact on congestion. We study how to strategically place EV chargers to reduce congestion usi...
- Rethinking Jailbreak Detection of Large Vision Language Models with Representational Contrastive Scoring : Abstract: Large Vision-Language Models (LVLMs) are vulnerable to a growing array of multimodal jailbreak attacks, necessitating defenses that are both generalizable to novel threats and efficient for ...
- The Instability of Safety: How Random Seeds and Temperature Expose Inconsistent LLM Refusal Behavior : Abstract: Current safety evaluations of large language models rely on single-shot testing, implicitly assuming that model responses are deterministic and representative of the model's safety alignment...
- Instruction-Tuning Open-Weight Language Models for BPMN Model Generation : Abstract: Domain models are central to software engineering, as they enable a shared understanding, guide implementation, and support automated analyses and model-driven development. Yet, despite thes...
- AI as a Teaching Partner: Early Lessons from Classroom Codesign with Secondary Teachers : Abstract: This report presents a comprehensive account of the Colleague AI Classroom pilot, a collaborative design (co-design) study that brought generative AI technology directly into real classrooms...
- Semantic-Drive: Democratizing Long-Tail Data Curation via Open-Vocabulary Grounding and Neuro-Symbolic VLM Consensus : Abstract: The development of robust Autonomous Vehicles (AVs) is bottlenecked by the scarcity of "Long-Tail" training data. While fleets collect petabytes of video logs, identifying rare safety-critic...
- Hold Onto That Thought: Assessing KV Cache Compression On Reasoning : Abstract: Large language models (LLMs) have demonstrated remarkable performance on long-context tasks, but are often bottlenecked by memory constraints. Namely, the KV cache, which is used to signific...
- V-REX: Benchmarking Exploratory Visual Reasoning via Chain-of-Questions : Abstract: While many vision-language models (VLMs) are developed to answer well-defined, straightforward questions with highly specified targets, as in most benchmarks, they often struggle in practice...
- Evidence-Driven Decision Support for AI Model Selection in Research Software Engineering : Abstract: The rapid proliferation of artificial intelligence (AI) models and methods presents growing challenges for research software engineers and researchers who must select, integrate, and maintai...
- Semantic search for 100M+ galaxy images using AI-generated captions : Abstract: Finding scientifically interesting phenomena through slow, manual labeling campaigns severely limits our ability to explore the billions of galaxy images produced by telescopes. In this work...
- Designing The Internet of Agents: A Framework for Trustworthy, Transparent, and Collaborative Human-Agent Interaction (HAX) : Abstract: The rise of generative and autonomous agents marks a fundamental shift in computing, demanding a rethinking of how humans collaborate with probabilistic, partially autonomous systems. We pre...
- Data-Driven Global Sensitivity Analysis for Engineering Design Based on Individual Conditional Expectations : Abstract: Explainable machine learning techniques have gained increasing attention in engineering applications, especially in aerospace design and analysis, where understanding how input variables inf...
- A Review of Learning-Based Motion Planning: Toward a Data-Driven Optimal Control Approach : Abstract: Motion planning for high-level autonomous driving is constrained by a fundamental trade-off between the transparent, yet brittle, nature of pipeline methods and the adaptive, yet opaque, "bl...
- How AI Agents Follow the Herd of AI? Network Effects, History, and Machine Optimism : Abstract: Understanding decision-making in multi-AI-agent frameworks is crucial for analyzing strategic interactions in network-effect-driven contexts. This study investigates how AI agents navigate n...
- DynaPURLS: Dynamic Refinement of Part-aware Representations for Skeleton-based Zero-Shot Action Recognition : Abstract: Zero-shot skeleton-based action recognition (ZS-SAR) is fundamentally constrained by prevailing approaches that rely on aligning skeleton features with static, class-level semantics. This co...
- Unveiling User Perceptions in the Generative AI Era: A Sentiment-Driven Evaluation of AI Educational Apps' Role in Digital Transformation of e-Teaching : Abstract: The rapid integration of generative artificial intelligence into education has driven digital transformation in e-teaching, yet user perceptions of AI educational apps remain underexplored. ...
- The Agentic Regulator: Risks for AI in Finance and a Proposed Agent-based Framework for Governance : Abstract: Generative and agentic artificial intelligence is entering financial markets faster than existing governance can adapt. Current model-risk frameworks assume static, well-specified algorithms...
- Mapping AI Risk Mitigations: Evidence Scan and Preliminary AI Risk Mitigation Taxonomy : Abstract: Organizations and governments that develop, deploy, use, and govern AI must coordinate on effective risk mitigation. However, the landscape of AI risk mitigation frameworks is fragmented, us...
- Evolutionary Reinforcement Learning based AI tutor for Socratic Interdisciplinary Instruction : Abstract: Cultivating higher-order cognitive abilities -- such as knowledge integration, critical thinking, and creativity -- in modern STEM education necessitates a pedagogical shift from passive kno...
- MONET -- Virtual Cell Painting of Brightfield Images and Time Lapses Using Reference Consistent Diffusion : Abstract: Cell painting is a popular technique for creating human-interpretable, high-contrast images of cell morphology. There are two major issues with cell paint: (1) it is labor-intensive and (2) ...
- Gene regulatory network inference algorithm based on spectral signed directed graph convolution : Abstract: Accurately reconstructing Gene Regulatory Networks (GRNs) is crucial for understanding gene functions and disease mechanisms. Single-cell RNA sequencing (scRNA-seq) technology provides vast ...
- FloraForge: LLM-Assisted Procedural Generation of Editable and Analysis-Ready 3D Plant Geometric Models For Agricultural Applications : Abstract: Accurate 3D plant models are crucial for computational phenotyping and physics-based simulation; however, current approaches face significant limitations. Learning-based reconstruction metho...
- Vibe Coding in Practice: Flow, Technical Debt, and Guidelines for Sustainable Use : Abstract: Vibe Coding (VC) is a form of software development assisted by generative AI, in which developers describe the intended functionality or logic via natural language prompts, and the AI system...
- Towards Accessible Physical AI: LoRA-Based Fine-Tuning of VLA Models for Real-World Robot Control : Abstract: Vision-Language-Action (VLA) models have demonstrated remarkable capabilities in robotic manipulation, enabling robots to execute natural language commands through end-to-end learning from vi...
- A fine-grained look at causal effects in causal spaces : Abstract: The notion of causal effect is fundamental across many scientific disciplines. Traditionally, quantitative researchers have studied causal effects at the level of variables; for example, how...
- Beyond Automation: Rethinking Work, Creativity, and Governance in the Age of Generative AI : Abstract: The accelerating advancement of generative artificial intelligence (AI) systems is reshaping the nature, distribution and meaning of work, creativity, and economic security. This paper inves...
- Should AI Become an Intergenerational Civil Right? : Abstract: Artificial Intelligence (AI) is rapidly becoming a foundational layer of social, economic, and cognitive infrastructure. At the same time, the training and large-scale deployment of AI syste...
- Advancing Autonomous Driving System Testing: Demands, Challenges, and Future Directions : Abstract: Autonomous driving systems (ADSs) promise improved transportation efficiency and safety, yet ensuring their reliability in complex real-world environments remains a critical challenge. Effec...
- Aesthetic Alignment Risks Assimilation: How Image Generation and Reward Models Reinforce Beauty Bias and Ideological "Censorship" : Abstract: Over-aligning image generation models to a generalized aesthetic preference conflicts with user intent, particularly when "anti-aesthetic" outputs are requested for artistic or critical pur...
- An Experience Report on a Pedagogically Controlled, Curriculum-Constrained AI Tutor for SE Education : Abstract: The integration of artificial intelligence (AI) into education continues to evoke both promise and skepticism. While past waves of technological optimism often fell short, recent advances in...
- Understanding Structural Representation in Foundation Models for Polymers : Abstract: From the relative scarcity of training data to the lack of standardized benchmarks, the development of foundation models for polymers faces significant and multi-faceted challenges. At the co...
- It's About Time: The Temporal and Modal Dynamics of Copilot Usage : Abstract: We analyze 37.5 million deidentified conversations with Microsoft's Copilot between January and September 2025. Unlike prior analyses of AI usage, we focus not just on what people do with AI...
- WAM-Diff: A Masked Diffusion VLA Framework with MoE and Online Reinforcement Learning for Autonomous Driving : Abstract: End-to-end autonomous driving systems based on vision-language-action (VLA) models integrate multimodal sensor inputs and language instructions to generate planning and control signals. Whil...
- Automated Plant Disease and Pest Detection System Using Hybrid Lightweight CNN-MobileViT Models for Diagnosis of Indigenous Crops : Abstract: Agriculture supports over 80% of the population in the Tigray region of Ethiopia, where infrastructural disruptions limit access to expert crop disease diagnosis. We present an offline-first...
- Using Socio-economic Indicators, Smart Transit Systems, and Urban Simulator to Accelerate ZEV Adoption and Reduce VMT : Abstract: Globally, on-road transportation accounts for 15% of greenhouse gas (GHG) emissions and an estimated 385,000 premature deaths from PM2.5. Cities play a critical role in meeting IPCC targets,...
- Industrial AI Robustness Card: Evaluating and Monitoring Time Series Models : Abstract: Industrial AI practitioners face vague robustness requirements in emerging regulations and standards but lack concrete, implementation ready protocols. This paper introduces the Industrial A...
- On the Dangers of Bootstrapping Generation for Continual Learning and Beyond : Abstract: The use of synthetically generated data for training models is becoming a common practice. While generated data can augment the training data, repeated training on synthetic data raises conc...
- Explainable Adversarial-Robust Vision-Language-Action Model for Robotic Manipulation : Abstract: Smart farming has emerged as a key technology for advancing modern agriculture through automation and intelligent control. However, systems relying on RGB cameras for perception and robotic ...
- Expert Assessment: The Systemic Environmental Risks of Artificial Intelligence : Abstract: Artificial intelligence (AI) is often presented as a key tool for addressing societal challenges, such as climate change. At the same time, AI's environmental footprint is expanding increasi...
- Hierarchical Task Offloading and Trajectory Optimization in Low-Altitude Intelligent Networks Via Auction and Diffusion-based MARL : Abstract: The low-altitude intelligent networks (LAINs) emerge as a promising architecture for delivering low-latency and energy-efficient edge intelligence in dynamic and infrastructure-limited envir...
- An Operator-Consistent Graph Neural Network for Learning Diffusion Dynamics on Irregular Meshes : Abstract: Classical numerical methods solve partial differential equations (PDEs) efficiently on regular meshes, but many of them become unstable on irregular domains. In practice, multiphysics intera...
- Generative Stochastic Optimal Transport: Guided Harmonic Path-Integral Diffusion : Abstract: We introduce Guided Harmonic Path-Integral Diffusion (GH-PID), a linearly-solvable framework for guided Stochastic Optimal Transport (SOT) with a hard terminal distribution and soft, applica...
- Adaptive Path Integral Diffusion: AdaPID : Abstract: Diffusion-based samplers -- Score Based Diffusions, Bridge Diffusions and Path Integral Diffusions -- match a target at terminal time, but the real leverage comes from choosing the schedule ...
- TopicProphet: Prophesies on Temporal Topic Trends and Stocks : Abstract: Stocks can't be predicted. Despite many hopes, this premise held itself true for many years due to the nature of quantitative stock data lacking causal logic along with rapid market changes ...
- GCoDE: Efficient Device-Edge Co-Inference for GNNs via Architecture-Mapping Co-Search : Abstract: Graph Neural Networks (GNNs) have emerged as the state-of-the-art graph learning method. However, achieving efficient GNN inference on edge devices poses significant challenges, limiting the...
- Achieving Approximate Symmetry Is Exponentially Easier than Exact Symmetry : Abstract: Enforcing exact symmetry in machine learning models often yields significant gains in scientific applications, serving as a powerful inductive bias. However, recent work suggests that relyin...
- Rep Smarter, Not Harder: AI Hypertrophy Coaching with Wearable Sensors and Edge Neural Networks : Abstract: Optimizing resistance training for hypertrophy requires balancing proximity to muscular failure, often quantified by Repetitions in Reserve (RiR), with fatigue management. However, subjectiv...
- Explainable AI for Smart Greenhouse Control: Interpretability of Temporal Fusion Transformer in the Internet of Robotic Things : Abstract: The integration of the Internet of Robotic Things (IoRT) in smart greenhouses has revolutionised precision agriculture by enabling efficient and autonomous environmental control. However, ex...
- KV Cache Recycling to Expand Usable Context Capacity in Low Parameter LLMs : Abstract: Whether attention key value (KV) states computed for one prompt for a small LLM can be reused to accelerate inference on a new similar prompt, giving an increase to the space to its context ...
- KH-FUNSD: A Hierarchical and Fine-Grained Layout Analysis Dataset for Low-Resource Khmer Business Document : Abstract: Automated document layout analysis remains a major challenge for low-resource, non-Latin scripts. Khmer is a language spoken daily by over 17 million people in Cambodia, receiving little att...
- Airport Passenger Flow Forecasting via Deformable Temporal-Spectral Transformer Approach : Abstract: Accurate forecasting of passenger flows is critical for maintaining the efficiency and resilience of airport operations. Recent advances in patch-based Transformer models have shown strong p...
- Spiking Manifesto : Abstract: Practically everything computers do is better, faster, and more power-efficient than the brain. For example, a calculator crunches numbers more energy-efficiently than any human. Yet AI mode...
- Vision Foundry: A System for Training Foundational Vision AI Models : Abstract: Self-supervised learning (SSL) leverages vast unannotated medical datasets, yet steep technical barriers limit adoption by clinical researchers. We introduce Vision Foundry, a code-free, HIP...
- Semantic Nutrition Estimation: Predicting Food Healthfulness from Text Descriptions : Abstract: Accurate nutritional assessment is critical for public health, but existing profiling systems require detailed data often unavailable or inaccessible from colloquial text descriptions of foo...
- Soft Decision Tree classifier: explainable and extendable PyTorch implementation : Abstract: We implemented a Soft Decision Tree (SDT) and a Short-term Memory Soft Decision Tree (SM-SDT) using PyTorch. The methods were extensively tested on simulated and clinical datasets. The SDT w...
- Performance and Efficiency of Climate In-Situ Data Reconstruction: Why Optimized IDW Outperforms kriging and Implicit Neural Representation : Abstract: This study evaluates three reconstruction methods for sparse climate data: the simple inverse distance weighting (IDW), the statistically grounded ordinary kriging (OK), and the advanced imp...
- CR3G: Causal Reasoning for Patient-Centric Explanations in Radiology Report Generation : Abstract: Automatic chest X-ray report generation is an important area of research aimed at improving diagnostic accuracy and helping doctors make faster decisions. Current AI models are good at findi...
- Active Inference with Reusable State-Dependent Value Profiles : Abstract: Adaptive behavior in volatile environments requires agents to switch among value-control regimes across latent contexts, but maintaining separate preferences, policy biases, and action-confi...
- Assessing Greenspace Attractiveness with ChatGPT, Claude, and Gemini: Do AI Models Reflect Human Perceptions? : Abstract: Understanding greenspace attractiveness is essential for designing livable and inclusive urban environments, yet existing assessment approaches often overlook informal or transient spaces an...
- The Ontological Dissonance Hypothesis: AI-Triggered Delusional Ideation as Folie a Deux Technologique : Abstract: This paper argues that contemporary large language models (LLMs) can contribute to psychotic involvement by creating interactions that resemble the relational dynamics of folie a deux. Drawi...
- Totalitarian Technics: The Hidden Cost of AI Scribes in Healthcare : Abstract: Artificial intelligence (AI) scribes, systems that record and summarise patient-clinician interactions, are promoted as solutions to administrative overload. This paper argues that their sig...
- Enhancing Urban Visual Place Recognition for Crowdsourced Flood Imagery via LLM-Guided Attention : Abstract: Crowdsourced street-view imagery from social media provides valuable real-time visual evidence of urban flooding and other crisis events, yet it often lacks reliable geographic metadata for ...
- A Multitask VAE for Time Series Preprocessing and Prediction of Blood Glucose Level : Abstract: Data preprocessing is a critical part of time series data analysis. Data from connected medical devices often have missing or abnormal values during acquisition. Handling such situations req...
- MedCEG: Reinforcing Verifiable Medical Reasoning with Critical Evidence Graph : Abstract: Large language models with reasoning capabilities have demonstrated impressive performance across a wide range of domains. In clinical applications, a transparent, step-by-step reasoning pro...
- Defending the Hierarchical Result Models of Precedential Constraint : Abstract: In recent years, hierarchical case-based-reasoning models of precedential constraint have been proposed. In various papers, Trevor Bench-Capon criticised these models on the grounds that the...
- neuralFOMO: Can LLMs Handle Being Second Best? Measuring Envy-Like Preferences in Multi-Agent Settings : Abstract: Envy is a common human behavior that shapes competitiveness and can alter outcomes in team settings. As large language models (LLMs) increasingly act on behalf of humans in collaborative and...
- Differentiable Evolutionary Reinforcement Learning : Abstract: The design of effective reward functions presents a central and often arduous challenge in reinforcement learning (RL), particularly when developing autonomous agents for complex reasoning t...
- Behavior and Representation in Large Language Models for Combinatorial Optimization: From Feature Extraction to Algorithm Selection : Abstract: Recent advances in Large Language Models (LLMs) have opened new perspectives for automation in optimization. While several studies have explored how LLMs can generate or solve optimization m...
- Error-Driven Prompt Optimization for Arithmetic Reasoning : Abstract: Recent advancements in artificial intelligence have sparked interest in industrial agents capable of supporting analysts in regulated sectors, such as finance and healthcare, within tabular ...
- MedInsightBench: Evaluating Medical Analytics Agents Through Multi-Step Insight Discovery in Multimodal Medical Data : Abstract: In medical data analysis, extracting deep insights from complex, multi-modal datasets is essential for improving patient care, increasing diagnostic accuracy, and optimizing healthcare opera...
- Reflective Preference Optimization (RPO): Enhancing On-Policy Alignment via Hint-Guided Reflection : Abstract: Direct Preference Optimization (DPO) has emerged as a lightweight and effective alternative to Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with AI Feedback (...
- Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows : Abstract: We introduce a finance & accounting benchmark (Finch) for evaluating AI agents on real-world, enterprise-grade professional workflows -- interleaving data entry, structuring, formatting, web...
- SpeakRL: Synergizing Reasoning, Speaking, and Acting in Language Models with Reinforcement Learning : Abstract: Effective human-agent collaboration is increasingly prevalent in real-world applications. Current trends in such collaborations are predominantly unidirectional, with users providing instruc...
- MAC: A Multi-Agent Framework for Interactive User Clarification in Multi-turn Conversations : Abstract: Conversational agents often encounter ambiguous user requests, requiring an effective clarification to successfully complete tasks. While recent advancements in real-world applications favor...
- Can AI Understand What We Cannot Say? Measuring Multilevel Alignment Through Abortion Stigma Across Cognitive, Interpersonal, and Structural Levels : Abstract: As large language models increasingly mediate stigmatized health decisions, their capacity to genuinely understand complex psychological and physiological phenomena remains poorly evaluated....
- Towards Unified Co-Speech Gesture Generation via Hierarchical Implicit Periodicity Learning : Abstract: Generating 3D-based body movements from speech shows great potential in extensive downstream applications, while it still suffers challenges in imitating realistic human movements. Predomina...
- Socratic Students: Teaching Language Models to Learn by Asking Questions : Abstract: Large Language Models (LLMs) excel at static interactions, where they answer user queries by retrieving knowledge encoded in their parameters. However, in many real-world settings, such as e...
- M-GRPO: Stabilizing Self-Supervised Reinforcement Learning for Large Language Models with Momentum-Anchored Policy Optimization : Abstract: Self-supervised reinforcement learning (RL) presents a promising approach for enhancing the reasoning capabilities of Large Language Models (LLMs) without reliance on expensive human-annotat...
- Towards Open Standards for Systemic Complexity in Digital Forensics : Abstract: The intersection of artificial intelligence (AI) and digital forensics (DF) is becoming increasingly complex, ubiquitous, and pervasive, with overlapping techniques and technologies being ad...
- Satisfiability Modulo Theory Meets Inductive Logic Programming : Abstract: Inductive Logic Programming (ILP) provides interpretable rule learning in relational domains, yet remains limited in its ability to induce and reason with numerical constraints. Classical IL...
- Forgetful but Faithful: A Cognitive Memory Architecture and Benchmark for Privacy-Aware Generative Agents : Abstract: As generative agents become increasingly sophisticated and deployed in long-term interactive scenarios, their memory management capabilities emerge as a critical bottleneck for both performa...
- Fault-Tolerant Sandboxing for AI Coding Agents: A Transactional Approach to Safe Autonomous Execution : Abstract: The transition of Large Language Models (LLMs) from passive code generators to autonomous agents introduces significant safety risks, specifically regarding destructive commands and inconsis...
- Causal Counterfactuals Reconsidered : Abstract: I develop a novel semantics for probabilities of counterfactuals that generalizes the standard Pearlian semantics: it applies to probabilistic causal models that cannot be extended into real...
- Personalized QoE Prediction: A Demographic-Augmented Machine Learning Framework for 5G Video Streaming Networks : Abstract: Quality of Experience (QoE) prediction is a critical component of modern multimedia systems, particularly for adaptive video streaming in 5G networks. Accurate QoE estimation enables intelli...
- Synergizing Code Coverage and Gameplay Intent: Coverage-Aware Game Playtesting with LLM-Guided Reinforcement Learning : Abstract: The widespread adoption of the "Games as a Service" model necessitates frequent content updates, placing immense pressure on quality assurance. In response, automated game testing has been v...
- WebOperator: Action-Aware Tree Search for Autonomous Agents in Web Environment : Abstract: LLM-based agents often operate in a greedy, step-by-step manner, selecting actions solely based on the current observation without considering long-term consequences or alternative paths. Th...
- Memoria: A Scalable Agentic Memory Framework for Personalized Conversational AI : Abstract: Agentic memory is emerging as a key enabler for large language models (LLMs) to maintain continuity, personalization, and long-term context in extended user interactions, critical capabilitie...
- Value-Aware Multiagent Systems : Abstract: This paper introduces the concept of value awareness in AI, which goes beyond the traditional value-alignment problem. Our definition of value awareness presents us with a concise and simpli...
- Modular and Multi-Path-Aware Offline Benchmarking for Mobile GUI Agents : Abstract: Mobile GUI Agents, AI agents capable of interacting with mobile applications on behalf of users, have the potential to transform human-computer interaction. However, current evaluation pract...
- AgentSHAP: Interpreting LLM Agent Tool Importance with Monte Carlo Shapley Value Estimation : Abstract: LLM agents that use external tools can solve complex tasks, but understanding which tools actually contributed to a response remains a blind spot. No existing XAI methods address tool-level ...
- Large Language Newsvendor: Decision Biases and Cognitive Mechanisms : Abstract: Problem definition: Although large language models (LLMs) are increasingly integrated into business decision making, their potential to replicate and even amplify human cognitive biases caut...
- World Models Unlock Optimal Foraging Strategies in Reinforcement Learning Agents : Abstract: Patch foraging involves the deliberate and planned process of determining the optimal time to depart from a resource-rich region and investigate potentially more beneficial alternatives. The...
- KidsArtBench: Multi-Dimensional Children's Art Evaluation with Attribute-Aware MLLMs : Abstract: Multimodal Large Language Models (MLLMs) show remarkable progress across many visual-language tasks; however, their capacity to evaluate artistic expression remains limited. Aesthetic concep...
- SafeGen: Embedding Ethical Safeguards in Text-to-Image Generation : Abstract: Generative Artificial Intelligence (AI) has created unprecedented opportunities for creative expression, education, and research. Text-to-image systems such as DALL·E, Stable Diffusion, and ...
- MetaHGNIE: Meta-Path Induced Hypergraph Contrastive Learning in Heterogeneous Knowledge Graphs : Abstract: Node importance estimation (NIE) in heterogeneous knowledge graphs is a critical yet challenging task, essential for applications such as recommendation, knowledge reasoning, and question an...
- AI Transparency Atlas: Framework, Scoring, and Real-Time Model Card Evaluation Pipeline : Abstract: AI model documentation is fragmented across platforms and inconsistent in structure, preventing policymakers, auditors, and users from reliably assessing safety claims, data provenance, and ...
- Understanding Critical Thinking in Generative Artificial Intelligence Use: Development, Validation, and Correlates of the Critical Thinking in AI Use Scale : Abstract: Generative AI tools are increasingly embedded in everyday work and learning, yet their fluency, opacity, and propensity to hallucinate mean that users must critically evaluate AI outputs rat...
- Feeling the Strength but Not the Source: Partial Introspection in LLMs : Abstract: Recent work from Anthropic claims that frontier models can sometimes detect and name injected "concepts" represented as activation directions. We test the robustness of these claims. First, ...
- Entropy Collapse: A Universal Failure Mode of Intelligent Systems : Abstract: Intelligent systems are widely assumed to improve through learning, coordination, and optimization. However, across domains -- from artificial intelligence to economic institutions and biolo...
- Quantum-Aware Generative AI for Materials Discovery: A Framework for Robust Exploration Beyond DFT Biases : Abstract: Conventional generative models for materials discovery are predominantly trained and validated using data from Density Functional Theory (DFT) with approximate exchange-correlation functiona...
- A Multi-Axial Mindset for Ontology Design Lessons from Wikidata's Polyhierarchical Structure : Abstract: Traditional ontology design emphasizes disjoint and exhaustive top-level distinctions such as continuant vs. occurrent, abstract vs. concrete, or type vs. instance. These distinctions are us...
- A Geometric Theory of Cognition : Abstract: Human cognition spans perception, memory, intuitive judgment, deliberative reasoning, action selection, and social inference, yet these capacities are often explained through distinct comput...
- TA-KAND: Two-stage Attention Triple Enhancement and U-KAN based Diffusion For Few-shot Knowledge Graph Completion : Abstract: Knowledge Graphs (KGs), thanks to their concise and efficient triple-based structure, have been widely applied in intelligent question answering, recommender systems and other domains. Howev...
- Floorplan2Guide: LLM-Guided Floorplan Parsing for BLV Indoor Navigation : Abstract: Indoor navigation remains a critical challenge for people with visual impairments. The current solutions mainly rely on infrastructure-based systems, which limit their ability to navigate sa...
- Rethinking Label Consistency of In-Context Learning: An Implicit Transductive Label Propagation Perspective : Abstract: Large language models (LLMs) perform in-context learning (ICL) with minimal supervised examples, which benefits various natural language processing (NLP) tasks. One of the critical research ...
- Reliable Policy Iteration: Performance Robustness Across Architecture and Environment Perturbations : Abstract: In recent work, we proposed Reliable Policy Iteration (RPI), which restores policy iteration's monotonicity-of-value-estimates property to the function approximation setting. Here, we asses...
- The Forecast Critic: Leveraging Large Language Models for Poor Forecast Identification : Abstract: Monitoring forecasting systems is critical for customer satisfaction, profitability, and operational efficiency in large-scale retail businesses. We propose The Forecast Critic, a system tha...
- Context-Aware Agentic Power Resources Optimisation in EV using Smart2ChargeApp : Abstract: This paper presents a novel context-sensitive multi-agent coordination for dynamic resource allocation (CAMAC-DRA) framework for optimizing smart electric vehicle (EV) charging ecosystems t...
- Log Anomaly Detection with Large Language Models via Knowledge-Enriched Fusion : Abstract: System logs are a critical resource for monitoring and managing distributed systems, providing insights into failures and anomalous behavior. Traditional log analysis techniques, including t...
- Hypergame Rationalisability: Solving Agent Misalignment In Strategic Play : Abstract: Differences in perception, information asymmetries, and bounded rationality lead game-theoretic players to derive a private, subjective view of the game that may diverge from the underlying ...
- AGAPI-Agents: An Open-Access Agentic AI Platform for Accelerated Materials Design on AtomGPT.org : Abstract: Artificial intelligence is reshaping scientific discovery, yet its use in materials research remains limited by fragmented computational ecosystems, reproducibility challenges, and dependenc...
- CXL-SpecKV: A Disaggregated FPGA Speculative KV-Cache for Datacenter LLM Serving : Abstract: Large Language Models (LLMs) have revolutionized natural language processing tasks, but their deployment in datacenter environments faces significant challenges due to the massive memory req...
- Robustness of Probabilistic Models to Low-Quality Data: A Multi-Perspective Analysis : Abstract: A systematic, comparative investigation into the effects of low-quality data reveals a stark spectrum of robustness across modern probabilistic models. We find that autoregressive language m...
- Causal Strengths and Leaky Beliefs: Interpreting LLM Reasoning via Noisy-OR Causal Bayes Nets : Abstract: The nature of intelligence in both humans and machines is a longstanding question. While there is no universally accepted definition, the ability to reason causally is often regarded as a pi...
- Structured Personalization: Modeling Constraints as Matroids for Data-Minimal LLM Agents : Abstract: Personalizing Large Language Model (LLM) agents requires conditioning them on user-specific data, creating a critical trade-off between task utility and data disclosure. While the utility of...
- Mirror Mode in Fire Emblem: Beating Players at their own Game with Imitation and Reinforcement Learning : Abstract: Enemy strategies in turn-based games should be surprising and unpredictable. This study introduces Mirror Mode, a new game mode where the enemy AI mimics the personal strategy of a player to...
- Solving Parallel Machine Scheduling With Precedences and Cumulative Resource Constraints With Calendars : Abstract: The task of finding efficient production schedules for parallel machines is a challenge that arises in most industrial manufacturing domains. There is a large potential to minimize productio...
- A Monad-Based Clause Architecture for Artificial Age Score (AAS) in Large Language Models : Abstract: Large language models (LLMs) are often deployed as powerful yet opaque systems, leaving open how their internal memory and "self-like" behavior should be governed in a principled and auditab...
Research Sources: 695 | Generated: 12/16/2025
