AI RESEARCH PAPERS & ACADEMIC SOURCES
- UltraBoneUDF: Self-supervised Bone Surface Reconstruction from Ultrasound Based on Neural Unsigned Distance Functions : Abstract: Bone surface reconstruction is an essential component of computer-assisted orthopedic surgery (CAOS), forming the foundation for both preoperative planning and intraoperative guidance. Compar...
- From Indoor to Open World: Revealing the Spatial Reasoning Gap in MLLMs : Abstract: While Multimodal Large Language Models (MLLMs) have achieved impressive performance on semantic tasks, their spatial intelligence--crucial for robust and grounded AI systems--remains underde...
- VisionDirector: Vision-Language Guided Closed-Loop Refinement for Generative Image Synthesis : Abstract: Generative models can now produce photorealistic imagery, yet they still struggle with the long, multi-goal prompts that professional designers issue. To expose this gap and better evaluate ...
- MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation : Abstract: With the rise of online dance-video platforms and rapid advances in AI-generated content (AIGC), music-driven dance generation has emerged as a compelling research direction. Despite substan...
- DSwinIR: Rethinking Window-based Attention for Image Restoration : Abstract: Image restoration has witnessed significant advancements with the development of deep learning models. Transformer-based models, particularly those using window-based self-attention, have be...
- Age-Defying Face Recognition with Transformer-Enhanced Loss : Abstract: Aging presents a significant challenge in face recognition, as changes in skin texture and tone can alter facial features over time, making it particularly difficult to compare images of the...
- LidarDM: Generative LiDAR Simulation in a Generated World : Abstract: We present LidarDM, a novel LiDAR generative model capable of producing realistic, layout-aware, physically plausible, and temporally coherent LiDAR videos. LidarDM stands out with two unpre...
- Enhance Multi-Scale Spatial-Temporal Coherence for Configurable Video Anomaly Detection : Abstract: The development of unsupervised Video Anomaly Detection (VAD) relies on technologies in the field of signal processing. Since the anomaly is quite ambiguous and unbounded, different detectio...
- RoboMirror: Understand Before You Imitate for Video to Humanoid Locomotion : Abstract: Humans learn locomotion through visual observation, interpreting visual content first before imitating actions. However, state-of-the-art humanoid locomotion systems rely on either curated m...
- PCR-ORB: Enhanced ORB-SLAM3 with Point Cloud Refinement Using Deep Learning-Based Dynamic Object Filtering : Abstract: Visual Simultaneous Localization and Mapping (vSLAM) systems encounter substantial challenges in dynamic environments where moving objects compromise tracking accuracy and map consistency. T...
- EIR: Enhanced Image Representations for Medical Report Generation : Abstract: Generating medical reports from chest X-ray images is a critical and time-consuming task for radiologists, especially in emergencies. To alleviate the stress on radiologists and reduce the r...
- SurgWorld: Learning Surgical Robot Policies from Videos via World Modeling : Abstract: Data scarcity remains a fundamental barrier to achieving fully autonomous surgical robots. While large scale vision language action (VLA) models have shown impressive generalization in house...
- Interpretable Gallbladder Ultrasound Diagnosis: A Lightweight Web-Mobile Software Platform with Real-Time XAI : Abstract: Early and accurate detection of gallbladder diseases is crucial, yet ultrasound interpretation is challenging. To address this, an AI-driven diagnostic software integrates our hybrid deep le...
- HiSciBench: A Hierarchical Multi-disciplinary Benchmark for Scientific Intelligence from Reading to Discovery : Abstract: The rapid advancement of large language models (LLMs) and multimodal foundation models has sparked growing interest in their potential for scientific research. However, scientific intelligen...
- A Rapid GeoSAM-Based Workflow for Multi-Temporal Glacier Delineation: Case Study from Svalbard : Abstract: Consistent glacier boundary delineation is essential for monitoring glacier change, yet many existing approaches are difficult to scale across long time series and heterogeneous environments...
- SwinCCIR: An end-to-end deep network for Compton camera imaging reconstruction : Abstract: Compton cameras (CCs) are a kind of gamma camera designed to determine the directions of incident gamma rays based on Compton scattering. However, CC reconstruction faces prob...
- Mesquite MoCap: Democratizing Real-Time Motion Capture with Affordable, Bodyworn IoT Sensors and WebXR SLAM : Abstract: Motion capture remains costly and complex to deploy, limiting use outside specialized laboratories. We present Mesquite, an open-source, low-cost inertial motion-capture system that combines...
- Semantic contrastive learning for orthogonal X-ray computed tomography reconstruction : Abstract: X-ray computed tomography (CT) is widely used in medical imaging, with sparse-view reconstruction offering an effective way to reduce radiation dose. However, ill-posed conditions often resu...
- Learning Multi-Modal Mobility Dynamics for Generalized Next Location Recommendation : Abstract: The precise prediction of human mobility has produced significant socioeconomic impacts, such as location recommendations and evacuation suggestions. However, existing methods suffer from li...
- VLA-Arena: An Open-Source Framework for Benchmarking Vision-Language-Action Models : Abstract: While Vision-Language-Action models (VLAs) are rapidly advancing towards generalist robot policies, it remains difficult to quantitatively understand their limits and failure modes. To addre...
- JParc: Joint cortical surface parcellation with registration : Abstract: Cortical surface parcellation is a fundamental task in both basic neuroscience research and clinical applications, enabling more accurate mapping of brain regions. Model-based and learning-b...
- MEGA-PCC: A Mamba-based Efficient Approach for Joint Geometry and Attribute Point Cloud Compression : Abstract: Joint compression of point cloud geometry and attributes is essential for efficient 3D data representation. Existing methods often rely on post-hoc recoloring procedures and manually tuned b...
- Super-Resolution Enhancement of Medical Images Based on Diffusion Model: An Optimization Scheme for Low-Resolution Gastric Images : Abstract: Capsule endoscopy has enabled minimally invasive gastrointestinal imaging, but its clinical utility is limited by the inherently low resolution of captured images due to hardware, power, and...
- Complex Swin Transformer for Accelerating Enhanced SMWI Reconstruction : Abstract: Susceptibility Map Weighted Imaging (SMWI) is an advanced magnetic resonance imaging technique used to detect nigral hyperintensity in Parkinson's disease. However, full-resolution SMWI acqui...
- AI-Enhanced Virtual Biopsies for Brain Tumor Diagnosis in Low Resource Settings : Abstract: Timely brain tumor diagnosis remains challenging in low-resource clinical environments where expert neuroradiology interpretation, high-end MRI hardware, and invasive biopsy procedures may b...
- Field strength-dependent performance variability in deep learning-based analysis of magnetic resonance imaging : Abstract: This study quantitatively evaluates the impact of MRI scanner magnetic field strength on the performance and generalizability of deep learning-based segmentation algorithms. Three publicly a...
- SlimEdge: Lightweight Distributed DNN Deployment on Constrained Hardware : Abstract: Deep distributed networks (DNNs) have become central to modern computer vision, yet their deployment on resource-constrained edge devices remains hindered by substantial parameter counts and...
- Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion : Abstract: Diffusion-based video super-resolution (VSR) methods achieve strong perceptual quality but remain impractical for latency-sensitive settings due to reliance on future frames and expensive mu...
- Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation : Abstract: Transparent objects remain notoriously hard for perception systems: refraction, reflection and transmission break the assumptions behind stereo, ToF and purely discriminative monocular depth...
- IDT: A Physically Grounded Transformer for Feed-Forward Multi-View Intrinsic Decomposition : Abstract: Intrinsic image decomposition is fundamental for visual understanding, as RGB images entangle material properties, illumination, and view-dependent effects. Recent diffusion-based methods ha...
- OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding : Abstract: Omnimodal large language models have made significant strides in unifying audio and visual modalities; however, they often lack the fine-grained cross-modal understanding and have difficulty...
- Rethinking the Spatio-Temporal Alignment of End-to-End 3D Perception : Abstract: Spatio-temporal alignment is crucial for temporal modeling of end-to-end (E2E) perception in autonomous driving (AD), providing valuable structural and textural prior information. Existing m...
- Scalable Residual Feature Aggregation Framework with Hybrid Metaheuristic Optimization for Robust Early Pancreatic Neoplasm Detection in Multimodal CT Imaging : Abstract: The early detection of pancreatic neoplasm is a major clinical dilemma, and it is predominantly so because tumors are likely to occur with minimal contrast margins and a large spread anatomy...
- Detection Fire in Camera RGB-NIR : Abstract: Improving the accuracy of fire detection using infrared night vision cameras remains a challenging task. Previous studies have reported strong performance with popular detection models. For ...
- Same or Not? Enhancing Visual Perception in Vision-Language Models : Abstract: Vision-language models (VLMs) excel at broad visual understanding but remain coarse-grained, exhibit visual biases, and miss subtle visual details. Existing training corpora reinforce this l...
- LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation : Abstract: Real-time video generation via diffusion is essential for building general-purpose multimodal interactive AI systems. However, the simultaneous denoising of all video frames with bidirection...
- ProGuard: Towards Proactive Multimodal Safeguard : Abstract: The rapid evolution of generative models has led to a continuous emergence of multimodal safety risks, exposing the limitations of existing defense methods. To address these challenges, we p...
- Image Denoising Using Global and Local Circulant Representation : Abstract: The proliferation of imaging devices and the vast amount of image data generated every day impose an increasingly high demand on efficient and effective image denoising. In this paper, we establish a...
- ThinkGen: Generalized Thinking for Visual Generation : Abstract: Recent progress in Multimodal Large Language Models (MLLMs) demonstrates that Chain-of-Thought (CoT) reasoning enables systematic solutions to complex understanding tasks. However, its exten...
- RxnBench: A Multimodal Benchmark for Evaluating Large Language Models on Chemical Reaction Understanding from Scientific Literature : Abstract: The integration of Multimodal Large Language Models (MLLMs) into chemistry promises to revolutionize scientific discovery, yet their ability to comprehend the dense, graphical language of re...
- PurifyGen: A Risk-Discrimination and Semantic-Purification Model for Safe Text-to-Image Generation : Abstract: Recent advances in diffusion models have notably enhanced text-to-image (T2I) generation quality, but they also raise the risk of generating unsafe content. Traditional safety methods like t...
- PathFound: An Agentic Multimodal Model Activating Evidence-seeking Pathological Diagnosis : Abstract: Recent pathological foundation models have substantially advanced visual representation learning and multimodal interaction. However, most models still rely on a static inference paradigm in...
- AnyMS: Bottom-up Attention Decoupling for Layout-guided and Training-free Multi-subject Customization : Abstract: Multi-subject customization aims to synthesize multiple user-specified subjects into a coherent image. To address issues such as subjects missing or conflicts, recent works incorporate layou...
- Iterative Inference-time Scaling with Adaptive Frequency Steering for Image Super-Resolution : Abstract: Diffusion models have become a leading paradigm for image super-resolution (SR), but existing methods struggle to guarantee both the high-frequency perceptual quality and the low-frequency s...
- IdentityStory: Taming Your Identity-Preserving Generator for Human-Centric Story Generation : Abstract: Recent visual generative models enable story generation with consistent characters from text, but human-centric story generation faces additional challenges, such as maintaining detailed and...
- Multi-label Classification with Panoptic Context Aggregation Networks : Abstract: Context modeling is crucial for visual recognition, enabling highly discriminative image representations by integrating both intrinsic and extrinsic relationships between objects and labels ...
- TV-RAG: A Temporal-aware and Semantic Entropy-Weighted Framework for Long Video Retrieval and Understanding : Abstract: Large Video Language Models (LVLMs) have rapidly emerged as the focus of multimedia AI research. Nonetheless, when confronted with lengthy videos, these models struggle: their temporal windo...
- SC-Net: Robust Correspondence Learning via Spatial and Cross-Channel Context : Abstract: Recent research has focused on using convolutional neural networks (CNNs) as the backbones in two-view correspondence learning, demonstrating significant superiority over methods based on mu...
- MCI-Net: A Robust Multi-Domain Context Integration Network for Point Cloud Registration : Abstract: Robust and discriminative feature learning is critical for high-quality point cloud registration. However, existing deep learning-based methods typically rely on Euclidean neighborhood-based...
- HY-Motion 1.0: Scaling Flow Matching Models for Text-To-Motion Generation : Abstract: We present HY-Motion 1.0, a series of state-of-the-art, large-scale, motion generation models capable of generating 3D human motions from textual descriptions. HY-Motion 1.0 represents the f...
- Deterministic Image-to-Image Translation via Denoising Brownian Bridge Models with Dual Approximators : Abstract: Image-to-Image (I2I) translation involves converting an image from one domain to another. Deterministic I2I translation, such as in image super-resolution, extends this concept by guaranteei...
- Automated river gauge plate reading using a hybrid object detection and generative AI framework in the Limpopo River Basin : Abstract: Accurate and continuous monitoring of river water levels is essential for flood forecasting, water resource management, and ecological protection. Traditional hydrological observation method...
- CoFi-Dec: Hallucination-Resistant Decoding via Coarse-to-Fine Generative Feedback in Large Vision-Language Models : Abstract: Large Vision-Language Models (LVLMs) have achieved impressive progress in multi-modal understanding and generation. However, they still tend to produce hallucinated content that is inconsist...
- RealX3D: A Physically-Degraded 3D Benchmark for Multi-view Visual Restoration and Reconstruction : Abstract: We introduce RealX3D, a real-capture benchmark for multi-view visual restoration and 3D reconstruction under diverse physical degradations. RealX3D groups corruptions into four families, inc...
- Fuzzy-Logic and Deep Learning for Environmental Condition-Aware Road Surface Classification : Abstract: Monitoring the state of road surfaces provides valuable information for planning and controlling vehicles and for active vehicle control systems. Classical road monitoring methods are expensive...
- Direct Diffusion Score Preference Optimization via Stepwise Contrastive Policy-Pair Supervision : Abstract: Diffusion models have achieved impressive results in generative tasks such as text-to-image synthesis, yet they often struggle to fully align outputs with nuanced user intent and maintain co...
- DriveLaW: Unifying Planning and Video Generation in a Latent Driving World : Abstract: World models have become crucial for autonomous driving, as they learn how scenarios evolve over time to address the long-tail challenges of the real world. However, current approaches releg...
- Bridging Cognitive Gap: Hierarchical Description Learning for Artistic Image Aesthetics Assessment : Abstract: The aesthetic quality assessment task is crucial for developing a human-aligned quantitative evaluation system for AIGC. However, its inherently complex nature, spanning visual perception, c...
- SOFTooth: Semantics-Enhanced Order-Aware Fusion for Tooth Instance Segmentation : Abstract: Three-dimensional (3D) tooth instance segmentation remains challenging due to crowded arches, ambiguous tooth-gingiva boundaries, missing teeth, and rare yet clinically important third molar...
- SoulX-LiveTalk Technical Report : Abstract: Deploying massive diffusion models for real-time, infinite-duration, audio-driven avatar generation presents a significant engineering challenge, primarily due to the conflict between comput...
- NeXT-IMDL: Build Benchmark for NeXT-Generation Image Manipulation Detection & Localization : Abstract: The accessibility surge and abuse risks of user-friendly image editing models have created an urgent need for generalizable, up-to-date methods for Image Manipulation Detection and Localizat...
- MGCA-Net: Multi-Graph Contextual Attention Network for Two-View Correspondence Learning : Abstract: Two-view correspondence learning is a key task in computer vision, which aims to establish reliable matching relationships for applications such as camera pose estimation and 3D reconstructi...
- SpatialMosaic: A Multiview VLM Dataset for Partial Visibility : Abstract: The rapid progress of Multimodal Large Language Models (MLLMs) has unlocked the potential for enhanced 3D scene understanding and spatial reasoning. However, existing approaches often rely o...
- CountGD++: Generalized Prompting for Open-World Counting : Abstract: The flexibility and accuracy of methods for automatically counting objects in images and videos are limited by the way the object can be specified. While existing methods allow users to desc...
- CME-CAD: Heterogeneous Collaborative Multi-Expert Reinforcement Learning for CAD Code Generation : Abstract: Computer-Aided Design (CAD) is essential in industrial design, but the complexity of traditional CAD modeling and workflows presents significant challenges for automating the generation of h...
- MedGemma vs GPT-4: Open-Source and Proprietary Zero-shot Medical Disease Classification from Images : Abstract: Multimodal Large Language Models (LLMs) introduce an emerging paradigm for medical imaging by interpreting scans through the lens of extensive clinical knowledge, offering a transformative a...
- Multi-Track Multimodal Learning on iMiGUE: Micro-Gesture and Emotion Recognition : Abstract: Micro-gesture recognition and behavior-based emotion prediction are both highly challenging tasks that require modeling subtle, fine-grained human behaviors, primarily leveraging video and s...
- YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection : Abstract: Existing Real-Time Object Detection (RTOD) methods commonly adopt YOLO-like architectures for their favorable trade-off between accuracy and speed. However, these models rely on static dense...
- Plug-and-Play Fidelity Optimization for Diffusion Transformer Acceleration via Cumulative Error Minimization : Abstract: Although Diffusion Transformer (DiT) has emerged as a predominant architecture for image and video generation, its iterative denoising process results in slow inference, which hinders broade...
- Contour Information Aware 2D Gaussian Splatting for Image Representation : Abstract: Image representation is a fundamental task in computer vision. Recently, Gaussian Splatting has emerged as an efficient representation framework, and its extension to 2D image representation...
- ASemConsist: Adaptive Semantic Feature Control for Training-Free Identity-Consistent Generation : Abstract: Recent text-to-image diffusion models have significantly improved visual quality and text alignment. However, generating a sequence of images while preserving consistent character identity a...
- ViLaCD-R1: A Vision-Language Framework for Semantic Change Detection in Remote Sensing : Abstract: Remote sensing change detection (RSCD), a complex multi-image inference task, traditionally uses pixel-based operators or encoder-decoder networks that inadequately capture high-level semant...
- Multimodal Interpretation of Remote Sensing Images: Dynamic Resolution Input Strategy and Multi-scale Vision-Language Alignment Mechanism : Abstract: Multimodal fusion of remote sensing images serves as a core technology for overcoming the limitations of single-source data and improving the accuracy of surface information extraction, whic...
- RS-Prune: Training-Free Data Pruning at High Ratios for Efficient Remote Sensing Diffusion Foundation Models : Abstract: Diffusion-based remote sensing (RS) generative foundation models are crucial for downstream tasks. However, these models rely on large amounts of globally representative data, which often con...
- Physics-Inspired Modeling and Content Adaptive Routing in an Infrared Gas Leak Detection Network : Abstract: Detecting infrared gas leaks is critical for environmental monitoring and industrial safety, yet remains difficult because plumes are faint, small, semitransparent, and have weak, diffuse bo...
- SURE Guided Posterior Sampling: Trajectory Correction for Diffusion-Based Inverse Problems : Abstract: Diffusion models have emerged as powerful learned priors for solving inverse problems. However, current iterative solving approaches which alternate between diffusion sampling and data consi...
- Anomaly Detection by Effectively Leveraging Synthetic Images : Abstract: Anomaly detection plays a vital role in industrial manufacturing. Due to the scarcity of real defect images, unsupervised approaches that rely solely on normal images have been extensively s...
- Bridging Your Imagination with Audio-Video Generation via a Unified Director : Abstract: Existing AI-driven video creation systems typically treat script drafting and key-shot design as two disjoint tasks: the former relies on large language models, while the latter depends on i...
- Holi-DETR: Holistic Fashion Item Detection Leveraging Contextual Information : Abstract: Fashion item detection is challenging due to the ambiguities introduced by the highly diverse appearances of fashion items and the similarities among item subcategories. To address this chal...
- MM-UAVBench: How Well Do Multimodal Large Language Models See, Think, and Plan in Low-Altitude UAV Scenarios? : Abstract: While Multimodal Large Language Models (MLLMs) have exhibited remarkable general intelligence across diverse domains, their potential in low-altitude applications dominated by Unmanned Aeria...
- AVOID: The Adverse Visual Conditions Dataset with Obstacles for Driving Scene Understanding : Abstract: Understanding road scenes for visual perception remains crucial for intelligent self-driving cars. In particular, it is desirable to detect unexpected small road hazards reliably in real-tim...
- Task-oriented Learnable Diffusion Timesteps for Universal Few-shot Learning of Dense Tasks : Abstract: Denoising diffusion probabilistic models have brought tremendous advances in generative tasks, achieving state-of-the-art performance thus far. Current diffusion model-based applications exp...
- Exploring Syn-to-Real Domain Adaptation for Military Target Detection : Abstract: Object detection is one of the key target tasks of interest in the context of civil and military applications. In particular, the real-world deployment of target detection methods is pivotal...
- ForCM: Forest Cover Mapping from Multispectral Sentinel-2 Image by Integrating Deep Learning with Object-Based Image Analysis : Abstract: This research proposes "ForCM", a novel approach to forest cover mapping that combines Object-Based Image Analysis (OBIA) with Deep Learning (DL) using multispectral Sentinel-2 imagery. The ...
- GaussianDWM: 3D Gaussian Driving World Model for Unified Scene Understanding and Multi-Modal Generation : Abstract: Driving World Models (DWMs) have been developing rapidly with the advances of generative models. However, existing DWMs lack 3D scene understanding capabilities and can only generate content...
- GVSynergy-Det: Synergistic Gaussian-Voxel Representations for Multi-View 3D Object Detection : Abstract: Image-based 3D object detection aims to identify and localize objects in 3D space using only RGB images, eliminating the need for expensive depth sensors required by point cloud-based method...
- REVEALER: Reinforcement-Guided Visual Reasoning for Element-Level Text-Image Alignment Evaluation : Abstract: Evaluating the alignment between textual prompts and generated images is critical for ensuring the reliability and usability of text-to-image (T2I) models. However, most existing evaluation ...
- GeoTeacher: Geometry-Guided Semi-Supervised 3D Object Detection : Abstract: Semi-supervised 3D object detection, aiming to explore unlabeled data for boosting 3D object detectors, has emerged as an active research area in recent years. Some previous methods have sho...
- Domain-Shift Immunity in Deep Deformable Registration via Local Feature Representations : Abstract: Deep learning has advanced deformable image registration, surpassing traditional optimization-based methods in both accuracy and efficiency. However, learning-based models are widely believe...
- PathoSyn: Imaging-Pathology MRI Synthesis via Disentangled Deviation Diffusion : Abstract: We present PathoSyn, a unified generative framework for Magnetic Resonance Imaging (MRI) image synthesis that reformulates imaging-pathology as a disentangled additive deviation on a stable ...
- MedSAM-based lung masking for multi-label chest X-ray classification : Abstract: Chest X-ray (CXR) imaging is widely used for screening and diagnosing pulmonary abnormalities, yet automated interpretation remains challenging due to weak disease signals, dataset bias, and...
- Video-BrowseComp: Benchmarking Agentic Video Research on Open Web : Abstract: The evolution of autonomous agents is redefining information seeking, transitioning from passive retrieval to proactive, open-ended web research. However, while textual and static multimodal...
- 3D sans 3D Scans: Scalable Pre-training from Video-Generated Point Clouds : Abstract: Despite recent progress in 3D self-supervised learning, collecting large-scale 3D scene scans remains expensive and labor-intensive. In this work, we investigate whether 3D representations c...
- Toward Stable Semi-Supervised Remote Sensing Segmentation via Co-Guidance and Co-Fusion : Abstract: Semi-supervised remote sensing (RS) image semantic segmentation offers a promising solution to alleviate the burden of exhaustive annotation, yet it fundamentally struggles with pseudo-label...
- An Architecture-Led Hybrid Report on Body Language Detection Project : Abstract: This report provides an architecture-led analysis of two modern vision-language models (VLMs), Qwen2.5-VL-7B-Instruct and Llama-4-Scout-17B-16E-Instruct, and explains how their architectural...
- With Great Context Comes Great Prediction Power: Classifying Objects via Geo-Semantic Scene Graphs : Abstract: Humans effortlessly identify objects by leveraging a rich understanding of the surrounding scene, including spatial relationships, material properties, and the co-occurrence of other objects...
- OpenGround: Active Cognition-based Reasoning for Open-World 3D Visual Grounding : Abstract: 3D visual grounding aims to locate objects based on natural language descriptions in 3D scenes. Existing methods rely on a pre-defined Object Lookup Table (OLT) to query Visual Language Mode...
- A Low-Cost UAV Deep Learning Pipeline for Integrated Apple Disease Diagnosis, Freshness Assessment, and Fruit Detection : Abstract: Apple orchards require timely disease detection, fruit quality assessment, and yield estimation, yet existing UAV-based systems address such tasks in isolation and often rely on costly multi...
- Reverse Personalization : Abstract: Recent text-to-image diffusion models have demonstrated remarkable generation of realistic facial images conditioned on textual prompts and human identities, enabling creating personalized f...
- Spatial-aware Symmetric Alignment for Text-guided Medical Image Segmentation : Abstract: Text-guided Medical Image Segmentation has shown considerable promise for medical image segmentation, with rich clinical text serving as an effective supplement for scarce data. However, cur...
- PoseStreamer: A Multi-modal Framework for 6DoF Pose Estimation of Unseen Moving Objects : Abstract: Six degree of freedom (6DoF) pose estimation for novel objects is a critical task in computer vision, yet it faces significant challenges in high-speed and low-light scenarios where standard...
- RealCamo: Boosting Real Camouflage Synthesis with Layout Controls and Textual-Visual Guidance : Abstract: Camouflaged image generation (CIG) has recently emerged as an efficient alternative for acquiring high-quality training data for camouflaged object detection (COD). However, existing CIG met...
- YOLO-IOD: Towards Real Time Incremental Object Detection : Abstract: Current methods for incremental object detection (IOD) primarily rely on Faster R-CNN or DETR series detectors; however, these approaches do not accommodate the real-time YOLO detection fram...
- Wavelet-based Multi-View Fusion of 4D Radar Tensor and Camera for Robust 3D Object Detection : Abstract: 4D millimeter-wave (mmWave) radar has been widely adopted in autonomous driving and robot perception due to its low cost and all-weather robustness. However, its inherent sparsity and limite...
- CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision : Abstract: Conventional object detectors rely on cross-entropy classification, which can be vulnerable to class imbalance and label noise. We propose CLIP-Joint-Detect, a simple and detector-agnostic f...
- Learning Where to Focus: Density-Driven Guidance for Detecting Dense Tiny Objects : Abstract: High-resolution remote sensing imagery increasingly contains dense clusters of tiny objects, the detection of which is extremely challenging due to severe mutual occlusion and limited pixel ...
- ColaVLA: Leveraging Cognitive Latent Reasoning for Hierarchical Parallel Trajectory Planning in Autonomous Driving : Abstract: Autonomous driving requires generating safe and reliable trajectories from complex multimodal inputs. Traditional modular pipelines separate perception, prediction, and planning, while recen...
- JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation : Abstract: This paper presents JavisGPT, the first unified multimodal large language model (MLLM) for Joint Audio-Video (JAV) comprehension and generation. JavisGPT adopts a concise encoder-LLM-decoder...
- Hash Grid Feature Pruning : Abstract: Hash grids are widely used to learn an implicit neural field for Gaussian splatting, serving either as part of the entropy model or for inter-frame prediction. However, due to the irregular ...
- Guided Path Sampling: Steering Diffusion Models Back on Track with Principled Path Guidance : Abstract: Iterative refinement methods based on a denoising-inversion cycle are powerful tools for enhancing the quality and control of diffusion models. However, their effectiveness is critically lim...
- SwinTF3D: A Lightweight Multimodal Fusion Approach for Text-Guided 3D Medical Image Segmentation : Abstract: The recent integration of artificial intelligence into medical imaging has driven remarkable advances in automated organ segmentation. However, most existing 3D segmentation frameworks rely ...
- M-ErasureBench: A Comprehensive Multimodal Evaluation Benchmark for Concept Erasure in Diffusion Models : Abstract: Text-to-image diffusion models may generate harmful or copyrighted content, motivating research on concept erasure. However, existing approaches primarily focus on erasing concepts from text...
- Let Samples Speak: Mitigating Spurious Correlation by Exploiting the Clusterness of Samples : Abstract: Deep learning models are known to often learn features that spuriously correlate with the class label during training but are irrelevant to the prediction task. Existing methods typically ad...
- Learning Anatomy from Multiple Perspectives via Self-supervision in Chest Radiographs : Abstract: Foundation models have been successful in natural language processing and computer vision because they are capable of capturing the underlying structures (foundation) of natural languages. H...
- MUSON: A Reasoning-oriented Multimodal Dataset for Socially Compliant Navigation in Urban Environments : Abstract: Socially compliant navigation requires structured reasoning over dynamic pedestrians and physical constraints to ensure safe and interpretable decisions. However, existing social navigation ...
- A Minimal Solver for Relative Pose Estimation with Unknown Focal Length from Two Affine Correspondences : Abstract: In this paper, we aim to estimate the relative pose and focal length between two views with known intrinsic parameters except for an unknown focal length from two affine correspondences (ACs...
- 3D Scene Change Modeling With Consistent Multi-View Aggregation : Abstract: Change detection plays a vital role in scene monitoring, exploration, and continual reconstruction. Existing 3D change detection methods often exhibit spatial inconsistency in the detected c...
- KANO: Kolmogorov-Arnold Neural Operator for Image Super-Resolution : Abstract: The highly nonlinear degradation process, complex physical interactions, and various sources of uncertainty render single-image super-resolution (SR) a particularly challenging task. Existin...
- Depth Anything in $360^\circ$: Towards Scale Invariance in the Wild : Abstract: Panoramic depth estimation provides a comprehensive solution for capturing complete $360^\circ$ environmental structural information, offering significant benefits for robotics and AR/VR app...
- EgoReAct: Egocentric Video-Driven 3D Human Reaction Generation : Abstract: Humans exhibit adaptive, context-sensitive responses to egocentric visual input. However, faithfully modeling such reactions from egocentric video remains challenging due to the dual require...
- Evaluating the Performance of Open-Vocabulary Object Detection in Low-quality Image : Abstract: Open-vocabulary object detection enables models to localize and recognize objects beyond a predefined set of categories and is expected to achieve recognition capabilities comparable to huma...
- Medical Scene Reconstruction and Segmentation based on 3D Gaussian Representation : Abstract: 3D reconstruction of medical images is a key technology in medical image analysis and clinical diagnosis, providing structural visualization support for disease assessment and surgical plann...
- VPTracker: Global Vision-Language Tracking via Visual Prompt and MLLM : Abstract: Vision-Language Tracking aims to continuously localize objects described by a visual template and a language description. Existing methods, however, are typically limited to local search, ma...
- Parallel Diffusion Solver via Residual Dirichlet Policy Optimization : Abstract: Diffusion models (DMs) have achieved state-of-the-art generative performance but suffer from high sampling latency due to their sequential denoising nature. Existing solver-based acceleratio...
- Plug In, Grade Right: Psychology-Inspired AGIQA : Abstract: Existing AGIQA models typically estimate image quality by measuring and aggregating the similarities between image embeddings and text embeddings derived from multi-grade quality description...
- Next Best View Selections for Semantic and Dynamic 3D Gaussian Splatting : Abstract: Understanding semantics and dynamics has been crucial for embodied agents in various tasks. Both tasks have much more data redundancy than the static scene understanding task. We formulate t...
- Neighbor-Aware Token Reduction via Hilbert Curve for Vision Transformers : Abstract: Vision Transformers (ViTs) have achieved remarkable success in visual recognition tasks, but redundant token representations limit their computational efficiency. Existing token merging and ...
- TrimTokenator-LC: Towards Adaptive Visual Token Pruning for Large Multimodal Models with Long Contexts : Abstract: Large Multimodal Models (LMMs) have proven effective on various tasks. They typically encode visual inputs into sequences of tokens, which are then concatenated with textual t...
- Split4D: Decomposed 4D Scene Reconstruction Without Video Segmentation : Abstract: This paper addresses the problem of decomposed 4D scene reconstruction from multi-view videos. Recent methods achieve this by lifting video segmentation results to a 4D representation throug...
- Improved cystic hygroma detection from prenatal imaging using ultrasound-specific self-supervised representation learning : Abstract: Cystic hygroma is a high-risk prenatal ultrasound finding that portends high rates of chromosomal abnormalities, structural malformations, and adverse pregnancy outcomes. Automated detection...
- SCPainter: A Unified Framework for Realistic 3D Asset Insertion and Novel View Synthesis : Abstract: 3D asset insertion and novel view synthesis (NVS) are key components for autonomous driving simulation, enhancing the diversity of training data. With better training data that is diverse an...
- Autoregressive Flow Matching for Motion Prediction : Abstract: Motion prediction has been studied in different contexts with models trained on narrow distributions and applied to downstream tasks in human motion prediction and robotics. Simultaneously, ...
- CritiFusion: Semantic Critique and Spectral Alignment for Faithful Text-to-Image Generation : Abstract: Recent text-to-image diffusion models have achieved remarkable visual fidelity but often struggle with semantic alignment to complex prompts. We introduce CritiFusion, a novel inference-time...
- Unleashing Foundation Vision Models: Adaptive Transfer for Diverse Data-Limited Scientific Domains : Abstract: In the big data era, the computer vision field benefits from large-scale datasets such as LAION-2B, LAION-400M, ImageNet-21K, and Kinetics, on which popular models like the ViT and ConvNeXt ...
- Visual Autoregressive Modelling for Monocular Depth Estimation : Abstract: We propose a monocular depth estimation method based on visual autoregressive (VAR) priors, offering an alternative to diffusion-based approaches. Our method adapts a large-scale text-to-ima...
- FinPercep-RM: A Fine-grained Reward Model and Co-evolutionary Curriculum for RL-based Real-world Super-Resolution : Abstract: Reinforcement Learning with Human Feedback (RLHF) has proven effective in the image generation field, where reward models guide generation to align with human preferences. Motivated by this, adapting RLHF for Imag...
- Envision: Embodied Visual Planning via Goal-Imagery Video Diffusion : Abstract: Embodied visual planning aims to enable manipulation tasks by imagining how a scene evolves toward a desired goal and using the imagined trajectories to guide actions. Video diffusion models...
- Rethinking Memory Design in SAM-Based Visual Object Tracking : Abstract: Memory has become the central mechanism enabling robust visual object tracking in modern segmentation-based frameworks. Recent methods built upon Segment Anything Model 2 (SAM2) ha...
- Enhancing Noise Resilience in Face Clustering via Sparse Differential Transformer : Abstract: The method used to measure relationships between face embeddings plays a crucial role in determining the performance of face clustering. Existing methods employ the Jaccard similarity coeffi...
- PTalker: Personalized Speech-Driven 3D Talking Head Animation via Style Disentanglement and Modality Alignment : Abstract: Speech-driven 3D talking head generation aims to produce lifelike facial animations precisely synchronized with speech. While considerable progress has been made in achieving high lip-synchr...
- KV-Tracker: Real-Time Pose Tracking with Transformers : Abstract: Multi-view 3D geometry networks offer a powerful prior but are prohibitively slow for real-time applications. We propose a novel way to adapt them for online use, enabling real-time 6-DoF po...
- ReFRM3D: A Radiomics-enhanced Fused Residual Multiparametric 3D Network with Multi-Scale Feature Fusion for Glioma Characterization : Abstract: Gliomas are among the most aggressive cancers, characterized by high mortality rates and complex diagnostic processes. Existing studies on glioma diagnosis and classification often describe ...
- Self-Rewarded Multimodal Coherent Reasoning Across Diverse Visual Domains : Abstract: Multimodal LLMs often produce fluent yet unreliable reasoning, exhibiting weak step-to-step coherence and insufficient visual grounding, largely because existing alignment approaches supervi...
- CoAgent: Collaborative Planning and Consistency Agent for Coherent Video Generation : Abstract: Maintaining narrative coherence and visual consistency remains a central challenge in open-domain video generation. Existing text-to-video models often treat each shot independently, resulti...
- DreamOmni3: Scribble-based Editing and Generation : Abstract: Recently, unified generation and editing models have achieved remarkable success with their impressive performance. These models rely mainly on text prompts for instruction-based editing and ...
- SCAFusion: A Multimodal 3D Detection Framework for Small Object Detection in Lunar Surface Exploration : Abstract: Reliable and precise detection of small and irregular objects, such as meteor fragments and rocks, is critical for autonomous navigation and operation in lunar surface exploration. Existing ...
- Tracking by Predicting 3-D Gaussians Over Time : Abstract: We propose Video Gaussian Masked Autoencoders (Video-GMAE), a self-supervised approach for representation learning that encodes a sequence of images into a set of Gaussian splats moving over...
- Scalpel-SAM: A Semi-Supervised Paradigm for Adapting SAM to Infrared Small Object Detection : Abstract: Infrared small object detection urgently requires semi-supervised paradigms due to the high cost of annotation. However, existing methods like SAM face significant challenges of domain gaps,...
- Event-based high temporal resolution measurement of shock wave motion field : Abstract: Accurate measurement of shock wave motion parameters with high spatiotemporal resolution is essential for applications such as power field testing and damage assessment. However, significant...
- Pose-Guided Residual Refinement for Interpretable Text-to-Motion Generation and Editing : Abstract: Text-based 3D motion generation aims to automatically synthesize diverse motions from natural-language descriptions to extend user creativity, whereas motion editing modifies an existing mot...
- Comparing Object Detection Models for Electrical Substation Component Mapping : Abstract: Electrical substations are a significant component of an electrical grid. Indeed, the assets at these substations (e.g., transformers) are prone to disruption from many hazards, including hu...
- SAM 3D for 3D Object Reconstruction from Remote Sensing Images : Abstract: Monocular 3D building reconstruction from remote sensing imagery is essential for scalable urban modeling, yet existing methods often require task-specific architectures and intensive superv...
- SonoVision: A Computer Vision Approach for Helping Visually Challenged Individuals Locate Objects with the Help of Sound Cues : Abstract: Locating objects is a significant challenge for the visually impaired, and one that no one can get used to over time. This hinders their independence and could push them towards ...
- Towards Robust Optical-SAR Object Detection under Missing Modalities: A Dynamic Quality-Aware Fusion Framework : Abstract: Optical and Synthetic Aperture Radar (SAR) fusion-based object detection has attracted significant research interest in remote sensing, as these modalities provide complementary information ...
- LECalib: Line-Based Event Camera Calibration : Abstract: Camera calibration is an essential prerequisite for event-based vision applications. Current event camera calibration methods typically involve using flashing patterns, reconstructing intens...
- SuperiorGAT: Graph Attention Networks for Sparse LiDAR Point Cloud Reconstruction in Autonomous Systems : Abstract: LiDAR-based perception in autonomous systems is constrained by fixed vertical beam resolution and further compromised by beam dropout resulting from environmental occlusions. This paper intr...
- EmoCtrl: Controllable Emotional Image Content Generation : Abstract: An image conveys meaning through both its visual content and emotional tone, jointly shaping human perception. We introduce Controllable Emotional Image Content Generation (C-EICG), which ai...
- FluenceFormer: Transformer-Driven Multi-Beam Fluence Map Regression for Radiotherapy Planning : Abstract: Fluence map prediction is central to automated radiotherapy planning but remains an ill-posed inverse problem due to the complex relationship between volumetric anatomy and beam-intensity mo...
- DeFloMat: Detection with Flow Matching for Stable and Efficient Generative Object Localization : Abstract: We propose DeFloMat (Detection with Flow Matching), a novel generative object detection framework that addresses the critical latency bottleneck of diffusion-based detectors, such as Diffusi...
- iOSPointMapper: RealTime Pedestrian and Accessibility Mapping with Mobile AI : Abstract: Accurate, up-to-date sidewalk data is essential for building accessible and inclusive pedestrian infrastructure, yet current approaches to data collection are often costly, fragmented, and d...
- VULCAN: Tool-Augmented Multi Agents for Iterative 3D Object Arrangement : Abstract: Despite the remarkable progress of Multimodal Large Language Models (MLLMs) in 2D vision-language tasks, their application to complex 3D scene manipulation remains underexplored. In this pap...
- Feature Learning with Multi-Stage Vision Transformers on Inter-Modality HER2 Status Scoring and Tumor Classification on Whole Slides : Abstract: The popular use of histopathology images, such as hematoxylin and eosin (H&E), has proven to be useful in detecting tumors. However, moving such cancer cases forward for treatment requires a...
- The Multi-View Paradigm Shift in MRI Radiomics: Predicting MGMT Methylation in Glioblastoma : Abstract: Non-invasive inference of molecular tumor characteristics from medical imaging is a central goal of radiogenomics, particularly in glioblastoma (GBM), where O6-methylguanine-DNA methyltransf...
- DeMoGen: Towards Decompositional Human Motion Generation with Energy-Based Diffusion Models : Abstract: Human motions are compositional: complex behaviors can be described as combinations of simpler primitives. However, existing approaches primarily focus on forward modeling, e.g., learning ho...
- SpotEdit: Selective Region Editing in Diffusion Transformers : Abstract: Diffusion Transformer models have significantly advanced image editing by encoding conditional images and integrating them into transformer layers. However, most edits involve modifying only...
- VideoZoomer: Reinforcement-Learned Temporal Focusing for Long Video Reasoning : Abstract: Multimodal Large Language Models (MLLMs) have achieved remarkable progress in vision-language tasks yet remain limited in long video understanding due to the limited context window. Conseque...
- MoFu: Scale-Aware Modulation and Fourier Fusion for Multi-Subject Video Generation : Abstract: Multi-subject video generation aims to synthesize videos from textual prompts and multiple reference images, ensuring that each subject preserves natural scale and visual fidelity. However, ...
- PortionNet: Distilling 3D Geometric Knowledge for Food Nutrition Estimation : Abstract: Accurate food nutrition estimation from single images is challenging due to the loss of 3D information. While depth-based methods provide reliable geometry, they remain inaccessible on most ...
- Attack-Aware Deepfake Detection under Counter-Forensic Manipulations : Abstract: This work presents an attack-aware deepfake and image-forensics detector designed for robustness, well-calibrated probabilities, and transparent evidence under realistic deployment condition...
- Learning Dynamic Scene Reconstruction with Sinusoidal Geometric Priors : Abstract: We propose SirenPose, a novel loss function that combines the periodic activation properties of sinusoidal representation networks with geometric priors derived from keypoint structures to i...
- A Three-Level Alignment Framework for Large-Scale 3D Retrieval and Controlled 4D Generation : Abstract: We introduce Uni4D, a unified framework for large-scale open-vocabulary 3D retrieval and controlled 4D generation based on structured three-level alignment across text, 3D models, and image ...
- FETAL-GAUGE: A Benchmark for Assessing Vision-Language Models in Fetal Ultrasound : Abstract: The growing demand for prenatal ultrasound imaging has intensified a global shortage of trained sonographers, creating barriers to essential fetal health monitoring. Deep learning has the po...
- The Illusion of Clinical Reasoning: A Benchmark Reveals the Pervasive Gap in Vision-Language Models for Clinical Competency : Abstract: Background: The rapid integration of foundation models into clinical practice and public health necessitates a rigorous evaluation of their true clinical reasoning capabilities beyond narrow...
- GeCo: A Differentiable Geometric Consistency Metric for Video Generation : Abstract: We introduce GeCo, a geometry-grounded metric for jointly detecting geometric deformation and occlusion-inconsistency artifacts in static scenes. By fusing residual motion and depth priors, ...
- Human-Aligned Generative Perception: Bridging Psychophysics and Generative Models : Abstract: Text-to-image diffusion models generate highly detailed textures, yet they often rely on surface appearance and fail to follow strict geometric constraints, particularly when those constrain...
- Multi-objective hybrid knowledge distillation for efficient deep learning in smart agriculture : Abstract: Deploying deep learning models on resource-constrained edge devices remains a major challenge in smart agriculture due to the trade-off between computational efficiency and recognition accur...
- Meta-information Guided Cross-domain Synergistic Diffusion Model for Low-dose PET Reconstruction : Abstract: Low-dose PET imaging is crucial for reducing patient radiation exposure but faces challenges like noise interference, reduced contrast, and difficulty in preserving physiological details. Ex...
- KAN-FPN-Stem: A KAN-Enhanced Feature Pyramid Stem for Boosting ViT-based Pose Estimation : Abstract: Vision Transformers (ViT) have demonstrated significant promise in dense prediction tasks such as pose estimation. However, their performance is frequently constrained by the overly simplist...
- On Extending Semantic Abstraction for Efficient Search of Hidden Objects : Abstract: Semantic Abstraction's key observation is that 2D VLMs' relevancy activations roughly correspond to their confidence of whether and where an object is in the scene. Thus, relevancy maps are ...
- Towards Signboard-Oriented Visual Question Answering: ViSignVQA Dataset, Method and Benchmark : Abstract: Understanding signboard text in natural scenes is essential for real-world applications of Visual Question Answering (VQA), yet remains underexplored, particularly in low-resource languages....
- VLM-PAR: A Vision Language Model for Pedestrian Attribute Recognition : Abstract: Pedestrian Attribute Recognition (PAR) involves predicting fine-grained attributes such as clothing color, gender, and accessories from pedestrian imagery, yet is hindered by severe class im...
- Signal-SGN++: Topology-Enhanced Time-Frequency Spiking Graph Network for Skeleton-Based Action Recognition : Abstract: Graph Convolutional Networks (GCNs) demonstrate strong capability in modeling skeletal topology for action recognition, yet their dense floating-point computations incur high energy costs. S...
- TCFormer: A 5M-Parameter Transformer with Density-Guided Aggregation for Weakly-Supervised Crowd Counting : Abstract: Crowd counting typically relies on labor-intensive point-level annotations and computationally intensive backbones, restricting its scalability and deployment in resource-constrained environ...
- Quadrant Segmentation VLM with Few-Shot Adaptation and OCT Learning-based Explainability Methods for Diabetic Retinopathy : Abstract: Diabetic Retinopathy (DR) is a leading cause of vision loss worldwide, requiring early detection to preserve sight. Limited access to physicians often leaves DR undiagnosed. To address this,...
- Tiny-YOLOSAM: Fast Hybrid Image Segmentation : Abstract: The Segment Anything Model (SAM) enables promptable, high-quality segmentation but is often too computationally expensive for latency-critical settings. TinySAM is a lightweight, distilled S...
- HookMIL: Revisiting Context Modeling in Multiple Instance Learning for Computational Pathology : Abstract: Multiple Instance Learning (MIL) has enabled weakly supervised analysis of whole-slide images (WSIs) in computational pathology. However, traditional MIL approaches often lose crucial contex...
- SAMM2D: Scale-Aware Multi-Modal 2D Dual-Encoder for High-Sensitivity Intracranial Aneurysm Screening : Abstract: Effective aneurysm detection is essential to avert life-threatening hemorrhages, but it remains challenging due to the subtle morphology of the aneurysm, pronounced class imbalance, and the ...
- Real-Time American Sign Language Recognition Using 3D Convolutional Neural Networks and LSTM: Architecture, Training, and Deployment : Abstract: This paper presents a real-time American Sign Language (ASL) recognition system utilizing a hybrid deep learning architecture combining 3D Convolutional Neural Networks (3D CNN) with Long Sh...
- Characterizing Motion Encoding in Video Diffusion Timesteps : Abstract: Text-to-video diffusion models synthesize temporal motion and spatial appearance through iterative denoising, yet how motion is encoded across timesteps remains poorly understood. Practition...
- RAVEL: Rare Concept Generation and Editing via Graph-driven Relational Guidance : Abstract: Despite impressive visual fidelity, current text-to-image (T2I) diffusion models struggle to depict rare, complex, or culturally nuanced concepts due to training data limitations. We introdu...
- Prompt Injection attack against LLM-integrated Applications : Abstract: Large Language Models (LLMs), renowned for their superior proficiency in language comprehension and generation, stimulate a vibrant ecosystem of applications around them. However, their exte...
- Patience Is The Key to Large Language Model Reasoning : Abstract: Recent advancements in the field of large language models, particularly through the Chain of Thought (CoT) approach, have demonstrated significant improvements in solving complex problems. H...
- Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs : Abstract: Recent advancements in multimodal large language models (MLLMs) have achieved significant multimodal generation capabilities, akin to GPT-4. These models predominantly map visual information...
- Web World Models : Abstract: Language agents increasingly require persistent worlds in which they can act, remember, and learn. Existing approaches sit at two extremes: conventional web frameworks provide reliable but f...
- The Big Three in Marriage Talk: LLM-Assisted Analysis of Moral Ethics and Sentiment on Weibo and Xiaohongshu : Abstract: China's marriage registrations have declined dramatically, dropping from 13.47 million couples in 2013 to 6.1 million in 2024. Understanding public attitudes toward marriage requires examini...
- CubeBench: Diagnosing Interactive, Long-Horizon Spatial Reasoning Under Partial Observations : Abstract: Large Language Model (LLM) agents, while proficient in the digital realm, face a significant gap in physical-world deployment due to the challenge of forming and maintaining a robust spatial...
- Multimodal Fact-Checking: An Agent-based Approach : Abstract: The rapid spread of multimodal misinformation poses a growing challenge for automated fact-checking systems. Existing approaches, including large vision language models (LVLMs) and deep mult...
- Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone : Abstract: While autoregressive Large Vision-Language Models (VLMs) have achieved remarkable success, their sequential generation often limits their efficacy in complex visual planning and dynamic robo...
- Monadic Context Engineering : Abstract: The proliferation of Large Language Models (LLMs) has catalyzed a shift towards autonomous agents capable of complex reasoning and tool use. However, current agent architectures are frequent...
- Agent2World: Learning to Generate Symbolic World Models via Adaptive Multi-Agent Feedback : Abstract: Symbolic world models (e.g., PDDL domains or executable simulators) are central to model-based planning, but training LLMs to generate such world models is limited by the lack of large-scale...
- SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence : Abstract: We introduce SciEvalKit, a unified benchmarking toolkit designed to evaluate AI models for science across a broad range of scientific disciplines and task capabilities. Unlike general-purpos...
- A CNN-Based Malaria Diagnosis from Blood Cell Images with SHAP and LIME Explainability : Abstract: Malaria remains a prevalent health concern in regions with tropical and subtropical climates. The cause of malaria is the Plasmodium parasite, which is transmitted through the bites of infec...
- Unbiased Visual Reasoning with Controlled Visual Inputs : Abstract: End-to-end Vision-language Models (VLMs) often answer visual questions by exploiting spurious correlations instead of causal visual evidence, and can become more shortcut-prone when fine-tun...
- Fine-Tuning LLMs with Fine-Grained Human Feedback on Text Spans : Abstract: We present a method and dataset for fine-tuning language models with preference supervision using feedback-driven improvement chains. Given a model response, an annotator provides fine-grain...
- PROFASR-BENCH: A Benchmark for Context-Conditioned ASR in High-Stakes Professional Speech : Abstract: Automatic Speech Recognition (ASR) in professional settings faces challenges that existing benchmarks underplay: dense domain terminology, formal register variation, and near-zero tolerance ...
- Multilingual Hidden Prompt Injection Attacks on LLM-Based Academic Reviewing : Abstract: Large language models (LLMs) are increasingly considered for use in high-impact workflows, including academic peer review. However, LLMs are vulnerable to document-level hidden prompt inject...
- Less is more: Probabilistic reduction is best explained by small-scale predictability measures : Abstract: The primary research questions of this paper center on defining the amount of context that is necessary and/or appropriate when investigating the relationship between language model probabil...
- Nested Browser-Use Learning for Agentic Information Seeking : Abstract: Information-seeking (IS) agents have achieved strong performance across a range of wide and deep search tasks, yet their tool use remains largely restricted to API-level snippet retrieval an...
- A Dataset and Benchmark for Consumer Healthcare Question Summarization : Abstract: The quest for health information has swamped the web with consumers' health-related questions. Generally, consumers use overly descriptive and peripheral information to express their ...
- Close the Loop: Synthesizing Infinite Tool-Use Data via Multi-Agent Role-Playing : Abstract: Enabling Large Language Models (LLMs) to reliably invoke external tools remains a critical bottleneck for autonomous agents. Existing approaches suffer from three fundamental challenges: exp...
- Style Amnesia: Investigating Speaking Style Degradation and Mitigation in Multi-Turn Spoken Language Models : Abstract: In this paper, we show that when spoken language models (SLMs) are instructed to speak in a specific speaking style at the beginning of a multi-turn conversation, they cannot maintain the re...
- Instruction-Following Evaluation of Large Vision-Language Models : Abstract: Following the initial flourishing of large language models (LLMs), there has been a surge in proposed large vision-language models (LVLMs) that integrate LLMs with vision capabilities. Howev...
- Lie to Me: Knowledge Graphs for Robust Hallucination Self-Detection in LLMs : Abstract: Hallucinations, the generation of apparently convincing yet false statements, remain a major barrier to the safe deployment of LLMs. Building on the strong performance of self-detection meth...
- Single LLM Debate, MoLaCE: Mixture of Latent Concept Experts Against Confirmation Bias : Abstract: Large language models (LLMs) are highly vulnerable to input confirmation bias. When a prompt implies a preferred answer, models often reinforce that bias rather than explore alternatives. Th...
- UniHetero: Could Generation Enhance Understanding for Vision-Language-Model at Large Data Scale? : Abstract: Vision-language large models are moving toward the unification of visual understanding and visual generation tasks. However, whether generation can enhance understanding is still under-explo...
- Automatic Detection of Complex Quotation Patterns in Aggadic Literature : Abstract: This paper presents ACT (Allocate Connections between Texts), a novel three-stage algorithm for the automatic detection of biblical quotations in Rabbinic literature. Unlike existing text re...
- Semantic Tree Inference on Text Corpora using a Nested Density Approach together with Large Language Model Embeddings : Abstract: Semantic text classification has undergone significant advances in recent years due to the rise of large language models (LLMs) and their high-dimensional embeddings. While LLM-embeddings ar...
- ClinDEF: A Dynamic Evaluation Framework for Large Language Models in Clinical Reasoning : Abstract: Clinical diagnosis begins with doctor-patient interaction, during which physicians iteratively gather information, determine examination and refine differential diagnosis through patients' r...
- C2PO: Diagnosing and Disentangling Bias Shortcuts in LLMs : Abstract: Bias in Large Language Models (LLMs) poses significant risks to trustworthiness, manifesting primarily as stereotypical biases (e.g., gender or racial stereotypes) and structural biases (e.g...
- The Effect of Gender Diversity on Scientific Team Impact: A Team Roles Perspective : Abstract: The influence of gender diversity on the success of scientific teams is of great interest to academia. However, prior findings remain inconsistent, and most studies operationalize diversity ...
- Entropy-Guided Token Dropout: Training Autoregressive Language Models with Limited Domain Data : Abstract: As access to high-quality, domain-specific data grows increasingly scarce, multi-epoch training has become a practical strategy for adapting large language models (LLMs). However, autoregres...
- A Stepwise-Enhanced Reasoning Framework for Large Language Models Based on External Subgraph Generation : Abstract: Large Language Models (LLMs) have achieved strong performance across a wide range of natural language processing tasks in recent years, including machine translation, text generation, and qu...
- AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents : Abstract: Memory serves as the pivotal nexus bridging past and future, providing both humans and AI systems with invaluable concepts and experience to navigate complex tasks. Recent research on autono...
- AI4Reading: Chinese Audiobook Interpretation System Based on Multi-Agent Collaboration : Abstract: Audiobook interpretations are attracting increasing attention, as they provide accessible and in-depth analyses of books that offer readers practical insights and intellectual inspiration. H...
- Chinese Morph Resolution in E-commerce Live Streaming Scenarios : Abstract: E-commerce live streaming in China, particularly on platforms like Douyin, has become a major sales channel, but hosts often use morphs to evade scrutiny and engage in false advertising. Thi...
- Scoring, Reasoning, and Selecting the Best! Ensembling Large Language Models via a Peer-Review Process : Abstract: We propose LLM-PeerReview, an unsupervised LLM Ensemble method that selects the most ideal response from multiple LLM-generated candidates for each query, harnessing the collective wisdom of...
- Not too long do read: Evaluating LLM-generated extreme scientific summaries : Abstract: High-quality scientific extreme summary (TLDR) facilitates effective science communication. How do large language models (LLMs) perform in generating them? How are LLM-generated summaries di...
- Reservoir Computing inspired Matrix Multiplication-free Language Model : Abstract: Large language models (LLMs) have achieved state-of-the-art performance in natural language processing; however, their high computational cost remains a major bottleneck. In this study, we t...
- TabiBERT: A Large-Scale ModernBERT Foundation Model and Unified Benchmarking Framework for Turkish : Abstract: Since the inception of BERT, encoder-only Transformers have evolved significantly in computational efficiency, training stability, and long-context modeling. ModernBERT consolidates these ad...
- Accelerating Language Model Workflows with Prompt Choreography : Abstract: Large language models are increasingly deployed in multi-agent workflows. We introduce Prompt Choreography, a framework that efficiently executes LLM workflows by maintaining a dynamic, glob...
- LENS: LLM-Enabled Narrative Synthesis for Mental Health by Aligning Multimodal Sensing with Language Models : Abstract: Multimodal health sensing offers rich behavioral signals for assessing mental health, yet translating these numerical time-series measurements into natural language remains challenging. Curr...
- Improving Generalization in LLM Structured Pruning via Function-Aware Neuron Grouping : Abstract: Large Language Models (LLMs) demonstrate impressive performance across natural language tasks but incur substantial computational and storage costs due to their scale. Post-training structur...
- Prompt engineering does not universally improve Large Language Model performance across clinical decision-making tasks : Abstract: Large Language Models (LLMs) have demonstrated promise in medical knowledge assessments, yet their practical utility in real-world clinical decision-making remains underexplored. In this stu...
- Diversity or Precision? A Deep Dive into Next Token Prediction : Abstract: Recent advancements have shown that reinforcement learning (RL) can substantially improve the reasoning abilities of large language models (LLMs). The effectiveness of such RL training, howe...
- AutoForge: Automated Environment Synthesis for Agentic Reinforcement Learning : Abstract: Conducting reinforcement learning (RL) in simulated environments offers a cost-effective and highly scalable way to enhance language-based agents. However, previous work has been limited to ...
- NepEMO: A Multi-Label Emotion and Sentiment Analysis on Nepali Reddit with Linguistic Insights and Temporal Trends : Abstract: Social media (SM) platforms (e.g. Facebook, Twitter, and Reddit) are increasingly leveraged to share opinions and emotions, specifically during challenging events, such as natural disasters,...
- Fake News Classification in Urdu: A Domain Adaptation Approach for a Low-Resource Language : Abstract: Misinformation on social media is a widely acknowledged issue, and researchers worldwide are actively engaged in its detection. However, low-resource languages such as Urdu have received lim...
- Text-Routed Sparse Mixture-of-Experts Model with Explanation and Temporal Alignment for Multi-Modal Sentiment Analysis : Abstract: Human-interaction-involved applications underscore the need for Multi-modal Sentiment Analysis (MSA). Although many approaches have been proposed to address the subtle emotions in different ...
- Harnessing Large Language Models for Biomedical Named Entity Recognition : Abstract: Background and Objective: Biomedical Named Entity Recognition (BioNER) is a foundational task in medical informatics, crucial for downstream applications like drug discovery and clinical tri...
- WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference : Abstract: Autoregressive (AR) generation is the standard decoding paradigm for Large Language Models (LLMs), but its token-by-token nature limits parallelism at inference time. Diffusion Language Mode...
- Mitigating Social Desirability Bias in Random Silicon Sampling : Abstract: Large Language Models (LLMs) are increasingly used to simulate population responses, a method known as "Silicon Sampling". However, responses to socially sensitive questions frequently exh...
- Beg to Differ: Understanding Reasoning-Answer Misalignment Across Languages : Abstract: Large language models demonstrate strong reasoning capabilities through chain-of-thought prompting, but whether this reasoning quality transfers across languages remains underexplored. We in...
- Conformal Prediction Sets for Next-Token Prediction in Large Language Models: Balancing Coverage Guarantees with Set Efficiency : Abstract: Deploying large language models (LLMs) in high-stakes domains requires rigorous uncertainty quantification, yet standard softmax probabilities are often poorly calibrated. We present a syste...
- Evaluating GRPO and DPO for Faithful Chain-of-Thought Reasoning in LLMs : Abstract: Chain-of-thought (CoT) reasoning has emerged as a powerful technique for improving the problem-solving capabilities of large language models (LLMs), particularly for tasks requiring multi-st...
- On the Role of Discreteness in Diffusion LLMs : Abstract: Diffusion models offer appealing properties for language generation, such as parallel decoding and iterative refinement, but the discrete and highly structured nature of text challenges the ...
- M2G-Eval: Enhancing and Evaluating Multi-granularity Multilingual Code Generation : Abstract: The rapid advancement of code large language models (LLMs) has sparked significant research interest in systematically evaluating their code generation capabilities, yet existing benchmarks ...
- Chain-of-thought Reviewing and Correction for Time Series Question Answering : Abstract: With the advancement of large language models (LLMs), diverse time series analysis tasks are reformulated as time series question answering (TSQA) through a unified natural language interfac...
- Structured Prompting and LLM Ensembling for Multimodal Conversational Aspect-based Sentiment Analysis : Abstract: Understanding sentiment in multimodal conversations is a complex yet crucial challenge toward building emotionally intelligent AI systems. The Multimodal Conversational Aspect-based Sentimen...
- Learning When Not to Attend Globally : Abstract: When reading books, humans focus primarily on the current page, flipping back to recap prior context only when necessary. Similarly, we demonstrate that Large Language Models (LLMs) can lear...
- ManchuTTS: Towards High-Quality Manchu Speech Synthesis via Flow Matching and Hierarchical Text Representation : Abstract: As an endangered language, Manchu presents unique challenges for speech synthesis, including severe data scarcity and strong phonological agglutination. This paper proposes ManchuTTS(Manchu ...
- Constituency Structure over Eojeol in Korean Treebanks : Abstract: The design of Korean constituency treebanks raises a fundamental representational question concerning the choice of terminal units. Although Korean words are morphologically complex, treatin...
- Exploring the Vertical-Domain Reasoning Capabilities of Large Language Models : Abstract: Large Language Models (LLMs) are reshaping learning paradigms, cognitive processes, and research methodologies across a wide range of domains. Integrating LLMs with professional fields and r...
- Hallucination Detection and Evaluation of Large Language Model : Abstract: Hallucinations in Large Language Models (LLMs) pose a significant challenge, generating misleading or unverifiable content that undermines trust and reliability. Existing evaluation methods,...
- LLM-Guided Exemplar Selection for Few-Shot Wearable-Sensor Human Activity Recognition : Abstract: In this paper, we propose an LLM-Guided Exemplar Selection framework to address a key limitation in state-of-the-art Human Activity Recognition (HAR) methods: their reliance on large labeled...
- Towards Efficient Post-Training via Fourier-Driven Adapter Architectures : Abstract: We propose a novel framework, termed Fourier-Activated Adapter (FAA), for parameter-efficient fine-tuning of large pre-trained language models. By incorporating random Fourier features into ...
- The Syntax of qulk-clauses in Yemeni Ibbi Arabic: A Minimalist Approach : Abstract: This study investigates the syntax of qulk-clauses in Yemeni Ibbi Arabic (YIA) within the Minimalist Program. The construction qulk-clause, a morphologically fused form meaning 'I said,' int...
- Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding : Abstract: Large vision-language models (LVLMs) excel at multimodal tasks but are prone to misinterpreting visual inputs, often resulting in hallucinations and unreliable outputs. We present DROPOUT DE...
- ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection : Abstract: Multimodal large language models have unlocked new possibilities for various multimodal tasks. However, their potential in image manipulation detection remains unexplored. When directly appl...
- Text-Driven Weakly Supervised OCT Lesion Segmentation with Structural Guidance : Abstract: Accurate segmentation of Optical Coherence Tomography (OCT) images is crucial for diagnosing and monitoring retinal diseases. However, the labor-intensive nature of pixel-level annotation li...
- How Much Progress Did I Make? An Unexplored Human Feedback Signal for Teaching Robots : Abstract: Enhancing the expressiveness of human teaching is vital for both improving robots' learning from humans and the human-teaching-robot experience. In this work, we characterize and test a litt...
- Generative Modeling by Minimizing the Wasserstein-2 Loss : Abstract: This paper approaches the unsupervised learning problem by minimizing the second-order Wasserstein loss (the $W_2$ loss) through a distribution-dependent ordinary differential equation (ODE)...
- Image and Video Quality Assessment using Prompt-Guided Latent Diffusion Models for Cross-Dataset Generalization : Abstract: The design of image and video quality assessment (QA) algorithms is extremely important to benchmark and calibrate user experience in modern visual systems. A major drawback of the state-of-...
- Investigation of the Impact of Synthetic Training Data in the Industrial Application of Terminal Strip Object Detection : Abstract: In industrial manufacturing, deploying deep learning models for visual inspection is mostly hindered by the high and often intractable cost of collecting and annotating large-scale training ...
- Predicting large scale cosmological structure evolution with generative adversarial network-based autoencoders : Abstract: Predicting the nonlinear evolution of cosmic structure from initial conditions is typically approached using Lagrangian, particle-based methods. These techniques excel in terms of tracking i...
- NLCG-Net: A Model-Based Zero-Shot Learning Framework for Undersampled Quantitative MRI Reconstruction : Abstract: Typical quantitative MRI (qMRI) methods estimate parameter maps in a two-step pipeline that first reconstructs images from undersampled k-space data and then performs model fitting, which is...
- Scalable and Privacy-Preserving Synthetic Data Generation on Decentralised Web : Abstract: Data on the Web has fueled much of the recent progress in AI. As more high-quality data becomes difficult to access, synthetic data is emerging as a promising solution for privacy-friendly d...
- A Survey on Generative Modeling with Limited Data, Few Shots, and Zero Shot : Abstract: Generative modeling in machine learning aims to synthesize new data samples that are statistically similar to those observed during training. While conventional generative models such as GAN...
- An extended method for Statistical Signal Characterization using moments and cumulants, as a fast and accurate pre-processing stage of simple ANNs applied to the recognition of pattern alterations in pulse-like waveforms : Abstract: We propose a feature-extraction procedure based on the statistical characterization of waveforms, applied as a fast pre-processing stage in a pattern recognition task using simple artificial...
- Towards Global Optimality in Cooperative MARL with the Transformation And Distillation Framework : Abstract: Decentralized execution is one core demand in multi-agent reinforcement learning (MARL). Recently, most popular MARL algorithms have adopted decentralized policies to enable decentralized ex...
- An Efficient Minimax Optimal Estimator For Multivariate Convex Regression : Abstract: This work studies the computational aspects of multivariate convex regression in dimensions $d \ge 5$. Our results include the first estimators that are minimax optimal (up to logarit...
- PearSAN: A Machine Learning Method for Inverse Design using Pearson Correlated Surrogate Annealing : Abstract: PearSAN is a machine learning-assisted optimization algorithm applicable to inverse design problems with large design spaces, where traditional optimizers struggle. The algorithm leverages t...
- AdvPrefix: An Objective for Nuanced LLM Jailbreaks : Abstract: Many jailbreak attacks on large language models (LLMs) rely on a common objective: making the model respond with the prefix "Sure, here is (harmful request)". While straightforward, this o...
- A large language model-type architecture for high-dimensional molecular potential energy surfaces : Abstract: Computing high-dimensional potential energy surfaces for molecular systems and materials is considered to be a great challenge in computational chemistry with potential impact in a range of ...
- Epidemiology-informed Graph Neural Network for Heterogeneity-aware Epidemic Forecasting : Abstract: Among various spatio-temporal prediction tasks, epidemic forecasting plays a critical role in public health management. Recent studies have demonstrated the strong potential of spatio-tempor...
- Machine Unlearning using Forgetting Neural Networks : Abstract: Modern computer systems store vast amounts of personal data, enabling advances in AI and ML but risking user privacy and trust. For privacy reasons, it is sometimes desired for an ML model t...
- On the Convergence Theory of Pipeline Gradient-based Analog In-memory Training : Abstract: Aiming to accelerate the training of large deep neural networks (DNN) in an energy-efficient way, analog in-memory computing (AIMC) emerges as a solution with immense potential. AIMC acceler...
- Constraint Decoupled Latent Diffusion for Protein Backmapping : Abstract: Coarse-grained (CG) molecular dynamics simulations enable efficient exploration of protein conformational ensembles. However, reconstructing atomic details from CG structures (backmapping) r...
- Trust-free Personalized Decentralized Learning : Abstract: Personalized collaborative learning in federated settings faces a critical trade-off between customization and participant trust. Existing approaches typically rely on centralized coordinato...
- Preconditioning for Accelerated Gradient Descent Optimization and Regularization : Abstract: Accelerated training algorithms, such as adaptive learning rates (or preconditioning) and various normalization methods, are widely used but not fully understood. When regularization is intr...
- GINTRIP: Interpretable Temporal Graph Regression using Information bottleneck and Prototype-based method : Abstract: Deep neural networks (DNNs) have demonstrated remarkable performance across various domains, but their inherent complexity makes them challenging to interpret. This is especially true for te...
- Enhanced $H$-Consistency Bounds : Abstract: Recent research has introduced a key notion of $H$-consistency bounds for surrogate losses. These bounds offer finite-sample guarantees, quantifying the relationship between the zero-one est...
- Efficient Offline Reinforcement Learning: First Imitate, then Improve : Abstract: Supervised imitation-based approaches are often favored over off-policy reinforcement learning approaches for learning policies offline, since their straightforward optimization objective ma...
- Aligning Agents like Large Language Models : Abstract: Training agents to act competently in complex 3D environments from high-dimensional visual information is challenging. Reinforcement learning is conventionally used to train such agents, but...
- Application-Driven Innovation in Machine Learning : Abstract: In this position paper, we argue that application-driven research has been systemically under-valued in the machine learning community. As applications of machine learning proliferate, innov...
- DE$^3$-BERT: Distance-Enhanced Early Exiting for BERT based on Prototypical Networks : Abstract: Early exiting has demonstrated its effectiveness in accelerating the inference of pre-trained language models like BERT by dynamically adjusting the number of layers executed. However, most ...
- CarSpeedNet: Learning-Based Speed Estimation from Accelerometer-Only Inertial Sensing : Abstract: Velocity estimation is a core component of state estimation and sensor fusion pipelines in mobile robotics and autonomous ground systems, directly affecting navigation accuracy, control stab...
- A Survey of Reinforcement Learning from Human Feedback : Abstract: Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function. Building on...
- Revisiting the Last-Iterate Convergence of Stochastic Gradient Methods : Abstract: In the past several years, the last-iterate convergence of the Stochastic Gradient Descent (SGD) algorithm has triggered people's interest due to its good performance in practice but lack of...
- Sequential learning on a Tensor Network Born machine with Trainable Token Embedding : Abstract: Generative models aim to learn the probability distributions underlying data, enabling the generation of new, realistic samples. Quantum inspired generative models, such as Born machines bas...
- Development of Crop Yield Estimation Model using Soil and Environmental Parameters : Abstract: Crop yield is affected by various soil and environmental parameters and can vary significantly. Therefore, a crop yield estimation model which can predict pre-harvest yield is required for f...
- Eliciting Behaviors in Multi-Turn Conversations : Abstract: Identifying specific and often complex behaviors from large language models (LLMs) in conversational settings is crucial for their evaluation. Recent work proposes novel techniques to find n...
- Bellman Calibration for V-Learning in Offline Reinforcement Learning : Abstract: We introduce Iterated Bellman Calibration, a simple, model-agnostic, post-hoc procedure for calibrating off-policy value predictions in infinite-horizon Markov decision processes. Bellman ca...
- Calibrated Multi-Level Quantile Forecasting : Abstract: We present an online method for guaranteeing calibration of quantile forecasts at multiple quantile levels simultaneously. A sequence of $α$-level quantile forecasts is calibrated if the for...
- Simultaneous Approximation of the Score Function and Its Derivatives by Deep Neural Networks : Abstract: We present a theory for simultaneous approximation of the score function and its derivatives, enabling the handling of data distributions with low-dimensional structure and unbounded support...
- AI tutoring can safely and effectively support students: An exploratory RCT in UK classrooms : Abstract: One-to-one tutoring is widely considered the gold standard for personalized education, yet it remains prohibitively expensive to scale. To evaluate whether generative AI might help expand ac...
- Memorization in 3D Shape Generation: An Empirical Study : Abstract: Generative models are increasingly used in 3D vision to synthesize novel shapes, yet it remains unclear whether their generation relies on memorizing training shapes. Understanding their mem...
- Regret-Based Federated Causal Discovery with Unknown Interventions : Abstract: Most causal discovery methods recover a completed partially directed acyclic graph representing a Markov equivalence class from observational data. Recent work has extended these methods to ...
- The Nonstationarity-Complexity Tradeoff in Return Prediction : Abstract: We investigate machine learning models for stock return prediction in non-stationary environments, revealing a fundamental nonstationarity-complexity tradeoff: complex models reduce misspeci...
- From geometry to dynamics: Learning overdamped Langevin dynamics from sparse observations with geometric constraints : Abstract: How can we learn the laws underlying the dynamics of stochastic systems when their trajectories are sampled sparsely in time? Existing methods either require temporally resolved high-frequen...
- Alpha-R1: Alpha Screening with LLM Reasoning via Reinforcement Learning : Abstract: Signal decay and regime shifts pose recurring challenges for data-driven investment strategies in non-stationary markets. Conventional time-series and machine learning approaches, which rely...
- Replay Failures as Successes: Sample-Efficient Reinforcement Learning for Instruction Following : Abstract: Reinforcement Learning (RL) has shown promise for aligning Large Language Models (LLMs) to follow instructions with various constraints. Despite the encouraging results, RL improvement inevi...
- Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss : Abstract: Mixture-of-Experts (MoE) models lack explicit constraints to ensure the router's decisions align well with the experts' capabilities, which ultimately limits model performance. To address th...
- Assessing behaviour coverage in a multi-agent system simulation for autonomous vehicle testing : Abstract: As autonomous vehicle technology advances, ensuring the safety and reliability of these systems becomes paramount. Consequently, comprehensive testing methodologies are essential to evaluate...
- Adaptive Fusion Graph Network for 3D Strain Field Prediction in Solid Rocket Motor Grains : Abstract: Local high strain in solid rocket motor grains is a primary cause of structural failure. However, traditional numerical simulations are computationally expensive, and existing surrogate mode...
- Towards Integrating Uncertainty for Domain-Agnostic Segmentation : Abstract: Foundation models for segmentation such as the Segment Anything Model (SAM) family exhibit strong zero-shot performance, but remain vulnerable in shifted or limited-knowledge domains. This w...
- A general framework for deep learning : Abstract: This paper develops a general approach for deep learning for a setting that includes nonparametric regression and classification. We perform a framework from data that fulfills a generalized...
- AKG kernel Agent: A Multi-Agent Framework for Cross-Platform Kernel Synthesis : Abstract: Modern AI models demand high-performance computation kernels. The growing complexity of LLMs, multimodal architectures, and recommendation systems, combined with techniques like sparsity and...
- Probabilistic Modelling is Sufficient for Causal Inference : Abstract: Causal inference is a key research area in machine learning, yet confusion reigns over the tools needed to tackle it. There are prevalent claims in the machine learning literature that you n...
- Beyond-Diagonal Reconfigurable Intelligent Surfaces for 6G Networks: Principles, Challenges, and Quantum Horizons : Abstract: A beyond-diagonal reconfigurable intelligent surface (BD-RIS) is an innovative type of reconfigurable intelligent surface (RIS) that has recently been proposed and is considered a revolution...
- Persistent Homology via Finite Topological Spaces : Abstract: We propose a functorial framework for persistent homology based on finite topological spaces and their associated posets. Starting from a finite metric space, we associate a filtration of fi...
- Visual Language Hypothesis : Abstract: We study visual representation learning from a structural and topological perspective. We begin from a single hypothesis: that visual understanding presupposes a semantic language for vision...
- Agentic Physical AI toward a Domain-Specific Foundation Model for Nuclear Reactor Control : Abstract: The prevailing paradigm in AI for physical systems, scaling general-purpose foundation models toward universal multimodal reasoning, confronts a fundamental barrier at the control interface....
- Revealing design archetypes and flexibility in e-molecule import pathways using Modeling to Generate Alternatives and interpretable machine learning : Abstract: Given the central role of green e-molecule imports in the European energy transition, many studies optimize import pathways and identify a single cost-optimal solution. However, cost optimal...
- Interpretable Safety Alignment via SAE-Constructed Low-Rank Subspace Adaptation : Abstract: Parameter-efficient fine-tuning has become the dominant paradigm for adapting large language models to downstream tasks. Low-rank adaptation methods such as LoRA operate under the assumption...
- Anka: A Domain-Specific Language for Reliable LLM Code Generation : Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, yet they exhibit systematic errors on complex, multi-step programming tasks. We hypothesize that th...
- Clipped Gradient Methods for Nonsmooth Convex Optimization under Heavy-Tailed Noise: A Refined Analysis : Abstract: Optimization under heavy-tailed noise has become popular recently, since it better fits many modern machine learning tasks, as captured by empirical observations. Concretely, instead of a fi...
- Certifying the Right to Be Forgotten: Primal-Dual Optimization for Sample and Label Unlearning in Vertical Federated Learning : Abstract: Federated unlearning has become an attractive approach to address privacy concerns in collaborative machine learning, for situations when sensitive data is remembered by AI models during the...
- SPIRAL: Symbolic LLM Planning via Grounded and Reflective Search : Abstract: Large Language Models (LLMs) often falter at complex planning tasks that require exploration and self-correction, as their linear reasoning process struggles to recover from early mistakes. ...
- An Inference-Based Architecture for Intent and Affordance Saturation in Decision-Making : Abstract: Decision paralysis, i.e. hesitation, freezing, or failure to act despite full knowledge and motivation, poses a challenge for choice models that assume options are already specified and read...
- Why Machine Learning Models Systematically Underestimate Extreme Values II: How to Fix It with LatentNN : Abstract: Attenuation bias -- the systematic underestimation of regression coefficients due to measurement errors in input variables -- affects astronomical data-driven models. For linear regression, ...
- Multi-Agent Framework for Threat Mitigation and Resilience in AI-Based Systems : Abstract: Machine learning (ML) underpins foundation models in finance, healthcare, and critical infrastructure, making them targets for data poisoning, model extraction, prompt injection, automated j...
- InSPO: Unlocking Intrinsic Self-Reflection for LLM Preference Optimization : Abstract: Direct Preference Optimization (DPO) and its variants have become standard for aligning Large Language Models due to their simplicity and offline stability. However, we identify two fundamen...
- Benchmark Success, Clinical Failure: When Reinforcement Learning Optimizes for Benchmarks, Not Patients : Abstract: Recent Reinforcement Learning (RL) advances for Large Language Models (LLMs) have improved reasoning tasks, yet their resource-constrained application to medical imaging remains underexplore...
- QSAR-Guided Generative Framework for the Discovery of Synthetically Viable Odorants : Abstract: The discovery of novel odorant molecules is key for the fragrance and flavor industries, yet efficiently navigating the vast chemical space to identify structures with desirable olfactory pr...
- Deep Learning for Art Market Valuation : Abstract: We study how deep learning can improve valuation in the art market by incorporating the visual content of artworks into predictive models. Using a large repeated-sales dataset from major auc...
- Federated Learning With L0 Constraint Via Probabilistic Gates For Sparsity : Abstract: Federated Learning (FL) is a distributed machine learning setting that requires multiple clients to collaborate on training a model while maintaining data privacy. The unaddressed inherent s...
- The Reward Model Selection Crisis in Personalized Alignment : Abstract: Personalized alignment from preference data has focused primarily on improving reward model (RM) accuracy, with the implicit assumption that better preference ranking translates to better pe...
- Is Chain-of-Thought Really Not Explainability? Chain-of-Thought Can Be Faithful without Hint Verbalization : Abstract: Recent work, using the Biasing Features metric, labels a CoT as unfaithful if it omits a prompt-injected hint that affected the prediction. We argue this metric confuses unfaithfulness with ...
- JADAI: Jointly Amortizing Adaptive Design and Bayesian Inference : Abstract: We consider problems of parameter estimation where design variables can be actively optimized to maximize information gain. To this end, we introduce JADAI, a framework that jointly amortize...
- Risk-Averse Learning with Varying Risk Levels : Abstract: In safety-critical decision-making, the environment may evolve over time, and the learner adjusts its risk level accordingly. This work investigates risk-averse online optimization in dynami...
- Deep Learning for the Multiple Optimal Stopping Problem : Abstract: This paper presents a novel deep learning framework for solving multiple optimal stopping problems in high dimensions. While deep learning has recently shown promise for single stopping prob...
- Geometric Structural Knowledge Graph Foundation Model : Abstract: Structural knowledge graph foundation models aim to generalize reasoning to completely new graphs with unseen entities and relations. A key limitation of existing approaches like Ultra is th...
- A first-order method for nonconvex-strongly-concave constrained minimax optimization : Abstract: In this paper we study a nonconvex-strongly-concave constrained minimax problem. Specifically, we propose a first-order augmented Lagrangian method for solving it, whose subproblems are nonc...
- A Neural Network-Based Real-time Casing Collar Recognition System for Downhole Instruments : Abstract: Accurate downhole positioning is critical in oil and gas operations but is often compromised by signal degradation in traditional surface-based Casing Collar Locator (CCL) monitoring. To add...
- Reinforcement Networks: novel framework for collaborative Multi-Agent Reinforcement Learning tasks : Abstract: Modern AI systems often comprise multiple learnable components that can be naturally organized as graphs. A central challenge is the end-to-end training of such systems without restrictive a...
- Adaptive Trust Consensus for Blockchain IoT: Comparing RL, DRL, and MARL Against Naive, Collusive, Adaptive, Byzantine, and Sleeper Attacks : Abstract: Securing blockchain-enabled IoT networks against sophisticated adversarial attacks remains a critical challenge. This paper presents a trust-based delegated consensus framework integrating F...
- ByteLoom: Weaving Geometry-Consistent Human-Object Interactions through Progressive Curriculum Learning : Abstract: Human-object interaction (HOI) video generation has garnered increasing attention due to its promising applications in digital humans, e-commerce, advertising, and robotics imitation learnin...
- Causal-Policy Forest for End-to-End Policy Learning : Abstract: This study proposes an end-to-end algorithm for policy learning in causal inference. We observe data consisting of covariates, treatment assignments, and outcomes, where only the outcome cor...
- CNSight: Evaluation of Clinical Note Segmentation Tools : Abstract: Clinical notes are often stored in unstructured or semi-structured formats after extraction from electronic medical record (EMR) systems, which complicates their use for secondary analysis a...
- Nonlinear Dynamical Modeling of Human Intracranial Brain Activity with Flexible Inference : Abstract: Dynamical modeling of multisite human intracranial neural recordings is essential for developing neurotechnologies such as brain-computer interfaces (BCIs). Linear dynamical models are widel...
- Active Constraint Learning in High Dimensions from Demonstrations : Abstract: We present an iterative active constraint learning (ACL) algorithm, within the learning from demonstrations (LfD) paradigm, which intelligently solicits informative demonstration trajectorie...
- Data Augmentation for Classification of Negative Pregnancy Outcomes in Imbalanced Data : Abstract: Infant mortality remains a significant public health concern in the United States, with birth defects identified as a leading cause. Despite ongoing efforts to understand the causes of negat...
- Memento-II: Learning by Stateful Reflective Memory : Abstract: We propose a theoretical framework for continual and experiential learning in large language model agents that integrates episodic memory with reinforcement learning. The framework identifie...
- GHaLIB: A Multilingual Framework for Hope Speech Detection in Low-Resource Languages : Abstract: Hope speech has been relatively underrepresented in Natural Language Processing (NLP). Current studies are largely focused on English, which has resulted in a lack of resources for low-resou...
- Multimodal Diffeomorphic Registration with Neural ODEs and Structural Descriptors : Abstract: This work proposes a multimodal diffeomorphic registration method using Neural Ordinary Differential Equations (Neural ODEs). Nonrigid registration algorithms exhibit tradeoffs between their...
- Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2 : Abstract: Structured width pruning of GLU-MLP layers, guided by the Maximum Absolute Weight (MAW) criterion, reveals a systematic dichotomy in how reducing the expansion ratio affects different model ...
- INTERACT-CMIL: Multi-Task Shared Learning and Inter-Task Consistency for Conjunctival Melanocytic Intraepithelial Lesion Grading : Abstract: Accurate grading of Conjunctival Melanocytic Intraepithelial Lesions (CMIL) is essential for treatment and melanoma prediction but remains difficult due to subtle morphological cues and inte...
- Machine learning models for predicting catastrophe bond coupons using climate data : Abstract: In recent years, the growing frequency and severity of natural disasters have increased the need for effective tools to manage catastrophe risk. Catastrophe (CAT) bonds allow the transfer of...
- Investigating Deep Learning Models for Ejection Fraction Estimation from Echocardiography Videos : Abstract: Left ventricular ejection fraction (LVEF) is a key indicator of cardiac function and plays a central role in the diagnosis and management of cardiovascular disease. Echocardiography, as a re...
- Clinically Calibrated Machine Learning Benchmarks for Large-Scale Multi-Disorder EEG Classification : Abstract: Clinical electroencephalography is routinely used to evaluate patients with diverse and often overlapping neurological conditions, yet interpretation remains manual, time-intensive, and vari...
- Tree Meets Transformer: A Hybrid Architecture for Scalable Power Allocation in Cell-Free Networks : Abstract: Power allocation remains a fundamental challenge in wireless communication networks, particularly under dynamic user loads and large-scale deployments. While Transformer-based models have dem...
- Likelihood-Preserving Embeddings for Statistical Inference : Abstract: Modern machine learning embeddings provide powerful compression of high-dimensional data, yet they typically destroy the geometric structure required for classical likelihood-based statistic...
- Geometry-Aware Optimization for Respiratory Sound Classification: Enhancing Sensitivity with SAM-Optimized Audio Spectrogram Transformers : Abstract: Respiratory sound classification is hindered by the limited size, high noise levels, and severe class imbalance of benchmark datasets like ICBHI 2017. While Transformer-based models offer po...
- RollArt: Scaling Agentic RL Training via Disaggregated Infrastructure : Abstract: Agentic Reinforcement Learning (RL) enables Large Language Models (LLMs) to perform autonomous decision-making and long-term planning. Unlike standard LLM post-training, agentic RL workloads...
- Computing Pure-Strategy Nash Equilibria in a Two-Party Policy Competition: Existence and Algorithmic Approaches : Abstract: We formulate two-party policy competition as a two-player non-cooperative game, generalizing Lin et al.'s work (2021). Each party selects a real-valued policy vector as its strategy from a c...
- Role-Based Fault Tolerance System for LLM RL Post-Training : Abstract: RL post-training for LLMs has been widely scaled to enhance reasoning and tool-using capabilities. However, RL post-training interleaves training and inference workloads, exposing the system...
- SPECTRE: Spectral Pre-training Embeddings with Cylindrical Temporal Rotary Position Encoding for Fine-Grained sEMG-Based Movement Decoding : Abstract: Decoding fine-grained movement from non-invasive surface Electromyography (sEMG) is a challenge for prosthetic control due to signal non-stationarity and low signal-to-noise ratios. Generic ...
- Gradient Dynamics of Attention: How Cross-Entropy Sculpts Bayesian Manifolds : Abstract: Transformers empirically perform precise probabilistic reasoning in carefully constructed "Bayesian wind tunnels" and in large-scale language models, yet the mechanisms by which gradient-b...
- HiFi-RAG: Hierarchical Content Filtering and Two-Pass Generation for Open-Domain RAG : Abstract: Retrieval-Augmented Generation (RAG) in open-domain settings faces significant challenges regarding irrelevant information in retrieved documents and the alignment of generated answers with ...
- AnalogSAGE: Self-evolving Analog Design Multi-Agents with Stratified Memory and Grounded Experience : Abstract: Analog circuit design remains a knowledge- and experience-intensive process that relies heavily on human intuition for topology generation and device parameter tuning. Existing LLM-based app...
- Uncertainty-Aware Flow Field Reconstruction Using SVGP Kolmogorov-Arnold Networks : Abstract: Reconstructing time-resolved flow fields from temporally sparse velocimetry measurements is critical for characterizing many complex thermal-fluid systems. We introduce a machine learning fr...
- Bright 4B: Scaling Hyperspherical Learning for Segmentation in 3D Brightfield Microscopy : Abstract: Label-free 3D brightfield microscopy offers a fast and noninvasive way to visualize cellular morphology, yet robust volumetric segmentation still typically depends on fluorescence or heavy p...
- Differentiable Inverse Modeling with Physics-Constrained Latent Diffusion for Heterogeneous Subsurface Parameter Fields : Abstract: We present a latent diffusion-based differentiable inversion method (LD-DIM) for PDE-constrained inverse problems involving high-dimensional spatially distributed coefficients. LD-DIM couple...
- Integrating Wide and Deep Neural Networks with Squeeze-and-Excitation Blocks for Multi-Target Property Prediction in Additively Manufactured Fiber Reinforced Composites : Abstract: Continuous fiber-reinforced composites manufactured by additive manufacturing (CFRC-AM) offer opportunities for printing lightweight materials with high specific strength. However, their per...
- PHANTOM: Physics-Aware Adversarial Attacks against Federated Learning-Coordinated EV Charging Management System : Abstract: The rapid deployment of electric vehicle charging stations (EVCS) within distribution networks necessitates intelligent and adaptive control to maintain the grid's resilience and reliability...
- Self-Evaluation Unlocks Any-Step Text-to-Image Generation : Abstract: We introduce the Self-Evaluating Model (Self-E), a novel, from-scratch training approach for text-to-image generation that supports any-step inference. Self-E learns from data similarly to a...
- Human-like visual computing advances explainability and few-shot learning in deep neural networks for complex physiological data : Abstract: Machine vision models, particularly deep neural networks, are increasingly applied to physiological signal interpretation, including electrocardiography (ECG), yet they typically require lar...
- Emotion classification using EEG headset signals and Random Forest : Abstract: Emotions are one of the important components of the human being; thus, they are a valuable part of daily activities such as interaction with people, decision making and learning. For this rea...
- SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents : Abstract: Agentic reinforcement learning (RL) holds great promise for the development of autonomous agents under complex GUI tasks, but its scalability remains severely hampered by the verification of...
- LLA: Enhancing Security and Privacy for Generative Models with Logic-Locked Accelerators : Abstract: We introduce LLA, an effective intellectual property (IP) protection scheme for generative AI models. LLA leverages the synergy between hardware and software to defend against various supply...
- Real-Time In-Cabin Driver Behavior Recognition on Low-Cost Edge Hardware : Abstract: In-cabin Driver Monitoring Systems (DMS) must recognize distraction- and drowsiness-related behaviors with low latency under strict constraints on compute, power, and cost. We present a sing...
- A General Weighting Theory for Ensemble Learning: Beyond Variance Reduction via Spectral and Geometric Structure : Abstract: Ensemble learning is traditionally justified as a variance-reduction strategy, explaining its strong performance for unstable predictors such as decision trees. This explanation, however, do...
- On Fibonacci Ensembles: An Alternative Approach to Ensemble Learning Inspired by the Timeless Architecture of the Golden Ratio : Abstract: Nature rarely reveals her secrets bluntly, yet in the Fibonacci sequence she grants us a glimpse of her quiet architecture of growth, harmony, and recursive stability \citep{Koshy2001Fibonac...
- A review of NMF, PLSA, LBA, EMA, and LCA with a focus on the identifiability issue : Abstract: Across fields such as machine learning, social science, and geography, considerable attention has been given to models that factorize a nonnegative matrix into the product of two or three matric...
- Evaluating an Adaptive Multispectral Turret System for Autonomous Tracking Across Variable Illumination Conditions : Abstract: Autonomous robotic platforms are playing a growing role across the emergency services sector, supporting missions such as search and rescue operations in disaster zones and reconnaissance. H...
- INSIGHT: Spatially resolved survival modelling from routine histology crosslinked with molecular profiling reveals prognostic epithelial-immune axes in stage II/III colorectal cancer : Abstract: Routine histology contains rich prognostic information in stage II/III colorectal cancer, much of which is embedded in complex spatial tissue organisation. We present INSIGHT, a graph neural...
- Logic Sketch Prompting (LSP): A Deterministic and Interpretable Prompting Method : Abstract: Large language models (LLMs) excel at natural language reasoning but remain unreliable on tasks requiring strict rule adherence, determinism, and auditability. Logic Sketch Prompting (LSP) i...
- Shape of Thought: When Distribution Matters More than Correctness in Reasoning Tasks : Abstract: We present the surprising finding that a language model's reasoning capabilities can be improved by training on synthetic datasets of chain-of-thought (CoT) traces from more capable models, ...
- Analyzing Skill Element in Online Fantasy Cricket : Abstract: Online fantasy cricket has emerged as a large-scale competitive system in which participants construct virtual teams and compete based on real-world player performances. This massive growth h...
- We are not able to identify AI-generated images : Abstract: AI-generated images are now pervasive online, yet many people believe they can easily tell them apart from real photographs. We test this assumption through an interactive web experiment whe...
- Hierarchical Geometry of Cognitive States in Transformer Embedding Spaces : Abstract: Recent work has shown that transformer-based language models learn rich geometric structure in their embedding spaces, yet the presence of higher-level cognitive organization within these re...
- VideoScaffold: Elastic-Scale Visual Hierarchies for Streaming Video Understanding in MLLMs : Abstract: Understanding long videos with multimodal large language models (MLLMs) remains challenging due to the heavy redundancy across frames and the need for temporally coherent representations. Ex...
- Mirage Persistent Kernel: A Compiler and Runtime for Mega-Kernelizing Tensor Programs : Abstract: We introduce Mirage Persistent Kernel (MPK), the first compiler and runtime system that automatically transforms multi-GPU model inference into a single high-performance megakernel. MPK intr...
- Open-Source Multimodal Moxin Models with Moxin-VLM and Moxin-VLA : Abstract: Recently, Large Language Models (LLMs) have undergone a significant transformation, marked by a rapid rise in both their popularity and capabilities. Leading this evolution are proprietary L...
- CosineGate: Semantic Dynamic Routing via Cosine Incompatibility in Residual Networks : Abstract: Modern deep residual networks perform substantial redundant computation by evaluating all residual blocks for every input, even when identity mappings suffice. We introduce CosineGate, an en...
- AETAS: Analysis of Evolving Temporal Affect and Semantics for Legal History : Abstract: Digital-humanities work on semantic shift often alternates between handcrafted close readings and opaque embedding machinery. We present a reproducible expert-system style pipeline that quan...
- MatKV: Trading Compute for Flash Storage in LLM Inference : Abstract: We observe two major trends in LLM-based generative AI: (1) inference is becoming the dominant factor in terms of cost and power consumption, surpassing training, and (2) retrieval augmented...
- Enhancing Medical Data Analysis through AI-Enhanced Locally Linear Embedding: Applications in Medical Point Location and Imagery : Abstract: The rapid evolution of artificial intelligence in healthcare has opened avenues for enhancing various processes, including medical billing and transcription. This paper introduces an innovat...
- PaperNet: Efficient Temporal Convolutions and Channel Residual Attention for EEG Epilepsy Detection : Abstract: Electroencephalography (EEG) signals contain rich temporal-spectral structure but are difficult to model due to noise, subject variability, and multi-scale dynamics. Lightweight deep learnin...
- Sampling with Shielded Langevin Monte Carlo Using Navigation Potentials : Abstract: We introduce shielded Langevin Monte Carlo (LMC), a constrained sampler inspired by navigation functions, capable of sampling from unnormalized target distributions defined over punctured su...
- Neural ocean forecasting from sparse satellite-derived observations: a case-study for SSH dynamics and altimetry data : Abstract: We present an end-to-end deep learning framework for short-term forecasting of global sea surface dynamics based on sparse satellite altimetry data. Building on two state-of-the-art architec...
- Machine Learning-Based Basil Yield Prediction in IoT-Enabled Indoor Vertical Hydroponic Farms : Abstract: As agriculture faces increasing pressure from water scarcity, especially in regions like Tunisia, innovative, resource-efficient solutions are urgently needed. This work explores the integra...
- GPU Kernel Optimization Beyond Full Builds: An LLM Framework with Minimal Executable Programs : Abstract: In high-performance computing, hotspot GPU kernels are primary bottlenecks, and expert manual tuning is costly and hard to port. Large language model methods often assume kernels can be comp...
- EEG-to-Voice Decoding of Spoken and Imagined speech Using Non-Invasive EEG : Abstract: Restoring speech communication from neural signals is a central goal of brain-computer interface research, yet EEG-based speech reconstruction remains challenging due to limited spatial reso...
- The Complete Anatomy of the Madden-Julian Oscillation Revealed by Artificial Intelligence : Abstract: Accurately defining the life cycle of the Madden-Julian Oscillation (MJO), the dominant mode of intraseasonal climate variability, remains a foundational challenge due to its propagating nat...
- UniFi: Combining Irregularly Sampled CSI from Diverse Communication Packets and Frequency Bands for Wi-Fi Sensing : Abstract: Existing Wi-Fi sensing systems rely on injecting high-rate probing packets to extract channel state information (CSI), leading to communication degradation and poor deployability. Although I...
- On Harnessing Idle Compute at the Edge for Foundation Model Training : Abstract: The ecosystem behind foundation model development today is highly centralized and limited to large-scale cloud data center operators: training foundation models is costly, needing immense co...
- ReCollab: Retrieval-Augmented LLMs for Cooperative Ad-hoc Teammate Modeling : Abstract: Ad-hoc teamwork (AHT) requires agents to infer the behavior of previously unseen teammates and adapt their policy accordingly. Conventional approaches often rely on fixed probabilistic model...
- Training AI Co-Scientists Using Rubric Rewards : Abstract: AI co-scientists are emerging as a tool to assist human researchers in achieving their research goals. A crucial feature of these AI co-scientists is the ability to generate a research plan ...
- End-to-End Test-Time Training for Long Context : Abstract: We formulate long-context language modeling as a problem in continual learning rather than architecture design. Under this formulation, we only use a standard architecture -- a Transformer w...
- Random Controlled Differential Equations : Abstract: We introduce a training-efficient framework for time-series learning that combines random features with controlled differential equations (CDEs). In this approach, large randomly parameteriz...
- BOAD: Discovering Hierarchical Software Engineering Agents via Bandit Optimization : Abstract: Large language models (LLMs) have shown strong reasoning and coding capabilities, yet they struggle to generalize to real-world software engineering (SWE) problems that are long-horizon and ...
- Le Cam Distortion: A Decision-Theoretic Framework for Robust Transfer Learning : Abstract: Distribution shift is the defining challenge of real-world machine learning. The dominant paradigm--Unsupervised Domain Adaptation (UDA)--enforces feature invariance, aligning source and tar...
- Distribution-Free Process Monitoring with Conformal Prediction : Abstract: Traditional Statistical Process Control (SPC) is essential for quality management but is limited by its reliance on often violated statistical assumptions, leading to unreliable monitoring i...
- VL-RouterBench: A Benchmark for Vision-Language Model Routing : Abstract: Multi-model routing has evolved from an engineering technique into essential infrastructure, yet existing work lacks a systematic, reproducible benchmark for evaluating vision-language model...
- EEG-based Graph-guided Domain Adaptation for Robust Cross-Session Emotion Recognition : Abstract: Accurate recognition of human emotional states is critical for effective human-machine interaction. Electroencephalography (EEG) offers a reliable source for emotion recognition due to its h...
- Trustworthy Machine Learning under Distribution Shifts : Abstract: Machine Learning (ML) has been a foundational topic in artificial intelligence (AI), providing both theoretical groundwork and practical tools for its exciting advancements. From ResNet for ...
- Joint Link Adaptation and Device Scheduling Approach for URLLC Industrial IoT Network: A DRL-based Method with Bayesian Optimization : Abstract: In this article, we consider an industrial internet of things (IIoT) network supporting multi-device dynamic ultra-reliable low-latency communication (URLLC) while the channel state informat...
- ML Compass: Navigating Capability, Cost, and Compliance Trade-offs in AI Model Deployment : Abstract: We study how organizations should select among competing AI models when user utility, deployment costs, and compliance requirements jointly matter. Widely used capability leaderboards do not...
- FRoD: Full-Rank Efficient Fine-Tuning with Rotational Degrees for Fast Convergence : Abstract: Parameter-efficient fine-tuning (PEFT) methods have emerged as a practical solution for adapting large foundation models to downstream tasks, reducing computational and memory costs by updat...
- Eliminating Inductive Bias in Reward Models with Information-Theoretic Guidance : Abstract: Reward models (RMs) are essential in reinforcement learning from human feedback (RLHF) to align large language models (LLMs) with human values. However, RM training data is commonly recogniz...
- Dynamic Subspace Composition: Efficient Adaptation via Contractive Basis Expansion : Abstract: Mixture of Experts (MoE) models scale capacity but often suffer from representation collapse and gradient instability. We propose Dynamic Subspace Composition (DSC), a framework that approxi...
- Stochastic Siamese MAE Pretraining for Longitudinal Medical Images : Abstract: Temporally aware image representations are crucial for capturing disease progression in 3D volumes of longitudinal medical datasets. However, recent state-of-the-art self-supervised learning...
- Directly Constructing Low-Dimensional Solution Subspaces in Deep Neural Networks : Abstract: While it is well-established that the weight matrices and feature manifolds of deep neural networks exhibit a low Intrinsic Dimension (ID), current state-of-the-art models still rely on mass...
- Theoretical Foundations of Scaling Law in Familial Models : Abstract: Neural scaling laws have become foundational for optimizing large language model (LLM) training, yet they typically assume a single dense model output. This limitation effectively overlooks ...
- Task-driven Heterophilic Graph Structure Learning : Abstract: Graph neural networks (GNNs) often struggle to learn discriminative node representations for heterophilic graphs, where connected nodes tend to have dissimilar labels and feature similarity ...
- On the Sample Complexity of Learning for Blind Inverse Problems : Abstract: Blind inverse problems arise in many experimental settings where the forward operator is partially or entirely unknown. In this context, methods developed for the non-blind case cannot be ad...
- A unified framework for detecting point and collective anomalies in operating system logs via collaborative transformers : Abstract: Log anomaly detection is crucial for preserving the security of operating systems. Depending on the source of log data collection, various information is recorded in logs that can be conside...
- Diffusion priors enhanced velocity model building from time-lag images using a neural operator : Abstract: Velocity model building serves as a crucial component for achieving high precision subsurface imaging. However, conventional velocity model building methods are often computationally expensi...
- Post-Training Quantization of OpenPangu Models for Efficient Deployment on Atlas A2 : Abstract: Huawei's openPangu-Embedded-1B and openPangu-Embedded-7B, variants of the openPangu large language model, integrate three distinct Chain-of-Thought (CoT) reasoning paradigms, namely slow_thi...
- ISOPO: Proximal policy gradients without pi-old : Abstract: This note introduces Isometric Policy Optimization (ISOPO), an efficient method to approximate the natural policy gradient in a single gradient step. In comparison, existing proximal policy ...
- ECG-RAMBA: Zero-Shot ECG Generalization by Morphology-Rhythm Disentanglement and Long-Range Modeling : Abstract: Deep learning has achieved strong performance for electrocardiogram (ECG) classification within individual datasets, yet dependable generalization across heterogeneous acquisition settings r...
- The Law of Multi-Model Collaboration: Scaling Limits of Model Ensembling for Large Language Models : Abstract: Recent advances in large language models (LLMs) have been largely driven by scaling laws for individual models, which predict performance improvements as model parameters and data volume inc...
- Deep learning for pedestrians: backpropagation in Transformers : Abstract: This document is a follow-up to our previous paper dedicated to a vectorized derivation of backpropagation in CNNs. Following the same principles and notations already put in place there, we...
- Splitwise: Collaborative Edge-Cloud Inference for LLMs via Lyapunov-Assisted DRL : Abstract: Deploying large language models (LLMs) on edge devices is challenging due to their limited memory and power resources. Cloud-only inference reduces device burden but introduces high latency ...
- Spectral Analysis of Hard-Constraint PINNs: The Spatial Modulation Mechanism of Boundary Functions : Abstract: Physics-Informed Neural Networks with hard constraints (HC-PINNs) are increasingly favored for their ability to strictly enforce boundary conditions via a trial function ansatz $\tilde{u} = ...
- On the Inverse Flow Matching Problem in the One-Dimensional and Gaussian Cases : Abstract: This paper studies the inverse problem of flow matching (FM) between distributions with finite exponential moment, a problem motivated by modern generative AI applications such as the distil...
- PFed-Signal: An ADR Prediction Model based on Federated Learning : Abstract: The adverse drug reactions (ADRs) predicted based on the biased records in FAERS (U.S. Food and Drug Administration Adverse Event Reporting System) may mislead diagnosis online. Generally, s...
- KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta : Abstract: Making deep learning recommendation model (DLRM) training and inference fast and efficient is important. However, this presents three key system challenges - model architecture diversity, ke...
- FairGFL: Privacy-Preserving Fairness-Aware Federated Learning with Overlapping Subgraphs : Abstract: Graph federated learning enables the collaborative extraction of high-order information from distributed subgraphs while preserving the privacy of raw data. However, graph data often exhibit...
- Energy and Memory-Efficient Federated Learning With Ordered Layer Freezing : Abstract: Federated Learning (FL) has emerged as a privacy-preserving paradigm for training machine learning models across distributed edge devices in the Internet of Things (IoT). By keeping data loc...
- PGOT: A Physics-Geometry Operator Transformer for Complex PDEs : Abstract: While Transformers have demonstrated remarkable potential in modeling Partial Differential Equations (PDEs), modeling large-scale unstructured meshes with complex geometries remains a signif...
- A Simple, Optimal and Efficient Algorithm for Online Exp-Concave Optimization : Abstract: Online eXp-concave Optimization (OXO) is a fundamental problem in online learning. The standard algorithm, Online Newton Step (ONS), balances statistical optimality and computational practic...
- Machine Learning-Assisted Vocal Cord Ultrasound Examination: Project VIPR : Abstract: Intro: Vocal cord ultrasound (VCUS) has emerged as a less invasive and better tolerated examination technique, but its accuracy is operator dependent. This research aims to apply a machine l...
- HELM-BERT: A Transformer for Medium-sized Peptide Property Prediction : Abstract: Therapeutic peptides have emerged as a pivotal modality in modern drug discovery, occupying a chemically and topologically rich space. While accurate prediction of their physicochemical prop...
- Evaluating Parameter Efficient Methods for RLVR : Abstract: We systematically evaluate Parameter-Efficient Fine-Tuning (PEFT) methods under the paradigm of Reinforcement Learning with Verifiable Rewards (RLVR). RLVR incentivizes language models to en...
- Diffusion-based Decentralized Federated Multi-Task Representation Learning : Abstract: Representation learning is a widely adopted framework for learning in data-scarce environments to obtain a feature extractor or representation from various different yet related tasks. Despi...
- A Weak Signal Learning Dataset and Its Baseline Method : Abstract: Weak signal learning (WSL) is a common challenge in many fields like fault diagnosis, medical imaging, and autonomous driving, where critical information is often masked by noise and interfe...
- Graph Neural Networks with Transformer Fusion of Brain Connectivity Dynamics and Tabular Data for Forecasting Future Tobacco Use : Abstract: Integrating non-Euclidean brain imaging data with Euclidean tabular data, such as clinical and demographic information, poses a substantial challenge for medical imaging analysis, particular...
- Principled Algorithms for Optimizing Generalized Metrics in Binary Classification : Abstract: In applications with significant class imbalance or asymmetric costs, metrics such as the $F_β$-measure, AM measure, Jaccard similarity coefficient, and weighted accuracy offer more suitable...
- SE-MLP Model for Predicting Prior Acceleration Features in Penetration Signals : Abstract: Accurate identification of the penetration process relies heavily on prior feature values of penetration acceleration. However, these feature values are typically obtained through long simul...
- How Much Data Is Enough? Uniform Convergence Bounds for Generative & Vision-Language Models under Low-Dimensional Structure : Abstract: Modern generative and vision-language models (VLMs) are increasingly used in scientific and medical decision support, where predicted probabilities must be both accurate and well calibrated....
- A Note on Hybrid Online Reinforcement and Imitation Learning for LLMs: Formulations and Algorithms : Abstract: We present a unified framework for Large Language Model (LLM) fine-tuning that integrates Imitation Learning and Reinforcement Learning. By analyzing the gradient of a composite objective co...
- Osmotic Learning: A Self-Supervised Paradigm for Decentralized Contextual Data Representation : Abstract: Data within a specific context gains deeper significance beyond its isolated interpretation. In distributed systems, interdependent data sources reveal hidden relationships and latent struct...
- Taming the Tail: Stable LLM Reinforcement Learning via Dynamic Vocabulary Pruning : Abstract: Reinforcement learning for large language models (LLMs) faces a fundamental tension: high-throughput inference engines and numerically-precise training systems produce different probability ...
- Multimodal Functional Maximum Correlation for Emotion Recognition : Abstract: Emotional states manifest as coordinated yet heterogeneous physiological responses across central and autonomic systems, posing a fundamental challenge for multimodal representation learning...
- Trust Region Masking for Long-Horizon LLM Reinforcement Learning : Abstract: Policy gradient methods for large language models optimize a surrogate objective computed from samples of a rollout policy $\pi_{\text{roll}}$. When $\pi_{\text{roll}} \ne \pi_\theta$, there is approxi...
- Rethinking Fine-Tuning: Unlocking Hidden Capabilities in Vision-Language Models : Abstract: Explorations in fine-tuning Vision-Language Models (VLMs), such as Low-Rank Adaptation (LoRA) from Parameter Efficient Fine-Tuning (PEFT), have made impressive progress. However, most approa...
- FLEX-MoE: Federated Mixture-of-Experts with Load-balanced Expert Assignment : Abstract: Mixture-of-Experts (MoE) models enable scalable neural networks through conditional computation. However, their deployment with federated learning (FL) faces two critical challenges: 1) reso...
- Breaking the Memory Wall: Exact Analytical Differentiation via Tiled Operator-Space Evolution : Abstract: Selective State Space Models (SSMs) achieve linear-time inference, yet their gradient-based sensitivity analysis remains bottlenecked by O(L) memory scaling during backpropagation. This memo...
- PI-MFM: Physics-informed multimodal foundation model for solving partial differential equations : Abstract: Partial differential equations (PDEs) govern a wide range of physical systems, and recent multimodal foundation models have shown promise for learning PDE solution operators across diverse e...
- Mechanistic Analysis of Circuit Preservation in Federated Learning : Abstract: Federated Learning (FL) enables collaborative training of models on decentralized data, but its performance degrades significantly under Non-IID (non-independent and identically distributed)...
- Merge before Forget: A Single LoRA Continual Learning via Continual Merging : Abstract: Parameter-efficient continual learning has emerged as a promising approach for large language models (LLMs) to mitigate catastrophic forgetting while enabling adaptation to new tasks. Curren...
- Fusion or Confusion? Multimodal Complexity Is Not All You Need : Abstract: Deep learning architectures for multimodal learning have increased in complexity, driven by the assumption that multimodal-specific methods improve performance. We challenge this assumption ...
- A Context-Aware Temporal Modeling through Unified Multi-Scale Temporal Encoding and Hierarchical Sequence Learning for Single-Channel EEG Sleep Staging : Abstract: Automatic sleep staging is a critical task in healthcare due to the global prevalence of sleep disorders. This study focuses on single-channel electroencephalography (EEG), a practical and w...
- FLOW: A Feedback-Driven Synthetic Longitudinal Dataset of Work and Wellbeing : Abstract: Access to longitudinal, individual-level data on work-life balance and wellbeing is limited by privacy, ethical, and logistical constraints. This poses challenges for reproducible research, ...
- APO: Alpha-Divergence Preference Optimization : Abstract: Two divergence regimes dominate modern alignment practice. Supervised fine-tuning and many distillation-style objectives implicitly minimize the forward KL divergence $\mathrm{KL}(q \,\|\, \pi_\theta)$, yiel...
- Multiple Token Divergence: Measuring and Steering In-Context Computation Density : Abstract: Measuring the in-context computational effort of language models is a key challenge, as metrics like next-token loss fail to capture reasoning complexity. Prior methods based on latent state...
- Sat-EnQ: Satisficing Ensembles of Weak Q-Learners for Reliable and Compute-Efficient Reinforcement Learning : Abstract: Deep Q-learning algorithms remain notoriously unstable, especially during early training when the maximization operator amplifies estimation errors. Inspired by bounded rationality theory an...
- MetaCD: A Meta Learning Framework for Cognitive Diagnosis based on Continual Learning : Abstract: Cognitive diagnosis is an essential research topic in intelligent education, aimed at assessing the level of mastery of different skills by students. So far, many research works have used de...
- Debugging Tabular Log as Dynamic Graphs : Abstract: Tabular log abstracts objects and events in the real-world system and reports their updates to reflect the change of the system, where one can detect real-world inconsistencies efficiently b...
- Federated Multi-Task Clustering : Abstract: Spectral clustering has emerged as one of the most effective clustering algorithms due to its superior performance. However, most existing models are designed for centralized settings, rende...
- Theory and Algorithms for Learning with Multi-Class Abstention and Multi-Expert Deferral : Abstract: Large language models (LLMs) have achieved remarkable performance but face critical challenges: hallucinations and high inference costs. Leveraging multiple experts offers a solution: deferr...
- Fundamental Novel Consistency Theory: $H$-Consistency Bounds : Abstract: In machine learning, the loss functions optimized during training often differ from the target loss that defines task performance due to computational intractability or lack of differentiabi...
- TEACH: Temporal Variance-Driven Curriculum for Reinforcement Learning : Abstract: Reinforcement Learning (RL) has achieved significant success in solving single-goal tasks. However, uniform goal selection often results in sample inefficiency in multi-goal settings where a...
- Long-Range Distillation: Distilling 10,000 Years of Simulated Climate into Long Timestep AI Weather Models : Abstract: Accurate long-range weather forecasting remains a major challenge for AI models, both because errors accumulate over autoregressive rollouts and because reanalysis datasets used for training...
- MoR: Mixture Of Representations For Mixed-Precision Training : Abstract: Mixed-precision training is a crucial technique for scaling deep learning models, but successful mixed-precision training requires identifying and applying the right combination of training m...
- ReDiF: Reinforced Distillation for Few Step Diffusion : Abstract: Distillation addresses the slow sampling problem in diffusion models by creating models with smaller size or fewer steps that approximate the behavior of high-step teachers. In this work, we...
- SNM-Net: A Universal Framework for Robust Open-Set Gas Recognition via Spherical Normalization and Mahalanobis Distance : Abstract: Electronic nose (E-nose) systems face dual challenges in open-set gas recognition: feature distribution shifts caused by signal drift and decision failures induced by unknown interference. E...
- Discovering Transmission Dynamics of COVID-19 in China : Abstract: A comprehensive retrospective analysis of public health interventions, such as large scale testing, quarantining, and contact tracing, can help identify mechanisms most effective in mitigati...
- Adapting, Fast and Slow: Transportable Circuits for Few-Shot Learning : Abstract: Generalization across the domains is not possible without asserting a structure that constrains the unseen target domain w.r.t. the source domain. Building on causal transportability theory,...
- Schrodinger AI: A Unified Spectral-Dynamical Framework for Classification, Reasoning, and Operator-Based Generalization : Abstract: We introduce Schrödinger AI, a unified machine learning framework inspired by quantum mechanics. The system is defined by three tightly coupled components: (1) a time-independent w...
- GRExplainer: A Universal Explanation Method for Temporal Graph Neural Networks : Abstract: Dynamic graphs are widely used to represent evolving real-world networks. Temporal Graph Neural Networks (TGNNs) have emerged as a powerful tool for processing such graphs, but the lack of t...
- Understanding the Mechanisms of Fast Hyperparameter Transfer : Abstract: The growing scale of deep learning models has rendered standard hyperparameter (HP) optimization prohibitively expensive. A promising solution is the use of scale-aware hyperparameters, whic...
- A Micro-Macro Machine Learning Framework for Predicting Childhood Obesity Risk Using NHANES and Environmental Determinants : Abstract: Childhood obesity remains a major public health challenge in the United States, strongly influenced by a combination of individual-level, household-level, and environmental-level risk factor...
- From Confounding to Learning: Dynamic Service Fee Pricing on Third-Party Platforms : Abstract: We study the pricing behavior of third-party platforms facing strategic agents. Assuming the platform is a revenue maximizer, it observes market features that generally affect demand. Since ...
- Bridging Global Intent with Local Details: A Hierarchical Representation Approach for Semantic Validation in Text-to-SQL : Abstract: Text-to-SQL translates natural language questions into SQL statements grounded in a target database schema. Ensuring the reliability and executability of such systems requires validating gen...
- When Does Multi-Task Learning Fail? Quantifying Data Imbalance and Task Independence in Metal Alloy Property Prediction : Abstract: Multi-task learning (MTL) assumes related material properties share underlying physics that can be leveraged for better predictions. We test this by simultaneously predicting electrical resi...
- FoldAct: Efficient and Stable Context Folding for Long-Horizon Search Agents : Abstract: Long-horizon reinforcement learning (RL) for large language models faces critical scalability challenges from unbounded context growth, leading to context folding methods that compress inter...
- What Matters in Deep Learning for Time Series Forecasting? : Abstract: Deep learning models have grown increasingly popular in time series applications. However, the large quantity of newly proposed architectures, together with often contradictory empirical res...
- Predictive Modeling of Power Outages during Extreme Events: Integrating Weather and Socio-Economic Factors : Abstract: This paper presents a novel learning-based framework for predicting power outages caused by extreme events. The proposed approach specifically targets low-probability, high-consequence outag...
- Learning with the $p$-adics : Abstract: Existing machine learning frameworks operate over the field of real numbers ($\mathbb{R}$) and learn representations in real (Euclidean or Hilbert) vector spaces (e.g., $\mathbb{R}^d$). Thei...
- Beyond Centralization: Provable Communication Efficient Decentralized Multi-Task Learning : Abstract: Representation learning is a widely adopted framework for learning in data-scarce environments, aiming to extract common features from related tasks. While centralized approaches have been e...
- Quantum Generative Models for Computational Fluid Dynamics: A First Exploration of Latent Space Learning in Lattice Boltzmann Simulations : Abstract: This paper presents the first application of quantum generative models to learned latent space representations of computational fluid dynamics (CFD) data. While recent work has explored quan...
- Scaling Unverifiable Rewards: A Case Study on Visual Insights : Abstract: Large Language Model (LLM) agents can increasingly automate complex reasoning through Test-Time Scaling (TTS), iterative refinement guided by reward signals. However, many real-world tasks i...
- Communication Compression for Distributed Learning with Aggregate and Server-Guided Feedback : Abstract: Distributed learning, particularly Federated Learning (FL), faces a significant bottleneck in the communication cost, particularly the uplink transmission of client-to-server updates, which ...
- Gold Price Prediction Using Long Short-Term Memory and Multi-Layer Perceptron with Gray Wolf Optimizer : Abstract: The global gold market, by its fundamentals, has long been home to many financial institutions, banks, governments, funds, and micro-investors. Due to the inherent complexity and relationshi...
- Cryptocurrency Price Prediction Using Parallel Gated Recurrent Units : Abstract: According to the advent of cryptocurrencies and Bitcoin, many investments and businesses are now conducted online through cryptocurrencies. Among them, Bitcoin uses blockchain technology to ...
- Energy-Guided Flow Matching Enables Few-Step Conformer Generation and Ground-State Identification : Abstract: Generating low-energy conformer ensembles and identifying ground-state conformations from molecular graphs remain computationally demanding with physics-based pipelines. Current learning-bas...
- Data-Driven Analysis of Crash Patterns in SAE Level 2 and Level 4 Automated Vehicles Using K-means Clustering and Association Rule Mining : Abstract: Automated Vehicles (AV) hold potential to reduce or eliminate human driving errors, enhance traffic safety, and support sustainable mobility. Recently, crash data has increasingly revealed t...
- On Admissible Rank-based Input Normalization Operators : Abstract: Rank-based input normalization is a workhorse of modern machine learning, prized for its robustness to scale, monotone transformations, and batch-to-batch variation. In many real systems, th...
- TimePerceiver: An Encoder-Decoder Framework for Generalized Time-Series Forecasting : Abstract: In machine learning, effective modeling requires a holistic consideration of how to encode inputs, make predictions (i.e., decoding), and train the model. However, in time-series forecasting...
- Towards Reliable Evaluation of Adversarial Robustness for Spiking Neural Networks : Abstract: Spiking Neural Networks (SNNs) utilize spike-based activations to mimic the brain's energy-efficient information processing. However, the binary and discontinuous nature of spike activations...
- Decomposing Task Vectors for Refined Model Editing : Abstract: Large pre-trained models have transformed machine learning, yet adapting these models effectively to exhibit precise, concept-specific behaviors remains a significant challenge. Task vectors...
- Predicting LLM Correctness in Prosthodontics Using Metadata and Hallucination Signals : Abstract: Large language models (LLMs) are increasingly adopted in high-stakes domains such as healthcare and medical education, where the risk of generating factually incorrect (i.e., hallucinated) i...
- The Quest for Winning Tickets in Low-Rank Adapters : Abstract: The Lottery Ticket Hypothesis (LTH) suggests that over-parameterized neural networks contain sparse subnetworks ("winning tickets") capable of matching full model performance when trained fr...
- Toward Real-World IoT Security: Concept Drift-Resilient IoT Botnet Detection via Latent Space Representation Learning and Alignment : Abstract: Although AI-based models have achieved high accuracy in IoT threat detection, their deployment in enterprise environments is constrained by reliance on stationary datasets that fail to refle...
- Collaborative Optimization of Multiclass Imbalanced Learning: Density-Aware and Region-Guided Boosting : Abstract: Numerous studies attempt to mitigate classification bias caused by class imbalance. However, existing studies have yet to explore the collaborative optimization of imbalanced learning and mo...
- The Bayesian Geometry of Transformer Attention : Abstract: Transformers often appear to perform Bayesian reasoning in context, but verifying this rigorously has been impossible: natural data lack analytic posteriors, and large models conflate reason...
- GLUE: Gradient-free Learning to Unify Experts : Abstract: In many deployed systems (multilingual ASR, cross-hospital imaging, region-specific perception), multiple pretrained specialist models coexist. Yet, new target domains often require domain e...
- AMBIT: Augmenting Mobility Baselines with Interpretable Trees : Abstract: Origin-destination (OD) flow prediction remains a core task in GIS and urban analytics, yet practical deployments face two conflicting needs: high accuracy and clear interpretability. This p...
- AFA-LoRA: Enabling Non-Linear Adaptations in LoRA with Activation Function Annealing : Abstract: Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient fine-tuning (PEFT) method. However, its linear adaptation process limits its expressive power. This means there is a gap be...
- Causality-Inspired Safe Residual Correction for Multivariate Time Series : Abstract: While modern multivariate forecasters such as Transformers and GNNs achieve strong benchmark performance, they often suffer from systematic errors at specific variables or horizons and, crit...
- BLISS: Bandit Layer Importance Sampling Strategy for Efficient Training of Graph Neural Networks : Abstract: Graph Neural Networks (GNNs) are powerful tools for learning from graph-structured data, but their application to large graphs is hindered by computational costs. The need to process every n...
- Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration : Abstract: Hyperparameter tuning can dramatically impact training stability and final performance of large-scale models. Recent works on neural network parameterisations, such as $\mu$P, have enabled tra...
- The Effectiveness of Approximate Regularized Replay for Efficient Supervised Fine-Tuning of Large Language Models : Abstract: Although parameter-efficient fine-tuning methods, such as LoRA, only modify a small subset of parameters, they can have a significant impact on the model. Our instruction-tuning experiments ...
- Expert System for Bitcoin Forecasting: Integrating Global Liquidity via TimeXer Transformers : Abstract: Bitcoin price forecasting is characterized by extreme volatility and non-stationarity, often defying traditional univariate time-series models over long horizons. This paper addresses a crit...
- Decomposing Uncertainty in Probabilistic Knowledge Graph Embeddings: Why Entity Variance Is Not Enough : Abstract: Probabilistic knowledge graph embeddings represent entities as distributions, using learned variances to quantify epistemic uncertainty. We identify a fundamental limitation: these variances...
- LangPrecip: Language-Aware Multimodal Precipitation Nowcasting : Abstract: Short-term precipitation nowcasting is an inherently uncertain and under-constrained spatiotemporal forecasting problem, especially for rapidly evolving and extreme weather events. Existing ...
- Optimistic Feasible Search for Closed-Loop Fair Threshold Decision-Making : Abstract: Closed-loop decision-making systems (e.g., lending, screening, or recidivism risk assessment) often operate under fairness and service constraints while inducing feedback effects: decisions ...
- LLMBoost: Make Large Language Models Stronger with Boosting : Abstract: Ensemble learning of LLMs has emerged as a promising alternative to enhance performance, but existing approaches typically treat models as black boxes, combining the inputs or final outputs ...
- PDx -- Adaptive Credit Risk Forecasting Model in Digital Lending using Machine Learning Operations : Abstract: This paper presents PDx, an adaptive, machine learning operations (MLOps) driven decision system for forecasting credit risk using probability of default (PD) modeling in digital lending. Wh...
- Statistical and Machine Learning Analysis of Traffic Accidents on US 158 in Currituck County: A Comparison with HSM Predictions : Abstract: This study extends previous hotspot and Chi-Square analysis by Sawyer [sawyer2025hotspot] by integrating advanced statistical analysis, machine learning, and spatial modeling techniques...
- Hybrid Quantum-Classical Mixture of Experts: Unlocking Topological Advantage via Interference-Based Routing : Abstract: The Mixture-of-Experts (MoE) architecture has emerged as a powerful paradigm for scaling deep learning models, yet it is fundamentally limited by challenges such as expert imbalance and the ...
- Learning from Negative Examples: Why Warning-Framed Training Data Teaches What It Warns Against : Abstract: Warning-framed content in training data (e.g., "DO NOT USE - this code is vulnerable") does not, it turns out, teach language models to avoid the warned-against behavior. In experiments repo...
- Multi-Head Spectral-Adaptive Graph Anomaly Detection : Abstract: Graph anomaly detection technology has broad applications in financial fraud and risk control. However, existing graph anomaly detection methods often face significant challenges when dealin...
- When Algorithms Manage Humans: A Double Machine Learning Approach to Estimating Nonlinear Effects of Algorithmic Control on Gig Worker Performance and Wellbeing : Abstract: A central question for the future of work is whether person centered management can survive when algorithms take on managerial roles. Standard tools often miss what is happening because work...
- Co-GRPO: Co-Optimized Group Relative Policy Optimization for Masked Diffusion Model : Abstract: Recently, Masked Diffusion Models (MDMs) have shown promising potential across vision, language, and cross-modal generation. However, a notable discrepancy exists between their training and ...
- Cluster Aggregated GAN (CAG): A Cluster-Based Hybrid Model for Appliance Pattern Generation : Abstract: Synthetic appliance data are essential for developing non-intrusive load monitoring algorithms and enabling privacy preserving energy research, yet the scarcity of labeled datasets remains a...
- DBAW-PIKAN: Dynamic Balance Adaptive Weight Kolmogorov-Arnold Neural Network for Solving Partial Differential Equations : Abstract: Physics-informed neural networks (PINNs) have led to significant advancements in scientific computing by integrating fundamental physical principles with advanced data-driven techniques. How...
- Valori: A Deterministic Memory Substrate for AI Systems : Abstract: Modern AI systems rely on vector embeddings stored and searched using floating-point arithmetic. While effective for approximate similarity search, this design introduces fundamental non-det...
- Hierarchical Stacking Optimization Using Dirichlet's Process (SoDip): Towards Accelerated Design for Graft Polymerization : Abstract: Radiation-induced grafting (RIG) enables precise functionalization of polymer films for ion-exchange membranes, CO2-separation membranes, and battery electrolytes by generating radicals on r...
- LLMTM: Benchmarking and Optimizing LLMs for Temporal Motif Analysis in Dynamic Graphs : Abstract: The widespread application of Large Language Models (LLMs) has motivated a growing interest in their capacity for processing dynamic graphs. Temporal motifs, as an elementary unit and import...
- LuxIA: A Lightweight Unitary matriX-based Framework Built on an Iterative Algorithm for Photonic Neural Network Training : Abstract: PNNs present promising opportunities for accelerating machine learning by leveraging the unique benefits of photonic circuits. However, current state of the art PNN simulation tools face sig...
- The Physics Constraint Paradox: When Removing Explicit Constraints Improves Physics-Informed Data for Machine Learning : Abstract: Physics-constrained data generation is essential for machine learning in scientific domains where real data are scarce; however, existing approaches often over-constrain models without ident...
- Cardiac mortality prediction in patients undergoing PCI based on real and synthetic data : Abstract: Patient status, angiographic and procedural characteristics encode crucial signals for predicting long-term outcomes after percutaneous coronary intervention (PCI). The aim of the study was ...
- Graph Attention-based Adaptive Transfer Learning for Link Prediction : Abstract: Graph neural networks (GNNs) have brought revolutionary advancements to the field of link prediction (LP), providing powerful tools for mining potential relationships in graphs. However, exi...
- Interpretable Perturbation Modeling Through Biomedical Knowledge Graphs : Abstract: Understanding how small molecules perturb gene expression is essential for uncovering drug mechanisms, predicting off-target effects, and identifying repurposing opportunities. While prior d...
- Temporal Visual Semantics-Induced Human Motion Understanding with Large Language Models : Abstract: Unsupervised human motion segmentation (HMS) can be effectively achieved using subspace clustering techniques. However, traditional methods overlook the role of temporal semantic exploration...
- Amortized Inference for Model Rocket Aerodynamics: Learning to Estimate Physical Parameters from Simulation : Abstract: Accurate prediction of model rocket flight performance requires estimating aerodynamic parameters that are difficult to measure directly. Traditional approaches rely on computational fluid d...
- The Affine Divergence: Aligning Activation Updates Beyond Normalisation : Abstract: A systematic mismatch exists between mathematically ideal and effective activation updates during gradient descent. As intended, parameters update in their direction of steepest descent. How...
- Calibrating LLM Judges: Linear Probes for Fast and Reliable Uncertainty Estimation : Abstract: As LLM-based judges become integral to industry applications, obtaining well-calibrated uncertainty estimates efficiently has become critical for production deployment. However, existing tec...
- Predicting Mycotoxin Contamination in Irish Oats Using Deep and Transfer Learning : Abstract: Mycotoxin contamination poses a significant risk to cereal crop quality, food safety, and agricultural productivity. Accurate prediction of mycotoxin levels can support early intervention st...
- Fairness Evaluation of Risk Estimation Models for Lung Cancer Screening : Abstract: Lung cancer is the leading cause of cancer-related mortality in adults worldwide. Screening high-risk individuals with annual low-dose CT (LDCT) can support earlier detection and reduce deat...
- Enhanced geometry prediction in laser directed energy deposition using meta-learning : Abstract: Accurate bead geometry prediction in laser-directed energy deposition (L-DED) is often hindered by the scarcity and heterogeneity of experimental datasets collected under different materials...
- EvoXplain: When Machine Learning Models Agree on Predictions but Disagree on Why -- Measuring Mechanistic Multiplicity Across Training Runs : Abstract: Machine learning models are primarily judged by predictive performance, especially in applied settings. Once a model reaches high accuracy, its explanation is often assumed to be correct and...
- Masking Teacher and Reinforcing Student for Distilling Vision-Language Models : Abstract: Large-scale vision-language models (VLMs) have recently achieved remarkable multimodal understanding, but their massive size makes them impractical for deployment on mobile or edge devices. ...
- DiRL: An Efficient Post-Training Framework for Diffusion Language Models : Abstract: Diffusion Language Models (dLLMs) have emerged as promising alternatives to Auto-Regressive (AR) models. While recent efforts have validated their pre-training potential and accelerated infe...
- ReGAIN: Retrieval-Grounded AI Framework for Network Traffic Analysis : Abstract: Modern networks generate vast, heterogeneous traffic that must be continuously analyzed for security and performance. Traditional network traffic analysis systems, whether rule-based or mach...
- Müntz-Szász Networks: Neural Architectures with Learnable Power-Law Bases : Abstract: Standard neural network architectures employ fixed activation functions (ReLU, tanh, sigmoid) that are poorly suited for approximating functions with singular or fractional power behavior, a...
- Interpretable and Adaptive Node Classification on Heterophilic Graphs via Combinatorial Scoring and Hybrid Learning : Abstract: Graph neural networks (GNNs) achieve strong performance on homophilic graphs but often struggle under heterophily, where adjacent nodes frequently belong to different classes. We propose an ...
- On the Existence and Behaviour of Secondary Attention Sinks : Abstract: Attention sinks are tokens, often the beginning-of-sequence (BOS) token, that receive disproportionately high attention despite limited semantic relevance. In this work, we identify a class ...
- Transformer Reconstructed with Dynamic Value Attention : Abstract: Since transformer was firstly published in 2017, several works have been proposed to optimize it. However, the major structure of transformer remains unchanged, ignoring one of its main intr...
- Emotion-Inspired Learning Signals (EILS): A Homeostatic Framework for Adaptive Autonomous Agents : Abstract: The ruling method in modern Artificial Intelligence spanning from Deep Reinforcement Learning (DRL) to Large Language Models (LLMs) relies on a surge of static, externally defined reward fun...
- Frequency Regularization: Unveiling the Spectral Inductive Bias of Deep Neural Networks : Abstract: Regularization techniques such as L2 regularization (Weight Decay) and Dropout are fundamental to training deep neural networks, yet their underlying physical mechanisms regarding feature fr...
- Physics-Informed Machine Learning for Transformer Condition Monitoring -- Part I: Basic Concepts, Neural Networks, and Variants : Abstract: Power transformers are critical assets in power networks, whose reliability directly impacts grid resilience and stability. Traditional condition monitoring approaches, often rule-based or p...
- Physics-Informed Machine Learning for Transformer Condition Monitoring -- Part II: Physics-Informed Neural Networks and Uncertainty Quantification : Abstract: The integration of physics-based knowledge with machine learning models is increasingly shaping the monitoring, diagnostics, and prognostics of electrical transformers. In this two-part seri...
- Learning Tennis Strategy Through Curriculum-Based Dueling Double Deep Q-Networks : Abstract: Tennis strategy optimization is a challenging sequential decision-making problem involving hierarchical scoring, stochastic outcomes, long-horizon credit assignment, physical fatigue, and ad...
- Latent Sculpting for Zero-Shot Generalization: A Manifold Learning Approach to Out-of-Distribution Anomaly Detection : Abstract: A fundamental limitation of supervised deep learning in high-dimensional tabular domains is "Generalization Collapse": models learn precise decision boundaries for known distributions but fa...
- Wireless Traffic Prediction with Large Language Model : Abstract: The growing demand for intelligent, adaptive resource management in next-generation wireless networks has underscored the importance of accurate and scalable wireless traffic prediction. Whi...
- SoliReward: Mitigating Susceptibility to Reward Hacking and Annotation Noise in Video Generation Reward Models : Abstract: Post-training alignment of video generation models with human preferences is a critical goal. Developing effective Reward Models (RMs) for this process faces significant methodological hurdl...
- Towards Unsupervised Causal Representation Learning via Latent Additive Noise Model Causal Autoencoders : Abstract: Unsupervised representation learning seeks to recover latent generative factors, yet standard methods relying on statistical independence often fail to capture causal dependencies. A central...
- Pruning Graphs by Adversarial Robustness Evaluation to Strengthen GNN Defenses : Abstract: Graph Neural Networks (GNNs) have emerged as a dominant paradigm for learning on graph-structured data, thanks to their ability to jointly exploit node features and relational information en...
Research Sources: 553 | Generated: 12/30/2025
