AI Research News Feeds for December 29th, 2025

AI RESEARCH PAPERS & ACADEMIC SOURCES

Efficient Vision Mamba for MRI Super-Resolution via Hybrid Selective Scanning : Abstract: Background: High-resolution MRI is critical for diagnosis, but long acquisition times limit clinical use. Super-resolution (SR) can enhance resolution post-scan, yet existing deep learning m...
D2Pruner: Debiased Importance and Structural Diversity for MLLM Token Pruning : Abstract: Processing long visual token sequences poses a significant computational burden on Multimodal Large Language Models (MLLMs). While token pruning offers a path to acceleration, we find that c...
Non-Contrast CT Esophageal Varices Grading through Clinical Prior-Enhanced Multi-Organ Analysis : Abstract: Esophageal varices (EV) represent a critical complication of portal hypertension, affecting approximately 60% of cirrhosis patients with a significant bleeding risk of ~30%. While traditiona...
Total Normal Curvature Regularization and its Minimization for Surface and Image Smoothing : Abstract: We introduce a novel formulation for curvature regularization by penalizing normal curvatures from multiple directions. This total normal curvature regularization is capable of producing sol...
Multi-Part Object Representations via Graph Structures and Co-Part Discovery : Abstract: Discovering object-centric representations from images can significantly enhance the robustness, sample efficiency and generalizability of vision models. Works on images with multi-part obje...
AlignFreeNet: Is Cross-Modal Pre-Alignment Necessary? An End-to-End Alignment-Free Lightweight Network for Visible-Infrared Object Detection : Abstract: Cross-modal misalignments, such as spatial offsets, resolution discrepancies, and semantic deficiencies, frequently occur in visible-infrared object detection (VI-OD). To mitigate this, exis...
Self-Supervised Skeleton-Based Action Representation Learning: A Benchmark and Beyond : Abstract: Self-supervised learning (SSL), which aims to learn meaningful prior representations from unlabeled data, has been proven effective for skeleton-based action understanding. Different from th...
Co-Teaching for Unsupervised Domain Adaptation and Expansion : Abstract: Unsupervised Domain Adaptation (UDA) essentially trades a model's performance on a source domain for improving its performance on a target domain. To overcome this, Unsupervised Domain Expan...
SketchPlay: Intuitive Creation of Physically Realistic VR Content with Gesture-Driven Sketching : Abstract: Creating physically realistic content in VR often requires complex modeling tools or predefined 3D models, textures, and animations, which present significant barriers for non-expert users. ...
The Color-Clinical Decoupling: Why Perceptual Calibration Fails Clinical Biomarkers in Smartphone Dermatology : Abstract: Smartphone-based tele-dermatology assumes that colorimetric calibration ensures clinical reliability, yet this remains untested for underrepresented skin phototypes. We investigated whether ...
RT-Focuser: A Real-Time Lightweight Model for Edge-side Image Deblurring : Abstract: Motion blur caused by camera or object movement severely degrades image quality and poses challenges for real-time applications such as autonomous driving, UAV perception, and medical imagin...
Modified TSception for Analyzing Driver Drowsiness and Mental Workload from EEG : Abstract: Driver drowsiness remains a primary cause of traffic accidents, necessitating the development of real-time, reliable detection systems to ensure road safety. This study presents a Modified T...
A Graph-Augmented knowledge Distillation based Dual-Stream Vision Transformer with Region-Aware Attention for Gastrointestinal Disease Classification with Explainable AI : Abstract: The accurate classification of gastrointestinal diseases from endoscopic and histopathological imagery remains a significant challenge in medical diagnostics, mainly due to the vast data vol...
See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning : Abstract: Large vision-language models (VLMs) often benefit from intermediate visual cues, either injected via external tools or generated as latent visual tokens during reasoning, but these mechanism...
ProEdit: Inversion-based Editing From Prompts Done Right : Abstract: Inversion-based visual editing provides an effective and training-free way to edit an image or a video based on user instructions. Existing methods typically inject source image information ...
Learning Association via Track-Detection Matching for Multi-Object Tracking : Abstract: Multi-object tracking aims to maintain object identities over time by associating detections across video frames. Two dominant paradigms exist in literature: tracking-by-detection methods, w...
Yume-1.5: A Text-Controlled Interactive World Generation Model : Abstract: Recent approaches have demonstrated the promise of using diffusion models to generate interactive and explorable worlds. However, most of these methods face critical challenges such as exces...
MAI-UI Technical Report: Real-World Centric Foundation GUI Agents : Abstract: The development of GUI agents could revolutionize the next generation of human-computer interaction. Motivated by this vision, we present MAI-UI, a family of foundation GUI agents spanning t...
Backdoor Attacks on Prompt-Driven Video Segmentation Foundation Models : Abstract: Prompt-driven Video Segmentation Foundation Models (VSFMs) such as SAM2 are increasingly deployed in applications like autonomous driving and digital pathology, raising concerns about backdo...
Patch-Discontinuity Mining for Generalized Deepfake Detection : Abstract: The rapid advancement of generative artificial intelligence has enabled the creation of highly realistic fake facial images, posing serious threats to personal privacy and the integrity of o...
iSHIFT: Lightweight Slow-Fast GUI Agent with Adaptive Perception : Abstract: Multimodal Large Language Models (MLLMs) show strong potential for interpreting and interacting with complex, pixel-rich Graphical User Interface (GUI) environments. However, building agents...
A Lightweight Multi-Scale Attention Framework for Real-Time Spinal Endoscopic Instance Segmentation : Abstract: Real-time instance segmentation for spinal endoscopy is important for identifying and protecting critical anatomy during surgery, but it is difficult because of the narrow field of view, spe...
Perceive and Calibrate: Analyzing and Enhancing Robustness of Medical Multi-Modal Large Language Models : Abstract: Medical Multi-modal Large Language Models (MLLMs) have shown promising clinical performance. However, their sensitivity to real-world input perturbations, such as imaging artifacts and textu...
Automated Discovery of Parsimonious Spectral Indices via Normalized Difference Polynomials : Abstract: We introduce an automated way to find compact spectral indices for vegetation classification. The idea is to take all pairwise normalized differences from the spectral bands and then build p...
Patch as Node: Human-Centric Graph Representation Learning for Multimodal Action Recognition : Abstract: While human action recognition has witnessed notable achievements, multimodal methods fusing RGB and skeleton modalities still suffer from their inherent heterogeneity and fail to fully expl...
High-Fidelity and Long-Duration Human Image Animation with Diffusion Transformer : Abstract: Recent progress in diffusion models has significantly advanced the field of human image animation. While existing methods can generate temporally consistent results for short or regular moti...
CrownGen: Patient-customized Crown Generation via Point Diffusion Model : Abstract: Digital crown design remains a labor-intensive bottleneck in restorative dentistry. We present CrownGen, a generative framework that automates patient-customized crown design using ...
Reloc-VGGT: Visual Re-localization with Geometry Grounded Transformer : Abstract: Visual localization has traditionally been formulated as a pair-wise pose regression problem. Existing approaches mainly estimate relative poses between two images and employ a late-fusion s...
SLIM-Brain: A Data- and Training-Efficient Foundation Model for fMRI Data Analysis : Abstract: Foundation models are emerging as a powerful paradigm for fMRI analysis, but current approaches face a dual bottleneck of data- and training-efficiency. Atlas-based methods aggregate voxel s...
DPAR: Dynamic Patchification for Efficient Autoregressive Visual Generation : Abstract: Decoder-only autoregressive image generation typically relies on fixed-length tokenization schemes whose token counts grow quadratically with resolution, substantially increasing the computa...
EasyOmnimatte: Taming Pretrained Inpainting Diffusion Models for End-to-End Video Layered Decomposition : Abstract: Existing video omnimatte methods typically rely on slow, multi-stage, or inference-time optimization pipelines that fail to fully exploit powerful generative priors, producing suboptimal dec...
Training-free Conditional Image Embedding Framework Leveraging Large Vision Language Models : Abstract: Conditional image embeddings are feature representations that focus on specific aspects of an image indicated by a given textual condition (e.g., color, genre), which has been a challenging ...
Fast Inference of Visual Autoregressive Model with Adjacency-Adaptive Dynamical Draft Trees : Abstract: Autoregressive (AR) image models achieve diffusion-level quality but suffer from sequential inference, requiring approximately 2,000 steps for a 576x576 image. Speculative decoding with draf...
Breaking Alignment Barriers: TPS-Driven Semantic Correlation Learning for Alignment-Free RGB-T Salient Object Detection : Abstract: Existing RGB-T salient object detection methods predominantly rely on manually aligned and annotated datasets, struggling to handle real-world scenarios with raw, unaligned RGB-T image pairs...
End-to-End 3D Spatiotemporal Perception with Multimodal Fusion and V2X Collaboration : Abstract: Multi-view cooperative perception and multimodal fusion are essential for reliable 3D spatiotemporal understanding in autonomous driving, especially under occlusions, limited viewpoints, and...
Diffusion Posterior Sampling for Super-Resolution under Gaussian Measurement Noise : Abstract: This report studies diffusion posterior sampling (DPS) for single-image super-resolution (SISR) under a known degradation model. We implement a likelihood-guided sampling procedure that comb...
AI for Mycetoma Diagnosis in Histopathological Images: The MICCAI 2024 Challenge : Abstract: Mycetoma is a neglected tropical disease caused by fungi or bacteria leading to severe tissue damage and disabilities. It affects poor and rural communities and presents medical challenges a...
Scene-VLM: Multimodal Video Scene Segmentation via Vision-Language Models : Abstract: Segmenting long-form videos into semantically coherent scenes is a fundamental task in large-scale video understanding. Existing encoder-based methods are limited by visual-centric biases, c...
SyncAnyone: Implicit Disentanglement via Progressive Self-Correction for Lip-Syncing in the wild : Abstract: High-quality AI-powered video dubbing demands precise audio-lip synchronization, high-fidelity visual generation, and faithful preservation of identity and background. Most existing methods ...
Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation : Abstract: Real-time portrait animation is essential for interactive applications such as virtual assistants and live avatars, requiring high visual fidelity, temporal coherence, ultra-low latency, and...
AstraNav-World: World Model for Foresight Control and Consistency : Abstract: Embodied navigation in open, dynamic environments demands accurate foresight of how the world will evolve and how actions will unfold over time. We propose AstraNav-World, an end-to-end worl...
RAPTOR: Real-Time High-Resolution UAV Video Prediction with Efficient Video Attention : Abstract: Video prediction is plagued by a fundamental trilemma: achieving high-resolution and perceptual quality typically comes at the cost of real-time speed, hindering its use in latency-critical ...
Spatiotemporal-Untrammelled Mixture of Experts for Multi-Person Motion Prediction : Abstract: Comprehensively and flexibly capturing the complex spatio-temporal dependencies of human motion is critical for multi-person motion prediction. Existing methods grapple with two primary limi...
FUSE: Unifying Spectral and Semantic Cues for Robust AI-Generated Image Detection : Abstract: The fast evolution of generative models has heightened the demand for reliable detection of AI-generated images. To tackle this challenge, we introduce FUSE, a hybrid system that combines sp...
Prior-AttUNet: Retinal OCT Fluid Segmentation Based on Normal Anatomical Priors and Attention Gating : Abstract: Accurate segmentation of macular edema, a hallmark pathological feature in vision-threatening conditions such as age-related macular degeneration and diabetic macular edema, is essential for...
ShinyNeRF: Digitizing Anisotropic Appearance in Neural Radiance Fields : Abstract: Recent advances in digitization technologies have transformed the preservation and dissemination of cultural heritage. In this vein, Neural Radiance Fields (NeRF) have emerged as a leading t...
Analyzing the Mechanism of Attention Collapse in VGGT from a Dynamics Perspective : Abstract: Visual Geometry Grounded Transformer (VGGT) delivers state-of-the-art feed-forward 3D reconstruction, yet its global self-attention layer suffers from a drastic collapse phenomenon when the ...
SlideChain: Semantic Provenance for Lecture Understanding via Blockchain Registration : Abstract: Modern vision-language models (VLMs) are increasingly used to interpret and generate educational content, yet their semantic outputs remain challenging to verify, reproduce, and audit over ...
Contrastive Graph Modeling for Cross-Domain Few-Shot Medical Image Segmentation : Abstract: Cross-domain few-shot medical image segmentation (CD-FSMIS) offers a promising and data-efficient solution for medical applications where annotations are severely scarce and multimodal analy...
UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture : Abstract: Multimodal large language models (MLLMs) have achieved remarkable progress in visual understanding tasks such as visual grounding, segmentation, and captioning. However, their ability to per...
Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding : Abstract: Weather modeling requires both accurate prediction and mechanistic interpretation, yet existing methods treat these goals in isolation, separating generation from understanding. To address t...
Training-Free Disentangled Text-Guided Image Editing via Sparse Latent Constraints : Abstract: Text-driven image manipulation often suffers from attribute entanglement, where modifying a target attribute (e.g., adding bangs) unintentionally alters other semantic properties such as ide...
SymDrive: Realistic and Controllable Driving Simulator via Symmetric Auto-regressive Online Restoration : Abstract: High-fidelity and controllable 3D simulation is essential for addressing the long-tail data scarcity in Autonomous Driving (AD), yet existing methods struggle to simultaneously achieve photo...
CausalFSFG: Rethinking Few-Shot Fine-Grained Visual Categorization from Causal Perspective : Abstract: Few-shot fine-grained visual categorization (FS-FGVC) focuses on identifying various subcategories within a common superclass given just one or few support examples. Most existing methods ai...
TAMEing Long Contexts in Personalization: Towards Training-Free and State-Aware MLLM Personalized Assistant : Abstract: Multimodal Large Language Model (MLLM) Personalization is a critical research problem that facilitates personalized dialogues with MLLMs targeting specific entities (known as personalized co...
GaussianEM: Model compositional and conformational heterogeneity using 3D Gaussians : Abstract: Understanding protein flexibility and its dynamic interactions with other molecules is essential for protein function study. Cryogenic electron microscopy (cryo-EM) provides an opportunity t...
From Shallow Humor to Metaphor: Towards Label-Free Harmful Meme Detection via LMM Agent Self-Improvement : Abstract: The proliferation of harmful memes on online media poses significant risks to public health and stability. Existing detection methods heavily rely on large-scale labeled data for training, w...
UltraLBM-UNet: Ultralight Bidirectional Mamba-based Model for Skin Lesion Segmentation : Abstract: Skin lesion segmentation is a crucial step in dermatology for guiding clinical decision-making. However, existing methods for accurate, robust, and resource-efficient lesion analysis have li...
LLM-Free Image Captioning Evaluation in Reference-Flexible Settings : Abstract: We focus on the automatic evaluation of image captions in both reference-based and reference-free settings. Existing metrics based on large language models (LLMs) favor their own generations...
Toward Intelligent Scene Augmentation for Context-Aware Object Placement and Sponsor-Logo Integration : Abstract: Intelligent image editing increasingly relies on advances in computer vision, multimodal reasoning, and generative modeling. While vision-language models (VLMs) and diffusion models enable g...
EraseLoRA: MLLM-Driven Foreground Exclusion and Background Subtype Aggregation for Dataset-Free Object Removal : Abstract: Object removal differs from common inpainting, since it must prevent the masked target from reappearing and reconstruct the occluded background with structural and contextual fidelity, rathe...
Vision Transformers are Circulant Attention Learners : Abstract: The self-attention mechanism has been a key factor in the advancement of vision Transformers. However, its quadratic complexity imposes a heavy computational burden in high-resolution scenar...
MuS-Polar3D: A Benchmark Dataset for Computational Polarimetric 3D Imaging under Multi-Scattering Conditions : Abstract: Polarization-based underwater 3D imaging exploits polarization cues to suppress background scattering, exhibiting distinct advantages in turbid water. Although data-driven polarization-based...
Fixed-Threshold Evaluation of a Hybrid CNN-ViT for AI-Generated Image Detection Across Photos and Art : Abstract: AI image generators create both photorealistic images and stylized art, necessitating robust detectors that maintain performance under common post-processing transformations (JPEG compressio...
Fixed-Budget Parameter-Efficient Training with Frozen Encoders Improves Multimodal Chest X-Ray Classification : Abstract: Multimodal chest X-Ray analysis often fine-tunes large vision-language models, which is computationally costly. We study parameter-efficient training (PET) strategies, including frozen encod...
SVBench: Evaluation of Video Generation Models on Social Reasoning : Abstract: Recent text-to-video generation models exhibit remarkable progress in visual realism, motion fidelity, and text-video alignment, yet they remain fundamentally limited in their ability to gen...
Generative Multi-Focus Image Fusion : Abstract: Multi-focus image fusion aims to generate an all-in-focus image from a sequence of partially focused input images. Existing fusion algorithms generally assume that, for every spatial locatio...
IMA++: ISIC Archive Multi-Annotator Dermoscopic Skin Lesion Segmentation Dataset : Abstract: Multi-annotator medical image segmentation is an important research problem, but requires annotated datasets that are expensive to collect. Dermoscopic skin lesion imaging allows human exper...
Understanding Virality: A Rubric based Vision-Language Model Framework for Short-Form Edutainment Evaluation : Abstract: Evaluating short-form video content requires moving beyond surface-level quality metrics toward human-aligned, multimodal reasoning. While existing frameworks like VideoScore-2 assess visual...
MAD: Multi-Alignment MEG-to-Text Decoding : Abstract: Deciphering language from brain activity is a crucial task in brain-computer interface (BCI) research. Non-invasive cerebral signaling techniques including electroencephalography (EEG) and m...
Toward Secure and Compliant AI: Organizational Standards and Protocols for NLP Model Lifecycle Management : Abstract: Natural Language Processing (NLP) systems are increasingly used in sensitive domains such as healthcare, finance, and government, where they handle large volumes of personal and regulated da...
Context as a Tool: Context Management for Long-Horizon SWE-Agents : Abstract: Agents based on large language models have recently shown strong potential on real-world software engineering (SWE) tasks that require long-horizon interaction with repository-scale codebase...
Self-attention vector output similarities reveal how machines pay attention : Abstract: The self-attention mechanism has significantly advanced the field of natural language processing, facilitating the development of advanced language-learning machines. Although its utility is...
Broken Words, Broken Performance: Effect of Tokenization on Performance of LLMs : Abstract: Tokenization is the first step in training any Large Language Model (LLM), where the text is split into a sequence of tokens as per the model's fixed vocabulary. This tokenization in LLMs is...
SWE-RM: Execution-free Feedback For Software Engineering Agents : Abstract: Execution-based feedback like unit testing is widely used in the development of coding agents through test-time scaling (TTS) and reinforcement learning (RL). This paradigm requires scalable...
Accelerate Speculative Decoding with Sparse Computation in Verification : Abstract: Speculative decoding accelerates autoregressive language model inference by verifying multiple draft tokens in parallel. However, the verification stage often becomes the dominant computatio...
Explainable Statute Prediction via Attention-based Model and LLM Prompting : Abstract: In this paper, we explore the problem of automatic statute prediction where for a given case description, a subset of relevant statutes are to be predicted. Here, the term "statute" refers t...
TimeBill: Time-Budgeted Inference for Large Language Models : Abstract: Large Language Models (LLMs) are increasingly deployed in time-critical systems, such as robotics, autonomous driving, embodied intelligence, and industrial automation, where generating accu...
AlignAR: Generative Sentence Alignment for Arabic-English Parallel Corpora of Legal and Literary Texts : Abstract: High-quality parallel corpora are essential for Machine Translation (MT) research and translation teaching. However, Arabic-English resources remain scarce and existing datasets mainly consi...
Knowledge Reasoning of Large Language Models Integrating Graph-Structured Information for Pest and Disease Control in Tobacco : Abstract: This paper proposes a large language model (LLM) approach that integrates graph-structured information for knowledge reasoning in tobacco pest and disease control. Built upon the GraphRAG fr...
Method Decoration (DeMe): A Framework for LLM-Driven Adaptive Method Generation in Dynamic IoT Environments : Abstract: Intelligent IoT systems increasingly rely on large language models (LLMs) to generate task-execution methods for dynamic environments. However, existing approaches lack the ability to system...
On The Conceptualization and Societal Impact of Cross-Cultural Bias : Abstract: Research has shown that while large language models (LLMs) can generate their responses based on cultural context, they are not perfect and tend to generalize across cultures. However, when ...
Ara-HOPE: Human-Centric Post-Editing Evaluation for Dialectal Arabic to Modern Standard Arabic Translation : Abstract: Dialectal Arabic to Modern Standard Arabic (DA-MSA) translation is a challenging task in Machine Translation (MT) due to significant lexical, syntactic, and semantic divergences between Arab...
MoRAgent: Parameter Efficient Agent Tuning with Mixture-of-Roles : Abstract: Despite recent advancements of fine-tuning large language models (LLMs) to facilitate agent tasks, parameter-efficient fine-tuning (PEFT) methodologies for agent remain largely unexplored. I...
Heaven-Sent or Hell-Bent? Benchmarking the Intelligence and Defectiveness of LLM Hallucinations : Abstract: Hallucinations in large language models (LLMs) are commonly regarded as errors to be minimized. However, recent perspectives suggest that some hallucinations may encode creative or epistemic...
Rethinking Sample Polarity in Reinforcement Learning with Verifiable Rewards : Abstract: Large reasoning models (LRMs) are typically trained using reinforcement learning with verifiable reward (RLVR) to enhance their reasoning abilities. In this paradigm, policies are updated us...
Gamayun's Path to Multilingual Mastery: Cost-Efficient Training of a 1.5B-Parameter LLM : Abstract: We present Gamayun, a 1.5B-parameter multilingual language model trained entirely from scratch on 2.5T tokens. Designed for efficiency and deployment in resource-constrained environments, Ga...
Beyond Heuristics: A Decision-Theoretic Framework for Agent Memory Management : Abstract: External memory is a key component of modern large language model (LLM) systems, enabling long-term interaction and personalization. Despite its importance, memory management is still largel...
Robust Federated Learning in Unreliable Wireless Networks: A Client Selection Approach : Abstract: Federated learning (FL) has emerged as a promising distributed learning paradigm for training deep neural networks (DNNs) at the wireless edge, but its performance can be severely hindered b...
Generative Language Models on Nucleotide Sequences of Human Genes : Abstract: Language models, especially transformer-based ones, have achieved colossal success in NLP. To be precise, studies like BERT for NLU and works like GPT-3 for NLG are very important. If we con...
Robust Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models : Abstract: Unsupervised learning has been widely used in many real-world applications. One of the simplest and most important unsupervised learning models is the Gaussian mixture model (GMM). In this w...
Surrogate Representation Inference for Text and Image Annotations : Abstract: As researchers increasingly rely on machine learning models and LLMs to annotate unstructured data, such as texts or images, various approaches have been proposed to correct bias in downstre...
Bias-variance decompositions: the exclusive privilege of Bregman divergences : Abstract: Bias-variance decompositions are widely used to understand the generalization performance of machine learning models. While the squared error loss permits a straightforward decomposition, ot...
HopCast: Calibration of Autoregressive Dynamics Models : Abstract: Deep learning models are often trained to approximate dynamical systems that can be modeled using differential equations. Many of these models are optimized to predict one step ahead; such a...
Revisiting Bi-Encoder Neural Search: An Encoding-Searching Separation Perspective : Abstract: This paper reviews, analyzes, and proposes a new perspective on the bi-encoder architecture for neural search. While the bi-encoder architecture is widely used due to its simplicity and scal...
A Frobenius-Optimal Projection for Enforcing Linear Conservation in Learned Dynamical Models : Abstract: We consider the problem of restoring linear conservation laws in data-driven linear dynamical models. Given a learned operator $\widehat{A}$ and a full-rank constraint matrix $C$ encoding on...
Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling : Abstract: Energy consumption dictates the cost and environmental impact of deploying Large Language Models. This paper investigates the impact of on-chip SRAM size and operating frequency on the energ...
Look Closer! An Adversarial Parametric Editing Framework for Hallucination Mitigation in VLMs : Abstract: While Vision-Language Models (VLMs) have garnered increasing attention in the AI community due to their promising practical applications, they exhibit persistent hallucination issues, genera...
Modeling high dimensional point clouds with the spherical cluster model : Abstract: A parametric cluster model is a statistical model providing geometric insights into the points defining a cluster. The spherical cluster model (SC) approximates a finite point set $P\s...
Data relativistic uncertainty framework for low-illumination anime scenery image enhancement : Abstract: In contrast to prevailing work on low-light enhancement for natural images and videos, this study addresses low-illumination quality degradation in anime scenery images to bridge ...
AutoPP: Towards Automated Product Poster Generation and Optimization : Abstract: Product posters blend striking visuals with informative text to highlight the product and capture customer attention. However, crafting appealing posters and manually optimizing them based o...
Scalable Class-Incremental Learning Based on Parametric Neural Collapse : Abstract: Incremental learning often encounters challenges such as overfitting to new data and catastrophic forgetting of old data. Existing methods can effectively extend the model for new tasks while...
Tilt Matching for Scalable Sampling and Fine-Tuning : Abstract: We propose a simple, scalable algorithm for using stochastic interpolants to sample from unnormalized densities and for fine-tuning generative models. The approach, Tilt Matching, arises fro...
Few Tokens Matter: Entropy Guided Attacks on Vision-Language Models : Abstract: Vision-language models (VLMs) achieve remarkable performance but remain vulnerable to adversarial attacks. Entropy, a measure of model uncertainty, is strongly correlated with the reliabilit...
BertsWin: Resolving Topological Sparsity in 3D Masked Autoencoders via Component-Balanced Structural Optimization : Abstract: The application of self-supervised learning (SSL) and Vision Transformers (ViTs) approaches demonstrates promising results in the field of 2D medical imaging, but the use of these methods on...
Assessing the Effectiveness of Membership Inference on Generative Music : Abstract: Generative AI systems are quickly improving, now able to produce believable output in several modalities including images, text, and audio. However, this fast development has prompted increa...
The Deepfake Detective: Interpreting Neural Forensics Through Sparse Features and Manifolds : Abstract: Deepfake detection models have achieved high accuracy in identifying synthetic media, but their decision processes remain largely opaque. In this paper we present a mechanistic interpretabil...
Semantic Codebooks as Effective Priors for Neural Speech Compression : Abstract: Speech codecs are traditionally optimized for waveform fidelity, allocating bits to preserve acoustic detail even when much of it can be inferred from linguistic structure. This leads to ine...
Quantitative Verification of Omega-regular Properties in Probabilistic Programming : Abstract: Probabilistic programming provides a high-level framework for specifying statistical models as executable programs with built-in randomness and conditioning. Existing inference techniques, h...
Incorporating rank-free coupling and external field via an amplitude-only modulated spatial photonic Ising machine : Abstract: Ising machines have emerged as effective solvers for combinatorial optimization problems, such as NP-hard problems, machine learning, and financial modeling. Recent spatial photonic Ising ma...
nncase: An End-to-End Compiler for Efficient LLM Deployment on Heterogeneous Storage Architectures : Abstract: The efficient deployment of large language models (LLMs) is hindered by memory architecture heterogeneity, where traditional compilers suffer from fragmented workflows and high adaptation co...
Quantum Nondecimated Wavelet Transform: Theory, Circuits, and Applications : Abstract: The nondecimated or translation-invariant wavelet transform (NDWT) is a central tool in classical multiscale signal analysis, valued for its stability, redundancy, and shift invariance. This...
CCAD: Compressed Global Feature Conditioned Anomaly Detection : Abstract: Anomaly detection holds considerable industrial significance, especially in scenarios with limited anomalous data. Currently, reconstruction-based and unsupervised representation-based appro...
An approach to Fisher-Rao metric for infinite dimensional non-parametric information geometry : Abstract: Being infinite dimensional, non-parametric information geometry has long faced an "intractability barrier" due to the fact that the Fisher-Rao metric is now a functional incurring difficulti...
Fuzzwise: Intelligent Initial Corpus Generation for Fuzzing : Abstract: In mutation-based greybox fuzzing, generating high-quality input seeds for the initial corpus is essential for effective fuzzing. Rather than conducting separate phases for generating a larg...
Dynamic Attention (DynAttn): Interpretable High-Dimensional Spatio-Temporal Forecasting (with Application to Conflict Fatalities) : Abstract: Forecasting conflict-related fatalities remains a central challenge in political science and policy analysis due to the sparse, bursty, and highly non-stationary nature of violence data. We ...
Scalable Deep Subspace Clustering Network : Abstract: Subspace clustering methods face inherent scalability limits due to the $O(n^3)$ cost (with $n$ denoting the number of data samples) of constructing full $n\times n$ affinities and performin...
Cerberus: Multi-Agent Reasoning and Coverage-Guided Exploration for Static Detection of Runtime Errors : Abstract: In several software development scenarios, it is desirable to detect runtime errors and exceptions in code snippets without actual execution. A typical example is to detect runtime exception...
A Tool Bottleneck Framework for Clinically-Informed and Interpretable Medical Image Understanding : Abstract: Recent tool-use frameworks powered by vision-language models (VLMs) improve image understanding by grounding model predictions with specialized tools. Broadly, these frameworks leverage VLMs...
Learning to Reconfigure: Using Device Status to Select the Right Constrained Coding Scheme : Abstract: In the age of data revolution, a modern storage or transmission system typically requires different levels of protection. For example, the coding technique used to fortify data in a modern s...
Deep learning-enhanced dual-mode multiplexed optical sensor for point-of-care diagnostics of cardiovascular diseases : Abstract: Rapid and accessible cardiac biomarker testing is essential for the timely diagnosis and risk assessment of myocardial infarction (MI) and heart failure (HF), two interrelated conditions tha...
Sensitivity Analysis of the Consistency Assumption : Abstract: Sensitivity analysis informs causal inference by assessing the sensitivity of conclusions to departures from assumptions. The consistency assumption states that there are no hidden versions ...
Harnessing Data Spaces to Build Intelligent Smart City Infrastructures Across the Cloud-Edge Continuum : Abstract: Smart cities are increasingly adopting data-centric architectures to enhance the efficiency, sustainability, and resilience of urban services.
Explainable Multimodal Regression via Information Decomposition : Abstract: Multimodal regression aims to predict a continuous target from heterogeneous input sources and typically relies on fusion strategies such as early or late fusion. However, existing methods l...
Scaling Adversarial Training via Data Selection : Abstract: Projected Gradient Descent (PGD) is a strong and widely used first-order adversarial attack, yet its computational cost scales poorly, as all training samples undergo identical iterative inn...
Why Smooth Stability Assumptions Fail for ReLU Learning : Abstract: Stability analyses of modern learning systems are frequently derived under smoothness assumptions that are violated by ReLU-type nonlinearities. In this note, we isolate a minimal obstructio...
Direction Finding with Sparse Arrays Based on Variable Window Size Spatial Smoothing : Abstract: In this work, we introduce a variable window size (VWS) spatial smoothing framework that enhances coarray-based direction of arrival (DOA) estimation for sparse linear arrays. By compressing...
HWL-HIN: A Hypergraph-Level Hypergraph Isomorphism Network as Powerful as the Hypergraph Weisfeiler-Lehman Test with Application to Higher-Order Network Robustness : Abstract: Robustness in complex systems is of significant engineering and economic importance. However, conventional attack-based a posteriori robustness assessments incur prohibitive computational ov...
DuaDeep-SeqAffinity: Dual-Stream Deep Learning Framework for Sequence-Only Antigen-Antibody Affinity Prediction : Abstract: Predicting the binding affinity between antigens and antibodies is fundamental to drug discovery and vaccine development. Traditional computational approaches often rely on experimentally de...
Hybrid Combinatorial Multi-armed Bandits with Probabilistically Triggered Arms : Abstract: The problem of combinatorial multi-armed bandits with probabilistically triggered arms (CMAB-T) has been extensively studied. Prior work primarily focuses on either the online setting where ...
Exploring the Heterogeneity of Tabular Data: A Diversity-aware Data Generator via LLMs : Abstract: Tabular data generation has become increasingly essential for enabling robust machine learning applications, which require large-scale, high-quality data. Existing solutions leverage generat...
GQ-VAE: A gated quantized VAE for learning variable length tokens : Abstract: While most frontier models still use deterministic frequency-based tokenization algorithms such as byte-pair encoding (BPE), there has been significant recent work to design learned neural t...
Smart IoT-Based Leak Forecasting and Detection for Energy-Efficient Liquid Cooling in AI Data Centers : Abstract: AI data centers, which are GPU-centric, have adopted liquid cooling to handle extreme heat loads, but coolant leaks result in substantial energy loss through unplanned shutdowns and extended ...
Synthetic Financial Data Generation for Enhanced Financial Modelling : Abstract: Data scarcity and confidentiality in finance often impede model development and robust testing. This paper presents a unified multi-criteria evaluation framework for synthetic financial data...
VAMP-Net: An Interpretable Multi-Path Framework of Genomic Permutation-Invariant Set Attention and Quality-Aware 1D-CNN for MTB Drug Resistance : Abstract: Genomic prediction of drug resistance in Mycobacterium tuberculosis remains challenging due to complex epistatic interactions and highly variable sequencing data quality. We present a novel ...
Approximation Capabilities of Feedforward Neural Networks with GELU Activations : Abstract: We derive an approximation error bound that holds simultaneously for a function and all its derivatives up to any prescribed order. The bounds apply to elementary functions, including multiv...
Dynamic Feedback Engines: Layer-Wise Control for Self-Regulating Continual Learning : Abstract: Continual learning aims to acquire new tasks while preserving performance on previously learned ones, but most methods struggle with catastrophic forgetting. Existing approaches typically tr...
Dictionary-Transform Generative Adversarial Networks : Abstract: Generative adversarial networks (GANs) are widely used for distribution learning, yet their classical formulations remain theoretically fragile, with ill-posed objectives, unstable training ...
Rethinking Output Alignment For 1-bit Post-Training Quantization of Large Language Models : Abstract: Large Language Models (LLMs) deliver strong performance across a wide range of NLP tasks, but their massive sizes hinder deployment on resource-constrained devices. To reduce their computati...
Causal-HM: Restoring Physical Generative Logic in Multimodal Anomaly Detection via Hierarchical Modulation : Abstract: Multimodal Unsupervised Anomaly Detection (UAD) is critical for quality assurance in smart manufacturing, particularly in complex processes like robotic welding. However, existing methods of...
Mechanical Strength Prediction of Steel-Polypropylene Fiber-based High-Performance Concrete Using Hybrid Machine Learning Algorithms : Abstract: This research develops and evaluates machine learning models to predict the mechanical properties of steel-polypropylene fiber-reinforced high-performance concrete (HPC). Three model familie...
MAD-NG: Meta-Auto-Decoder Neural Galerkin Method for Solving Parametric Partial Differential Equations : Abstract: Parametric partial differential equations (PDEs) are fundamental for modeling a wide range of physical and engineering systems influenced by uncertain or varying parameters. Traditional neur...
A Data-Driven Multi-Objective Approach for Predicting Mechanical Performance, Flowability, and Porosity in Ultra-High-Performance Concrete (UHPC) : Abstract: This study presents a data-driven, multi-objective approach to predict the mechanical performance, flowability, and porosity of Ultra-High-Performance Concrete (UHPC). Out of 21 machine lea...
Robustness and Scalability Of Machine Learning for Imbalanced Clinical Data in Emergency and Critical Care : Abstract: Emergency and intensive care environments require predictive models that are both accurate and computationally efficient, yet clinical data in these settings are often severely imbalanced. S...
Videos are Sample-Efficient Supervisions: Behavior Cloning from Videos via Latent Representations : Abstract: Humans can efficiently extract knowledge and learn skills from videos within only a few trials and errors. However, it poses a big challenge to replicate this learning process for autono...
RefineBridge: Generative Bridge Models Improve Financial Forecasting by Foundation Models : Abstract: Financial time series forecasting is particularly challenging for transformer-based time series foundation models (TSFMs) due to non-stationarity, heavy-tailed distributions, and high-freque...
AnchorGK: Anchor-based Incremental and Stratified Graph Learning Framework for Inductive Spatio-Temporal Kriging : Abstract: Spatio-temporal kriging is a fundamental problem in sensor networks, driven by the sparsity of deployed sensors and the resulting missing observations. Although recent approaches model spati...
Discovering Sparse Recovery Algorithms Using Neural Architecture Search : Abstract: The design of novel algorithms for solving inverse problems in signal processing is an incredibly difficult, heuristic-driven, and time-consuming task. In this short paper, we the idea of au...
AVP-Fusion: Adaptive Multi-Modal Fusion and Contrastive Learning for Two-Stage Antiviral Peptide Identification : Abstract: Accurate identification of antiviral peptides (AVPs) is critical for accelerating novel drug development. However, current computational methods struggle to capture intricate sequence depend...
Generative Actor Critic : Abstract: Conventional Reinforcement Learning (RL) algorithms, typically focused on estimating or maximizing expected returns, face challenges when refining offline pretrained models with online exper...
First Provable Guarantees for Practical Private FL: Beyond Restrictive Assumptions : Abstract: Federated Learning (FL) enables collaborative training on decentralized data. Differential privacy (DP) is crucial for FL, but current private methods often rely on unrealistic assumptions (...
Global-Graph Guided and Local-Graph Weighted Contrastive Learning for Unified Clustering on Incomplete and Noise Multi-View Data : Abstract: Recently, contrastive learning (CL) has played an important role in exploring complementary information for multi-view clustering (MVC) and has attracted increasing attention. Nevertheless, real-...
Perplexity-Aware Data Scaling Law: Perplexity Landscapes Predict Performance for Continual Pre-training : Abstract: Continual Pre-training (CPT) serves as a fundamental approach for adapting foundation models to domain-specific applications. Scaling laws for pre-training define a power-law relationship be...
Missing Pattern Tree based Decision Grouping and Ensemble for Deep Incomplete Multi-View Clustering : Abstract: Real-world multi-view data usually exhibits highly inconsistent missing patterns which challenges the effectiveness of incomplete multi-view clustering (IMVC). Although existing IMVC methods...
When Bayesian Tensor Completion Meets Multioutput Gaussian Processes: Functional Universality and Rank Learning : Abstract: Functional tensor decomposition can analyze multi-dimensional data with real-valued indices, paving the path for applications in machine learning and signal processing. A limitation of exist...
Statistical vs. Deep Learning Models for Estimating Substance Overdose Excess Mortality in the US : Abstract: Substance overdose mortality in the United States claimed over 80,000 lives in 2023, with the COVID-19 pandemic exacerbating existing trends through healthcare disruptions and behavioral cha...
RLLaVA: An RL-central Framework for Language and Vision Assistants : Abstract: We present an RL-central framework for Language and Vision Assistants (RLLaVA) with its formulation of Markov decision process (MDP). RLLaVA decouples RL algorithmic logic from model archite...
An Equivariance Toolbox for Learning Dynamics : Abstract: Many theoretical results in deep learning can be traced to symmetry or equivariance of neural networks under parameter transformations. However, existing analyses are typically problem-speci...
DeepCQ: General-Purpose Deep-Surrogate Framework for Lossy Compression Quality Prediction : Abstract: Error-bounded lossy compression techniques have become vital for scientific data management and analytics, given the ever-increasing volume of data generated by modern scientific simulations...
A Survey of Freshness-Aware Wireless Networking with Reinforcement Learning : Abstract: The age of information (AoI) has become a central measure of data freshness in modern wireless systems, yet existing surveys either focus on classical AoI formulations or provide broad discu...
kooplearn: A Scikit-Learn Compatible Library of Algorithms for Evolution Operator Learning : Abstract: kooplearn is a machine-learning library that implements linear, kernel, and deep-learning estimators of dynamical operators and their spectral decompositions. kooplearn can model both discre...
A Reinforcement Learning Approach to Synthetic Data Generation : Abstract: Synthetic data generation (SDG) is a promising approach for enabling data sharing in biomedical studies while preserving patient privacy. Yet, state-of-the-art generative models often requir...
Physics-Informed Neural Solvers for Periodic Quantum Eigenproblems : Abstract: This thesis presents a physics-informed machine learning framework for solving the Floquet-Bloch eigenvalue problem associated with particles in two-dimensional periodic potentials, with a f...
A Causal Lens for Evaluating Faithfulness Metrics : Abstract: Large Language Models (LLMs) offer natural language explanations as an alternative to feature attribution methods for model interpretability. However, despite their plausibility, they may no...
An Exploration of Higher Education Course Evaluation by Large Language Models : Abstract: Course evaluation plays a critical role in ensuring instructional quality and guiding curriculum development in higher education. However, traditional evaluation methods, such as student sur...
GroupDebate: Enhancing the Efficiency of Multi-Agent Debate Using Group Discussion : Abstract: In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse NLP tasks. Extensive research has explored how to enhance the logical reasoning abiliti...
SCALA: Split Federated Learning with Concatenated Activations and Logit Adjustments : Abstract: Split Federated Learning (SFL) is a distributed machine learning framework which strategically divides the learning process between a server and clients and collaboratively trains a shared m...
Pre-training Vision Transformers with Formula-driven Supervised Learning : Abstract: In the present work, we show that the performance of formula-driven supervised learning (FDSL) can match or even exceed that of ImageNet-21k and can approach that of the JFT-300M dataset wit...
Creative Agents: Empowering Agents with Imagination for Creative Tasks : Abstract: We study building embodied agents for open-ended creative tasks. While existing methods build instruction-following agents that can perform diverse open-ended tasks, none of them demonstrate...
Agentic Structured Graph Traversal for Root Cause Analysis of Code-related Incidents in Cloud Applications : Abstract: Cloud incidents pose major operational challenges in production, with unresolved production cloud incidents costing on average over $2M per hour. Prior research identifies code- and configurati...
A2P-Vis: an Analyzer-to-Presenter Agentic Pipeline for Visual Insights Generation and Reporting : Abstract: Automating end-to-end data science pipelines with AI agents still stalls on two gaps: generating insightful, diverse visual evidence and assembling it into a coherent, professional report. We...
Introducing TrGLUE and SentiTurca: A Comprehensive Benchmark for Turkish General Language Understanding and Sentiment Analysis : Abstract: Evaluating the performance of various model architectures, such as transformers, large language models (LLMs), and other NLP systems, requires comprehensive benchmarks that measure performan...
Unifying Learning Dynamics and Generalization in Transformers Scaling Law : Abstract: The scaling law, a cornerstone of Large Language Model (LLM) development, predicts improvements in model performance with increasing computational resources. Yet, while empirically validated...
StreamAvatar: Streaming Diffusion Models for Real-Time Interactive Human Avatars : Abstract: Real-time, streaming interactive avatars represent a critical yet challenging goal in digital human research. Although diffusion-based human avatar generation methods achieve remarkable succ...
From In Silico to In Vitro: Evaluating Molecule Generative Models for Hit Generation : Abstract: Hit identification is a critical yet resource-intensive step in the drug discovery pipeline, traditionally relying on high-throughput screening of large compound libraries. Despite advanceme...
LibContinual: A Comprehensive Library towards Realistic Continual Learning : Abstract: A fundamental challenge in Continual Learning (CL) is catastrophic forgetting, where adapting to new tasks degrades the performance on previous ones. While the field has evolved with diverse...
Meta-Learning-Based Handover Management in NextG O-RAN : Abstract: While traditional handovers (THOs) have served as a backbone for mobile connectivity, they increasingly suffer from failures and delays, especially in dense deployments and high-frequency ba...
LongFly: Long-Horizon UAV Vision-and-Language Navigation with Spatiotemporal Context Integration : Abstract: Unmanned aerial vehicles (UAVs) are crucial tools for post-disaster search and rescue, facing challenges such as high information density, rapid changes in viewpoint, and dynamic structures,...
LVLM-Aided Alignment of Task-Specific Vision Models : Abstract: In high-stakes domains, small task-specific vision models are crucial due to their low computational requirements and the availability of numerous methods to explain their results. However, ...
Unsupervised Anomaly Detection in Brain MRI via Disentangled Anatomy Learning : Abstract: Detection of various lesions in brain MRI is clinically critical, but challenging due to the diversity of lesions and variability in imaging conditions. Current unsupervised learning methods...
Semiparametric Preference Optimization: Your Language Model is Secretly a Single-Index Model : Abstract: Aligning large language models to preference data is commonly implemented by assuming a known link function between the distribution of observed preferences and the unobserved rewards (e.g.,...
Flexible Multitask Learning with Factorized Diffusion Policy : Abstract: Multitask learning poses significant challenges due to the highly multimodal and diverse nature of robot action distributions. However, effectively fitting policies to these complex task dis...
MMCTOP: A Multimodal Textualization and Mixture-of-Experts Framework for Clinical Trial Outcome Prediction : Abstract: Addressing the challenge of multimodal data fusion in high-dimensional biomedical informatics, we propose MMCTOP, a MultiModal Clinical-Trial Outcome Prediction framework that integrates het...
Aerial World Model for Long-horizon Visual Generation and Navigation in 3D Space : Abstract: Unmanned aerial vehicles (UAVs) have emerged as powerful embodied agents. One of the core abilities is autonomous navigation in large-scale three-dimensional environments. Existing navigatio...
Optimizing Resource Allocation for Geographically-Distributed Inference by Large Language Models : Abstract: Large language models have demonstrated extraordinary performance in many AI tasks but are expensive to use, even after training, due to their requirement of high-end GPUs. Recently, a distr...
MASFIN: A Multi-Agent System for Decomposed Financial Reasoning and Forecasting : Abstract: Recent advances in large language models (LLMs) are transforming data-intensive domains, with finance representing a high-stakes environment where transparent and reproducible analysis of he...
CricBench: A Multilingual Benchmark for Evaluating LLMs in Cricket Analytics : Abstract: Cricket is the second most popular sport globally, commanding a massive following of over 2.5 billion fans. Enthusiasts and analysts frequently seek advanced statistical insights, s...
Bridging the Copyright Gap: Do Large Vision-Language Models Recognize and Respect Copyrighted Content? : Abstract: Large vision-language models (LVLMs) have achieved remarkable advancements in multimodal reasoning tasks. However, their widespread accessibility raises critical concerns about potential cop...
Secure and Explainable Fraud Detection in Finance via Hierarchical Multi-source Dataset Distillation : Abstract: We propose an explainable, privacy-preserving dataset distillation framework for collaborative financial fraud detection. A trained random forest is converted into transparent, axis-aligned ...
Balancing Accuracy and Efficiency: CNN Fusion Models for Diabetic Retinopathy Screening : Abstract: Diabetic retinopathy (DR) remains a leading cause of preventable blindness, yet large-scale screening is constrained by limited specialist availability and variable image quality across devi...
MoonBot: Modular and On-Demand Reconfigurable Robot Toward Moon Base Construction : Abstract: The allure of lunar surface exploration and development has recently captured widespread global attention. Robots have proved to be indispensable for exploring uncharted terrains, uncovering...
A Comedy of Estimators: On KL Regularization in RL Training of LLMs : Abstract: The reasoning performance of large language models (LLMs) can be substantially improved by training them with reinforcement learning (RL). The RL objective for LLM training involves a regula...
HeartBench: Probing Core Dimensions of Anthropomorphic Intelligence in LLMs : Abstract: While Large Language Models (LLMs) have achieved remarkable success in cognitive and reasoning benchmarks, they exhibit a persistent deficit in anthropomorphic intelligence, the capacity to n...
S&P 500 Stock's Movement Prediction using CNN : Abstract: This paper is about predicting the movement of the stocks that constitute the S&P 500 index. Historically, many approaches have been tried using various methods to predict the stock movement and ...
CellMamba: Adaptive Mamba for Accurate and Efficient Cell Detection : Abstract: Cell detection in pathological images presents unique challenges due to densely packed objects, subtle inter-class differences, and severe background clutter. In this paper, we propose CellM...
Applications of synthetic financial data in portfolio and risk modeling : Abstract: Synthetic financial data offers a practical way to address the privacy and accessibility challenges that limit research in quantitative finance. This paper examines the use of generative mod...
Multi-agent Adaptive Mechanism Design : Abstract: We study a sequential mechanism design problem in which a principal seeks to elicit truthful reports from multiple rational agents while starting with no prior knowledge of agents' beliefs. ...
Five Years of SciCap: What We Learned and Future Directions for Scientific Figure Captioning : Abstract: Between 2021 and 2025, the SciCap project grew from a small seed-funded idea at The Pennsylvania State University (Penn State) into one of the central efforts shaping the scientific figure-c...
InstructMoLE: Instruction-Guided Mixture of Low-rank Experts for Multi-Conditional Image Generation : Abstract: Parameter-Efficient Fine-Tuning of Diffusion Transformers (DiTs) for diverse, multi-conditional tasks often suffers from task interference when using monolithic adapters like LoRA. The Mixtu...
Inference-based GAN Video Generation : Abstract: Video generation has seen remarkable progress thanks to advancements in generative deep learning. Generated videos should not only display coherent and continuous movement but also meaning...
A-QCF-Net: An Adaptive Quaternion Cross-Fusion Network for Multimodal Liver Tumor Segmentation from Unpaired Datasets : Abstract: Multimodal medical imaging provides complementary information that is crucial for accurate delineation of pathology, but the development of deep learning models is limited by the scarcity of...
How Do Agents Perform Code Optimization? An Empirical Study : Abstract: Performance optimization is a critical yet challenging aspect of software development, often requiring a deep understanding of system behavior, algorithmic tradeoffs, and careful code modifi...
A Model of Causal Explanation on Neural Networks for Tabular Data : Abstract: The problem of explaining the results produced by machine learning methods continues to attract attention. Neural network (NN) models, along with gradient boosting machines, are expected to ...
HELP: Hierarchical Embodied Language Planner for Household Tasks : Abstract: Embodied agents tasked with complex scenarios, whether in real or simulated environments, rely heavily on robust planning capabilities. When instructions are formulated in natural language, ...
An Information Theoretic Perspective on Agentic System Design : Abstract: Agentic language model (LM) systems power modern applications like "Deep Research" and "Claude Code," and leverage multi-LM architectures to overcome context limitations. Beneath their appar...
Multiconnectivity for SAGIN: Current Trends, Challenges, AI-driven Solutions, and Opportunities : Abstract: Space-air-ground-integrated network (SAGIN)-enabled multiconnectivity (MC) is emerging as a key enabler for next-generation networks, enabling users to simultaneously utilize multiple links ...
CATCH: A Controllable Theme Detection Framework with Contextualized Clustering and Hierarchical Generation : Abstract: Theme detection is a fundamental task in user-centric dialogue systems, aiming to identify the latent topic of each utterance without relying on predefined schemas. Unlike intent induction, ...
Do Latent Tokens Think? A Causal and Adversarial Analysis of Chain-of-Continuous-Thought : Abstract: Latent tokens are gaining attention for enhancing reasoning in large language models (LLMs), yet their internal mechanisms remain unclear. This paper examines the problem from a reliability ...
Detecting AI-Generated Paraphrases in Bengali: A Comparative Study of Zero-Shot and Fine-Tuned Transformers : Abstract: Large language models (LLMs) can produce text that closely resembles human writing. This capability raises concerns about misuse, including disinformation and content manipulation. Detecting...
Enabling Conversational Behavior Reasoning Capabilities in Full-Duplex Speech : Abstract: Human conversation is organized by an implicit chain of thoughts that manifests as timed speech acts. Capturing this causal pathway is key to building natural full-duplex interactive systems...
Zero-Shot to Zero-Lies: Detecting Bengali Deepfake Audio through Transfer Learning : Abstract: The rapid growth of speech synthesis and voice conversion systems has made deepfake audio a major security concern. Bengali deepfake detection remains largely unexplored. In this work, we st...
BeHGAN: Bengali Handwritten Word Generation from Plain Text Using Generative Adversarial Networks : Abstract: Handwritten Text Recognition (HTR) is a well-established research area. In contrast, Handwritten Text Generation (HTG) is an emerging field with significant potential. This task is challengi...
RIPCN: A Road Impedance Principal Component Network for Probabilistic Traffic Flow Forecasting : Abstract: Accurate traffic flow forecasting is crucial for intelligent transportation services such as navigation and ride-hailing. In such applications, uncertainty estimation in forecasting is impor...
Comparative Analysis of Deep Learning Models for Perception in Autonomous Vehicles : Abstract: Recently, a plethora of machine learning (ML) and deep learning (DL) algorithms have been proposed to achieve the efficiency, safety, and reliability of autonomous vehicles (AVs). The AVs us...
Near-Optimal Coalition Structures in Polynomial Time : Abstract: We study the classical coalition structure generation (CSG) problem and compare the anytime behavior of three algorithmic paradigms: dynamic programming (DP), MILP branch-and-bound, and spar...
Structural Induced Exploration for Balanced and Scalable Multi-Robot Path Planning : Abstract: Multi-robot path planning is a fundamental yet challenging problem due to its combinatorial complexity and the need to balance global efficiency with fair task allocation among robots. Tradi...
Enabling Ultra-Fast Cardiovascular Imaging Across Heterogeneous Clinical Environments with a Generalist Foundation Model and Multimodal Database : Abstract: Multimodal cardiovascular magnetic resonance (CMR) imaging provides comprehensive and non-invasive insights into cardiovascular disease (CVD) diagnosis and underlying mechanisms. Despite dec...
Variance-Aware Prior-Based Tree Policies for Monte Carlo Tree Search : Abstract: Monte Carlo Tree Search (MCTS) has profoundly influenced reinforcement learning (RL) by integrating planning and learning in tasks requiring long-horizon reasoning, exemplified by the AlphaZ...
TrackTeller: Temporal Multimodal 3D Grounding for Behavior-Dependent Object References : Abstract: Understanding natural-language references to objects in dynamic 3D driving scenes is essential for interactive autonomous systems. In practice, many referring expressions describe targets th...
LLM-I2I: Boost Your Small Item2Item Recommendation Model with Large Language Model : Abstract: Item-to-Item (I2I) recommendation models are widely used in real-world systems due to their scalability, real-time capabilities, and high recommendation quality. Research to enhance I2I perf...
Residual Prior Diffusion: A Probabilistic Framework Integrating Coarse Latent Priors with Diffusion Models : Abstract: Diffusion models have become a central tool in deep generative modeling, but standard formulations rely on a single network and a single diffusion schedule to transform a simple prior, typic...
A Unified Definition of Hallucination, Or: It's the World Model, Stupid : Abstract: Despite numerous attempts to solve the issue of hallucination since the inception of neural language models, it remains a problem in even frontier large language models today. Why is this th...
Towards Long-window Anchoring in Vision-Language Model Distillation : Abstract: While large vision-language models (VLMs) demonstrate strong long-context understanding, their prevalent small branches fail on linguistics-photography alignment for a limited window size. W...
Exploration of Reproducible Generated Image Detection : Abstract: While the technology for detecting AI-Generated Content (AIGC) images has advanced rapidly, the field still faces two core issues: poor reproducibility and insufficient generalizability, wh...
Bidirectional Human-AI Alignment in Education for Trustworthy Learning Environments : Abstract: Artificial intelligence (AI) is transforming education, offering unprecedented opportunities to personalize learning, enhance assessment, and support educators. Yet these opportunities also ...
Human-AI Interaction Alignment: Designing, Evaluating, and Evolving Value-Centered AI For Reciprocal Human-AI Futures : Abstract: The rapid integration of generative AI into everyday life underscores the need to move beyond unidirectional alignment models that only adapt AI to human values. This workshop focuses on bid...
Hierarchy-Aware Fine-Tuning of Vision-Language Models : Abstract: Vision-Language Models (VLMs) learn powerful multimodal representations through large-scale image-text pretraining, but adapting them to hierarchical classification is underexplored. Standar...
Selective LLM-Guided Regularization for Enhancing Recommendation Models : Abstract: Large language models provide rich semantic priors and strong reasoning capabilities, making them promising auxiliary signals for recommendation. However, prevailing approaches either deploy...
DiverseGRPO: Mitigating Mode Collapse in Image Generation via Diversity-Aware GRPO : Abstract: Reinforcement learning (RL), particularly GRPO, improves image generation quality significantly by comparing the relative performance of images generated within the same group. However, in t...
MotionTeller: Multi-modal Integration of Wearable Time-Series with LLMs for Health and Behavioral Understanding : Abstract: As wearable sensing becomes increasingly pervasive, a key challenge remains: how can we generate natural language summaries from raw physiological signals such as actigraphy - minute-level m...
Oogiri-Master: Benchmarking Humor Understanding via Oogiri : Abstract: Humor is a salient testbed for human-like creative thinking in large language models (LLMs). We study humor using the Japanese creative response game Oogiri, in which participants produce wi...
Efficient MoE Inference with Fine-Grained Scheduling of Disaggregated Expert Parallelism : Abstract: The mixture-of-experts (MoE) architecture scales model size with sublinear computational increase but suffers from memory-intensive inference due to KV caches and sparse expert activation. R...
GPF-Net: Gated Progressive Fusion Learning for Polyp Re-Identification : Abstract: Colonoscopic Polyp Re-Identification aims to match the same polyp from a large gallery with images from different views taken using different cameras, which plays an important role in the pr...
Intelligent recognition of GPR road hidden defect images based on feature fusion and attention mechanism : Abstract: Ground Penetrating Radar (GPR) has emerged as a pivotal tool for non-destructive evaluation of subsurface road defects. However, conventional GPR image interpretation remains heavily reliant...
dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning : Abstract: Masked diffusion language models (MDLMs) offer the potential for parallel token generation, but most open-source MDLMs decode fewer than 5 tokens per model forward pass even with sophisticat...
Morality is Contextual: Learning Interpretable Moral Contexts from Human Data with Probabilistic Clustering and Large Language Models : Abstract: Moral actions are judged not only by their outcomes but by the context in which they occur. We present COMETH (Contextual Organization of Moral Evaluation from Textual Human inputs), a frame...
Teaching People LLM's Errors and Getting it Right : Abstract: People use large language models (LLMs) when they should not. This is partly because they see LLMs compose poems and answer intricate questions, so they understandably, but incorrectly, assu...
LLM-Driven Feature-Level Adversarial Attacks on Android Malware Detectors : Abstract: The rapid growth in both the scale and complexity of Android malware has driven the widespread adoption of machine learning (ML) techniques for scalable and accurate malware detection. Despi...
Safe Path Planning and Observation Quality Enhancement Strategy for Unmanned Aerial Vehicles in Water Quality Monitoring Tasks : Abstract: Unmanned Aerial Vehicle (UAV) spectral remote sensing technology is widely used in water quality monitoring. However, in dynamic environments, varying illumination conditions, such as shadow...
AInsteinBench: Benchmarking Coding Agents on Scientific Repositories : Abstract: We introduce AInsteinBench, a large-scale benchmark for evaluating whether large language model (LLM) agents can operate as scientific computing development agents within real research softw...
Reflection-Driven Control for Trustworthy Code Agents : Abstract: Contemporary large language model (LLM) agents are remarkably capable, but they still lack reliable safety controls and can produce unconstrained, unpredictable, and even actively harmful ou...
Multi-Agent LLM Committees for Autonomous Software Beta Testing : Abstract: Manual software beta testing is costly and time-consuming, while single-agent large language model (LLM) approaches suffer from hallucinations and inconsistent behavior. We propose a multi-a...
CosmoCore-Evo: Evolutionary Dream-Replay Reinforcement Learning for Adaptive Code Generation : Abstract: Building on the affective dream-replay reinforcement learning framework of CosmoCore, we introduce CosmoCore-Evo, an extension that incorporates evolutionary algorithms to enhance adaptabili...
Fairness Is Not Just Ethical: Performance Trade-Off via Data Correlation Tuning to Mitigate Bias in ML Software : Abstract: Traditional software fairness research typically emphasizes ethical and social imperatives, neglecting that fairness fundamentally represents a core software quality issue arising directly f...
Query Carefully: Detecting the Unanswerables in Text-to-SQL Tasks : Abstract: Text-to-SQL systems allow non-SQL experts to interact with relational databases using natural language. However, their tendency to generate executable SQL for ambiguous, out-of-scope, or una...
Atomistic Simulation Guided Convolutional Neural Networks for Thermal Modeling of Friction Stir Welding : Abstract: Accurate prediction of temperature evolution is essential for understanding thermomechanical behavior in friction stir welding. In this study, molecular dynamics simulations were performed u...
EcoNet: Multiagent Planning and Control Of Household Energy Resources Using Active Inference : Abstract: Advances in automated systems afford new opportunities for intelligent management of energy at household, local area, and utility scales. Home Energy Management Systems (HEMS) can play a rol...
Pruning as a Game: Equilibrium-Driven Sparsification of Neural Networks : Abstract: Neural network pruning is widely used to reduce model size and computational cost. Yet, most existing methods treat sparsity as an externally imposed constraint, enforced through heuristic i...
SpatialBench: Can Agents Analyze Real-World Spatial Biology Data? : Abstract: Spatial transcriptomics assays are rapidly increasing in scale and complexity, making computational analysis a major bottleneck in biological discovery. Although frontier AI agents have impr...
Accelerating Scientific Discovery with Autonomous Goal-evolving Agents : Abstract: There has been unprecedented interest in developing agents that expand the boundary of scientific discovery, primarily by optimizing quantitative objective functions specified by scientists....
Compliance Rating Scheme: A Data Provenance Framework for Generative AI Datasets : Abstract: Generative Artificial Intelligence (GAI) has experienced exponential growth in recent years, partly facilitated by the abundance of large-scale open-source datasets. These datasets are often...
Towards Responsible and Explainable AI Agents with Consensus-Driven Reasoning : Abstract: Agentic AI represents a major shift in how autonomous systems reason, plan, and execute multi-step tasks through the coordination of Large Language Models (LLMs), Vision Language Models (VLM...
Multiple-play Stochastic Bandits with Prioritized Arm Capacity Sharing : Abstract: This paper proposes a variant of multiple-play stochastic bandits tailored to resource allocation problems arising from LLM applications, edge intelligence, etc. The model is composed of $M$...
Democratizing Drug Discovery with an Orchestrated, Knowledge-Driven Multi-Agent Team for User-Guided Therapeutic Design : Abstract: Therapeutic discovery remains a formidable challenge, impeded by the fragmentation of specialized domains and the execution gap between computational design and physiological validation. Alt...
AMS-IO-Bench and AMS-IO-Agent: Benchmarking and Structured Reasoning for Analog and Mixed-Signal Integrated Circuit Input/Output Design : Abstract: In this paper, we propose AMS-IO-Agent, a domain-specialized LLM-based agent for structure-aware input/output (I/O) subsystem generation in analog and mixed-signal (AMS) integrated circuits ...
A Medical Multimodal Diagnostic Framework Integrating Vision-Language Models and Logic Tree Reasoning : Abstract: With the rapid growth of large language models (LLMs) and vision-language models (VLMs) in medicine, simply integrating clinical text and medical imaging does not guarantee reliable reasonin...
NEMO-4-PAYPAL: Leveraging NVIDIA's Nemo Framework for empowering PayPal's Commerce Agent : Abstract: We present the development and optimization of PayPal's Commerce Agent, powered by NEMO-4-PAYPAL, a multi-agent system designed to revolutionize agentic commerce on the PayPal platform. Thro...
Leash: Adaptive Length Penalty and Reward Shaping for Efficient Large Reasoning Model : Abstract: Existing approaches typically rely on fixed length penalties, but such penalties are hard to tune and fail to adapt to the evolving reasoning abilities of LLMs, leading to suboptimal trade-o...
LogicLens: Visual-Logical Co-Reasoning for Text-Centric Forgery Analysis : Abstract: Sophisticated text-centric forgeries, fueled by rapid AIGC advancements, pose a significant threat to societal security and information authenticity. Current methods for text-centric forgery...
Three-way decision with incomplete information based on similarity and satisfiability : Abstract: Three-way decision is widely applied with rough set theory to learn classification or decision rules. The approaches dealing with complete information are well established in the literature,...
Feasible strategies in three-way conflict analysis with three-valued ratings : Abstract: Most existing work on three-way conflict analysis has focused on trisecting agent pairs, agents, or issues, which contributes to understanding the nature of conflicts but falls short in addr...
Three-way conflict analysis based on alliance and conflict functions : Abstract: Trisecting agents, issues, and agent pairs are essential topics of three-way conflict analysis. They have been commonly studied based on either a rating or an auxiliary function. A rating fu...
A Study of Solving Life-and-Death Problems in Go Using Relevance-Zone Based Solvers : Abstract: This paper analyzes the behavior of solving Life-and-Death (L&D) problems in the game of Go using current state-of-the-art computer Go solvers with two techniques: the Relevance-Zone Based S...
From Visual Perception to Deep Empathy: An Automated Assessment Framework for House-Tree-Person Drawings Using Multimodal LLMs and Multi-Agent Collaboration : Abstract: Background: The House-Tree-Person (HTP) drawing test, introduced by John Buck in 1948, remains a widely used projective technique in clinical psychology. However, it has long faced challenge...

Research Sources: 264 | Generated: 12/29/2025