AI RESEARCH PAPERS & ACADEMIC SOURCES
- Efficient Vision Mamba for MRI Super-Resolution via Hybrid Selective Scanning : Abstract: Background: High-resolution MRI is critical for diagnosis, but long acquisition times limit clinical use. Super-resolution (SR) can enhance resolution post-scan, yet existing deep learning m...
- D2Pruner: Debiased Importance and Structural Diversity for MLLM Token Pruning : Abstract: Processing long visual token sequences poses a significant computational burden on Multimodal Large Language Models (MLLMs). While token pruning offers a path to acceleration, we find that c...
- Non-Contrast CT Esophageal Varices Grading through Clinical Prior-Enhanced Multi-Organ Analysis : Abstract: Esophageal varices (EV) represent a critical complication of portal hypertension, affecting approximately 60% of cirrhosis patients with a significant bleeding risk of ~30%. While traditiona...
- Total Normal Curvature Regularization and its Minimization for Surface and Image Smoothing : Abstract: We introduce a novel formulation for curvature regularization by penalizing normal curvatures from multiple directions. This total normal curvature regularization is capable of producing sol...
- Multi-Part Object Representations via Graph Structures and Co-Part Discovery : Abstract: Discovering object-centric representations from images can significantly enhance the robustness, sample efficiency and generalizability of vision models. Works on images with multi-part obje...
- AlignFreeNet: Is Cross-Modal Pre-Alignment Necessary? An End-to-End Alignment-Free Lightweight Network for Visible-Infrared Object Detection : Abstract: Cross-modal misalignments, such as spatial offsets, resolution discrepancies, and semantic deficiencies, frequently occur in visible-infrared object detection (VI-OD). To mitigate this, exis...
- Self-Supervised Skeleton-Based Action Representation Learning: A Benchmark and Beyond : Abstract: Self-supervised learning (SSL), which aims to learn meaningful prior representations from unlabeled data, has been proven effective for skeleton-based action understanding. Different from th...
- Co-Teaching for Unsupervised Domain Adaptation and Expansion : Abstract: Unsupervised Domain Adaptation (UDA) essentially trades a model's performance on a source domain for improving its performance on a target domain. To overcome this, Unsupervised Domain Expan...
- SketchPlay: Intuitive Creation of Physically Realistic VR Content with Gesture-Driven Sketching : Abstract: Creating physically realistic content in VR often requires complex modeling tools or predefined 3D models, textures, and animations, which present significant barriers for non-expert users. ...
- The Color-Clinical Decoupling: Why Perceptual Calibration Fails Clinical Biomarkers in Smartphone Dermatology : Abstract: Smartphone-based tele-dermatology assumes that colorimetric calibration ensures clinical reliability, yet this remains untested for underrepresented skin phototypes. We investigated whether ...
- RT-Focuser: A Real-Time Lightweight Model for Edge-side Image Deblurring : Abstract: Motion blur caused by camera or object movement severely degrades image quality and poses challenges for real-time applications such as autonomous driving, UAV perception, and medical imagin...
- Modified TSception for Analyzing Driver Drowsiness and Mental Workload from EEG : Abstract: Driver drowsiness remains a primary cause of traffic accidents, necessitating the development of real-time, reliable detection systems to ensure road safety. This study presents a Modified T...
- A Graph-Augmented knowledge Distillation based Dual-Stream Vision Transformer with Region-Aware Attention for Gastrointestinal Disease Classification with Explainable AI : Abstract: The accurate classification of gastrointestinal diseases from endoscopic and histopathological imagery remains a significant challenge in medical diagnostics, mainly due to the vast data vol...
- See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning : Abstract: Large vision-language models (VLMs) often benefit from intermediate visual cues, either injected via external tools or generated as latent visual tokens during reasoning, but these mechanism...
- ProEdit: Inversion-based Editing From Prompts Done Right : Abstract: Inversion-based visual editing provides an effective and training-free way to edit an image or a video based on user instructions. Existing methods typically inject source image information ...
- Learning Association via Track-Detection Matching for Multi-Object Tracking : Abstract: Multi-object tracking aims to maintain object identities over time by associating detections across video frames. Two dominant paradigms exist in literature: tracking-by-detection methods, w...
- Yume-1.5: A Text-Controlled Interactive World Generation Model : Abstract: Recent approaches have demonstrated the promise of using diffusion models to generate interactive and explorable worlds. However, most of these methods face critical challenges such as exces...
- MAI-UI Technical Report: Real-World Centric Foundation GUI Agents : Abstract: The development of GUI agents could revolutionize the next generation of human-computer interaction. Motivated by this vision, we present MAI-UI, a family of foundation GUI agents spanning t...
- Backdoor Attacks on Prompt-Driven Video Segmentation Foundation Models : Abstract: Prompt-driven Video Segmentation Foundation Models (VSFMs) such as SAM2 are increasingly deployed in applications like autonomous driving and digital pathology, raising concerns about backdo...
- Patch-Discontinuity Mining for Generalized Deepfake Detection : Abstract: The rapid advancement of generative artificial intelligence has enabled the creation of highly realistic fake facial images, posing serious threats to personal privacy and the integrity of o...
- iSHIFT: Lightweight Slow-Fast GUI Agent with Adaptive Perception : Abstract: Multimodal Large Language Models (MLLMs) show strong potential for interpreting and interacting with complex, pixel-rich Graphical User Interface (GUI) environments. However, building agents...
- A Lightweight Multi-Scale Attention Framework for Real-Time Spinal Endoscopic Instance Segmentation : Abstract: Real-time instance segmentation for spinal endoscopy is important for identifying and protecting critical anatomy during surgery, but it is difficult because of the narrow field of view, spe...
- Perceive and Calibrate: Analyzing and Enhancing Robustness of Medical Multi-Modal Large Language Models : Abstract: Medical Multi-modal Large Language Models (MLLMs) have shown promising clinical performance. However, their sensitivity to real-world input perturbations, such as imaging artifacts and textu...
- Automated Discovery of Parsimonious Spectral Indices via Normalized Difference Polynomials : Abstract: We introduce an automated way to find compact spectral indices for vegetation classification. The idea is to take all pairwise normalized differences from the spectral bands and then build p...
- Patch as Node: Human-Centric Graph Representation Learning for Multimodal Action Recognition : Abstract: While human action recognition has witnessed notable achievements, multimodal methods fusing RGB and skeleton modalities still suffer from their inherent heterogeneity and fail to fully expl...
- High-Fidelity and Long-Duration Human Image Animation with Diffusion Transformer : Abstract: Recent progress in diffusion models has significantly advanced the field of human image animation. While existing methods can generate temporally consistent results for short or regular moti...
- CrownGen: Patient-customized Crown Generation via Point Diffusion Model : Abstract: Digital crown design remains a labor-intensive bottleneck in restorative dentistry. We present CrownGen, a generative framework that automates patient-customized crown design using ...
- Reloc-VGGT: Visual Re-localization with Geometry Grounded Transformer : Abstract: Visual localization has traditionally been formulated as a pair-wise pose regression problem. Existing approaches mainly estimate relative poses between two images and employ a late-fusion s...
- SLIM-Brain: A Data- and Training-Efficient Foundation Model for fMRI Data Analysis : Abstract: Foundation models are emerging as a powerful paradigm for fMRI analysis, but current approaches face a dual bottleneck of data- and training-efficiency. Atlas-based methods aggregate voxel s...
- DPAR: Dynamic Patchification for Efficient Autoregressive Visual Generation : Abstract: Decoder-only autoregressive image generation typically relies on fixed-length tokenization schemes whose token counts grow quadratically with resolution, substantially increasing the computa...
- EasyOmnimatte: Taming Pretrained Inpainting Diffusion Models for End-to-End Video Layered Decomposition : Abstract: Existing video omnimatte methods typically rely on slow, multi-stage, or inference-time optimization pipelines that fail to fully exploit powerful generative priors, producing suboptimal dec...
- Training-free Conditional Image Embedding Framework Leveraging Large Vision Language Models : Abstract: Conditional image embeddings are feature representations that focus on specific aspects of an image indicated by a given textual condition (e.g., color, genre), which has been a challenging ...
- Fast Inference of Visual Autoregressive Model with Adjacency-Adaptive Dynamical Draft Trees : Abstract: Autoregressive (AR) image models achieve diffusion-level quality but suffer from sequential inference, requiring approximately 2,000 steps for a 576x576 image. Speculative decoding with draf...
- Breaking Alignment Barriers: TPS-Driven Semantic Correlation Learning for Alignment-Free RGB-T Salient Object Detection : Abstract: Existing RGB-T salient object detection methods predominantly rely on manually aligned and annotated datasets, struggling to handle real-world scenarios with raw, unaligned RGB-T image pairs...
- End-to-End 3D Spatiotemporal Perception with Multimodal Fusion and V2X Collaboration : Abstract: Multi-view cooperative perception and multimodal fusion are essential for reliable 3D spatiotemporal understanding in autonomous driving, especially under occlusions, limited viewpoints, and...
- Diffusion Posterior Sampling for Super-Resolution under Gaussian Measurement Noise : Abstract: This report studies diffusion posterior sampling (DPS) for single-image super-resolution (SISR) under a known degradation model. We implement a likelihood-guided sampling procedure that comb...
- AI for Mycetoma Diagnosis in Histopathological Images: The MICCAI 2024 Challenge : Abstract: Mycetoma is a neglected tropical disease caused by fungi or bacteria leading to severe tissue damage and disabilities. It affects poor and rural communities and presents medical challenges a...
- Scene-VLM: Multimodal Video Scene Segmentation via Vision-Language Models : Abstract: Segmenting long-form videos into semantically coherent scenes is a fundamental task in large-scale video understanding. Existing encoder-based methods are limited by visual-centric biases, c...
- SyncAnyone: Implicit Disentanglement via Progressive Self-Correction for Lip-Syncing in the wild : Abstract: High-quality AI-powered video dubbing demands precise audio-lip synchronization, high-fidelity visual generation, and faithful preservation of identity and background. Most existing methods ...
- Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation : Abstract: Real-time portrait animation is essential for interactive applications such as virtual assistants and live avatars, requiring high visual fidelity, temporal coherence, ultra-low latency, and...
- AstraNav-World: World Model for Foresight Control and Consistency : Abstract: Embodied navigation in open, dynamic environments demands accurate foresight of how the world will evolve and how actions will unfold over time. We propose AstraNav-World, an end-to-end worl...
- RAPTOR: Real-Time High-Resolution UAV Video Prediction with Efficient Video Attention : Abstract: Video prediction is plagued by a fundamental trilemma: achieving high-resolution and perceptual quality typically comes at the cost of real-time speed, hindering its use in latency-critical ...
- Spatiotemporal-Untrammelled Mixture of Experts for Multi-Person Motion Prediction : Abstract: Comprehensively and flexibly capturing the complex spatio-temporal dependencies of human motion is critical for multi-person motion prediction. Existing methods grapple with two primary limi...
- FUSE: Unifying Spectral and Semantic Cues for Robust AI-Generated Image Detection : Abstract: The fast evolution of generative models has heightened the demand for reliable detection of AI-generated images. To tackle this challenge, we introduce FUSE, a hybrid system that combines sp...
- Prior-AttUNet: Retinal OCT Fluid Segmentation Based on Normal Anatomical Priors and Attention Gating : Abstract: Accurate segmentation of macular edema, a hallmark pathological feature in vision-threatening conditions such as age-related macular degeneration and diabetic macular edema, is essential for...
- ShinyNeRF: Digitizing Anisotropic Appearance in Neural Radiance Fields : Abstract: Recent advances in digitization technologies have transformed the preservation and dissemination of cultural heritage. In this vein, Neural Radiance Fields (NeRF) have emerged as a leading t...
- Analyzing the Mechanism of Attention Collapse in VGGT from a Dynamics Perspective : Abstract: Visual Geometry Grounded Transformer (VGGT) delivers state-of-the-art feed-forward 3D reconstruction, yet its global self-attention layer suffers from a drastic collapse phenomenon when the ...
- SlideChain: Semantic Provenance for Lecture Understanding via Blockchain Registration : Abstract: Modern vision-language models (VLMs) are increasingly used to interpret and generate educational content, yet their semantic outputs remain challenging to verify, reproduce, and audit over ...
- Contrastive Graph Modeling for Cross-Domain Few-Shot Medical Image Segmentation : Abstract: Cross-domain few-shot medical image segmentation (CD-FSMIS) offers a promising and data-efficient solution for medical applications where annotations are severely scarce and multimodal analy...
- UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture : Abstract: Multimodal large language models (MLLMs) have achieved remarkable progress in visual understanding tasks such as visual grounding, segmentation, and captioning. However, their ability to per...
- Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding : Abstract: Weather modeling requires both accurate prediction and mechanistic interpretation, yet existing methods treat these goals in isolation, separating generation from understanding. To address t...
- Training-Free Disentangled Text-Guided Image Editing via Sparse Latent Constraints : Abstract: Text-driven image manipulation often suffers from attribute entanglement, where modifying a target attribute (e.g., adding bangs) unintentionally alters other semantic properties such as ide...
- SymDrive: Realistic and Controllable Driving Simulator via Symmetric Auto-regressive Online Restoration : Abstract: High-fidelity and controllable 3D simulation is essential for addressing the long-tail data scarcity in Autonomous Driving (AD), yet existing methods struggle to simultaneously achieve photo...
- CausalFSFG: Rethinking Few-Shot Fine-Grained Visual Categorization from Causal Perspective : Abstract: Few-shot fine-grained visual categorization (FS-FGVC) focuses on identifying various subcategories within a common superclass given just one or few support examples. Most existing methods ai...
- TAMEing Long Contexts in Personalization: Towards Training-Free and State-Aware MLLM Personalized Assistant : Abstract: Multimodal Large Language Model (MLLM) Personalization is a critical research problem that facilitates personalized dialogues with MLLMs targeting specific entities (known as personalized co...
- GaussianEM: Model compositional and conformational heterogeneity using 3D Gaussians : Abstract: Understanding protein flexibility and its dynamic interactions with other molecules is essential for protein function study. Cryogenic electron microscopy (cryo-EM) provides an opportunity t...
- From Shallow Humor to Metaphor: Towards Label-Free Harmful Meme Detection via LMM Agent Self-Improvement : Abstract: The proliferation of harmful memes on online media poses significant risks to public health and stability. Existing detection methods heavily rely on large-scale labeled data for training, w...
- UltraLBM-UNet: Ultralight Bidirectional Mamba-based Model for Skin Lesion Segmentation : Abstract: Skin lesion segmentation is a crucial step in dermatology for guiding clinical decision-making. However, existing methods for accurate, robust, and resource-efficient lesion analysis have li...
- LLM-Free Image Captioning Evaluation in Reference-Flexible Settings : Abstract: We focus on the automatic evaluation of image captions in both reference-based and reference-free settings. Existing metrics based on large language models (LLMs) favor their own generations...
- Toward Intelligent Scene Augmentation for Context-Aware Object Placement and Sponsor-Logo Integration : Abstract: Intelligent image editing increasingly relies on advances in computer vision, multimodal reasoning, and generative modeling. While vision-language models (VLMs) and diffusion models enable g...
- EraseLoRA: MLLM-Driven Foreground Exclusion and Background Subtype Aggregation for Dataset-Free Object Removal : Abstract: Object removal differs from common inpainting, since it must prevent the masked target from reappearing and reconstruct the occluded background with structural and contextual fidelity, rathe...
- Vision Transformers are Circulant Attention Learners : Abstract: The self-attention mechanism has been a key factor in the advancement of vision Transformers. However, its quadratic complexity imposes a heavy computational burden in high-resolution scenar...
- MuS-Polar3D: A Benchmark Dataset for Computational Polarimetric 3D Imaging under Multi-Scattering Conditions : Abstract: Polarization-based underwater 3D imaging exploits polarization cues to suppress background scattering, exhibiting distinct advantages in turbid water. Although data-driven polarization-based...
- Fixed-Threshold Evaluation of a Hybrid CNN-ViT for AI-Generated Image Detection Across Photos and Art : Abstract: AI image generators create both photorealistic images and stylized art, necessitating robust detectors that maintain performance under common post-processing transformations (JPEG compressio...
- Fixed-Budget Parameter-Efficient Training with Frozen Encoders Improves Multimodal Chest X-Ray Classification : Abstract: Multimodal chest X-Ray analysis often fine-tunes large vision-language models, which is computationally costly. We study parameter-efficient training (PET) strategies, including frozen encod...
- SVBench: Evaluation of Video Generation Models on Social Reasoning : Abstract: Recent text-to-video generation models exhibit remarkable progress in visual realism, motion fidelity, and text-video alignment, yet they remain fundamentally limited in their ability to gen...
- Generative Multi-Focus Image Fusion : Abstract: Multi-focus image fusion aims to generate an all-in-focus image from a sequence of partially focused input images. Existing fusion algorithms generally assume that, for every spatial locatio...
- IMA++: ISIC Archive Multi-Annotator Dermoscopic Skin Lesion Segmentation Dataset : Abstract: Multi-annotator medical image segmentation is an important research problem, but requires annotated datasets that are expensive to collect. Dermoscopic skin lesion imaging allows human exper...
- Understanding Virality: A Rubric based Vision-Language Model Framework for Short-Form Edutainment Evaluation : Abstract: Evaluating short-form video content requires moving beyond surface-level quality metrics toward human-aligned, multimodal reasoning. While existing frameworks like VideoScore-2 assess visual...
- MAD: Multi-Alignment MEG-to-Text Decoding : Abstract: Deciphering language from brain activity is a crucial task in brain-computer interface (BCI) research. Non-invasive cerebral signaling techniques including electroencephalography (EEG) and m...
- Toward Secure and Compliant AI: Organizational Standards and Protocols for NLP Model Lifecycle Management : Abstract: Natural Language Processing (NLP) systems are increasingly used in sensitive domains such as healthcare, finance, and government, where they handle large volumes of personal and regulated da...
- Context as a Tool: Context Management for Long-Horizon SWE-Agents : Abstract: Agents based on large language models have recently shown strong potential on real-world software engineering (SWE) tasks that require long-horizon interaction with repository-scale codebase...
- Self-attention vector output similarities reveal how machines pay attention : Abstract: The self-attention mechanism has significantly advanced the field of natural language processing, facilitating the development of advanced language-learning machines. Although its utility is...
- Broken Words, Broken Performance: Effect of Tokenization on Performance of LLMs : Abstract: Tokenization is the first step in training any Large Language Model (LLM), where the text is split into a sequence of tokens as per the model's fixed vocabulary. This tokenization in LLMs is...
- SWE-RM: Execution-free Feedback For Software Engineering Agents : Abstract: Execution-based feedback like unit testing is widely used in the development of coding agents through test-time scaling (TTS) and reinforcement learning (RL). This paradigm requires scalable...
- Accelerate Speculative Decoding with Sparse Computation in Verification : Abstract: Speculative decoding accelerates autoregressive language model inference by verifying multiple draft tokens in parallel. However, the verification stage often becomes the dominant computatio...
- Explainable Statute Prediction via Attention-based Model and LLM Prompting : Abstract: In this paper, we explore the problem of automatic statute prediction where, for a given case description, a subset of relevant statutes is to be predicted. Here, the term "statute" refers t...
- TimeBill: Time-Budgeted Inference for Large Language Models : Abstract: Large Language Models (LLMs) are increasingly deployed in time-critical systems, such as robotics, autonomous driving, embodied intelligence, and industrial automation, where generating accu...
- AlignAR: Generative Sentence Alignment for Arabic-English Parallel Corpora of Legal and Literary Texts : Abstract: High-quality parallel corpora are essential for Machine Translation (MT) research and translation teaching. However, Arabic-English resources remain scarce and existing datasets mainly consi...
- Knowledge Reasoning of Large Language Models Integrating Graph-Structured Information for Pest and Disease Control in Tobacco : Abstract: This paper proposes a large language model (LLM) approach that integrates graph-structured information for knowledge reasoning in tobacco pest and disease control. Built upon the GraphRAG fr...
- Method Decoration (DeMe): A Framework for LLM-Driven Adaptive Method Generation in Dynamic IoT Environments : Abstract: Intelligent IoT systems increasingly rely on large language models (LLMs) to generate task-execution methods for dynamic environments. However, existing approaches lack the ability to system...
- On The Conceptualization and Societal Impact of Cross-Cultural Bias : Abstract: Research has shown that while large language models (LLMs) can generate their responses based on cultural context, they are not perfect and tend to generalize across cultures. However, when ...
- Ara-HOPE: Human-Centric Post-Editing Evaluation for Dialectal Arabic to Modern Standard Arabic Translation : Abstract: Dialectal Arabic to Modern Standard Arabic (DA-MSA) translation is a challenging task in Machine Translation (MT) due to significant lexical, syntactic, and semantic divergences between Arab...
- MoRAgent: Parameter Efficient Agent Tuning with Mixture-of-Roles : Abstract: Despite recent advances in fine-tuning large language models (LLMs) to facilitate agent tasks, parameter-efficient fine-tuning (PEFT) methodologies for agents remain largely unexplored. I...
- Heaven-Sent or Hell-Bent? Benchmarking the Intelligence and Defectiveness of LLM Hallucinations : Abstract: Hallucinations in large language models (LLMs) are commonly regarded as errors to be minimized. However, recent perspectives suggest that some hallucinations may encode creative or epistemic...
- Rethinking Sample Polarity in Reinforcement Learning with Verifiable Rewards : Abstract: Large reasoning models (LRMs) are typically trained using reinforcement learning with verifiable reward (RLVR) to enhance their reasoning abilities. In this paradigm, policies are updated us...
- Gamayun's Path to Multilingual Mastery: Cost-Efficient Training of a 1.5B-Parameter LLM : Abstract: We present Gamayun, a 1.5B-parameter multilingual language model trained entirely from scratch on 2.5T tokens. Designed for efficiency and deployment in resource-constrained environments, Ga...
- Beyond Heuristics: A Decision-Theoretic Framework for Agent Memory Management : Abstract: External memory is a key component of modern large language model (LLM) systems, enabling long-term interaction and personalization. Despite its importance, memory management is still largel...
- Robust Federated Learning in Unreliable Wireless Networks: A Client Selection Approach : Abstract: Federated learning (FL) has emerged as a promising distributed learning paradigm for training deep neural networks (DNNs) at the wireless edge, but its performance can be severely hindered b...
- Generative Language Models on Nucleotide Sequences of Human Genes : Abstract: Language models, especially transformer-based ones, have achieved colossal success in NLP. To be precise, studies like BERT for NLU and works like GPT-3 for NLG are very important. If we con...
- Robust Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models : Abstract: Unsupervised learning has been widely used in many real-world applications. One of the simplest and most important unsupervised learning models is the Gaussian mixture model (GMM). In this w...
- Surrogate Representation Inference for Text and Image Annotations : Abstract: As researchers increasingly rely on machine learning models and LLMs to annotate unstructured data, such as texts or images, various approaches have been proposed to correct bias in downstre...
- Bias-variance decompositions: the exclusive privilege of Bregman divergences : Abstract: Bias-variance decompositions are widely used to understand the generalization performance of machine learning models. While the squared error loss permits a straightforward decomposition, ot...
- HopCast: Calibration of Autoregressive Dynamics Models : Abstract: Deep learning models are often trained to approximate dynamical systems that can be modeled using differential equations. Many of these models are optimized to predict one step ahead; such a...
- Revisiting Bi-Encoder Neural Search: An Encoding-Searching Separation Perspective : Abstract: This paper reviews, analyzes, and proposes a new perspective on the bi-encoder architecture for neural search. While the bi-encoder architecture is widely used due to its simplicity and scal...
- A Frobenius-Optimal Projection for Enforcing Linear Conservation in Learned Dynamical Models : Abstract: We consider the problem of restoring linear conservation laws in data-driven linear dynamical models. Given a learned operator $\widehat{A}$ and a full-rank constraint matrix $C$ encoding on...
- Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling : Abstract: Energy consumption dictates the cost and environmental impact of deploying Large Language Models. This paper investigates the impact of on-chip SRAM size and operating frequency on the energ...
- Look Closer! An Adversarial Parametric Editing Framework for Hallucination Mitigation in VLMs : Abstract: While Vision-Language Models (VLMs) have garnered increasing attention in the AI community due to their promising practical applications, they exhibit persistent hallucination issues, genera...
- Modeling high dimensional point clouds with the spherical cluster model : Abstract: A parametric cluster model is a statistical model providing geometric insights into the points defining a cluster. The spherical cluster model (SC) approximates a finite point set $P\s...
- Data relativistic uncertainty framework for low-illumination anime scenery image enhancement : Abstract: In contrast to prevailing work on low-light enhancement of natural images and videos, this study addresses low-illumination quality degradation in anime scenery images to bridge ...
- AutoPP: Towards Automated Product Poster Generation and Optimization : Abstract: Product posters blend striking visuals with informative text to highlight the product and capture customer attention. However, crafting appealing posters and manually optimizing them based o...
- Scalable Class-Incremental Learning Based on Parametric Neural Collapse : Abstract: Incremental learning often encounters challenges such as overfitting to new data and catastrophic forgetting of old data. Existing methods can effectively extend the model for new tasks while...
- Tilt Matching for Scalable Sampling and Fine-Tuning : Abstract: We propose a simple, scalable algorithm for using stochastic interpolants to sample from unnormalized densities and for fine-tuning generative models. The approach, Tilt Matching, arises fro...
- Few Tokens Matter: Entropy Guided Attacks on Vision-Language Models : Abstract: Vision-language models (VLMs) achieve remarkable performance but remain vulnerable to adversarial attacks. Entropy, a measure of model uncertainty, is strongly correlated with the reliabilit...
- BertsWin: Resolving Topological Sparsity in 3D Masked Autoencoders via Component-Balanced Structural Optimization : Abstract: The application of self-supervised learning (SSL) and Vision Transformer (ViT) approaches demonstrates promising results in the field of 2D medical imaging, but the use of these methods on...
- Assessing the Effectiveness of Membership Inference on Generative Music : Abstract: Generative AI systems are quickly improving, now able to produce believable output in several modalities including images, text, and audio. However, this fast development has prompted increa...
- The Deepfake Detective: Interpreting Neural Forensics Through Sparse Features and Manifolds : Abstract: Deepfake detection models have achieved high accuracy in identifying synthetic media, but their decision processes remain largely opaque. In this paper we present a mechanistic interpretabil...
- Semantic Codebooks as Effective Priors for Neural Speech Compression : Abstract: Speech codecs are traditionally optimized for waveform fidelity, allocating bits to preserve acoustic detail even when much of it can be inferred from linguistic structure. This leads to ine...
- Quantitative Verification of Omega-regular Properties in Probabilistic Programming : Abstract: Probabilistic programming provides a high-level framework for specifying statistical models as executable programs with built-in randomness and conditioning. Existing inference techniques, h...
- Incorporating rank-free coupling and external field via an amplitude-only modulated spatial photonic Ising machine : Abstract: Ising machines have emerged as effective solvers for combinatorial optimization problems, such as NP-hard problems, machine learning, and financial modeling. Recent spatial photonic Ising ma...
- nncase: An End-to-End Compiler for Efficient LLM Deployment on Heterogeneous Storage Architectures : Abstract: The efficient deployment of large language models (LLMs) is hindered by memory architecture heterogeneity, where traditional compilers suffer from fragmented workflows and high adaptation co...
- Quantum Nondecimated Wavelet Transform: Theory, Circuits, and Applications : Abstract: The nondecimated or translation-invariant wavelet transform (NDWT) is a central tool in classical multiscale signal analysis, valued for its stability, redundancy, and shift invariance. This...
- CCAD: Compressed Global Feature Conditioned Anomaly Detection : Abstract: Anomaly detection holds considerable industrial significance, especially in scenarios with limited anomalous data. Currently, reconstruction-based and unsupervised representation-based appro...
- An approach to Fisher-Rao metric for infinite dimensional non-parametric information geometry : Abstract: Being infinite dimensional, non-parametric information geometry has long faced an "intractability barrier" due to the fact that the Fisher-Rao metric is now a functional incurring difficulti...
- Fuzzwise: Intelligent Initial Corpus Generation for Fuzzing : Abstract: In mutation-based greybox fuzzing, generating high-quality input seeds for the initial corpus is essential for effective fuzzing. Rather than conducting separate phases for generating a larg...
- Dynamic Attention (DynAttn): Interpretable High-Dimensional Spatio-Temporal Forecasting (with Application to Conflict Fatalities) : Abstract: Forecasting conflict-related fatalities remains a central challenge in political science and policy analysis due to the sparse, bursty, and highly non-stationary nature of violence data. We ...
- Scalable Deep Subspace Clustering Network : Abstract: Subspace clustering methods face inherent scalability limits due to the $O(n^3)$ cost (with $n$ denoting the number of data samples) of constructing full $n\times n$ affinities and performin...
- Cerberus: Multi-Agent Reasoning and Coverage-Guided Exploration for Static Detection of Runtime Errors : Abstract: In several software development scenarios, it is desirable to detect runtime errors and exceptions in code snippets without actual execution. A typical example is to detect runtime exception...
- A Tool Bottleneck Framework for Clinically-Informed and Interpretable Medical Image Understanding : Abstract: Recent tool-use frameworks powered by vision-language models (VLMs) improve image understanding by grounding model predictions with specialized tools. Broadly, these frameworks leverage VLMs...
- Learning to Reconfigure: Using Device Status to Select the Right Constrained Coding Scheme : Abstract: In the age of data revolution, a modern storage or transmission system typically requires different levels of protection. For example, the coding technique used to fortify data in a modern s...
- Deep learning-enhanced dual-mode multiplexed optical sensor for point-of-care diagnostics of cardiovascular diseases : Abstract: Rapid and accessible cardiac biomarker testing is essential for the timely diagnosis and risk assessment of myocardial infarction (MI) and heart failure (HF), two interrelated conditions tha...
- Sensitivity Analysis of the Consistency Assumption : Abstract: Sensitivity analysis informs causal inference by assessing the sensitivity of conclusions to departures from assumptions. The consistency assumption states that there are no hidden versions ...
- Harnessing Data Spaces to Build Intelligent Smart City Infrastructures Across the Cloud-Edge Continuum : Abstract: Smart cities are increasingly adopting data-centric architectures to enhance the efficiency, sustainability, and resilience of urban services.
- Explainable Multimodal Regression via Information Decomposition : Abstract: Multimodal regression aims to predict a continuous target from heterogeneous input sources and typically relies on fusion strategies such as early or late fusion. However, existing methods l...
- Scaling Adversarial Training via Data Selection : Abstract: Projected Gradient Descent (PGD) is a strong and widely used first-order adversarial attack, yet its computational cost scales poorly, as all training samples undergo identical iterative inn...
- Why Smooth Stability Assumptions Fail for ReLU Learning : Abstract: Stability analyses of modern learning systems are frequently derived under smoothness assumptions that are violated by ReLU-type nonlinearities. In this note, we isolate a minimal obstructio...
- Direction Finding with Sparse Arrays Based on Variable Window Size Spatial Smoothing : Abstract: In this work, we introduce a variable window size (VWS) spatial smoothing framework that enhances coarray-based direction of arrival (DOA) estimation for sparse linear arrays. By compressing...
- HWL-HIN: A Hypergraph-Level Hypergraph Isomorphism Network as Powerful as the Hypergraph Weisfeiler-Lehman Test with Application to Higher-Order Network Robustness : Abstract: Robustness in complex systems is of significant engineering and economic importance. However, conventional attack-based a posteriori robustness assessments incur prohibitive computational ov...
- DuaDeep-SeqAffinity: Dual-Stream Deep Learning Framework for Sequence-Only Antigen-Antibody Affinity Prediction : Abstract: Predicting the binding affinity between antigens and antibodies is fundamental to drug discovery and vaccine development. Traditional computational approaches often rely on experimentally de...
- Hybrid Combinatorial Multi-armed Bandits with Probabilistically Triggered Arms : Abstract: The problem of combinatorial multi-armed bandits with probabilistically triggered arms (CMAB-T) has been extensively studied. Prior work primarily focuses on either the online setting where ...
- Exploring the Heterogeneity of Tabular Data: A Diversity-aware Data Generator via LLMs : Abstract: Tabular data generation has become increasingly essential for enabling robust machine learning applications, which require large-scale, high-quality data. Existing solutions leverage generat...
- GQ-VAE: A gated quantized VAE for learning variable length tokens : Abstract: While most frontier models still use deterministic frequency-based tokenization algorithms such as byte-pair encoding (BPE), there has been significant recent work to design learned neural t...
- Smart IoT-Based Leak Forecasting and Detection for Energy-Efficient Liquid Cooling in AI Data Centers : Abstract: AI data centers, which are GPU-centric, have adopted liquid cooling to handle extreme heat loads, but coolant leaks result in substantial energy loss through unplanned shutdowns and extended ...
- Synthetic Financial Data Generation for Enhanced Financial Modelling : Abstract: Data scarcity and confidentiality in finance often impede model development and robust testing. This paper presents a unified multi-criteria evaluation framework for synthetic financial data...
- VAMP-Net: An Interpretable Multi-Path Framework of Genomic Permutation-Invariant Set Attention and Quality-Aware 1D-CNN for MTB Drug Resistance : Abstract: Genomic prediction of drug resistance in Mycobacterium tuberculosis remains challenging due to complex epistatic interactions and highly variable sequencing data quality. We present a novel ...
- Approximation Capabilities of Feedforward Neural Networks with GELU Activations : Abstract: We derive an approximation error bound that holds simultaneously for a function and all its derivatives up to any prescribed order. The bounds apply to elementary functions, including multiv...
- Dynamic Feedback Engines: Layer-Wise Control for Self-Regulating Continual Learning : Abstract: Continual learning aims to acquire new tasks while preserving performance on previously learned ones, but most methods struggle with catastrophic forgetting. Existing approaches typically tr...
- Dictionary-Transform Generative Adversarial Networks : Abstract: Generative adversarial networks (GANs) are widely used for distribution learning, yet their classical formulations remain theoretically fragile, with ill-posed objectives, unstable training ...
- Rethinking Output Alignment For 1-bit Post-Training Quantization of Large Language Models : Abstract: Large Language Models (LLMs) deliver strong performance across a wide range of NLP tasks, but their massive sizes hinder deployment on resource-constrained devices. To reduce their computati...
- Causal-HM: Restoring Physical Generative Logic in Multimodal Anomaly Detection via Hierarchical Modulation : Abstract: Multimodal Unsupervised Anomaly Detection (UAD) is critical for quality assurance in smart manufacturing, particularly in complex processes like robotic welding. However, existing methods of...
- Mechanical Strength Prediction of Steel-Polypropylene Fiber-based High-Performance Concrete Using Hybrid Machine Learning Algorithms : Abstract: This research develops and evaluates machine learning models to predict the mechanical properties of steel-polypropylene fiber-reinforced high-performance concrete (HPC). Three model familie...
- MAD-NG: Meta-Auto-Decoder Neural Galerkin Method for Solving Parametric Partial Differential Equations : Abstract: Parametric partial differential equations (PDEs) are fundamental for modeling a wide range of physical and engineering systems influenced by uncertain or varying parameters. Traditional neur...
- A Data-Driven Multi-Objective Approach for Predicting Mechanical Performance, Flowability, and Porosity in Ultra-High-Performance Concrete (UHPC) : Abstract: This study presents a data-driven, multi-objective approach to predict the mechanical performance, flowability, and porosity of Ultra-High-Performance Concrete (UHPC). Out of 21 machine lea...
- Robustness and Scalability Of Machine Learning for Imbalanced Clinical Data in Emergency and Critical Care : Abstract: Emergency and intensive care environments require predictive models that are both accurate and computationally efficient, yet clinical data in these settings are often severely imbalanced. S...
- Videos are Sample-Efficient Supervisions: Behavior Cloning from Videos via Latent Representations : Abstract: Humans can efficiently extract knowledge and learn skills from the videos within only a few trials and errors. However, it poses a big challenge to replicate this learning process for autono...
- RefineBridge: Generative Bridge Models Improve Financial Forecasting by Foundation Models : Abstract: Financial time series forecasting is particularly challenging for transformer-based time series foundation models (TSFMs) due to non-stationarity, heavy-tailed distributions, and high-freque...
- AnchorGK: Anchor-based Incremental and Stratified Graph Learning Framework for Inductive Spatio-Temporal Kriging : Abstract: Spatio-temporal kriging is a fundamental problem in sensor networks, driven by the sparsity of deployed sensors and the resulting missing observations. Although recent approaches model spati...
- Discovering Sparse Recovery Algorithms Using Neural Architecture Search : Abstract: The design of novel algorithms for solving inverse problems in signal processing is an incredibly difficult, heuristic-driven, and time-consuming task. In this short paper, we the idea of au...
- AVP-Fusion: Adaptive Multi-Modal Fusion and Contrastive Learning for Two-Stage Antiviral Peptide Identification : Abstract: Accurate identification of antiviral peptides (AVPs) is critical for accelerating novel drug development. However, current computational methods struggle to capture intricate sequence depend...
- Generative Actor Critic : Abstract: Conventional Reinforcement Learning (RL) algorithms, typically focused on estimating or maximizing expected returns, face challenges when refining offline pretrained models with online exper...
- First Provable Guarantees for Practical Private FL: Beyond Restrictive Assumptions : Abstract: Federated Learning (FL) enables collaborative training on decentralized data. Differential privacy (DP) is crucial for FL, but current private methods often rely on unrealistic assumptions (...
- Global-Graph Guided and Local-Graph Weighted Contrastive Learning for Unified Clustering on Incomplete and Noise Multi-View Data : Abstract: Recently, contrastive learning (CL) plays an important role in exploring complementary information for multi-view clustering (MVC) and has attracted increasing attention. Nevertheless, real-...
- Perplexity-Aware Data Scaling Law: Perplexity Landscapes Predict Performance for Continual Pre-training : Abstract: Continual Pre-training (CPT) serves as a fundamental approach for adapting foundation models to domain-specific applications. Scaling laws for pre-training define a power-law relationship be...
- Missing Pattern Tree based Decision Grouping and Ensemble for Deep Incomplete Multi-View Clustering : Abstract: Real-world multi-view data usually exhibits highly inconsistent missing patterns which challenges the effectiveness of incomplete multi-view clustering (IMVC). Although existing IMVC methods...
- When Bayesian Tensor Completion Meets Multioutput Gaussian Processes: Functional Universality and Rank Learning : Abstract: Functional tensor decomposition can analyze multi-dimensional data with real-valued indices, paving the path for applications in machine learning and signal processing. A limitation of exist...
- Statistical vs. Deep Learning Models for Estimating Substance Overdose Excess Mortality in the US : Abstract: Substance overdose mortality in the United States claimed over 80,000 lives in 2023, with the COVID-19 pandemic exacerbating existing trends through healthcare disruptions and behavioral cha...
- RLLaVA: An RL-central Framework for Language and Vision Assistants : Abstract: We present an RL-central framework for Language and Vision Assistants (RLLaVA) with its formulation of Markov decision process (MDP). RLLaVA decouples RL algorithmic logic from model archite...
- An Equivariance Toolbox for Learning Dynamics : Abstract: Many theoretical results in deep learning can be traced to symmetry or equivariance of neural networks under parameter transformations. However, existing analyses are typically problem-speci...
- DeepCQ: General-Purpose Deep-Surrogate Framework for Lossy Compression Quality Prediction : Abstract: Error-bounded lossy compression techniques have become vital for scientific data management and analytics, given the ever-increasing volume of data generated by modern scientific simulations...
- A Survey of Freshness-Aware Wireless Networking with Reinforcement Learning : Abstract: The age of information (AoI) has become a central measure of data freshness in modern wireless systems, yet existing surveys either focus on classical AoI formulations or provide broad discu...
- kooplearn: A Scikit-Learn Compatible Library of Algorithms for Evolution Operator Learning : Abstract: kooplearn is a machine-learning library that implements linear, kernel, and deep-learning estimators of dynamical operators and their spectral decompositions. kooplearn can model both discre...
- A Reinforcement Learning Approach to Synthetic Data Generation : Abstract: Synthetic data generation (SDG) is a promising approach for enabling data sharing in biomedical studies while preserving patient privacy. Yet, state-of-the-art generative models often requir...
- Physics-Informed Neural Solvers for Periodic Quantum Eigenproblems : Abstract: This thesis presents a physics-informed machine learning framework for solving the Floquet-Bloch eigenvalue problem associated with particles in two-dimensional periodic potentials, with a f...
- A Causal Lens for Evaluating Faithfulness Metrics : Abstract: Large Language Models (LLMs) offer natural language explanations as an alternative to feature attribution methods for model interpretability. However, despite their plausibility, they may no...
- An Exploration of Higher Education Course Evaluation by Large Language Models : Abstract: Course evaluation plays a critical role in ensuring instructional quality and guiding curriculum development in higher education. However, traditional evaluation methods, such as student sur...
- GroupDebate: Enhancing the Efficiency of Multi-Agent Debate Using Group Discussion : Abstract: In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse NLP tasks. Extensive research has explored how to enhance the logical reasoning abiliti...
- SCALA: Split Federated Learning with Concatenated Activations and Logit Adjustments : Abstract: Split Federated Learning (SFL) is a distributed machine learning framework which strategically divides the learning process between a server and clients and collaboratively trains a shared m...
- Pre-training Vision Transformers with Formula-driven Supervised Learning : Abstract: In the present work, we show that the performance of formula-driven supervised learning (FDSL) can match or even exceed that of ImageNet-21k and can approach that of the JFT-300M dataset wit...
- Creative Agents: Empowering Agents with Imagination for Creative Tasks : Abstract: We study building embodied agents for open-ended creative tasks. While existing methods build instruction-following agents that can perform diverse open-ended tasks, none of them demonstrate...
- Agentic Structured Graph Traversal for Root Cause Analysis of Code-related Incidents in Cloud Applications : Abstract: Cloud incidents pose major operational challenges in production, with unresolved production cloud incidents costing on average over $2M per hour. Prior research identifies code- and configurati...
- A2P-Vis: an Analyzer-to-Presenter Agentic Pipeline for Visual Insights Generation and Reporting : Abstract: Automating end-to-end data science pipeline with AI agents still stalls on two gaps: generating insightful, diverse visual evidence and assembling it into a coherent, professional report. We...
- Introducing TrGLUE and SentiTurca: A Comprehensive Benchmark for Turkish General Language Understanding and Sentiment Analysis : Abstract: Evaluating the performance of various model architectures, such as transformers, large language models (LLMs), and other NLP systems, requires comprehensive benchmarks that measure performan...
- Unifying Learning Dynamics and Generalization in Transformers Scaling Law : Abstract: The scaling law, a cornerstone of Large Language Model (LLM) development, predicts improvements in model performance with increasing computational resources. Yet, while empirically validated...
- StreamAvatar: Streaming Diffusion Models for Real-Time Interactive Human Avatars : Abstract: Real-time, streaming interactive avatars represent a critical yet challenging goal in digital human research. Although diffusion-based human avatar generation methods achieve remarkable succ...
- From In Silico to In Vitro: Evaluating Molecule Generative Models for Hit Generation : Abstract: Hit identification is a critical yet resource-intensive step in the drug discovery pipeline, traditionally relying on high-throughput screening of large compound libraries. Despite advanceme...
- LibContinual: A Comprehensive Library towards Realistic Continual Learning : Abstract: A fundamental challenge in Continual Learning (CL) is catastrophic forgetting, where adapting to new tasks degrades the performance on previous ones. While the field has evolved with diverse...
- Meta-Learning-Based Handover Management in NextG O-RAN : Abstract: While traditional handovers (THOs) have served as a backbone for mobile connectivity, they increasingly suffer from failures and delays, especially in dense deployments and high-frequency ba...
- LongFly: Long-Horizon UAV Vision-and-Language Navigation with Spatiotemporal Context Integration : Abstract: Unmanned aerial vehicles (UAVs) are crucial tools for post-disaster search and rescue, facing challenges such as high information density, rapid changes in viewpoint, and dynamic structures,...
- LVLM-Aided Alignment of Task-Specific Vision Models : Abstract: In high-stakes domains, small task-specific vision models are crucial due to their low computational requirements and the availability of numerous methods to explain their results. However, ...
- Unsupervised Anomaly Detection in Brain MRI via Disentangled Anatomy Learning : Abstract: Detection of various lesions in brain MRI is clinically critical, but challenging due to the diversity of lesions and variability in imaging conditions. Current unsupervised learning methods...
- Semiparametric Preference Optimization: Your Language Model is Secretly a Single-Index Model : Abstract: Aligning large language models to preference data is commonly implemented by assuming a known link function between the distribution of observed preferences and the unobserved rewards (e.g.,...
- Flexible Multitask Learning with Factorized Diffusion Policy : Abstract: Multitask learning poses significant challenges due to the highly multimodal and diverse nature of robot action distributions. However, effectively fitting policies to these complex task dis...
- MMCTOP: A Multimodal Textualization and Mixture-of-Experts Framework for Clinical Trial Outcome Prediction : Abstract: Addressing the challenge of multimodal data fusion in high-dimensional biomedical informatics, we propose MMCTOP, a MultiModal Clinical-Trial Outcome Prediction framework that integrates het...
- Aerial World Model for Long-horizon Visual Generation and Navigation in 3D Space : Abstract: Unmanned aerial vehicles (UAVs) have emerged as powerful embodied agents. One of the core abilities is autonomous navigation in large-scale three-dimensional environments. Existing navigatio...
- Optimizing Resource Allocation for Geographically-Distributed Inference by Large Language Models : Abstract: Large language models have demonstrated extraordinary performance in many AI tasks but are expensive to use, even after training, due to their requirement of high-end GPUs. Recently, a distr...
- MASFIN: A Multi-Agent System for Decomposed Financial Reasoning and Forecasting : Abstract: Recent advances in large language models (LLMs) are transforming data-intensive domains, with finance representing a high-stakes environment where transparent and reproducible analysis of he...
- CricBench: A Multilingual Benchmark for Evaluating LLMs in Cricket Analytics : Abstract: Cricket is the second most popular sport globally, commanding a massive following of over 2.5 billion fans globally. Enthusiasts and analysts frequently seek advanced statistical insights, s...
- Bridging the Copyright Gap: Do Large Vision-Language Models Recognize and Respect Copyrighted Content? : Abstract: Large vision-language models (LVLMs) have achieved remarkable advancements in multimodal reasoning tasks. However, their widespread accessibility raises critical concerns about potential cop...
- Secure and Explainable Fraud Detection in Finance via Hierarchical Multi-source Dataset Distillation : Abstract: We propose an explainable, privacy-preserving dataset distillation framework for collaborative financial fraud detection. A trained random forest is converted into transparent, axis-aligned ...
- Balancing Accuracy and Efficiency: CNN Fusion Models for Diabetic Retinopathy Screening : Abstract: Diabetic retinopathy (DR) remains a leading cause of preventable blindness, yet large-scale screening is constrained by limited specialist availability and variable image quality across devi...
- MoonBot: Modular and On-Demand Reconfigurable Robot Toward Moon Base Construction : Abstract: The allure of lunar surface exploration and development has recently captured widespread global attention. Robots have proved to be indispensable for exploring uncharted terrains, uncovering...
- A Comedy of Estimators: On KL Regularization in RL Training of LLMs : Abstract: The reasoning performance of large language models (LLMs) can be substantially improved by training them with reinforcement learning (RL). The RL objective for LLM training involves a regula...
- HeartBench: Probing Core Dimensions of Anthropomorphic Intelligence in LLMs : Abstract: While Large Language Models (LLMs) have achieved remarkable success in cognitive and reasoning benchmarks, they exhibit a persistent deficit in anthropomorphic intelligence-the capacity to n...
- S&P 500 Stock's Movement Prediction using CNN : Abstract: This paper is about predicting the movement of the stocks that make up the S&P 500 index. Historically, many approaches have been tried using various methods to predict the stock movement and ...
- CellMamba: Adaptive Mamba for Accurate and Efficient Cell Detection : Abstract: Cell detection in pathological images presents unique challenges due to densely packed objects, subtle inter-class differences, and severe background clutter. In this paper, we propose CellM...
- Applications of synthetic financial data in portfolio and risk modeling : Abstract: Synthetic financial data offers a practical way to address the privacy and accessibility challenges that limit research in quantitative finance. This paper examines the use of generative mod...
- Multi-agent Adaptive Mechanism Design : Abstract: We study a sequential mechanism design problem in which a principal seeks to elicit truthful reports from multiple rational agents while starting with no prior knowledge of agents' beliefs. ...
- Five Years of SciCap: What We Learned and Future Directions for Scientific Figure Captioning : Abstract: Between 2021 and 2025, the SciCap project grew from a small seed-funded idea at The Pennsylvania State University (Penn State) into one of the central efforts shaping the scientific figure-c...
- InstructMoLE: Instruction-Guided Mixture of Low-rank Experts for Multi-Conditional Image Generation : Abstract: Parameter-Efficient Fine-Tuning of Diffusion Transformers (DiTs) for diverse, multi-conditional tasks often suffers from task interference when using monolithic adapters like LoRA. The Mixtu...
- Inference-based GAN Video Generation : Abstract: Video generation has seen remarkable progress thanks to advancements in generative deep learning. Generated videos should not only display coherent and continuous movement but also meaning...
- A-QCF-Net: An Adaptive Quaternion Cross-Fusion Network for Multimodal Liver Tumor Segmentation from Unpaired Datasets : Abstract: Multimodal medical imaging provides complementary information that is crucial for accurate delineation of pathology, but the development of deep learning models is limited by the scarcity of...
- How Do Agents Perform Code Optimization? An Empirical Study : Abstract: Performance optimization is a critical yet challenging aspect of software development, often requiring a deep understanding of system behavior, algorithmic tradeoffs, and careful code modifi...
- A Model of Causal Explanation on Neural Networks for Tabular Data : Abstract: The problem of explaining the results produced by machine learning methods continues to attract attention. Neural network (NN) models, along with gradient boosting machines, are expected to ...
- HELP: Hierarchical Embodied Language Planner for Household Tasks : Abstract: Embodied agents tasked with complex scenarios, whether in real or simulated environments, rely heavily on robust planning capabilities. When instructions are formulated in natural language, ...
- An Information Theoretic Perspective on Agentic System Design : Abstract: Agentic language model (LM) systems power modern applications like "Deep Research" and "Claude Code," and leverage multi-LM architectures to overcome context limitations. Beneath their appar...
- Multiconnectivity for SAGIN: Current Trends, Challenges, AI-driven Solutions, and Opportunities : Abstract: Space-air-ground-integrated network (SAGIN)-enabled multiconnectivity (MC) is emerging as a key enabler for next-generation networks, enabling users to simultaneously utilize multiple links ...
- CATCH: A Controllable Theme Detection Framework with Contextualized Clustering and Hierarchical Generation : Abstract: Theme detection is a fundamental task in user-centric dialogue systems, aiming to identify the latent topic of each utterance without relying on predefined schemas. Unlike intent induction, ...
- Do Latent Tokens Think? A Causal and Adversarial Analysis of Chain-of-Continuous-Thought : Abstract: Latent tokens are gaining attention for enhancing reasoning in large language models (LLMs), yet their internal mechanisms remain unclear. This paper examines the problem from a reliability ...
- Detecting AI-Generated Paraphrases in Bengali: A Comparative Study of Zero-Shot and Fine-Tuned Transformers : Abstract: Large language models (LLMs) can produce text that closely resembles human writing. This capability raises concerns about misuse, including disinformation and content manipulation. Detecting...
- Enabling Conversational Behavior Reasoning Capabilities in Full-Duplex Speech : Abstract: Human conversation is organized by an implicit chain of thoughts that manifests as timed speech acts. Capturing this causal pathway is key to building natural full-duplex interactive systems...
- Zero-Shot to Zero-Lies: Detecting Bengali Deepfake Audio through Transfer Learning : Abstract: The rapid growth of speech synthesis and voice conversion systems has made deepfake audio a major security concern. Bengali deepfake detection remains largely unexplored. In this work, we st...
- BeHGAN: Bengali Handwritten Word Generation from Plain Text Using Generative Adversarial Networks : Abstract: Handwritten Text Recognition (HTR) is a well-established research area. In contrast, Handwritten Text Generation (HTG) is an emerging field with significant potential. This task is challengi...
- RIPCN: A Road Impedance Principal Component Network for Probabilistic Traffic Flow Forecasting : Abstract: Accurate traffic flow forecasting is crucial for intelligent transportation services such as navigation and ride-hailing. In such applications, uncertainty estimation in forecasting is impor...
- Comparative Analysis of Deep Learning Models for Perception in Autonomous Vehicles : Abstract: Recently, a plethora of machine learning (ML) and deep learning (DL) algorithms have been proposed to achieve the efficiency, safety, and reliability of autonomous vehicles (AVs). The AVs us...
- Near-Optimal Coalition Structures in Polynomial Time : Abstract: We study the classical coalition structure generation (CSG) problem and compare the anytime behavior of three algorithmic paradigms: dynamic programming (DP), MILP branch-and-bound, and spar...
- Structural Induced Exploration for Balanced and Scalable Multi-Robot Path Planning : Abstract: Multi-robot path planning is a fundamental yet challenging problem due to its combinatorial complexity and the need to balance global efficiency with fair task allocation among robots. Tradi...
- Enabling Ultra-Fast Cardiovascular Imaging Across Heterogeneous Clinical Environments with a Generalist Foundation Model and Multimodal Database : Abstract: Multimodal cardiovascular magnetic resonance (CMR) imaging provides comprehensive and non-invasive insights into cardiovascular disease (CVD) diagnosis and underlying mechanisms. Despite dec...
- Variance-Aware Prior-Based Tree Policies for Monte Carlo Tree Search : Abstract: Monte Carlo Tree Search (MCTS) has profoundly influenced reinforcement learning (RL) by integrating planning and learning in tasks requiring long-horizon reasoning, exemplified by the AlphaZ...
- TrackTeller: Temporal Multimodal 3D Grounding for Behavior-Dependent Object References : Abstract: Understanding natural-language references to objects in dynamic 3D driving scenes is essential for interactive autonomous systems. In practice, many referring expressions describe targets th...
- LLM-I2I: Boost Your Small Item2Item Recommendation Model with Large Language Model : Abstract: Item-to-Item (I2I) recommendation models are widely used in real-world systems due to their scalability, real-time capabilities, and high recommendation quality. Research to enhance I2I perf...
- Residual Prior Diffusion: A Probabilistic Framework Integrating Coarse Latent Priors with Diffusion Models : Abstract: Diffusion models have become a central tool in deep generative modeling, but standard formulations rely on a single network and a single diffusion schedule to transform a simple prior, typic...
- A Unified Definition of Hallucination, Or: It's the World Model, Stupid : Abstract: Despite numerous attempts to solve the issue of hallucination since the inception of neural language models, it remains a problem in even frontier large language models today. Why is this th...
- Towards Long-window Anchoring in Vision-Language Model Distillation : Abstract: While large vision-language models (VLMs) demonstrate strong long-context understanding, their prevalent small branches fail on linguistics-photography alignment for a limited window size. W...
- Exploration of Reproducible Generated Image Detection : Abstract: While the technology for detecting AI-Generated Content (AIGC) images has advanced rapidly, the field still faces two core issues: poor reproducibility and insufficient generalizability, wh...
- Bidirectional Human-AI Alignment in Education for Trustworthy Learning Environments : Abstract: Artificial intelligence (AI) is transforming education, offering unprecedented opportunities to personalize learning, enhance assessment, and support educators. Yet these opportunities also ...
- Human-AI Interaction Alignment: Designing, Evaluating, and Evolving Value-Centered AI For Reciprocal Human-AI Futures : Abstract: The rapid integration of generative AI into everyday life underscores the need to move beyond unidirectional alignment models that only adapt AI to human values. This workshop focuses on bid...
- Hierarchy-Aware Fine-Tuning of Vision-Language Models : Abstract: Vision-Language Models (VLMs) learn powerful multimodal representations through large-scale image-text pretraining, but adapting them to hierarchical classification is underexplored. Standar...
- Selective LLM-Guided Regularization for Enhancing Recommendation Models : Abstract: Large language models provide rich semantic priors and strong reasoning capabilities, making them promising auxiliary signals for recommendation. However, prevailing approaches either deploy...
- DiverseGRPO: Mitigating Mode Collapse in Image Generation via Diversity-Aware GRPO : Abstract: Reinforcement learning (RL), particularly GRPO, improves image generation quality significantly by comparing the relative performance of images generated within the same group. However, in t...
- MotionTeller: Multi-modal Integration of Wearable Time-Series with LLMs for Health and Behavioral Understanding : Abstract: As wearable sensing becomes increasingly pervasive, a key challenge remains: how can we generate natural language summaries from raw physiological signals such as actigraphy - minute-level m...
- Oogiri-Master: Benchmarking Humor Understanding via Oogiri : Abstract: Humor is a salient testbed for human-like creative thinking in large language models (LLMs). We study humor using the Japanese creative response game Oogiri, in which participants produce wi...
- Efficient MoE Inference with Fine-Grained Scheduling of Disaggregated Expert Parallelism : Abstract: The mixture-of-experts (MoE) architecture scales model size with sublinear computational increase but suffers from memory-intensive inference due to KV caches and sparse expert activation. R...
- GPF-Net: Gated Progressive Fusion Learning for Polyp Re-Identification : Abstract: Colonoscopic Polyp Re-Identification aims to match the same polyp from a large gallery with images from different views taken using different cameras, which plays an important role in the pr...
- Intelligent recognition of GPR road hidden defect images based on feature fusion and attention mechanism : Abstract: Ground Penetrating Radar (GPR) has emerged as a pivotal tool for non-destructive evaluation of subsurface road defects. However, conventional GPR image interpretation remains heavily reliant...
- dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning : Abstract: Masked diffusion language models (MDLMs) offer the potential for parallel token generation, but most open-source MDLMs decode fewer than 5 tokens per model forward pass even with sophisticat...
- Morality is Contextual: Learning Interpretable Moral Contexts from Human Data with Probabilistic Clustering and Large Language Models : Abstract: Moral actions are judged not only by their outcomes but by the context in which they occur. We present COMETH (Contextual Organization of Moral Evaluation from Textual Human inputs), a frame...
- Teaching People LLM's Errors and Getting it Right : Abstract: People use large language models (LLMs) when they should not. This is partly because they see LLMs compose poems and answer intricate questions, so they understandably, but incorrectly, assu...
- LLM-Driven Feature-Level Adversarial Attacks on Android Malware Detectors : Abstract: The rapid growth in both the scale and complexity of Android malware has driven the widespread adoption of machine learning (ML) techniques for scalable and accurate malware detection. Despi...
- Safe Path Planning and Observation Quality Enhancement Strategy for Unmanned Aerial Vehicles in Water Quality Monitoring Tasks : Abstract: Unmanned Aerial Vehicle (UAV) spectral remote sensing technology is widely used in water quality monitoring. However, in dynamic environments, varying illumination conditions, such as shadow...
- AInsteinBench: Benchmarking Coding Agents on Scientific Repositories : Abstract: We introduce AInsteinBench, a large-scale benchmark for evaluating whether large language model (LLM) agents can operate as scientific computing development agents within real research softw...
- Reflection-Driven Control for Trustworthy Code Agents : Abstract: Contemporary large language model (LLM) agents are remarkably capable, but they still lack reliable safety controls and can produce unconstrained, unpredictable, and even actively harmful ou...
- Multi-Agent LLM Committees for Autonomous Software Beta Testing : Abstract: Manual software beta testing is costly and time-consuming, while single-agent large language model (LLM) approaches suffer from hallucinations and inconsistent behavior. We propose a multi-a...
- CosmoCore-Evo: Evolutionary Dream-Replay Reinforcement Learning for Adaptive Code Generation : Abstract: Building on the affective dream-replay reinforcement learning framework of CosmoCore, we introduce CosmoCore-Evo, an extension that incorporates evolutionary algorithms to enhance adaptabili...
- Fairness Is Not Just Ethical: Performance Trade-Off via Data Correlation Tuning to Mitigate Bias in ML Software : Abstract: Traditional software fairness research typically emphasizes ethical and social imperatives, neglecting that fairness fundamentally represents a core software quality issue arising directly f...
- Query Carefully: Detecting the Unanswerables in Text-to-SQL Tasks : Abstract: Text-to-SQL systems allow non-SQL experts to interact with relational databases using natural language. However, their tendency to generate executable SQL for ambiguous, out-of-scope, or una...
- Atomistic Simulation Guided Convolutional Neural Networks for Thermal Modeling of Friction Stir Welding : Abstract: Accurate prediction of temperature evolution is essential for understanding thermomechanical behavior in friction stir welding. In this study, molecular dynamics simulations were performed u...
- EcoNet: Multiagent Planning and Control Of Household Energy Resources Using Active Inference : Abstract: Advances in automated systems afford new opportunities for intelligent management of energy at household, local area, and utility scales. Home Energy Management Systems (HEMS) can play a rol...
- Pruning as a Game: Equilibrium-Driven Sparsification of Neural Networks : Abstract: Neural network pruning is widely used to reduce model size and computational cost. Yet, most existing methods treat sparsity as an externally imposed constraint, enforced through heuristic i...
- SpatialBench: Can Agents Analyze Real-World Spatial Biology Data? : Abstract: Spatial transcriptomics assays are rapidly increasing in scale and complexity, making computational analysis a major bottleneck in biological discovery. Although frontier AI agents have impr...
- Accelerating Scientific Discovery with Autonomous Goal-evolving Agents : Abstract: There has been unprecedented interest in developing agents that expand the boundary of scientific discovery, primarily by optimizing quantitative objective functions specified by scientists....
- Compliance Rating Scheme: A Data Provenance Framework for Generative AI Datasets : Abstract: Generative Artificial Intelligence (GAI) has experienced exponential growth in recent years, partly facilitated by the abundance of large-scale open-source datasets. These datasets are often...
- Towards Responsible and Explainable AI Agents with Consensus-Driven Reasoning : Abstract: Agentic AI represents a major shift in how autonomous systems reason, plan, and execute multi-step tasks through the coordination of Large Language Models (LLMs), Vision Language Models (VLM...
- Multiple-play Stochastic Bandits with Prioritized Arm Capacity Sharing : Abstract: This paper proposes a variant of multiple-play stochastic bandits tailored to resource allocation problems arising from LLM applications, edge intelligence, etc. The model is composed of $M$...
- Democratizing Drug Discovery with an Orchestrated, Knowledge-Driven Multi-Agent Team for User-Guided Therapeutic Design : Abstract: Therapeutic discovery remains a formidable challenge, impeded by the fragmentation of specialized domains and the execution gap between computational design and physiological validation. Alt...
- AMS-IO-Bench and AMS-IO-Agent: Benchmarking and Structured Reasoning for Analog and Mixed-Signal Integrated Circuit Input/Output Design : Abstract: In this paper, we propose AMS-IO-Agent, a domain-specialized LLM-based agent for structure-aware input/output (I/O) subsystem generation in analog and mixed-signal (AMS) integrated circuits ...
- A Medical Multimodal Diagnostic Framework Integrating Vision-Language Models and Logic Tree Reasoning : Abstract: With the rapid growth of large language models (LLMs) and vision-language models (VLMs) in medicine, simply integrating clinical text and medical imaging does not guarantee reliable reasonin...
- NEMO-4-PAYPAL: Leveraging NVIDIA's Nemo Framework for empowering PayPal's Commerce Agent : Abstract: We present the development and optimization of PayPal's Commerce Agent, powered by NEMO-4-PAYPAL, a multi-agent system designed to revolutionize agentic commerce on the PayPal platform. Thro...
- Leash: Adaptive Length Penalty and Reward Shaping for Efficient Large Reasoning Model : Abstract: Existing approaches typically rely on fixed length penalties, but such penalties are hard to tune and fail to adapt to the evolving reasoning abilities of LLMs, leading to suboptimal trade-o...
- LogicLens: Visual-Logical Co-Reasoning for Text-Centric Forgery Analysis : Abstract: Sophisticated text-centric forgeries, fueled by rapid AIGC advancements, pose a significant threat to societal security and information authenticity. Current methods for text-centric forgery...
- Three-way decision with incomplete information based on similarity and satisfiability : Abstract: Three-way decision is widely applied with rough set theory to learn classification or decision rules. The approaches dealing with complete information are well established in the literature,...
- Feasible strategies in three-way conflict analysis with three-valued ratings : Abstract: Most existing work on three-way conflict analysis has focused on trisecting agent pairs, agents, or issues, which contributes to understanding the nature of conflicts but falls short in addr...
- Three-way conflict analysis based on alliance and conflict functions : Abstract: Trisecting agents, issues, and agent pairs are essential topics of three-way conflict analysis. They have been commonly studied based on either a rating or an auxiliary function. A rating fu...
- A Study of Solving Life-and-Death Problems in Go Using Relevance-Zone Based Solvers : Abstract: This paper analyzes the behavior of solving Life-and-Death (L&D) problems in the game of Go using current state-of-the-art computer Go solvers with two techniques: the Relevance-Zone Based S...
- From Visual Perception to Deep Empathy: An Automated Assessment Framework for House-Tree-Person Drawings Using Multimodal LLMs and Multi-Agent Collaboration : Abstract: Background: The House-Tree-Person (HTP) drawing test, introduced by John Buck in 1948, remains a widely used projective technique in clinical psychology. However, it has long faced challenge...
Research Sources: 264 | Generated: 12/29/2025
