AI Research News Feeds for November 19th, 2025

AI RESEARCH PAPERS & ACADEMIC SOURCES

IBGS: Image-Based Gaussian Splatting : Abstract: 3D Gaussian Splatting (3DGS) has recently emerged as a fast, high-quality method for novel view synthesis (NVS). However, its use of low-degree spherical harmonics limits its ability to capt...
Blur-Robust Detection via Feature Restoration: An End-to-End Framework for Prior-Guided Infrared UAV Target Detection : Abstract: Infrared unmanned aerial vehicle (UAV) target images often suffer from motion blur degradation caused by rapid sensor movement, significantly reducing contrast between target and background....
A Quantitative Method for Shoulder Presentation Evaluation in Biometric Identity Documents : Abstract: International standards for biometric identity documents mandate strict compliance with pose requirements, including the square presentation of a subject's shoulders. However, the literature...
Enhancing LLM-based Autonomous Driving with Modular Traffic Light and Sign Recognition : Abstract: Large Language Models (LLMs) are increasingly used for decision-making and planning in autonomous driving, showing promising reasoning capabilities and potential to generalize across diverse...
BEDLAM2.0: Synthetic Humans and Cameras in Motion : Abstract: Inferring 3D human motion from video remains a challenging problem with many applications. While traditional methods estimate the human in image coordinates, many applications require human ...
Stage Aware Diagnosis of Diabetic Retinopathy via Ordinal Regression : Abstract: Diabetic Retinopathy (DR) has emerged as a major cause of preventable blindness in recent times. With timely screening and intervention, the condition can be prevented from causing irreversi...
Language as an Anchor: Preserving Relative Visual Geometry for Domain Incremental Learning : Abstract: A key challenge in Domain Incremental Learning (DIL) is to continually learn under shifting distributions while preserving knowledge from previous domains. Existing methods face a fundamenta...
Cranio-ID: Graph-Based Craniofacial Identification via Automatic Landmark Annotation in 2D Multi-View X-rays : Abstract: In forensic craniofacial identification and in many biomedical applications, craniometric landmarks are important. Traditional methods for locating landmarks are time-consuming and require s...
Learning to See Through a Baby's Eyes: Early Visual Diets Enable Robust Visual Intelligence in Humans and Machines : Abstract: Newborns perceive the world with low-acuity, color-degraded, and temporally continuous vision, which gradually sharpens as infants develop. To explore the ecological advantages of such stage...
DIR-TIR: Dialog-Iterative Refinement for Text-to-Image Retrieval : Abstract: This paper addresses the task of interactive, conversational text-to-image retrieval. Our DIR-TIR framework progressively refines the target image search through two specialized modules: t...
CompEvent: Complex-valued Event-RGB Fusion for Low-light Video Enhancement and Deblurring : Abstract: Low-light video deblurring poses significant challenges in applications like nighttime surveillance and autonomous driving due to dim lighting and long exposures. While event cameras offer p...
Learning Subglacial Bed Topography from Sparse Radar with Physics-Guided Residuals : Abstract: Accurate subglacial bed topography is essential for ice sheet modeling, yet radar observations are sparse and uneven. We propose a physics-guided residual learning framework that predicts be...
2D Gaussians Spatial Transport for Point-supervised Density Regression : Abstract: This paper introduces Gaussian Spatial Transport (GST), a novel framework that leverages Gaussian splatting to facilitate transport from the probability measure in the image coordinate space...
Segmentation-Aware Latent Diffusion for Satellite Image Super-Resolution: Enabling Smallholder Farm Boundary Delineation : Abstract: Delineating farm boundaries through segmentation of satellite images is a fundamental step in many agricultural applications. The task is particularly challenging for smallholder farms, wher...
Enhancing End-to-End Autonomous Driving with Risk Semantic Distillaion from VLM : Abstract: The autonomous driving (AD) system has exhibited remarkable performance in complex driving scenarios. However, generalization is still a key limitation for the current system, which refers t...
Parameter Aware Mamba Model for Multi-task Dense Prediction : Abstract: Understanding the inter-relations and interactions between tasks is crucial for multi-task dense prediction. Existing methods predominantly utilize convolutional layers and attention mechani...
D-PerceptCT: Deep Perceptual Enhancement for Low-Dose CT Images : Abstract: Low Dose Computed Tomography (LDCT) is widely used as an imaging solution to aid diagnosis and other clinical tasks. However, this comes at the price of a deterioration in image quality due ...
A Generative Data Framework with Authentic Supervision for Underwater Image Restoration and Enhancement : Abstract: Underwater image restoration and enhancement are crucial for correcting color distortion and restoring image details, thereby establishing a fundamental basis for subsequent underwater visua...
Learning Compact Latent Space for Representing Neural Signed Distance Functions with High-fidelity Geometry Details : Abstract: Neural signed distance functions (SDFs) have been a vital representation to represent 3D shapes or scenes with neural networks. An SDF is an implicit function that can query signed distances...
Interaction-Aware 4D Gaussian Splatting for Dynamic Hand-Object Interaction Reconstruction : Abstract: This paper focuses on a challenging setting of simultaneously modeling geometry and appearance of hand-object interaction scenes without any object priors. We follow the trend of dynamic 3D ...
Explaining Digital Pathology Models via Clustering Activations : Abstract: We present a clustering-based explainability technique for digital pathology models based on convolutional neural networks. Unlike commonly used methods based on saliency maps, such as occlu...
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models : Abstract: Omnimodal large language models (OmniLLMs) have attracted increasing research attention of late towards unified audio-video understanding, wherein processing audio-video token sequences crea...
XAttn-BMD: Multimodal Deep Learning with Cross-Attention for Femoral Neck Bone Mineral Density Estimation : Abstract: Poor bone health is a significant public health concern, and low bone mineral density (BMD) leads to an increased fracture risk, a key feature of osteoporosis. We present XAttn-BMD (Cross-At...
3D-Guided Scalable Flow Matching for Generating Volumetric Tissue Spatial Transcriptomics from Serial Histology : Abstract: A scalable and robust 3D tissue transcriptomics profile can enable a holistic understanding of tissue organization and provide deeper insights into human biology and disease. Most predictive...
Fusing Biomechanical and Spatio-Temporal Features for Fall Prediction: Characterizing and Mitigating the Simulation-to-Reality Gap : Abstract: Falls are a leading cause of injury and loss of independence among older adults. Vision-based fall prediction systems offer a non-invasive solution to anticipate falls seconds before impact,...
SparseSurf: Sparse-View 3D Gaussian Splatting for Surface Reconstruction : Abstract: Recent advances in optimizing Gaussian Splatting for scene geometry have enabled efficient reconstruction of detailed surfaces from images. However, when input views are sparse, such optimiz...
SLAM-AGS: Slide-Label Aware Multi-Task Pretraining Using Adaptive Gradient Surgery in Computational Cytology : Abstract: Computational cytology faces two major challenges: i) instance-level labels are unreliable and prohibitively costly to obtain, ii) witness rates are extremely low. We propose SLAM-AGS, a Sli...
RepAir: A Framework for Airway Segmentation and Discontinuity Correction in CT : Abstract: Accurate airway segmentation from chest computed tomography (CT) scans is essential for quantitative lung analysis, yet manual annotation is impractical and many automated U-Net-based method...
FreeSwim: Revisiting Sliding-Window Attention Mechanisms for Training-Free Ultra-High-Resolution Video Generation : Abstract: The quadratic time and memory complexity of the attention mechanism in modern Transformer based video generators makes end-to-end training for ultra high resolution videos prohibitively expe...
Diffusion As Self-Distillation: End-to-End Latent Diffusion In One Model : Abstract: Standard Latent Diffusion Models rely on a complex, three-part architecture consisting of a separate encoder, decoder, and diffusion network, which are trained in multiple stages. This modul...
A Neural Field-Based Approach for View Computation & Data Exploration in 3D Urban Environments : Abstract: Despite the growing availability of 3D urban datasets, extracting insights remains challenging due to computational bottlenecks and the complexity of interacting with data. In fact, the intr...
Vision Large Language Models Are Good Noise Handlers in Engagement Analysis : Abstract: Engagement recognition in video datasets, unlike traditional image classification tasks, is particularly challenged by subjective labels and noise limiting model performance. To overcome the...
Co-Me: Confidence-Guided Token Merging for Visual Geometric Transformers : Abstract: We propose Confidence-Guided Token Merging (Co-Me), an acceleration mechanism for visual geometric transformers without retraining or finetuning the base model. Co-Me distilled a light-weigh...
UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning : Abstract: We present UniGen-1.5, a unified multimodal large language model (MLLM) for advanced image understanding, generation and editing. Building upon UniGen, we comprehensively enhance the model a...
PoCGM: Poisson-Conditioned Generative Model for Sparse-View CT Reconstruction : Abstract: In computed tomography (CT), reducing the number of projection views is an effective strategy to lower radiation exposure and/or improve temporal resolution. However, this often results in s...
ELiC: Efficient LiDAR Geometry Compression via Cross-Bit-depth Feature Propagation and Bag-of-Encoders : Abstract: Hierarchical LiDAR geometry compression encodes voxel occupancies from low to high bit-depths, yet prior methods treat each depth independently and re-estimate local context from coordinates...
RoboTidy : A 3D Gaussian Splatting Household Tidying Benchmark for Embodied Navigation and Action : Abstract: Household tidying is an important application area, yet current benchmarks neither model user preferences nor support mobility, and they generalize poorly, making it hard to comprehensively ...
MindCross: Fast New Subject Adaptation with Limited Data for Cross-subject Video Reconstruction from Brain Signals : Abstract: Reconstructing video from brain signals is an important brain decoding task. Existing brain decoding frameworks are primarily built on a subject-dependent paradigm, which requires large amou...
On the Topological Foundation of Learning and Memory : Abstract: We propose a formal foundation for cognition rooted in algebraic topology, built on a Homological Parity Principle. This posits that even-dimensional homology represents stable Structure/Con...
MoReFun: Past-Movement Guided Motion Representation Learning for Future Motion Prediction and Understanding : Abstract: 3D human motion prediction aims to generate coherent future motions from observed sequences, yet existing end-to-end regression frameworks often fail to capture complex dynamics and tend to ...
LED: Light Enhanced Depth Estimation at Night : Abstract: Nighttime camera-based depth estimation is a highly challenging task, especially for autonomous driving applications, where accurate depth perception is essential for ensuring safe navigatio...
Deep Learning and Machine Learning -- Object Detection and Semantic Segmentation: From Theory to Applications : Abstract: An in-depth exploration of object detection and semantic segmentation is provided, combining theoretical foundations with practical applications. State-of-the-art advancements in machine lea...
MAVias: Mitigate any Visual Bias : Abstract: Mitigating biases in computer vision models is an essential step towards the trustworthiness of artificial intelligence models. Existing bias mitigation methods focus on a small set of prede...
RelTopo: Multi-Level Relational Modeling for Driving Scene Topology Reasoning : Abstract: Accurate road topology reasoning is critical for autonomous driving, as it requires both perceiving road elements and understanding how lanes connect to each other (L2L) and to traffic eleme...
Real-Time Sign Language to text Translation using Deep Learning: A Comparative study of LSTM and 3D CNN : Abstract: This study investigates the performance of 3D Convolutional Neural Networks (3D CNNs) and Long Short-Term Memory (LSTM) networks for real-time American Sign Language (ASL) recognition. Thoug...
SF-Loc: A Visual Mapping and Geo-Localization System based on Sparse Visual Structure Frames : Abstract: For high-level geo-spatial applications and intelligent robotics, accurate global pose information is of crucial importance. Map-aided localization is a universal approach to overcome the li...
A Specialized Large Language Model for Clinical Reasoning and Diagnosis in Rare Diseases : Abstract: Rare diseases affect hundreds of millions worldwide, yet diagnosis often spans years. Convectional pipelines decouple noisy evidence extraction from downstream inferential diagnosis, and gen...
Graded strength of comparative illusions is explained by Bayesian inference : Abstract: Like visual processing, language processing is susceptible to illusions in which people systematically misperceive stimuli. In one such case--the comparative illusion (CI), e.g., More studen...
Bias in, Bias out: Annotation Bias in Multilingual Large Language Models : Abstract: Annotation bias in NLP datasets remains a major challenge for developing multilingual Large Language Models (LLMs), particularly in culturally diverse settings. Bias from task framing, annot...
Streamlining Industrial Contract Management with Retrieval-Augmented LLMs : Abstract: Contract management involves reviewing and negotiating provisions, individual clauses that define rights, obligations, and terms of agreement. During this process, revisions to provisions ar...
Quadratic Term Correction on Heaps' Law : Abstract: Heaps' or Herdan's law characterizes the word-type vs. word-token relation by a power-law function, which is concave in linear-linear scale but a straight line in log-log scale. However, it ...
SMRC: Aligning Large Language Models with Student Reasoning for Mathematical Error Correction : Abstract: Large language models (LLMs) often make reasoning errors when solving mathematical problems, and how to automatically detect and correct these errors has become an important research directi...
Encoding and Understanding Astrophysical Information in Large Language Model-Generated Summaries : Abstract: Large Language Models have demonstrated the ability to generalize well at many levels across domains, modalities, and even shown in-context learning capabilities. This enables research quest...
Talk, Snap, Complain: Validation-Aware Multimodal Expert Framework for Fine-Grained Customer Grievances : Abstract: Existing approaches to complaint analysis largely rely on unimodal, short-form content such as tweets or product reviews. This work advances the field by leveraging multimodal, multi-turn cu...
Subword Tokenization Strategies for Kurdish Word Embeddings : Abstract: We investigate tokenization strategies for Kurdish word embeddings by comparing word-level, morpheme-based, and BPE approaches on morphological similarity preservation tasks. We develop a Bi...
Strategic Innovation Management in the Age of Large Language Models Market Intelligence, Adaptive R&D, and Ethical Governance : Abstract: This study analyzes the multiple functions of Large Language Models (LLMs) in transforming research and development (R&D) processes. By automating knowledge discovery, boosting hypothesis cr...
Rdgai: Classifying transcriptional changes using Large Language Models with a test case from an Arabic Gospel tradition : Abstract: Application of phylogenetic methods to textual traditions has traditionally treated all changes as equivalent even though it is widely recognized that certain types of variants were more lik...
Show and Tell: Prompt Strategies for Style Control in Multi-Turn LLM Code Generation : Abstract: Language models generate functionally correct code that tends toward excessive verbosity, with elaborate documentation and defensive patterns that diverge from human baselines. Two prompting...
SciRAG: Adaptive, Citation-Aware, and Outline-Guided Retrieval and Synthesis for Scientific Literature : Abstract: The accelerating growth of scientific publications has intensified the need for scalable, trustworthy systems to synthesize knowledge across diverse literature. While recent retrieval-augmen...
Linguistic Structure from a Bottleneck on Sequential Information Processing : Abstract: Human language has a distinct systematic structure, where utterances break into individually meaningful words which are combined to form phrases. We show that natural-language-like systemati...
Native Design Bias: Studying the Impact of English Nativeness on Language Model Performance : Abstract: Large Language Models (LLMs) excel at providing information acquired during pretraining on large-scale corpora and following instructions through user prompts. This study investigates whethe...
Surprisingly Fragile: Assessing and Addressing Prompt Instability in Multimodal Foundation Models : Abstract: Multimodal foundation models (MFMs) such as OFASys show the potential to unlock analysis of complex data such as images, videos, and audio data via text prompts alone. However, their perform...
Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum : Abstract: There is increasing interest in looking at dialects in NLP. However, most work to date still treats dialects as discrete categories. For instance, evaluative work in variation-oriented NLP f...
Deep Learning and Machine Learning -- Natural Language Processing: From Theory to Application : Abstract: With a focus on natural language processing (NLP) and the role of large language models (LLMs), we explore the intersection of machine learning, deep learning, and artificial intelligence. A...
Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark : Abstract: While Chain-of-Thought (CoT) prompting enables sophisticated symbolic reasoning in LLMs, it remains confined to discrete text and cannot simulate the continuous, physics-governed dynamics of...
RSPose: Ranking Based Losses for Human Pose Estimation : Abstract: While heatmap-based human pose estimation methods have shown strong performance, they suffer from three main problems: (P1) "Commonly used Mean Squared Error (MSE)" Loss may not always impro...
Segmenting Collision Sound Sources in Egocentric Videos : Abstract: Humans excel at multisensory perception and can often recognise object properties from the sound of their interactions. Inspired by this, we propose the novel task of Collision Sound Source ...
GRLoc: Geometric Representation Regression for Visual Localization : Abstract: Absolute Pose Regression (APR) has emerged as a compelling paradigm for visual localization. However, APR models typically operate as black boxes, directly regressing a 6-DoF pose from a que...
QwenCLIP: Boosting Medical Vision-Language Pretraining via LLM Embeddings and Prompt tuning : Abstract: Contrastive Language-Image Pretraining (CLIP) has demonstrated strong generalization for vision-language tasks in computer vision and medical domains, yet its text encoder accepts only up to...
VLMs Guided Interpretable Decision Making for Autonomous Driving : Abstract: Recent advancements in autonomous driving (AD) have explored the use of vision-language models (VLMs) within visual question answering (VQA) frameworks for direct driving decision-making. Ho...
Revisiting Data Scaling Law for Medical Segmentation : Abstract: The population loss of trained deep neural networks often exhibits power law scaling with the size of the training dataset, guiding significant performance advancements in deep learning appl...
Weakly Supervised Ephemeral Gully Detection In Remote Sensing Images Using Vision Language Models : Abstract: Among soil erosion problems, Ephemeral Gullies are one of the most concerning phenomena occurring in agricultural fields. Their short temporal cycles increase the difficulty in automatically...
Temporal Realism Evaluation of Generated Videos Using Compressed-Domain Motion Vectors : Abstract: Temporal realism remains a central weakness of current generative video models, as most evaluation metrics prioritize spatial appearance and offer limited sensitivity to motion. We introduce...
SAE-MCVT: A Real-Time and Scalable Multi-Camera Vehicle Tracking Framework Powered by Edge Computing : Abstract: In modern Intelligent Transportation Systems (ITS), cameras are a key component due to their ability to provide valuable information for multiple stakeholders. A central task is Multi-Camera...
Mind the Gap: Evaluating LLM Understanding of Human-Taught Road Safety Principles : Abstract: Following road safety norms is non-negotiable not only for humans but also for the AI systems that govern autonomous vehicles. In this work, we evaluate how well multi-modal large language m...
Start Small, Think Big: Curriculum-based Relative Policy Optimization for Visual Grounding : Abstract: Chain-of-Thought (CoT) prompting has recently shown significant promise across various NLP and computer vision tasks by explicitly generating intermediate reasoning steps. However, we find t...
Find the Leak, Fix the Split: Cluster-Based Method to Prevent Leakage in Video-Derived Datasets : Abstract: We propose a cluster-based frame selection strategy to mitigate information leakage in video-derived frames datasets. By grouping visually similar frames before splitting into training, vali...
Can You Learn to See Without Images? Procedural Warm-Up for Vision Transformers : Abstract: Transformers show remarkable versatility across domains, suggesting the existence of inductive biases beneficial across modalities. In this work, we explore a new way to instil such generic ...
Learning Skill-Attributes for Transferable Assessment in Video : Abstract: Skill assessment from video entails rating the quality of a person's physical performance and explaining what could be done better. Today's models specialize for an individual sport, and suf...
CD-DPE: Dual-Prompt Expert Network based on Convolutional Dictionary Feature Decoupling for Multi-Contrast MRI Super-Resolution : Abstract: Multi-contrast magnetic resonance imaging (MRI) super-resolution intends to reconstruct high-resolution (HR) images from low-resolution (LR) scans by leveraging structural information presen...
RISE: Single Static Radar-based Indoor Scene Understanding : Abstract: Robust and privacy-preserving indoor scene understanding remains a fundamental open problem. While optical sensors such as RGB and LiDAR offer high spatial fidelity, they suffer from severe ...
LINGUAL: Language-INtegrated GUidance in Active Learning for Medical Image Segmentation : Abstract: Although active learning (AL) in segmentation tasks enables experts to annotate selected regions of interest (ROIs) instead of entire images, it remains highly challenging, labor-intensive, ...
FashionMAC: Deformation-Free Fashion Image Generation with Fine-Grained Model Appearance Customization : Abstract: Garment-centric fashion image generation aims to synthesize realistic and controllable human models dressing a given garment, which has attracted growing interest due to its practical applic...
Flood-LDM: Generalizable Latent Diffusion Models for rapid and accurate zero-shot High-Resolution Flood Mapping : Abstract: Flood prediction is critical for emergency planning and response to mitigate human and economic losses. Traditional physics-based hydrodynamic models generate high-resolution flood maps usin...
Saliency-Guided Deep Learning for Bridge Defect Detection in Drone Imagery : Abstract: Anomaly object detection and classification are one of the main challenging tasks in computer vision and pattern recognition. In this paper, we propose a new method to automatically detect, ...
Semantic Context Matters: Improving Conditioning for Autoregressive Models : Abstract: Recently, autoregressive (AR) models have shown strong potential in image generation, offering better scalability and easier integration with unified multi-modal systems compared to diffusio...
CORE: Compact Object-centric REpresentations as a New Paradigm for Token Merging in LVLMs : Abstract: Large Vision-Language Models (LVLMs) usually suffer from prohibitive computational and memory costs due to the quadratic growth of visual tokens with image resolution. Existing token compres...
SMGeo: Cross-View Object Geo-Localization with Grid-Level Mixture-of-Experts : Abstract: Cross-view object Geo-localization aims to precisely pinpoint the same object across large-scale satellite imagery based on drone images. Due to significant differences in viewpoint and scal...
BCE3S: Binary Cross-Entropy Based Tripartite Synergistic Learning for Long-tailed Recognition : Abstract: For long-tailed recognition (LTR) tasks, high intra-class compactness and inter-class separability in both head and tail classes, as well as balanced separability among all the classifier ve...
Text-Driven Reasoning Video Editing via Reinforcement Learning on Digital Twin Representations : Abstract: Text-driven video editing enables users to modify video content only using text queries. While existing methods can modify video content if explicit descriptions of editing targets with prec...
RTS-Mono: A Real-Time Self-Supervised Monocular Depth Estimation Method for Real-World Deployment : Abstract: Depth information is crucial for autonomous driving and intelligent robot navigation. The simplicity and flexibility of self-supervised monocular depth estimation are conducive to its role i...
$A^2$GC: $A$symmetric $A$ggregation with Geometric Constraints for Locally Aggregated Descriptors : Abstract: Visual Place Recognition (VPR) aims to match query images against a database using visual cues. State-of-the-art methods aggregate features from deep backbones to form global descriptors. Op...
Coffee: Controllable Diffusion Fine-tuning : Abstract: Text-to-image diffusion models can generate diverse content with flexible prompts, which makes them well-suited for customization through fine-tuning with a small amount of user-provided dat...
Attention Via Convolutional Nearest Neighbors : Abstract: The shift from Convolutional Neural Networks to Transformers has reshaped computer vision, yet these two architectural families are typically viewed as fundamentally distinct. We argue that ...
iGaussian: Real-Time Camera Pose Estimation via Feed-Forward 3D Gaussian Splatting Inversion : Abstract: Recent trends in SLAM and visual navigation have embraced 3D Gaussians as the preferred scene representation, highlighting the importance of estimating camera poses from a single image using...
Wave-Former: Through-Occlusion 3D Reconstruction via Wireless Shape Completion : Abstract: We present Wave-Former, a novel method capable of high-accuracy 3D shape reconstruction for completely occluded, diverse, everyday objects. This capability can open new applications spanning...
Learning Representation and Synergy Invariances: A Povable Framework for Generalized Multimodal Face Anti-Spoofing : Abstract: Multimodal Face Anti-Spoofing (FAS) methods, which integrate multiple visual modalities, often suffer even more severe performance degradation than unimodal FAS when deployed in unseen domai...
MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs : Abstract: Evaluating the robustness of Large Vision-Language Models (LVLMs) is essential for their continued development and responsible deployment in real-world applications. However, existing robust...
DoGCLR: Dominance-Game Contrastive Learning Network for Skeleton-Based Action Recognition : Abstract: Existing self-supervised contrastive learning methods for skeleton-based action recognition often process all skeleton regions uniformly, and adopt a first-in-first-out (FIFO) queue to store...
UniSER: A Foundation Model for Unified Soft Effects Removal : Abstract: Digital images are often degraded by soft effects such as lens flare, haze, shadows, and reflections, which reduce aesthetics even though the underlying pixels remain partially visible. The ...
GloTok: Global Perspective Tokenizer for Image Reconstruction and Generation : Abstract: Existing state-of-the-art image tokenization methods leverage diverse semantic features from pre-trained vision models for additional supervision, to expand the distribution of latent repres...
PAVE: An End-to-End Dataset for Production Autonomous Vehicle Evaluation : Abstract: Most existing autonomous-driving datasets (e.g., KITTI, nuScenes, and the Waymo Perception Dataset), collected by human-driving mode or unidentified driving mode, can only serve as early tra...
Hierarchical Semantic Learning for Multi-Class Aorta Segmentation : Abstract: The aorta, the body's largest artery, is prone to pathologies such as dissection, aneurysm, and atherosclerosis, which often require timely intervention. Minimally invasive repairs involving...
Online Data Curation for Object Detection via Marginal Contributions to Dataset-level Average Precision : Abstract: High-quality data has become a primary driver of progress under scale laws, with curated datasets often outperforming much larger unfiltered ones at lower cost. Online data curation extends ...
InstantViR: Real-Time Video Inverse Problem Solver with Distilled Diffusion Prior : Abstract: Video inverse problems are fundamental to streaming, telepresence, and AR/VR, where high perceptual quality must coexist with tight latency constraints. Diffusion-based priors currently deli...
Measurement-Constrained Sampling for Text-Prompted Blind Face Restoration : Abstract: Blind face restoration (BFR) may correspond to multiple plausible high-quality (HQ) reconstructions under extremely low-quality (LQ) inputs. However, existing methods typically produce deter...
StreamingTalker: Audio-driven 3D Facial Animation with Autoregressive Diffusion Model : Abstract: This paper focuses on the task of speech-driven 3D facial animation, which aims to generate realistic and synchronized facial motions driven by speech inputs.Recent methods have employed aud...
Breaking the Passive Learning Trap: An Active Perception Strategy for Human Motion Prediction : Abstract: Forecasting 3D human motion is an important embodiment of fine-grained understanding and cognition of human behavior by artificial agents. Current approaches excessively rely on implicit net...
V2VLoc: Robust GNSS-Free Collaborative Perception via LiDAR Localization : Abstract: Multi-agents rely on accurate poses to share and align observations, enabling a collaborative perception of the environment. However, traditional GNSS-based localization often fails in GNSS-...
ManipShield: A Unified Framework for Image Manipulation Detection, Localization and Explanation : Abstract: With the rapid advancement of generative models, powerful image editing methods now enable diverse and highly realistic image manipulations that far surpass traditional deepfake techniques, ...
Gaussian Splatting-based Low-Rank Tensor Representation for Multi-Dimensional Image Recovery : Abstract: Tensor singular value decomposition (t-SVD) is a promising tool for multi-dimensional image representation, which decomposes a multi-dimensional image into a latent tensor and an accompanyin...
Let Language Constrain Geometry: Vision-Language Models as Semantic and Spatial Critics for 3D Generation : Abstract: Text-to-3D generation has advanced rapidly, yet state-of-the-art models, encompassing both optimization-based and feed-forward architectures, still face two fundamental limitations. First, t...
Free Lunch to Meet the Gap: Intermediate Domain Reconstruction for Cross-Domain Few-Shot Learning : Abstract: Cross-Domain Few-Shot Learning (CDFSL) endeavors to transfer generalized knowledge from the source domain to target domains using only a minimal amount of training data, which faces a triple...
NeuralBoneReg: A Novel Self-Supervised Method for Robust and Accurate Multi-Modal Bone Surface Registration : Abstract: In computer- and robot-assisted orthopedic surgery (CAOS), patient-specific surgical plans derived from preoperative imaging define target locations and implant trajectories. During surgery,...
Iterative Diffusion-Refined Neural Attenuation Fields for Multi-Source Stationary CT Reconstruction: NAF Meets Diffusion Model : Abstract: Multi-source stationary computed tomography (CT) has recently attracted attention for its ability to achieve rapid image reconstruction, making it suitable for time-sensitive clinical and in...
Dental3R: Geometry-Aware Pairing for Intraoral 3D Reconstruction from Sparse-View Photographs : Abstract: Intraoral 3D reconstruction is fundamental to digital orthodontics, yet conventional methods like intraoral scanning are inaccessible for remote tele-orthodontics, which typically relies on ...
Step by Step Network : Abstract: Scaling up network depth is a fundamental pursuit in neural architecture design, as theory suggests that deeper models offer exponentially greater capability. Benefiting from the residual co...
ArchMap: Arch-Flattening and Knowledge-Guided Vision Language Model for Tooth Counting and Structured Dental Understanding : Abstract: A structured understanding of intraoral 3D scans is essential for digital orthodontics. However, existing deep-learning approaches rely heavily on modality-specific training, large annotated...
Silhouette-to-Contour Registration: Aligning Intraoral Scan Models with Cephalometric Radiographs : Abstract: Reliable 3D-2D alignment between intraoral scan (IOS) models and lateral cephalometric radiographs is critical for orthodontic diagnosis, yet conventional intensity-driven registration metho...
ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries : Abstract: The proliferation of hour-long videos (e.g., lectures, podcasts, documentaries) has intensified demand for efficient content structuring. However, existing approaches are constrained by smal...
ParallelKittens: Systematic and Practical Simplification of Multi-GPU AI Kernels : Abstract: Inter-GPU communication has become a major bottleneck for modern AI workloads as models scale and improvements in hardware compute throughput outpace improvements in interconnect bandwidth. ...
Single Tensor Cell Segmentation using Scalar Field Representations : Abstract: We investigate image segmentation of cells under the lens of scalar fields. Our goal is to learn a continuous scalar field on image domains such that its segmentation produces robust instanc...
EchoAgent: Guideline-Centric Reasoning Agent for Echocardiography Measurement and Interpretation : Abstract: Purpose: Echocardiographic interpretation requires video-level reasoning and guideline-based measurement analysis, which current deep learning models for cardiac ultrasound do not support. W...
A Brain Wave Encodes a Thousand Tokens: Modeling Inter-Cortical Neural Interactions for Effective EEG-based Emotion Recognition : Abstract: Human emotions are difficult to convey through words and are often abstracted in the process; however, electroencephalogram (EEG) signals can offer a more direct lens into emotional brain ac...
Splat Regression Models : Abstract: We introduce a highly expressive class of function approximators called Splat Regression Models. Model outputs are mixtures of heterogeneous and anisotropic bump functions, termed splats, ea...
The CHASM-SWPC Dataset for Coronal Hole Detection & Analysis : Abstract: Coronal holes (CHs) are low-activity, low-density solar coronal regions with open magnetic field lines (Cranmer 2009). In the extreme ultraviolet (EUV) spectrum, CHs appear as dark patches. ...
Wasserstein Distributionally Robust Nash Equilibrium Seeking with Heterogeneous Data: A Lagrangian Approach : Abstract: We study a class of distributionally robust games where agents are allowed to heterogeneously choose their risk aversion with respect to distributional shifts of the uncertainty. In our form...
LogPurge: Log Data Purification for Anomaly Detection via Rule-Enhanced Filtering : Abstract: Log anomaly detection, which is critical for identifying system failures and preempting security breaches, detects irregular patterns within large volumes of log data, and impacts domains su...
Dynamic Black-box Backdoor Attacks on IoT Sensory Data : Abstract: Sensor data-based recognition systems are widely used in various applications, such as gait-based authentication and human activity recognition (HAR). Modern wearable and smart devices featu...
A Patient-Independent Neonatal Seizure Prediction Model Using Reduced Montage EEG and ECG : Abstract: Neonates are highly susceptible to seizures, often leading to short or long-term neurological impairments. However, clinical manifestations of neonatal seizures are subtle and often lead to ...
10Cache: Heterogeneous Resource-Aware Tensor Caching and Migration for LLM Training : Abstract: Training large language models (LLMs) in the cloud faces growing memory bottlenecks due to the limited capacity and high cost of GPUs. While GPU memory offloading to CPU and NVMe has made la...
MalRAG: A Retrieval-Augmented LLM Framework for Open-set Malicious Traffic Identification : Abstract: Fine-grained identification of IDS-flagged suspicious traffic is crucial in cybersecurity. In practice, cyber threats evolve continuously, making the discovery of novel malicious traffic a c...
From Graphs to Hypergraphs: Enhancing Aspect-Based Sentiment Analysis via Multi-Level Relational Modeling : Abstract: Aspect-Based Sentiment Analysis (ABSA) predicts sentiment polarity for specific aspect terms, a task made difficult by conflicting sentiments across aspects and the sparse context of short t...
SCOPE: Spectral Concentration by Distributionally Robust Joint Covariance-Precision Estimation : Abstract: We propose a distributionally robust formulation for simultaneously estimating the covariance matrix and the precision matrix of a random vector.The proposed model minimizes the worst-case w...
Imaging with super-resolution in changing random media : Abstract: We develop an imaging algorithm that exploits strong scattering to achieve super-resolution in changing random media. The method processes large and diverse array datasets using sparse dicti...
Causal Discovery on Higher-Order Interactions : Abstract: Causal discovery combines data with knowledge provided by experts to learn the DAG representing the causal relationships between a given set of variables. When data are scarce, bagging is us...
Enhancing Generalization of Depth Estimation Foundation Model via Weakly-Supervised Adaptation with Regularization : Abstract: The emergence of foundation models has substantially advanced zero-shot generalization in monocular depth estimation (MDE), as exemplified by the Depth Anything series. However, given access...
Count The Notes: Histogram-Based Supervision for Automatic Music Transcription : Abstract: Automatic Music Transcription (AMT) converts audio recordings into symbolic musical representations. Training deep neural networks (DNNs) for AMT typically requires strongly aligned training...
Statistically controllable microstructure reconstruction framework for heterogeneous materials using sliced-Wasserstein metric and neural networks : Abstract: Heterogeneous porous materials play a crucial role in various engineering systems. Microstructure characterization and reconstruction provide effective means for modeling these materials, wh...
NeuralSSD: A Neural Solver for Signed Distance Surface Reconstruction : Abstract: We proposed a generalized method, NeuralSSD, for reconstructing a 3D implicit surface from the widely-available point cloud data. NeuralSSD is a solver-based on the neural Galerkin method, a...
Segmentwise Pruning in Audio-Language Models : Abstract: Recent audio-language models have shown impressive performance across a wide range of audio tasks and are increasingly capable of handling long audio inputs. However, the computing costs in ...
Steganographic Backdoor Attacks in NLP: Ultra-Low Poisoning and Defense Evasion : Abstract: Transformer models are foundational to natural language processing (NLP) applications, yet remain vulnerable to backdoor attacks introduced through poisoned data, which implant hidden behavi...
Audio Question Answering with GRPO-Based Fine-Tuning and Calibrated Segment-Level Predictions : Abstract: In this report, we describe our submission to Track 5 of the DCASE 2025 Challenge for the task of Audio Question Answering(AQA). Our system leverages the SSL backbone BEATs to extract frame-...
O3SLM: Open Weight, Open Data, and Open Vocabulary Sketch-Language Model : Abstract: While Large Vision Language Models (LVLMs) are increasingly deployed in real-world applications, their ability to interpret abstract visual inputs remains limited. Specifically, they struggl...
Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning : Abstract: Effective contact-rich manipulation requires robots to synergistically leverage vision, force, and proprioception. However, Reinforcement Learning agents struggle to learn in such multisenso...
Skewness-Robust Causal Discovery in Location-Scale Noise Models : Abstract: To distinguish Markov equivalent graphs in causal discovery, it is necessary to restrict the structural causal model. Crucially, we need to be able to distinguish cause $X$ from effect $Y$ i...
Gradient-Based Join Ordering : Abstract: Join ordering is the NP-hard problem of selecting the most efficient sequence in which to evaluate joins (conjunctive, binary operators) in a database query. As the performance of query exec...
Improved Convergence in Parameter-Agnostic Error Feedback through Momentum : Abstract: Communication compression is essential for scalable distributed training of modern machine learning models, but it often degrades convergence due to the noise it introduces. Error Feedback (...
DeCo-VAE: Learning Compact Latents for Video Reconstruction via Decoupled Representation : Abstract: Existing video Variational Autoencoders (VAEs) generally overlook the similarity between frame contents, leading to redundant latent modeling. In this paper, we propose decoupled VAE (DeCo-V...
DeepBlip: Estimating Conditional Average Treatment Effects Over Time : Abstract: Structural nested mean models (SNMMs) are a principled approach to estimate the treatment effects over time. A particular strength of SNMMs is to break the joint effect of treatment sequence...
ForensicFlow: A Tri-Modal Adaptive Network for Robust Deepfake Detection : Abstract: Deepfakes generated by advanced GANs and autoencoders severely threaten information integrity and societal stability. Single-stream CNNs fail to capture multi-scale forgery artifacts across ...
Online learning of subgrid-scale models for quasi-geostrophic turbulence in planetary interiors : Abstract: The use of machine learning to represent subgrid-scale (SGS) dynamics is now well established in weather forecasting and climate modelling. Recent advances have demonstrated that SGS models ...
Bridging Human and Model Perspectives: A Comparative Analysis of Political Bias Detection in News Media Using Large Language Models : Abstract: Detecting political bias in news media is a complex task that requires interpreting subtle linguistic and contextual cues. Although recent advances in Natural Language Processing (NLP) have ...
Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning : Abstract: Reinforcement Learning (RL) has become critical for advancing modern Large Language Models (LLMs), yet existing synchronous RL systems face severe performance bottlenecks. The rollout phase,...
Doppler Invariant CNN for Signal Classification : Abstract: Radio spectrum monitoring in contested environments motivates the need for reliable automatic signal classification technology. Prior work highlights deep learning as a promising approach, b...
Derivative of the truncated singular value and eigen decomposition : Abstract: Recently developed applications in the field of machine learning and computational physics rely on automatic differentiation techniques, that require stable and efficient linear algebra grad...
HyMAD: A Hybrid Multi-Activity Detection Approach for Border Surveillance and Monitoring : Abstract: Seismic sensing has emerged as a promising solution for border surveillance and monitoring; the seismic sensors that are often buried underground are small and cannot be noticed easily, maki...
Towards a Unified Analysis of Neural Networks in Nonparametric Instrumental Variable Regression: Optimization and Generalization : Abstract: We establish the first global convergence result of neural networks for two stage least squares (2SLS) approach in nonparametric instrumental variable regression (NPIV). This is achieved by ...
Robust Verification of Controllers under State Uncertainty via Hamilton-Jacobi Reachability Analysis : Abstract: As perception-based controllers for autonomous systems become increasingly popular in the real world, it is important that we can formally verify their safety and performance despite percept...
Equivariant neural networks and equivarification : Abstract: Equivariant neural networks are a class of neural networks designed to preserve symmetries inherent in the data. In this paper, we introduce a general method for modifying a neural network t...
High Dimensional Distributed Gradient Descent with Arbitrary Number of Byzantine Attackers : Abstract: Adversarial attacks pose a major challenge to distributed learning systems, prompting the development of numerous robust learning methods. However, most existing approaches suffer from the c...
Improved Sample Complexity Bounds for Diffusion Model Training : Abstract: Diffusion models have become the most popular approach to deep generative modeling of images, largely due to their empirical performance and reliability. From a theoretical standpoint, a num...
Bayes optimal learning of attention-indexed models : Abstract: We introduce the attention-indexed model (AIM), a theoretical framework for analyzing learning in deep attention layers. Inspired by multi-index models, AIM captures how token-level outputs ...
Beyond Correlation: Causal Multi-View Unsupervised Feature Selection Learning : Abstract: Multi-view unsupervised feature selection (MUFS) has recently received increasing attention for its promising ability in dimensionality reduction on multi-view unlabeled data. Existing MUFS ...
Sharp detection of low-dimensional structure in probability measures via dimensional logarithmic Sobolev inequalities : Abstract: Identifying low-dimensional structure in high-dimensional probability measures is an essential pre-processing step for efficient sampling. We introduce a method for identifying and approxima...
Iterative Explainability for Weakly Supervised Segmentation in Medical PE Detection : Abstract: Pulmonary Embolism (PE) are a leading cause of cardiovascular death. Computed tomographic pulmonary angiography (CTPA) is the gold standard for PE diagnosis, with growing interest in AI-base...
Phase diagram and eigenvalue dynamics of stochastic gradient descent in multilayer neural networks : Abstract: Hyperparameter tuning is one of the essential steps to guarantee the convergence of machine learning models. We argue that intuition about the optimal choice of hyperparameters for stochasti...
Concentration inequalities for semidefinite least squares based on data : Abstract: We study data-driven least squares (LS) problems with semidefinite (SD) constraints and derive finite-sample guarantees on the spectrum of their optimal solutions when these constraints are ...
Seeing and Knowing in the Wild: Open-domain Visual Entity Recognition with Large-scale Knowledge Graphs via Contrastive Learning : Abstract: Open-domain visual entity recognition aims to identify and link entities depicted in images to a vast and evolving set of real-world concepts, such as those found in Wikidata. Unlike convent...
Hint-Augmented Re-ranking: Efficient Product Search using LLM-Based Query Decomposition : Abstract: Search queries with superlatives (e.g., best, most popular) require comparing candidates across multiple dimensions, demanding linguistic understanding and domain knowledge. We show that LLM...
HiEAG: Evidence-Augmented Generation for Out-of-Context Misinformation Detection : Abstract: Recent advancements in multimodal out-of-context (OOC) misinformation detection have made remarkable progress in checking the consistencies between different modalities for supporting or ref...
Based on Data Balancing and Model Improvement for Multi-Label Sentiment Classification Performance Enhancement : Abstract: Multi-label sentiment classification plays a vital role in natural language processing by detecting multiple emotions within a single text. However, existing datasets like GoEmotions often s...
Stealth Fine-Tuning: Efficiently Breaking Alignment in RVLMs Using Self-Generated CoT : Abstract: Reasoning-augmented Vision-Language Models (RVLMs) rely on safety alignment to prevent harmful behavior, yet their exposed chain-of-thought (CoT) traces introduce new attack surfaces. In thi...
Applying Relation Extraction and Graph Matching to Answering Multiple Choice Questions : Abstract: In this research, we combine Transformer-based relation extraction with matching of knowledge graphs (KGs) and apply them to answering multiple-choice questions (MCQs) while maintaining the ...
Harnessing Deep LLM Participation for Robust Entity Linking : Abstract: Entity Linking (EL), the task of mapping textual entity mentions to their corresponding entries in knowledge bases, constitutes a fundamental component of natural language understanding. Rec...
MuCPT: Music-related Natural Language Model Continued Pretraining : Abstract: Large language models perform strongly on general tasks but remain constrained in specialized settings such as music, particularly in the music-entertainment domain, where corpus scale, puri...
Towards Authentic Movie Dubbing with Retrieve-Augmented Director-Actor Interaction Learning : Abstract: The automatic movie dubbing model generates vivid speech from given scripts, replicating a speaker's timbre from a brief timbre prompt while ensuring lip-sync with the silent video. Existing...
AfriSpeech-MultiBench: A Verticalized Multidomain Multicountry Benchmark Suite for African Accented English ASR : Abstract: Recent advances in speech-enabled AI, including Google's NotebookLM and OpenAI's speech-to-speech API, are driving widespread interest in voice interfaces globally. Despite this momentum, th...
Entropy-Guided Reasoning Compression : Abstract: Large reasoning models have demonstrated remarkable performance on complex reasoning tasks, yet the excessive length of their chain-of-thought outputs remains a major practical bottleneck du...
Don't Miss the Forest for the Trees: In-Depth Confidence Estimation for LLMs via Reasoning over the Answer Space : Abstract: Knowing the reliability of a model's response is essential in application. With the strong generation capabilities of LLMs, research has focused on generating verbalized confidence. This is ...
ConInstruct: Evaluating Large Language Models on Conflict Detection and Resolution in Instructions : Abstract: Instruction-following is a critical capability of Large Language Models (LLMs). While existing works primarily focus on assessing how well LLMs adhere to user instructions, they often overlo...
ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning : Abstract: The rapid advancement of Large Language Models (LLMs) has led to performance saturation on many established benchmarks, questioning their ability to distinguish frontier models. Concurrently...
Mitigating Label Length Bias in Large Language Models : Abstract: Large language models (LLMs) are powerful zero- and few-shot learners. However, when predicting over a set of candidate options, LLMs suffer from label biases, and existing calibration metho...
Unified Defense for Large Language Models against Jailbreak and Fine-Tuning Attacks in Education : Abstract: Large Language Models (LLMs) are increasingly integrated into educational applications. However, they remain vulnerable to jailbreak and fine-tuning attacks, which can compromise safety alig...
MedBench v4: A Robust and Scalable Benchmark for Evaluating Chinese Medical Language Models, Multimodal Models, and Intelligent Agents : Abstract: Recent advances in medical large language models (LLMs), multimodal models, and agents demand evaluation frameworks that reflect real clinical workflows and safety constraints. We present Me...
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning : Abstract: Large Language Models (LLMs) are increasingly being explored for building Agents capable of active environmental interaction (e.g., via tool use) to solve complex problems. Reinforcement Lea...
LiveRAG: A diverse Q&A dataset with varying difficulty level for RAG evaluation : Abstract: With Retrieval Augmented Generation (RAG) becoming more and more prominent in generative AI solutions, there is an emerging need for systematically evaluating their effectiveness. We introdu...
Leveraging Digitized Newspapers to Collect Summarization Data in Low-Resource Languages : Abstract: High quality summarization data remains scarce in under-represented languages. However, historical newspapers, made available through recent digitization efforts, offer an abundant source of...
Efficient Decoding Methods for Language Models on Encrypted Data : Abstract: Large language models (LLMs) power modern AI applications, but processing sensitive data on untrusted servers raises privacy concerns. Homomorphic encryption (HE) enables computation on encr...
Retrosynthesis Planning via Worst-path Policy Optimisation in Tree-structured MDPs : Abstract: Retrosynthesis planning aims to decompose target molecules into available building blocks, forming a synthetic tree where each internal node represents an intermediate compound and each leaf...
Resilient by Design -- Active Inference for Distributed Continuum Intelligence : Abstract: Failures are the norm in highly complex and heterogeneous devices spanning the distributed computing continuum (DCC), from resource-constrained IoT and edge nodes to high-performance computi...
Energy Consumption of Dataframe Libraries for End-to-End Deep Learning Pipelines:A Comparative Analysis : Abstract: This paper presents a detailed comparative analysis of the performance of three major Python data manipulation libraries - Pandas, Polars, and Dask - specifically when embedded within comple...
Tele-LLM-Hub: Building Context-Aware Multi-Agent LLM Systems for Telecom Networks : Abstract: This paper introduces Tele-LLM-Hub, a user friendly low-code solution for rapid prototyping and deployment of context aware multi-agent (MA) Large Language Model (LLM) systems tailored for 5...
MedBuild AI: An Agent-Based Hybrid Intelligence Framework for Reshaping Agency in Healthcare Infrastructure Planning through Generative Design for Medical Architecture : Abstract: Globally, disparities in healthcare infrastructure remain stark, leaving countless communities without access to even basic services. Traditional infrastructure planning is often slow and in...
Embedding Explainable AI in NHS Clinical Safety: The Explainability-Enabled Clinical Safety Framework (ECSF) : Abstract: Artificial intelligence (AI) is increasingly embedded in NHS workflows, but its probabilistic and adaptive behaviour conflicts with the deterministic assumptions underpinning existing clinic...
A Meta-Heuristic Load Balancer for Cloud Computing Systems : Abstract: This paper presents a strategy to allocate services on a Cloud system without overloading nodes and maintaining the system stability with minimum cost. We specify an abstract model of cloud ...
VULPO: Context-Aware Vulnerability Detection via On-Policy LLM Optimization : Abstract: The widespread reliance on open-source software dramatically increases the risk of vulnerability exploitation, underscoring the need for effective and scalable vulnerability detection (VD). ...
Dimension vs. Precision: A Comparative Analysis of Autoencoders and Quantization for Efficient Vector Retrieval on BEIR SciFact : Abstract: Dense retrieval models have become a standard for state-of-the-art information retrieval. However, their high-dimensional, high-precision (float32) vector embeddings create significant stora...
Physics-Informed Neural Networks for Nonlinear Output Regulation : Abstract: This work addresses the full-information output regulation problem for nonlinear systems, assuming the states of both the plant and the exosystem are known. In this setting, perfect tracking...
Extended Physics Informed Neural Network for Hyperbolic Two-Phase Flow in Porous Media : Abstract: The accurate solution of nonlinear hyperbolic partial differential equations (PDEs) remains a central challenge in computational science due to the presence of steep gradients, discontinuiti...
Blurred Encoding for Trajectory Representation Learning : Abstract: Trajectory representation learning (TRL) maps trajectories to vector embeddings and facilitates tasks such as trajectory classification and similarity search. State-of-the-art (SOTA) TRL met...
Library Liberation: Competitive Performance Matmul Through Compiler-composed Nanokernels : Abstract: The rapidly evolving landscape of AI and machine learning workloads has widened the gap between high-level domain operations and efficient hardware utilization. Achieving near-peak performan...
Compiling to linear neurons : Abstract: We don't program neural networks directly. Instead, we rely on an indirect style where learning algorithms, like gradient descent, determine a neural network's function by learning from data...
Self-Attention as Distributional Projection: A Unified Interpretation of Transformer Architecture : Abstract: This paper presents a mathematical interpretation of self-attention by connecting it to distributional semantics principles. We show that self-attention emerges from projecting corpus-level ...
Exploring Transferability of Self-Supervised Learning by Task Conflict Calibration : Abstract: In this paper, we explore the transferability of SSL by addressing two central questions: (i) what is the representation transferability of SSL, and (ii) how can we effectively model this tr...
Beat the long tail: Distribution-Aware Speculative Decoding for RL Training : Abstract: Reinforcement learning(RL) post-training has become essential for aligning large language models (LLMs), yet its efficiency is increasingly constrained by the rollout phase, where long traje...
AnaCP: Toward Upper-Bound Continual Learning via Analytic Contrastive Projection : Abstract: This paper studies the problem of class-incremental learning (CIL), a core setting within continual learning where a model learns a sequence of tasks, each containing a distinct set of class...
Tractable Probabilistic Models for Investment Planning : Abstract: Investment planning in power utilities, such as generation and transmission expansion, requires decade-long forecasts under profound uncertainty. Forecasting of energy mix and energy use dec...
Beyond One-Size-Fits-All: Neural Networks for Differentially Private Tabular Data Synthesis : Abstract: In differentially private (DP) tabular data synthesis, the consensus is that statistical models are better than neural network (NN)-based methods. However, we argue that this conclusion is i...
Weather Maps as Tokens: Transformers for Renewable Energy Forecasting : Abstract: Accurate renewable energy forecasting is essential to reduce dependence on fossil fuels and enabling grid decarbonization. However, current approaches fail to effectively integrate the rich ...
Complex-Weighted Convolutional Networks: Provable Expressiveness via Complex Diffusion : Abstract: Graph Neural Networks (GNNs) have achieved remarkable success across diverse applications, yet they remain limited by oversmoothing and poor performance on heterophilic graphs. To address th...
The Impact of Bootstrap Sampling Rate on Random Forest Performance in Regression Tasks : Abstract: Random Forests (RFs) typically train each tree on a bootstrap sample of the same size as the training set, i.e., bootstrap rate (BR) equals 1.0. We systematically examine how varying BR from...
Efficient reconstruction of multidimensional random field models with heterogeneous data using stochastic neural networks : Abstract: In this paper, we analyze the scalability of a recent Wasserstein-distance approach for training stochastic neural networks (SNNs) to reconstruct multidimensional random field models. We pro...
On the Gradient Complexity of Private Optimization with Private Oracles : Abstract: We study the running time, in terms of first order oracle queries, of differentially private empirical/population risk minimization of Lipschitz convex losses. We first consider the setting ...
Certified but Fooled! Breaking Certified Defences with Ghost Certificates : Abstract: Certified defenses promise provable robustness guarantees. We study the malicious exploitation of probabilistic certification frameworks to better understand the limits of guarantee provisio...
SmallML: Bayesian Transfer Learning for Small-Data Predictive Analytics : Abstract: Small and medium-sized enterprises (SMEs) represent 99.9% of U.S. businesses yet remain systematically excluded from AI due to a mismatch between their operational scale and modern machine l...
Meta-SimGNN: Adaptive and Robust WiFi Localization Across Dynamic Configurations and Diverse Scenarios : Abstract: To promote the practicality of deep learning-based localization, existing studies aim to address the issue of scenario dependence through meta-learning. However, these studies primarily focu...
Observational Auditing of Label Privacy : Abstract: Differential privacy (DP) auditing is essential for evaluating privacy guarantees in machine learning systems. Existing auditing methods, however, pose a significant challenge for large-scal...
MoE-SpeQ: Speculative Quantized Decoding with Proactive Expert Prefetching and Offloading for Mixture-of-Experts : Abstract: The immense memory requirements of state-of-the-art Mixture-of-Experts (MoE) models present a significant challenge for inference, often exceeding the capacity of a single accelerator. While...
Synthetic Survival Control: Extending Synthetic Controls for "When-If" Decision : Abstract: Estimating causal effects on time-to-event outcomes from observational data is particularly challenging due to censoring, limited sample sizes, and non-random treatment assignment. The need ...
A Comprehensive Study of Implicit and Explicit Biases in Large Language Models : Abstract: Large Language Models (LLMs) inherit explicit and implicit biases from their training datasets. Identifying and mitigating biases in LLMs is crucial to ensure fair outputs, as they can perpe...
N-GLARE: An Non-Generative Latent Representation-Efficient LLM Safety Evaluator : Abstract: Evaluating the safety robustness of LLMs is critical for their deployment. However, mainstream Red Teaming methods rely on online generation and black-box output analysis. These approaches a...
EBind: a practical approach to space binding : Abstract: We simplify space binding by focusing on two core components, a single encoder per modality and high-quality data; enabling training state-of-the-art models on a single GPU in a few hours as...
Algebraformer: A Neural Approach to Linear Systems : Abstract: Recent work in deep learning has opened new possibilities for solving classical algorithmic tasks using end-to-end learned models. In this work, we investigate the fundamental task of solvin...
Unified Multimodal Vessel Trajectory Prediction with Explainable Navigation Intention : Abstract: Vessel trajectory prediction is fundamental to intelligent maritime systems. Within this domain, short-term prediction of rapid behavioral changes in complex maritime environments has establ...
Intervention Efficiency and Perturbation Validation Framework: Capacity-Aware and Robust Clinical Model Selection under the Rashomon Effect : Abstract: In clinical machine learning, the coexistence of multiple models with comparable performance -- a manifestation of the Rashomon Effect -- poses fundamental challenges for trustworthy deploym...
Learning with Statistical Equality Constraints : Abstract: As machine learning applications grow increasingly ubiquitous and complex, they face an increasing set of requirements beyond accuracy. The prevalent approach to handle this challenge is to ...
Enforcing hidden physics in physics-informed neural networks : Abstract: Physics-informed neural networks (PINNs) represent a new paradigm for solving partial differential equations (PDEs) by integrating physical laws into the learning process of neural networks....
Watch Out for the Lifespan: Evaluating Backdoor Attacks Against Federated Model Adaptation : Abstract: Large models adaptation through Federated Learning (FL) addresses a wide range of use cases and is enabled by Parameter-Efficient Fine-Tuning techniques such as Low-Rank Adaptation (LoRA). H...
Toward Robust and Harmonious Adaptation for Cross-modal Retrieval : Abstract: Recently, the general-to-customized paradigm has emerged as the dominant approach for Cross-Modal Retrieval (CMR), which reconciles the distribution shift problem between the source domain a...
FlowRoI A Fast Optical Flow Driven Region of Interest Extraction Framework for High-Throughput Image Compression in Immune Cell Migration Analysis : Abstract: Autonomous migration is essential for the function of immune cells such as neutrophils and plays a pivotal role in diverse diseases. Recently, we introduced ComplexEye, a multi-lens array mi...
Nonparametric estimation of conditional probability distributions using a generative approach based on conditional push-forward neural networks : Abstract: We introduce conditional push-forward neural networks (CPFN), a generative framework for conditional distribution estimation. Instead of directly modeling the conditional density $f_{Y|X}$, ...
Notes on Kernel Methods in Machine Learning : Abstract: These notes provide a self-contained introduction to kernel methods and their geometric foundations in machine learning. Starting from the construction of Hilbert spaces, we develop the theo...
CLO: Efficient LLM Inference System with CPU-Light KVCache Offloading via Algorithm-System Co-Design : Abstract: The growth of million-token LLMs exposes the scalability limits of inference systems, where the KVCache dominates memory usage and data transfer overhead. Recent offloading systems migrate t...
Full Atom Peptide Design via Riemannian Euclidean Bayesian Flow Networks : Abstract: Diffusion and flow matching models have recently emerged as promising approaches for peptide binder design. Despite their progress, these models still face two major challenges. First, categ...
Mind the Gaps: Measuring Visual Artifacts in Dimensionality Reduction : Abstract: Dimensionality Reduction (DR) techniques are commonly used for the visual exploration and analysis of high-dimensional data due to their ability to project datasets of high-dimensional point...
Task Addition and Weight Disentanglement in Closed-Vocabulary Models : Abstract: Task arithmetic has recently emerged as a promising method for editing pre-trained \textit{open-vocabulary} models, offering a cost-effective alternative to standard multi-task fine-tuning. ...
Machine Learning Models for Predicting Smoking-Related Health Decline and Disease Risk : Abstract: Smoking continues to be a major preventable cause of death worldwide, affecting millions through damage to the heart, metabolism, liver, and kidneys. However, current medical screening metho...
AdamHD: Decoupled Huber Decay Regularization for Language Model Pre-Training : Abstract: Adaptive optimizers with decoupled weight decay, such as AdamW, are the de facto standard for pre-training large transformer-based generative models. Yet the quadratic nature of the $\ell_2$...
LAUD: Integrating Large Language Models with Active Learning for Unlabeled Data : Abstract: Large language models (LLMs) have shown a remarkable ability to generalize beyond their pre-training data, and fine-tuning LLMs can elevate performance to human-level and beyond. However, in...
Beyond Means: A Dynamic Framework for Predicting Customer Satisfaction : Abstract: Online ratings influence customer decision-making, yet standard aggregation methods, such as the sample mean, fail to adapt to quality changes over time and ignore review heterogeneity (e.g....
Measuring AI Progress in Drug Discovery: A Reproducible Leaderboard for the Tox21 Challenge : Abstract: Deep learning's rise since the early 2010s has transformed fields like computer vision and natural language processing and strongly influenced biomedical research. For drug discovery specifi...
Look-Ahead Reasoning on Learning Platforms : Abstract: On many learning platforms, the optimization criteria guiding model training reflect the priorities of the designer rather than those of the individuals they affect. Consequently, users may ...
SparseST: Exploiting Data Sparsity in Spatiotemporal Modeling and Prediction : Abstract: Spatiotemporal data mining (STDM) has a wide range of applications in various complex physical systems (CPS), i.e., transportation, manufacturing, healthcare, etc. Among all the proposed met...
$\pi^{*}_{0.6}$: a VLA That Learns From Experience : Abstract: We study how vision-language-action (VLA) models can improve through real-world deployments via reinforcement learning (RL). We present a general-purpose method, RL with Experience and Corre...
Feature weighting for data analysis via evolutionary simulation : Abstract: We analyze an algorithm for assigning weights prior to scalarization in discrete multi-objective problems arising from data analysis. The algorithm evolves the weights (the relevance of feat...
GegenbauerNet: Finding the Optimal Compromise in the GNN Flexibility-Stability Trade-off : Abstract: Spectral Graph Neural Networks (GNNs) operating in the canonical [-1, 1] domain (like ChebyNet and its adaptive generalization, L-JacobiNet) face a fundamental Flexibility-Stability Trade-of...
Principled Coarse-Grained Acceptance for Speculative Decoding in Speech : Abstract: Speculative decoding accelerates autoregressive speech generation by letting a fast draft model propose tokens that a larger target model verifies. However, for speech LLMs that generate aco...
THD-BAR: Topology Hierarchical Derived Brain Autoregressive Modeling for EEG Generic Representations : Abstract: Large-scale pre-trained models hold significant potential for learning universal EEG representations. However, most existing methods, particularly autoregressive (AR) frameworks, primarily r...
A Deep Learning Density Shaping Model Predictive Gust Load Alleviation Control of a Compliant Wing Subjected to Atmospheric Turbulence : Abstract: This study presents a novel deep learning approach aimed at enhancing stochastic Gust Load Alleviation (GLA) specifically for compliant wings. The approach incorporates the concept of smooth...
Knowledge vs. Experience: Asymptotic Limits of Impatience in Edge Tenants : Abstract: We study how two information feeds, a closed-form Markov estimator of residual sojourn and an online trained actor-critic, affect reneging and jockeying in a dual M/M/1 system. Analytically,...
CellStream: Dynamical Optimal Transport Informed Embeddings for Reconstructing Cellular Trajectories from Snapshots Data : Abstract: Single-cell RNA sequencing (scRNA-seq), especially temporally resolved datasets, enables genome-wide profiling of gene expression dynamics at single-cell resolution across discrete time poin...
Zipf-Gramming: Scaling Byte N-Grams Up to Production Sized Malware Corpora : Abstract: A classifier using byte n-grams as features is the only approach we have found fast enough to meet requirements in size (sub 2 MB), speed (multiple GB/s), and latency (sub 10 ms) for deploym...
QUASAR: An Evolutionary Algorithm to Accelerate High-Dimensional Optimization : Abstract: High-dimensional numerical optimization presents a persistent challenge. This paper introduces Quasi-Adaptive Search with Asymptotic Reinitialization (QUASAR), an evolutionary algorithm to a...
Uni-Hema: Unified Model for Digital Hematopathology : Abstract: Digital hematopathology requires cell-level analysis across diverse disease categories, including malignant disorders (e.g., leukemia), infectious conditions (e.g., malaria), and non-maligna...
A Disentangled Low-Rank RNN Framework for Uncovering Neural Connectivity and Dynamics : Abstract: Low-rank recurrent neural networks (lrRNNs) are a class of models that uncover low-dimensional latent dynamics underlying neural population activity. Although their functional connectivity i...
Uncertainty-Calibrated Prediction of Randomly-Timed Biomarker Trajectories with Conformal Bands : Abstract: Despite recent progress in predicting biomarker trajectories from real clinical data, uncertainty in the predictions poses high-stakes risks (e.g., misdiagnosis) that limit their clinical de...
Self-Supervised Compression and Artifact Correction for Streaming Underwater Imaging Sonar : Abstract: Real-time imaging sonar has become an important tool for underwater monitoring in environments where optical sensing is unreliable. Its broader use is constrained by two coupled challenges: ...
Empirical Likelihood for Random Forests and Ensembles : Abstract: We develop an empirical likelihood (EL) framework for random forests and related ensemble methods, providing a likelihood-based approach to quantify their statistical uncertainty. Exploiting...
Weight Variance Amplifier Improves Accuracy in High-Sparsity One-Shot Pruning : Abstract: Deep neural networks achieve outstanding performance in visual recognition tasks, yet their large number of parameters makes them less practical for real-world applications. Recently, one-sh...
GEN3D: Generating Domain-Free 3D Scenes from a Single Image : Abstract: Despite recent advancements in neural 3D reconstruction, the dependence on dense multi-view captures restricts their broader applicability. Additionally, 3D scene generation is vital for adv...
AraLingBench A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models : Abstract: We present AraLingBench: a fully human annotated benchmark for evaluating the Arabic linguistic competence of large language models (LLMs). The benchmark spans five core categories: grammar,...
SAM-Fed: SAM-Guided Federated Semi-Supervised Learning for Medical Image Segmentation : Abstract: Medical image segmentation is clinically important, yet data privacy and the cost of expert annotation limit the availability of labeled data. Federated semi-supervised learning (FSSL) offer...
H-LDM: Hierarchical Latent Diffusion Models for Controllable and Interpretable PCG Synthesis from Clinical Metadata : Abstract: Phonocardiogram (PCG) analysis is vital for cardiovascular disease diagnosis, yet the scarcity of labeled pathological data hinders the capability of AI systems. To bridge this, we introduce...
LSP-YOLO: A Lightweight Single-Stage Network for Sitting Posture Recognition on Embedded Devices : Abstract: With the rise in sedentary behavior, health problems caused by poor sitting posture have drawn increasing attention. Most existing methods, whether using invasive sensors or computer vision,...
Going Places: Place Recognition in Artificial and Natural Systems : Abstract: Place recognition, the ability to identify previously visited locations, is critical for both biological navigation and autonomous systems. This review synthesizes findings from robotic syst...
Clinically-Validated Innovative Mobile Application for Assessing Blinking and Eyelid Movements : Abstract: Blinking is a vital physiological process that protects and maintains the health of the ocular surface. Objective assessment of eyelid movements remains challenging due to the complexity, co...
The Tokenization Bottleneck: How Vocabulary Extension Improves Chemistry Representation Learning in Pretrained Language Models : Abstract: The application of large language models (LLMs) to chemistry is frequently hampered by a "tokenization bottleneck", where tokenizers tuned on general-domain text tend to fragment chemical re...
Cheating Stereo Matching in Full-scale: Physical Adversarial Attack against Binocular Depth Estimation in Autonomous Driving : Abstract: Though deep neural models adopted to realize the perception of autonomous driving have proven vulnerable to adversarial examples, known attacks often leverage 2D patches and target mostly mo...
Continuous Vision-Language-Action Co-Learning with Semantic-Physical Alignment for Behavioral Cloning : Abstract: Language-conditioned manipulation facilitates human-robot interaction via behavioral cloning (BC), which learns control policies from human demonstrations and serves as a cornerstone of embo...
Sigil: Server-Enforced Watermarking in U-Shaped Split Federated Learning via Gradient Injection : Abstract: In decentralized machine learning paradigms such as Split Federated Learning (SFL) and its variant U-shaped SFL, the server's capabilities are severely restricted. Although this enhances cli...
MiAD: Mirage Atom Diffusion for De Novo Crystal Generation : Abstract: In recent years, diffusion-based models have demonstrated exceptional performance in searching for simultaneously stable, unique, and novel (S.U.N.) crystalline materials. However, most of t...
Context-aware, Ante-hoc Explanations of Driving Behaviour : Abstract: Autonomous vehicles (AVs) must be both safe and trustworthy to gain social acceptance and become a viable option for everyday public transportation. Explanations about the system behaviour c...
Watchdogs and Oracles: Runtime Verification Meets Large Language Models for Autonomous Systems : Abstract: Assuring the safety and trustworthiness of autonomous systems is particularly difficult when learning-enabled components and open environments are involved. Formal methods provide strong gua...
Tell Me: An LLM-powered Mental Well-being Assistant with RAG, Synthetic Dialogue Generation, and Agentic Planning : Abstract: We present Tell Me, a mental well-being system that leverages advances in large language models to provide accessible, context-aware support for users and researchers. The system integrates ...
Agentic Video Intelligence: A Flexible Framework for Advanced Video Exploration and Understanding : Abstract: Video understanding requires not only visual recognition but also complex reasoning. While Vision-Language Models (VLMs) demonstrate impressive capabilities, they typically process videos la...
Hybrid Modeling of Photoplethysmography for Non-invasive Monitoring of Cardiovascular Parameters : Abstract: Continuous cardiovascular monitoring can play a key role in precision health. However, some fundamental cardiac biomarkers of interest, including stroke volume and cardiac output, require in...
Analyzing the Impact of Participant Failures in Cross-Silo Federated Learning : Abstract: Federated learning (FL) is a new paradigm for training machine learning (ML) models without sharing data. While applying FL in cross-silo scenarios, where organizations collaborate, it is ne...
Effective Diversification of Multi-Carousel Book Recommendation : Abstract: Using multiple carousels, lists that wrap around and can be scrolled, is the basis for offering content in most contemporary movie streaming platforms. Carousels allow for highlighting diffe...
nnterp: A Standardized Interface for Mechanistic Interpretability of Transformers : Abstract: Mechanistic interpretability research requires reliable tools for analyzing transformer internals across diverse architectures. Current approaches face a fundamental tradeoff: custom impleme...
Agentic AI Systems in Electrical Power Systems Engineering: Current State-of-the-Art and Challenges : Abstract: Agentic AI systems have recently emerged as a critical and transformative approach in artificial intelligence, offering capabilities that extend far beyond traditional AI agents and contempo...
Towards Stable and Structured Time Series Generation with Perturbation-Aware Flow Matching : Abstract: Time series generation is critical for a wide range of applications, which greatly supports downstream analytical and decision-making tasks. However, the inherent temporal heterogeneous indu...
IMSE: Efficient U-Net-based Speech Enhancement using Inception Depthwise Convolution and Amplitude-Aware Linear Attention : Abstract: Achieving a balance between lightweight design and high performance remains a significant challenge for speech enhancement (SE) tasks on resource-constrained devices. Existing state-of-the-a...
MissHDD: Hybrid Deterministic Diffusion for Hetrogeneous Incomplete Data Imputation : Abstract: Incomplete data are common in real-world tabular applications, where numerical, categorical, and discrete attributes coexist within a single dataset. This heterogeneous structure presents si...
DecNefLab: A Modular and Interpretable Simulation Framework for Decoded Neurofeedback : Abstract: Decoded Neurofeedback (DecNef) is a flourishing non-invasive approach to brain modulation with wide-ranging applications in neuromedicine and cognitive neuroscience. However, progress in Dec...
Apo2Mol: 3D Molecule Generation via Dynamic Pocket-Aware Diffusion Models : Abstract: Deep generative models are rapidly advancing structure-based drug design, offering substantial promise for generating small molecule ligands that bind to specific protein targets. However, m...
Masked IRL: LLM-Guided Reward Disambiguation from Demonstrations and Language : Abstract: Robots can adapt to user preferences by learning reward functions from demonstrations, but with limited data, reward models often overfit to spurious correlations and fail to generalize. Thi...
Examining the Metrics for Document-Level Claim Extraction in Czech and Slovak : Abstract: Document-level claim extraction remains an open challenge in the field of fact-checking, and subsequently, methods for evaluating extracted claims have received limited attention. In this wo...
SweeperBot: Making 3D Browsing Accessible through View Analysis and Visual Question Answering : Abstract: Accessing 3D models remains challenging for Screen Reader (SR) users. While some existing 3D viewers allow creators to provide alternative text, they often lack sufficient detail about the 3...
ReflexGrad: Three-Way Synergistic Architecture for Zero-Shot Generalization in LLM Agents : Abstract: Enabling agents to learn from experience and generalize across diverse tasks without task-specific training remains a fundamental challenge in reinforcement learning and decision-making. Whi...
Deep Learning-Based Regional White Matter Hyperintensity Mapping as a Robust Biomarker for Alzheimer's Disease : Abstract: White matter hyperintensities (WMH) are key imaging markers in cognitive aging, Alzheimer's disease (AD), and related dementias. Although automated methods for WMH segmentation have advanced...
Biased Minds Meet Biased AI: How Class Imbalance Shapes Appropriate Reliance and Interacts with Human Base Rate Neglect : Abstract: Humans increasingly interact with artificial intelligence (AI) in decision-making. However, both AI and humans are prone to biases. While AI and human biases have been studied extensively in...
Is Your VLM for Autonomous Driving Safety-Ready? A Comprehensive Benchmark for Evaluating External and In-Cabin Risks : Abstract: Vision-Language Models (VLMs) show great promise for autonomous driving, but their suitability for safety-critical scenarios is largely unexplored, raising safety concerns. This issue arises...
CCSD: Cross-Modal Compositional Self-Distillation for Robust Brain Tumor Segmentation with Missing Modalities : Abstract: The accurate segmentation of brain tumors from multi-modal MRI is critical for clinical diagnosis and treatment planning. While integrating complementary information from various MRI sequenc...
MRI Embeddings Complement Clinical Predictors for Cognitive Decline Modeling in Alzheimer's Disease Cohorts : Abstract: Accurate modeling of cognitive decline in Alzheimer's disease is essential for early stratification and personalized management. While tabular predictors provide robust markers of global ris...
A Method for Characterizing Disease Progression from Acute Kidney Injury to Chronic Kidney Disease : Abstract: Patients with acute kidney injury (AKI) are at high risk of developing chronic kidney disease (CKD), but identifying those at greatest risk remains challenging. We used electronic health rec...
Expert-Guided POMDP Learning for Data-Efficient Modeling in Healthcare : Abstract: Learning the parameters of Partially Observable Markov Decision Processes (POMDPs) from limited data is a significant challenge. We introduce the Fuzzy MAP EM algorithm, a novel approach tha...
Active Matter as a framework for living systems-inspired Robophysics : Abstract: Robophysics investigates the physical principles that govern living-like robots operating in complex, realworld environments. Despite remarkable technological advances, robots continue to fa...
Failure to Mix: Large language models struggle to answer according to desired probability distributions : Abstract: Scientific idea generation and selection requires exploration following a target probability distribution. In contrast, current AI benchmarks have objectively correct answers, and training l...
Enhancing Agentic Autonomous Scientific Discovery with Vision-Language Model Capabilities : Abstract: We show that multi-agent systems guided by vision-language models (VLMs) improve end-to-end autonomous scientific discovery. By treating plots as verifiable checkpoints, a VLM-as-a-judge eva...
Adapformer: Adaptive Channel Management for Multivariate Time Series Forecasting : Abstract: In multivariate time series forecasting (MTSF), accurately modeling the intricate dependencies among multiple variables remains a significant challenge due to the inherent limitations of tra...
Improving segmentation of retinal arteries and veins using cardiac signal in doppler holograms : Abstract: Doppler holography is an emerging retinal imaging technique that captures the dynamic behavior of blood flow with high temporal resolution, enabling quantitative assessment of retinal hemody...
NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards : Abstract: Vision--language--action (VLA) models have recently shown promising performance on a variety of embodied tasks, yet they still fall short in reliability and generalization, especially when d...
Ground Truth Generation for Multilingual Historical NLP using LLMs : Abstract: Historical and low-resource NLP remains challenging due to limited annotated data and domain mismatches with modern, web-sourced corpora. This paper outlines our work in using large language...
Impact of Image Resolution on Age Estimation with DeepFace and InsightFace : Abstract: Automatic age estimation is widely used for age verification, where input images often vary considerably in resolution. This study evaluates the effect of image resolution on age estimation ...
Attention via Synaptic Plasticity is All You Need: A Biologically Inspired Spiking Neuromorphic Transformer : Abstract: Attention is the brain's ability to selectively focus on a few specific aspects while ignoring irrelevant ones. This biological principle inspired the attention mechanism in modern Transform...
Near-Lossless Model Compression Enables Longer Context Inference in DNA Large Language Models : Abstract: Trained on massive cross-species DNA corpora, DNA large language models (LLMs) learn the fundamental "grammar" and evolutionary patterns of genomic sequences. This makes them powerful priors...
Seeing Beyond the Image: ECG and Anatomical Knowledge-Guided Myocardial Scar Segmentation from Late Gadolinium-Enhanced Images : Abstract: Accurate segmentation of myocardial scar from late gadolinium enhanced (LGE) cardiac MRI is essential for evaluating tissue viability, yet remains challenging due to variable contrast and im...
\textit{FLARE}: Adaptive Multi-Dimensional Reputation for Robust Client Reliability in Federated Learning : Abstract: Federated learning (FL) enables collaborative model training while preserving data privacy. However, it remains vulnerable to malicious clients who compromise model integrity through Byzanti...
Zero-shot Synthetic Video Realism Enhancement via Structure-aware Denoising : Abstract: We propose an approach to enhancing synthetic video realism, which can re-render synthetic videos from a simulator in photorealistic fashion. Our realism enhancement approach is a zero-shot ...
Automated proving in planar geometry based on the complex number identity method and elimination : Abstract: We improve the complex number identity proving method to a fully automated procedure, based on elimination ideals. By using declarative equations or rewriting each real-relational hypothesis...
ARC Is a Vision Problem! : Abstract: The Abstraction and Reasoning Corpus (ARC) is designed to promote research on abstract reasoning, a fundamental aspect of human intelligence. Common approaches to ARC treat it as a language-...
LLM-based Agents Suffer from Hallucinations: A Survey of Taxonomy, Methods, and Directions : Abstract: Driven by the rapid advancements of Large Language Models (LLMs), LLM-based agents have emerged as powerful intelligent systems capable of human-like cognition, reasoning, and interaction. T...
Chronic Kidney Disease Prognosis Prediction Using Transformer : Abstract: Chronic Kidney Disease (CKD) affects nearly 10\% of the global population and often progresses to end-stage renal failure. Accurate prognosis prediction is vital for timely interventions and...
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO : Abstract: Multi-agent systems perform well on general reasoning tasks. However, the lack of training in specialized areas hinders their accuracy. Current training methods train a unified large languag...
SemCo: Toward Semantic Coherent Visual Relationship Forecasting : Abstract: Visual Relationship Forecasting (VRF) aims to anticipate relations among objects without observing future visual content. The task relies on capturing and modeling the semantic coherence in ...
NAIST Academic Travelogue Dataset : Abstract: We have constructed NAIST Academic Travelogue Dataset (ATD) and released it free of charge for academic research. This dataset is a Japanese text dataset with a total of over 31 million word...
Benchmark on Drug Target Interaction Modeling from a Drug Structure Perspective : Abstract: The prediction modeling of drug-target interactions is crucial to drug discovery and design, which has seen rapid advancements owing to deep learning technologies. Recently developed methods...
KWT-Tiny: RISC-V Accelerated, Embedded Keyword Spotting Transformer : Abstract: This paper explores the adaptation of Transformerbased models for edge devices through the quantisation and hardware acceleration of the ARM Keyword Transformer (KWT) model on a RISC-V platf...
Can Machines Think Like Humans? A Behavioral Evaluation of LLM Agents in Dictator Games : Abstract: As Large Language Model (LLM)-based agents increasingly engage with human society, how well do we understand their prosocial behaviors? We (1) investigate how LLM agents' prosocial behaviors...
VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning : Abstract: A hallmark of advanced artificial intelligence is the capacity to progress from passive visual perception to the strategic modification of visual information to facilitate complex reasoning....
Higher-Order Transformers With Kronecker-Structured Attention : Abstract: Modern datasets are increasingly high-dimensional and multiway, often represented as tensor-valued data with multi-indexed variables. While Transformers excel in sequence modeling and high-d...
Automated Materials Discovery Platform Realized: Scanning Probe Microscopy of Combinatorial Libraries : Abstract: Combinatorial materials libraries provide a powerful platform for mapping how physical properties evolve across binary and ternary cross-sections of multicomponent phase diagrams. While synt...
Rethinking Token-wise Feature Caching: Accelerating Diffusion Transformers with Dual Feature Caching : Abstract: Diffusion Transformers (DiT) have become the dominant methods in image and video generation yet still suffer substantial computational costs. As an effective approach for DiT acceleration, f...
MCTSr-Zero: Self-Reflective Psychological Counseling Dialogues Generation via Principles and Adaptive Exploration : Abstract: The integration of Monte Carlo Tree Search (MCTS) with Large Language Models (LLMs) has demonstrated significant success in structured, problem-oriented tasks. However, applying these method...
Adaptive Redundancy Regulation for Balanced Multimodal Information Refinement : Abstract: Multimodal learning aims to improve performance by leveraging data from multiple sources. During joint multimodal training, due to modality bias, the advantaged modality often dominates back...
Multi-Horizon Time Series Forecasting of non-parametric CDFs with Deep Lattice Networks : Abstract: Probabilistic forecasting is not only a way to add more information to a prediction of the future, but it also builds on weaknesses in point prediction. Sudden changes in a time series can s...
VitalBench: A Rigorous Multi-Center Benchmark for Long-Term Vital Sign Prediction in Intraoperative Care : Abstract: Intraoperative monitoring and prediction of vital signs are critical for ensuring patient safety and improving surgical outcomes. Despite recent advances in deep learning models for medical ...
ChemFixer: Correcting Invalid Molecules to Unlock Previously Unseen Chemical Space : Abstract: Deep learning-based molecular generation models have shown great potential in efficiently exploring vast chemical spaces by generating potential drug candidates with desired properties. Howe...
Multi-Agent VLMs Guided Self-Training with PNU Loss for Low-Resource Offensive Content Detection : Abstract: Accurate detection of offensive content on social media demands high-quality labeled data; however, such data is often scarce due to the low prevalence of offensive instances and the high co...
MoETTA: Test-Time Adaptation Under Mixed Distribution Shifts with MoE-LayerNorm : Abstract: Test-Time adaptation (TTA) has proven effective in mitigating performance drops under single-domain distribution shifts by updating model parameters during inference. However, real-world dep...
What happens when nanochat meets DiLoCo? : Abstract: Although LLM training is typically centralized with high-bandwidth interconnects and large compute budgets, emerging methods target communication-constrained training in distributed environm...
Gene Incremental Learning for Single-Cell Transcriptomics : Abstract: Classes, as fundamental elements of Computer Vision, have been extensively studied within incremental learning frameworks. In contrast, tokens, which play essential roles in many research fi...
PROF: An LLM-based Reward Code Preference Optimization Framework for Offline Imitation Learning : Abstract: Offline imitation learning (offline IL) enables training effective policies without requiring explicit reward annotations. Recent approaches attempt to estimate rewards for unlabeled dataset...
Credal Ensemble Distillation for Uncertainty Quantification : Abstract: Deep ensembles (DE) have emerged as a powerful approach for quantifying predictive uncertainty and distinguishing its aleatoric and epistemic components, thereby enhancing model robustness a...
Dynamic Temperature Scheduler for Knowledge Distillation : Abstract: Knowledge Distillation (KD) trains a smaller student model using a large, pre-trained teacher model, with temperature as a key hyperparameter controlling the softness of output probabilities...
ExplainableGuard: Interpretable Adversarial Defense for Large Language Models Using Chain-of-Thought Reasoning : Abstract: Large Language Models (LLMs) are increasingly vulnerable to adversarial attacks that can subtly manipulate their outputs. While various defense mechanisms have been proposed, many operate as...
Can LLMs Create Legally Relevant Summaries and Analyses of Videos? : Abstract: Understanding the legally relevant factual basis of an event and conveying it through text is a key skill of legal professionals. This skill is important for preparing forms (e.g., insurance...
Known Meets Unknown: Mitigating Overconfidence in Open Set Recognition : Abstract: Open Set Recognition (OSR) requires models not only to accurately classify known classes but also to effectively reject unknown samples. However, when unknown samples are semantically simila...
Semantic Multiplexing : Abstract: Mobile devices increasingly require the parallel execution of several computing tasks offloaded at the wireless edge. Existing communication systems only support parallel transmissions at th...
Temporal Object-Aware Vision Transformer for Few-Shot Video Object Detection : Abstract: Few-shot Video Object Detection (FSVOD) addresses the challenge of detecting novel objects in videos with limited labeled examples, overcoming the constraints of traditional detection method...
Quantifying Distribution Shift in Traffic Signal Control with Histogram-Based GEH Distance : Abstract: Traffic signal control algorithms are vulnerable to distribution shift, where performance degrades under traffic conditions that differ from those seen during design or training. This paper ...
Scaling Patterns in Adversarial Alignment: Evidence from Multi-LLM Jailbreak Experiments : Abstract: Large language models (LLMs) increasingly operate in multi-agent and safety-critical settings, raising open questions about how their vulnerabilities scale when models interact adversarially...
Uncovering and Aligning Anomalous Attention Heads to Defend Against NLP Backdoor Attacks : Abstract: Backdoor attacks pose a serious threat to the security of large language models (LLMs), causing them to exhibit anomalous behavior under specific trigger conditions. The design of backdoor t...
GeoPl@ntNet: A Platform for Exploring Essential Biodiversity Variables : Abstract: This paper describes GeoPl@ntNet, an interactive web application designed to make Essential Biodiversity Variables accessible and understandable to everyone through dynamic maps and fact she...
XAI-Driven Deep Learning for Protein Sequence Functional Group Classification : Abstract: Proteins perform essential biological functions, and accurate classification of their sequences is critical for understanding structure-function relationships, enzyme mechanisms, and molecul...
Modeling Fairness in Recruitment AI via Information Flow : Abstract: Avoiding bias and understanding the real-world consequences of AI-supported decision-making are critical to address fairness and assign accountability. Existing approaches often focus either...
FusionFM: All-in-One Multi-Modal Image Fusion with Flow Matching : Abstract: Current multi-modal image fusion methods typically rely on task-specific models, leading to high training costs and limited scalability. While generative methods provide a unified modeling p...
A Trajectory-free Crash Detection Framework with Generative Approach and Segment Map Diffusion : Abstract: Real-time crash detection is essential for developing proactive safety management strategy and enhancing overall traffic efficiency. To address the limitations associated with trajectory acq...
MAT-MPNN: A Mobility-Aware Transformer-MPNN Model for Dynamic Spatiotemporal Prediction of HIV Diagnoses in California, Florida, and New England : Abstract: Human Immunodeficiency Virus (HIV) has posed a major global health challenge for decades, and forecasting HIV diagnoses continues to be a critical area of research. However, capturing the co...
Synergizing Multigrid Algorithms with Vision Transformer: A Novel Approach to Enhance the Seismic Foundation Model : Abstract: Due to the emergency and homogenization of Artificial Intelligence (AI) technology development, transformer-based foundation models have revolutionized scientific applications, such as drug ...
Passive Dementia Screening via Facial Temporal Micro-Dynamics Analysis of In-the-Wild Talking-Head Video : Abstract: We target passive dementia screening from short camera-facing talking head video, developing a facial temporal micro dynamics analysis for language free detection of early neuro cognitive ch...
GAEA: Experiences and Lessons Learned from a Country-Scale Environmental Digital Twin : Abstract: This paper describes the experiences and lessons learned after the deployment of a country-scale environmental digital twin on the island of Cyprus for three years. This digital twin, called...
ScoresActivation: A New Activation Function for Model Agnostic Global Explainability by Design : Abstract: Understanding the decision of large deep learning models is a critical challenge for building transparent and trustworthy systems. Although the current post hoc explanation methods offer val...
Randomized Controlled Trials for Phishing Triage Agent : Abstract: Security operations centers (SOCs) face a persistent challenge: efficiently triaging a high volume of user-reported phishing emails while maintaining robust protection against threats. This ...
Randomized Controlled Trials for Conditional Access Optimization Agent : Abstract: AI agents are increasingly deployed to automate complex enterprise workflows, yet evidence of their effectiveness in identity governance is limited. We report results from the first randomiz...
H-CNN-ViT: A Hierarchical Gated Attention Multi-Branch Model for Bladder Cancer Recurrence Prediction : Abstract: Bladder cancer is one of the most prevalent malignancies worldwide, with a recurrence rate of up to 78%, necessitating accurate post-operative monitoring for effective patient management. Mu...
Hybrid Convolution Neural Network Integrated with Pseudo-Newton Boosting for Lumbar Spine Degeneration Detection : Abstract: This paper proposes a new enhanced model architecture to perform classification of lumbar spine degeneration with DICOM images while using a hybrid approach, integrating EfficientNet and VGG...
Can QE-informed (Re)Translation lead to Error Correction? : Abstract: The paper presents two approaches submitted to the WMT 2025 Automated Translation Quality Evaluation Systems Task 3 - Quality Estimation (QE)-informed Segment-level Error Correction. While j...
What Works for 'Lost-in-the-Middle' in LLMs? A Study on GM-Extract and Mitigations : Abstract: The diminishing ability of large language models (LLMs) to effectively utilize long-range context-the "lost-in-the-middle" phenomenon-poses a significant challenge in retrieval-based LLM app...
Compute-in-Memory Implementation of State Space Models for Event Sequence Processing : Abstract: State space models (SSMs) have recently emerged as a powerful framework for long sequence processing, outperforming traditional methods on diverse benchmarks. Fundamentally, SSMs can general...
Preference-Based Learning in Audio Applications: A Systematic Analysis : Abstract: Despite the parallel challenges that audio and text domains face in evaluating generative model outputs, preference learning remains remarkably underexplored in audio applications. Through a...
Data Whitening Improves Sparse Autoencoder Learning : Abstract: Sparse autoencoders (SAEs) have emerged as a promising approach for learning interpretable features from neural network activations. However, the optimization landscape for SAE training can ...
Node-Level Uncertainty Estimation in LLM-Generated SQL : Abstract: We present a practical framework for detecting errors in LLM-generated SQL by estimating uncertainty at the level of individual nodes in the query's abstract syntax tree (AST). Our approach ...
LoCoBench-Agent: An Interactive Benchmark for LLM Agents in Long-Context Software Engineering : Abstract: As large language models (LLMs) evolve into sophisticated autonomous agents capable of complex software development tasks, evaluating their real-world capabilities becomes critical. While ex...
How to Marginalize in Causal Structure Learning? : Abstract: Bayesian networks (BNs) are a widely used class of probabilistic graphical models employed in numerous application domains. However, inferring the network's graphical structure from data rem...
FlakyGuard: Automatically Fixing Flaky Tests at Industry Scale : Abstract: Flaky tests that non-deterministically pass or fail waste developer time and slow release cycles. While large language models (LLMs) show promise for automatically repairing flaky tests, exi...
Can Artificial Intelligence Accelerate Technological Progress? Researchers' Perspectives on AI in Manufacturing and Materials Science : Abstract: Artificial intelligence (AI) raises expectations of substantial increases in rates of technological and scientific progress, but such anticipations are often not connected to detailed ground...
Knowledge-Grounded Agentic Large Language Models for Multi-Hazard Understanding from Reconnaissance Reports : Abstract: Post-disaster reconnaissance reports contain critical evidence for understanding multi-hazard interactions, yet their unstructured narratives make systematic knowledge transfer difficult. La...
Developing a Grounded View of AI : Abstract: As a capability coming from computation, how does AI differ fundamentally from the capabilities delivered by rule-based software program? The paper examines the behavior of artificial intell...
From Narrow Unlearning to Emergent Misalignment: Causes, Consequences, and Containment in LLMs : Abstract: Recent work has shown that fine-tuning on insecure code data can trigger an emergent misalignment (EMA) phenomenon, where models generate malicious responses even to prompts unrelated to the...
MRI Plane Orientation Detection using a Context-Aware 2.5D Model : Abstract: Humans can easily identify anatomical planes (axial, coronal, and sagittal) on a 2D MRI slice, but automated systems struggle with this task. Missing plane orientation metadata can complicat...
Keeping Code-Aware LLMs Fresh: Full Refresh, In-Context Deltas, and Incremental Fine-Tuning : Abstract: Modern codebases evolve continuously: files are renamed or deleted; public APIs drift; behavior shifts within otherwise familiar modules. A model trained yesterday to map a developer's natur...
Training-free Detection of AI-generated images via Cropping Robustness : Abstract: AI-generated image detection has become crucial with the rapid advancement of vision-generative models. Instead of training detectors tailored to specific datasets, we study a training-free ...
GRPO Privacy Is at Risk: A Membership Inference Attack Against Reinforcement Learning With Verifiable Rewards : Abstract: Membership inference attacks (MIAs) on large language models (LLMs) pose significant privacy risks across various stages of model training. Recent advances in Reinforcement Learning with Ver...
Radial Compensation: Stable and Semantically Decoupled Generative Models on Riemannian Manifolds : Abstract: Generative models on curved spaces rely on charts to map Euclidean spaces to manifolds. Exponential maps preserve geodesics but have stiff, radius-dependent Jacobians, while volume-preservin...
A Machine Learning-Based Multimodal Framework for Wearable Sensor-Based Archery Action Recognition and Stress Estimation : Abstract: In precision sports such as archery, athletes' performance depends on both biomechanical stability and psychological resilience. Traditional motion analysis systems are often expensive and i...
CafeMed: Causal Attention Fusion Enhanced Medication Recommendation : Abstract: Medication recommendation systems play a crucial role in assisting clinicians with personalized treatment decisions. While existing approaches have made significant progress in learning medi...
CFG-EC: Error Correction Classifier-Free Guidance : Abstract: Classifier-Free Guidance (CFG) has become a mainstream approach for simultaneously improving prompt fidelity and generation quality in conditional generative models. During training, CFG sto...
Zero-Training Task-Specific Model Synthesis for Few-Shot Medical Image Classification : Abstract: Deep learning models have achieved remarkable success in medical image analysis but are fundamentally constrained by the requirement for large-scale, meticulously annotated datasets. This de...
Automated glenoid bone loss measurement and segmentation in CT scans for pre-operative planning in shoulder instability : Abstract: Reliable measurement of glenoid bone loss is essential for operative planning in shoulder instability, but current manual and semi-automated methods are time-consuming and often subject to i...
Error-Driven Scene Editing for 3D Grounding in Large Language Models : Abstract: Despite recent progress in 3D-LLMs, they remain limited in accurately grounding language to visual and spatial elements in 3D environments. This limitation stems in part from training data t...
GCA-ResUNet:Image segmentation in medical images using grouped coordinate attention : Abstract: Medical image segmentation underpins computer-aided diagnosis and therapy by supporting clinical diagnosis, preoperative planning, and disease monitoring. While U-Net style convolutional neu...
NeuroPath: Neurobiology-Inspired Path Tracking and Reflection for Semantically Coherent Retrieval : Abstract: Retrieval-augmented generation (RAG) greatly enhances large language models (LLMs) performance in knowledge-intensive tasks. However, naive RAG methods struggle with multi-hop question answe...
FAPE-IR: Frequency-Aware Planning and Execution Framework for All-in-One Image Restoration : Abstract: All-in-One Image Restoration (AIO-IR) aims to develop a unified model that can handle multiple degradations under complex conditions. However, existing methods often rely on task-specific de...
CascadedViT: Cascaded Chunk-FeedForward and Cascaded Group Attention Vision Transformer : Abstract: Vision Transformers (ViTs) have demonstrated remarkable performance across a range of computer vision tasks; however, their high computational, memory, and energy demands hinder deployment o...
Synthetic Clinical Notes for Rare ICD Codes: A Data-Centric Framework for Long-Tail Medical Coding : Abstract: Automatic ICD coding from clinical text is a critical task in medical NLP but remains hindered by the extreme long-tail distribution of diagnostic codes. Thousands of rare and zero-shot ICD ...
Soft-Label Training Preserves Epistemic Uncertainty : Abstract: Many machine learning tasks involve inherent subjectivity, where annotators naturally provide varied labels. Standard practice collapses these label distributions into single labels, aggrega...
Real-Time Mobile Video Analytics for Pre-arrival Emergency Medical Services : Abstract: Timely and accurate pre-arrival video streaming and analytics are critical for emergency medical services (EMS) to deliver life-saving interventions. Yet, current-generation EMS infrastructu...
Multi-view Phase-aware Pedestrian-Vehicle Incident Reasoning Framework with Vision-Language Models : Abstract: Pedestrian-vehicle incidents remain a critical urban safety challenge, with pedestrians accounting for over 20% of global traffic fatalities. Although existing video-based systems can detect...
Fair-GNE : Generalized Nash Equilibrium-Seeking Fairness in Multiagent Healthcare Automation : Abstract: Enforcing a fair workload allocation among multiple agents tasked to achieve an objective in learning enabled demand side healthcare worker settings is crucial for consistent and reliable pe...
SMART: Shot-Aware Multimodal Video Moment Retrieval with Audio-Enhanced MLLM : Abstract: Video Moment Retrieval is a task in video understanding that aims to localize a specific temporal segment in an untrimmed video based on a natural language query. Despite recent progress in ...
AsyncVLA: Asynchronous Flow Matching for Vision-Language-Action Models : Abstract: Vision-language-action (VLA) models have recently emerged as a powerful paradigm for building generalist robots. However, traditional VLA models that generate actions through flow matching (...
Selective Weak-to-Strong Generalization : Abstract: Future superhuman models will surpass the ability of humans and humans will only be able to \textit{weakly} supervise superhuman models. To alleviate the issue of lacking high-quality data f...
Certified Signed Graph Unlearning : Abstract: Signed graphs model complex relationships through positive and negative edges, with widespread real-world applications. Given the sensitive nature of such data, selective removal mechanisms ...
AdaTok: Adaptive Token Compression with Object-Aware Representations for Efficient Multimodal LLMs : Abstract: Multimodal Large Language Models (MLLMs) have demonstrated substantial value in unified text-image understanding and reasoning, primarily by converting images into sequences of patch-level t...
SymLoc: Symbolic Localization of Hallucination across HaluEval and TruthfulQA : Abstract: LLMs still struggle with hallucination, especially when confronted with symbolic triggers like modifiers, negation, numbers, exceptions, and named entities. Yet, we lack a clear understandin...
Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion : Abstract: Vision-Language-Action (VLA) models have demonstrated significant potential in real-world robotic manipulation. However, pre-trained VLA policies still suffer from substantial performance de...
Few-Shot Precise Event Spotting via Unified Multi-Entity Graph and Distillation : Abstract: Precise event spotting (PES) aims to recognize fine-grained events at exact moments and has become a key component of sports analytics. This task is particularly challenging due to rapid suc...
DiverseClaire: Simulating Students to Improve Introductory Programming Course Materials for All CS1 Learners : Abstract: Although CS programs are booming, introductory courses like CS1 still adopt a one-size-fits-all formats that can exacerbate cognitive load and discourage learners with autism, ADHD, dyslexia...
Multi-Scale Correlation-Aware Transformer for Maritime Vessel Re-Identification : Abstract: Maritime vessel re-identification (Re-ID) plays a crucial role in advancing maritime monitoring and intelligent situational awareness systems. However, some existing vessel Re-ID methods are...
Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution : Abstract: We introduce Orion, a visual agent framework that can take in any modality and generate any modality. Using an agentic framework with multiple tool-calling capabilities, Orion is designed fo...
Bridging the Gap Between Bayesian Deep Learning and Ensemble Weather Forecasts : Abstract: Weather forecasting is fundamentally challenged by the chaotic nature of the atmosphere, necessitating probabilistic approaches to quantify uncertainty. While traditional ensemble prediction...
Parallelizing Tree Search with Twice Sequential Monte Carlo : Abstract: Model-based reinforcement learning (RL) methods that leverage search are responsible for many milestone breakthroughs in RL. Sequential Monte Carlo (SMC) recently emerged as an alternative t...
LLM-Aligned Geographic Item Tokenization for Local-Life Recommendation : Abstract: Recent advances in Large Language Models (LLMs) have enhanced text-based recommendation by enriching traditional ID-based methods with semantic generalization capabilities. Text-based method...
ArbESC+: Arabic Enhanced Edit Selection System Combination for Grammatical Error Correction Resolving conflict and improving system combination in Arabic GEC : Abstract: Grammatical Error Correction (GEC) is an important aspect of natural language processing. Arabic has a complicated morphological and syntactic structure, posing a greater challenge than othe...
Object-Centric World Models for Causality-Aware Reinforcement Learning : Abstract: World models have been developed to support sample-efficient deep reinforcement learning agents. However, it remains challenging for world models to accurately replicate environments that ar...
Comparing Task-Agnostic Embedding Models for Tabular Data : Abstract: Recent foundation models for tabular data achieve strong task-specific performance via in-context learning. Nevertheless, they focus on direct prediction by encapsulating both representation...
Imagine in Space: Exploring the Frontier of Spatial Intelligence and Reasoning Efficiency in Vision Language Models : Abstract: Large language models (LLMs) and vision language models (VLMs), such as DeepSeek R1,OpenAI o3, and Gemini 2.5 Pro, have demonstrated remarkable reasoning capabilities across logical inferenc...
KANGURA: Kolmogorov-Arnold Network-Based Geometry-Aware Learning with Unified Representation Attention for 3D Modeling of Complex Structures : Abstract: Microbial Fuel Cells (MFCs) offer a promising pathway for sustainable energy generation by converting organic matter into electricity through microbial processes. A key factor influencing MF...
When AI Does Science: Evaluating the Autonomous AI Scientist KOSMOS in Radiation Biology : Abstract: Agentic AI "scientists" now use language models to search the literature, run analyses, and generate hypotheses. We evaluate KOSMOS, an autonomous AI scientist, on three problems in radiatio...
Causal computations in Semi Markovian Structural Causal Models using divide and conquer : Abstract: Recently, Bjøru et al. proposed a novel divide-and-conquer algorithm for bounding counterfactual probabilities in structural causal models (SCMs). They assumed that the SCMs were learned fro...
Jailbreaking Large Vision Language Models in Intelligent Transportation Systems : Abstract: Large Vision Language Models (LVLMs) demonstrate strong capabilities in multimodal reasoning and many real-world applications, such as visual question answering. However, LVLMs are highly vu...
CORGI: Efficient Pattern Matching With Quadratic Guarantees : Abstract: Rule-based systems must solve complex matching problems within tight time constraints to be effective in real-time applications, such as planning and reactive control for AI agents, as well ...
Scene Graph-Guided Generative AI Framework for Synthesizing and Evaluating Industrial Hazard Scenarios : Abstract: Training vision models to detect workplace hazards accurately requires realistic images of unsafe conditions that could lead to accidents. However, acquiring such datasets is difficult becau...
Artificial Intelligence Agents in Music Analysis: An Integrative Perspective Based on Two Use Cases : Abstract: This paper presents an integrative review and experimental validation of artificial intelligence (AI) agents applied to music analysis and education. We synthesize the historical evolution f...
ALEX:A Light Editing-knowledge Extractor : Abstract: The static nature of knowledge within Large Language Models (LLMs) makes it difficult for them to adapt to evolving information, rendering knowledge editing a critical task. However, existin...
Syn-STARTS: Synthesized START Triage Scenario Generation Framework for Scalable LLM Evaluation : Abstract: Triage is a critically important decision-making process in mass casualty incidents (MCIs) to maximize victim survival rates. While the role of AI in such situations is gaining attention for...
AISAC: An Integrated multi-agent System for Transparent, Retrieval-Grounded Scientific Assistance : Abstract: AI Scientific Assistant Core (AISAC) is an integrated multi-agent system developed at Argonne National Laboratory for scientific and engineering workflows. AISAC builds on established techno...
Making Evidence Actionable in Adaptive Learning : Abstract: Adaptive learning often diagnoses precisely yet intervenes weakly, yielding help that is mistimed or misaligned. This study presents evidence supporting an instructor-governed feedback loop ...
Collaborative QA using Interacting LLMs. Impact of Network Structure, Node Capability and Distributed Data : Abstract: In this paper, we model and analyze how a network of interacting LLMs performs collaborative question-answering (CQA) in order to estimate a ground truth given a distributed set of documents...
APD-Agents: A Large Language Model-Driven Multi-Agents Collaborative Framework for Automated Page Design : Abstract: Layout design is a crucial step in developing mobile app pages. However, crafting satisfactory designs is time-intensive for designers: they need to consider which controls and content to pr...
PRISM: Prompt-Refined In-Context System Modelling for Financial Retrieval : Abstract: With the rapid progress of large language models (LLMs), financial information retrieval has become a critical industrial application. Extracting task-relevant information from lengthy finan...
Run, Ruminate, and Regulate: A Dual-process Thinking System for Vision-and-Language Navigation : Abstract: Vision-and-Language Navigation (VLN) requires an agent to dynamically explore complex 3D environments following human instructions. Recent research underscores the potential of harnessing la...
Beyond Accuracy: A Multi-Dimensional Framework for Evaluating Enterprise Agentic AI Systems : Abstract: Current agentic AI benchmarks predominantly evaluate task completion accuracy, while overlooking critical enterprise requirements such as cost-efficiency, reliability, and operational stabil...
HFL-FlowLLM: Large Language Models for Network Traffic Flow Classification in Heterogeneous Federated Learning : Abstract: In modern communication networks driven by 5G and the Internet of Things (IoT), effective network traffic flow classification is crucial for Quality of Service (QoS) management and security....
Do Large Language Models (LLMs) Understand Chronology? : Abstract: Large language models (LLMs) are increasingly used in finance and economics, where prompt-based attempts against look-ahead bias implicitly assume that models understand chronology. We test ...
Listen Like a Teacher: Mitigating Whisper Hallucinations using Adaptive Layer Attention and Knowledge Distillation : Abstract: The Whisper model, an open-source automatic speech recognition system, is widely adopted for its strong performance across multilingual and zero-shot settings. However, it frequently suffers...
DevPiolt: Operation Recommendation for IoT Devices at Xiaomi Home : Abstract: Operation recommendation for IoT devices refers to generating personalized device operations for users based on their context, such as historical operations, environment information, and dev...
Enhancing Regional Airbnb Trend Forecasting Using LLM-Based Embeddings of Accessibility and Human Mobility : Abstract: The expansion of short-term rental platforms, such as Airbnb, has significantly disrupted local housing markets, often leading to increased rental prices and housing affordability issues. Ac...
PathMind: A Retrieve-Prioritize-Reason Framework for Knowledge Graph Reasoning with Large Language Models : Abstract: Knowledge graph reasoning (KGR) is the task of inferring new knowledge by performing logical deductions on knowledge graphs. Recently, large language models (LLMs) have demonstrated remarkab...
DataSage: Multi-agent Collaboration for Insight Discovery with External Knowledge Retrieval, Multi-role Debating, and Multi-path Reasoning : Abstract: In today's data-driven era, fully automated end-to-end data analytics, particularly insight discovery, is critical for discovering actionable insights that assist organizations in making eff...
When Words Change the Model: Sensitivity of LLMs for Constraint Programming Modelling : Abstract: One of the long-standing goals in optimisation and constraint programming is to describe a problem in natural language and automatically obtain an executable, efficient model. Large language...
Operationalizing Pluralistic Values in Large Language Model Alignment Reveals Trade-offs in Safety, Inclusivity, and Model Behavior : Abstract: Although large language models (LLMs) are increasingly trained using human feedback for safety and alignment with human values, alignment decisions often overlook human social diversity. Thi...
A Neuro-Symbolic Framework for Reasoning under Perceptual Uncertainty: Bridging Continuous Perception and Discrete Symbolic Planning : Abstract: Bridging continuous perceptual signals and discrete symbolic reasoning is a fundamental challenge in AI systems that must operate under uncertainty. We present a neuro-symbolic framework tha...
Rate-Distortion Guided Knowledge Graph Construction from Lecture Notes Using Gromov-Wasserstein Optimal Transport : Abstract: Task-oriented knowledge graphs (KGs) enable AI-powered learning assistant systems to automatically generate high-quality multiple-choice questions (MCQs). Yet converting unstructured educati...
AutoTool: Efficient Tool Selection for Large Language Model Agents : Abstract: Large Language Model (LLM) agents have emerged as powerful tools for automating complex tasks by leveraging the reasoning and decision-making abilities of LLMs. However, a major bottleneck i...
SkillGen: Learning Domain Skills for In-Context Sequential Decision Making : Abstract: Large language models (LLMs) are increasingly applied to sequential decision-making through in-context learning (ICL), yet their effectiveness is highly sensitive to prompt quality. Effectiv...
Heterogeneous Multi-Agent Proximal Policy Optimization for Power Distribution System Restoration : Abstract: Restoring power distribution systems (PDS) after large-scale outages requires sequential switching operations that reconfigure feeder topology and coordinate distributed energy resources (DE...
From Legacy Fortran to Portable Kokkos: An Autonomous Agentic AI Workflow : Abstract: Scientific applications continue to rely on legacy Fortran codebases originally developed for homogeneous, CPU-based systems. As High-Performance Computing (HPC) shifts toward heterogeneous ...
Signature vs. Substance: Evaluating the Balance of Adversarial Resistance and Linguistic Quality in Watermarking Large Language Models : Abstract: To mitigate the potential harms of Large Language Models (LLMs)generated text, researchers have proposed watermarking, a process of embedding detectable signals within text. With watermarkin...
Preparation Meets Opportunity: Enhancing Data Preprocessing for ML Training With Seneca : Abstract: Input data preprocessing is a common bottleneck when concurrently training multimedia machine learning (ML) models in modern systems. To alleviate these bottlenecks and reduce the training t...
AI Kill Switch for malicious web-based LLM agent : Abstract: Recently, web-based Large Language Model (LLM) agents autonomously perform increasingly complex tasks, thereby bringing significant convenience. However, they also amplify the risks of malic...
Refine Thought: A Test-Time Inference Method for Embedding Model Reasoning : Abstract: We propose RT (Refine Thought), a method that can enhance the semantic rea-soning ability of text embedding models. The method obtains the final semanticrepresentation by running multiple fo...
DualLaguerreNet: A Decoupled Spectral Filter GNN and the Uncovering of the Flexibility-Stability Trade-off : Abstract: Graph Neural Networks (GNNs) based on spectral filters, such as the Adaptive Orthogonal Polynomial Filter (AOPF) class (e.g., LaguerreNet), have shown promise in unifying the solutions for h...
Subject-Independent Imagined Speech Detection via Cross-Subject Generalization and Calibration : Abstract: Achieving robust generalization across individuals remains a major challenge in electroencephalogram based imagined speech decoding due to substantial variability in neural activity patterns...
Review of Passenger Flow Modelling Approaches Based on a Bibliometric Analysis : Abstract: This paper presents a bibliometric analysis of the field of short-term passenger flow forecasting within local public transit, covering 814 publications that span from 1984 to 2024. In addit...
nuCarla: A nuScenes-Style Bird's-Eye View Perception Dataset for CARLA Simulation : Abstract: End-to-end (E2E) autonomous driving heavily relies on closed-loop simulation, where perception, planning, and control are jointly trained and evaluated in interactive environments. Yet, most...
Deep reinforcement learning-based spacecraft attitude control with pointing keep-out constraint : Abstract: This paper implements deep reinforcement learning (DRL) for spacecraft reorientation control with a single pointing keep-out zone. The Soft Actor-Critic (SAC) algorithm is adopted to handle ...
DeepDefense: Layer-Wise Gradient-Feature Alignment for Building Robust Neural Networks : Abstract: Deep neural networks are known to be vulnerable to adversarial perturbations, which are small and carefully crafted inputs that lead to incorrect predictions. In this paper, we propose DeepD...
SCALEX: Scalable Concept and Latent Exploration for Diffusion Models : Abstract: Image generation models frequently encode social biases, including stereotypes tied to gender, race, and profession. Existing methods for analyzing these biases in diffusion models either fo...
Motor Imagery Classification Using Feature Fusion of Spatially Weighted Electroencephalography : Abstract: A Brain Computer Interface (BCI) connects the human brain to the outside world, providing a direct communication channel. Electroencephalography (EEG) signals are commonly used in BCIs to re...
Robustness of LLM-enabled vehicle trajectory prediction under data security threats : Abstract: The integration of large language models (LLMs) into automated driving systems has opened new possibilities for reasoning and decision-making by transforming complex driving contexts into la...

Research Sources: 453 | Generated: 11/19/2025