AI Research News Feeds for November 4th, 2025

AI RESEARCH PAPERS & ACADEMIC SOURCES

PROPEX-RAG: Enhanced GraphRAG using Prompt-Driven Prompt Execution : Abstract: Retrieval-Augmented Generation (RAG) has become a robust framework for enhancing Large Language Models (LLMs) with external knowledge. Recent advances in RAG have investigated graph based re...
SciTextures: Collecting and Connecting Visual Patterns, Models, and Code Across Science and Art : Abstract: The ability to connect visual patterns with the processes that form them represents one of the deepest forms of visual understanding. Textures of clouds and waves, the growth of cities and f...
TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning : Abstract: The frontier of visual reasoning is shifting toward models like OpenAI o3, which can intelligently create and operate tools to transform images for problem-solving, also known as thinking-...
GeneFlow: Translation of Single-cell Gene Expression to Histopathological Images via Rectified Flow : Abstract: Spatial transcriptomics (ST) technologies can be used to align transcriptomes with histopathological morphology, presenting exciting new opportunities for biomolecular discovery. Using ST da...
SonarSweep: Fusing Sonar and Vision for Robust 3D Reconstruction via Plane Sweeping : Abstract: Accurate 3D reconstruction in visually-degraded underwater environments remains a formidable challenge. Single-modality approaches are insufficient: vision-based methods fail due to poor vis...
Investigating Label Bias and Representational Sources of Age-Related Disparities in Medical Segmentation : Abstract: Algorithmic bias in medical imaging can perpetuate health disparities, yet its causes remain poorly understood in segmentation tasks. While fairness has been extensively studied in classific...
Three-dimensional narrow volume reconstruction method with unconditional stability based on a phase-field Lagrange multiplier approach : Abstract: Reconstruction of an object from a point cloud is essential in prosthetics, medical imaging, computer vision, etc. We present an effective algorithm for an Allen-Cahn-type model of reconstru...
Image-based ground distance detection for crop-residue-covered soil : Abstract: Conservation agriculture features a soil surface covered with crop residues, which brings benefits of improving soil health and saving water. However, one significant challenge in conservati...
GDROS: A Geometry-Guided Dense Registration Framework for Optical-SAR Images under Large Geometric Transformations : Abstract: Registration of optical and synthetic aperture radar (SAR) remote sensing images serves as a critical foundation for image fusion and visual navigation tasks. This task is particularly chall...
Been There, Scanned That: Nostalgia-Driven LiDAR Compression for Self-Driving Cars : Abstract: An autonomous vehicle can generate several terabytes of sensor data per day. A significant portion of this data consists of 3D point clouds produced by depth sensors such as LiDARs. This dat...
Applying Medical Imaging Tractography Techniques to Painterly Rendering of Images : Abstract: Doctors and researchers routinely use diffusion tensor imaging (DTI) and tractography to visualize the fibrous structure of tissues in the human body. This paper explores the connection of t...
Fast-SmartWay: Panoramic-Free End-to-End Zero-Shot Vision-and-Language Navigation : Abstract: Recent advances in Vision-and-Language Navigation in Continuous Environments (VLN-CE) have leveraged multimodal large language models (MLLMs) to achieve zero-shot navigation. However, existi...
LiDAR-VGGT: Cross-Modal Coarse-to-Fine Fusion for Globally Consistent and Metric-Scale Dense Mapping : Abstract: Reconstructing large-scale colored point clouds is an important task in robotics, supporting perception, navigation, and scene understanding. Despite advances in LiDAR inertial visual odomet...
Kinematify: Open-Vocabulary Synthesis of High-DoF Articulated Objects : Abstract: A deep understanding of kinematic structures and movable components is essential for enabling robots to manipulate objects and model their own articulated forms. Such understanding is captur...
Learning to Seek Evidence: A Verifiable Reasoning Agent with Causal Faithfulness Analysis : Abstract: Explanations for AI models in high-stakes domains like medicine often lack verifiability, which can hinder trust. To address this, we propose an interactive agent that produces explanations ...
MARS: Multi-Agent Robotic System with Multimodal Large Language Models for Assistive Intelligence : Abstract: Multimodal large language models (MLLMs) have shown remarkable capabilities in cross-modal understanding and reasoning, offering new opportunities for intelligent assistive systems, yet exis...
Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process : Abstract: Vision-language-action (VLA) models aim to understand natural language instructions and visual observations and to execute corresponding actions as an embodied agent. Recent work integrates ...
Coupled quasi-harmonic bases : Abstract: The use of Laplacian eigenbases has been shown to be fruitful in many computer graphics applications. Today, state-of-the-art approaches to shape analysis, synthesis, and correspondence rely...
Exploring Effective Factors for Improving Visual In-Context Learning : Abstract: In-Context Learning (ICL) aims to understand a new task via a few demonstrations (a.k.a. prompts) and predict new inputs without tuning the models. While it has been widely studied in NLP, it...
DeGMix: Efficient Multi-Task Dense Prediction with Deformable and Gating Mixer : Abstract: Convolutional neural networks and Transformers have their own advantages and both have been widely used for dense prediction in multi-task learning (MTL). Existing studies typically employ eit...
HAT: Hybrid Attention Transformer for Image Restoration : Abstract: Transformer-based methods have shown impressive performance in image restoration tasks, such as image super-resolution and denoising. However, we find that these networks can only utilize a ...
Targeted Attack Improves Protection against Unauthorized Diffusion Customization : Abstract: Diffusion models set a new milestone for image generation yet raise public concerns, for they can be fine-tuned on unauthorized images for customization. Protection based on adversarial ...
Balancing Efficiency and Quality: MoEISR for Arbitrary-Scale Image Super-Resolution : Abstract: Arbitrary-scale image super-resolution employing implicit neural functions has gained significant attention lately due to its capability to upscale images across diverse scales utilizing onl...
VRP-SAM: SAM with Visual Reference Prompt : Abstract: In this paper, we propose a novel Visual Reference Prompt (VRP) encoder that empowers the Segment Anything Model (SAM) to utilize annotated reference images as prompts for segmentation, crea...
OpenMaterial: A Large-scale Dataset of Complex Materials for 3D Reconstruction : Abstract: Recent advances in deep learning, such as neural radiance fields and implicit neural representations, have significantly advanced 3D reconstruction. However, accurately reconstructing object...
Bidirectional Regression for Monocular 6DoF Head Pose Estimation and Reference System Alignment : Abstract: Precise six-degree-of-freedom (6DoF) head pose estimation is crucial for safety-critical applications and human-computer interaction scenarios, yet existing monocular methods still struggle ...
Preliminary study on artificial intelligence methods for cybersecurity threat detection in computer networks based on raw data packets : Abstract: Most of the intrusion detection methods in computer networks are based on traffic flow characteristics. However, this approach may not fully exploit the potential of deep learning algorithms...
Scalable Autoregressive Image Generation with Mamba : Abstract: We introduce AiM, an autoregressive (AR) image generative model based on Mamba architecture. AiM employs Mamba, a novel state-space model characterized by its exceptional performance for lon...
ReviveDiff: A Universal Diffusion Model for Restoring Images in Adverse Weather Conditions : Abstract: Images captured in challenging environments, such as nighttime, smoke, rainy weather, and underwater, often suffer from significant degradation, resulting in a substantial loss of visual qua...
Enhancing Action Recognition by Leveraging the Hierarchical Structure of Actions and Textual Context : Abstract: We propose a novel approach to improve action recognition by exploiting the hierarchical organization of actions and by incorporating contextualized textual information, including location a...
Phys4DGen: Physics-Compliant 4D Generation with Multi-Material Composition Perception : Abstract: 4D content generation aims to create dynamically evolving 3D content that responds to specific input objects such as images or 3D representations. Current approaches typically incorporate ph...
Gaussian Splashing: Direct Volumetric Rendering Underwater : Abstract: In underwater images, most useful features are occluded by water. The extent of the occlusion depends on imaging geometry and can vary even across a sequence of burst images. As a result, 3D...
Epistemic Uncertainty for Generated Image Detection : Abstract: We introduce a novel framework for AI-generated image detection through epistemic uncertainty, aiming to address critical security concerns in the era of generative models. Our key insight s...
FlexEvent: Towards Flexible Event-Frame Object Detection at Varying Operational Frequencies : Abstract: Event cameras offer unparalleled advantages for real-time perception in dynamic environments, thanks to the microsecond-level temporal resolution and asynchronous operation. Existing event d...
FIRE: Robust Detection of Diffusion-Generated Images via Frequency-Guided Reconstruction Error : Abstract: The rapid advancement of diffusion models has significantly improved high-quality image generation, making generated content increasingly challenging to distinguish from real images and rais...
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities : Abstract: We introduce BiMediX2, a bilingual (Arabic-English) Bio-Medical EXpert Large Multimodal Model that supports text-based and image-based medical interactions. It enables multi-turn conversatio...
Multi-scale Latent Point Consistency Models for 3D Shape Generation : Abstract: Consistency Models (CMs) have significantly accelerated the sampling process in diffusion models, yielding impressive results in synthesizing high-resolution images. To explore and extend th...
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos : Abstract: This work presents Sa2VA, the first comprehensive, unified model for dense grounded understanding of both images and videos. Unlike existing multi-modal large language models, which are ofte...
BEN: Using Confidence-Guided Matting for Dichotomous Image Segmentation : Abstract: Current approaches to dichotomous image segmentation (DIS) treat image matting and object segmentation as fundamentally different tasks. As improvements in image segmentation become increasi...
mmCooper: A Multi-agent Multi-stage Communication-efficient and Collaboration-robust Cooperative Perception Framework : Abstract: Collaborative perception significantly enhances individual vehicle perception performance through the exchange of sensory information among agents. However, real-world deployment faces chall...
SurGen: 1020 H&E-stained Whole Slide Images With Survival and Genetic Markers : Abstract: Cancer remains one of the leading causes of morbidity and mortality worldwide. Comprehensive datasets that combine histopathological images with genetic and survival data across various tumo...
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation : Abstract: Text-conditioned image generation has gained significant attention in recent years and is processing increasingly longer and more comprehensive text prompts. In everyday life, dense and intricate...
A Racing Dataset and Baseline Model for Track Detection in Autonomous Racing : Abstract: A significant challenge in racing-related research is the lack of publicly available datasets containing raw images with corresponding annotations for the downstream task. In this paper, we ...
Surgical Scene Understanding in the Era of Foundation AI Models: A Comprehensive Review : Abstract: Recent advancements in machine learning (ML) and deep learning (DL), particularly through the introduction of Foundation Models (FMs), have significantly enhanced surgical scene understandin...
New multimodal similarity measure for image registration via modeling local functional dependence with linear combination of learned basis functions : Abstract: The deformable registration of images of different modalities, essential in many medical imaging applications, remains challenging. The main challenge is developing a robust measure for imag...
AdaSCALE: Adaptive Scaling for OOD Detection : Abstract: The ability of the deep learning model to recognize when a sample falls outside its learned distribution is critical for safe and reliable deployment. Recent state-of-the-art out-of-distribu...
LocDiff: Identifying Locations on Earth by Diffusing in the Hilbert Space : Abstract: Image geolocalization is a fundamental yet challenging task, aiming at inferring the geolocation on Earth where an image is taken. State-of-the-art methods employ either grid-based classific...
FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion : Abstract: Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability stemming from two factors: 1) limited annotated im...
SonarSplat: Novel View Synthesis of Imaging Sonar via Gaussian Splatting : Abstract: In this paper, we present SonarSplat, a novel Gaussian splatting framework for imaging sonar that demonstrates realistic novel view synthesis and models acoustic streaking phenomena. Our met...
OpenFACADES: An Open Framework for Architectural Caption and Attribute Data Enrichment via Street View Imagery : Abstract: Building properties, such as height, usage, and material, play a crucial role in spatial data infrastructures, supporting various urban applications. Despite their importance, comprehensive ...
LiteTracker: Leveraging Temporal Causality for Accurate Low-latency Tissue Tracking : Abstract: Tissue tracking plays a critical role in various surgical navigation and extended reality (XR) applications. While current methods trained on large synthetic datasets achieve high tracking a...
Efficient Remote Sensing Change Detection with Change State Space Models : Abstract: Despite their frequent use for change detection, both ConvNets and Vision transformers (ViT) exhibit well-known limitations, namely the former struggle to model long-range dependencies while...
SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding : Abstract: Recent advancements in image editing have utilized large-scale multimodal models to enable intuitive, natural instruction-driven interactions. However, conventional methods still face signif...
What Makes Good Synthetic Training Data for Zero-Shot Stereo Matching? : Abstract: Synthetic datasets are a crucial ingredient for training stereo matching networks, but the question of what makes a stereo dataset effective remains underexplored. We investigate the design ...
Spike Imaging Velocimetry: Dense Motion Estimation of Fluids Using Spike Cameras : Abstract: The need for accurate and non-intrusive flow measurement methods has led to the widespread adoption of Particle Image Velocimetry (PIV), a powerful diagnostic tool in fluid motion estimation...
Coarse Attribute Prediction with Task Agnostic Distillation for Real World Clothes Changing ReID : Abstract: This work focuses on Clothes Changing Re-IDentification (CC-ReID) for the real world. Existing works perform well with high-quality (HQ) images, but struggle with low-quality (LQ) where we c...
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts : Abstract: Long videos contain a vast amount of information, making video-text retrieval an essential and challenging task in multimodal learning. However, existing benchmarks suffer from limited video...
Reflectance Prediction-based Knowledge Distillation for Robust 3D Object Detection in Compressed Point Clouds : Abstract: Regarding intelligent transportation systems, low-bitrate transmission via lossy point cloud compression is vital for facilitating real-time collaborative perception among connected agents, ...
Diffusion Classifiers Understand Compositionality, but Conditions Apply : Abstract: Understanding visual scenes is fundamental to human intelligence. While discriminative models have significantly advanced computer vision, they often struggle with compositional understandin...
REN: Fast and Efficient Region Encodings from Patch-Based Image Encoders : Abstract: We introduce the Region Encoder Network (REN), a fast and effective model for generating region-based image representations using point prompts. Recent methods combine class-agnostic segment...
Policy Optimized Text-to-Image Pipeline Design : Abstract: Text-to-image generation has evolved beyond single monolithic models to complex multi-component pipelines. These combine fine-tuned generators, adapters, upscaling blocks and even editing st...
SPIRAL: Semantic-Aware Progressive LiDAR Scene Generation and Understanding : Abstract: Leveraging recent diffusion models, LiDAR-based large-scale 3D scene generation has achieved great success. While recent voxel-based approaches can generate both geometric structures and sem...
VidText: Towards Comprehensive Evaluation for Video Text Understanding : Abstract: Visual texts embedded in videos carry rich semantic information, which is crucial for both holistic video understanding and fine-grained reasoning about local human actions. However, existin...
EDITOR: Effective and Interpretable Prompt Inversion for Text-to-Image Diffusion Models : Abstract: Text-to-image generation models (e.g., Stable Diffusion) have achieved significant advancements, enabling the creation of high-quality and realistic images based on textual descriptions. Pro...
CanadaFireSat: Toward high-resolution wildfire forecasting with multiple modalities : Abstract: Canada experienced in 2023 one of the most severe wildfire seasons in recent history, causing damage across ecosystems, destroying communities, and emitting large quantities of CO2. This ext...
Non-Contact Health Monitoring During Daily Personal Care Routines : Abstract: Remote photoplethysmography (rPPG) enables non-contact, continuous monitoring of physiological signals and offers a practical alternative to traditional health sensing methods. Although rPPG...
WildCAT3D: Appearance-Aware Multi-View Diffusion in the Wild : Abstract: Despite recent advances in sparse novel view synthesis (NVS) applied to object-centric scenes, scene-level NVS remains a challenge. A central issue is the lack of available clean multi-view ...
DepthVanish: Optimizing Adversarial Interval Structures for Stereo-Depth-Invisible Patches : Abstract: Stereo depth estimation is a critical task in autonomous driving and robotics, where inaccuracies (such as misidentifying nearby objects as distant) can lead to dangerous situations. Adversa...
Class Agnostic Instance-level Descriptor for Visual Instance Search : Abstract: Despite the great success of the deep features in content-based image retrieval, the visual instance search remains challenging due to the lack of effective instance-level feature representa...
Consistent Supervised-Unsupervised Alignment for Generalized Category Discovery : Abstract: Generalized Category Discovery (GCD) focuses on classifying known categories while simultaneously discovering novel categories from unlabeled data. However, previous GCD methods face challen...
CoralVQA: A Large-Scale Visual Question Answering Dataset for Coral Reef Image Understanding : Abstract: Coral reefs are vital yet vulnerable ecosystems that require continuous monitoring to support conservation. While coral reef images provide essential information in coral monitoring, interpr...
MindJourney: Test-Time Scaling with World Models for Spatial Reasoning : Abstract: Spatial reasoning in 3D space is central to human cognition and indispensable for embodied tasks such as navigation and manipulation. However, state-of-the-art vision-language models (VLMs) ...
Semantic-Aware Representation Learning via Conditional Transport for Multi-Label Image Classification : Abstract: Multi-label image classification is a critical task in machine learning that aims to accurately assign multiple labels to a single image. While existing methods often utilize attention mecha...
Style-Aware Blending and Prototype-Based Cross-Contrast Consistency for Semi-Supervised Medical Image Segmentation : Abstract: Weak-strong consistency learning strategies are widely employed in semi-supervised medical image segmentation to train models by leveraging limited labeled data and enforcing weak-to-strong ...
Multi-Focused Video Group Activities Hashing : Abstract: With the explosive growth of video data in various complex scenarios, quickly retrieving group activities has become an urgent problem. However, many tasks can only retrieve videos focusing ...
Risk-adaptive Activation Steering for Safe Multimodal Large Language Models : Abstract: One of the key challenges of modern AI models is ensuring that they provide helpful responses to benign queries while refusing malicious ones. But often, the models are vulnerable to multimo...
Finite element-based space-time total variation-type regularization of the inverse problem in electrocardiographic imaging : Abstract: Reconstructing cardiac electrical activity from body surface electric potential measurements results in the severely ill-posed inverse problem in electrocardiography. Many different regulari...
FIPER: Factorized Features for Robust Image Super-Resolution and Compression : Abstract: In this work, we propose using a unified representation, termed Factorized Features, for low-level vision tasks, where we test on Single Image Super-Resolution (SISR) and Image Compr...
As Good as It KAN Get: High-Fidelity Audio Representation : Abstract: Implicit neural representations (INR) have gained prominence for efficiently encoding multimedia data, yet their applications in audio signals remain limited. This study introduces the Kolmo...
Rethinking Glaucoma Calibration: Voting-Based Binocular and Metadata Integration : Abstract: Glaucoma is a major cause of irreversible blindness, with significant diagnostic subjectivity. This inherent uncertainty, combined with the overconfidence of models optimized solely for accu...
Improved visual-information-driven model for crowd simulation and its modular application : Abstract: Data-driven crowd simulation models offer advantages in enhancing the accuracy and realism of simulations, and improving their generalizability is essential for promoting application. Curren...
Accelerating Volumetric Medical Image Annotation via Short-Long Memory SAM 2 : Abstract: Manual annotation of volumetric medical images, such as magnetic resonance imaging (MRI) and computed tomography (CT), is a labor-intensive and time-consuming process. Recent advancements in...
Modality-AGnostic Image Cascade (MAGIC) for Multi-Modality Cardiac Substructure Segmentation : Abstract: Cardiac substructure delineation is emerging in treatment planning to minimize the risk of radiation-induced heart disease. Deep learning offers efficient methods to reduce contouring burden...
Anti-Aliased 2D Gaussian Splatting : Abstract: 2D Gaussian Splatting (2DGS) has recently emerged as a promising method for novel view synthesis and surface reconstruction, offering better view-consistency and geometric accuracy than volu...
Autoadaptive Medical Segment Anything Model : Abstract: Medical image segmentation is a key task in the imaging workflow, influencing many image-based decisions. Traditional, fully-supervised segmentation models rely on large amounts of labeled t...
MOSPA: Human Motion Generation Driven by Spatial Audio : Abstract: Enabling virtual humans to dynamically and realistically respond to diverse auditory stimuli remains a key challenge in character animation, demanding the integration of perceptual modeling ...
MIQ-SAM3D: From Single-Point Prompt to Multi-Instance Segmentation via Competitive Query Refinement : Abstract: Accurate segmentation of medical images is fundamental to tumor diagnosis and treatment planning. SAM-based interactive segmentation has gained attention for its strong generalization, but m...
Expanding the Content-Style Frontier: a Balanced Subspace Blending Approach for Content-Style LoRA Fusion : Abstract: Recent advancements in text-to-image diffusion models have significantly improved the personalization and stylization of generated images. However, previous studies have only assessed conten...
CMI-MTL: Cross-Mamba interaction based multi-task learning for medical visual question answering : Abstract: Medical visual question answering (Med-VQA) is a crucial multimodal task in clinical decision support and telemedicine. Recent self-attention based methods struggle to effectively handle cro...
EREBUS: End-to-end Robust Event Based Underwater Simulation : Abstract: The underwater domain presents a vast array of challenges for roboticists and computer vision researchers alike, such as poor lighting conditions and high dynamic range scenes. In these adve...
SEPS: Semantic-enhanced Patch Slimming Framework for fine-grained cross-modal alignment : Abstract: Fine-grained cross-modal alignment aims to establish precise local correspondences between vision and language, forming a cornerstone for visual question answering and related multimodal app...
Semantic BIM enrichment for firefighting assets: Fire-ART dataset and panoramic image-based 3D reconstruction : Abstract: Inventory management of firefighting assets is crucial for emergency preparedness, risk assessment, and on-site fire response. However, conventional methods are inefficient due to limited ca...
Towards One-step Causal Video Generation via Adversarial Self-Distillation : Abstract: Recent hybrid video generation models combine autoregressive temporal dynamics with diffusion-based spatial denoising, but their sequential, iterative nature leads to error accumulation and ...
UniSOT: A Unified Framework for Multi-Modality Single Object Tracking : Abstract: Single object tracking aims to localize a target object with specific reference modalities (bounding box, natural language or both) in a sequence of specific video modalities (RGB, RGB+Depth, ...
Terrain-Enhanced Resolution-aware Refinement Attention for Off-Road Segmentation : Abstract: Off-road semantic segmentation suffers from thick, inconsistent boundaries, sparse supervision for rare classes, and pervasive label noise. Designs that fuse only at low resolution blur edge...
Contrast-Guided Cross-Modal Distillation for Thermal Object Detection : Abstract: Robust perception at night remains challenging for thermal-infrared detection: low contrast and weak high-frequency cues lead to duplicate, overlapping boxes, missed small objects, and class...
Privacy Preserving Ordinal-Meta Learning with VLMs for Fine-Grained Fruit Quality Prediction : Abstract: To effectively manage the wastage of perishable fruits, it is crucial to accurately predict their freshness or shelf life using non-invasive methods that rely on visual data. In this regard,...
Reg-DPO: SFT-Regularized Direct Preference Optimization with GT-Pair for Improving Video Generation : Abstract: Recent studies have identified Direct Preference Optimization (DPO) as an efficient and reward-free approach to improving video generation quality. However, existing methods largely follow i...
When to Trust the Answer: Question-Aligned Semantic Nearest Neighbor Entropy for Safer Surgical VQA : Abstract: Safety and reliability are essential for deploying Visual Question Answering (VQA) in surgery, where incorrect or ambiguous responses can harm the patient. Most surgical VQA research focuses...
Efficiently Training A Flat Neural Network Before It has been Quantizated : Abstract: Post-training quantization (PTQ) for vision transformers (ViTs) has garnered significant attention due to its efficiency in compressing models. However, existing methods typically overlook t...
HMVLM: Human Motion-Vision-Language Model via MoE LoRA : Abstract: The expansion of instruction-tuning data has enabled foundation language models to exhibit improved instruction adherence and superior performance across diverse downstream tasks. Semantical...
SecDiff: Diffusion-Aided Secure Deep Joint Source-Channel Coding Against Adversarial Attacks : Abstract: Deep joint source-channel coding (JSCC) has emerged as a promising paradigm for semantic communication, delivering significant performance gains over conventional separate coding schemes. Ho...
EPAN: Robust Pedestrian Re-Identification via Enhanced Alignment Network for IoT Surveillance : Abstract: Person re-identification (ReID) plays a pivotal role in computer vision, particularly in surveillance and security applications within IoT-enabled smart environments. This study introduces t...
SE(3)-PoseFlow: Estimating 6D Pose Distributions for Uncertainty-Aware Robotic Manipulation : Abstract: Object pose estimation is a fundamental problem in robotics and computer vision, yet it remains challenging due to partial observability, occlusions, and object symmetries, which inevitably ...
Discriminately Treating Motion Components Evolves Joint Depth and Ego-Motion Learning : Abstract: Unsupervised learning of depth and ego-motion, two fundamental 3D perception tasks, has made significant strides in recent years. However, most methods treat ego-motion as an auxiliary task,...
Luminance-Aware Statistical Quantization: Unsupervised Hierarchical Learning for Illumination Enhancement : Abstract: Low-light image enhancement (LLIE) faces persistent challenges in balancing reconstruction fidelity with cross-scenario generalization. While existing methods predominantly focus on determin...
Example-Based Feature Painting on Textures : Abstract: In this work, we propose a system that covers the complete workflow for achieving controlled authoring and editing of textures that present distinctive local characteristics. These include v...
NSYNC: Negative Synthetic Image Generation for Contrastive Training to Improve Stylized Text-To-Image Translation : Abstract: Current text conditioned image generation methods output realistic looking images, but they fail to capture specific styles. Simply finetuning them on the target style datasets still struggl...
Driving scenario generation and evaluation using a structured layer representation and foundational models : Abstract: Rare and challenging driving scenarios are critical for autonomous vehicle development. Since they are difficult to encounter, simulating or generating them using generative models is a popu...
PCD-ReID: Occluded Person Re-Identification for Base Station Inspection : Abstract: Occluded pedestrian re-identification (ReID) in base station environments is a critical task in computer vision, particularly for surveillance and security applications. This task faces nume...
NOA: a versatile, extensible tool for AI-based organoid analysis : Abstract: AI tools can greatly enhance the analysis of organoid microscopy images, from detection and segmentation to feature extraction and classification. However, their limited accessibility to bio...
PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model : Abstract: Vision-Language-Action models (VLAs) are emerging as powerful tools for learning generalizable visuomotor control policies. However, current VLAs are mostly trained on large-scale image-text...
Generative Adversarial Synthesis and Deep Feature Discrimination of Brain Tumor MRI Images : Abstract: Compared to traditional methods, Deep Learning (DL) becomes a key technology for computer vision tasks. Synthetic data generation is an interesting use case for DL, especially in the field o...
Wave-Particle (Continuous-Discrete) Dualistic Visual Tokenization for Unified Understanding and Generation : Abstract: The unification of understanding and generation within a single multi-modal large model (MLLM) remains one significant challenge, largely due to the dichotomy between continuous and discrete...
Lite ENSAM: a lightweight cancer segmentation model for 3D Computed Tomography : Abstract: Accurate tumor size measurement is a cornerstone of evaluating cancer treatment response. The most widely adopted standard for this purpose is the Response Evaluation Criteria in Solid Tumor...
DINO-MX: A Modular & Flexible Framework for Self-Supervised Learning : Abstract: Vision Foundation Models (VFMs) have advanced representation learning through self-supervised methods. However, existing training pipelines are often inflexible, domain-specific, or computat...
Benchmark-Ready 3D Anatomical Shape Classification : Abstract: Progress in anatomical 3D shape classification is limited by the complexity of mesh data and the lack of standardized benchmarks, highlighting the need for robust learning methods and reprod...
Vote-in-Context: Turning VLMs into Zero-Shot Rank Fusers : Abstract: In the retrieval domain, candidates' fusion from heterogeneous retrievers is a long-standing challenge, particularly for complex, multi-modal data such as videos. While typical fusion techni...
Enhancing Diffusion-based Restoration Models via Difficulty-Adaptive Reinforcement Learning with IQA Reward : Abstract: Reinforcement Learning (RL) has recently been incorporated into diffusion models, e.g., tasks such as text-to-image. However, directly applying existing RL methods to diffusion-based image r...
UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback : Abstract: Relighting is a crucial task with both practical demand and artistic value, and recent diffusion models have shown strong potential by enabling rich and controllable lighting effects. Howeve...
Progressive Translation of H&E to IHC with Enhanced Structural Fidelity : Abstract: Compared to hematoxylin-eosin (H&E) staining, immunohistochemistry (IHC) not only maintains the structural features of tissue samples, but also provides high-resolution protein localization,...
Learnable Fractional Reaction-Diffusion Dynamics for Under-Display ToF Imaging and Beyond : Abstract: Under-display ToF imaging aims to achieve accurate depth sensing through a ToF camera placed beneath a screen panel. However, transparent OLED (TOLED) layers introduce severe degradations-su...
Toward Strategy Identification and Subtask Decomposition In Task Exploration : Abstract: This research builds on work in anticipatory human-machine interaction, a subfield of human-machine interaction where machines can facilitate advantageous interactions by anticipating a user...
CGF-DETR: Cross-Gated Fusion DETR for Enhanced Pneumonia Detection in Chest X-rays : Abstract: Pneumonia remains a leading cause of morbidity and mortality worldwide, necessitating accurate and efficient automated detection systems. While recent transformer-based detectors like RT-DET...
3EED: Ground Everything Everywhere in 3D : Abstract: Visual grounding in 3D is the key for embodied agents to localize language-referred objects in open-world environments. However, existing benchmarks are limited to indoor focus, single-platf...
HGFreNet: Hop-hybrid GraphFomer for 3D Human Pose Estimation with Trajectory Consistency in Frequency Domain : Abstract: 2D-to-3D human pose lifting is a fundamental challenge for 3D human pose estimation in monocular video, where graph convolutional networks (GCNs) and attention mechanisms have proven to be i...
Wonder3D++: Cross-domain Diffusion for High-fidelity 3D Generation from a Single Image : Abstract: In this work, we introduce Wonder3D++, a novel method for efficiently generating high-fidelity textured meshes from single-view images. Recent methods based on Score Distillation Sa...
UniLION: Towards Unified Autonomous Driving Model with Linear Group RNNs : Abstract: Although transformers have demonstrated remarkable capabilities across various domains, their quadratic attention mechanisms introduce significant computational overhead when processing long...
How Far Are Surgeons from Surgical World Models? A Pilot Study on Zero-shot Surgical Video Generation with Expert Assessment : Abstract: Foundation models in video generation are demonstrating remarkable capabilities as potential world models for simulating the physical world. However, their application in high-stakes domains...
ID-Composer: Multi-Subject Video Synthesis with Hierarchical Identity Preservation : Abstract: Video generative models pretrained on large-scale datasets can produce high-quality videos, but are often conditioned on text or a single image, limiting controllability and applicability. W...
SegDebias: Test-Time Bias Mitigation for ViT-Based CLIP via Segmentation : Abstract: Vision language models such as CLIP have shown remarkable performance in zero shot classification, but remain susceptible to spurious correlations, where irrelevant visual features influence...
Text-guided Fine-Grained Video Anomaly Detection : Abstract: Video Anomaly Detection (VAD) aims to identify anomalous events within video segments. In scenarios such as surveillance or industrial process monitoring, anomaly detection is of critical im...
Real-IAD Variety: Pushing Industrial Anomaly Detection Dataset to a Modern Era : Abstract: Industrial Anomaly Detection (IAD) is critical for enhancing operational safety, ensuring product quality, and optimizing manufacturing efficiency across global industries. However, the IAD ...
MIFO: Learning and Synthesizing Multi-Instance from One Image : Abstract: This paper proposes a method for precise learning and synthesizing multi-instance semantics from a single image. The difficulty of this problem lies in the limited training data, and it beco...
4D Neural Voxel Splatting: Dynamic Scene Rendering with Voxelized Gaussian Splatting : Abstract: Although 3D Gaussian Splatting (3D-GS) achieves efficient rendering for novel view synthesis, extending it to dynamic scenes still results in substantial memory overhead from replicating Gau...
Generalized Category Discovery under Domain Shift: A Frequency Domain Perspective : Abstract: Generalized Category Discovery (GCD) aims to leverage labeled samples from known categories to cluster unlabeled data that may include both known and unknown categories. While existing metho...
TRACES: Temporal Recall with Contextual Embeddings for Real-Time Video Anomaly Detection : Abstract: Video anomalies often depend on contextual information available and temporal evolution. Non-anomalous action in one context can be anomalous in some other context. Most anomaly detectors, h...
CueBench: Advancing Unified Understanding of Context-Aware Video Anomalies in Real-World : Abstract: How far are deep models from real-world video anomaly understanding (VAU)? Current works typically emphasize detecting unexpected occurrences that deviate from normal patterns or comprehendin...
Grounding Surgical Action Triplets with Instrument Instance Segmentation: A Dataset and Target-Aware Fusion Approach : Abstract: Understanding surgical instrument-tissue interactions requires not only identifying which instrument performs which action on which anatomical target, but also grounding these interactions s...
Benchmarking individual tree segmentation using multispectral airborne laser scanning data: the FGI-EMIT dataset : Abstract: Individual tree segmentation (ITS) from LiDAR point clouds is fundamental for applications such as forest inventory, carbon monitoring and biodiversity assessment. Traditionally, ITS has bee...
Outlier-Aware Post-Training Quantization for Image Super-Resolution : Abstract: Quantization techniques, including quantization-aware training (QAT) and post-training quantization (PTQ), have become essential for inference acceleration of image super-resolution (SR) net...
Evolve to Inspire: Novelty Search for Diverse Image Generation : Abstract: Text-to-image diffusion models, while proficient at generating high-fidelity images, often suffer from limited output diversity, hindering their application in exploratory and ideation tas...
Toward Better Optimization of Low-Dose CT Enhancement: A Critical Analysis of Loss Functions and Image Quality Assessment Metrics : Abstract: Low-dose CT (LDCT) imaging is widely used to reduce radiation exposure to mitigate high exposure side effects, but often suffers from noise and artifacts that affect diagnostic accuracy. To ...
Validating Deep Models for Alzheimer's 18F-FDG PET Diagnosis Across Populations: A Study with Latin American Data : Abstract: Deep learning models have shown strong performance in diagnosing Alzheimer's disease (AD) using neuroimaging data, particularly 18F-FDG PET scans, with training datasets largely composed of ...
Towards classification-based representation learning for place recognition on LiDAR scans : Abstract: Place recognition is a crucial task in autonomous driving, allowing vehicles to determine their position using sensor data. While most existing methods rely on contrastive learning, we explo...
A Hybrid YOLOv5-SSD IoT-Based Animal Detection System for Durian Plantation Protection : Abstract: Durian plantation suffers from animal intrusions that cause crop damage and financial loss. The traditional farming practices prove ineffective due to the unavailability of monitoring withou...
Class-agnostic 3D Segmentation by Granularity-Consistent Automatic 2D Mask Tracking : Abstract: 3D instance segmentation is an important task for real-world applications. To avoid costly manual annotations, existing methods have explored generating pseudo labels by transferring 2D mask...
FedOnco-Bench: A Reproducible Benchmark for Privacy-Aware Federated Tumor Segmentation with Synthetic CT Data : Abstract: Federated Learning (FL) allows multiple institutions to cooperatively train machine learning models while retaining sensitive data at the source, which has great utility in privacy-sensitive...
Med-Banana-50K: A Cross-modality Large-Scale Dataset for Text-guided Medical Image Editing : Abstract: Recent advances in multimodal large language models have enabled remarkable medical image editing capabilities. However, the research community's progress remains constrained by the absence ...
TA-LSDiff: Topology-Aware Diffusion Guided by a Level Set Energy for Pancreas Segmentation : Abstract: Pancreas segmentation in medical image processing is a persistent challenge due to its small size, low contrast against adjacent tissues, and significant topological variations. Traditional ...
OMEGA: Optimized Multimodal Position Encoding Index Derivation with Global Adaptive Scaling for Vision-Language Models : Abstract: Vision-Language Models (VLMs) have demonstrated strong performance across various multimodal tasks, where position encoding plays a vital role in modeling both the sequential structure of te...
Enhancing Adversarial Transferability in Visual-Language Pre-training Models via Local Shuffle and Sample-based Attack : Abstract: Visual-Language Pre-training (VLP) models have achieved significant performance across various downstream tasks. However, they remain vulnerable to adversarial examples. While prior efforts ...
Linear Differential Vision Transformer: Learning Visual Contrasts via Pairwise Differentials : Abstract: Vision Transformers (ViTs) have become a universal backbone for both image recognition and image generation. Yet their Multi-Head Self-Attention (MHSA) layer still performs a quadratic query...
Parameter Interpolation Adversarial Training for Robust Image Classification : Abstract: Though deep neural networks exhibit superior performance on various tasks, they are still plagued by adversarial examples. Adversarial training has been demonstrated to be the most effective...
OmniBrainBench: A Comprehensive Multimodal Benchmark for Brain Imaging Analysis Across Multi-stage Clinical Tasks : Abstract: Brain imaging analysis is vital for diagnosing and treating brain disorders, and multimodal large language models (MLLMs) are increasingly assisting in that analysis. However, current brain-...
Layer-Wise Modality Decomposition for Interpretable Multimodal Sensor Fusion : Abstract: In autonomous driving, transparency in the decision-making of perception models is critical, as even a single misperception can be catastrophic. Yet with multi-sensor inputs, it is difficult...
GraphGeo: Multi-Agent Debate Framework for Visual Geo-localization with Heterogeneous Graph Neural Networks : Abstract: Visual geo-localization requires extensive geographic knowledge and sophisticated reasoning to determine image locations without GPS metadata. Traditional retrieval methods are constrained b...
Fleming-VL: Towards Universal Medical Visual Reasoning with Multimodal LLMs : Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable effectiveness in various general-domain scenarios, such as visual question answering and image captioning. Recently, res...
Dynamic Multi-level Weighted Alignment Network for Zero-shot Sketch-based Image Retrieval : Abstract: The problem of zero-shot sketch-based image retrieval (ZS-SBIR) has achieved increasing attention due to its wide applications, e.g. e-commerce. Despite progress made in this field, previous...
EVTAR: End-to-End Try on with Additional Unpaired Visual Reference : Abstract: We propose EVTAR, an End-to-End Virtual Try-on model with Additional Reference, that directly fits the target garment onto the person image while incorporating reference images to enhance tr...
A Unified Reasoning Framework for Holistic Zero-Shot Video Anomaly Analysis : Abstract: Most video-anomaly research stops at frame-wise detection, offering little insight into why an event is abnormal, typically outputting only frame-wise anomaly scores without spatial or seman...
VesSAM: Efficient Multi-Prompting for Segmenting Complex Vessel : Abstract: Accurate vessel segmentation is critical for clinical applications such as disease diagnosis and surgical planning, yet remains challenging due to thin, branching structures and low texture ...
MID: A Self-supervised Multimodal Iterative Denoising Framework : Abstract: Data denoising is a persistent challenge across scientific and engineering domains. Real-world data is frequently corrupted by complex, non-linear noise, rendering traditional rule-based den...
HyFormer-Net: A Synergistic CNN-Transformer with Interpretable Multi-Scale Fusion for Breast Lesion Segmentation and Classification in Ultrasound Images : Abstract: B-mode ultrasound for breast cancer diagnosis faces challenges: speckle, operator dependency, and indistinct boundaries. Existing deep learning suffers from single-task learning, architectur...
FastBoost: Progressive Attention with Dynamic Scaling for Efficient Deep Learning : Abstract: We present FastBoost, a parameter-efficient neural architecture that achieves state-of-the-art performance on CIFAR benchmarks through a novel Dynamically Scaled Progressive Attention (DSPA)...
T-MLA: A Targeted Multiscale Log-Exponential Attack Framework for Neural Image Compression : Abstract: Neural image compression (NIC) has become the state-of-the-art for rate-distortion performance, yet its security vulnerabilities remain significantly less understood than those of classifier...
Epanechnikov nonparametric kernel density estimation based feature-learning in respiratory disease chest X-ray images : Abstract: This study presents a novel method for diagnosing respiratory diseases using image data. It combines Epanechnikov's non-parametric kernel density estimation (EKDE) with a bimodal logistic re...
Anatomically Constrained Transformers for Echocardiogram Analysis : Abstract: Video transformers have recently demonstrated strong potential for echocardiogram (echo) analysis, leveraging self-supervised pre-training and flexible adaptation across diverse tasks. Howev...
Boosting performance of computer vision applications through embedded GPUs on the edge : Abstract: Computer vision applications, especially those using augmented reality technology, are becoming quite popular in mobile devices. However, this type of application is known as presenting sign...
Weakly Supervised Concept Learning with Class-Level Priors for Interpretable Medical Diagnosis : Abstract: Human-interpretable predictions are essential for deploying AI in medical imaging, yet most interpretable-by-design (IBD) frameworks require concept annotations for training data, which are ...
MicroAUNet: Boundary-Enhanced Multi-scale Fusion with Knowledge Distillation for Colonoscopy Polyp Image Segmentation : Abstract: Early and accurate segmentation of colorectal polyps is critical for reducing colorectal cancer mortality, which has been extensively explored by academia and industry. However, current deep...
ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation : Abstract: Unified multimodal models (UMMs) have emerged as a powerful paradigm for seamlessly unifying text and image understanding and generation. However, prevailing evaluations treat these abilitie...
Web-Scale Collection of Video Data for 4D Animal Reconstruction : Abstract: Computer vision for animals holds great promise for wildlife research but often depends on large-scale data, while existing collection methods rely on controlled capture setups. Recent data-...
Diffusion Transformer meets Multi-level Wavelet Spectrum for Single Image Super-Resolution : Abstract: Discrete Wavelet Transform (DWT) has been widely explored to enhance the performance of image superresolution (SR). Despite some DWT-based methods improving SR by capturing fine-grained freq...
A Topology-Aware Graph Convolutional Network for Human Pose Similarity and Action Quality Assessment : Abstract: Action Quality Assessment (AQA) requires fine-grained understanding of human motion and precise evaluation of pose similarity. This paper proposes a topology-aware Graph Convolutional Networ...
MoSa: Motion Generation with Scalable Autoregressive Modeling : Abstract: We introduce MoSa, a novel hierarchical motion generation framework for text-driven 3D human motion generation that enhances the Vector Quantization-guided Generative Transformers (VQ-GT) pa...
OmniVLA: Unifying Multi-Sensor Perception for Physically-Grounded Multimodal VLA : Abstract: Vision-language-action (VLA) models have shown strong generalization for action prediction through large-scale vision-language pretraining. However, most existing models rely solely on RGB c...
Thought-For-Food: Reasoning Chain Induced Food Visual Question Answering : Abstract: The immense diversity in the culture and culinary of Indian cuisines calls attention to the major shortcoming of the existing Visual Question Answering (VQA) systems which are inclined toward...
Saliency-Guided Domain Adaptation for Left-Hand Driving in Autonomous Steering : Abstract: Domain adaptation is required for automated driving models to generalize well across diverse road conditions. This paper explores a training method for domain adaptation to adapt PilotNet, a...
Gesture Generation (Still) Needs Improved Human Evaluation Practices: Insights from a Community-Driven State-of-the-Art Benchmark : Abstract: We review human evaluation practices in automated, speech-driven 3D gesture generation and find a lack of standardisation and frequent use of flawed experimental setups. This leads to a situ...
Eyes on Target: Gaze-Aware Object Detection in Egocentric Video : Abstract: Human gaze offers rich supervisory signals for understanding visual attention in complex visual environments. In this paper, we propose Eyes on Target, a novel depth-aware and gaze-guided ob...
Beyond Deceptive Flatness: Dual-Order Solution for Strengthening Adversarial Transferability : Abstract: Transferable attacks generate adversarial examples on surrogate models to fool unknown victim models, posing real-world threats and growing research interest. Despite focusing on flat losses...
CenterMamba-SAM: Center-Prioritized Scanning and Temporal Prototypes for Brain Lesion Segmentation : Abstract: Brain lesion segmentation remains challenging due to small, low-contrast lesions, anisotropic sampling, and cross-slice discontinuities. We propose CenterMamba-SAM, an end-to-end framework t...
Source-Only Cross-Weather LiDAR via Geometry-Aware Point Drop : Abstract: LiDAR semantic segmentation degrades in adverse weather because refraction, scattering, and point dropouts corrupt geometry. Prior work in weather simulation, mixing-based augmentation, doma...
PRevivor: Reviving Ancient Chinese Paintings using Prior-Guided Color Transformers : Abstract: Ancient Chinese paintings are a valuable cultural heritage that is damaged by irreversible color degradation. Reviving color-degraded paintings is extraordinarily difficult due to the comple...
Adaptation of Foundation Models for Medical Image Analysis: Strategies, Challenges, and Future Directions : Abstract: Foundation models (FMs) have emerged as a transformative paradigm in medical image analysis, offering the potential to provide generalizable, task-agnostic solutions across a wide range of c...
Detecting Generated Images by Fitting Natural Image Distributions : Abstract: The increasing realism of generated images has raised significant concerns about their potential misuse, necessitating robust detection methods. Current approaches mainly rely on training bi...
UniREditBench: A Unified Reasoning-based Image Editing Benchmark : Abstract: Recent advances in multi-modal generative models have driven substantial improvements in image editing. However, current generative models still struggle with handling diverse and complex im...
REASON: Probability map-guided dual-branch fusion framework for gastric content assessment : Abstract: Accurate assessment of gastric content from ultrasound is critical for stratifying aspiration risk at induction of general anesthesia. However, traditional methods rely on manual tracing of ...
Positive Semi-definite Latent Factor Grouping-Boosted Cluster-reasoning Instance Disentangled Learning for WSI Representation : Abstract: Multiple instance learning (MIL) has been widely used for representing whole-slide pathology images. However, spatial, semantic, and decision entanglements among instances limit its represen...
Perturb a Model, Not an Image: Towards Robust Privacy Protection via Anti-Personalized Diffusion Models : Abstract: Recent advances in diffusion models have enabled high-quality synthesis of specific subjects, such as identities or objects. This capability, while unlocking new possibilities in content cre...
MVSMamba: Multi-View Stereo with State Space Model : Abstract: Robust feature representations are essential for learning-based Multi-View Stereo (MVS), which relies on accurate feature matching. Recent MVS methods leverage Transformers to capture long-r...
A Generative Adversarial Approach to Adversarial Attacks Guided by Contrastive Language-Image Pre-trained Model : Abstract: The rapid growth of deep learning has brought about powerful models that can handle various tasks, like identifying images and understanding language. However, adversarial attacks, an unnoti...
RDTE-UNet: A Boundary and Detail Aware UNet for Precise Medical Image Segmentation : Abstract: Medical image segmentation is essential for computer-assisted diagnosis and treatment planning, yet substantial anatomical variability and boundary ambiguity hinder reliable delineation of f...
Eye Tracking Based Cognitive Evaluation of Automatic Readability Assessment Measures : Abstract: Methods for scoring text readability have been studied for over a century, and are widely used in research and in user-facing applications in many domains. Thus far, the development and eval...
Auto-Search and Refinement: An Automated Framework for Gender Bias Mitigation in Large Language Models : Abstract: Pre-training large language models (LLMs) on vast text corpora enhances natural language processing capabilities but risks encoding social biases, particularly gender bias. While parameter-m...
Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps : Abstract: When prompted to think step-by-step, language models (LMs) produce a chain of thought (CoT), a sequence of reasoning steps that the model supposedly used to produce its prediction. Despite m...
Targeted Distillation for Sentiment Analysis : Abstract: This paper explores targeted distillation methods for sentiment analysis, aiming to build compact and practical models that preserve strong and generalizable sentiment analysis capabilities....
Medical Hallucinations in Foundation Models and Their Impact on Healthcare : Abstract: Hallucinations in foundation models arise from autoregressive training objectives that prioritize token-likelihood optimization over epistemic accuracy, fostering overconfidence and poorly c...
XIFBench: Evaluating Large Language Models on Multilingual Instruction Following : Abstract: Large Language Models (LLMs) have demonstrated remarkable instruction-following capabilities across various applications. However, their performance in multilingual settings lacks systematic...
Natural Language Generation : Abstract: This article provides a brief overview of the field of Natural Language Generation. The term Natural Language Generation (NLG), in its broadest definition, refers to the study of systems tha...
Do LLM Evaluators Prefer Themselves for a Reason? : Abstract: Large language models (LLMs) are increasingly used as automatic evaluators in applications like benchmarking, reward modeling, and self-refinement. Prior work highlights a potential self-pre...
Self-Adaptive Cognitive Debiasing for Large Language Models in Decision-Making : Abstract: Large language models (LLMs) have shown potential in supporting decision-making applications, particularly as personal assistants in the financial, healthcare, and legal domains. While promp...
The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning : Abstract: Chain-of-Thought (CoT) prompting has been widely recognized for its ability to enhance reasoning capabilities in large language models (LLMs). However, our study reveals a surprising contrad...
Enhancing Reasoning Abilities of Small LLMs with Cognitive Alignment : Abstract: The reasoning capabilities of large reasoning models (LRMs), such as OpenAI's o1 and DeepSeek-R1, have seen substantial advancements through deep thinking. However, these enhancements come w...
Hardware-aligned Hierarchical Sparse Attention for Efficient Long-term Memory Access : Abstract: A key advantage of Recurrent Neural Networks (RNNs) over Transformers is that their linear computational and space complexity enables faster training and inference for long sequences. However, RN...
PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts : Abstract: In this paper, we introduce PolyMath, a multilingual mathematical reasoning benchmark covering 18 languages and 4 easy-to-hard difficulty levels. Our benchmark ensures difficulty comprehensi...
JobHop: A Large-Scale Dataset of Career Trajectories : Abstract: Understanding labor market dynamics is essential for policymakers, employers, and job seekers. However, comprehensive datasets that capture real-world career trajectories are scarce. In this...
Editing Across Languages: A Survey of Multilingual Knowledge Editing : Abstract: While Knowledge Editing has been extensively studied in monolingual settings, it remains underexplored in multilingual contexts. This survey systematizes recent research on Multilingual Know...
Beyond Empathy: Integrating Diagnostic and Therapeutic Reasoning with Large Language Models for Mental Health Counseling : Abstract: Large language models (LLMs) hold significant potential for mental health support, capable of generating empathetic responses and simulating therapeutic conversations. However, existing LLM-...
The Language of Interoception: Examining Embodiment and Emotion Through a Corpus of Body Part Mentions : Abstract: This paper is the first investigation of the connection between emotion, embodiment, and everyday language in a large sample of natural language data. We created corpora of body part mention...
Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning : Abstract: Large Language Models (LLMs) have shown impressive performance on complex tasks through Chain-of-Thought (CoT) reasoning. However, conventional CoT relies on explicitly verbalized intermedia...
Words That Unite The World: A Unified Framework for Deciphering Central Bank Communications Globally : Abstract: Central banks around the world play a crucial role in maintaining economic stability. Deciphering policy implications in their communications is essential, especially as misinterpretations c...
Towards Robust Evaluation of STEM Education: Leveraging MLLMs in Project-Based Learning : Abstract: Project-Based Learning (PBL) involves a variety of highly correlated multimodal data, making it a vital educational approach within STEM disciplines. With the rapid development of multimodal...
Forging Time Series with Language: A Large Language Model Approach to Synthetic Data Generation : Abstract: SDForger is a flexible and efficient framework for generating high-quality multivariate time series using LLMs. Leveraging a compact data representation, SDForger provides synthetic time ser...
GreekBarBench: A Challenging Benchmark for Free-Text Legal Reasoning and Citations : Abstract: We introduce GreekBarBench, a benchmark that evaluates LLMs on legal questions across five different legal areas from the Greek Bar exams, requiring citations to statutory articles and case ...
KIT's Low-resource Speech Translation Systems for IWSLT2025: System Enhancement with Synthetic Data and Model Regularization : Abstract: This paper presents KIT's submissions to the IWSLT 2025 low-resource track. We develop both cascaded systems, consisting of Automatic Speech Recognition (ASR) and Machine Translation (MT) mo...
A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models : Abstract: Chain-of-thought (CoT) reasoning enhances performance of large language models, but questions remain about whether these reasoning traces faithfully reflect the internal processes of the mod...
Trustworthy Medical Question Answering: An Evaluation-Centric Survey : Abstract: Trustworthiness in healthcare question-answering (QA) systems is important for ensuring patient safety, clinical effectiveness, and user confidence. As large language models (LLMs) become in...
Instructing Large Language Models for Low-Resource Languages: A Systematic Study for Basque : Abstract: Instructing language models with user intent requires large instruction datasets, which are only available for a limited set of languages. In this paper, we explore alternatives to conventio...
Discourse Heuristics For Paradoxically Moral Self-Correction : Abstract: Moral self-correction has emerged as a promising approach for aligning the output of Large Language Models (LLMs) with human moral values. However, moral self-correction techniques are subje...
SynthTextEval: Synthetic Text Data Generation and Evaluation for High-Stakes Domains : Abstract: We present SynthTextEval, a toolkit for conducting comprehensive evaluations of synthetic text. The fluency of large language model (LLM) outputs has made synthetic text potentially viable f...
An Exploration of Knowledge Editing for Arabic : Abstract: While Knowledge Editing (KE) has been widely explored in English, its behavior in morphologically rich languages like Arabic remains underexamined. In this work, we present the first study o...
Large Language Models as Medical Codes Selectors: a benchmark using the International Classification of Primary Care : Abstract: Background: Medical coding structures healthcare data for research, quality monitoring, and policy. This study assesses the potential of large language models (LLMs) to assign ICPC-2 codes u...
GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration : Abstract: Graphs are widely used for modeling relational data in real-world scenarios, such as social networks and urban computing. Existing LLM-based graph analysis approaches either integrate graph ...
What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks : Abstract: Jailbreaks have been a central focus of research regarding the safety and reliability of large language models (LLMs), yet the mechanisms underlying these attacks remain poorly understood. W...
ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers : Abstract: Multimodal Large Language Models (MLLMs) suffer from high computational costs due to their massive size and the large number of visual tokens. In this paper, we investigate layer-wise redund...
CrowdVLM-R1: Expanding R1 Ability to Vision Language Model for Crowd Counting using Fuzzy Group Relative Policy Reward : Abstract: We propose Fuzzy Group Relative Policy Reward (FGRPR), a novel framework that integrates Group Relative Policy Optimization (GRPO) with a fuzzy reward function to enhance learning efficiency...
3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark : Abstract: Though Large Vision-Language Models (LVLMs) are being actively explored in medicine, their ability to conduct complex real-world telemedicine consultations combining accurate diagnosis with ...
UI-Evol: Automatic Knowledge Evolving for Computer Use Agents : Abstract: External knowledge has played a crucial role in the recent development of computer use agents. We identify a critical knowledge-execution gap: retrieved knowledge often fails to translate in...
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning : Abstract: Scientific discoveries increasingly rely on complex multimodal reasoning based on information-intensive scientific data and domain-specific expertise. Empowered by expert-level scientific be...
PRISM2: Unlocking Multi-Modal General Pathology AI with Clinical Dialogue : Abstract: Recent rapid progress in the field of computational pathology has been enabled by foundation models. These models are beginning to move beyond encoding image patches towards whole-slide unde...
Generative human motion mimicking through feature extraction in denoising diffusion settings : Abstract: Recent success with large language models has sparked a new wave of verbal human-AI interaction. While such models support users in a variety of creative tasks, they lack the embodied nature...
Deep Learning Models for Coral Bleaching Classification in Multi-Condition Underwater Image Datasets : Abstract: Coral reefs support numerous marine organisms and are an important source of coastal protection from storms and floods, representing a major part of marine ecosystems. However, coral reefs fa...
Automating Coral Reef Fish Family Identification on Video Transects Using a YOLOv8-Based Deep Learning Pipeline : Abstract: Coral reef monitoring in the Western Indian Ocean is limited by the labor demands of underwater visual censuses. This work evaluates a YOLOv8-based deep learning pipeline for automating fami...
Mutual Information guided Visual Contrastive Learning : Abstract: Representation learning methods utilizing the InfoNCE loss have demonstrated considerable capacity in reducing human annotation effort by training invariant neural feature extractors. Althou...
Benchmarking Federated Learning Frameworks for Medical Imaging Deployment: A Comparative Study of NVIDIA FLARE, Flower, and Owkin Substra : Abstract: Federated Learning (FL) has emerged as a transformative paradigm in medical AI, enabling collaborative model training across institutions without direct data sharing. This study benchmarks t...
Enhancing rice leaf images: An overview of image denoising techniques : Abstract: Digital image processing involves the systematic handling of images using advanced computer algorithms, and has gained significant attention in both academic and practical fields. Image enha...
Which LiDAR scanning pattern is better for roadside perception: Repetitive or Non-repetitive? : Abstract: LiDAR-based roadside perception is a cornerstone of advanced Intelligent Transportation Systems (ITS). While considerable research has addressed optimal LiDAR placement for infrastructure, t...
Habitat and Land Cover Change Detection in Alpine Protected Areas: A Comparison of AI Architectures : Abstract: Rapid climate change and other disturbances in alpine ecosystems demand frequent habitat monitoring, yet manual mapping remains prohibitively expensive for the required temporal resolution. ...
LeMiCa: Lexicographic Minimax Path Caching for Efficient Diffusion-Based Video Generation : Abstract: We present LeMiCa, a training-free and efficient acceleration framework for diffusion-based video generation. While existing caching strategies primarily focus on reducing local heuristic er...
Self-Improving Vision-Language-Action Models with Data Generation via Residual RL : Abstract: Supervised fine-tuning (SFT) has become the de facto post-training strategy for large vision-language-action (VLA) models, but its reliance on costly human demonstrations limits scalability ...
SpinalSAM-R1: A Vision-Language Multimodal Interactive System for Spine CT Segmentation : Abstract: The anatomical structure segmentation of the spine and adjacent structures from computed tomography (CT) images is a key step for spinal disease diagnosis and treatment. However, the segment...
FreeSliders: Training-Free, Modality-Agnostic Concept Sliders for Fine-Grained Diffusion Control in Images, Audio, and Video : Abstract: Diffusion models have become state-of-the-art generative models for images, audio, and video, yet enabling fine-grained controllable generation, i.e., continuously steering specific concepts...
AI Powered High Quality Text to Video Generation with Enhanced Temporal Consistency : Abstract: Text to video generation has emerged as a critical frontier in generative artificial intelligence, yet existing approaches struggle with maintaining temporal consistency, compositional under...
Chain of Time: In-Context Physical Simulation with Image Generation Models : Abstract: We propose a novel cognitively-inspired method to improve and interpret physical simulation in vision-language models. Our "Chain of Time" method involves generating a series of intermediat...
VLM6D: VLM based 6Dof Pose Estimation based on RGB-D Images : Abstract: The primary challenge in computer vision is precisely calculating the pose of 6D objects; however, many current approaches are still fragile and have trouble generalizing from synthetic data ...
FLoC: Facility Location-Based Efficient Visual Token Compression for Long Video Understanding : Abstract: Recent studies in long video understanding have harnessed the advanced visual-language reasoning capabilities of Large Multimodal Models (LMMs), driving the evolution of video-LMMs specializ...
BlurGuard: A Simple Approach for Robustifying Image Protection Against AI-Powered Editing : Abstract: Recent advances in text-to-image models have increased the exposure of powerful image editing techniques as a tool, raising concerns about their potential for malicious use. An emerging line...
CompAgent: An Agentic Framework for Visual Compliance Verification : Abstract: Visual compliance verification is a critical yet underexplored problem in computer vision, especially in domains such as media, entertainment, and advertising where content must adhere to co...
From Evidence to Verdict: An Agent-Based Forensic Framework for AI-Generated Image Detection : Abstract: The rapid evolution of AI-generated images poses unprecedented challenges to information integrity and media authenticity. Existing detection approaches suffer from fundamental limitations: ...
An Efficient and Generalizable Transfer Learning Method for Weather Condition Detection on Ground Terminals : Abstract: The increasing adoption of satellite Internet with low-Earth-orbit (LEO) satellites in mega-constellations allows ubiquitous connectivity to rural and remote areas. However, weather events h...
DM-QPMNET: Dual-modality fusion network for cell segmentation in quantitative phase microscopy : Abstract: Cell segmentation in single-shot quantitative phase microscopy (ssQPM) faces challenges from traditional thresholding methods that are sensitive to noise and cell density, while deep learnin...
Towards 1000-fold Electron Microscopy Image Compression for Connectomics via VQ-VAE with Transformer Prior : Abstract: Petascale electron microscopy (EM) datasets push storage, transfer, and downstream analysis toward their current limits. We present a vector-quantized variational autoencoder-based (VQ-VAE) ...
Hyperbolic Optimal Transport : Abstract: The optimal transport (OT) problem aims to find the most efficient mapping between two probability distributions under a given cost function, and has diverse applications in many fields such...
Object-Aware 4D Human Motion Generation : Abstract: Recent advances in video diffusion models have enabled the generation of high-quality videos. However, these videos still suffer from unrealistic deformations, semantic violations, and physi...
Merlin L48 Spectrogram Dataset : Abstract: In the single-positive multi-label (SPML) setting, each image in a dataset is labeled with the presence of a single class, while the true presence of other classes remains unknown. The chall...
BeetleFlow: An Integrative Deep Learning Pipeline for Beetle Image Processing : Abstract: In entomology and ecology research, biologists often need to collect a large number of insects, among which beetles are the most common species. A common practice for biologists to organize ...
MambaNetLK: Enhancing Colonoscopy Point Cloud Registration with Mamba : Abstract: Accurate 3D point cloud registration underpins reliable image-guided colonoscopy, directly affecting lesion localization, margin assessment, and navigation safety. However, biological tissue...
Spot The Ball: A Benchmark for Visual Social Inference : Abstract: Humans excel at visual social inference, the ability to infer hidden elements of a scene from subtle behavioral cues such as other people's gaze, pose, and orientation. This ability drives e...
FedReplay: A Feature Replay Assisted Federated Transfer Learning Framework for Efficient and Privacy-Preserving Smart Agriculture : Abstract: Accurate classification plays a pivotal role in smart agriculture, enabling applications such as crop monitoring, fruit recognition, and pest detection. However, conventional centralized tra...
Multi-View Consistent Human Image Customization via In-Context Learning : Abstract: Recent advances in personalized generative models demonstrate impressive results in creating identity-consistent images of the same person under diverse settings. Yet, we note that most meth...
Towards Automated Petrography : Abstract: Petrography is a branch of geology that analyzes the mineralogical composition of rocks from microscopical thin section samples. It is essential for understanding rock properties across geol...
A DeepONet joint Neural Tangent Kernel Hybrid Framework for Physics-Informed Inverse Source Problems and Robust Image Reconstruction : Abstract: This work presents a novel hybrid approach that integrates Deep Operator Networks (DeepONet) with the Neural Tangent Kernel (NTK) to solve complex inverse problems. The method effectively add...
Federated Dialogue-Semantic Diffusion for Emotion Recognition under Incomplete Modalities : Abstract: Multimodal Emotion Recognition in Conversations (MERC) enhances emotional understanding through the fusion of multimodal signals. However, unpredictable modality absence in real-world scenar...
Detecting AI-Generated Images via Diffusion Snap-Back Reconstruction: A Forensic Approach : Abstract: The rapid rise of generative diffusion models has made distinguishing authentic visual content from synthetic imagery increasingly challenging. Traditional deepfake detection methods, which ...
Transfer Learning for Onboard Cloud Segmentation in Thermal Earth Observation: From Landsat to a CubeSat Constellation : Abstract: Onboard cloud segmentation is a critical yet underexplored task in thermal Earth observation (EO), particularly for CubeSat missions constrained by limited hardware and spectral information....
Oitijjo-3D: Generative AI Framework for Rapid 3D Heritage Reconstruction from Street View Imagery : Abstract: Cultural heritage restoration in Bangladesh faces a dual challenge of limited resources and scarce technical expertise. Traditional 3D digitization methods, such as photogrammetry or LiDAR s...
Who Can We Trust? Scope-Aware Video Moment Retrieval with Multi-Agent Conflict : Abstract: Video moment retrieval uses a text query to locate a moment from a given untrimmed video reference. Locating corresponding video moments with text queries helps people interact with videos e...
VisionCAD: An Integration-Free Radiology Copilot Framework : Abstract: Widespread clinical deployment of computer-aided diagnosis (CAD) systems is hindered by the challenge of integrating with existing hospital IT infrastructure. Here, we introduce VisionCAD, a...
Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models: Benchmark, Datasets, and Beyond : Abstract: Multimodal Large Language Models (MLLMs) have revolutionized numerous research fields, including computer vision and affective computing. As a pivotal challenge in this interdisciplinary dom...
VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning : Abstract: Multimodal code generation has garnered significant interest within the research community. Despite the notable success of recent vision-language models (VLMs) on specialized tasks like Char...
CoT-Saliency: Unified Chain-of-Thought Reasoning for Heterogeneous Saliency Tasks : Abstract: We present the first unified framework that jointly handles three operationally heterogeneous saliency tasks, e.g., SOD, CoSOD, and SIS, by casting each as a Chain-of-Thought (CoT) reasoning p...
LGCA: Enhancing Semantic Representation via Progressive Expansion : Abstract: Recent advancements in large-scale pretraining in natural language processing have enabled pretrained vision-language models such as CLIP to effectively align images and text, significantly ...
Leveraging Hierarchical Image-Text Misalignment for Universal Fake Image Detection : Abstract: With the rapid development of generative models, detecting generated fake images to prevent their malicious use has become a critical issue recently. Existing methods frame this challenge as...
Enhancing Frequency Forgery Clues for Diffusion-Generated Image Detection : Abstract: Diffusion models have achieved remarkable success in image synthesis, but the generated high-quality images raise concerns about potential malicious use. Existing detectors often struggle to...
Weakly Supervised Pneumonia Localization from Chest X-Rays Using Deep Neural Network and Grad-CAM Explanations : Abstract: This study proposes a weakly supervised deep learning framework for pneumonia classification and localization from chest X-rays, utilizing Grad-CAM explanations. Instead of costly pixel-leve...
HumanCrafter: Synergizing Generalizable Human Reconstruction and Semantic 3D Segmentation : Abstract: Recent advances in generative models have achieved high-fidelity in 3D human reconstruction, yet their utility for specific tasks (e.g., human 3D segmentation) remains constrained. We propos...
Longitudinal Vestibular Schwannoma Dataset with Consensus-based Human-in-the-loop Annotations : Abstract: Accurate segmentation of vestibular schwannoma (VS) on Magnetic Resonance Imaging (MRI) is essential for patient management but often requires time-intensive manual annotations by experts. W...
Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models : Abstract: We introduce Diff4Splat, a feed-forward method that synthesizes controllable and explicit 4D scenes from a single image. Our approach unifies the generative priors of video diffusion models ...
VinDr-CXR-VQA: A Visual Question Answering Dataset for Explainable Chest X-Ray Analysis with Multi-Task Learning : Abstract: We present VinDr-CXR-VQA, a large-scale chest X-ray dataset for explainable Medical Visual Question Answering (Med-VQA) with spatial grounding. The dataset contains 17,597 question-answer pa...
OmniTrack++: Omnidirectional Multi-Object Tracking by Learning Large-FoV Trajectory Feedback : Abstract: This paper investigates Multi-Object Tracking (MOT) in panoramic imagery, which introduces unique challenges including a 360° Field of View (FoV), resolution dilution, and severe view-d...
Leveraging Multi-Agent System (MAS) and Fine-Tuned Small Language Models (SLMs) for Automated Telecom Network Troubleshooting : Abstract: Telecom networks are rapidly growing in scale and complexity, making effective management, operation, and optimization increasingly challenging. Although Artificial Intelligence (AI) has bee...
Erasing 'Ugly' from the Internet: Propagation of the Beauty Myth in Text-Image Models : Abstract: Social media has exacerbated the promotion of Western beauty norms, leading to negative self-image, particularly in women and girls, and causing harm such as body dysmorphia. Increasingly co...
Reevaluating Self-Consistency Scaling in Multi-Agent Systems : Abstract: This study examines the trade-offs of increasing sampled reasoning paths in self-consistency for modern large language models (LLMs). Earlier research with older models showed that combining...
MULTI-Bench: A Multi-Turn Interactive Benchmark for Assessing Emotional Intelligence ability of Spoken Dialogue Models : Abstract: Spoken Dialogue Models (SDMs) have advanced rapidly, yet their ability to sustain genuinely interactive multi-turn conversations remains underexplored, as most benchmarks focus on single-tur...
LLMs Position Themselves as More Rational Than Humans: Emergence of AI Self-Awareness Measured Through Game Theory : Abstract: As Large Language Models (LLMs) grow in capability, do they develop self-awareness as an emergent behavior? And if so, can we measure it? We introduce the AI Self-Awareness Index (AISAI), a ...
ORANGE: An Online Reflection ANd GEneration framework with Domain Knowledge for Text-to-SQL : Abstract: Large Language Models (LLMs) have demonstrated remarkable progress in translating natural language to SQL, but a significant semantic gap persists between their general knowledge and domain-...
On the Emergence of Induction Heads for In-Context Learning : Abstract: Transformers have become the dominant architecture for natural language processing. Part of their success is owed to a remarkable capability known as in-context learning (ICL): they can acqu...
HarnessLLM: Automatic Testing Harness Generation via Reinforcement Learning : Abstract: Existing LLM-based automatic test generation methods mainly produce input and expected output pairs to categorize the intended behavior of correct programs. Although straightforward, these m...
S2Doc - Spatial-Semantic Document Format : Abstract: Documents are a common way to store and share information, with tables being an important part of many documents. However, there is no real common understanding of how to model documents and...
Novelty and Impact of Economics Papers : Abstract: We propose a framework that recasts scientific novelty not as a single attribute of a paper, but as a reflection of its position within the evolving intellectual landscape. We decompose this...
$\left|\,\circlearrowright\,\boxed{\text{BUS}}\,\right|$: A Large and Diverse Multimodal Benchmark for evaluating the ability of Vision-Language Models to understand Rebus Puzzles : Abstract: Understanding Rebus Puzzles (Rebus Puzzles use pictures, symbols, and letters to represent words or phrases creatively) requires a variety of skills such as image recognition, cognitive skil...
Hidden in Plain Sight: Where Developers Confess Self-Admitted Technical Debt : Abstract: Context. Detecting Self-Admitted Technical Debt (SATD) is crucial for proactive software maintenance. Previous research has primarily targeted detecting and prioritizing SATD, with little fo...
Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models : Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have significantly improved 2D visual understanding, prompting interest in their application to complex 3D reasoning tasks. Howeve...
MaiBaam Annotation Guidelines : Abstract: This document provides the annotation guidelines for MaiBaam, a Bavarian corpus manually annotated with part-of-speech (POS) tags, syntactic dependencies, and German lemmas. MaiBaam belongs ...
CheckEval: A reliable LLM-as-a-Judge framework for evaluating text generation using checklists : Abstract: Existing LLM-as-a-Judge approaches for evaluating text generation suffer from rating inconsistencies, with low agreement and high rating variance across different evaluator models. We attrib...
Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models : Abstract: The need to analyze graphs is ubiquitous across various fields, from social networks to biological research and recommendation systems. Therefore, enabling the ability of large language mode...
Exploring Large Language Models for Detecting Mental Disorders : Abstract: This paper compares the effectiveness of traditional machine learning methods, encoder-based models, and large language models (LLMs) on the task of detecting depression and anxiety. Five Ru...
A Comprehensive Evaluation of Cognitive Biases in LLMs : Abstract: We present a large-scale evaluation of 30 cognitive biases in 20 state-of-the-art large language models (LLMs) under various decision-making scenarios. Our contributions include a novel gene...
Incivility and Rigidity: Evaluating the Risks of Fine-Tuning LLMs for Political Argumentation : Abstract: Incivility on platforms such as Twitter (now X) and Reddit complicates the development of AI systems that can support productive, rhetorically sound political argumentation. We present exper...
Training Large Language Models to Reason in a Continuous Latent Space : Abstract: Large language models (LLMs) are typically constrained to reason in the language space, where they express the reasoning process through a chain-of-thought (CoT) to solve complex problems. H...
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding : Abstract: Aligning visual features with language embeddings is a key challenge in vision-language models (VLMs). The performance of such models hinges on having a good connector that maps visual featu...
SafeDialBench: A Fine-Grained Safety Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks : Abstract: With the rapid advancement of Large Language Models (LLMs), the safety of LLMs has been a critical concern requiring precise assessment. Current benchmarks primarily concentrate on single-tu...
Cognitive Alignment in Personality Reasoning: Leveraging Prototype Theory for MBTI Inference : Abstract: Personality recognition from text is typically cast as hard-label classification, which obscures the graded, prototype-like nature of human personality judgments. We present ProtoMBTI, a cog...
Training LLMs Beyond Next Token Prediction - Filling the Mutual Information Gap : Abstract: Optimizing training performance in large language models (LLMs) remains an essential challenge, particularly in improving model performance while maintaining computational costs. This work c...
Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning : Abstract: Large Language Models (LLMs) are increasingly used to simulate human users in interactive settings such as therapy, education, and social role-play. While these simulations enable scalable t...
AgentBnB: A Browser-Based Cybersecurity Tabletop Exercise with Large Language Model Support and Retrieval-Aligned Scaffolding : Abstract: Traditional cybersecurity tabletop exercises (TTXs) provide valuable training but are often scripted, resource-intensive, and difficult to scale. We introduce AgentBnB, a browser-based re-im...
Language Modeling With Factorization Memory : Abstract: We propose Factorization Memory, an efficient recurrent neural network (RNN) architecture that achieves performance comparable to Transformer models on short-context language modeling tasks ...
Reversal Invariance in Autoregressive Language Models : Abstract: We formalize a structural property of the causal (autoregressive) language modeling (CLM) objective: reversal invariance. Formally, the next-token prediction loss assigns identical likelihoo...
LingGym: How Far Are LLMs from Thinking Like Field Linguists? : Abstract: This paper introduces LingGym, a new benchmark that evaluates LLMs' capacity for meta-linguistic reasoning using Interlinear Glossed Text (IGT) and grammatical descriptions extracted from 18...
Reasoning Trajectories for Socratic Debugging of Student Code: From Misconceptions to Contradictions and Updated Beliefs : Abstract: In Socratic debugging, instructors guide students towards identifying and fixing a bug on their own, instead of providing the bug fix directly. Most novice programmer bugs are caused by prog...
PADBen: A Comprehensive Benchmark for Evaluating AI Text Detectors Against Paraphrase Attacks : Abstract: While AI-generated text (AIGT) detectors achieve over 90% accuracy on direct LLM outputs, they fail catastrophically against iteratively-paraphrased content. We investigate why iteratively-...
G2: Guided Generation for Enhanced Output Diversity in LLMs : Abstract: Large Language Models (LLMs) have demonstrated exceptional performance across diverse natural language processing tasks. However, these models exhibit a critical limitation in output diversi...
Remembering Unequally: Global and Disciplinary Bias in LLM-Generated Co-Authorship Networks : Abstract: Ongoing breakthroughs in Large Language Models (LLMs) are reshaping search and recommendation platforms at their core. While this shift unlocks powerful new scientometric tools, it also expo...
Leveraging the Cross-Domain & Cross-Linguistic Corpus for Low Resource NMT: A Case Study On Bhili-Hindi-English Parallel Corpus : Abstract: The linguistic diversity of India poses significant machine translation challenges, especially for underrepresented tribal languages like Bhili, which lack high-quality linguistic resources....
With Privacy, Size Matters: On the Importance of Dataset Size in Differentially Private Text Rewriting : Abstract: Recent work in Differential Privacy with Natural Language Processing (DP NLP) has proposed numerous promising techniques in the form of text rewriting mechanisms. In the evaluation of these ...
ToM: Leveraging Tree-oriented MapReduce for Long-Context Reasoning in Large Language Models : Abstract: Large Language Models (LLMs), constrained by limited context windows, often face significant performance degradation when reasoning over long contexts. To address this, Retrieval-Augmented G...
Zero-RAG: Towards Retrieval-Augmented Generation with Zero Redundant Knowledge : Abstract: Retrieval-Augmented Generation has shown remarkable results to address Large Language Models' hallucinations, which usually uses a large external corpus to supplement knowledge to LLMs. Howe...
Fine-Tuning DialoGPT on Common Diseases in Rural Nepal for Medical Conversations : Abstract: Conversational agents are increasingly being explored to support healthcare delivery, particularly in resource-constrained settings such as rural Nepal. Large-scale conversational models typ...
Exploring and Mitigating Gender Bias in Encoder-Based Transformer Models : Abstract: Gender bias in language models has gained increasing attention in the field of natural language processing. Encoder-based transformer models, which have achieved state-of-the-art performance...
Word Salad Chopper: Reasoning Models Waste A Ton Of Decoding Budget On Useless Repetitions, Self-Knowingly : Abstract: Large Reasoning Models (LRMs) are often bottlenecked by the high cost of output tokens. We show that a significant portion of these tokens are useless self-repetitions - what we call "word s...
Friend or Foe: How LLMs' Safety Mind Gets Fooled by Intent Shift Attack : Abstract: Large language models (LLMs) remain vulnerable to jailbreaking attacks despite their impressive capabilities. Investigating these weaknesses is crucial for robust safety mechanisms. Existing...
FlashEVA: Accelerating LLM inference via Efficient Attention : Abstract: Transformer models have revolutionized natural language processing, achieving state-of-the-art performance and demonstrating remarkable scalability. However, their memory demands, particular...
OpenSIR: Open-Ended Self-Improving Reasoner : Abstract: Recent advances in large language model (LLM) reasoning through reinforcement learning rely on annotated datasets for verifiable rewards, which may limit models' ability to surpass human-lev...
SpecDiff-2: Scaling Diffusion Drafter Alignment For Faster Speculative Decoding : Abstract: Speculative decoding has become the standard approach for accelerating Large Language Model (LLM) inference. It exploits a lossless draft-then-verify procedure to circumvent the latency of a...
Certain but not Probable? Differentiating Certainty from Probability in LLM Token Outputs for Probabilistic Scenarios : Abstract: Reliable uncertainty quantification (UQ) is essential for ensuring trustworthy downstream use of large language models, especially when they are deployed in decision-support and other knowle...
Modeling the Construction of a Literary Archetype: The Case of the Detective Figure in French Literature : Abstract: This research explores the evolution of the detective archetype in French detective fiction through computational analysis. Using quantitative methods and character-level embeddings, we show...
Do You Know About My Nation? Investigating Multilingual Language Models' Cultural Literacy Through Factual Knowledge : Abstract: Most multilingual question-answering benchmarks, while covering a diverse pool of languages, do not factor in regional diversity in the information they capture and tend to be Western-centri...
Do Methods to Jailbreak and Defend LLMs Generalize Across Languages? : Abstract: Large language models (LLMs) undergo safety alignment after training and tuning, yet recent work shows that safety can be bypassed through jailbreak attacks. While many jailbreaks and defens...
Optimizing Native Sparse Attention with Latent Attention and Local Global Alternating Strategies : Abstract: In this work, we conduct a systematic analysis of Native Sparse Attention (NSA) and propose targeted improvements that enhance long-context modeling. A key insight is that alternating betwee...
TriCon-Fair: Triplet Contrastive Learning for Mitigating Social Bias in Pre-trained Language Models : Abstract: The increasing utilization of large language models raises significant concerns about the propagation of social biases, which may result in harmful and unfair outcomes. However, existing deb...
ColMate: Contrastive Late Interaction and Masked Text for Multimodal Document Retrieval : Abstract: Retrieval-augmented generation has proven practical when models require specialized knowledge or access to the latest data. However, existing methods for multimodal document retrieval often ...
The Biased Oracle: Assessing LLMs' Understandability and Empathy in Medical Diagnoses : Abstract: Large language models (LLMs) show promise for supporting clinicians in diagnostic communication by generating explanations and guidance for patients. Yet their ability to produce outputs tha...
The Riddle of Reflection: Evaluating Reasoning and Self-Awareness in Multilingual LLMs using Indian Riddles : Abstract: The extent to which large language models (LLMs) can perform culturally grounded reasoning across non-English languages remains underexplored. This paper examines the reasoning and self-asse...
Advancing Machine-Generated Text Detection from an Easy to Hard Supervision Perspective : Abstract: Existing machine-generated text (MGT) detection methods implicitly assume labels as the "golden standard". However, we reveal boundary ambiguity in MGT detection, implying that traditional t...
MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL : Abstract: Translating natural language to SQL remains difficult for complex queries. Such queries often need environmental interaction and self-correction. To address this, we introduce MARS-SQL, a no...
IF-CRITIC: Towards a Fine-Grained LLM Critic for Instruction-Following Evaluation : Abstract: Instruction following is a fundamental ability of Large Language Models (LLMs), requiring their generated outputs to follow multiple constraints imposed in input instructions. Numerous studi...
Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning : Abstract: Recently, advanced large language models (LLMs) have emerged at an increasingly rapid pace. However, when faced with complex problems, most users are often unable to provide accurate and eff...
VayuChat: An LLM-Powered Conversational Interface for Air Quality Data Analytics : Abstract: Air pollution causes about 1.6 million premature deaths each year in India, yet decision makers struggle to turn dispersed data into decisions. Existing tools require expertise and provide s...
Building a Silver-Standard Dataset from NICE Guidelines for Clinical LLMs : Abstract: Large language models (LLMs) are increasingly used in healthcare, yet standardised benchmarks for evaluating guideline-based clinical reasoning are missing. This study introduces a validated...
HPLT 3.0: Very Large-Scale Multilingual Resources for LLM and MT. Mono- and Bi-lingual Data, Multilingual Evaluation, and Pre-Trained Models : Abstract: We present an ongoing initiative to provide open, very large, high-quality, and richly annotated textual datasets for almost 200 languages. At 30 trillion tokens, this is likely the largest ...
Improving Romanian LLM Pretraining Data using Diversity and Quality Filtering : Abstract: Large Language Models (LLMs) have recently exploded in popularity, often matching or outperforming human abilities on many tasks. One of the key factors in training LLMs is the availability ...
TSVer: A Benchmark for Fact Verification Against Time-Series Evidence : Abstract: Reasoning over temporal and numerical data, such as time series, is a crucial aspect of fact-checking. While many systems have recently been developed to handle this form of evidence, their ...
MicroRemed: Benchmarking LLMs in Microservices Remediation : Abstract: Large Language Models (LLMs) integrated with agent-based reasoning frameworks have recently shown strong potential for autonomous decision-making and system-level operations. One promising y...
Surfacing Subtle Stereotypes: A Multilingual, Debate-Oriented Evaluation of Modern LLMs : Abstract: Large language models (LLMs) are widely deployed for open-ended communication, yet most bias evaluations still rely on English, classification-style tasks. We introduce DebateBias-8K, a new ...
ZoFia: Zero-Shot Fake News Detection with Entity-Guided Retrieval and Multi-LLM Interaction : Abstract: The rapid spread of fake news threatens social stability and public trust, rendering its detection an imperative research priority. Although large language models (LLMs) excel at numerous na...
DEER: Disentangled Mixture of Experts with Instance-Adaptive Routing for Generalizable Machine-Generated Text Detection : Abstract: Detecting machine-generated text (MGT) has emerged as a critical challenge, driven by the rapid advancement of large language models (LLMs) capable of producing highly realistic, human-like ...
AraFinNews: Arabic Financial Summarisation with Domain-Adapted LLMs : Abstract: This paper investigates the impact of domain specificity on abstractive summarisation of Arabic financial texts using large language models (LLMs). We introduce AraFinNews, the largest publi...
When, What, and How: Rethinking Retrieval-Enhanced Speculative Decoding : Abstract: Speculative decoding (SD) has emerged as an effective technique to accelerate large language model (LLM) inference without compromising output quality. However, the achievable speedup largel...
"Give a Positive Review Only": An Early Investigation Into In-Paper Prompt Injection Attacks and Defenses for AI Reviewers : Abstract: With the rapid advancement of AI models, their deployment across diverse tasks has become increasingly widespread. A notable emerging application is leveraging AI models to assist in reviewi...
FirstAidQA: A Synthetic Dataset for First Aid and Emergency Response in Low-Connectivity Settings : Abstract: In emergency situations, every second counts. The deployment of Large Language Models (LLMs) in time-sensitive, low or zero-connectivity environments remains limited. Current models are comp...
DeepSpecs: Expert-Level Questions Answering in 5G : Abstract: 5G technology enables mobile Internet access for billions of users. Answering expert-level questions about 5G specifications requires navigating thousands of pages of cross-referenced standa...
DEEPAMBIGQA: Ambiguous Multi-hop Questions for Benchmarking LLM Answer Completeness : Abstract: Large language models (LLMs) with integrated search tools show strong promise in open-domain question answering (QA), yet they often struggle to produce a complete answer set to complex questi...
Thinking with DistilQwen: A Tale of Four Distilled Reasoning and Reward Model Series : Abstract: Recently, the demand for small and efficient reasoning models to support real-world applications has driven the development of knowledge distillation techniques that balance reasoning perfor...
PrefixNLI: Detecting Factual Inconsistencies as Soon as They Arise : Abstract: Natural Language Inference (NLI) models have been used in various ways to improve the factuality of LLM outputs. This is typically done by applying an NLI model to judge whether the model ou...
Safer in Translation? Presupposition Robustness in Indic Languages : Abstract: Increasingly, more and more people are turning to large language models (LLMs) for healthcare advice and consultation, making it important to gauge the efficacy and accuracy of the responses...
The Ouroboros of Benchmarking: Reasoning Evaluation in an Era of Saturation : Abstract: The rapid rise of Large Language Models (LLMs) and Large Reasoning Models (LRMs) has been accompanied by an equally rapid increase of benchmarks used to assess them. However, due to both imp...
Confounding Factors in Relating Model Performance to Morphology : Abstract: The extent to which individual language characteristics influence tokenization and language modeling is an open question. Differences in morphological systems have been suggested as both uni...
RAGSmith: A Framework for Finding the Optimal Composition of Retrieval-Augmented Generation Methods Across Datasets : Abstract: Retrieval-Augmented Generation (RAG) quality depends on many interacting choices across retrieval, ranking, augmentation, prompting, and generation, so optimizing modules in isolation is bri...
LiveSearchBench: An Automatically Constructed Benchmark for Retrieval and Reasoning over Dynamic Knowledge : Abstract: Evaluating large language models (LLMs) on question answering often relies on static benchmarks that reward memorization and understate the role of retrieval, failing to capture the dynamic ...
"Don't Teach Minerva": Guiding LLMs Through Complex Syntax for Faithful Latin Translation with RAG : Abstract: Translating a morphology-rich, low-resource language like Latin poses significant challenges. This paper introduces a reproducible draft-based refinement pipeline that elevates open-source L...
BARD: budget-aware reasoning distillation : Abstract: While long Chain-of-Thought (CoT) distillation effectively transfers reasoning capability to smaller language models, the reasoning process often remains redundant and computational budget u...
Towards Consistent Detection of Cognitive Distortions: LLM-Based Annotation and Dataset-Agnostic Evaluation : Abstract: Text-based automated Cognitive Distortion detection is a challenging task due to its subjective nature, with low agreement scores observed even among expert human annotators, leading to unre...
Synthetic Eggs in Many Baskets: The Impact of Synthetic Data Diversity on LLM Fine-Tuning : Abstract: As synthetic data becomes widely used in language model development, understanding its impact on model behavior is crucial. This paper investigates the impact of the diversity of sources of ...
BanglaNirTox: A Large-scale Parallel Corpus for Explainable AI in Bengali Text Detoxification : Abstract: Toxic language in Bengali remains prevalent, especially in online environments, with few effective precautions against it. Although text detoxification has seen progress in high-resource lan...
Difficulty-Controllable Cloze Question Distractor Generation : Abstract: Multiple-choice cloze questions are commonly used to assess linguistic proficiency and comprehension. However, generating high-quality distractors remains challenging, as existing methods of...
Math anxiety and associative knowledge structure are entwined in psychology students but not in Large Language Models like GPT-3.5 and GPT-4o : Abstract: Math anxiety poses significant challenges for university psychology students, affecting their career choices and overall well-being. This study employs a framework based on behavioural forma...
ECO Decoding: Entropy-Based Control for Controllability and Fluency in Controllable Dialogue Generation : Abstract: Controllable Dialogue Generation (CDG) enables chatbots to generate responses with desired attributes, and weighted decoding methods have achieved significant success in the CDG task. Howeve...
BIRD: Bronze Inscription Restoration and Dating : Abstract: Bronze inscriptions from early China are fragmentary and difficult to date. We introduce BIRD (Bronze Inscription Restoration and Dating), a fully encoded dataset grounded in standard scholar...
Imperfect Language, Artificial Intelligence, and the Human Mind: An Interdisciplinary Approach to Linguistic Errors in Native Spanish Speakers : Abstract: Linguistic errors are not merely deviations from normative grammar; they offer a unique window into the cognitive architecture of language and expose the current limitations of artificial sy...
ParlaSpeech 3.0: Richly Annotated Spoken Parliamentary Corpora of Croatian, Czech, Polish, and Serbian : Abstract: ParlaSpeech is a collection of spoken parliamentary corpora currently spanning four Slavic languages - Croatian, Czech, Polish and Serbian - all together 6 thousand hours in size. The corpor...
A Graph-based RAG for Energy Efficiency Question Answering : Abstract: In this work, we investigate the use of Large Language Models (LLMs) within a graph-based Retrieval Augmented Generation (RAG) architecture for Energy Efficiency (EE) Question Answering. Fir...
Evaluating Cultural Knowledge Processing in Large Language Models: A Cognitive Benchmarking Framework Integrating Retrieval-Augmented Generation : Abstract: This study proposes a cognitive benchmarking framework to evaluate how large language models (LLMs) process and apply culturally specific knowledge. The framework integrates Bloom's Taxonomy...
SeaLLMs-Audio: Large Audio-Language Models for Southeast Asia : Abstract: We introduce SeaLLMs-Audio, the first large audio-language model (LALM) tailored for multiple Southeast Asian (SEA) languages - Indonesian (id), Thai (th), and Vietnamese (vi) - alongside Englis...
Efficient Tool-Calling Multi-Expert NPC Agent for Commonsense Persona-Grounded Dialogue : Abstract: We present a multi-expert system for creating Non-Player Characters (NPCs) capable of both natural dialogue and contextual action execution in interactive environments. Using Qwen3 as the ba...
Accumulating Context Changes the Beliefs of Language Models : Abstract: Language model (LM) assistants are increasingly used in applications such as brainstorming and research. Improvements in memory and context size have allowed these models to become more auto...
Plan-and-Write: Structure-Guided Length Control for LLMs without Model Retraining : Abstract: Length control in Large Language Models (LLMs) is a crucial but under-addressed challenge, with applications ranging from voice interfaces requiring concise responses to research summaries n...
Towards Robust Mathematical Reasoning : Abstract: Finding the right north-star metrics is highly critical for advancing the mathematical reasoning capabilities of foundation models, especially given that existing evaluations are either too ...
Tool-to-Agent Retrieval: Bridging Tools and Agents for Scalable LLM Multi-Agent Systems : Abstract: Recent advances in LLM Multi-Agent Systems enable scalable orchestration of sub-agents, each coordinating hundreds or thousands of tools or Model Context Protocol (MCP) servers. However, exi...
Multimodal Learning with Augmentation Techniques for Natural Disaster Assessment : Abstract: Natural disaster assessment relies on accurate and rapid access to information, with social media emerging as a valuable real-time source. However, existing datasets suffer from class imbala...
Multimodal Detection of Fake Reviews using BERT and ResNet-50 : Abstract: In the current digital commerce landscape, user-generated reviews play a critical role in shaping consumer behavior, product reputation, and platform credibility. However, the proliferation ...
Wayfinding through the AI wilderness: Mapping rhetorics of ChatGPT prompt writing on X (formerly Twitter) to promote critical AI literacies : Abstract: In this paper, we demonstrate how studying the rhetorics of ChatGPT prompt writing on social media can promote critical AI literacies. Prompt writing is the process of writing instructions f...
Real-time and Zero-footprint Bag of Synthetic Syllables Algorithm for E-mail Spam Detection Using Subject Line and Short Text Fields : Abstract: Contemporary e-mail services have high availability expectations from the customers and are resource-strained because of the high-volume throughput and spam attacks. Deep Machine Learning ar...
Advancing Cognitive Science with LLMs : Abstract: Cognitive science faces ongoing challenges in knowledge synthesis and conceptual clarity, in part due to its multifaceted and interdisciplinary nature. Recent advances in artificial intellig...
Diverse Human Value Alignment for Large Language Models via Ethical Reasoning : Abstract: Ensuring that Large Language Models (LLMs) align with the diverse and evolving human values across different regions and cultures remains a critical challenge in AI ethics. Current alignment...
ReMind: Understanding Deductive Code Reasoning in LLMs : Abstract: Large Language Models (LLMs) have achieved remarkable progress in code-related tasks. Despite their advancement, empirical evidence reveals that they still struggle with deductive code...
Structurally Refined Graph Transformer for Multimodal Recommendation : Abstract: Multimodal recommendation systems utilize various types of information, including images and text, to enhance the effectiveness of recommendations. The key challenge is predicting user purch...
Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks : Abstract: Large language models remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs. Defending against novel jailbreaks represents a critical challenge in AI...
Contextual Tokenization for Graph Inverted Indices : Abstract: Retrieving graphs from a large corpus, that contain a subgraph isomorphic to a given query graph, is a core operation in many real-world applications. While recent multi-vector graph represe...
A note on large deviations for interacting particle dynamics for finding mixed Nash equilibria with applications to GANs : Abstract: Finding equilibrium points in continuous minmax games has become a key problem within machine learning, in part due to its connection to the training of generative adversarial networks and r...
Bayesian Additive Main Effects and Multiplicative Interaction Models using Tensor Regression for Multi-environmental Trials : Abstract: We propose a Bayesian tensor regression model to accommodate the effect of multiple factors on phenotype prediction. We adopt a set of prior distributions that resolve identifiability issues...
Complex QA and language models hybrid architectures, Survey : Abstract: This paper reviews the state-of-the-art of large language models (LLM) architectures and strategies for "complex" question-answering with a focus on hybrid architectures. LLM based chatbot s...
On the Variance, Admissibility, and Stability of Empirical Risk Minimization : Abstract: It is well known that Empirical Risk Minimization (ERM) may attain minimax suboptimal rates in terms of the mean squared error (Birgé and Massart, 1993). In this paper, we prove that, unde...
Knolling Bot: Teaching Robots the Human Notion of Tidiness : Abstract: For robots to truly collaborate and assist humans, they must understand not only logic and instructions, but also the subtle emotions, aesthetics, and feelings that define our humanity. Huma...
Hybrid-Task Meta-Learning: A GNN Approach for Scalable and Transferable Bandwidth Allocation : Abstract: In this paper, we develop a deep learning-based bandwidth allocation policy that is: 1) scalable with the number of users and 2) transferable to different communication scenarios, such as no...
Ocean Wave Forecasting with Deep Learning as Alternative to Conventional Models : Abstract: This study presents OceanCastNet (OCN), a machine learning approach for wave forecasting that incorporates wind and wave fields to predict significant wave height, mean wave period, and mean...
Memory-Enhanced Neural Solvers for Routing Problems : Abstract: Routing Problems are central to many real-world applications, yet remain challenging due to their (NP-)hard nature. Amongst existing approaches, heuristics often offer the best trade-off bet...
Inducing Riesz and orthonormal bases in $L^2$ via composition operators : Abstract: Let $C_h$ be a composition operator mapping $L^2(\Omega_1)$ into $L^2(\Omega_2)$ for some open sets $\Omega_1, \Omega_2 \subseteq \mathbb{R}^n$. We characterize the mappings $h$ that transfo...
Augmenting learning in neuro-embodied systems through neurobiological first principles : Abstract: Recent progress in artificial intelligence (AI) has been driven by insights from physics and neuroscience, particularly through the development of artificial neural networks (ANNs) capable o...
SLIP: Securing LLMs IP Using Weights Decomposition : Abstract: Large language models (LLMs) have recently seen widespread adoption in both academia and industry. As these models grow, they become valuable intellectual property (IP), reflecting substanti...
DeepHQ: Learned Hierarchical Quantizer for Progressive Deep Image Coding : Abstract: Unlike fixed- or variable-rate image coding, progressive image coding (PIC) aims to compress various qualities of images into a single bitstream, increasing the versatility of bitstream util...
Application of Langevin Dynamics to Advance the Quantum Natural Gradient Optimization Algorithm : Abstract: A Quantum Natural Gradient (QNG) algorithm for optimization of variational quantum circuits has been proposed recently. In this study, we employ the Langevin equation with a QNG stochastic f...
Automate Strategy Finding with LLM in Quant Investment : Abstract: We present a novel three-stage framework leveraging Large Language Models (LLMs) within a risk-aware multi-agent system for automated strategy finding in quantitative finance. Our approach ad...
R+R: Revisiting Static Feature-Based Android Malware Detection using Machine Learning : Abstract: Static feature-based Android malware detection using machine learning (ML) remains critical due to its scalability and efficiency. However, existing approaches often overlook security-critic...
IndicSentEval: How Effectively do Multilingual Transformer Models encode Linguistic Properties for Indic Languages? : Abstract: Transformer-based models have revolutionized the field of natural language processing. To understand why they perform so well and to assess their reliability, several studies have focused on...
Federated Vision-Language-Recommendation with Personalized Fusion : Abstract: Applying large pre-trained Vision-Language Models to recommendation is a burgeoning field, a direction we term Vision-Language-Recommendation (VLR). Bringing VLR to user-oriented on-device i...
STACKFEED: Structured Textual Actor-Critic Knowledge Base Editing with FeedBack : Abstract: Large Language Models (LLMs) often generate incorrect or outdated information, especially in low-resource settings or when dealing with private data. To address this, Retrieval-Augmented Gen...
Variational Inference in Location-Scale Families: Exact Recovery of the Mean and Correlation Matrix : Abstract: Given an intractable target density $p$, variational inference (VI) attempts to find the best approximation $q$ from a tractable family $Q$. This is typically done by minimizing the exclusiv...
Learning Nonholonomic Dynamics with Constraint Discovery : Abstract: We consider learning nonholonomic dynamical systems while discovering the constraints, and describe in detail the case of the rolling disk. A nonholonomic system is a system subject to nonho...
High Resolution Seismic Waveform Generation using Denoising Diffusion : Abstract: Accurate prediction and synthesis of seismic waveforms are crucial for seismic-hazard assessment and earthquake-resistant infrastructure design. Existing prediction methods, such as ground-m...
Double Descent Meets Out-of-Distribution Detection: Theoretical Insights and Empirical Analysis on the role of model complexity : Abstract: Out-of-distribution (OOD) detection is essential for ensuring the reliability and safety of machine learning systems. In recent years, it has received increasing attention, particularly thro...
Optimal Execution with Reinforcement Learning : Abstract: This study investigates the development of an optimal execution strategy through reinforcement learning, aiming to determine the most effective approach for traders to buy and sell inventory...
A Low-Resolution Image is Worth 1x1 Words: Enabling Fine Image Super-Resolution with Transformers and TaylorShift : Abstract: Transformer-based architectures have recently advanced the image reconstruction quality of super-resolution (SR) models. Yet, their scalability remains limited by quadratic attention costs a...
TESGNN: Temporal Equivariant Scene Graph Neural Networks for Efficient and Robust Multi-View 3D Scene Understanding : Abstract: Scene graphs have proven to be highly effective for various scene understanding tasks due to their compact and explicit representation of relational information. However, current methods oft...
FLRONet: Deep Operator Learning for High-Fidelity Fluid Flow Field Reconstruction from Sparse Sensor Measurements : Abstract: Reconstructing high-fidelity fluid flow fields from sparse sensor measurements is vital for many science and engineering applications but remains challenging because of dimensional dispariti...
Wait-Less Offline Tuning and Re-solving for Online Decision Making : Abstract: Online linear programming (OLP) has found broad applications in revenue management and resource allocation. State-of-the-art OLP algorithms achieve low regret by repeatedly solving linear pr...
AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement : Abstract: We introduce AnyEnhance, a unified generative model for voice enhancement that processes both speech and singing voices. Based on a masked generative model, AnyEnhance is capable of handling...
A Study in Dataset Distillation for Image Super-Resolution : Abstract: Dataset distillation aims to compress large datasets into compact yet highly informative subsets that preserve the training behavior of the original data. While this concept has gained tract...
THFlow: A Temporally Hierarchical Flow Matching Framework for 3D Peptide Design : Abstract: Deep generative models provide a promising approach to de novo 3D peptide design. Most of them jointly model the distributions of peptide's position, orientation, and conformation, attemptin...
Tool and Tutor? Experimental evidence from AI deployment in cancer diagnosis : Abstract: Numerous countries globally face shortages of medical experts, deepening inequalities in access to healthcare. Artificial Intelligence (AI)-based diagnostic tools hold considerable promise t...
Differential privacy guarantees of Markov chain Monte Carlo algorithms : Abstract: This paper aims to provide differential privacy (DP) guarantees for Markov chain Monte Carlo (MCMC) algorithms. In a first part, we establish DP guarantees on samples output by MCMC algorith...
LLM Strategic Reasoning: Agentic Study through Behavioral Game Theory : Abstract: Strategic decision-making involves interactive reasoning where agents adapt their choices in response to others, yet existing evaluations of large language models (LLMs) often emphasize Nash...
Damper-B-PINN: Damper Characteristics-Based Bayesian Physics-Informed Neural Network for Vehicle State Estimation : Abstract: Accurate state estimation is fundamental to intelligent vehicles. Wheel load, one of the most important chassis states, serves as an essential input for advanced driver assistance systems (A...
MarsLGPR: Mars Rover Localization with Ground Penetrating Radar : Abstract: In this work, we propose the use of Ground Penetrating Radar (GPR) for rover localization on Mars. Precise pose estimation is an important task for mobile robots exploring planetary surfaces...
Flip Learning: Weakly Supervised Erase to Segment Nodules in Breast Ultrasound : Abstract: Accurate segmentation of nodules in both 2D breast ultrasound (BUS) and 3D automated breast ultrasound (ABUS) is crucial for clinical diagnosis and treatment planning. Therefore, developing ...
Neuro-Symbolic Imitation Learning: Discovering Symbolic Abstractions for Skill Learning : Abstract: Imitation learning is a popular method for teaching robots new behaviors. However, most existing methods focus on teaching short, isolated skills rather than long, multi-step tasks. To bridg...
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems? : Abstract: The rapid escalation from elementary school-level to frontier problems of the difficulty for LLM benchmarks in recent years have weaved a miracle for researchers that we are only inches away...
MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation : Abstract: Multilingual speech translation (ST) and machine translation (MT) in the medical domain enhances patient care by enabling efficient communication across language barriers, alleviating specia...
Transforming Hyperspectral Images Into Chemical Maps: A Novel End-to-End Deep Learning Approach : Abstract: Current approaches to chemical map generation from hyperspectral images are based on models such as partial least squares (PLS) regression, generating pixel-wise predictions that do not cons...
MARFT: Multi-Agent Reinforcement Fine-Tuning : Abstract: LLM-based Multi-Agent Systems have demonstrated remarkable capabilities in addressing complex, agentic tasks, from generating high-quality presentation slides to even conducting sophisticate...
Behavior of prediction performance metrics with rare events : Abstract: Objective: Area under the receiver operating characteristic curve (AUC) is commonly reported alongside prediction models for binary outcomes. Recent articles have raised concerns that AUC mi...
A Genealogy of Foundation Models in Remote Sensing : Abstract: Foundation models have garnered increasing attention for representation learning in remote sensing. Many such foundation models adopt approaches that have demonstrated success in computer vi...
Scalable Multi-Task Learning for Particle Collision Event Reconstruction with Heterogeneous Graph Neural Networks : Abstract: The growing luminosity frontier at the Large Hadron Collider is challenging the reconstruction and analysis of particle collision events. Increased particle multiplicities are straining late...
Characterization and Learning of Causal Graphs from Hard Interventions : Abstract: A fundamental challenge in the empirical sciences involves uncovering causal structure through observation and experimentation. Causal discovery entails linking the conditional independence ...
UniVLA: Learning to Act Anywhere with Task-centric Latent Actions : Abstract: A generalist robot should perform effectively across various environments. However, most existing approaches heavily rely on scaling action-annotated data to enhance their capabilities. Cons...
New Encoders for German Trained from Scratch: Comparing ModernGBERT with Converted LLM2Vec Models : Abstract: Encoders remain essential for efficient German NLP and NLU scenarios despite the rise of decoder-only LLMs. This work studies two routes to high-quality German encoders under identical data ...
Autocomp: A Powerful and Portable Code Optimizer for Tensor Accelerators : Abstract: Hardware accelerators, especially those designed for tensor processing, have become ubiquitous in today's computing landscape. However, even with significant efforts in building compilers, p...
Exploring the limits of strong membership inference attacks on large language models : Abstract: State-of-the-art membership inference attacks (MIAs) typically require training many reference models, making it difficult to scale these attacks to large pre-trained language models (LLMs)....
When Models Don't Collapse: On the Consistency of Iterative MLE : Abstract: The widespread use of generative models has created a feedback loop, in which each generation of models is trained on data partially produced by its predecessors. This process has raised con...
Exploring the Hidden Capacity of LLMs for One-Step Text Generation : Abstract: A recent study showed that large language models (LLMs) can reconstruct surprisingly long texts - up to thousands of tokens - via autoregressive generation from just one trained input embedd...
Spatial Knowledge Graph-Guided Multimodal Synthesis : Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have significantly enhanced their capabilities; however, their spatial perception abilities remain a notable limitation. To addres...
Geospatial Foundation Models to Enable Progress on Sustainable Development Goals : Abstract: Foundation Models (FMs) are large-scale, pre-trained artificial intelligence (AI) systems that have revolutionized natural language processing and computer vision, and are now advancing geos...
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences : Abstract: Measuring alignment between language and vision is a fundamental challenge, especially as multimodal data becomes increasingly detailed and complex. Existing methods often rely on collecting...
Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression : Abstract: We study gradient descent (GD) with a constant stepsize for $\ell_2$-regularized logistic regression with linearly separable data. Classical theory suggests small stepsizes to ensure monoton...
Distributionally Robust Wireless Semantic Communication with Large AI Models : Abstract: Semantic communication (SemCom) has emerged as a promising paradigm for 6G wireless systems by transmitting task-relevant information rather than raw bits, yet existing approaches remain vul...
Lorica: A Synergistic Fine-Tuning Framework for Advancing Personalized Adversarial Robustness : Abstract: The growing use of large pre-trained models in edge computing has made model inference on mobile clients both feasible and popular. Yet these devices remain vulnerable to adversarial attacks...
Transferring Linear Features Across Language Models With Model Stitching : Abstract: In this work, we demonstrate that affine mappings between residual streams of language models are a cheap way to effectively transfer represented features between models. We apply this techni...
MultiMatch: Multihead Consistency Regularization Matching for Semi-Supervised Text Classification : Abstract: We introduce MultiMatch, a novel semi-supervised learning (SSL) algorithm combining the paradigms of co-training and consistency regularization with pseudo-labeling. At its core, MultiMatch ...
Solving Inequality Proofs with Large Language Models : Abstract: Inequality proving, crucial across diverse scientific and mathematical fields, tests advanced reasoning skills such as discovering tight bounds and strategic theorem application. This makes ...
SplashNet: Split-and-Share Encoders for Accurate and Efficient Typing with Surface Electromyography : Abstract: Surface electromyography (sEMG) at the wrists could enable natural, keyboard-free text entry, yet the state-of-the-art emg2qwerty baseline still misrecognizes $51.8\%$ of characters in the z...
Schrödinger Bridge Matching for Tree-Structured Costs and Entropic Wasserstein Barycentres : Abstract: Recent advances in flow-based generative modelling have provided scalable methods for computing the Schrödinger Bridge (SB) between distributions, a dynamic form of entropy-regularised Opt...
Representation Consistency for Accurate and Coherent LLM Answer Aggregation : Abstract: Test-time scaling improves large language models' (LLMs) performance by allocating more compute budget during inference. To achieve this, existing methods often require intricate modificatio...
Context Tuning for In-Context Optimization : Abstract: We introduce Context Tuning, a simple and effective method to significantly enhance few-shot adaptation of language models (LLMs) without fine-tuning model parameters. While prompt-based ada...
On the Bias of Next-Token Predictors Toward Systematically Inefficient Reasoning: A Shortest-Path Case Study : Abstract: Recent advances in natural language processing highlight two key factors for improving reasoning in large language models (LLMs): (i) allocating more test-time compute tends to help on harde...
Chain of Retrieval: Multi-Aspect Iterative Search Expansion and Post-Order Search Aggregation for Full Paper Retrieval : Abstract: Scientific paper retrieval, particularly framed as document-to-document retrieval, aims to identify relevant papers in response to a long-form query paper, rather than a short query string. ...
SimKey: A Semantically Aware Key Module for Watermarking Language Models : Abstract: The rapid spread of text generated by large language models (LLMs) makes it increasingly difficult to distinguish authentic human writing from machine output. Watermarking offers a promising...
MMbeddings: Parameter-Efficient, Low-Overfitting Probabilistic Embeddings Inspired by Nonlinear Mixed Models : Abstract: We present MMbeddings, a probabilistic embedding approach that reinterprets categorical embeddings through the lens of nonlinear mixed models, effectively bridging classical statistical theo...
A Free Probabilistic Framework for Denoising Diffusion Models: Entropy, Transport, and Reverse Processes : Abstract: This paper develops a rigorous probabilistic framework that extends denoising diffusion models to the setting of noncommutative random variables. Building on Voiculescu's theory of free entr...
PlotCraft: Pushing the Limits of LLMs for Complex and Interactive Data Visualization : Abstract: Recent Large Language Models (LLMs) have demonstrated remarkable proficiency in code generation. However, their ability to create complex visualizations for scaled and structured data re...
A probabilistic view on Riemannian machine learning models for SPD matrices : Abstract: The goal of this paper is to show how different machine learning tools on the Riemannian manifold $\mathcal{P}_d$ of Symmetric Positive Definite (SPD) matrices can be united under a probabil...
An Effective Flow-based Method for Positive-Unlabeled Learning: 2-HNC : Abstract: In many scenarios of binary classification, only positive instances are provided in the training data, leaving the rest of the data unlabeled. This setup, known as positive-unlabeled (PU) le...
Evaluating Simplification Algorithms for Interpretability of Time Series Classification : Abstract: In this work, we introduce metrics to evaluate the use of simplified time series in the context of interpretability of a TSC - a Time Series Classifier. Such simplifications are important be...
Sample Complexity of Distributionally Robust Average-Reward Reinforcement Learning : Abstract: Motivated by practical applications where stable long-term performance is critical, such as robotics, operations research, and healthcare, we study the problem of distributionally robust (DR) ...
Learning Repetition-Invariant Representations for Polymer Informatics : Abstract: Polymers are large macromolecules composed of repeating structural units known as monomers and are widely applied in fields such as energy storage, construction, medicine, and aerospace. How...
Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought : Abstract: Large Language Models (LLMs) have demonstrated remarkable performance in many applications, including challenging reasoning problems via chain-of-thoughts (CoTs) techniques that generate ``t...
Multi-head Temporal Latent Attention : Abstract: While Transformer self-attention offers strong parallelism, the Key-Value (KV) cache grows linearly with sequence length and becomes a bottleneck for inference efficiency. Multi-head latent ...
Follow the Energy, Find the Path: Riemannian Metrics from Energy-Based Models : Abstract: What is the shortest path between two data points lying in a high-dimensional space? While the answer is trivial in Euclidean geometry, it becomes significantly more complex when the data li...
PromptWise: Online Learning for Cost-Aware Prompt Assignment in Generative Models : Abstract: The rapid advancement of generative AI has provided users with a wide range of well-trained models to address diverse prompts. When selecting a model for a given prompt, users should weigh n...
Breaking the Performance Ceiling in Reinforcement Learning requires Inference Strategies : Abstract: Reinforcement learning (RL) systems have countless applications, from energy-grid management to protein design. However, such real-world scenarios are often extremely difficult, combinatoria...
Differentiable Generalized Sliced Wasserstein Plans : Abstract: Optimal Transport (OT) has attracted significant interest in the machine learning community, not only for its ability to define meaningful distances between probability distributions -- such...
Diversity-Aware Policy Optimization for Large Language Model Reasoning : Abstract: The reasoning capabilities of large language models (LLMs) have advanced rapidly, particularly following the release of DeepSeek R1, which has inspired a surge of research into data quality ...
TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning : Abstract: In-context learning, the ability of large language models to perform tasks using only examples provided in the prompt, has recently been adapted for time series forecasting. This paradigm en...
A Tale of Two Symmetries: Exploring the Loss Landscape of Equivariant Models : Abstract: Equivariant neural networks have proven to be effective for tasks with known underlying symmetries. However, optimizing equivariant networks can be tricky and best training practices are les...
Tight analyses of first-order methods with error feedback : Abstract: Communication between agents often constitutes a major computational bottleneck in distributed learning. One of the most common mitigation strategies is to compress the information exchanged...
Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning : Abstract: We aim to improve the reasoning capabilities of language models via reinforcement learning (RL). Recent RL post-trained models like DeepSeek-R1 have demonstrated reasoning abilities on mathe...
LiteVLM: A Low-Latency Vision-Language Model Inference Pipeline for Resource-Constrained Environments : Abstract: This paper introduces an efficient Vision-Language Model (VLM) pipeline specifically optimized for deployment on embedded devices, such as those used in robotics and autonomous driving. The ...
Information-Theoretic Framework for Understanding Modern Machine-Learning : Abstract: We introduce an information-theoretic framework that views learning as universal prediction under log loss, characterized through regret bounds. Central to the framework is an effective noti...
Flat Channels to Infinity in Neural Loss Landscapes : Abstract: The loss landscapes of neural networks contain minima and saddle points that may be connected in flat regions or appear in isolation. We identify and characterize a special structure in the ...
Over-squashing in Spatiotemporal Graph Neural Networks : Abstract: Graph Neural Networks (GNNs) have achieved remarkable success across various domains. However, recent theoretical advances have identified fundamental limitations in their information propag...
Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation : Abstract: The steep computational cost of diffusion models at inference hinders their use as fast physics emulators. In the context of image and video generation, this computational drawback has been ...
CosmoBench: A Multiscale, Multiview, Multitask Cosmology Benchmark for Geometric Deep Learning : Abstract: Cosmological simulations provide a wealth of data in the form of point clouds and directed trees. A crucial goal is to extract insights from this data that shed light on the nature and compo...
Distributionally Robust Optimization with Adversarial Data Contamination : Abstract: Distributionally Robust Optimization (DRO) provides a framework for decision-making under distributional uncertainty, yet its effectiveness can be compromised by outliers in the training dat...
RL Fine-Tuning Heals OOD Forgetting in SFT : Abstract: The two-stage fine-tuning paradigm of Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) has empirically shown better reasoning performance than one-stage SFT for the post-...
SemBench: A Benchmark for Semantic Query Processing Engines : Abstract: We present a benchmark targeting a novel class of systems: semantic query processing engines. Those systems rely inherently on generative and reasoning capabilities of state-of-the-art large...
Probabilistic Robustness for Free? Revisiting Training via a Benchmark : Abstract: Deep learning models are notoriously vulnerable to imperceptible perturbations. Most existing research centers on adversarial robustness (AR), which evaluates models under worst-case scenari...
A Proof of Learning Rate Transfer under $\mu$P : Abstract: We provide the first proof of learning rate transfer with width in a linear multi-layer perceptron (MLP) parametrized with $\mu$P, a neural network parameterization designed to "maximize" ...
ADNAC: Audio Denoiser using Neural Audio Codec : Abstract: Audio denoising is critical in signal processing, enhancing intelligibility and fidelity for applications like restoring musical recordings. This paper presents a proof-of-concept for adapti...
Hybrid Neural Network-Based Indoor Localisation System for Mobile Robots Using CSI Data in a Robotics Simulator : Abstract: We present a hybrid neural network model for inferring the position of mobile robots using Channel State Information (CSI) data from a Massive MIMO system. By leveraging an existing CSI data...
Disciplined Biconvex Programming : Abstract: We introduce disciplined biconvex programming (DBCP), a modeling framework for specifying and solving biconvex optimization problems. Biconvex optimization problems arise in various applicat...
KV Cache Transform Coding for Compact Storage in LLM Inference : Abstract: Serving large language models (LLMs) at scale necessitates efficient key-value (KV) cache management. KV caches can be reused across conversation turns via shared-prefix prompts that are com...
Simulating Environments with Reasoning Models for Agent Training : Abstract: LLM agents excel in compact environments requiring deep reasoning but remain brittle when operating in broader, more complex contexts that demand robustness across diverse tools and schemas....
Proximal Regret and Proximal Correlated Equilibria: A New Tractable Solution Concept for Online Learning and Games : Abstract: Learning and computation of equilibria are central problems in algorithmic game theory. In this work, we introduce proximal regret, a new notion of regret based on proximal operators that li...
Ranking hierarchical multi-label classification results with mLPRs : Abstract: Hierarchical multi-label classification (HMC) has gained considerable attention in recent decades. A seminal line of HMC research addresses the problem in two stages: first, training individ...
You Are the Best Reviewer of Your Own Papers: The Isotonic Mechanism : Abstract: Machine learning (ML) and artificial intelligence (AI) conferences including NeurIPS and ICML have experienced a significant decline in peer review quality in recent years. To address this g...
Algorithmic Assistance with Recommendation-Dependent Preferences : Abstract: When an algorithm provides risk assessments, we typically think of them as helpful inputs to human decisions, such as when risk scores are presented to judges or doctors. However, a decision...
Neighboring State-based Exploration for Reinforcement Learning : Abstract: Reinforcement Learning is a powerful tool to model decision-making processes. However, it relies on an exploration-exploitation trade-off that remains an open challenge for many tasks. In th...
ERA-Solver: Error-Robust Adams Solver for Fast Sampling of Diffusion Probabilistic Models : Abstract: Though denoising diffusion probabilistic models (DDPMs) have achieved remarkable generation results, the low sampling efficiency of DDPMs still limits further applications. Since DDPMs can b...
Is Risk-Sensitive Reinforcement Learning Properly Resolved? : Abstract: Due to the nature of risk management in learning applicable policies, risk-sensitive reinforcement learning (RSRL) has been realized as an important direction. RSRL is usually achieved by le...
APALU: A Trainable, Adaptive Activation Function for Deep Learning Networks : Abstract: The activation function is a pivotal component of deep learning, facilitating the extraction of intricate data patterns. While classical activation functions like ReLU and its variants are exten...
Enhancing Sequential Model Performance with Squared Sigmoid TanH (SST) Activation Under Data Constraints : Abstract: Activation functions enable neural networks to learn complex representations by introducing non-linearities. While feedforward models commonly use rectified linear units, sequential models l...
Calibrating Bayesian Learning via Regularization, Confidence Minimization, and Selective Inference : Abstract: The application of artificial intelligence (AI) models in fields such as engineering is limited by the known difficulty of quantifying the reliability of an AI's decision. A well-calibrated ...
SST: Multi-Scale Hybrid Mamba-Transformer Experts for Time Series Forecasting : Abstract: Time series forecasting has made significant advances, including with Transformer-based models. The attention mechanism in Transformer effectively captures temporal dependencies by attending...
Learning Diffusion Priors from Observations by Expectation Maximization : Abstract: Diffusion models recently proved to be remarkable priors for Bayesian inverse problems. However, training these models typically requires access to large amounts of clean data, which could p...
Integer-only Quantized Transformers for Embedded FPGA-based Time-series Forecasting in AIoT : Abstract: This paper presents the design of a hardware accelerator for Transformers, optimized for on-device time-series forecasting in AIoT systems. It integrates integer-only quantization and Quanti...
Bellman Diffusion Models : Abstract: Diffusion models have seen tremendous success as generative architectures. Recently, they have been shown to be effective at modelling policies for offline reinforcement learning and imitati...
Abstraction Alignment: Comparing Model-Learned and Human-Encoded Conceptual Relationships : Abstract: While interpretability methods identify a model's learned concepts, they overlook the relationships between concepts that make up its abstractions and inform its ability to generalize to new...
Gymnasium: A Standard Interface for Reinforcement Learning Environments : Abstract: Reinforcement Learning (RL) is a continuously growing field that has the potential to revolutionize many areas of artificial intelligence. However, despite its promise, RL research is often ...
MistralBSM: Leveraging Mistral-7B for Vehicular Networks Misbehavior Detection : Abstract: Malicious attacks on vehicular networks pose a serious threat to road safety as well as communication reliability. A major source of these threats stems from misbehaving vehicles within the ...
FairAIED: Navigating Fairness, Bias, and Ethics in Educational AI Applications : Abstract: The integration of AI in education holds immense potential for personalizing learning experiences and transforming instructional practices. However, AI systems can inadvertently encode and a...
Dataset Distillation for Offline Reinforcement Learning : Abstract: Offline reinforcement learning often requires a quality dataset that we can train a policy on. However, in many situations, it is not possible to get such a dataset, nor is it easy to train ...
Lyapunov Neural ODE State-Feedback Control Policies : Abstract: Deep neural networks are increasingly used as an effective parameterization of control policies in various learning-based control paradigms. For continuous-time optimal control problems (OCP...
Neural Entropy : Abstract: We explore the connection between deep learning and information theory through the paradigm of diffusion models. A diffusion model converts noise into structured data by reinstating, imperfe...
AI-Guided Molecular Simulations in VR: Exploring Strategies for Imitation Learning in Hyperdimensional Molecular Systems : Abstract: Molecular dynamics (MD) simulations are a crucial computational tool for researchers to understand and engineer molecular structure and function in areas such as drug discovery, protein engi...
From Epilepsy Seizures Classification to Detection: A Deep Learning-based Approach for Raw EEG Signals : Abstract: Epilepsy represents the most prevalent neurological disease in the world. One-third of people suffering from mesial temporal lobe epilepsy (MTLE) exhibit drug resistance, urging the need to ...
Mechanism Learning: reverse causal inference in the presence of multiple unknown confounding through causally weighted Gaussian mixture models : Abstract: A major limitation of machine learning (ML) prediction models is that they recover associational, rather than causal, predictive relationships between variables. In high-stakes automation ap...
DuSEGO: Dual Second-order Equivariant Graph Ordinary Differential Equation : Abstract: Graph Neural Networks (GNNs) with equivariant properties have achieved significant success in modeling complex dynamic systems and molecular properties. However, their expressiveness ability...
Exploring Kolmogorov-Arnold Networks for Interpretable Time Series Classification : Abstract: Time series classification is a relevant step supporting decision-making processes in various domains, and deep neural models have shown promising performance in this respect. Despite signif...
Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy and Research : Abstract: "Machine unlearning" is a popular proposed solution for mitigating the existence of content in an AI model that is problematic for legal or moral reasons, including privacy, copyright, safet...
Low-Rank Adaptation for Foundation Models: A Comprehensive Review : Abstract: The rapid advancement of foundation models, large-scale neural networks trained on diverse, extensive datasets, has revolutionized artificial intelligence, enabling unprecedented advancements ac...
Deep Modularity Networks with Diversity-Preserving Regularization : Abstract: Graph clustering plays a crucial role in graph representation learning but often faces challenges in achieving feature-space diversity. While Deep Modularity Networks (DMoN) leverage modular...
Implicit Bias in Matrix Factorization and its Explicit Realization in a New Architecture : Abstract: Gradient descent for matrix factorization exhibits an implicit bias toward approximately low-rank solutions. While existing theories often assume the boundedness of iterates, empirically the...
E2Former: An Efficient and Equivariant Transformer with Linear-Scaling Tensor Products : Abstract: Equivariant Graph Neural Networks (EGNNs) have demonstrated significant success in modeling microscale systems, including those in chemistry, biology and materials science. However, EGNNs fa...
Can Classic GNNs Be Strong Baselines for Graph-level Tasks? Simple Architectures Meet Excellence : Abstract: Message-passing Graph Neural Networks (GNNs) are often criticized for their limited expressiveness, issues like over-smoothing and over-squashing, and challenges in capturing long-range depe...
Efficient Neural SDE Training using Wiener-Space Cubature : Abstract: A neural stochastic differential equation (SDE) is an SDE with drift and diffusion terms parametrized by neural networks. The training procedure for neural SDEs consists of optimizing the SD...
Co-MTP: A Cooperative Trajectory Prediction Framework with Multi-Temporal Fusion for Autonomous Driving : Abstract: Vehicle-to-everything technologies (V2X) have become an ideal paradigm to extend the perception range and see through the occlusion. Existing efforts focus on single-frame cooperative percept...
Electrical Load Forecasting over Multihop Smart Metering Networks with Federated Learning : Abstract: Electric load forecasting is essential for power management and stability in smart grids. This is mainly achieved via advanced metering infrastructure, where smart meters (SMs) record househ...
Split Gibbs Discrete Diffusion Posterior Sampling : Abstract: We study the problem of posterior sampling in discrete-state spaces using discrete diffusion models. While posterior sampling methods for continuous diffusion models have achieved remarkable...
Understanding Endogenous Data Drift in Adaptive Models with Recourse-Seeking Users : Abstract: Deep learning models are widely used in decision-making and recommendation systems, where they typically rely on the assumption of a static data distribution between training and deployment....
Multivariate Gaussian Topic Modelling: A novel approach to discover topics with greater semantic coherence : Abstract: An important aspect of text mining involves information retrieval in the form of discovery of semantic themes (topics) from documents using topic modelling. While generative topic models like La...
PolyG: Adaptive Graph Traversal for Diverse GraphRAG Questions : Abstract: GraphRAG enhances large language models (LLMs) to generate quality answers for user questions by retrieving related facts from external knowledge graphs. However, current GraphRAG methods ar...
Trustworthy AI Must Account for Interactions : Abstract: Trustworthy AI encompasses many aspirational aspects for aligning AI systems with human values, including fairness, privacy, robustness, explainability, and uncertainty quantification. Ultim...
Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models : Abstract: Although multimodal large language models (MLLMs) exhibit remarkable reasoning capabilities on complex multimodal understanding tasks, they still suffer from the notorious hallucination issu...
Reinforcement Learning from Human Feedback : Abstract: Reinforcement learning from human feedback (RLHF) has become an important technical and storytelling tool to deploy the latest machine learning systems. In this book, we hope to give a gentl...
A Basic Evaluation of Neural Networks Trained with the Error Diffusion Learning Algorithm : Abstract: This paper presents a comprehensive formulation of Kaneko's Error Diffusion Learning Algorithm (EDLA) and evaluates its effectiveness across parity check, regression, and image classificatio...
Hyper-Transforming Latent Diffusion Models : Abstract: We introduce a novel generative framework for functions by integrating Implicit Neural Representations (INRs) and Transformer-based hypernetworks into latent variable models. Unlike prior ap...
Stochastic Subspace Descent Accelerated via Bi-fidelity Line Search : Abstract: Efficient optimization remains a fundamental challenge across numerous scientific and engineering domains, especially when objective function and gradient evaluations are computationally exp...
Chronic Diseases Prediction using Machine Learning and Deep Learning Methods : Abstract: Chronic diseases, such as cardiovascular disease, diabetes, chronic kidney disease, and thyroid disorders, are the leading causes of premature mortality worldwide. Early detection and interv...
End-to-End Framework Integrating Generative AI and Deep Reinforcement Learning for Autonomous Ultrasound Scanning : Abstract: Cardiac ultrasound (US) is among the most widely used diagnostic tools in cardiology for assessing heart health, but its effectiveness is limited by operator dependence, time constraints, an...
Integrating ConvNeXt and Vision Transformers for Enhancing Facial Age Estimation : Abstract: Age estimation from facial images is a complex and multifaceted challenge in computer vision. In this study, we present a novel hybrid architecture that combines ConvNeXt, a state-of-the-art...
ARC-GEN: A Mimetic Procedural Benchmark Generator for the Abstraction and Reasoning Corpus : Abstract: The Abstraction and Reasoning Corpus remains one of the most compelling and challenging benchmarks for tracking progress toward achieving Artificial General Intelligence. In contrast to othe...
Generative Modeling Enables Molecular Structure Retrieval from Coulomb Explosion Imaging : Abstract: Capturing the structural changes that molecules undergo during chemical reactions in real space and time is a long-standing dream and an essential prerequisite for understanding and ultimate...
ParaScopes: What do Language Models Activations Encode About Future Text? : Abstract: Interpretability studies in language models often investigate forward-looking representations of activations. However, as language models become capable of doing ever longer time horizon tas...
A Retrospect to Multi-prompt Learning across Vision and Language : Abstract: The vision community is undergoing unprecedented progress with the emergence of Vision-Language Pretraining Models (VLMs). Prompt learning serves as the holy grail of accessing VLMs since...
Reducing Robotic Upper-Limb Assessment Time While Maintaining Precision: A Time Series Foundation Model Approach : Abstract: Purpose: Visually Guided Reaching (VGR) on the Kinarm robot yields sensitive kinematic biomarkers but requires 40-64 reaches, imposing time and fatigue burdens. We evaluate whether time-seri...
Position: Vibe Coding Needs Vibe Reasoning: Improving Vibe Coding with Formal Verification : Abstract: "Vibe coding" -- the practice of developing software through iteratively conversing with a large language model (LLM) -- has exploded in popularity within the last year. However, developer...
Transfer learning discovery of molecular modulators for perovskite solar cells : Abstract: The discovery of effective molecular modulators is essential for advancing perovskite solar cells (PSCs), but the research process is hindered by the vastness of chemical space and the time-...
Gradient Boosted Mixed Models: Flexible Joint Estimation of Mean and Variance Components for Clustered Data : Abstract: Linear mixed models are widely used for clustered data, but their reliance on parametric forms limits flexibility in complex and high-dimensional settings. In contrast, gradient boosting met...
NaturalVoices: A Large-Scale, Spontaneous and Emotional Podcast Dataset for Voice Conversion : Abstract: Everyday speech conveys far more than words; it reflects who we are, how we feel, and the circumstances surrounding our interactions. Yet, most existing speech datasets are acted, limited in...
Advancing AI Challenges for the United States Department of the Air Force : Abstract: The DAF-MIT AI Accelerator is a collaboration between the United States Department of the Air Force (DAF) and the Massachusetts Institute of Technology (MIT). This program pioneers fundament...
IL-PCSR: Legal Corpus for Prior Case and Statute Retrieval : Abstract: Identifying/retrieving relevant statutes and prior cases/precedents for a given legal situation are common tasks exercised by law practitioners. Researchers to date have addressed the two ta...
POSESTITCH-SLT: Linguistically Inspired Pose-Stitching for End-to-End Sign Language Translation : Abstract: Sign language translation remains a challenging task due to the scarcity of large-scale, sentence-aligned datasets. Prior arts have focused on various feature extraction and architectural ch...
LongCat-Flash-Omni Technical Report : Abstract: We introduce LongCat-Flash-Omni, a state-of-the-art open-source omni-modal model with 560 billion parameters, excelling at real-time audio-visual interaction. By adopting a curriculum-inspir...
Beyond ImageNet: Understanding Cross-Dataset Robustness of Lightweight Vision Models : Abstract: Lightweight vision classification models such as MobileNet, ShuffleNet, and EfficientNet are increasingly deployed in mobile and embedded systems, yet their performance has been predominantl...
Split Learning-Enabled Framework for Secure and Light-weight Internet of Medical Things Systems : Abstract: The rapid growth of Internet of Medical Things (IoMT) devices has resulted in significant security risks, particularly the risk of malware attacks on resource-constrained devices. Convention...
MH-1M: A 1.34 Million-Sample Comprehensive Multi-Feature Android Malware Dataset for Machine Learning, Deep Learning, Large Language Models, and Threat Intelligence Research : Abstract: We present MH-1M, one of the most comprehensive and up-to-date datasets for advanced Android malware research. The dataset comprises 1,340,515 applications, encompassing a wide range of feat...
OSMGen: Highly Controllable Satellite Image Synthesis using OpenStreetMap Data : Abstract: Accurate and up-to-date geospatial data are essential for urban planning, infrastructure monitoring, and environmental management. Yet, automating urban monitoring remains difficult because ...
Exploiting Latent Space Discontinuities for Building Universal LLM Jailbreaks and Data Extraction Attacks : Abstract: The rapid proliferation of Large Language Models (LLMs) has raised significant concerns about their security against adversarial attacks. In this work, we propose a novel approach to craftin...
Mind the Gap: Missing Cyber Threat Coverage in NIDS Datasets for the Energy Sector : Abstract: Network Intrusion Detection Systems (NIDS) developed using publicly available datasets predominantly focus on enterprise environments, raising concerns about their effectiveness for conv...
MalDataGen: A Modular Framework for Synthetic Tabular Data Generation in Malware Detection : Abstract: High-quality data scarcity hinders malware detection, limiting ML performance. We introduce MalDataGen, an open-source modular framework for generating high-fidelity synthetic tabular data u...
A Streaming Sparse Cholesky Method for Derivative-Informed Gaussian Process Surrogates Within Digital Twin Applications : Abstract: Digital twins are developed to model the behavior of a specific physical asset (or twin), and they can consist of high-fidelity physics-based models or surrogates. A highly accurate surrogat...
Efficiency vs. Alignment: Investigating Safety and Fairness Risks in Parameter-Efficient Fine-Tuning of LLMs : Abstract: Organizations are increasingly adopting and adapting Large Language Models (LLMs) hosted on public repositories such as HuggingFace. Although these adaptations often improve performance on s...
MedRECT: A Medical Reasoning Benchmark for Error Correction in Clinical Texts : Abstract: Large language models (LLMs) show increasing promise in medical applications, but their ability to detect and correct errors in clinical texts -- a prerequisite for safe deployment -- remain...
Trust-Region Methods with Low-Fidelity Objective Models : Abstract: We introduce two multifidelity trust-region methods based on the Magical Trust Region (MTR) framework. MTR augments the classical trust-region step with a secondary, informative direction. I...
ToxicTextCLIP: Text-Based Poisoning and Backdoor Attacks on CLIP Pre-training : Abstract: The Contrastive Language-Image Pretraining (CLIP) model has significantly advanced vision-language modeling by aligning image-text pairs from large-scale web data through self-supervised con...
Towards Reliable Pediatric Brain Tumor Segmentation: Task-Specific nnU-Net Enhancements : Abstract: Accurate segmentation of pediatric brain tumors in multi-parametric magnetic resonance imaging (mpMRI) is critical for diagnosis, treatment planning, and monitoring, yet faces unique challen...
FedMGP: Personalized Federated Learning with Multi-Group Text-Visual Prompts : Abstract: In this paper, we introduce FedMGP, a new paradigm for personalized federated prompt learning in vision-language models. FedMGP equips each client with multiple groups of paired textual and ...
Accuracy estimation of neural networks by extreme value theory : Abstract: Neural networks are able to approximate any continuous function on a compact set. However, it is not obvious how to quantify the error of the neural network, i.e., the remaining bias between...
Multi-refined Feature Enhanced Sentiment Analysis Using Contextual Instruction : Abstract: Sentiment analysis using deep learning and pre-trained language models (PLMs) has gained significant traction due to their ability to capture rich contextual representations. However, existi...
Single-agent Reinforcement Learning Model for Regional Adaptive Traffic Signal Control : Abstract: Several studies have employed reinforcement learning (RL) to address the challenges of regional adaptive traffic signal control (ATSC) and achieved promising results. In this field, existing...
Agentic Auto-Scheduling: An Experimental Study of LLM-Guided Loop Optimization : Abstract: Automatic code optimization remains a difficult challenge, particularly for complex loop nests on modern hardware. This paper investigates a novel approach to code optimization where Large L...
Node Preservation and its Effect on Crossover in Cartesian Genetic Programming : Abstract: While crossover is a critical and often indispensable component in other forms of Genetic Programming, such as Linear- and Tree-based, it has consistently been claimed that it deteriorates s...
DTS: Enhancing Large Reasoning Models via Decoding Tree Sketching : Abstract: Large Reasoning Models (LRMs) demonstrate strong performance on complex reasoning tasks, yet they often suffer from overthinking, producing excessively long chain-of-thought (CoT) traces tha...
Filtered Neural Galerkin model reduction schemes for efficient propagation of initial condition uncertainties in digital twins : Abstract: Uncertainty quantification in digital twins is critical to enable reliable and credible predictions beyond available data. A key challenge is that ensemble-based approaches can become prohib...
Isotropic Curvature Model for Understanding Deep Learning Optimization: Is Gradient Orthogonalization Optimal? : Abstract: In this paper, we introduce a model for analyzing deep learning optimization over a single iteration by leveraging the matrix structure of the weights. We derive the model by assuming isotro...
Metadata-Aligned 3D MRI Representations for Contrast Understanding and Quality Control : Abstract: Magnetic Resonance Imaging suffers from substantial data heterogeneity and the absence of standardized contrast labels across scanners, protocols, and institutions, which severely limits lar...
SOCRATES: Simulation Optimization with Correlated Replicas and Adaptive Trajectory Evaluations : Abstract: The field of simulation optimization (SO) encompasses various methods developed to optimize complex, expensive-to-sample stochastic systems. Established methods include, but are not limited ...
A CPU-Centric Perspective on Agentic AI : Abstract: Agentic AI frameworks add a decision-making orchestrator embedded with external tools, including web search, Python interpreter, contextual database, and others, on top of monolithic LLMs, t...
Correspondence Between Ising Machines and Neural Networks : Abstract: Computation with the Ising model is central to future computing technologies like quantum annealing, adiabatic quantum computing, and thermodynamic classical computing. Traditionally, comput...
Trust Region-Based Bayesian Optimisation to Discover Diverse Solutions : Abstract: Bayesian optimisation (BO) is a surrogate-based optimisation technique that efficiently solves expensive black-box functions with small evaluation budgets. Recent studies consider trust regi...
A Framework Based on Graph Cellular Automata for Similarity Evaluation in Urban Spatial Networks : Abstract: Measuring similarity in urban spatial networks is key to understanding cities as complex systems. Yet most existing methods are not tailored for spatial networks and struggle to differentiat...
Reliable Curation of EHR Dataset via Large Language Models under Environmental Constraints : Abstract: Electronic health records (EHRs) are central to modern healthcare delivery and research; yet, many researchers lack the database expertise necessary to write complex SQL queries or generate ...
AReaL-Hex: Accommodating Asynchronous RL Training over Heterogeneous GPUs : Abstract: Maximizing training throughput and cost-efficiency of RL for LLMs is essential to democratize this advanced technique. One promising but challenging approach is to deploy such a computationa...
GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents : Abstract: With the software industry shifting toward a data-driven culture, online A/B testing is a key tool for evaluating new technologies. However, deploying such experiments requires substantial r...
GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding : Abstract: Graphical user interface (GUI) grounding is a key function of computer-use agents, which maps natural-language instructions to actionable screen regions. Existing approaches based on Multimo...
Real-Time Learning of Predictive Dynamic Obstacle Models for Robotic Motion Planning : Abstract: Autonomous systems often must predict the motions of nearby agents from partial and noisy data. This paper asks and answers the question: "can we learn, in real-time, a nonlinear predictive ...
Perturbations in the Orthogonal Complement Subspace for Efficient Out-of-Distribution Detection : Abstract: Out-of-distribution (OOD) detection is essential for deploying deep learning models in open-world environments. Existing approaches, such as energy-based scoring and gradient-projection meth...
Occlusion-Aware Diffusion Model for Pedestrian Intention Prediction : Abstract: Predicting pedestrian crossing intentions is crucial for the navigation of mobile robots and intelligent vehicles. Although recent deep learning-based models have shown significant success i...
Controlling Gender Bias in Retrieval via a Backpack Architecture : Abstract: The presence of social biases in large language models (LLMs) has become a significant concern in AI research. These biases, often embedded in training data, can perpetuate harmful stereotyp...
Assessing LLM Reasoning Steps via Principal Knowledge Grounding : Abstract: Step-by-step reasoning has become a standard approach for large language models (LLMs) to tackle complex tasks. While this paradigm has proven effective, it raises a fundamental question: Ho...
Deep Generative Models for Enhanced Vitreous OCT Imaging : Abstract: Purpose: To evaluate deep learning (DL) models for enhancing vitreous optical coherence tomography (OCT) image quality and reducing acquisition time. Methods: Conditional Denoising Diffusion...
HEATNETs: Explainable Random Feature Neural Networks for High-Dimensional Parabolic PDEs : Abstract: We deal with the solution of the forward problem for high-dimensional parabolic PDEs with random feature (projection) neural networks (RFNNs). We first prove that there exists a single-hidde...
Android Malware Detection: A Machine Leaning Approach : Abstract: This study examines machine learning techniques like Decision Trees, Support Vector Machines, Logistic Regression, Neural Networks, and ensemble methods to detect Android malware. The study ...
Towards Channel Charting Enhancement with Non-Reconfigurable Intelligent Surfaces : Abstract: We investigate how fully-passive electromagnetic skins (EMSs) can be engineered to enhance channel charting (CC) in dense urban environments. We employ two complementary state-of-the-art CC ...
Aligning LLM agents with human learning and adjustment behavior: a dual agent approach : Abstract: Effective modeling of how human travelers learn and adjust their travel behavior from interacting with transportation systems is critical for system assessment and planning. However, this ta...
Transformer-Based Decoding in Concatenated Coding Schemes Under Synchronization Errors : Abstract: We consider the reconstruction of a codeword from multiple noisy copies that are independently corrupted by insertions, deletions, and substitutions. This problem arises, for example, in DNA...
Integrating Visual and X-Ray Machine Learning Features in the Study of Paintings by Goya : Abstract: Art authentication of Francisco Goya's works presents complex computational challenges due to his heterogeneous stylistic evolution and extensive historical patterns of forgery. We introduce...
OceanAI: A Conversational Platform for Accurate, Transparent, Near-Real-Time Oceanographic Insights : Abstract: Artificial intelligence is transforming the sciences, yet general conversational AI systems often generate unverified "hallucinations" undermining scientific rigor. We present OceanAI, a con...
Seed-Induced Uniqueness in Transformer Models: Subspace Alignment Governs Subliminal Transfer : Abstract: We analyze subliminal transfer in Transformer models, where a teacher embeds hidden traits that can be linearly decoded by a student without degrading main-task performance. Prior work often...
Binary perceptron computational gap -- a parametric fl RDT view : Abstract: Recent studies suggest that asymmetric binary perceptron (ABP) likely exhibits the so-called statistical-computational gap characterized with the appearance of two phase transitioning constr...
Generalized Guarantees for Variational Inference in the Presence of Even and Elliptical Symmetry : Abstract: We extend several recent results providing symmetry-based guarantees for variational inference (VI) with location-scale families. VI approximates a target density $p$ by the best match $q^*$...
GeoToken: Hierarchical Geolocalization of Images via Next Token Prediction : Abstract: Image geolocalization, the task of determining an image's geographic origin, poses significant challenges, largely due to visual similarities across disparate locations and the large search ...
SliceVision-F2I: A Synthetic Feature-to-Image Dataset for Visual Pattern Representation on Network Slices : Abstract: The emergence of 5G and 6G networks has established network slicing as a significant part of future service-oriented architectures, demanding refined identification methods supported by robu...
Hyper Hawkes Processes: Interpretable Models of Marked Temporal Point Processes : Abstract: Foundational marked temporal point process (MTPP) models, such as the Hawkes process, often use inexpressive model families in order to offer interpretable parameterizations of event data. O...
SLAP: Shortcut Learning for Abstract Planning : Abstract: Long-horizon decision-making with sparse rewards and continuous states and actions remains a fundamental challenge in AI and robotics. Task and motion planning (TAMP) is a model-based framew...
Generative Machine Learning Models for the Deconvolution of Charge Carrier Dynamics in Organic Photovoltaic Cells : Abstract: Charge carrier dynamics critically affect the efficiency and stability of organic photovoltaic devices, but they are challenging to model with traditional analytical methods. We introduce \b...
Learning with Category-Equivariant Architectures for Human Activity Recognition : Abstract: We propose CatEquiv, a category-equivariant neural network for Human Activity Recognition (HAR) from inertial sensors that systematically encodes temporal, amplitude, and structural symmetri...
Few-Shot Multimodal Medical Imaging: A Theoretical Framework : Abstract: Medical imaging relies heavily on large, labeled datasets. But, unfortunately, they are not always easily accessible in clinical settings. Additionally, many practitioners often face various...
Stability of the Kim--Milman flow map : Abstract: In this short note, we characterize stability of the Kim--Milman flow map -- also known as the probability flow ODE -- with respect to variations in the target measure. Rather than the Wasse...
Learning When to Quit in Sales Conversations : Abstract: Salespeople frequently face the dynamic screening decision of whether to persist in a conversation or abandon it to pursue the next lead. Yet, little is known about how these decisions are m...
Self-Harmony: Learning to Harmonize Self-Supervision and Self-Play in Test-Time Reinforcement Learning : Abstract: Test-time reinforcement learning (TTRL) offers a label-free paradigm for adapting models using only synthetic signals at inference, but its success hinges on constructing reliable learning s...
An Interdisciplinary and Cross-Task Review on Missing Data Imputation : Abstract: Missing data is a fundamental challenge in data science, significantly hindering analysis and decision-making across a wide range of disciplines, including healthcare, bioinformatics, social...
Quantum Deep Learning Still Needs a Quantum Leap : Abstract: Quantum computing technology is advancing rapidly. Yet, even accounting for these trends, a quantum leap would be needed for quantum computers to meaningfully impact deep learning over the...
MotionStream: Real-Time Video Generation with Interactive Motion Controls : Abstract: Current motion-conditioned video generation methods suffer from prohibitive latency (minutes per video) and non-causal processing that prevents real-time interaction. We present MotionStream...
Optimal Attention Temperature Enhances In-Context Learning under Distribution Shift : Abstract: Pretrained Transformers excel at in-context learning (ICL), inferring new tasks from only a handful of examples. Yet, their ICL performance can degrade sharply under distribution shift betwe...
Black-Box Differentially Private Nonparametric Confidence Intervals Under Minimal Assumptions : Abstract: We introduce a simple, general framework that takes any differentially private estimator of any arbitrary quantity as a black box, and from it constructs a differentially private nonparametr...
RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models : Abstract: Vision-Language-Action (VLA) models have recently emerged as powerful general-purpose policies for robotic manipulation, benefiting from large-scale multi-modal pre-training. However, they o...
A semantic-based deep learning approach for mathematical expression retrieval : Abstract: Mathematical expressions (MEs) have complex two-dimensional structures in which symbols can be present at any nested depth like superscripts, subscripts, above, below etc. As MEs are represe...
Extremal Contours: Gradient-driven contours for compact visual attribution : Abstract: Faithful yet compact explanations for vision models remain a challenge, as commonly used dense perturbation masks are often fragmented and overfitted, needing careful post-processing. Here, ...
Split-Flows: Measure Transport and Information Loss Across Molecular Resolutions : Abstract: By reducing resolution, coarse-grained models greatly accelerate molecular simulations, unlocking access to long-timescale phenomena, though at the expense of microscopic information. Recove...
Quantum Blackwell's Ordering and Differential Privacy : Abstract: We develop a framework for quantum differential privacy (QDP) based on quantum hypothesis testing and Blackwell's ordering. This approach characterizes $(\epsilon,\delta)$-QDP via hypothesis tes...
Deep Learning Prediction of Beam Coherence Time for Near-Field TeraHertz Networks : Abstract: Large multiple antenna arrays coupled with accurate beamforming are essential in terahertz (THz) communications to ensure link reliability. However, as the number of antennas increases, ...
Learning what to say and how precisely: Efficient Communication via Differentiable Discrete Communication Learning : Abstract: Effective communication in multi-agent reinforcement learning (MARL) is critical for success but constrained by bandwidth, yet past approaches have been limited to complex gating mechanisms ...
Fast, memory-efficient genomic interval tokenizers for modern machine learning : Abstract: Introduction: Epigenomic datasets from high-throughput sequencing experiments are commonly summarized as genomic intervals. As the volume of this data grows, so does interest in analyzing it...
Federated Cyber Defense: Privacy-Preserving Ransomware Detection Across Distributed Systems : Abstract: Detecting malware, especially ransomware, is essential to securing today's interconnected ecosystems, including cloud storage, enterprise file-sharing, and database services. Training high-p...
L2T-Tune: LLM-Guided Hybrid Database Tuning with LHS and TD3 : Abstract: Configuration tuning is critical for database performance. Although recent advancements in database tuning have shown promising results in throughput and latency improvement, challenges rema...
Partial Trace-Class Bayesian Neural Networks : Abstract: Bayesian neural networks (BNNs) allow rigorous uncertainty quantification in deep learning, but often come at a prohibitive computational cost. We propose three different innovative architec...
EngChain: A Symbolic Benchmark for Verifiable Multi-Step Reasoning in Engineering : Abstract: Large Language Models (LLMs) are increasingly being applied to specialized, high-stakes domains like engineering, which demands rigorous evaluation of their complex reasoning capabilities. W...
Panther: A Cost-Effective Privacy-Preserving Framework for GNN Training and Inference Services in Cloud Environments : Abstract: Graph Neural Networks (GNNs) have marked significant impact in traffic state prediction, social recommendation, knowledge-aware question answering and so on. As more and more users move towa...
Making Interpretable Discoveries from Unstructured Data: A High-Dimensional Multiple Hypothesis Testing Approach : Abstract: Social scientists are increasingly turning to unstructured datasets to unlock new empirical insights, e.g., estimating causal effects on text outcomes, measuring beliefs from open-ended surv...
Open Character Training: Shaping the Persona of AI Assistants through Constitutional AI : Abstract: The character of the "AI assistant" persona generated by modern chatbot large language models influences both surface-level behavior and apparent values, beliefs, and ethics. These all affec...
Solution Space Topology Guides CMTS Search : Abstract: A fundamental question in search-guided AI: what topology should guide Monte Carlo Tree Search (MCTS) in puzzle solving? Prior work applied topological features to guide MCTS in ARC-style ta...
Multi-Step Knowledge Interaction Analysis via Rank-2 Subspace Disentanglement : Abstract: Natural Language Explanations (NLEs) describe how Large Language Models (LLMs) make decisions, drawing on both external Context Knowledge (CK) and Parametric Knowledge (PK) stored in model w...
Bayesian Coreset Optimization for Personalized Federated Learning : Abstract: In a distributed machine learning setting like Federated Learning where there are multiple clients involved which update their individual weights to a single central server, often training o...
Dynamic Reconstruction of Ultrasound-Derived Flow Fields With Physics-Informed Neural Fields : Abstract: Blood flow is sensitive to disease and provides insight into cardiac function, making flow field analysis valuable for diagnosis. However, while safer than radiation-based imaging and more s...
No-rank Tensor Decomposition Using Metric Learning : Abstract: Tensor decomposition faces fundamental challenges in analyzing high-dimensional data, where traditional methods based on reconstruction and fixed-rank constraints often fail to capture seman...
Machine and Deep Learning for Indoor UWB Jammer Localization : Abstract: Ultra-wideband (UWB) localization delivers centimeter-scale accuracy but is vulnerable to jamming attacks, creating security risks for asset tracking and intrusion detection in smart buildin...
Towards Multi-Fidelity Scaling Laws of Neural Surrogates in CFD : Abstract: Scaling laws describe how model performance grows with data, parameters and compute. While large datasets can usually be collected at relatively low cost in domains such as language or visio...
Dynamic Routing Between Experts: A Data-Efficient Approach to Continual Learning in Vision-Language Models : Abstract: Vision-Language Models (VLMs) suffer from catastrophic forgetting when sequentially fine-tuned on new tasks, degrading performance on previously learned foundational and task-specific capabi...
Priors in Time: Missing Inductive Biases for Language Model Interpretability : Abstract: Recovering meaningful concepts from language model activations is a central aim of interpretability. While existing feature extraction methods aim to identify concepts that are independent d...
Interpretable Machine Learning for Reservoir Water Temperatures in the U.S. Red River Basin of the South : Abstract: Accurate prediction of Reservoir Water Temperature (RWT) is vital for sustainable water management, ecosystem health, and climate resilience. Yet, prediction alone offers limited insight int...
Bridging Lifelong and Multi-Task Representation Learning via Algorithm and Complexity Measure : Abstract: In lifelong learning, a learner faces a sequence of tasks with shared structure and aims to identify and leverage it to accelerate learning. We study the setting where such structure is capt...
Coordinate ascent neural Kalman-MLE for state estimation : Abstract: This paper presents a coordinate ascent algorithm to learn dynamic and measurement models in dynamic state estimation using maximum likelihood estimation in a supervised manner. In particula...
Group-Equivariant Diffusion Models for Lattice Field Theory : Abstract: Near the critical point, Markov Chain Monte Carlo (MCMC) simulations of lattice quantum field theories (LQFT) become increasingly inefficient due to critical slowing down. In this work, we i...
Matrix Phylogeny: Compact Spectral Fingerprints for Trap-Robust Preconditioner Selection : Abstract: Matrix Phylogeny introduces compact spectral fingerprints (CSF/ASF) that characterize matrices at the family level. These fingerprints are low-dimensional, eigendecomposition-free descriptor...
Using machine learning methods to predict cognitive age from psychophysiological tests : Abstract: This study introduces a novel method for predicting cognitive age using psychophysiological tests. To determine cognitive age, subjects were asked to complete a series of psychological tests...
Chitchat with AI: Understand the supply chain carbon disclosure of companies worldwide through Large Language Model : Abstract: In the context of global sustainability mandates, corporate carbon disclosure has emerged as a critical mechanism for aligning business strategy with environmental responsibility. The Carbon...
On the Structure of Floating-Point Noise in Batch-Invariant GPU Matrix Multiplication : Abstract: Floating-point non-associativity makes fundamental deep learning operations, such as matrix multiplication (matmul) on GPUs, inherently non-deterministic. Despite this, the statistical struc...
Position Paper: If Innovation in AI Systematically Violates Fundamental Rights, Is It Innovation at All? : Abstract: Artificial intelligence (AI) now permeates critical infrastructures and decision-making systems where failures produce social, economic, and democratic harm. This position paper challenges t...
On the Fundamental Limitations of Decentralized Learnable Reward Shaping in Cooperative Multi-Agent Reinforcement Learning : Abstract: Recent advances in learnable reward shaping have shown promise in single-agent reinforcement learning by automatically discovering effective feedback signals. However, the effectiveness of d...
Graph-Attentive MAPPO for Dynamic Retail Pricing : Abstract: Dynamic pricing in retail requires policies that adapt to shifting demand while coordinating decisions across related products. We present a systematic empirical study of multi-agent reinfor...
World Simulation with Video Foundation Models for Physical AI : Abstract: We introduce [Cosmos-Predict2.5], the latest generation of the Cosmos World Foundation Models for Physical AI. Built on a flow-based architecture, [Cosmos-Predict2.5] unifies Text2World, Ima...
LookSync: Large-Scale Visual Product Search System for AI-Generated Fashion Looks : Abstract: Generative AI is reshaping fashion by enabling virtual looks and avatars, making it essential to find real products that best match AI-generated styles. We propose an end-to-end product searc...
PDA-LSTM: Knowledge-driven page data arrangement based on LSTM for LCM supression in QLC 3D NAND flash memories : Abstract: Quarter level cell (QLC) 3D NAND flash memory is emerging as the predominant storage solution in the era of artificial intelligence. QLC 3D NAND flash stores 4 bits per cell to expand the sto...
Forecasting Occupational Survivability of Rickshaw Pullers in a Changing Climate with Wearable Data : Abstract: Cycle rickshaw pullers are highly vulnerable to extreme heat, yet little is known about how their physiological biomarkers respond under such conditions. This study collected real-time weath...
Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail : Abstract: End-to-end architectures trained via imitation learning have advanced autonomous driving by scaling model size and data, yet performance remains brittle in safety-critical long-tail scenario...
QuantumBench: A Benchmark for Quantum Problem Solving : Abstract: Large language models are now integrated into many scientific workflows, accelerating data analysis, hypothesis generation, and design space exploration. In parallel with this growth, there ...
A filtering scheme for confocal laser endomicroscopy (CLE)-video sequences for self-supervised learning : Abstract: Confocal laser endomicroscopy (CLE) is a non-invasive, real-time imaging modality that can be used for in-situ, in-vivo imaging and the microstructural analysis of mucous structures. The dia...
Modeling Microenvironment Trajectories on Spatial Transcriptomics with NicheFlow : Abstract: Understanding the evolution of cellular microenvironments in spatiotemporal data is essential for deciphering tissue development and disease progression. While experimental techniques like s...
Balanced Multimodal Learning via Mutual Information : Abstract: Multimodal learning has increasingly become a focal point in research, primarily due to its ability to integrate complementary information from diverse modalities. Nevertheless, modality imb...
Hydra: Dual Exponentiated Memory for Multivariate Time Series Analysis : Abstract: In recent years, effectively modeling multivariate time series has gained significant popularity, mainly due to its wide range of applications, ranging from healthcare to financial markets a...
None To Optima in Few Shots: Bayesian Optimization with MDP Priors : Abstract: Bayesian Optimization (BO) is an efficient tool for optimizing black-box functions, but its theoretical guarantees typically hold in the asymptotic regime. In many critical real-world applic...
Equality Graph Assisted Symbolic Regression : Abstract: In Symbolic Regression (SR), Genetic Programming (GP) is a popular search algorithm that delivers state-of-the-art results in terms of accuracy. Its success relies on the concept of neutralit...
What's the next frontier for Data-centric AI? Data Savvy Agents : Abstract: The recent surge in AI agents that autonomously communicate, collaborate with humans and use diverse tools has unlocked promising opportunities in various real-world settings. However, a vit...
SARIMAX-Based Power Outage Prediction During Extreme Weather Events : Abstract: This study develops a SARIMAX-based prediction system for short-term power outage forecasting during extreme weather events. Using hourly data from Michigan counties with outage counts and c...
MedEqualizer: A Framework Investigating Bias in Synthetic Medical Data and Mitigation via Augmentation : Abstract: Synthetic healthcare data generation presents a viable approach to enhance data accessibility and support research by overcoming limitations associated with real-world medical datasets. Howe...
Window-Based Feature Engineering for Cognitive Workload Detection : Abstract: Cognitive workload is a topic of increasing interest across various fields such as health, psychology, and defense applications. In this research, we focus on classifying cognitive workload ...
Energy-Efficient Deep Learning Without Backpropagation: A Rigorous Evaluation of Forward-Only Algorithms : Abstract: The long-held assumption that backpropagation (BP) is essential for state-of-the-art performance is challenged by this work. We present rigorous, hardware-validated evidence that the Mono-Fo...
Happiness as a Measure of Fairness : Abstract: In this paper, we propose a novel fairness framework grounded in the concept of happiness, a measure of the utility each group gains from decision outcomes. By capturing fairness through this ...
AI Progress Should Be Measured by Capability-Per-Resource, Not Scale Alone: A Framework for Gradient-Guided Resource Allocation in LLMs : Abstract: This position paper challenges the "scaling fundamentalism" dominating AI research, where unbounded growth in model size and computation has led to unsustainable environmental impacts and wi...
Continual Learning, Not Training: Online Adaptation For Agents : Abstract: Continual Learning (CL) methods have traditionally focused on mitigating catastrophic forgetting through gradient-based retraining, an approach ill-suited for deployed agents that must adapt...
One model to solve them all: 2BSDE families via neural operators : Abstract: We introduce a mild generative variant of the classical neural operator model, which leverages Kolmogorov-Arnold networks to solve infinite families of second-order backward stochastic diff...
Stochastic Regret Guarantees for Online Zeroth- and First-Order Bilevel Optimization : Abstract: Online bilevel optimization (OBO) is a powerful framework for machine learning problems where both outer and inner objectives evolve over time, requiring dynamic updates. Current OBO approac...
Regularization Implies balancedness in the deep linear network : Abstract: We use geometric invariant theory (GIT) to study the deep linear network (DLN). The Kempf-Ness theorem is used to establish that the $L^2$ regularizer is minimized on the balanced manifold. ...
Adapt under Attack and Domain Shift: Unified Adversarial Meta-Learning and Domain Adaptation for Robust Automatic Modulation Classification : Abstract: Deep learning has emerged as a leading approach for Automatic Modulation Classification (AMC), demonstrating superior performance over traditional methods. However, vulnerability to adversar...
A Comparative Study of Model Adaptation Strategies for Multi-Treatment Uplift Modeling : Abstract: Uplift modeling has emerged as a crucial technique for individualized treatment effect estimation, particularly in fields such as marketing and healthcare. Modeling uplift effects in multi-t...
Analyzing the Power of Chain of Thought through Memorization Capabilities : Abstract: It has been shown that the chain of thought (CoT) can enhance the power of large language models (LLMs) to solve certain mathematical reasoning problems. However, the capacity of CoT is stil...
Transmitter Identification and Protocol Categorization in Shared Spectrum via Multi-Task RF Classification at the Network Edge : Abstract: As spectrum sharing becomes increasingly vital to meet rising wireless demands in the future, spectrum monitoring and transmitter identification are indispensable for enforcing spectrum usag...
FEval-TTC: Fair Evaluation Protocol for Test-Time Compute : Abstract: The performance of Large Language Models (LLMs) and the associated dollar costs of API calls can fluctuate over time, potentially invalidating conclusions drawn in prior research. To address...
Optimizing Electric Vehicle Charging Station Placement Using Reinforcement Learning and Agent-Based Simulations : Abstract: The rapid growth of electric vehicles (EVs) necessitates the strategic placement of charging stations to optimize resource utilization and minimize user inconvenience. Reinforcement learning...
WindMiL: Equivariant Graph Learning for Wind Loading Prediction : Abstract: Accurate prediction of wind loading on buildings is crucial for structural safety and sustainable design, yet conventional approaches such as wind tunnel testing and large-eddy simulation (L...
A Saddle Point Remedy: Power of Variable Elimination in Non-convex Optimization : Abstract: The proliferation of saddle points, rather than poor local minima, is increasingly understood to be a primary obstacle in large-scale non-convex optimization for machine learning. Variable e...
KAT-GNN: A Knowledge-Augmented Temporal Graph Neural Network for Risk Prediction in Electronic Health Records : Abstract: Clinical risk prediction using electronic health records (EHRs) is vital to facilitate timely interventions and clinical decision support. However, modeling heterogeneous and irregular tempo...
A Spatio-Temporal Online Robust Tensor Recovery Approach for Streaming Traffic Data Imputation : Abstract: Data quality is critical to Intelligent Transportation Systems (ITS), as complete and accurate traffic data underpin reliable decision-making in traffic control and management. Recent advanc...
Adversarial Spatio-Temporal Attention Networks for Epileptic Seizure Forecasting : Abstract: Forecasting epileptic seizures from multivariate EEG signals represents a critical challenge in healthcare time series prediction, requiring high sensitivity, low false alarm rates, and subj...
Identification of Capture Phases in Nanopore Protein Sequencing Data Using a Deep Learning Model : Abstract: Nanopore protein sequencing produces long, noisy ionic current traces in which key molecular phases, such as protein capture and translocation, are embedded. Capture phases mark the successf...
Lyapunov Stability Learning with Nonlinear Control via Inductive Biases : Abstract: Finding a control Lyapunov function (CLF) in a dynamical system with a controller is an effective way to guarantee stability, which is a crucial issue in safety-concerned applications. Recen...
Koopman-based Prediction of Connectivity for Flying Ad Hoc Networks : Abstract: The application of machine learning (ML) to communication systems is expected to play a pivotal role in future artificial intelligence (AI)-based next-generation wireless networks. While mos...
LSHFed: Robust and Communication-Efficient Federated Learning with Locally-Sensitive Hashing Gradient Mapping : Abstract: Federated learning (FL) enables collaborative model training across distributed nodes without exposing raw data, but its decentralized nature makes it vulnerable in trust-deficient environme...
Diffusion-Based Solver for CNF Placement on the Cloud-Continuum : Abstract: The placement of Cloud-Native Network Functions (CNFs) across the Cloud-Continuum represents a core challenge in the orchestration of current 5G and future 6G networks. The process involves ...
MiniFool - Physics-Constraint-Aware Minimizer-Based Adversarial Attacks in Deep Neural Networks : Abstract: In this paper, we present a new algorithm, MiniFool, that implements physics-inspired adversarial attacks for testing neural network-based classification tasks in particle and astroparticle ...
Verifiable Split Learning via zk-SNARKs : Abstract: Split learning is an approach to collaborative learning in which a deep neural network is divided into two parts: client-side and server-side at a cut layer. The client side executes its mod...
Learning Intractable Multimodal Policies with Reparameterization and Diversity Regularization : Abstract: Traditional continuous deep reinforcement learning (RL) algorithms employ deterministic or unimodal Gaussian actors, which cannot express complex multimodal decision distributions. This limi...
Protecting the Neural Networks against FGSM Attack Using Machine Unlearning : Abstract: Machine learning is a powerful tool for building predictive models. However, it is vulnerable to adversarial attacks. Fast Gradient Sign Method (FGSM) attacks are a common type of adversaria...
Memory-Efficient Training with In-Place FFT Implementation : Abstract: Fast Fourier Transforms (FFT) are widely used to reduce memory and computational costs in deep learning. However, existing implementations, including standard FFT and real FFT (rFFT), cannot...
Leveraging Compact Satellite Embeddings and Graph Neural Networks for Large-Scale Poverty Mapping : Abstract: Accurate, fine-grained poverty maps remain scarce across much of the Global South. While Demographic and Health Surveys (DHS) provide high-quality socioeconomic data, their spatial coverage ...
CG-FKAN: Compressed-Grid Federated Kolmogorov-Arnold Networks for Communication Constrained Environment : Abstract: Federated learning (FL), widely used in privacy-critical applications, suffers from limited interpretability, whereas Kolmogorov-Arnold Networks (KAN) address this limitation via learnable s...
The Curvature Rate $\lambda$: A Scalar Measure of Input-Space Sharpness in Neural Networks : Abstract: Curvature influences generalization, robustness, and how reliably neural networks respond to small input perturbations. Existing sharpness metrics are typically defined in parameter space (e...
Efficient Curvature-aware Graph Network : Abstract: Graph curvature provides geometric priors for Graph Neural Networks (GNNs), enhancing their ability to model complex graph structures, particularly in terms of structural awareness, robustne...
DAMBench: A Multi-Modal Benchmark for Deep Learning-based Atmospheric Data Assimilation : Abstract: Data Assimilation is a cornerstone of atmospheric system modeling, tasked with reconstructing system states by integrating sparse, noisy observations with prior estimation. While traditional...
Real-time Continual Learning on Intel Loihi 2 : Abstract: AI systems on edge devices face a critical challenge in open-world environments: adapting when data distributions shift and novel classes emerge. While offline training dominates current par...
Gated Fusion Enhanced Multi-Scale Hierarchical Graph Convolutional Network for Stock Movement Prediction : Abstract: Accurately predicting stock market movements remains a formidable challenge due to the inherent volatility and complex interdependencies among stocks. Although multi-scale Graph Neural Netwo...
HIT-ROCKET: Hadamard-vector Inner-product Transformer for ROCKET : Abstract: Time series classification holds broad application value in communications, information countermeasures, finance, and medicine. However, state-of-the-art (SOTA) methods, including HIVE-COTE, ...
Explore More, Learn Better: Parallel MLLM Embeddings under Mutual Information Minimization : Abstract: Embedding models are a cornerstone of modern AI. Driven by Multimodal Large Language Models (MLLMs), they have made great progress in architecture and data curation, while the holistic parad...
Defining Energy Indicators for Impact Identification on Aerospace Composites: A Physics-Informed Machine Learning Perspective : Abstract: Energy estimation is critical to impact identification on aerospace composites, where low-velocity impacts can induce internal damage that is undetectable at the surface. Current methodologi...
Estimation of Toeplitz Covariance Matrices using Overparameterized Gradient Descent : Abstract: We consider covariance estimation under Toeplitz structure. Numerous sophisticated optimization methods have been developed to maximize the Gaussian log-likelihood under Toeplitz constraints...
Scaling Graph Chain-of-Thought Reasoning: A Multi-Agent Framework with Efficient LLM Serving : Abstract: Graph Chain-of-Thought (Graph-CoT) enables large language models (LLMs) to perform step-by-step reasoning over graph-structured knowledge, but existing pipelines suffer from low accuracy, ex...
Cross-Treatment Effect Estimation for Multi-Category, Multi-Valued Causal Inference via Dynamic Neural Masking : Abstract: Counterfactual causal inference faces significant challenges when extended to multi-category, multi-valued treatments, where complex cross-effects between heterogeneous interventions are dif...
Bayesian Natural Gradient Fine-Tuning of CLIP Models via Kalman Filtering : Abstract: Vision-language pre-trained models, such as CLIP, have established new benchmarks in multimodal data mining. In such models, few-shot fine-tuning is a major challenge to achieve optimal perf...
Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding : Abstract: The growing demand for on-device large language model (LLM) inference highlights the need for efficient mobile edge computing (MEC) solutions, especially in resource-constrained settings. Sp...
Edge AI in Highly Volatile Environments: Is Fairness Worth the Accuracy Trade-off? : Abstract: Federated learning (FL) has emerged as a transformative paradigm for edge intelligence, enabling collaborative model training while preserving data privacy across distributed personal device...
Game-theoretic distributed learning of generative models for heterogeneous data collections : Abstract: One of the main challenges in distributed learning arises from the difficulty of handling heterogeneous local models and data. In light of the recent success of generative models, we propose...
HyperNQ: A Hypergraph Neural Network Decoder for Quantum LDPC Codes : Abstract: Quantum computing requires effective error correction strategies to mitigate noise and decoherence. Quantum Low-Density Parity-Check (QLDPC) codes have emerged as a promising solution for sc...
Towards Efficient Federated Learning of Networked Mixture-of-Experts for Mobile Edge Computing : Abstract: Recent advancements in large artificial intelligence models (LAMs) are driving significant innovations in mobile edge computing within next-generation wireless networks. However, the substan...
An Open-Access Benchmark of Statistical and Machine-Learning Anomaly Detection Methods for Battery Applications : Abstract: Battery safety is critical in applications ranging from consumer electronics to electric vehicles and aircraft, where undetected anomalies could trigger safety hazards or costly downtime. In...
RLAC: Reinforcement Learning with Adversarial Critic for Free-Form Generation Tasks : Abstract: Open-ended generation tasks require outputs to satisfy diverse and often implicit task-specific evaluation rubrics. The sheer number of relevant rubrics leads to prohibitively high verificat...
Random Initialization of Gated Sparse Adapters : Abstract: When fine-tuning language models on new tasks, catastrophic forgetting -- performance degradation on previously-learned tasks -- is a ubiquitous problem. While Parameter-Efficient Fine-Tunin...
Fractional Diffusion Bridge Models : Abstract: We present Fractional Diffusion Bridge Models (FDBM), a novel generative diffusion bridge framework driven by an approximation of the rich and non-Markovian fractional Brownian motion (fBM)....
DynBERG: Dynamic BERT-based Graph neural network for financial fraud detection : Abstract: Financial fraud detection is critical for maintaining the integrity of financial systems, particularly in decentralised environments such as cryptocurrency networks. Although Graph Convoluti...
Adaptive Spatio-Temporal Graphs with Self-Supervised Pretraining for Multi-Horizon Weather Forecasting : Abstract: Accurate and robust weather forecasting remains a fundamental challenge due to the inherent spatio-temporal complexity of atmospheric systems. In this paper, we propose a novel self-supervis...
FLoRA: Fused forward-backward adapters for parameter efficient fine-tuning and reducing inference-time latencies of LLMs : Abstract: As the large language models (LLMs) grow in size each day, efficient training and fine-tuning has never been as important as nowadays. This resulted in the great interest in parameter effici...
Calibrating and Rotating: A Unified Framework for Weight Conditioning in PEFT : Abstract: Parameter-Efficient Fine-Tuning (PEFT) methods are crucial for adapting large pre-trained models. Among these, LoRA is considered a foundational approach. Building on this, the influential D...
Feature-Guided Analysis of Neural Networks: A Replication Study : Abstract: Understanding why neural networks make certain decisions is pivotal for their use in safety-critical applications. Feature-Guided Analysis (FGA) extracts slices of neural networks relevant t...
Quadratic Direct Forecast for Training Multi-Step Time-Series Forecast Models : Abstract: The design of training objective is central to training time-series forecasting models. Existing training objectives such as mean squared error mostly treat each future step as an independen...
SpatialTraceGen: High-Fidelity Traces for Efficient VLM Spatial Reasoning Distillation : Abstract: While Vision-Language Models (VLMs) excel in many areas, they struggle with complex spatial reasoning, which requires problem decomposition and strategic tool use. Fine-tuning smaller, more ...
Exploring Federated Learning for Thermal Urban Feature Segmentation -- A Comparison of Centralized and Decentralized Approaches : Abstract: Federated Learning (FL) is an approach for training a shared Machine Learning (ML) model with distributed training data and multiple participants. FL allows bypassing limitations of the trad...
MISA: Memory-Efficient LLMs Optimization with Module-wise Importance Sampling : Abstract: The substantial memory demands of pre-training and fine-tuning large language models (LLMs) require memory-efficient optimization algorithms. One promising approach is layer-wise optimizatio...
Automatically Finding Rule-Based Neurons in OthelloGPT : Abstract: OthelloGPT, a transformer trained to predict valid moves in Othello, provides an ideal testbed for interpretability research. The model is complex enough to exhibit rich computational patter...
EVINGCA: Adaptive Graph Clustering with Evolving Neighborhood Statistics : Abstract: Clustering algorithms often rely on restrictive assumptions: K-Means and Gaussian Mixtures presuppose convex, Gaussian-like clusters, while DBSCAN and HDBSCAN capture non-convexity but can b...
Aligning Brain Signals with Multimodal Speech and Vision Embeddings : Abstract: When we hear the word "house", we don't just process sound, we imagine walls, doors, memories. The brain builds meaning through layers, moving from raw acoustics to rich, multimodal associat...
Token-Regulated Group Relative Policy Optimization for Stable Reinforcement Learning in Large Language Models : Abstract: Reinforcement learning with verifiable rewards (RLVR) has emerged as a powerful approach for strengthening the reasoning capabilities of large language models (LLMs). Among existing algorith...
Latent Domain Prompt Learning for Vision-Language Models : Abstract: The objective of domain generalization (DG) is to enable models to be robust against domain shift. DG is crucial for deploying vision-language models (VLMs) in real-world applications, yet m...
Benchmarking Generative AI Against Bayesian Optimization for Constrained Multi-Objective Inverse Design : Abstract: This paper investigates the performance of Large Language Models (LLMs) as generative optimizers for solving constrained multi-objective regression tasks, specifically within the challenging...
Wavelet-Based Feature Extraction and Unsupervised Clustering for Parity Detection: A Feature Engineering Perspective : Abstract: This paper explores a deliberately over-engineered approach to the classical problem of parity detection -- determining whether a number is odd or even -- by combining wavelet-based feature ...
Bridging Vision, Language, and Mathematics: Pictographic Character Reconstruction with Bézier Curves : Abstract: While Vision-language Models (VLMs) have demonstrated strong semantic capabilities, their ability to interpret the underlying geometric structure of visual information is less explored. Pict...
flowengineR: A Modular and Extensible Framework for Fair and Reproducible Workflow Design in R : Abstract: flowengineR is an R package designed to provide a modular and extensible framework for building reproducible algorithmic workflows for general-purpose machine learning pipelines. It is motiv...
Fixed-point graph convolutional networks against adversarial attacks : Abstract: Adversarial attacks present a significant risk to the integrity and performance of graph neural networks, particularly in tasks where graph structure and node features are vulnerable to mani...
Application of predictive machine learning in pen & paper RPG game design : Abstract: In recent years, the pen and paper RPG market has experienced significant growth. As a result, companies are increasingly exploring the integration of AI technologies to enhance player exper...
MaGNet: A Mamba Dual-Hypergraph Network for Stock Prediction via Temporal-Causal and Global Relational Learning : Abstract: Stock trend prediction is crucial for profitable trading strategies and portfolio management yet remains challenging due to market volatility, complex temporal dynamics and multifaceted inte...
Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph : Abstract: Test-Time Scaling (TTS) improves large language models (LLMs) by allocating additional computation during inference, typically through parallel, sequential, or hybrid scaling. However, prior...
GraphKeeper: Graph Domain-Incremental Learning via Knowledge Disentanglement and Preservation : Abstract: Graph incremental learning (GIL), which continuously updates graph models by sequential knowledge acquisition, has garnered significant interest recently. However, existing GIL approaches fo...
A generative adversarial network optimization method for damage detection and digital twinning by deep AI fault learning: Z24 Bridge structural health monitoring benchmark validation : Abstract: The optimization-based damage detection and damage state digital twinning capabilities are examined here of a novel conditional-labeled generative adversarial network methodology. The framew...
Deep recurrent-convolutional neural network learning and physics Kalman filtering comparison in dynamic load identification : Abstract: The dynamic structural load identification capabilities of the gated recurrent unit, long short-term memory, and convolutional neural networks are examined herein. The examination is on real...
Loquetier: A Virtualized Multi-LoRA Framework for Unified LLM Fine-tuning and Serving : Abstract: Low-Rank Adaptation (LoRA) has become a widely adopted parameter-efficient fine-tuning (PEFT) technique for adapting large language models (LLMs) to downstream tasks. While prior work has ex...
Automated Discovery of Conservation Laws via Hybrid Neural ODE-Transformers : Abstract: The discovery of conservation laws is a cornerstone of scientific progress. However, identifying these invariants from observational data remains a significant challenge. We propose a hybrid...
Pelican-VL 1.0: A Foundation Brain Model for Embodied Intelligence : Abstract: This report presents Pelican-VL 1.0, a new family of open-source embodied brain models with parameter scales ranging from 7 billion to 72 billion. Our explicit mission is clearly stated as: ...
MeixnerNet: Adaptive and Robust Spectral Graph Neural Networks with Discrete Orthogonal Polynomials : Abstract: Spectral Graph Neural Networks (GNNs) have achieved state-of-the-art results by defining graph convolutions in the spectral domain. A common approach, popularized by ChebyNet, is to use poly...
LC-Opt: Benchmarking Reinforcement Learning and Agentic AI for End-to-End Liquid Cooling Optimization in Data Centers : Abstract: Liquid cooling is critical for thermal management in high-density data centers with the rising AI workloads. However, machine learning-based controllers are essential to unlock greater energ...
DCcluster-Opt: Benchmarking Dynamic Multi-Objective Optimization for Geo-Distributed Data Center Workloads : Abstract: The increasing energy demands and carbon footprint of large-scale AI require intelligent workload management in globally distributed data centers. Yet progress is limited by the absence of b...
Analysis of Line Break prediction models for detecting defensive breakthrough in football : Abstract: In football, attacking teams attempt to break through the opponent's defensive line to create scoring opportunities. This action, known as a Line Break, is a critical indicator of offensive ...
Cross-fluctuation phase transitions reveal sampling dynamics in diffusion models : Abstract: We analyse how the sampling dynamics of distributions evolve in score-based diffusion models using cross-fluctuations, a centered-moment statistic from statistical physics. Specifically, we ...
Dynamic Model Selection for Trajectory Prediction via Pairwise Ranking and Meta-Features : Abstract: Recent deep trajectory predictors (e.g., Jiang et al., 2023; Zhou et al., 2022) have achieved strong average accuracy but remain unreliable in complex long-tail driving scenarios. These limi...
Casing Collar Identification using AlexNet-based Neural Networks for Depth Measurement in Oil and Gas Wells : Abstract: Accurate downhole depth measurement is essential for oil and gas well operations, directly influencing reservoir contact, production efficiency, and operational safety. Collar correlation us...
A Comparative Analysis of LLM Adaptation: SFT, LoRA, and ICL in Data-Scarce Scenarios : Abstract: The remarkable capabilities of Large Language Models (LLMs) often need to be tailored for specific applications, requiring the integration of new knowledge or the acquisition of new skills. ...
Feature Importance Guided Random Forest Learning with Simulated Annealing Based Hyperparameter Tuning : Abstract: This paper introduces a novel framework for enhancing Random Forest classifiers by integrating probabilistic feature sampling and hyperparameter tuning via Simulated Annealing. The proposed ...
Physiologically Active Vegetation Reverses Its Cooling Effect in Humid Urban Climates : Abstract: Efforts to green cities for cooling are succeeding unevenly because the same vegetation that cools surfaces can also intensify how hot the air feels. Previous studies have identified humid h...
A Dual Large Language Models Architecture with Herald Guided Prompts for Parallel Fine Grained Traffic Signal Control : Abstract: Leveraging large language models (LLMs) in traffic signal control (TSC) improves optimization efficiency and interpretability compared to traditional reinforcement learning (RL) methods. How...
Study on Supply Chain Finance Decision-Making Model and Enterprise Economic Performance Prediction Based on Deep Reinforcement Learning : Abstract: To improve decision-making and planning efficiency in back-end centralized redundant supply chains, this paper proposes a decision model integrating deep learning with intelligent particle s...
Can SAEs reveal and mitigate racial biases of LLMs in healthcare? : Abstract: LLMs are increasingly being used in healthcare. This promises to free physicians from drudgery, enabling better care to be delivered at scale. But the use of LLMs in this space also brings r...
PDE-SHARP: PDE Solver Hybrids Through Analysis & Refinement Passes : Abstract: Current LLM-driven approaches using test-time computing to generate PDE solvers execute a large number of solver samples to identify high-accuracy solvers. These paradigms are especially cos...
EL-MIA: Quantifying Membership Inference Risks of Sensitive Entities in LLMs : Abstract: Membership inference attacks (MIA) aim to infer whether a particular data point is part of the training dataset of a model. In this paper, we propose a new task in the context of LLM privacy...
Diffusion LLMs are Natural Adversaries for any LLM : Abstract: We introduce a novel framework that transforms the resource-intensive (adversarial) prompt optimization problem into an efficient, amortized inference task. Our core insight is that p...
Diffusion Models at the Drug Discovery Frontier: A Review on Generating Small Molecules versus Therapeutic Peptides : Abstract: Diffusion models have emerged as a leading framework in generative modeling, showing significant potential to accelerate and transform the traditionally slow and costly process of drug disco...
Iterative Foundation Model Fine-Tuning on Multiple Rewards : Abstract: Fine-tuning foundation models has emerged as a powerful approach for generating objects with specific desired properties. Reinforcement learning (RL) provides an effective framework for this...
Melanoma Classification Through Deep Ensemble Learning and Explainable AI : Abstract: Melanoma is one of the most aggressive and deadliest skin cancers, leading to mortality if not detected and treated in the early stages. Artificial intelligence techniques have recently been...
A Tight Lower Bound for Non-stochastic Multi-armed Bandits with Expert Advice : Abstract: We determine the minimax optimal expected regret in the classic non-stochastic multi-armed bandit with expert advice problem, by proving a lower bound that matches the upper bound of Kale (2...
X-TRACK: Physics-Aware xLSTM for Realistic Vehicle Trajectory Prediction : Abstract: Recent advancements in Recurrent Neural Network (RNN) architectures, particularly the Extended Long Short Term Memory (xLSTM), have addressed the limitations of traditional Long Short Term M...
Improving the Robustness of Control of Chaotic Convective Flows with Domain-Informed Reinforcement Learning : Abstract: Chaotic convective flows arise in many real-world systems, such as microfluidic devices and chemical reactors. Stabilizing these flows is highly desirable but remains challenging, particular...
Calibration Across Layers: Understanding Calibration Evolution in LLMs : Abstract: Large Language Models (LLMs) have demonstrated inherent calibration capabilities, where predicted probabilities align well with correctness, despite prior findings that deep neural networks ...
A systematic evaluation of uncertainty quantification techniques in deep learning: a case study in photoplethysmography signal analysis : Abstract: In principle, deep learning models trained on medical time-series, including wearable photoplethysmography (PPG) sensor data, can provide a means to continuously monitor physiological parame...
A Technical Exploration of Causal Inference with Hybrid LLM Synthetic Data : Abstract: Large Language Models (LLMs) offer a flexible means to generate synthetic tabular data, yet existing approaches often fail to preserve key causal parameters such as the average treatment eff...
Reject Only Critical Tokens: Pivot-Aware Speculative Decoding : Abstract: Speculative Decoding (SD) ensures that the output matches the target model's distribution exactly. However, we argue that this distribution matching requirement is too stringent and results ...
Toward Unifying Group Fairness Evaluation from a Sparsity Perspective : Abstract: Ensuring algorithmic fairness remains a significant challenge in machine learning, particularly as models are increasingly applied across diverse domains. While numerous fairness criteria ex...
Balancing Interpretability and Performance in Motor Imagery EEG Classification: A Comparative Study of ANFIS-FBCSP-PSO and EEGNet : Abstract: Achieving both accurate and interpretable classification of motor imagery EEG remains a key challenge in brain computer interface (BCI) research. This paper compares a transparent fuzzy reas...
PolyRecommender: A Multimodal Recommendation System for Polymer Discovery : Abstract: We introduce PolyRecommender, a multimodal discovery framework that integrates chemical language representations from PolyBERT with molecular graph-based representations from a graph encoder...
UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings : Abstract: The remarkable success of multimodal large language models (MLLMs) has driven advances in multimodal embeddings, yet existing models remain inherently discriminative, limiting their ability ...
Enhancing Adversarial Transferability by Balancing Exploration and Exploitation with Gradient-Guided Sampling : Abstract: Adversarial attacks present a critical challenge to deep neural networks' robustness, particularly in transfer scenarios across different model architectures. However, the transferability of...
Tree Training: Accelerating Agentic LLMs Training via Shared Prefix Reuse : Abstract: In agentic LLM scenarios, an agent's interaction process during a single rollout often exhibits branching behaviors. Due to memory retrieval and concurrent tool executions at certain decisio...
Structure-Preserving Physics-Informed Neural Network for the Korteweg--de Vries (KdV) Equation : Abstract: Physics-Informed Neural Networks (PINNs) offer a flexible framework for solving nonlinear partial differential equations (PDEs), yet conventional implementations often fail to preserve key p...
Bootstrap Off-policy with World Model : Abstract: Online planning has proven effective in reinforcement learning (RL) for improving sample efficiency and final performance. However, using planning for environment interaction inevitably intr...
Region-Aware Reconstruction Strategy for Pre-training fMRI Foundation Model : Abstract: The emergence of foundation models in neuroimaging is driven by the increasing availability of large-scale and heterogeneous brain imaging datasets. Recent advances in self-supervised learni...
Deep Learning Approach to Anomaly Detection in Enterprise ETL Processes with Autoencoders : Abstract: An anomaly detection method based on deep autoencoders is proposed to address anomalies that often occur in enterprise-level ETL data streams. The study first analyzes multiple types of anom...
Why Federated Optimization Fails to Achieve Perfect Fitting? A Theoretical Perspective on Client-Side Optima : Abstract: Federated optimization is a constrained form of distributed optimization that enables training a global model without directly sharing client data. Although existing algorithms can guarantee...
Variational Autoencoder for Calibration: A New Approach : Abstract: In this paper we present a new implementation of a Variational Autoencoder (VAE) for the calibration of sensors. We propose that the VAE can be used to calibrate sensor data by training the ...
Reasoning Planning for Language Models : Abstract: Selecting an appropriate reasoning method for a given query remains a key challenge in language model generation. Existing approaches typically generate multiple candidate responses and use ...
Air Pollution Forecasting in Bucharest : Abstract: Air pollution, especially the particulate matter 2.5 (PM2.5), has become a growing concern in recent years, primarily in urban areas. Being exposed to air pollution is linked to developing n...
Learning an Efficient Optimizer via Hybrid-Policy Sub-Trajectory Balance : Abstract: Recent advances in generative modeling enable neural networks to generate weights without relying on gradient-based optimization. However, current methods are limited by issues of over-coupl...
Robust Single-Agent Reinforcement Learning for Regional Traffic Signal Control Under Demand Fluctuations : Abstract: Traffic congestion, primarily driven by intersection queuing, significantly impacts urban living standards, safety, environmental quality, and economic efficiency. While Traffic Signal Contr...
Temporal Fusion Transformer for Multi-Horizon Probabilistic Forecasting of Weekly Retail Sales : Abstract: Accurate multi-horizon retail forecasts are critical for inventory and promotions. We present a novel study of weekly Walmart sales (45 stores, 2010--2012) using a Temporal Fusion Transforme...
Red-teaming Activation Probes using Prompted LLMs : Abstract: Activation probes are attractive monitors for AI systems due to low cost and latency, but their real-world robustness remains underexplored. We ask: What failure modes arise under realistic,...
FTT-GRU: A Hybrid Fast Temporal Transformer with GRU for Remaining Useful Life Prediction : Abstract: Accurate prediction of the remaining useful life (RUL) of industrial machinery is essential for reducing downtime and optimizing maintenance schedules. Existing approaches, such as long shor...
Bayesian Network Structure Discovery Using Large Language Models : Abstract: Understanding probabilistic relationships among variables is crucial for analyzing complex systems. Traditional structure learning methods often require extensive observational data and incu...
Sparse and nonparametric estimation of equations governing dynamical systems with applications to biology : Abstract: Data-driven discovery of model equations is a powerful approach for understanding the behavior of dynamical systems in many scientific fields. In particular, the ability to learn mathematica...
Diagnosing Hallucination Risk in AI Surgical Decision-Support: A Sequential Framework for Sequential Validation : Abstract: Large language models (LLMs) offer transformative potential for clinical decision support in spine surgery but pose significant risks through hallucinations, which are factually inconsistent...
Gaining Momentum: Uncovering Hidden Scoring Dynamics in Hockey through Deep Neural Sequencing and Causal Modeling : Abstract: We present a unified, data-driven framework for quantifying and enhancing offensive momentum and scoring likelihood (expected goals, xG) in professional hockey. Leveraging a Sportlogiq datas...
Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering : Abstract: Large language models (LLMs) can be controlled at inference time through prompts (in-context learning) and internal activations (activation steering). Different accounts have been proposed t...
Stochastic Shortest Path with Sparse Adversarial Costs : Abstract: We study the adversarial Stochastic Shortest Path (SSP) problem with sparse costs under full-information feedback. In the known transition setting, existing bounds based on Online Mirror Des...
Diluting Restricted Boltzmann Machines : Abstract: Recent advances in artificial intelligence have relied heavily on increasingly large neural networks, raising concerns about their computational and environmental costs. This paper investiga...
Reviving Stale Updates: Data-Free Knowledge Distillation for Asynchronous Federated Learning : Abstract: Federated Learning (FL) enables collaborative model training across distributed clients without sharing raw data, yet its scalability is limited by synchronization overhead. Asynchronous Fed...
Sensitivity Analysis for Climate Science with Generative Flow Models : Abstract: Sensitivity analysis is a cornerstone of climate science, essential for understanding phenomena ranging from storm intensity to long-term climate feedbacks. However, computing these sensitiv...
Inference-Time Chain-of-Thought Pruning with Latent Informativeness Signals : Abstract: Large language models (LLMs) improve reasoning accuracy when generating multiple candidate solutions at test time, but standard methods like Best-of-N (BoN) incur high computational cost by ...
Privacy-Aware Time Series Synthesis via Public Knowledge Distillation : Abstract: Sharing sensitive time series data, such as patient records or investment accounts, in domains such as finance, healthcare, and energy consumption is often restricted due to privacy concerns...
Investigating the Robustness of Knowledge Tracing Models in the Presence of Student Concept Drift : Abstract: Knowledge Tracing (KT) has been an established problem in the educational data mining field for decades, and it is commonly assumed that the underlying learning process being modeled remai...
TRISKELION-1: Unified Descriptive-Predictive-Generative AI : Abstract: TRISKELION-1 is a unified descriptive-predictive-generative architecture that integrates statistical, mechanistic, and generative reasoning within a single encoder-decoder framework. The mod...
Enhancing Heavy Rain Nowcasting with Multimodal Data: Integrating Radar and Satellite Observations : Abstract: The increasing frequency of heavy rainfall events, which are a major cause of urban flooding, underscores the urgent need for accurate precipitation forecasting - particularly in urban areas...
Effective Series Decomposition and Components Learning for Time Series Generation : Abstract: Time series generation focuses on modeling the underlying data distribution and resampling to produce authentic time series data. Key components, such as trend and seasonality, drive tempora...
Fast PINN Eigensolvers via Biconvex Reformulation : Abstract: Eigenvalue problems have a distinctive forward-inverse structure and are fundamental to characterizing a system's thermal response, stability, and natural modes. Physics-Informed Neural Netw...
Efficient Reinforcement Learning for Large Language Models with Intrinsic Exploration : Abstract: Reinforcement learning with verifiable rewards (RLVR) has improved the reasoning ability of large language models, yet training remains costly because many rollouts contribute little to opti...
Attention Saturation and Gradient Suppression at Inflection Layers: Diagnosing and Mitigating Bottlenecks in Transformer Adaptation : Abstract: Pre-trained Transformers often exhibit over-confidence in source patterns and difficulty in forming new target-domain patterns during fine-tuning. We formalize the mechanism of output satura...
EraseFlow: Learning Concept Erasure Policies via GFlowNet-Driven Alignment : Abstract: Erasing harmful or proprietary concepts from powerful text-to-image generators is an emerging safety requirement, yet current "concept erasure" techniques either collapse image quality, rely...
Logic-informed reinforcement learning for cross-domain optimization of large-scale cyber-physical systems : Abstract: Cyber-physical systems (CPS) require the joint optimization of discrete cyber actions and continuous physical parameters under stringent safety logic constraints. However, existing hierarchi...
Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games : Abstract: Equilibrium learning in adversarial games is an important topic widely examined in the fields of game theory and reinforcement learning (RL). Pursuit-evasion game (PEG), as an important clas...
LL-ViT: Edge Deployable Vision Transformers with Look Up Table Neurons : Abstract: Vision Transformers have been tremendously successful in computer vision tasks. However, their large computational, memory, and energy demands are a challenge for edge inference on FPGAs -- ...
Identifying Slug Formation in Oil Well Pipelines: A Use Case from Industrial Analytics : Abstract: Slug formation in oil and gas pipelines poses significant challenges to operational safety and efficiency, yet existing detection approaches are often offline, require domain expertise, and ...
FlexiCache: Leveraging Temporal Stability of Attention Heads for Efficient KV Cache Management : Abstract: Large Language Model (LLM) serving is increasingly constrained by the growing size of the key-value (KV) cache, which scales with both context length and generation length. Prior work shows ...
Training with Fewer Bits: Unlocking Edge LLMs Training with Stochastic Rounding : Abstract: LLM training is resource-intensive. Quantized training improves computational and memory efficiency but introduces quantization noise, which can hinder convergence and degrade model accuracy...
KFCPO: Kronecker-Factored Approximated Constrained Policy Optimization : Abstract: We propose KFCPO, a novel Safe Reinforcement Learning (Safe RL) algorithm that combines scalable Kronecker-Factored Approximate Curvature (K-FAC) based second-order policy optimization with ...
SpEx: A Spectral Approach to Explainable Clustering : Abstract: Explainable clustering by axis-aligned decision trees was introduced by Moshkovitz et al. (2020) and has gained considerable interest. Prior work has focused on minimizing the price of expla...
Learning with Category-Equivariant Representations for Human Activity Recognition : Abstract: Human activity recognition is challenging because sensor signals shift with context, motion, and environment; effective models must therefore remain stable as the world around them changes. ...
Random Spiking Neural Networks are Stable and Spectrally Simple : Abstract: Spiking neural networks (SNNs) are a promising paradigm for energy-efficient computation, yet their theoretical foundations, especially regarding stability and robustness, remain limited compa...
Transformers as Intrinsic Optimizers: Forward Inference through the Energy Principle : Abstract: Transformers have demonstrated strong adaptability across a wide range of tasks and have become the backbone of modern Large Language Models (LLMs). However, their underlying mechanisms rema...
Motion-Robust Multimodal Fusion of PPG and Accelerometer Signals for Three-Class Heart Rhythm Classification : Abstract: Atrial fibrillation (AF) is a leading cause of stroke and mortality, particularly in elderly patients. Wrist-worn photoplethysmography (PPG) enables non-invasive, continuous rhythm monitorin...
The Hidden Power of Normalization: Exponential Capacity Control in Deep Neural Networks : Abstract: Normalization methods are fundamental components of modern deep neural networks (DNNs). Empirically, they are known to stabilize optimization dynamics and improve generalization. However, th...
Using Synthetic Data to estimate the True Error is theoretically and practically doable : Abstract: Accurately evaluating model performance is crucial for deploying machine learning systems in real-world applications. Traditional methods often require a sufficiently large labeled test set ...
VRScout: Towards Real-Time, Autonomous Testing of Virtual Reality Games : Abstract: Virtual Reality (VR) has rapidly become a mainstream platform for gaming and interactive experiences, yet ensuring the quality, safety, and appropriateness of VR content remains a pressing c...
Feature-Guided SAE Steering for Refusal-Rate Control using Contrasting Prompts : Abstract: Large Language Model (LLM) deployment requires guiding the LLM to recognize and not answer unsafe prompts while complying with safe prompts. Previous methods for achieving this require adjus...
Probing Knowledge Holes in Unlearned LLMs : Abstract: Machine unlearning has emerged as a prevalent technical solution for selectively removing unwanted knowledge absorbed during pre-training, without requiring full retraining. While recent unl...
From Uniform to Adaptive: General Skip-Block Mechanisms for Efficient PDE Neural Operators : Abstract: In recent years, Neural Operators (NO) have gradually emerged as a popular approach for solving Partial Differential Equations (PDEs). However, their application to large-scale engineering ta...
Neural Architecture Search for global multi-step Forecasting of Energy Production Time Series : Abstract: The dynamic energy sector requires both predictive accuracy and runtime efficiency for short-term forecasting of energy generation under operational constraints, where timely and precise pre...
Semi-Supervised Preference Optimization with Limited Feedback : Abstract: The field of preference optimization has made outstanding contributions to the alignment of language models with human preferences. Despite these advancements, recent methods still rely heav...
Physics-Informed Neural Network Frameworks for the Analysis of Engineering and Biological Dynamical Systems Governed by Ordinary Differential Equations : Abstract: In this study, we present and validate the predictive capability of the Physics-Informed Neural Networks (PINNs) methodology for solving a variety of engineering and biological dynamical sys...
ReLaX-Net: Reusing Layers for Parameter-Efficient Physical Neural Networks : Abstract: Physical Neural Networks (PNN) are promising platforms for next-generation computing systems. However, recent advances in digital neural network performance are largely driven by the rapid g...

Research Sources: 832 | Generated: 11/4/2025