AI RESEARCH PAPERS & ACADEMIC SOURCES
- From Engineering Diagrams to Graphs: Digitizing P&IDs with Transformers : Abstract: Digitizing engineering diagrams like Piping and Instrumentation Diagrams (P&IDs) plays a vital role in maintainability and operational efficiency of process and hydraulic systems. Previous m...
- MMRel: Benchmarking Relation Understanding in Multi-Modal Large Language Models : Abstract: Though Multi-modal Large Language Models (MLLMs) have recently achieved significant progress, they often struggle to understand diverse and complicated inter-object relations. Specifically, ...
- Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference : Abstract: Parameter-efficient fine-tuning (PEFT) has emerged as a popular solution for adapting pre-trained Vision Transformer (ViT) models to downstream applications by updating only a small subset o...
- Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition : Abstract: Open-world video recognition is challenging since traditional networks are not generalized well on complex environment variations. Alternatively, foundation models with rich knowledge have r...
- BoostDream: Efficient Refining for High-Quality Text-to-3D Generation from Multi-View Diffusion : Abstract: Witnessing the evolution of text-to-image diffusion models, significant strides have been made in text-to-3D generation. Currently, two primary paradigms dominate the field of text-to-3D: th...
- Low-Resolution Action Recognition for Tiny Actions Challenge : Abstract: Tiny Actions Challenge focuses on understanding human activities in real-world surveillance. Basically, there are two main difficulties for activity recognition in this scenario. First, huma...
- Sceniris: A Fast Procedural Scene Generation Framework : Abstract: Synthetic 3D scenes are essential for developing Physical AI and generative models. Existing procedural generation methods often have low output throughput, creating a significant bottleneck...
- VERM: Leveraging Foundation Models to Create a Virtual Eye for Efficient 3D Robotic Manipulation : Abstract: When performing 3D manipulation tasks, robots have to execute action planning based on perceptions from multiple fixed cameras. The multi-camera setup introduces substantial redundancy and i...
- Don't Guess, Escalate: Towards Explainable Uncertainty-Calibrated AI Forensic Agents : Abstract: AI is reshaping the landscape of multimedia forensics. We propose AI forensic agents: reliable orchestrators that select and combine forensic detectors, identify provenance and context, and ...
- Autoencoder-based Denoising Defense against Adversarial Attacks on Object Detection : Abstract: Deep learning-based object detection models play a critical role in real-world applications such as autonomous driving and security surveillance systems, yet they remain vulnerable to advers...
- A Tri-Dynamic Preprocessing Framework for UGC Video Compression : Abstract: In recent years, user generated content (UGC) has become the dominant force in internet traffic. However, UGC videos exhibit a higher degree of variability and diverse characteristics compar...
- Machine Learning Enabled Graph Analysis of Particulate Composites: Application to Solid-state Battery Cathodes : Abstract: Particulate composites underpin many solid-state chemical and electrochemical systems, where microstructural features such as multiphase boundaries and inter-particle connections strongly in...
- MCR-VQGAN: A Scalable and Cost-Effective Tau PET Synthesis Approach for Alzheimer's Disease Imaging : Abstract: Tau positron emission tomography (PET) is a critical diagnostic modality for Alzheimer's disease (AD) because it visualizes and quantifies neurofibrillary tangles, a hallmark of AD pathology...
- In search of truth: Evaluating concordance of AI-based anatomy segmentation models : Abstract: Purpose: AI-based methods for anatomy segmentation can help automate characterization of large imaging datasets. The growing number of similar in functionality models raises the challenge of ...
- Large Video Planner Enables Generalizable Robot Control : Abstract: General-purpose robots require decision-making models that generalize across diverse tasks and environments. Recent works build robot foundation models by extending multimodal large language...
- Human-like Working Memory from Artificial Intrinsic Plasticity Neurons : Abstract: Working memory enables the brain to integrate transient information for rapid decision-making. Artificial networks typically replicate this via recurrent or parallel architectures, yet incur...
- BioimageAIpub: a toolbox for AI-ready bioimaging data publishing : Abstract: Modern bioimage analysis approaches are data hungry, making it necessary for researchers to scavenge data beyond those collected within their (bio)imaging facilities. In addition to scale, b...
- The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text : Abstract: We present WorldCanvas, a framework for promptable world events that enables rich, user-directed simulation by combining text, trajectories, and reference images. Unlike text-only approaches...
- Generative Refocusing: Flexible Defocus Control from a Single Image : Abstract: Depth-of-field control is essential in photography, but getting the perfect focus often takes several tries or special equipment. Single-image refocusing is still difficult. It involves reco...
- Next-Embedding Prediction Makes Strong Vision Learners : Abstract: Inspired by the success of generative pretraining in natural language, we ask whether the same principles can yield strong self-supervised visual learners. Instead of training models to outp...
- Differences That Matter: Auditing Models for Capability Gap Discovery and Rectification : Abstract: Conventional evaluation methods for multimodal LLMs (MLLMs) lack interpretability and are often insufficient to fully disclose significant capability gaps across models. To address this, we ...
- EasyV2V: A High-quality Instruction-based Video Editing Framework : Abstract: While image editing has advanced rapidly, video editing remains less explored, facing challenges in consistency, control, and generalization. We study the design space of data, architecture,...
- DVGT: Driving Visual Geometry Transformer : Abstract: Perceiving and reconstructing 3D scene geometry from visual inputs is crucial for autonomous driving. However, there still lacks a driving-targeted dense geometry perception model that can a...
- AdaTooler-V: Adaptive Tool-Use for Images and Videos : Abstract: Recent advances have shown that multimodal large language models (MLLMs) benefit from multimodal interleaved chain-of-thought (CoT) with vision tool interactions. However, existing open-sour...
- StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors : Abstract: The rapid growth of stereoscopic displays, including VR headsets and 3D cinemas, has led to increasing demand for high-quality stereo video content. However, producing 3D videos remains cost...
- Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation : Abstract: In this work, we present a panoramic metric depth foundation model that generalizes across diverse scene distances. We explore a data-in-the-loop paradigm from the view of both data construc...
- MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning : Abstract: Mobile manipulators in households must both navigate and manipulate. This requires a compact, semantically rich scene representation that captures where objects are, how they function, and w...
- SceneDiff: A Benchmark and Method for Multiview Object Change Detection : Abstract: We investigate the problem of identifying objects that have been added, removed, or moved between a pair of captures (images or videos) of the same scene at different times. Detecting such c...
- Flowing from Reasoning to Motion: Learning 3D Hand Trajectory Prediction from Egocentric Human Interaction Videos : Abstract: Prior works on 3D hand trajectory prediction are constrained by datasets that decouple motion from semantic supervision and by models that weakly link reasoning and action. To address these,...
- VIVA: VLM-Guided Instruction-Based Video Editing with Reward Optimization : Abstract: Instruction-based video editing aims to modify an input video according to a natural-language instruction while preserving content fidelity and temporal coherence. However, existing diffusio...
- Alchemist: Unlocking Efficiency in Text-to-Image Model Training via Meta-Gradient Data Selection : Abstract: Recent advances in Text-to-Image (T2I) generative models, such as Imagen, Stable Diffusion, and FLUX, have led to remarkable improvements in visual quality. However, their performance is fun...
- FlashPortrait: 6x Faster Infinite Portrait Animation with Adaptive Latent Prediction : Abstract: Current diffusion-based acceleration methods for long-portrait animation struggle to ensure identity (ID) consistency. This paper presents FlashPortrait, an end-to-end video diffusion transf...
- Instant Expressive Gaussian Head Avatar via 3D-Aware Expression Distillation : Abstract: Portrait animation has witnessed tremendous quality improvements thanks to recent advances in video diffusion models. However, these 2D methods often compromise 3D consistency and speed, lim...
- M-PhyGs: Multi-Material Object Dynamics from Video : Abstract: Knowledge of the physical material properties governing the dynamics of a real-world object becomes necessary to accurately anticipate its response to unseen interactions. Existing methods f...
- Memory-Enhanced SAM3 for Occlusion-Robust Surgical Instrument Segmentation : Abstract: Accurate surgical instrument segmentation in endoscopic videos is crucial for computer-assisted interventions, yet remains challenging due to frequent occlusions, rapid motion, specular arte...
- RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing : Abstract: Instruction-based image editing enables natural-language control over visual modifications, yet existing models falter under Instruction-Visual Complexity (IV-Complexity), where intricate in...
- GenEval 2: Addressing Benchmark Drift in Text-to-Image Evaluation : Abstract: Automating Text-to-Image (T2I) model evaluation is challenging; a judge model must be used to score correctness, and test prompts must be selected to be challenging for current T2I models bu...
- OPENTOUCH: Bringing Full-Hand Touch to Real-World Interaction : Abstract: The human hand is our primary interface to the physical world, yet egocentric perception rarely knows when, where, or how forcefully it makes contact. Robust wearable tactile sensors are sca...
- Radiology Report Generation with Layer-Wise Anatomical Attention : Abstract: Automatic radiology report generation is a promising application of multimodal deep learning, aiming to reduce reporting workload and improve consistency. However, current state-of-the-art (...
- Next-Generation License Plate Detection and Recognition System using YOLOv8 : Abstract: In the evolving landscape of traffic management and vehicle surveillance, efficient license plate detection and recognition are indispensable. Historically, many methodologies have tackled t...
- DenseBEV: Transforming BEV Grid Cells into 3D Objects : Abstract: In current research, Bird's-Eye-View (BEV)-based transformers are increasingly utilized for multi-camera 3D object detection. Traditional models often employ random queries as anchors, optim...
- GeoPredict: Leveraging Predictive Kinematics and 3D Gaussian Geometry for Precise VLA Manipulation : Abstract: Vision-Language-Action (VLA) models achieve strong generalization in robotic manipulation but remain largely reactive and 2D-centric, making them unreliable in tasks that require precise 3D ...
- KineST: A Kinematics-guided Spatiotemporal State Space Model for Human Motion Tracking from Sparse Signals : Abstract: Full-body motion tracking plays an essential role in AR/VR applications, bridging physical and virtual interactions. However, it is challenging to reconstruct realistic and diverse full-body...
- R3ST: A Synthetic 3D Dataset With Realistic Trajectories : Abstract: Datasets are essential to train and evaluate computer vision models used for traffic analysis and to enhance road safety. Existing real datasets fit real-world scenarios, capturing authentic...
- Kling-Omni Technical Report : Abstract: We present Kling-Omni, a generalist generative framework designed to synthesize high-fidelity videos directly from multimodal visual language inputs. Adopting an end-to-end perspective, Klin...
- FlowDet: Unifying Object Detection and Generative Transport Flows : Abstract: We present FlowDet, the first formulation of object detection using modern Conditional Flow Matching techniques. This work follows from DiffusionDet, which originally framed detection as a g...
- Make-It-Poseable: Feed-forward Latent Posing Model for 3D Humanoid Character Animation : Abstract: Posing 3D characters is a fundamental task in computer graphics and vision. However, existing methods like auto-rigging and pose-conditioned generation often struggle with challenges such as...
- TreeNet: A Light Weight Model for Low Bitrate Image Compression : Abstract: Reducing computational complexity remains a critical challenge for the widespread adoption of learning-based image compression techniques. In this work, we propose TreeNet, a novel low-compl...
- Task-Oriented Data Synthesis and Control-Rectify Sampling for Remote Sensing Semantic Segmentation : Abstract: With the rapid progress of controllable generation, training data synthesis has become a promising way to expand labeled datasets and alleviate manual annotation in remote sensing (RS). Howe...
- OMG-Bench: A New Challenging Benchmark for Skeleton-based Online Micro Hand Gesture Recognition : Abstract: Online micro gesture recognition from hand skeletons is critical for VR/AR interaction but faces challenges due to limited public datasets and task-specific algorithms. Micro gestures involv...
- A multi-centre, multi-device benchmark dataset for landmark-based comprehensive fetal biometry : Abstract: Accurate fetal growth assessment from ultrasound (US) relies on precise biometry measured by manually identifying anatomical landmarks in standard planes. Manual landmarking is time-consumin...
- SDFoam: Signed-Distance Foam for explicit surface reconstruction : Abstract: Neural radiance fields (NeRF) have driven impressive progress in view synthesis by using ray-traced volumetric rendering. Splatting-based methods such as 3D Gaussian Splatting (3DGS) provide...
- Detecting Localized Deepfakes: How Well Do Synthetic Image Detectors Handle Inpainting? : Abstract: The rapid progress of generative AI has enabled highly realistic image manipulations, including inpainting and region-level editing. These approaches preserve most of the original visual con...
- Few-Shot Fingerprinting Subject Re-Identification in 3D-MRI and 2D-X-Ray : Abstract: Combining open-source datasets can introduce data leakage if the same subject appears in multiple sets, leading to inflated model performance. To address this, we explore subject fingerprint...
- FrameDiffuser: G-Buffer-Conditioned Diffusion for Neural Forward Frame Rendering : Abstract: Neural rendering for interactive applications requires translating geometric and material properties (G-buffer) to photorealistic images with realistic lighting on a frame-by-frame basis. Wh...
- REGLUE Your Latents with Global and Local Semantics for Entangled Diffusion : Abstract: Latent diffusion models (LDMs) achieve state-of-the-art image synthesis, yet their reconstruction-style denoising objective provides only indirect semantic supervision: high-level semantics ...
- DeContext as Defense: Safe Image Editing in Diffusion Transformers : Abstract: In-context diffusion models allow users to modify images with remarkable ease and realism. However, the same power raises serious privacy concerns: personal images can be easily manipulated ...
- Plug to Place: Indoor Multimedia Geolocation from Electrical Sockets for Digital Investigation : Abstract: Computer vision is a rapidly evolving field, giving rise to powerful new tools and techniques in digital forensic investigation, and shows great promise for novel digital forensic applicatio...
- Trainable Log-linear Sparse Attention for Efficient Diffusion Transformers : Abstract: Diffusion Transformers (DiTs) set the state of the art in visual generation, yet their quadratic self-attention cost fundamentally limits scaling to long token sequences. Recent Top-K sparse...
- Hazedefy: A Lightweight Real-Time Image and Video Dehazing Pipeline for Practical Deployment : Abstract: This paper introduces Hazedefy, a lightweight and application-focused dehazing pipeline intended for real-time video and live camera feed enhancement. Hazedefy prioritizes computational simp...
- Yuan-TecSwin: A text conditioned Diffusion model with Swin-transformer blocks : Abstract: Diffusion models have shown remarkable capacity in image synthesis based on their U-shaped architecture and convolutional neural networks (CNN) as basic blocks. The locality of the convoluti...
- Sketch-in-Latents: Eliciting Unified Reasoning in MLLMs : Abstract: While Multimodal Large Language Models (MLLMs) excel at visual understanding tasks through text reasoning, they often fall short in scenarios requiring visual imagination. Unlike current wor...
- CRONOS: Continuous Time Reconstruction for 4D Medical Longitudinal Series : Abstract: Forecasting how 3D medical scans evolve over time is important for disease progression, treatment planning, and developmental assessment. Yet existing models either rely on a single prior sc...
- Causal-Tune: Mining Causal Factors from Vision Foundation Models for Domain Generalized Semantic Segmentation : Abstract: Fine-tuning Vision Foundation Models (VFMs) with a small number of parameters has shown remarkable performance in Domain Generalized Semantic Segmentation (DGSS). Most existing works either ...
- 4D Primitive-Mâché: Glueing Primitives for Persistent 4D Scene Reconstruction : Abstract: We present a dynamic reconstruction system that receives a casual monocular RGB video as input, and outputs a complete and persistent reconstruction of the scene. In other words, we reconstr...
- N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models : Abstract: While current multimodal models can answer questions based on 2D images, they lack intrinsic 3D object perception, limiting their ability to comprehend spatial relationships and depth cues i...
- TTP: Test-Time Padding for Adversarial Detection and Robust Adaptation on Vision-Language Models : Abstract: Vision-Language Models (VLMs), such as CLIP, have achieved impressive zero-shot recognition performance but remain highly susceptible to adversarial perturbations, posing significant risks i...
- Multi-scale Attention-Guided Intrinsic Decomposition and Rendering Pass Prediction for Facial Images : Abstract: Accurate intrinsic decomposition of face images under unconstrained lighting is a prerequisite for photorealistic relighting, high-fidelity digital doubles, and augmented-reality effects. Th...
- Skeleton-Snippet Contrastive Learning with Multiscale Feature Fusion for Action Localization : Abstract: The self-supervised pretraining paradigm has achieved great success in learning 3D action representations for skeleton-based action recognition using contrastive learning. However, learning ...
- VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding Tasks : Abstract: GUI grounding is a critical component in building capable GUI agents. However, existing grounding benchmarks suffer from significant limitations: they either provide insufficient data volume...
- PoseMoE: Mixture-of-Experts Network for Monocular 3D Human Pose Estimation : Abstract: The lifting-based methods have dominated monocular 3D human pose estimation by leveraging detected 2D poses as intermediate representations. The 2D component of the final 3D human pose benef...
- YOLO11-4K: An Efficient Architecture for Real-Time Small Object Detection in 4K Panoramic Images : Abstract: The processing of omnidirectional 360-degree images poses significant challenges for object detection due to inherent spatial distortions, wide fields of view, and ultra-high-resolution inpu...
- Smile on the Face, Sadness in the Eyes: Bridging the Emotion Gap with a Multimodal Dataset of Eye and Facial Behaviors : Abstract: Emotion Recognition (ER) is the process of analyzing and identifying human emotions from sensing data. Currently, the field heavily relies on facial expression recognition (FER) because visu...
- Guiding Perception-Reasoning Closer to Human in Blind Image Quality Assessment : Abstract: Humans assess image quality through a perception-reasoning cascade, integrating sensory cues with implicit reasoning to form self-consistent judgments. In this work, we investigate how a mod...
- StageVAR: Stage-Aware Acceleration for Visual Autoregressive Models : Abstract: Visual Autoregressive (VAR) modeling departs from the next-token prediction paradigm of traditional Autoregressive (AR) models through next-scale prediction, enabling high-quality image gene...
- SNOW: Spatio-Temporal Scene Understanding with World Knowledge for Open-World Embodied Reasoning : Abstract: Autonomous robotic systems require spatio-temporal understanding of dynamic environments to ensure reliable navigation and interaction. While Vision-Language Models (VLMs) provide open-world...
- Prime and Reach: Synthesising Body Motion for Gaze-Primed Object Reach : Abstract: Human motion generation is a challenging task that aims to create realistic motion imitating natural human behaviour. We focus on the well-studied behaviour of priming an object/location for...
- Geometric Disentanglement of Text Embeddings for Subject-Consistent Text-to-Image Generation using A Single Prompt : Abstract: Text-to-image diffusion models excel at generating high-quality images from natural language descriptions but often fail to preserve subject consistency across multiple outputs, limiting the...
- CountZES: Counting via Zero-Shot Exemplar Selection : Abstract: Object counting in complex scenes remains challenging, particularly in the zero-shot setting, where the goal is to count instances of unseen categories specified only by a class name. Existi...
- BrepLLM: Native Boundary Representation Understanding with Large Language Models : Abstract: Current token-sequence-based Large Language Models (LLMs) are not well-suited for directly processing 3D Boundary Representation (Brep) models that contain complex geometric and topological ...
- Using Gaussian Splats to Create High-Fidelity Facial Geometry and Texture : Abstract: We leverage increasingly popular three-dimensional neural representations in order to construct a unified and consistent explanation of a collection of uncalibrated images of the human face....
- Adaptive Frequency Domain Alignment Network for Medical image segmentation : Abstract: High-quality annotated data plays a crucial role in achieving accurate segmentation. However, such data for medical image segmentation are often scarce due to the time-consuming and labor-in...
- Factorized Video Generation: Decoupling Scene Construction and Temporal Synthesis in Text-to-Video Diffusion Models : Abstract: State-of-the-art Text-to-Video (T2V) diffusion models can generate visually impressive results, yet they still frequently fail to compose complex scenes or follow logical temporal instructio...
- EverybodyDance: Bipartite Graph-Based Identity Correspondence for Multi-Character Animation : Abstract: Consistent pose-driven character animation has achieved remarkable progress in single-character scenarios. However, extending these advances to multi-character settings is non-trivial, espec...
- GMODiff: One-Step Gain Map Refinement with Diffusion Priors for HDR Reconstruction : Abstract: Pre-trained Latent Diffusion Models (LDMs) have recently shown strong perceptual priors for low-level vision tasks, making them a promising direction for multi-exposure High Dynamic Range (H...
- Collaborative Edge-to-Server Inference for Vision-Language Models : Abstract: We propose a collaborative edge-to-server inference framework for vision-language models (VLMs) that reduces the communication cost while maintaining inference accuracy. In typical deploymen...
- QUIDS: Quality-informed Incentive-driven Multi-agent Dispatching System for Mobile Crowdsensing : Abstract: This paper addresses the challenge of achieving optimal Quality of Information (QoI) in non-dedicated vehicular mobile crowdsensing (NVMCS) systems. The key obstacles are the interrelated is...
- Ridge Estimation-Based Vision and Laser Ranging Fusion Localization Method for UAVs : Abstract: Tracking and measuring targets using a variety of sensors mounted on UAVs is an effective means to quickly and accurately locate the target. This paper proposes a fusion localization method ...
- LaverNet: Lightweight All-in-one Video Restoration via Selective Propagation : Abstract: Recent studies have explored all-in-one video restoration, which handles multiple degradations with a unified model. However, these approaches still face two challenges when dealing with tim...
- PixelArena: A benchmark for Pixel-Precision Visual Intelligence : Abstract: Multi-modal large language models that have image output are emerging. Many image generation benchmarks focus on aesthetics instead of fine-grained generation capabilities. In PixelArena, we...
- MACL: Multi-Label Adaptive Contrastive Learning Loss for Remote Sensing Image Retrieval : Abstract: Semantic overlap among land-cover categories, highly imbalanced label distributions, and complex inter-class co-occurrence patterns constitute significant challenges for multi-label remote-s...
- GFLAN: Generative Functional Layouts : Abstract: Automated floor plan generation lies at the intersection of combinatorial search, geometric constraint satisfaction, and functional design requirements -- a confluence that has historically ...
- TextEditBench: Evaluating Reasoning-aware Text Editing Beyond Rendering : Abstract: Text rendering has recently emerged as one of the most challenging frontiers in visual generation, drawing significant attention from large-scale diffusion and multimodal models. However, te...
- Semi-Supervised Multi-View Crowd Counting by Ranking Multi-View Fusion Models : Abstract: Multi-view crowd counting has been proposed to deal with the severe occlusion issue of crowd counting in large and wide scenes. However, due to the difficulty of collecting and annotating mu...
- AI-Powered Dermatological Diagnosis: From Interpretable Models to Clinical Implementation A Comprehensive Framework for Accessible and Trustworthy Skin Disease Detection : Abstract: Dermatological conditions affect 1.9 billion people globally, yet accurate diagnosis remains challenging due to limited specialist availability and complex clinical presentations. Family his...
- ARMFlow: AutoRegressive MeanFlow for Online 3D Human Reaction Generation : Abstract: 3D human reaction generation faces three main challenges: (1) high motion fidelity, (2) real-time inference, and (3) autoregressive adaptability for online scenarios. Existing methods fail to...
- Image Compression Using Singular Value Decomposition : Abstract: Images are a substantial portion of the internet, making efficient compression important for reducing storage and bandwidth demands. This study investigates the use of Singular Value Decompo...
- Learning High-Quality Initial Noise for Single-View Synthesis with Diffusion Models : Abstract: Single-view novel view synthesis (NVS) models based on diffusion models have recently attracted increasing attention, as they can generate a series of novel view images from a single image p...
- Enhanced 3D Shape Analysis via Information Geometry : Abstract: Three-dimensional point clouds provide highly accurate digital representations of objects, essential for applications in computer graphics, photogrammetry, computer vision, and robotics. How...
- Open Ad-hoc Categorization with Contextualized Feature Learning : Abstract: Adaptive categorization of visual scenes is essential for AI agents to handle changing tasks. Unlike fixed common categories for plants or animals, ad-hoc categories are created dynamically ...
- Visual Alignment of Medical Vision-Language Models for Grounded Radiology Report Generation : Abstract: Radiology Report Generation (RRG) is a critical step toward automating healthcare workflows, facilitating accurate patient assessments, and reducing the workload of medical professionals. De...
- Avatar4D: Synthesizing Domain-Specific 4D Humans for Real-World Pose Estimation : Abstract: We present Avatar4D, a real-world transferable pipeline for generating customizable synthetic human motion datasets tailored to domain-specific applications. Unlike prior works, which focus ...
- Towards Closing the Domain Gap with Event Cameras : Abstract: Although traditional cameras are the primary sensor for end-to-end driving, their performance suffers greatly when the conditions of the data they were trained on do not match the deployme...
- C-DGPA: Class-Centric Dual-Alignment Generative Prompt Adaptation : Abstract: Unsupervised Domain Adaptation transfers knowledge from a labeled source domain to an unlabeled target domain. Directly deploying Vision-Language Models (VLMs) with prompt tuning in downstre...
- SegGraph: Leveraging Graphs of SAM Segments for Few-Shot 3D Part Segmentation : Abstract: This work presents a novel framework for few-shot 3D part segmentation. Recent advances have demonstrated the significant potential of 2D foundation models for low-shot 3D part segmentation....
- ResDynUNet++: A nested U-Net with residual dynamic convolution blocks for dual-spectral CT : Abstract: We propose a hybrid reconstruction framework for dual-spectral CT (DSCT) that integrates iterative methods with deep learning models. The reconstruction process consists of two complementary...
- Interaction-via-Actions: Cattle Interaction Detection with Joint Learning of Action-Interaction Latent Space : Abstract: This paper introduces a method and application for automatically detecting behavioral interactions between grazing cattle from a single image, which is essential for smart livestock manageme...
- Flexible Camera Calibration using a Collimator System : Abstract: Camera calibration is a crucial step in photogrammetry and 3D vision applications. This paper introduces a novel camera calibration method using a designed collimator system. Our collimator ...
- Collimator-assisted high-precision calibration method for event cameras : Abstract: Event cameras are a new type of brain-inspired visual sensor with advantages such as high dynamic range and high temporal resolution. The geometric calibration of event cameras, which involv...
- LAPX: Lightweight Hourglass Network with Global Context : Abstract: Human pose estimation is a crucial task in computer vision. Methods that have SOTA (State-of-the-Art) accuracy, often involve a large number of parameters and incur substantial computational...
- Auto-Vocabulary 3D Object Detection : Abstract: Open-vocabulary 3D object detection methods are able to localize 3D boxes of classes unseen during training. Despite the name, existing methods rely on user-specified classes both at trainin...
- Driving in Corner Case: A Real-World Adversarial Closed-Loop Evaluation Platform for End-to-End Autonomous Driving : Abstract: Safety-critical corner cases, difficult to collect in the real world, are crucial for evaluating end-to-end autonomous driving. Adversarial interaction is an effective method to generate suc...
- CoVAR: Co-generation of Video and Action for Robotic Manipulation via Multi-Modal Diffusion : Abstract: We present a method to generate video-action pairs that follow text instructions, starting from an initial image observation and the robot's joint states. Our approach automatically provides...
- Eyes on the Grass: Biodiversity-Increasing Robotic Mowing Using Deep Visual Embeddings : Abstract: This paper presents a robotic mowing framework that actively enhances garden biodiversity through visual perception and adaptive decision-making. Unlike passive rewilding approaches, the pro...
- Are vision-language models ready to zero-shot replace supervised classification models in agriculture? : Abstract: Vision-language models (VLMs) are increasingly proposed as general-purpose solutions for visual recognition tasks, yet their reliability for agricultural decision support remains poorly unde...
- From Words to Wavelengths: VLMs for Few-Shot Multispectral Object Detection : Abstract: Multispectral object detection is critical for safety-sensitive applications such as autonomous driving and surveillance, where robust perception under diverse illumination conditions is ess...
- Seeing is Believing (and Predicting): Context-Aware Multi-Human Behavior Prediction with Vision Language Models : Abstract: Accurately predicting human behaviors is crucial for mobile robots operating in human-populated environments. While prior research primarily focuses on predicting actions in single-human sce...
- The Perceptual Observatory Characterizing Robustness and Grounding in MLLMs : Abstract: Recent advances in multimodal large language models (MLLMs) have yielded increasingly powerful models, yet their perceptual capacities remain poorly characterized. In practice, most model fa...
- R4: Retrieval-Augmented Reasoning for Vision-Language Models in 4D Spatio-Temporal Space : Abstract: Humans perceive and reason about their surroundings in four dimensions by building persistent, structured internal representations that encode semantic meaning, spatial layout, and temporal ...
- City Navigation in the Wild: Exploring Emergent Navigation from Web-Scale Knowledge in MLLMs : Abstract: Leveraging multimodal large language models (MLLMs) to develop embodied agents offers significant promise for addressing complex real-world tasks. However, current evaluation benchmarks rema...
- The Emergence of Chunking Structures with Hierarchical RNN : Abstract: In Natural Language Processing (NLP), predicting linguistic structures, such as parsing and chunking, has mostly relied on manual annotations of syntactic structures. This paper introduces a...
- How Good is Post-Hoc Watermarking With Language Model Rephrasing? : Abstract: Generation-time text watermarking embeds statistical signals into text for traceability of AI-generated content. We explore *post-hoc watermarking* where an LLM rewrites existing text while ...
- Needle in the Web: A Benchmark for Retrieving Targeted Web Pages in the Wild : Abstract: Large Language Models (LLMs) have evolved from simple chatbots into sophisticated agents capable of automating complex real-world tasks, where browsing and reasoning over live web content is...
- From Essence to Defense: Adaptive Semantic-aware Watermarking for Embedding-as-a-Service Copyright Protection : Abstract: Benefiting from the superior capabilities of large language models in natural language understanding and generation, Embeddings-as-a-Service (EaaS) has emerged as a successful commercial par...
- Agent Tools Orchestration Leaks More: Dataset, Benchmark, and Mitigation : Abstract: Driven by Large Language Models, the single-agent, multi-tool architecture has become a popular paradigm for autonomous agents due to its simplicity and effectiveness. However, this architec...
- Adaptation of Agentic AI : Abstract: Cutting-edge agentic AI systems are built on foundation models that can be adapted to plan, reason, and interact with external tools to perform increasingly complex and specialized tasks. As...
- QuadSentinel: Sequent Safety for Machine-Checkable Control in Multi-agent Systems : Abstract: Safety risks arise as large language model-based agents solve complex tasks with tools, multi-step plans, and inter-agent messages. However, deployer-written policies in natural language are...
- DualGuard: Dual-stream Large Language Model Watermarking Defense against Paraphrase and Spoofing Attack : Abstract: With the rapid development of cloud-based services, large language models (LLMs) have become increasingly accessible through various web platforms. However, this accessibility has also led t...
- ContextLeak: Auditing Leakage in Private In-Context Learning Methods : Abstract: In-Context Learning (ICL) has become a standard technique for adapting Large Language Models (LLMs) to specialized tasks by supplying task-specific exemplars within the prompt. However, when...
- Cross-Language Bias Examination in Large Language Models : Abstract: This study introduces an innovative multilingual bias evaluation framework for assessing bias in Large Language Models, combining explicit bias assessment through the BBQ benchmark with impl...
- Seeing Beyond Words: Self-Supervised Visual Learning for Multimodal Large Language Models : Abstract: Multimodal Large Language Models (MLLMs) have recently demonstrated impressive capabilities in connecting vision and language, yet their proficiency in fundamental visual reasoning tasks rem...
- From Minutes to Days: Scaling Intracranial Speech Decoding with Supervised Pretraining : Abstract: Decoding speech from brain activity has typically relied on limited neural recordings collected during short and highly controlled experiments. Here, we introduce a framework to leverage wee...
- DP-Bench: A Benchmark for Evaluating Data Product Creation Systems : Abstract: A data product is created with the intention of solving a specific problem, addressing a specific business use case or meeting a particular need, going beyond just serving data as a raw asset...
- Explainable Ethical Assessment on Human Behaviors by Generating Conflicting Social Norms : Abstract: Human behaviors are often guided or constrained by social norms, which are defined as shared, commonsense rules. For example, underlying an action "report a witnessed crime" are so...
- A Systematic Analysis of Biases in Large Language Models : Abstract: Large language models (LLMs) have rapidly become indispensable tools for acquiring information and supporting human decision-making. However, ensuring that these models uphold fairness acros...
- Evaluation of AI Ethics Tools in Language Models: A Developers' Perspective Case Study : Abstract: In Artificial Intelligence (AI), language models have gained significant importance due to the widespread adoption of systems capable of simulating realistic conversations with humans throug...
- Value Lens: Using Large Language Models to Understand Human Values : Abstract: The autonomous decision-making process, which is increasingly applied to computer systems, requires that the choices made by these systems align with human values. In this context, systems m...
- Constructive Circuit Amplification: Improving Math Reasoning in LLMs via Targeted Sub-Network Updates : Abstract: Prior studies investigating the internal workings of LLMs have uncovered sparse subnetworks, often referred to as circuits, that are responsible for performing specific tasks. Additionally, ...
- Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image : Abstract: Reward models (RMs) are essential for training large language models (LLMs), but remain underexplored for omni models that handle interleaved image and text sequences. We introduce Multimoda...
- AdaSearch: Balancing Parametric Knowledge and Search in Large Language Models via Reinforcement Learning : Abstract: Equipping large language models (LLMs) with search engines via reinforcement learning (RL) has emerged as an effective approach for building search agents. However, overreliance on search in...
- LLMCache: Layer-Wise Caching Strategies for Accelerated Reuse in Transformer Inference : Abstract: Transformer-based language models have achieved remarkable performance across a wide range of tasks, yet their high inference latency poses a significant challenge for real-time and large-sca...
- What Do Prosody and Text Convey? Characterizing How Meaningful Information is Distributed Across Multiple Channels : Abstract: Prosody -- the melody of speech -- conveys critical information often not captured by the words or text of a message. In this paper, we propose an information-theoretic approach to quantify ...
- Grammar-Forced Translation of Natural Language to Temporal Logic using LLMs : Abstract: Translating natural language (NL) into a formal language such as temporal logic (TL) is integral for human communication with robots and autonomous systems. State-of-the-art approaches decom...
- Exploration of Augmentation Strategies in Multi-modal Retrieval-Augmented Generation for the Biomedical Domain: A Case Study Evaluating Question Answering in Glycobiology : Abstract: Multi-modal retrieval-augmented generation (MM-RAG) promises grounded biomedical QA, but it is unclear when to (i) convert figures/tables into text versus (ii) use optical character recognit...
- From Facts to Conclusions : Integrating Deductive Reasoning in Retrieval-Augmented LLMs : Abstract: Retrieval-Augmented Generation (RAG) grounds large language models (LLMs) in external evidence, but fails when retrieved sources conflict or contain outdated or subjective information. Prior...
- GinSign: Grounding Natural Language Into System Signatures for Temporal Logic Translation : Abstract: Natural language (NL) to temporal logic (TL) translation enables engineers to specify, verify, and enforce system behaviors without manually crafting formal specifications-an essential capab...
- JustRL: Scaling a 1.5B LLM with a Simple RL Recipe : Abstract: Recent advances in reinforcement learning for large language models have converged on increasing complexity: multi-stage training pipelines, dynamic hyperparameter schedules, and curriculum ...
- Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics : Abstract: We introduce Refusal Steering, an inference-time method to exercise fine-grained control over Large Language Models refusal behaviour on politically sensitive topics without retraining. We r...
- UM_FHS at the CLEF 2025 SimpleText Track: Comparing No-Context and Fine-Tune Approaches for GPT-4.1 Models in Sentence and Document-Level Text Simplification : Abstract: This work describes our submission to the CLEF 2025 SimpleText track Task 1, addressing both sentence- and document-level simplification of scientific texts. The methodology centered on using ...
- Plain language adaptations of biomedical text using LLMs: Comparison of evaluation metrics : Abstract: This study investigated the application of Large Language Models (LLMs) for simplifying biomedical texts to enhance health literacy. Using a public dataset, which included plain language ada...
- Bridging the Reality Gap: Efficient Adaptation of ASR systems for Challenging Low-Resource Domains : Abstract: Automatic Speech Recognition (ASR) holds immense potential to streamline clinical documentation, such as digitizing handwritten prescriptions and reports, thereby increasing patient throughp...
- Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs : Abstract: As Large Language Models (LLMs) expand beyond text, integrating speech as a native modality has given rise to SpeechLLMs, which aim to translate spoken language directly, thereby bypassing t...
- Hacking Neural Evaluation Metrics with Single Hub Text : Abstract: Strongly human-correlated evaluation metrics serve as an essential compass for the development and improvement of generation models and must be highly reliable and robust. Recent embedding-b...
- Evaluating OpenAI GPT Models for Translation of Endangered Uralic Languages: A Comparison of Reasoning and Non-Reasoning Architectures : Abstract: The evaluation of Large Language Models (LLMs) for translation tasks has primarily focused on high-resource languages, leaving a significant gap in understanding their performance on low-res...
- Sigma-MoE-Tiny Technical Report : Abstract: Mixture-of-Experts (MoE) has emerged as a promising paradigm for foundation models due to its efficient and powerful scalability. In this work, we present Sigma-MoE-Tiny, an MoE language mod...
- LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding : Abstract: Diffusion Large Language Models (dLLMs) have demonstrated significant potential for high-speed inference. However, current confidence-driven decoding strategies are constrained by limited pa...
- An Information-Theoretic Framework for Robust Large Language Model Editing : Abstract: Large Language Models (LLMs) have become indispensable tools in science, technology, and society, enabling transformative advances across diverse fields. However, errors or outdated informat...
- Mitigating Hallucinations in Healthcare LLMs with Granular Fact-Checking and Domain-Specific Adaptation : Abstract: In healthcare, it is essential for any LLM-generated output to be reliable and accurate, particularly in cases involving decision-making and patient safety. However, the outputs are often un...
- A Domain-Adapted Pipeline for Structured Information Extraction from Police Incident Announcements on Social Media : Abstract: Structured information extraction from police incident announcements is crucial for timely and accurate data processing, yet presents considerable challenges due to the variability and infor...
- Decoding Fake Narratives in Spreading Hateful Stories: A Dual-Head RoBERTa Model with Multi-Task Learning : Abstract: Social media platforms, while enabling global connectivity, have become hubs for the rapid spread of harmful content, including hate speech and fake narratives...
- MRG-R1: Reinforcement Learning for Clinically Aligned Medical Report Generation : Abstract: Medical report generation (MRG) aims to automatically derive radiology-style reports from medical images to aid in clinical decision-making. However, existing methods often generate text tha...
- Convolutional Lie Operator for Sentence Classification : Abstract: Traditional Convolutional Neural Networks have been successful in capturing local, position-invariant features in text, but their capacity to model complex transformation within language can...
- Are We on the Right Way to Assessing LLM-as-a-Judge? : Abstract: LLM-as-a-Judge has been widely adopted as an evaluation method and served as supervised rewards in model training. However, existing benchmarks for LLM-as-a-Judge are mainly relying on human...
- Examining the Utility of Self-disclosure Types for Modeling Annotators of Social Norms : Abstract: Recent work has explored the use of personal information in the form of persona sentences or self-disclosures to improve modeling of individual characteristics and prediction of annotator la...
- BRAID: Bounded Reasoning for Autonomous Inference and Decisions : Abstract: Large Language Models (LLMs) exhibit nonlinear relationships between performance, cost, and token usage. This paper presents a quantitative study on structured prompting using BRAID (Bounded...
- TabReX : Tabular Referenceless eXplainable Evaluation : Abstract: Evaluating the quality of tables generated by large language models (LLMs) remains an open challenge: existing metrics either flatten tables into text, ignoring structure, or rely on fixed r...
- TACE: A unified Irreducible Cartesian Tensor Framework for Atomistic Machine Learning : Abstract: Here, we introduce the Tensor Atomic Cluster Expansion (TACE), a unified framework formulated entirely in Cartesian space, enabling systematic and consistent prediction of arbitrary structur...
- Memory Backdoor Attacks on Neural Networks : Abstract: Neural networks are often trained on proprietary datasets, making them attractive attack targets. We present a novel dataset extraction method leveraging an innovative training time backdoor...
- Artificial Intelligence for Microbiology and Microbiome Research : Abstract: Advancements in artificial intelligence (AI) have transformed many scientific fields, with microbiology and microbiome research now experiencing significant breakthroughs through machine lea...
- Iterative Feature Exclusion Ranking for Deep Tabular Learning : Abstract: Tabular data is a common format for storing information in rows and columns to represent data entries and their features. Although deep neural networks have become the main approach for mode...
- Provable optimal transport with transformers: The essence of depth and prompt engineering : Abstract: Despite their empirical success, the internal mechanism by which transformer models align tokens during language processing remains poorly understood. This paper provides a mechanistic and t...
- From Logits to Hierarchies: Hierarchical Clustering made Simple : Abstract: The hierarchical structure inherent in many real-world datasets makes the modeling of such hierarchies a crucial objective in both unsupervised and supervised machine learning. While recent ...
- Ensembles provably learn equivariance through data augmentation : Abstract: Recently, it was proved that group equivariance emerges in ensembles of neural networks as the result of full augmentation in the limit of infinitely wide neural networks (neural tangent ker...
- Multimodal Methods for Analyzing Learning and Training Environments: A Systematic Literature Review : Abstract: Recent technological advancements in multimodal machine learning--including the rise of large language models (LLMs)--have improved our ability to collect, process, and analyze diverse multi...
- Unsupervised discovery of the shared and private geometry in multi-view data : Abstract: Studying complex real-world phenomena often involves data from multiple views (e.g. sensor modalities or brain regions), each capturing different aspects of the underlying system. Within neu...
- DyG-Mamba: Continuous State Space Modeling on Dynamic Graphs : Abstract: Dynamic graph modeling aims to uncover evolutionary patterns in real-world systems, enabling accurate social recommendation and early detection of cancer cells. Inspired by the success of re...
- Bandits with Preference Feedback: A Stackelberg Game Perspective : Abstract: Bandits with preference feedback present a powerful tool for optimizing unknown target functions when only pairwise comparisons are allowed instead of direct value queries. This model allows...
- PILA: Physics-Informed Low Rank Augmentation for Interpretable Earth Observation : Abstract: Physically meaningful representations are essential for Earth Observation (EO), yet existing physical models are often simplified and incomplete. This leads to discrepancies between simulati...
- Models That Prove Their Own Correctness : Abstract: How can we trust the correctness of a learned model on a particular input of interest? Model accuracy is typically measured on average over a distribution of inputs, giving no guarantee for ...
- Online Bandits with (Biased) Offline Data: Adaptive Learning under Distribution Mismatch : Abstract: Traditional online learning models are typically initialized from scratch. By contrast, contemporary real-world applications often have access to historical datasets that can potentially enh...
- Optimization with Access to Auxiliary Information : Abstract: We investigate the fundamental optimization question of minimizing a target function $f$, whose gradients are expensive to compute or have limited availability, given access to some auxiliar...
- Neural networks for dengue forecasting: a systematic review : Abstract: Background: Early forecasts of dengue are an important tool for disease mitigation. Neural networks are powerful predictive models that have made contributions to many areas of public health...
- Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning : Abstract: Large language models (LLMs) with explicit reasoning capabilities excel at mathematical reasoning yet still commit process errors, such as incorrect calculations, brittle logic, and superfic...
- SFTok: Bridging the Performance Gap in Discrete Tokenizers : Abstract: Recent advances in multimodal models highlight the pivotal role of image tokenization in high-resolution image generation. By compressing images into compact latent representations, tokenize...
- In-Context Algebra : Abstract: We investigate the mechanisms that arise when transformers are trained to solve arithmetic on sequences where tokens are variables whose meaning is determined only through their interactions...
- LinkedOut: Linking World Knowledge Representation Out of Video LLM for Next-Generation Video Recommendation : Abstract: Video Large Language Models (VLLMs) unlock world-knowledge-aware video understanding through pretraining on internet-scale data and have already shown promise on tasks such as movie analysis...
- Cartesian-nj: Extending e3nn to Irreducible Cartesian Tensor Product and Contraction : Abstract: Equivariant atomistic machine learning models have brought substantial gains in both extrapolation capability and predictive accuracy. Depending on the basis of the space, two distinct types...
- PolaRiS: Scalable Real-to-Sim Evaluations for Generalist Robot Policies : Abstract: A significant challenge for robot learning research is our ability to accurately measure and compare the performance of robot policies. Benchmarking in robotics is historically challenging d...
- Learning Confidence Ellipsoids and Applications to Robust Subspace Recovery : Abstract: We study the problem of finding confidence ellipsoids for an arbitrary distribution in high dimensions. Given samples from a distribution $D$ and a confidence parameter $α$, the goal is to f...
- Pixel Seal: Adversarial-only training for invisible image and video watermarking : Abstract: Invisible watermarking is essential for tracing the provenance of digital content. However, training state-of-the-art models remains notoriously difficult, with current approaches often stru...
- On the Universal Representation Property of Spiking Neural Networks : Abstract: Inspired by biology, spiking neural networks (SNNs) process information via discrete spikes over time, offering an energy-efficient alternative to the classical computing paradigm and classi...
- ReinforceGen: Hybrid Skill Policies with Automated Data Generation and Reinforcement Learning : Abstract: Long-horizon manipulation has been a long-standing challenge in the robotics community. We propose ReinforceGen, a system that combines task decomposition, data generation, imitation learnin...
- Coordinated Anti-Jamming Resilience in Swarm Networks via Multi-Agent Reinforcement Learning : Abstract: Reactive jammers pose a severe security threat to robotic-swarm networks by selectively disrupting inter-agent communications and undermining formation integrity and mission success. Convent...
- Few-Shot Specific Emitter Identification via Integrated Complex Variational Mode Decomposition and Spatial Attention Transfer : Abstract: Specific emitter identification (SEI) utilizes passive hardware characteristics to authenticate transmitters, providing a robust physical-layer security solution. However, most deep-learning...
- Non-Linear Strong Data-Processing for Quantum Hockey-Stick Divergences : Abstract: Data-processing is a desired property of classical and quantum divergences and information measures. In information theory, the contraction coefficient measures how much the distinguishabili...
- On The Hidden Biases of Flow Matching Samplers : Abstract: We study the implicit bias of flow matching (FM) samplers via the lens of empirical flow matching. Although population FM may produce gradient-field velocities resembling optimal transport (...
- Olaf: Bringing an Animated Character to Life in the Physical World : Abstract: Animated characters often move in non-physical ways and have proportions that are far from a typical walking robot. This provides an ideal platform for innovation in both mechanical design a...
- How accurate are foundational machine learning interatomic potentials for heterogeneous catalysis? : Abstract: Foundational machine learning interatomic potentials (MLIPs) are being developed at a rapid pace, promising closer and closer approximation to ab initio accuracy. This unlocks the possibilit...
- SARMAE: Masked Autoencoder for SAR Representation Learning : Abstract: Synthetic Aperture Radar (SAR) imagery plays a critical role in all-weather, day-and-night remote sensing applications. However, existing SAR-oriented deep learning is constrained by data sc...
- Riemannian Stochastic Interpolants for Amorphous Particle Systems : Abstract: Modern generative models hold great promise for accelerating diverse tasks involving the simulation of physical systems, but they must be adapted to the specific constraints of each domain. ...
- Muon is Provably Faster with Momentum Variance Reduction : Abstract: Recent empirical research has demonstrated that deep learning optimizers based on the linear minimization oracle (LMO) over specifically chosen Non-Euclidean norm balls, such as Muon and Sci...
- Non-Asymptotic Global Convergence of PPO-Clip : Abstract: Reinforcement learning (RL) has gained attention for aligning large language models (LLMs) via reinforcement learning from human feedback (RLHF). The actor-only variants of Proximal Policy O...
- Predictive Inorganic Synthesis based on Machine Learning using Small Data sets: a case study of size-controlled Cu Nanoparticles : Abstract: Copper nanoparticles (Cu NPs) have a broad applicability, yet their synthesis is sensitive to subtle changes in reaction parameters. This sensitivity, combined with the time- and resource-in...
- A Systematic Study of Code Obfuscation Against LLM-based Vulnerability Detection : Abstract: As large language models (LLMs) are increasingly adopted for code vulnerability detection, their reliability and robustness across diverse vulnerability types have become a pressing concern....
- Pseudo-Cepstrum: Pitch Modification for Mel-Based Neural Vocoders : Abstract: This paper introduces a cepstrum-based pitch modification method that can be applied to any mel-spectrogram representation. As a result, this method is compatible with any mel-based vocoder ...
- Advantages and limitations in the use of transfer learning for individual treatment effects in causal machine learning : Abstract: Generalizing causal knowledge across diverse environments is challenging, especially when estimates from large-scale datasets must be applied to smaller or systematically different contexts,...
- Global universal approximation with Brownian signatures : Abstract: We establish $L^p$-type universal approximation theorems for general and non-anticipative functionals on suitable rough path spaces, showing that linear functionals acting on signatures of t...
- Can Transformers overcome the lack of data in the simulation of history-dependent flows? : Abstract: It is well known that the lack of information about certain variables necessary for the description of a dynamical system leads to the introduction of historical dependence (lack of Markovia...
- In-Context Probing for Membership Inference in Fine-Tuned Language Models : Abstract: Membership inference attacks (MIAs) pose a critical privacy threat to fine-tuned large language models (LLMs), especially when models are adapted to domain-specific tasks using sensitive dat...
- Pixel Super-Resolved Fluorescence Lifetime Imaging Using Deep Learning : Abstract: Fluorescence lifetime imaging microscopy (FLIM) is a powerful quantitative technique that provides metabolic and molecular contrast, offering strong translational potential for label-free, r...
- Interpretable Deep Learning for Stock Returns: A Consensus-Bottleneck Asset Pricing Model : Abstract: We introduce the Consensus-Bottleneck Asset Pricing Model (CB-APM), a partially interpretable neural network that replicates the reasoning processes of sell-side analysts by capturi...
- DAG Learning from Zero-Inflated Count Data Using Continuous Optimization : Abstract: We address network structure learning from zero-inflated count data by casting each node as a zero-inflated generalized linear model and optimizing a smooth, score-based objective under a di...
- Physics-Informed Neural Networks for Modeling the Martian Induced Magnetosphere : Abstract: Understanding the magnetic field environment around Mars and its response to upstream solar wind conditions provide key insights into the processes driving atmospheric ion escape. To date, g...
- Science Consultant Agent : Abstract: The Science Consultant Agent is a web-based Artificial Intelligence (AI) tool that helps practitioners select and implement the most effective modeling strategy for AI-based solutions. It op...
- Artificial Intelligence-Enabled Holistic Design of Catalysts Tailored for Semiconducting Carbon Nanotube Growth : Abstract: Catalyst design is crucial for materials synthesis, especially for complex reaction networks. Strategies like collaborative catalytic systems and multifunctional catalysts are effective but ...
- Staggered Batch Scheduling: Co-optimizing Time-to-First-Token and Throughput for High-Efficiency LLM Inference : Abstract: The evolution of Large Language Model (LLM) serving towards complex, distributed architectures--specifically the P/D-separated, large-scale DP+EP paradigm--introduces distinct scheduling cha...
- BayesSum: Bayesian Quadrature in Discrete Spaces : Abstract: This paper addresses the challenging computational problem of estimating intractable expectations over discrete domains. Existing approaches, including Monte Carlo and Russian Roulette estim...
- TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times : Abstract: We introduce TurboDiffusion, a video generation acceleration framework that can speed up end-to-end diffusion generation by 100-200x while maintaining video quality. TurboDiffusion mainly re...
- Scaling Text2SQL via LLM-efficient Schema Filtering with Functional Dependency Graph Rerankers : Abstract: Most modern Text2SQL systems prompt large language models (LLMs) with entire schemas -- mostly column information -- alongside the user's question. While effective on small databases, this a...
- FOD-Diff: 3D Multi-Channel Patch Diffusion Model for Fiber Orientation Distribution : Abstract: Diffusion MRI (dMRI) is a critical non-invasive technique to estimate fiber orientation distribution (FOD) for characterizing white matter integrity. Estimating FOD from single-shell low ang...
- Graph Neural Networks for Interferometer Simulations : Abstract: In recent years, graph neural networks (GNNs) have shown tremendous promise in solving problems in high energy physics, materials science, and fluid dynamics. In this work, we introduce a ne...
- Concurrence: A dependence criterion for time series, applied to biological data : Abstract: Measuring the statistical dependence between observed signals is a primary tool for scientific discovery. However, biological systems often exhibit complex non-linear interactions that curre...
- Information theory and discriminative sampling for model discovery : Abstract: Fisher information and Shannon entropy are fundamental tools for understanding and analyzing dynamical systems from complementary perspectives. They can characterize unknown parameters by qu...
- Time-Frequency Analysis for Neural Networks : Abstract: We develop a quantitative approximation theory for shallow neural networks using tools from time-frequency analysis. Working in weighted modulation spaces $M^{p,q}_m(\mathbf{R}^{d})$, we pro...
- Hierarchical Neural Surfaces for 3D Mesh Compression : Abstract: Implicit Neural Representations (INRs) have been demonstrated to achieve state-of-the-art compression of a broad range of modalities such as images, videos, 3D surfaces, and audio. Most stud...
- Social Story Frames: Contextual Reasoning about Narrative Intent and Reception : Abstract: Reading stories evokes rich interpretive, affective, and evaluative responses, such as inferences about narrative intent or judgments about characters. Yet, computational models of reader re...
- Secure AI-Driven Super-Resolution for Real-Time Mixed Reality Applications : Abstract: Immersive formats such as 360° and 6DoF point cloud videos require high bandwidth and low latency, posing challenges for real-time AR/VR streaming. This work focuses on reducing bandwidth co...
- Foundation Models in Biomedical Imaging: Turning Hype into Reality : Abstract: Foundation models (FMs) are driving a prominent shift in artificial intelligence across different domains, including biomedical imaging. These models are designed to move beyond narrow patte...
- An empirical analysis of zero-day vulnerabilities disclosed by the zero day initiative : Abstract: Zero-day vulnerabilities represent some of the most critical threats in cybersecurity, as they correspond to previously unknown flaws in software or hardware that are actively exploited befo...
- Consensus dimension reduction via multi-view learning : Abstract: A plethora of dimension reduction methods have been developed to visualize high-dimensional data in low dimensions. However, different dimension reduction methods often output different and ...
- Beyond Training: Enabling Self-Evolution of Agents with MOBIMEM : Abstract: Large Language Model (LLM) agents are increasingly deployed to automate complex workflows in mobile and desktop environments. However, current model-centric agent architectures struggle to s...
- AI Epidemiology: achieving explainable AI through expert oversight patterns : Abstract: AI Epidemiology is a framework for governing and explaining advanced AI systems by applying population-level surveillance methods to AI outputs. The approach mirrors the way in which epidemi...
- Auto-Tuning Safety Guardrails for Black-Box Large Language Models : Abstract: Large language models (LLMs) are increasingly deployed behind safety guardrails such as system prompts and content filters, especially in settings where product teams cannot modify model wei...
- Hyperparameter Tuning-Based Optimized Performance Analysis of Machine Learning Algorithms for Network Intrusion Detection : Abstract: Network Intrusion Detection Systems (NIDS) are essential for securing networks by identifying and mitigating unauthorized activities indicative of cyberattacks. As cyber threats grow increas...
- RAMBO: Reliability Analysis for Mamba through Bit-flip attack Optimization : Abstract: State-space models (SSMs), exemplified by the Mamba architecture, have recently emerged as state-of-the-art sequence-modeling frameworks, offering linear-time scalability together with stron...
- Enhanced Web User Interface Design Via Cross-Device Responsiveness Assessment Using An Improved HCI-INTEGRATED DL Schemes : Abstract: User Interface (UI) optimization is essential in the digital era to enhance user satisfaction in web environments. Nevertheless, the existing UI optimization models had overlooked the Cross-...
- Two-Step Data Augmentation for Masked Face Detection and Recognition: Turning Fake Masks to Real : Abstract: Data scarcity and distribution shift pose major challenges for masked face detection and recognition. We propose a two-step generative data augmentation framework that combines rule-based ma...
- Data-Chain Backdoor: Do You Trust Diffusion Models as Generative Data Supplier? : Abstract: The increasing use of generative models such as diffusion models for synthetic data augmentation has greatly reduced the cost of data collection and labeling in downstream perception tasks. ...
- PHANTOM: Progressive High-fidelity Adversarial Network for Threat Object Modeling : Abstract: The scarcity of cyberattack data hinders the development of robust intrusion detection systems. This paper introduces PHANTOM, a novel adversarial variational framework for generating high-f...
- Bayesian Modeling for Uncertainty Management in Financial Risk Forecasting and Compliance : Abstract: A Bayesian analytics framework that precisely quantifies uncertainty offers a significant advance for financial risk management. We develop an integrated approach that consistently enhances ...
- Anubuddhi: A Multi-Agent AI System for Designing and Simulating Quantum Optics Experiments : Abstract: We present Anubuddhi, a multi-agent AI system that designs and simulates quantum optics experiments from natural language prompts without requiring specialized programming knowledge. The sys...
- The Red Queen's Trap: Limits of Deep Evolution in High-Frequency Trading : Abstract: The integration of Deep Reinforcement Learning (DRL) and Evolutionary Computation (EC) is frequently hypothesized to be the "Holy Grail" of algorithmic trading, promising systems that adapt ...
- TinyMyo: a Tiny Foundation Model for Flexible EMG Signal Processing at the Edge : Abstract: Surface electromyography (EMG) is a non-invasive sensing modality used in several domains, including biomechanics, rehabilitation, prosthetic control, and emerging human-machine interaction ...
- Random matrix theory of sparse neuronal networks with heterogeneous timescales : Abstract: Training recurrent neuronal networks consisting of excitatory (E) and inhibitory (I) units with additive noise for working memory computation slows and diversifies inhibitory timescales, lea...
- Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward : Abstract: This paper examines the exploration-exploitation trade-off in reinforcement learning with verifiable rewards (RLVR), a framework for improving the reasoning of Large Language Models (LLMs). ...
- Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning : Abstract: Standard practice across domains from robotics to language is to first pretrain a policy on a large-scale demonstration dataset, and then finetune this policy, typically with reinforcement l...
- Impacts of Racial Bias in Historical Training Data for News AI : Abstract: AI technologies have rapidly moved into business and research applications that involve large text corpora, including computational journalism research and newsroom settings. These models, t...
- Training Together, Diagnosing Better: Federated Learning for Collagen VI-Related Dystrophies : Abstract: The application of Machine Learning (ML) to the diagnosis of rare diseases, such as collagen VI-related dystrophies (COL6-RD), is fundamentally limited by the scarcity and fragmentation of a...
- Sequencing to Mitigate Catastrophic Forgetting in Continual Learning : Abstract: To cope with real-world dynamics, an intelligent system needs to incrementally acquire, update, and exploit knowledge throughout its lifetime. This ability, known as Continual learning, prov...
- Semi-Supervised Online Learning on the Edge by Transforming Knowledge from Teacher Models : Abstract: Edge machine learning (Edge ML) enables training ML models using the vast data distributed across network edges. However, many existing approaches assume static models trained centrally and ...
- Meta-RL Induces Exploration in Language Agents : Abstract: Reinforcement learning (RL) has enabled the training of large language model (LLM) agents to interact with the environment and to solve multi-turn long-horizon tasks. However, the RL-trained...
- Tiny Recursive Control: Iterative Reasoning for Efficient Optimal Control : Abstract: Neural network controllers increasingly demand millions of parameters, and language model approaches push into the billions. For embedded aerospace systems with strict power and latency cons...
- MEPIC: Memory Efficient Position Independent Caching for LLM Serving : Abstract: Modern LLM applications such as deep-research assistants, coding agents, and Retrieval-Augmented Generation (RAG) systems, repeatedly process long prompt histories containing shared document...
- Pattern recognition in complex systems via vector-field representations of spatio-temporal data : Abstract: A complex system comprises multiple interacting entities whose interdependencies form a unified whole, exhibiting emergent behaviours not present in individual components. Examples include t...
- NRGPT: An Energy-based Alternative for GPT : Abstract: Generative Pre-trained Transformer (GPT) architectures are the most popular design for language modeling. Energy-based modeling is a different paradigm that views inference as a dynamical pr...
- Machine Learning Algorithms: Detection Official Hajj and Umrah Travel Agency Based on Text and Metadata Analysis : Abstract: The rapid digitalization of Hajj and Umrah services in Indonesia has significantly facilitated pilgrims but has concurrently opened avenues for digital fraud through counterfeit mobile appli...
- KOSS: Kalman-Optimal Selective State Spaces for Long-Term Sequence Modeling : Abstract: Recent selective state space models (SSMs), such as Mamba and Mamba-2, have demonstrated strong performance in sequence modeling owing to input-dependent selection mechanisms. However, these...
- Polyharmonic Spline Packages: Composition, Efficient Procedures for Computation and Differentiation : Abstract: In a previous paper it was shown that a machine learning regression problem can be solved within the framework of random function theory, with the optimal kernel analytically derived from sy...
- Phishing Detection System: An Ensemble Approach Using Character-Level CNN and Feature Engineering : Abstract: In actuality, phishing attacks remain one of the most prevalent cybersecurity risks in existence today, with malevolent actors constantly changing their strategies to successfully trick user...
- Towards Reproducibility in Predictive Process Mining: SPICE - A Deep Learning Library : Abstract: In recent years, Predictive Process Mining (PPM) techniques based on artificial neural networks have evolved as a method for monitoring the future behavior of unfolding business processes an...
- CLARiTy: A Vision Transformer for Multi-Label Classification and Weakly-Supervised Localization of Chest X-ray Pathologies : Abstract: The interpretation of chest X-rays (CXRs) poses significant challenges, particularly in achieving accurate multi-label pathology classification and spatial localization. These tasks demand d...
- Blog Data Showdown: Machine Learning vs Neuro-Symbolic Models for Gender Classification : Abstract: Text classification problems, such as gender classification from a blog, have been a well-matured research area that has been well studied using machine learning algorithms. It has several a...
- DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI : Abstract: The rapidly growing demand for high-quality data in Large Language Models (LLMs) has intensified the need for scalable, reliable, and semantically rich data preparation pipelines. However, c...
- Exploiting Radio Frequency Fingerprints for Device Identification: Tackling Cross-receiver Challenges in the Source-data-free Scenario : Abstract: With the rapid proliferation of edge computing, Radio Frequency Fingerprint Identification (RFFI) has become increasingly important for secure device authentication. However, practical deplo...
- Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game : Abstract: We introduce Stackelberg Learning from Human Feedback (SLHF), a new framework for preference optimization. SLHF frames the alignment problem as a sequential-move game between two policies: a...
- Abacus: Self-Supervised Event Counting-Aligned Distributional Pretraining for Sequential User Modeling : Abstract: Modeling user purchase behavior is a critical challenge in display advertising systems, necessary for real-time bidding. The difficulty arises from the sparsity of positive user events and t...
- Persistent Multiscale Density-based Clustering : Abstract: Clustering is a cornerstone of modern data analysis. Detecting clusters in exploratory data analyses (EDA) requires algorithms that make few assumptions about the data. Density-based cluster...
- Batch Normalization-Free Fully Integer Quantized Neural Networks via Progressive Tandem Learning : Abstract: Quantised neural networks (QNNs) shrink models and reduce inference energy through low-bit arithmetic, yet most still depend on a running statistics batch normalisation (BN) layer, preventin...
- IoMT-based Automated Leukemia Classification using CNN and Higher Order Singular Value : Abstract: The Internet of Things (IoT) is a concept by which objects find identity and can communicate with each other in a network. One of the applications of the IoT is in the field of medicine, whi...
- Topic Modelling Black Box Optimization : Abstract: Choosing the number of topics $T$ in Latent Dirichlet Allocation (LDA) is a key design decision that strongly affects both the statistical fit and interpretability of topic models. In this w...
- A Novel Proposal in Wind Turbine Blade Failure Detection: An Integrated Approach to Energy Efficiency and Sustainability : Abstract: This paper presents a novel methodology for detecting faults in wind turbine blades using computational learning techniques. The study evaluates two models: the first employs logistic regre...
- Emergent Bias and Fairness in Multi-Agent Decision Systems : Abstract: Multi-agent systems have demonstrated the ability to improve performance on a variety of predictive tasks by leveraging collaborative decision making. However, the lack of effective evaluati...
- Multi-Fidelity Delayed Acceptance: hierarchical MCMC sampling for Bayesian inverse problems combining multiple solvers through deep neural networks : Abstract: Inverse uncertainty quantification (UQ) tasks such as parameter estimation are computationally demanding whenever dealing with physics-based models, and typically require repeated evaluation...
- Geometric Laplace Neural Operator : Abstract: Neural operators have emerged as powerful tools for learning mappings between function spaces, enabling efficient solutions to partial differential equations across varying inputs and domain...
- NDRL: Cotton Irrigation and Nitrogen Application with Nested Dual-Agent Reinforcement Learning : Abstract: Effective irrigation and nitrogen fertilization have a significant impact on crop yield. However, existing research faces two limitations: (1) the high complexity of optimizing water-nitroge...
- Kascade: A Practical Sparse Attention Method for Long-Context LLM Inference : Abstract: Attention is the dominant source of latency during long-context LLM inference, an increasingly popular workload with reasoning models and RAG. We propose Kascade, a training-free sparse atte...
- Quantitative Verification of Fairness in Tree Ensembles : Abstract: This work focuses on quantitative verification of fairness in tree ensembles. Unlike traditional verification approaches that merely return a single counterexample when the fairness is viola...
- Multivariate Uncertainty Quantification with Tomographic Quantile Forests : Abstract: Quantifying predictive uncertainty is essential for safe and trustworthy real-world AI deployment. Yet, fully nonparametric estimation of conditional distributions remains challenging for mu...
- Pretrained Battery Transformer (PBT): A battery life prediction foundation model : Abstract: Early prediction of battery cycle life is essential for accelerating battery research, manufacturing, and deployment. Although machine learning methods have shown encouraging results, progre...
- Feature-Selective Representation Misdirection for Machine Unlearning : Abstract: As large language models (LLMs) are increasingly adopted in safety-critical and regulated sectors, the retention of sensitive or prohibited knowledge introduces escalating risks, ranging fro...
- CKA-Guided Modular Quantization: Beyond Bit-Width to Algorithmic Diversity : Abstract: Current mainstream post-training quantization methods for large language models typically apply a uniform quantization strategy across all network layers, overlooking the substantial differe...
- Sharpness-aware Second-order Latent Factor Model for High-dimensional and Incomplete Data : Abstract: Second-order Latent Factor (SLF) model, a class of low-rank representation learning methods, has proven effective at extracting node-to-node interaction patterns from High-dimensional and In...
- Sharpness-aware Federated Graph Learning : Abstract: One of many impediments to applying graph neural networks (GNNs) to large-scale real-world graph data is the challenge of centralized training, which requires aggregating data from different...
- Coarse-to-Fine Open-Set Graph Node Classification with Large Language Models : Abstract: Developing open-set classification methods capable of classifying in-distribution (ID) data while detecting out-of-distribution (OOD) samples is essential for deploying graph neural networks...
- Neural emulation of gravity-driven geohazard runout : Abstract: Predicting geohazard runout is critical for protecting lives, infrastructure and ecosystems. Rapid mass flows, including landslides and avalanches, cause several thousand deaths across a wid...
- Explicit and Non-asymptotic Query Complexities of Rank-Based Zeroth-order Algorithms on Smooth Functions : Abstract: Rank-based zeroth-order (ZO) optimization -- which relies only on the ordering of function evaluations -- offers strong robustness to noise and monotone transformations, and underlies many s...
- A Multi-scale Fused Graph Neural Network with Inter-view Contrastive Learning for Spatial Transcriptomics Data Clustering : Abstract: Spatial transcriptomics enables genome-wide expression analysis within native tissue context, yet identifying spatial domains remains challenging due to complex gene-spatial interactions. Ex...
- A Multimodal Approach to Alzheimer's Diagnosis: Geometric Insights from Cube Copying and Cognitive Assessments : Abstract: Early and accessible detection of Alzheimer's disease (AD) remains a critical clinical challenge, and cube-copying tasks offer a simple yet informative assessment of visuospatial function. T...
- INTELLECT-3: Technical Report : Abstract: We present INTELLECT-3, a 106B-parameter Mixture-of-Experts model (12B active) trained with large-scale reinforcement learning on our end-to-end RL infrastructure stack. INTELLECT-3 achieves...
- Dual-View Inference Attack: Machine Unlearning Amplifies Privacy Exposure : Abstract: Machine unlearning is a newly popularized technique for removing specific training data from a trained model, enabling it to comply with data deletion requests. While it protects the rights ...
- BUILD with Precision: Bottom-Up Inference of Linear DAGs : Abstract: Learning the structure of directed acyclic graphs (DAGs) from observational data is a central problem in causal discovery, statistical signal processing, and machine learning. Under a linear...
- AIMM: An AI-Driven Multimodal Framework for Detecting Social-Media-Influenced Stock Market Manipulation : Abstract: Market manipulation now routinely originates from coordinated social media campaigns, not isolated trades. Retail investors, regulators, and brokerages need tools that connect online narrati...
- Privacy Blur: Quantifying Privacy and Utility for Image Data Release : Abstract: Image data collected in the wild often contains private information such as faces and license plates, and responsible data release must ensure that this information stays hidden. At the same...
- In-Context Multi-Operator Learning with DeepOSets : Abstract: In-context Learning (ICL) is the remarkable capability displayed by some machine learning models to learn from examples in a prompt, without any further weight updates. ICL had originally be...
- CauSTream: Causal Spatio-Temporal Representation Learning for Streamflow Forecasting : Abstract: Streamflow forecasting is crucial for water resource management and risk mitigation. While deep learning models have achieved strong predictive performance, they often overlook underlying ph...
- Explainable AI in Big Data Fraud Detection : Abstract: Big Data has become central to modern applications in finance, insurance, and cybersecurity, enabling machine learning systems to perform large-scale risk assessments and fraud detection. Ho...
- Techno-economic optimization of a heat-pipe microreactor, part I: theory and cost optimization : Abstract: Microreactors, particularly heat-pipe microreactors (HPMRs), are compact, transportable, self-regulated power systems well-suited for access-challenged remote areas where costly fossil fuels...
- Towards Fine-Tuning-Based Site Calibration for Knowledge-Guided Machine Learning: A Summary of Results : Abstract: Accurate and cost-effective quantification of the agroecosystem carbon cycle at decision-relevant scales is essential for climate mitigation and sustainable agriculture. However, both transf...
- Surrogate Neural Architecture Codesign Package (SNAC-Pack) : Abstract: Neural Architecture Search is a powerful approach for automating model design, but existing methods struggle to accurately optimize for real hardware performance, often relying on proxy metr...
- Higher-Order LaSDI: Reduced Order Modeling with Multiple Time Derivatives : Abstract: Solving complex partial differential equations is vital in the physical sciences, but often requires computationally expensive numerical methods. Reduced-order models (ROMs) address this by ...
- Provably Extracting the Features from a General Superposition : Abstract: It is widely believed that complex machine learning models generally encode features through linear representations, but these features exist in superposition, making them challenging to rec...
- Dynamic Rank Reinforcement Learning for Adaptive Low-Rank Multi-Head Self Attention in Large Language Models : Abstract: We propose Dynamic Rank Reinforcement Learning (DR-RL), a novel framework that adaptively optimizes the low-rank factorization of Multi-Head Self-Attention (MHSA) in Large Language Models (L...
- Tracking Wildfire Assets with Commodity RFID and Gaussian Process Modeling : Abstract: This paper presents a novel, cost-effective, and scalable approach to track numerous assets distributed in forested environments using commodity Radio Frequency Identification (RFID) targeti...
- Governance by Evidence: Regulated Predictors in Decision-Tree Models : Abstract: Decision-tree methods are widely used on structured tabular data and are valued for interpretability across many sectors. However, published studies often list the predictors they use (for e...
- AIE4ML: An End-to-End Framework for Compiling Neural Networks for the Next Generation of AMD AI Engines : Abstract: Efficient AI inference on AMD's Versal AI Engine (AIE) is challenging due to tightly coupled VLIW execution, explicit datapaths, and local memory management. Prior work focused on first-gene...
- SALVE: Sparse Autoencoder-Latent Vector Editing for Mechanistic Control of Neural Networks : Abstract: Deep neural networks achieve impressive performance but remain difficult to interpret and control. We present SALVE (Sparse Autoencoder-Latent Vector Editing), a unified "discover, validate,...
- In-Context Semi-Supervised Learning : Abstract: There has been significant recent interest in understanding the capacity of Transformers for in-context learning (ICL), yet most theory focuses on supervised settings with explicitly labeled...
- BarcodeMamba+: Advancing State-Space Models for Fungal Biodiversity Research : Abstract: Accurate taxonomic classification from DNA barcodes is a cornerstone of global biodiversity monitoring, yet fungi present extreme challenges due to sparse labelling and long-tailed taxa dist...
- DSO: Direct Steering Optimization for Bias Mitigation : Abstract: Generative models are often deployed to make decisions on behalf of users, such as vision-language models (VLMs) identifying which person in a room is a doctor to help visually impaired indi...
- A Unification of Discrete, Gaussian, and Simplicial Diffusion : Abstract: To model discrete sequences such as DNA, proteins, and language using diffusion, practitioners must choose between three major methods: diffusion in discrete space, Gaussian diffusion in Euc...
- Introduction to Symbolic Regression in the Physical Sciences : Abstract: Symbolic regression (SR) has emerged as a powerful method for uncovering interpretable mathematical relationships from data, offering a novel route to both scientific discovery and efficient...
- Boosting t-SNE Efficiency for Sequencing Data: Insights from Kernel Selection : Abstract: Dimensionality reduction techniques are essential for visualizing and analyzing high-dimensional biological sequencing data. t-distributed Stochastic Neighbor Embedding (t-SNE) is widely use...
- Adversarial Robustness in Financial Machine Learning: Defenses, Economic Impact, and Governance Evidence : Abstract: We evaluate adversarial robustness in tabular machine learning models used in financial decision making. Using credit scoring and fraud detection data, we apply gradient based attacks and me...
- TS-DP: Reinforcement Speculative Decoding For Temporal Adaptive Diffusion Policy Acceleration : Abstract: Diffusion Policy (DP) excels in embodied control but suffers from high inference latency and computational cost due to multiple iterative denoising steps. The temporal complexity of embodied...
- TENG++: Time-Evolving Natural Gradient for Solving PDEs With Deep Neural Nets under General Boundary Conditions : Abstract: Partial Differential Equations (PDEs) are central to modeling complex systems across physical, biological, and engineering domains, yet traditional numerical methods often struggle with high...
- Bridging Data and Physics: A Graph Neural Network-Based Hybrid Twin Framework : Abstract: Simulating complex unsteady physical phenomena relies on detailed mathematical models, simulated for instance by using the Finite Element Method (FEM). However, these models often exhibit di...
- Data Valuation for LLM Fine-Tuning: Efficient Shapley Value Approximation via Language Model Arithmetic : Abstract: Data is a critical asset for training large language models (LLMs), alongside compute resources and skilled workers. While some training data is publicly available, substantial investment is...
- AdaGradSelect: An adaptive gradient-guided layer selection method for efficient fine-tuning of SLMs : Abstract: Large Language Models (LLMs) can perform many NLP tasks well, but fully fine-tuning them is expensive and requires a lot of memory. Parameter-Efficient Fine-Tuning (PEFT) methods such as LoR...
- Cross-Sample Augmented Test-Time Adaptation for Personalized Intraoperative Hypotension Prediction : Abstract: Intraoperative hypotension (IOH) poses significant surgical risks, but accurate prediction remains challenging due to patient-specific variability. While test-time adaptation (TTA) offers a ...
- Machine Learning Framework for Thrombosis Risk Prediction in Rotary Blood Pumps : Abstract: Thrombosis in rotary blood pumps arises from complex flow conditions that remain difficult to translate into reliable and interpretable risk predictions using existing computational models. ...
- A Tutorial on Dimensionless Learning: Geometric Interpretation and the Effect of Noise : Abstract: Dimensionless learning is a data-driven framework for discovering dimensionless numbers and scaling laws from experimental measurements. This tutorial introduces the method, explaining how i...
- Semantic-Constrained Federated Aggregation: Convergence Theory and Privacy-Utility Bounds for Knowledge-Enhanced Distributed Learning : Abstract: Federated learning enables collaborative model training across distributed data sources but suffers from slow convergence under non-IID data conditions. Existing solutions employ algorithmic...
- Yantra AI -- An intelligence platform which interacts with manufacturing operations : Abstract: Industry 4.0 is growing quickly, which has changed smart production by encouraging the use of real-time tracking, machine learning, and AI-driven systems to make operations run more smoothly...
- Twin Restricted Kernel Machines for Multiview Classification : Abstract: Multi-view learning (MVL) is an emerging field in machine learning that focuses on improving generalization performance by leveraging complementary information from multiple perspectives or ...
- ReactorFold: Generative discovery of nuclear reactor cores via emergent physical reasoning : Abstract: Designing nuclear reactor cores requires navigating large discrete design spaces governed by complex neutronic interactions. Traditional deterministic, metaheuristic, and machine-learning-as...
- KAN-Matrix: Visualizing Nonlinear Pairwise and Multivariate Contributions for Physical Insight : Abstract: Interpreting complex datasets remains a major challenge for scientists, particularly due to high dimensionality and collinearity among variables. We introduce a novel application of Kolmogor...
- TAO-Net: Two-stage Adaptive OOD Classification Network for Fine-grained Encrypted Traffic Classification : Abstract: Encrypted traffic classification aims to identify applications or services by analyzing network traffic data. One of the critical challenges is the continuous emergence of new applications, ...
- GLOW: Graph-Language Co-Reasoning for Agentic Workflow Performance Prediction : Abstract: Agentic Workflows (AWs) have emerged as a promising paradigm for solving complex tasks. However, the scalability of automating their generation is severely constrained by the high cost and l...
- A Special Case of Quadratic Extrapolation Under the Neural Tangent Kernel : Abstract: It has been demonstrated both theoretically and empirically that the ReLU MLP tends to extrapolate linearly for an out-of-distribution evaluation point. The machine learning literature provi...
- Surely Large Multimodal Models (Don't) Excel in Visual Species Recognition? : Abstract: Visual Species Recognition (VSR) is pivotal to biodiversity assessment and conservation, evolution research, and ecology and ecosystem management. Training a machine-learned model for VSR ty...
- D3G: Diverse Demographic Data Generation Increases Zero-Shot Image Classification Accuracy within Multimodal Models : Abstract: Image classification is a task essential for machine perception to achieve human-level image understanding. Multimodal models such as CLIP have been able to perform well on this task by lear...
- A Unified Generative-Predictive Framework for Deterministic Inverse Design : Abstract: Inverse design of heterogeneous material microstructures is a fundamentally ill-posed and famously computationally expensive problem. This is exacerbated by the high-dimensional design space...
- LLaDA2.0: Scaling Up Diffusion Language Models to 100B : Abstract: This paper presents LLaDA2.0 -- a tuple of discrete diffusion large language models (dLLM) scaling up to 100B total parameters through systematic conversion from auto-regressive (AR) models ...
- How Do Graph Signals Affect Recommendation: Unveiling the Mystery of Low and High-Frequency Graph Signals : Abstract: Spectral graph neural networks (GNNs) are highly effective in modeling graph signals, with their success in recommendation often attributed to low-pass filtering. However, recent studies hig...
- SHARe-KAN: Holographic Vector Quantization for Memory-Bound Inference : Abstract: Kolmogorov-Arnold Networks (KANs) face a fundamental memory wall: their learned basis functions create parameter counts that impose extreme bandwidth demands, hindering deployment in memory-...
- Hybrid Quantum-Classical Ensemble Learning for S&P 500 Directional Prediction : Abstract: Financial market prediction is a challenging application of machine learning, where even small improvements in directional accuracy can yield substantial value. Most models struggle to excee...
- DiscoverDCP: A Data-Driven Approach for Construction of Disciplined Convex Programs via Symbolic Regression : Abstract: We propose DiscoverDCP, a data-driven framework that integrates symbolic regression with the rule sets of Disciplined Convex Programming (DCP) to perform system identification. By enforcing ...
- Embodied Co-Design for Rapidly Evolving Agents: Taxonomy, Frontiers, and Challenges : Abstract: Brain-body co-evolution enables animals to develop complex behaviors in their environments. Inspired by this biological synergy, embodied co-design (ECD) has emerged as a transformative para...
- Enigma: Application-Layer Privacy for Quantum Optimization on Untrusted Computers : Abstract: The Early Fault-Tolerant (EFT) era is emerging, where modest Quantum Error Correction (QEC) can enable quantum utility before full-scale fault tolerance. Quantum optimization is a leading ca...
- RPM-MCTS: Knowledge-Retrieval as Process Reward Model with Monte Carlo Tree Search for Code Generation : Abstract: Tree search-based methods have made significant progress in enhancing the code generation capabilities of large language models. However, due to the difficulty in effectively evaluating inte...
- TrafficGamer: Reliable and Flexible Traffic Simulation for Safety-Critical Scenarios with Game-Theoretic Oracles : Abstract: While modern Autonomous Vehicle (AV) systems can develop reliable driving policies under regular traffic conditions, they frequently struggle with safety-critical traffic scenarios. This dif...
- BashArena: A Control Setting for Highly Privileged AI Agents : Abstract: Future AI agents might run autonomously with elevated privileges. If these agents are misaligned, they might abuse these privileges to cause serious damage. The field of AI control develops ...
- BERT and CNN integrated Neural Collaborative Filtering for Recommender Systems : Abstract: Every day, a significant number of users visit the internet for different needs. The owners of a website generate profits from the user interaction with the contents or items of the website....
- How Do Semantically Equivalent Code Transformations Impact Membership Inference on LLMs for Code? : Abstract: The success of large language models for code relies on vast amounts of code data, including public open-source repositories, such as GitHub, and private, confidential code from companies. T...
- On Assessing the Relevance of Code Reviews Authored by Generative Models : Abstract: The use of large language models like ChatGPT in code review offers promising efficiency gains but also raises concerns about correctness and safety. Existing evaluation methods for code rev...
- Exploring User Acceptance and Concerns toward LLM-powered Conversational Agents in Immersive Extended Reality : Abstract: The rapid development of generative artificial intelligence (AI) and large language models (LLMs), and the availability of services that make them accessible, have led the general public to ...
- Managing Ambiguity: A Proof of Concept of Human-AI Symbiotic Sense-making based on Quantum-Inspired Cognitive Mechanism of Rogue Variable Detection : Abstract: Organizations increasingly operate in environments characterized by volatility, uncertainty, complexity, and ambiguity (VUCA), where early indicators of change often emerge as weak, fragment...
- Graph Pattern-based Association Rules Evaluated Under No-repeated-anything Semantics in the Graph Transactional Setting : Abstract: We introduce graph pattern-based association rules (GPARs) for directed labeled multigraphs such as RDF graphs. GPARs support both generative tasks, where a graph is extended, and evaluative...
- VLA-AN: An Efficient and Onboard Vision-Language-Action Framework for Aerial Navigation in Complex Environments : Abstract: This paper proposes VLA-AN, an efficient and onboard Vision-Language-Action (VLA) framework dedicated to autonomous drone navigation in complex environments. VLA-AN addresses four major limi...
- Governing rapid technological change: Policy Delphi on the future of European AI governance : Abstract: The rapid advancements in artificial intelligence (AI) present unique challenges for policymakers that seek to govern the technology. In this context, the Delphi method has become an establi...
- Offline Multi-Task Multi-Objective Data-Driven Evolutionary Algorithm with Language Surrogate Model and Implicit Q-Learning : Abstract: Data-driven evolutionary algorithms have shown surprising results in addressing expensive optimization problems through robust surrogate modeling. Though promising, existing surrogate modelin...
- HD-Prot: A Protein Language Model for Joint Sequence-Structure Modeling with Continuous Structure Tokens : Abstract: Proteins inherently possess a consistent sequence-structure duality. The abundance of protein sequence data, which can be readily represented as discrete tokens, has driven fruitful developm...
- QoS-Aware Hierarchical Reinforcement Learning for Joint Link Selection and Trajectory Optimization in SAGIN-Supported UAV Mobility Management : Abstract: Due to the significant variations in unmanned aerial vehicle (UAV) altitude and horizontal mobility, it becomes difficult for any single network to ensure continuous and reliable threedimens...
- "I am here for you": How relational conversational AI appeals to adolescents, especially those who are socially and emotionally vulnerable : Abstract: General-purpose conversational AI chatbots and AI companions increasingly provide young adolescents with emotionally supportive conversations, raising questions about how conversational styl...
- Restless Multi-Process Multi-Armed Bandits with Applications to Self-Driving Microscopies : Abstract: High-content screening microscopy generates large amounts of live-cell imaging data, yet its potential remains constrained by the inability to determine when and where to image most effectiv...
- Penetration Testing of Agentic AI: A Comparative Security Analysis Across Models and Frameworks : Abstract: Agentic AI introduces security vulnerabilities that traditional LLM safeguards fail to address. Although recent work by Unit 42 at Palo Alto Networks demonstrated that ChatGPT-4o successfull...
- A Roadmap for Applying Graph Neural Networks to Numerical Data: Insights from Cementitious Materials : Abstract: Machine learning (ML) has been increasingly applied in concrete research to optimize performance and mixture design. However, one major challenge in applying ML to cementitious materials is ...
- MALCDF: A Distributed Multi-Agent LLM Framework for Real-Time Cyber : Abstract: Traditional, centralized security tools often miss adaptive, multi-vector attacks. We present the Multi-Agent LLM Cyber Defense Framework (MALCDF), a practical setup where four large languag...
- Let the Barbarians In: How AI Can Accelerate Systems Performance Research : Abstract: Artificial Intelligence (AI) is beginning to transform the research process by automating the discovery of new solutions. This shift depends on the availability of reliable verifiers, which ...
- Sharing State Between Prompts and Programs : Abstract: The rise of large language models (LLMs) has introduced a new type of programming: natural language programming. By writing prompts that direct LLMs to perform natural language processing, c...
- Privacy-Preserving Feature Valuation in Vertical Federated Learning Using Shapley-CMI and PSI Permutation : Abstract: Federated Learning (FL) is an emerging machine learning paradigm that enables multiple parties to collaboratively train models without sharing raw data, ensuring data privacy. In Vertical FL...
- Scaling Causal Mediation for Complex Systems: A Framework for Root Cause Analysis : Abstract: Modern operational systems ranging from logistics and cloud infrastructure to industrial IoT, are governed by complex, interdependent processes. Understanding how interventions propagate thr...
- Workflows vs Agents for Code Translation : Abstract: Translating algorithms from high-level languages like MATLAB to hardware description languages (HDLs) is a resource-intensive but necessary step for deployment on FPGAs and ASICs. While larg...
- CODE ACROSTIC: Robust Watermarking for Code Generation : Abstract: Watermarking large language models (LLMs) is vital for preventing their misuse, including the fabrication of fake news, plagiarism, and spam. It is especially important to watermark LLM-gene...
- Cyberswarm: a novel swarm intelligence algorithm inspired by cyber community dynamics : Abstract: Recommendation systems face challenges in dynamically adapting to evolving user preferences and interactions within complex social networks. Traditional approaches often fail to account for ...
- One Leak Away: How Pretrained Model Exposure Amplifies Jailbreak Risks in Finetuned LLMs : Abstract: Finetuning pretrained large language models (LLMs) has become the standard paradigm for developing downstream applications. However, its security implications remain unclear, particularly re...
- Multiscale Cross-Modal Mapping of Molecular, Pathologic, and Radiologic Phenotypes in Lipid-Deficient Clear Cell Renal Cell Carcinoma : Abstract: Clear cell renal cell carcinoma (ccRCC) exhibits extensive intratumoral heterogeneity on multiple biological scales, contributing to variable clinical outcomes and limiting the effectiveness...
- Factor(U,T): Controlling Untrusted AI by Monitoring their Plans : Abstract: As AI capabilities advance, we increasingly rely on powerful models to decompose complex tasks – but what if the decomposer itself is malicious? Factored cognition protocols ...
- VERAFI: Verified Agentic Financial Intelligence through Neurosymbolic Policy Generation : Abstract: Financial AI systems suffer from a critical blind spot: while Retrieval-Augmented Generation (RAG) excels at finding relevant documents, language models still generate calculation errors and...
- Persistent Backdoor Attacks under Continual Fine-Tuning of LLMs : Abstract: Backdoor attacks embed malicious behaviors into Large Language Models (LLMs), enabling adversaries to trigger harmful outputs or bypass safety controls. However, the persistence of the impla...
- Zero-Knowledge Audit for Internet of Agents: Privacy-Preserving Communication Verification with Model Context Protocol : Abstract: Existing agent communication frameworks face critical limitations in providing verifiable audit trails without compromising the privacy and confidentiality of agent interactions. The protect...
- Promoting Fairness in Information Access within Social Networks : Abstract: The advent of online social networks has facilitated fast and wide spread of information. However, some users, especially members of minority groups, may be less likely to receive informatio...
- Tourists Profiling by Interest Analysis : Abstract: With the recent digital revolution, the analysis of tourists' behaviors and the research fields associated with it have changed profoundly. It is now easier to examine behaviors of tourists using d...
- Algorithmic Criminal Liability in Greenwashing: Comparing India, United States, and European Union : Abstract: AI-powered greenwashing has emerged as an insidious challenge within corporate sustainability governance, exacerbating the opacity of environmental disclosures and subverting regulatory over...
- Artism: AI-Driven Dual-Engine System for Art Generation and Critique : Abstract: This paper proposes a dual-engine AI architectural method designed to address the complex problem of exploring potential trajectories in the evolution of art. We present two interconnected c...
- Stepwise Think-Critique: A Unified Framework for Robust and Interpretable LLM Reasoning : Abstract: Human beings solve complex problems through critical thinking, where reasoning and evaluation are intertwined to converge toward correct solutions. However, most existing large language mode...
- A Decision-Theoretic Approach for Managing Misalignment : Abstract: When should we delegate decisions to AI systems? While the value alignment literature has developed techniques for shaping AI values, less attention has been paid to how to determine, under ...
- Nemotron-Math: Efficient Long-Context Distillation of Mathematical Reasoning from Multi-Mode Supervision : Abstract: High-quality mathematical reasoning supervision requires diverse reasoning styles, long-form traces, and effective tool integration, capabilities that existing datasets provide only in limit...
- Intent-Driven UAM Rescheduling : Abstract: Due to the restricted resources, efficient scheduling in vertiports has received much more attention in the field of Urban Air Mobility (UAM). For the scheduling problem, we utilize a Mixed ...
- Outer-Learning Framework for Playing Multi-Player Trick-Taking Card Games: A Case Study in Skat : Abstract: In multi-player card games such as Skat or Bridge, the early stages of the game, such as bidding, game selection, and initial card selection, are often more critical to the success of the pl...
- Bilateral Spatial Reasoning about Street Networks: Graph-based RAG with Qualitative Spatial Representations : Abstract: This paper deals with improving the capabilities of Large Language Models (LLM) to provide route instructions for pedestrian wayfinders by means of qualitative spatial relations.
- SCOPE: Prompt Evolution for Enhancing Agent Effectiveness : Abstract: Large Language Model (LLM) agents are increasingly deployed in environments that generate massive, dynamic contexts. However, a critical bottleneck remains: while agents have access to this ...
- Graph Contextual Reinforcement Learning for Efficient Directed Controller Synthesis : Abstract: Controller synthesis is a formal method approach for automatically generating Labeled Transition System (LTS) controllers that satisfy specified properties. The efficiency of the synthesis p...
- CangLing-KnowFlow: A Unified Knowledge-and-Flow-fused Agent for Comprehensive Remote Sensing Applications : Abstract: The automated and intelligent processing of massive remote sensing (RS) datasets is critical in Earth observation (EO). Existing automated systems are normally task-specific, lacking a unifi...
- A Clustering-Based Variable Ordering Framework for Relaxed Decision Diagrams for Maximum Weighted Independent Set Problem : Abstract: Efficient exact algorithms for Discrete Optimization (DO) rely heavily on strong primal and dual bounds. Relaxed Decision Diagrams (DDs) provide a versatile mechanism for deriving such dual ...
- Beyond Fast and Slow: Cognitive-Inspired Elastic Reasoning for Large Language Models : Abstract: Large language models (LLMs) have demonstrated impressive performance across various language tasks. However, existing LLM reasoning strategies mainly rely on the LLM itself with fast or slo...
- Agentic AI for Integrated Sensing and Communication: Analysis, Framework, and Case Study : Abstract: Integrated sensing and communication (ISAC) has emerged as a key development direction in the sixth-generation (6G) era, which provides essential support for the collaborative sensing and co...
- LADY: Linear Attention for Autonomous Driving Efficiency without Transformers : Abstract: End-to-end paradigms have demonstrated great potential for autonomous driving. Additionally, most existing methods are built upon Transformer architectures. However, transformers incur a qua...
- Beyond Accuracy: A Geometric Stability Analysis of Large Language Models in Chess Evaluation : Abstract: The evaluation of Large Language Models (LLMs) in complex reasoning domains typically relies on performance alignment with ground-truth oracles. In the domain of chess, this standard manifes...
- AgroAskAI: A Multi-Agentic AI Framework for Supporting Smallholder Farmers' Enquiries Globally : Abstract: Agricultural regions in rural areas face damage from climate-related risks, including droughts, heavy rainfall, and shifting weather patterns. Prior research calls for adaptive risk-manageme...
- IaC Generation with LLMs: An Error Taxonomy and A Study on Configuration Knowledge Injection : Abstract: Large Language Models (LLMs) currently exhibit low success rates in generating correct and intent-aligned Infrastructure as Code (IaC). This research investigated methods to improve LLM-base...
Research Sources: 391 | Generated: 12/19/2025
