AI RESEARCH PAPERS & ACADEMIC SOURCES
- From Pretraining to Privacy: Federated Ultrasound Foundation Model with Self-Supervised Learning : Abstract: Ultrasound imaging is widely used in clinical diagnosis due to its non-invasive nature and real-time capabilities. However, traditional ultrasound diagnostics relies heavily on physician exp...
- If you can describe it, they can see it: Cross-Modal Learning of Visual Concepts from Textual Descriptions : Abstract: Humans can visualize new and unknown concepts from their natural language description, based on their experience and previous knowledge. Inspired by this, we present a way to extend this ab...
- Toward Robust and Accurate Adversarial Camouflage Generation against Vehicle Detectors : Abstract: Adversarial camouflage is a widely used physical attack against vehicle detectors for its superiority in multi-view attack performance. One promising approach involves using differentiable n...
- Cascaded Dual Vision Transformer for Accurate Facial Landmark Detection : Abstract: Facial landmark detection is a fundamental problem in computer vision for many downstream applications. This paper introduces a new facial landmark detector based on vision transformers, whi...
- SynJAC: Synthetic-data-driven Joint-granular Adaptation and Calibration for Domain Specific Scanned Document Key Information Extraction : Abstract: Visually Rich Documents (VRDs), comprising elements such as charts, tables, and paragraphs, convey complex information across diverse domains. However, extracting key information from these ...
- ASSR-NeRF: Arbitrary-Scale Super-Resolution on Voxel Grid for High-Quality Radiance Fields Reconstruction : Abstract: NeRF-based methods reconstruct 3D scenes by building a radiance field with implicit or explicit representations. While NeRF-based methods can perform novel view synthesis (NVS) at arbitrary ...
- DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving : Abstract: Large language models (LLMs) have opened up new possibilities for intelligent agents, endowing them with human-like thinking and cognitive abilities. In this work, we delve into the potentia...
- MiVLA: Towards Generalizable Vision-Language-Action Model with Human-Robot Mutual Imitation Pre-training : Abstract: While leveraging abundant human videos and simulated robot data poses a scalable solution to the scarcity of real-world robot data, the generalization capability of existing vision-language-...
- A Preprocessing Framework for Video Machine Vision under Compression : Abstract: There has been a growing trend in compressing and transmitting videos from terminals for machine vision tasks. Nevertheless, most video coding optimization methods focus on minimizing distort...
- Generative Preprocessing for Image Compression with Pre-trained Diffusion Models : Abstract: Preprocessing is a well-established technique for optimizing compression, yet existing methods are predominantly Rate-Distortion (R-D) optimized and constrained by pixel-level fidelity. This...
- EPSM: A Novel Metric to Evaluate the Safety of Environmental Perception in Autonomous Driving : Abstract: Extensive evaluation of perception systems is crucial for ensuring the safety of intelligent vehicles in complex driving scenarios. Conventional performance metrics such as precision, recall...
- BEV-Patch-PF: Particle Filtering with BEV-Aerial Feature Matching for Off-Road Geo-Localization : Abstract: We propose BEV-Patch-PF, a GPS-free sequential geo-localization system that integrates a particle filter with learned bird's-eye-view (BEV) and aerial feature maps. From onboard RGB and dept...
- Meta-learners for few-shot weakly-supervised optic disc and cup segmentation on fundus images : Abstract: This study develops meta-learners for few-shot weakly-supervised segmentation (FWS) to address the challenge of optic disc (OD) and optic cup (OC) segmentation for glaucoma diagnosis with li...
- A Gaussian Parameterization for Direct Atomic Structure Identification in Electron Tomography : Abstract: Atomic electron tomography (AET) enables the determination of 3D atomic structures by acquiring a sequence of 2D tomographic projection measurements of a particle and then computationally so...
- Artificial Intelligence for the Assessment of Peritoneal Carcinosis during Diagnostic Laparoscopy for Advanced Ovarian Cancer : Abstract: Advanced Ovarian Cancer (AOC) is often diagnosed at an advanced stage with peritoneal carcinosis (PC). Fagotti score (FS) assessment at diagnostic laparoscopy (DL) guides treatment planning ...
- PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents : Abstract: This paper proposes PyFi, a novel framework for pyramid-like financial image understanding that enables vision language models (VLMs) to reason through question chains in a progressive, simp...
- Spatia: Video Generation with Updatable Spatial Memory : Abstract: Existing video generation models struggle to maintain long-term spatial and temporal consistency due to the dense, high-dimensional nature of video signals. To overcome this limitation, we p...
- In Pursuit of Pixel Supervision for Visual Pre-training : Abstract: At the most basic level, pixels are the source of the visual information through which we perceive the world. Pixels contain information at all levels, ranging from low-level attributes to h...
- DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models : Abstract: In recent multimodal research, the diffusion paradigm has emerged as a promising alternative to the autoregressive paradigm (AR), owing to its unique decoding advantages. However, due to the...
- Gaussian Pixel Codec Avatars: A Hybrid Representation for Efficient Rendering : Abstract: We present Gaussian Pixel Codec Avatars (GPiCA), photorealistic head avatars that can be generated from multi-view images and efficiently rendered on mobile devices. GPiCA utilizes a unique ...
- Multi-View Foundation Models : Abstract: Foundation models are vital tools in various Computer Vision applications. They take as input a single RGB image and output a deep feature representation that is useful for various applicati...
- GateFusion: Hierarchical Gated Cross-Modal Fusion for Active Speaker Detection : Abstract: Active Speaker Detection (ASD) aims to identify who is currently speaking in each frame of a video. Most state-of-the-art approaches rely on late fusion to combine visual and audio features,...
- End-to-End Training for Autoregressive Video Diffusion via Self-Resampling : Abstract: Autoregressive video diffusion models hold promise for world simulation but are vulnerable to exposure bias arising from the train-test mismatch. While recent works address this via post-tra...
- VLIC: Vision-Language Models As Perceptual Judges for Human-Aligned Image Compression : Abstract: Evaluations of image compression performance which include human preferences have generally found that naive distortion functions such as MSE are insufficiently aligned to human perception. ...
- Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning : Abstract: The misuse of AI-driven video generation technologies has raised serious social concerns, highlighting the urgent need for reliable AI-generated video detectors. However, most existing metho...
- Hard Labels In! Rethinking the Role of Hard Labels in Mitigating Local Semantic Drift : Abstract: Soft labels generated by teacher models have become a dominant paradigm for knowledge transfer and recent large-scale dataset distillation such as SRe2L, RDED, LPLD, offering richer supervis...
- InpaintDPO: Mitigating Spatial Relationship Hallucinations in Foreground-conditioned Inpainting via Diverse Preference Optimization : Abstract: Foreground-conditioned inpainting, which aims at generating a harmonious background for a given foreground subject based on the text prompt, is an important subfield in controllable image ge...
- IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning : Abstract: We propose IC-Effect, an instruction-guided, DiT-based framework for few-shot video VFX editing that synthesizes complex effects (e.g., flames, particles and cartoon characters) while...
- Towards Physically-Based Sky-Modeling For Image Based Lighting : Abstract: Accurate environment maps are a key component for rendering photorealistic outdoor scenes with coherent illumination. They enable captivating visual arts, immersive virtual reality, and a wi...
- OccSTeP: Benchmarking 4D Occupancy Spatio-Temporal Persistence : Abstract: Autonomous driving requires a persistent understanding of 3D scenes that is robust to temporal disturbances and accounts for potential future actions. We introduce a new concept of 4D Occupa...
- Persistent feature reconstruction of resident space objects (RSOs) within inverse synthetic aperture radar (ISAR) images : Abstract: With the rapidly growing population of resident space objects (RSOs) in the near-Earth space environment, detailed information about their condition and capabilities is needed to provide Spa...
- Robust Multi-view Camera Calibration from Dense Matches : Abstract: Estimating camera intrinsics and extrinsics is a fundamental problem in computer vision, and while advances in structure-from-motion (SfM) have improved accuracy and robustness, open challen...
- Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition : Abstract: Recent visual generative models often struggle with consistency during image editing due to the entangled nature of raster images, where all visual content is fused into a single canvas. In ...
- FlexAvatar: Learning Complete 3D Head Avatars with Partial Supervision : Abstract: We introduce FlexAvatar, a method for creating high-quality and complete 3D head avatars from a single image. A core challenge lies in the limited availability of multi-view data and the ten...
- MoonSeg3R: Monocular Online Zero-Shot Segment Anything in 3D with Reconstructive Foundation Priors : Abstract: In this paper, we focus on online zero-shot monocular 3D instance segmentation, a novel practical setting where existing approaches fail to perform because they rely on posed RGB-D sequences...
- On the Effectiveness of Textual Prompting with Lightweight Fine-Tuning for SAM3 Remote Sensing Segmentation : Abstract: Remote sensing (RS) image segmentation is constrained by the limited availability of annotated data and a gap between overhead imagery and natural images used to train foundational models. T...
- GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models : Abstract: The text encoder is a critical component of text-to-image and text-to-video diffusion models, fundamentally determining the semantic fidelity of the generated content. However, its developme...
- BLANKET: Anonymizing Faces in Infant Video Recordings : Abstract: Ensuring the ethical use of video data involving human subjects, particularly infants, requires robust anonymization methods. We propose BLANKET (Baby-face Landmark-preserving ANonymization ...
- An Efficient and Effective Encoder Model for Vision and Language Tasks in the Remote Sensing Domain : Abstract: The remote sensing community has recently seen the emergence of methods based on Large Vision and Language Models (LVLMs) that can address multiple tasks at the intersection of computer visi...
- EmoCaliber: Advancing Reliable Visual Emotion Comprehension via Confidence Verbalization and Calibration : Abstract: Visual Emotion Comprehension (VEC) aims to infer sentiment polarities or emotion categories from affective cues embedded in images. In recent years, Multimodal Large Language Models (MLLMs) ...
- DeX-Portrait: Disentangled and Expressive Portrait Animation via Explicit and Latent Motion Representations : Abstract: Portrait animation from a single source image and a driving video is a long-standing problem. Recent approaches tend to adopt diffusion-based image/video generation models for realistic and ...
- VAAS: Vision-Attention Anomaly Scoring for Image Manipulation Detection in Digital Forensics : Abstract: Recent advances in AI-driven image generation have introduced new challenges for verifying the authenticity of digital evidence in forensic investigations. Modern generative models can produ...
- Off The Grid: Detection of Primitives for Feed-Forward 3D Gaussian Splatting : Abstract: Feed-forward 3D Gaussian Splatting (3DGS) models enable real-time scene generation but are hindered by suboptimal pixel-aligned primitive placement, which relies on a dense, rigid grid and l...
- The LUMirage: An independent evaluation of zero-shot performance in the LUMIR challenge : Abstract: The LUMIR challenge represents an important benchmark for evaluating deformable image registration methods on large-scale neuroimaging data. While the challenge demonstrates that modern deep...
- RUMPL: Ray-Based Transformers for Universal Multi-View 2D to 3D Human Pose Lifting : Abstract: Estimating 3D human poses from 2D images remains challenging due to occlusions and projective ambiguity. Multi-view learning-based approaches mitigate these issues but often fail to generali...
- Evaluation of deep learning architectures for wildlife object detection: A comparative study of ResNet and Inception : Abstract: Wildlife object detection plays a vital role in biodiversity conservation, ecological monitoring, and habitat protection. However, this task is often challenged by environmental variability,...
- ST-DETrack: Identity-Preserving Branch Tracking in Entangled Plant Canopies via Dual Spatiotemporal Evidence : Abstract: Automated extraction of individual plant branches from time-series imagery is essential for high-throughput phenotyping, yet it remains computationally challenging due to non-rigid growth dy...
- CLIP-FTI: Fine-Grained Face Template Inversion via CLIP-Driven Attribute Conditioning : Abstract: Face recognition systems store face templates for efficient matching. Once leaked, these templates pose a threat: inverting them can yield photorealistic surrogates that compromise privacy a...
- Step-GUI Technical Report : Abstract: Recent advances in multimodal large language models unlock unprecedented opportunities for GUI automation. However, a fundamental challenge remains: how to efficiently acquire high-quality t...
- Photorealistic Phantom Roads in Real Scenes: Disentangling 3D Hallucinations from Physical Geometry : Abstract: Monocular depth foundation models achieve remarkable generalization by learning large-scale semantic priors, but this creates a critical vulnerability: they hallucinate illusory 3D structure...
- Preserving Marker Specificity with Lightweight Channel-Independent Representation Learning : Abstract: Multiplexed tissue imaging measures dozens of protein markers per cell, yet most deep learning models still apply early channel fusion, assuming shared structure across markers. We investiga...
- See It Before You Grab It: Deep Learning-based Action Anticipation in Basketball : Abstract: Computer vision and video understanding have transformed sports analytics by enabling large-scale, automated analysis of game dynamics from broadcast footage. Despite significant advances in...
- SemanticBridge -- A Dataset for 3D Semantic Segmentation of Bridges and Domain Gap Analysis : Abstract: We propose a novel dataset that has been specifically designed for 3D semantic segmentation of bridges and the domain gap analysis caused by varying sensors. This addresses a critical need i...
- Towards Seamless Interaction: Causal Turn-Level Modeling of Interactive 3D Conversational Head Dynamics : Abstract: Human conversation involves continuous exchanges of speech and nonverbal cues such as head nods, gaze shifts, and facial expressions that convey attention and emotion. Modeling these bidirec...
- Vision-based module for accurately reading linear scales in a laboratory : Abstract: The capabilities and number of vision-based models are increasing rapidly, and these vision models are now able to do more tasks like object detection, image classification, instance segment...
- A Masked Reverse Knowledge Distillation Method Incorporating Global and Local Information for Image Anomaly Detection : Abstract: Knowledge distillation is an effective image anomaly detection and localization scheme. However, a major drawback of this scheme is its tendency to overly generalize, primarily due to the si...
- MECAD: A multi-expert architecture for continual anomaly detection : Abstract: In this paper we propose MECAD, a novel approach for continual anomaly detection using a multi-expert architecture. Our system dynamically assigns experts to object classes based on feature ...
- Prototypical Learning Guided Context-Aware Segmentation Network for Few-Shot Anomaly Detection : Abstract: Few-shot anomaly detection (FSAD) denotes the identification of anomalies within a target category with a limited number of normal samples. Existing FSAD methods largely rely on pre-trained ...
- Automated Motion Artifact Check for MRI (AutoMAC-MRI): An Interpretable Framework for Motion Artifact Detection and Severity Assessment : Abstract: Motion artifacts degrade MRI image quality and increase patient recalls. Existing automated quality assessment methods are largely limited to binary decisions and provide little interpretabi...
- KD360-VoxelBEV: LiDAR and 360-degree Camera Cross Modality Knowledge Distillation for Bird's-Eye-View Segmentation : Abstract: We present the first cross-modality distillation framework specifically tailored for single-panoramic-camera Bird's-Eye-View (BEV) segmentation. Our approach leverages a novel LiDAR image re...
- SynthSeg-Agents: Multi-Agent Synthetic Data Generation for Zero-Shot Weakly Supervised Semantic Segmentation : Abstract: Weakly Supervised Semantic Segmentation (WSSS) with image level labels aims to produce pixel level predictions without requiring dense annotations. While recent approaches have leveraged gen...
- MMMamba: A Versatile Cross-Modal In Context Fusion Framework for Pan-Sharpening and Zero-Shot Image Enhancement : Abstract: Pan-sharpening aims to generate high-resolution multispectral (HRMS) images by integrating a high-resolution panchromatic (PAN) image with its corresponding low-resolution multispectral (MS)...
- Intersectional Fairness in Vision-Language Models for Medical Image Disease Classification : Abstract: Medical artificial intelligence (AI) systems, particularly multimodal vision-language models (VLM), often exhibit intersectional biases where models are systematically less confident in diag...
- Null-LoRA: Low-Rank Adaptation on Null Space : Abstract: Parameter-efficient fine-tuning methods have gained considerable popularity for adapting large-scale models to downstream tasks, particularly LoRA and its variants. Existing methods perform ...
- SLCFormer: Spectral-Local Context Transformer with Physics-Grounded Flare Synthesis for Nighttime Flare Removal : Abstract: Lens flare is a common nighttime artifact caused by strong light sources scattering within camera lenses, leading to hazy streaks, halos, and glare that degrade visual quality. However, exis...
- From Camera to World: A Plug-and-Play Module for Human Mesh Transformation : Abstract: Reconstructing accurate 3D human meshes in the world coordinate system from in-the-wild images remains challenging due to the lack of camera rotation information. While existing methods achi...
- TBC: A Target-Background Contrast Metric for Low-Altitude Infrared and Visible Image Fusion : Abstract: Infrared and visible image fusion is a pivotal technology in low-altitude UAV reconnaissance missions, providing high-quality data support for downstream tasks such as target detection and t...
- ERIENet: An Efficient RAW Image Enhancement Network under Low-Light Environment : Abstract: RAW images have shown superior performance to sRGB images in many image processing tasks, especially for low-light image enhancement. However, most existing methods for RAW-based low-light...
- Robust and Calibrated Detection of Authentic Multimedia Content : Abstract: Generative models can synthesize highly realistic content, so-called deepfakes, that are already being misused at scale to undermine digital media authenticity. Current deepfake detection me...
- Criticality Metrics for Relevance Classification in Safety Evaluation of Object Detection in Automated Driving : Abstract: Ensuring safety is the primary objective of automated driving, which necessitates a comprehensive and accurate perception of the environment. While numerous performance evaluation metrics ex...
- Cross-modal ultra-scale learning with tri-modalities of renal biopsy images for glomerular multi-disease auxiliary diagnosis : Abstract: Constructing a multi-modal automatic classification model based on three types of renal biopsy images can assist pathologists in glomerular multi-disease identification. However, the substan...
- EagleVision: A Dual-Stage Framework with BEV-grounding-based Chain-of-Thought for Spatial Intelligence : Abstract: Recent spatial intelligence approaches typically attach 3D cues to 2D reasoning pipelines or couple MLLMs with black-box reconstruction modules, leading to weak spatial consistency, limited ...
- Explainable Action Form Assessment by Exploiting Multimodal Chain-of-Thoughts Reasoning : Abstract: Evaluating whether human action is standard or not and providing reasonable feedback to improve action standardization is very crucial but challenging in real-world scenarios. However, curre...
- Borrowing from anything: A generalizable framework for reference-guided instance editing : Abstract: Reference-guided instance editing is fundamentally limited by semantic entanglement, where a reference's intrinsic appearance is intertwined with its extrinsic attributes. The key challenge ...
- 3DProxyImg: Controllable 3D-Aware Animation Synthesis from Single Image via 2D-3D Aligned Proxy Embedding : Abstract: 3D animation is central to modern visual media, yet traditional production pipelines remain labor-intensive, expertise-demanding, and computationally expensive. Recent AIGC-based approaches ...
- Is Nano Banana Pro a Low-Level Vision All-Rounder? A Comprehensive Evaluation on 14 Tasks and 40 Datasets : Abstract: The rapid evolution of text-to-image generation models has revolutionized visual content creation. While commercial products like Nano Banana Pro have garnered significant attention, their p...
- Uni-Parser Technical Report : Abstract: This technical report introduces Uni-Parser, an industrial-grade document parsing engine tailored for scientific literature and patents, delivering high throughput, robust accuracy, and cost...
- PMMD: A pose-guided multi-view multi-modal diffusion for person generation : Abstract: Generating consistent human images with controllable pose and appearance is essential for applications in virtual try on, image editing, and digital human creation. Current methods often suf...
- Tracking spatial temporal details in ultrasound long video via wavelet analysis and memory bank : Abstract: Medical ultrasound videos are widely used for medical inspections, disease diagnosis and surgical planning. High-fidelity lesion area and target organ segmentation constitutes a key componen...
- Asynchronous Event Stream Noise Filtering for High-frequency Structure Deformation Measurement : Abstract: Large-scale structures suffer high-frequency deformations due to complex loads. However, harsh lighting conditions and high equipment costs limit measurement methods based on traditional hig...
- MVGSR: Multi-View Consistent 3D Gaussian Super-Resolution via Epipolar Guidance : Abstract: Scenes reconstructed by 3D Gaussian Splatting (3DGS) trained on low-resolution (LR) images are unsuitable for high-resolution (HR) rendering. Consequently, a 3DGS super-resolution (SR) metho...
- Model Agnostic Preference Optimization for Medical Image Segmentation : Abstract: Preference optimization offers a scalable supervision paradigm based on relative preference signals, yet prior attempts in medical image segmentation remain model-specific and rely on low-di...
- Evaluating the Capability of Video Question Generation for Expert Knowledge Elicitation : Abstract: Skilled human interviewers can extract valuable information from experts. This raises a fundamental question: what makes some questions more effective than others? To address this, a quantit...
- Beyond Proximity: A Keypoint-Trajectory Framework for Classifying Affiliative and Agonistic Social Networks in Dairy Cattle : Abstract: Precision livestock farming requires objective assessment of social behavior to support herd welfare monitoring, yet most existing approaches infer interactions using static proximity thresh...
- Where is the Watermark? Interpretable Watermark Detection at the Block Level : Abstract: Recent advances in generative AI have enabled the creation of highly realistic digital content, raising concerns around authenticity, ownership, and misuse. While watermarking has become an ...
- Adaptive Multimodal Person Recognition: A Robust Framework for Handling Missing Modalities : Abstract: Person recognition systems often rely on audio, visual, or behavioral cues, but real-world conditions frequently result in missing or degraded modalities. To address this challenge, we propo...
- Puzzle Curriculum GRPO for Vision-Centric Reasoning : Abstract: Recent reinforcement learning (RL) approaches like outcome-supervised GRPO have advanced chain-of-thought reasoning in Vision Language Models (VLMs), yet key issues linger: (i) reliance on c...
- TalkVerse: Democratizing Minute-Long Audio-Driven Video Generation : Abstract: We introduce TalkVerse, a large-scale, open corpus for single-person, audio-driven talking video generation designed to enable fair, reproducible comparison across methods. While current sta...
- Improving Pre-trained Segmentation Models using Post-Processing : Abstract: Gliomas are the most common malignant brain tumors in adults and are among the most lethal. Despite aggressive treatment, the median survival rate is less than 15 months. Accurate multiparam...
- PANDA-PLUS-Bench: A Clinical Benchmark for Evaluating Robustness of AI Foundation Models in Prostate Cancer Diagnosis : Abstract: Artificial intelligence foundation models are increasingly deployed for prostate cancer Gleason grading, where GP3/GP4 distinction directly impacts treatment decisions. However, these models...
- Vibe Spaces for Creatively Connecting and Expressing Visual Concepts : Abstract: Creating new visual concepts often requires connecting distinct ideas through their most relevant shared attributes -- their vibe. We introduce Vibe Blending, a novel task for generating coh...
- Visual-textual Dermatoglyphic Animal Biometrics: A First Case Study on Panthera tigris : Abstract: Biologists have long combined visuals with textual field notes to re-identify (Re-ID) animals. Contemporary AI tools automate this for species with distinctive morphological features but rem...
- Isolated Sign Language Recognition with Segmentation and Pose Estimation : Abstract: The recent surge in large language models has automated translations of spoken and written languages. However, these advances remain largely inaccessible to American Sign Language (ASL) user...
- HERBench: A Benchmark for Multi-Evidence Integration in Video Question Answering : Abstract: Video Large Language Models (Video-LLMs) are rapidly improving, yet current Video Question Answering (VideoQA) benchmarks often allow questions to be answered from a single salient cue, unde...
- Improving VQA Reliability: A Dual-Assessment Approach with Self-Reflection and Cross-Model Verification : Abstract: Vision-language models (VLMs) have demonstrated significant potential in Visual Question Answering (VQA). However, the susceptibility of VLMs to hallucinations can lead to overconfident yet ...
- AquaDiff: Diffusion-Based Underwater Image Enhancement for Addressing Color Distortion : Abstract: Underwater images are severely degraded by wavelength-dependent light absorption and scattering, resulting in color distortion, low contrast, and loss of fine details that hinder vision-base...
- The Renaissance of Expert Systems: Optical Recognition of Printed Chinese Jianpu Musical Scores with Lyrics : Abstract: Large-scale optical music recognition (OMR) research has focused mainly on Western staff notation, leaving Chinese Jianpu (numbered notation) and its rich lyric resources underexplored. We p...
- SocialNav-MoE: A Mixture-of-Experts Vision Language Model for Socially Compliant Navigation with Reinforcement Fine-Tuning : Abstract: For robots navigating in human-populated environments, safety and social compliance are equally critical, yet prior work has mostly emphasized safety. Socially compliant navigation that acco...
- SkyCap: Bitemporal VHR Optical-SAR Quartets for Amplitude Change Detection and Foundation-Model Evaluation : Abstract: Change detection for linear infrastructure monitoring requires reliable high-resolution data and regular acquisition cadence. Optical very-high-resolution (VHR) imagery is interpretable and ...
- Explaining the Reasoning of Large Language Models Using Attribution Graphs : Abstract: Large language models (LLMs) exhibit remarkable capabilities, yet their reasoning remains opaque, raising safety and trust concerns. Attribution methods, which assign credit to input feature...
- VTCBench: Can Vision-Language Models Understand Long Context with Vision-Text Compression? : Abstract: The computational and memory overheads associated with expanding the context window of LLMs severely limit their scalability. A noteworthy solution is vision-text compression (VTC), exemplif...
- Emotion Recognition in Signers : Abstract: Recognition of signers' emotions suffers from one theoretical challenge and one practical challenge, namely, the overlap between grammatical and affective facial expressions and the scarcity...
- ChatGPT and Gemini participated in the Korean College Scholastic Ability Test -- Earth Science I : Abstract: The rapid development of Generative AI is bringing innovative changes to education and assessment. As the prevalence of students utilizing AI for assignments increases, concerns regarding ac...
- Quantifying Return on Security Controls in LLM Systems : Abstract: Although large language models (LLMs) are increasingly used in security-critical workflows, practitioners lack quantitative guidance on which safeguards are worth deploying. This paper intro...
- HERO: Hierarchical Traversable 3D Scene Graphs for Embodied Navigation Among Movable Obstacles : Abstract: 3D Scene Graphs (3DSGs) constitute a powerful representation of the physical world, distinguished by their abilities to explicitly model the complex spatial, semantic, and functional relatio...
- Revisiting the Reliability of Language Models in Instruction-Following : Abstract: Advanced LLMs have achieved near-ceiling instruction-following accuracy on benchmarks such as IFEval. However, these impressive scores do not necessarily translate to reliable services in re...
- SoMe: A Realistic Benchmark for LLM-based Social Media Agents : Abstract: Intelligent agents powered by large language models (LLMs) have recently demonstrated impressive capabilities and gained increasing popularity on social media platforms. While LLM agents are...
- Effectively Detecting and Responding to Online Harassment with Large Language Models : Abstract: Online harassment has been a persistent issue in the online space. Predominantly, research focused on online harassment in public social media platforms, while less is placed on private mess...
- Characterizing Mamba's Selective Memory using Auto-Encoders : Abstract: State space models (SSMs) are a promising alternative to transformers for language modeling because they use fixed memory during inference. However, this fixed memory usage requires some inf...
- Evaluating Metrics for Safety with LLM-as-Judges : Abstract: LLMs (Large Language Models) are increasingly used in text processing pipelines to intelligently respond to a variety of inputs and generation tasks. This raises the possibility of replacing...
- You Never Know a Person, You Only Know Their Defenses: Detecting Levels of Psychological Defense Mechanisms in Supportive Conversations : Abstract: Psychological defenses are strategies, often automatic, that people use to manage distress. Rigid use or overuse of defenses is negatively linked to mental health and shapes what speakers disclo...
- Bolmo: Byteifying the Next Generation of Language Models : Abstract: We introduce Bolmo, the first family of competitive fully open byte-level language models (LMs) at the 1B and 7B parameter scales. In contrast to prior research on byte-level LMs, which focu...
- An Empirical Study on Chinese Character Decomposition in Multiword Expression-Aware Neural Machine Translation : Abstract: Word meaning, representation, and interpretation play fundamental roles in natural language understanding (NLU), natural language processing (NLP), and natural language generation (NLG) task...
- From Data to Dialogue: Unlocking Language for All : Abstract: Traditional linguists have proposed the use of a General Service List (GSL) to assist new language learners in identifying the most important words in English. This process requires linguist...
- Learning inflection classes using Adaptive Resonance Theory : Abstract: The concept of inflection classes is an abstraction used by linguists, and provides a means to describe patterns in languages that give an analogical base for deducing previously unencounter...
- CTkvr: KV Cache Retrieval for Long-Context LLMs via Centroid then Token Indexing : Abstract: Large language models (LLMs) are increasingly applied in long-context scenarios such as multi-turn conversations. However, long contexts pose significant challenges for inference efficiency,...
- When a Nation Speaks: Machine Learning and NLP in People's Sentiment Analysis During Bangladesh's 2024 Mass Uprising : Abstract: Sentiment analysis, an emerging research area within natural language processing (NLP), has primarily been explored in contexts like elections and social media trends, but there remains a si...
- Toward expert-level motivational interviewing for health behavior improvement with LLMs : Abstract: Background: Motivational interviewing (MI) is an effective counseling approach for promoting health behavior change, but its impact is constrained by the need for highly trained human counse...
- ORACLE: Time-Dependent Recursive Summary Graphs for Foresight on News Data Using LLMs : Abstract: ORACLE turns daily news into week-over-week, decision-ready insights for one of the Finnish Universities of Applied Sciences. The platform crawls and versions news, applies University-specific...
- Dual-Density Inference for Efficient Language Model Reasoning : Abstract: Large Language Models (LLMs) have shown impressive capabilities in complex reasoning tasks. However, current approaches employ uniform language density for both intermediate reasoning and fi...
- Adversarial versification in portuguese as a jailbreak operator in LLMs : Abstract: Recent evidence shows that the versification of prompts constitutes a highly effective adversarial mechanism against aligned LLMs. The study 'Adversarial poetry as a universal single-turn ja...
- Why Your Academic Field Is Everywhere at Once: A Case Study of Arabic Linguistics : Abstract: This study applies Brookes' Measure of Categorical Dispersion (Δ) to analyze the thematic structure of contemporary Arabic Applied Linguistics research. Using a comprehensive, real-world dat...
- Evaluating LLMs for Zeolite Synthesis Event Extraction (ZSEE): A Systematic Analysis of Prompting Strategies : Abstract: Extracting structured information from zeolite synthesis experimental procedures is critical for materials discovery, yet existing methods have not systematically evaluated Large Language Mo...
- Towards Proactive Personalization through Profile Customization for Individual Users in Dialogues : Abstract: The deployment of Large Language Models (LLMs) in interactive systems necessitates a deep alignment with the nuanced and dynamic preferences of individual users. Current alignment techniques...
- Well Begun, Half Done: Reinforcement Learning with Prefix Optimization for LLM Reasoning : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) significantly enhances the reasoning capability of Large Language Models (LLMs). Current RLVR approaches typically conduct training acro...
- SynGP500: A Clinically-Grounded Synthetic Dataset of Australian General Practice Medical Notes : Abstract: We introduce SynGP500, a clinician-curated collection of 500 synthetic Australian general practice medical notes. The dataset integrates curriculum-based clinical breadth (RACGP 2022 Curricu...
- The Moralization Corpus: Frame-Based Annotation and Analysis of Moralizing Speech Acts across Diverse Text Genres : Abstract: Moralizations - arguments that invoke moral values to justify demands or positions - are a yet underexplored form of persuasive communication. We present the Moralization Corpus, a novel mul...
- FAME: Fictional Actors for Multilingual Erasure : Abstract: LLMs trained on web-scale data raise concerns about privacy and the right to be forgotten. To address these issues, Machine Unlearning provides techniques to remove specific information from...
- Yes-MT's Submission to the Low-Resource Indic Language Translation Shared Task in WMT 2024 : Abstract: This paper presents the systems submitted by the Yes-MT team for the Low-Resource Indic Language Translation Shared Task at WMT 2024 (Pakray et al., 2024), focusing on translating between En...
- RFKG-CoT: Relation-Driven Adaptive Hop-count Selection and Few-Shot Path Guidance for Knowledge-Aware QA : Abstract: Large language models (LLMs) often generate hallucinations in knowledge-intensive QA due to parametric knowledge limitations. While existing methods like KG-CoT improve reliability by integr...
- From NLG Evaluation to Modern Student Assessment in the Era of ChatGPT: The Great Misalignment Problem and Pedagogical Multi-Factor Assessment (P-MFA) : Abstract: This paper explores the growing epistemic parallel between NLG evaluation and grading of students in a Finnish University. We argue that both domains are experiencing a Great Misalignment Pr...
- MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers : Abstract: Large language models (LLMs) are evolving into agentic systems that reason, plan, and operate external tools. The Model Context Protocol (MCP) is a key enabler of this transition, offering a...
- Rakuten Data Release: A Large-Scale and Long-Term Reviews Corpus for Hotel Domain : Abstract: This paper presents a large-scale corpus of Rakuten Travel Reviews. Our collection contains 7.3 million customer reviews for 16 years, ranging from 2009 to 2024. Each record in the dataset c...
- Beyond Majority Voting: Towards Fine-grained and More Reliable Reward Signal for Test-Time Reinforcement Learning : Abstract: Test-time reinforcement learning mitigates the reliance on annotated data by using majority voting results as pseudo-labels, emerging as a complementary direction to reinforcement learning w...
- SGM: Safety Glasses for Multimodal Large Language Models via Neuron-Level Detoxification : Abstract: Disclaimer: Samples in this paper may be harmful and cause discomfort. Multimodal large language models (MLLMs) enable multimodal generation but inherit toxic, biased, and NSFW signals fro...
- DASH: Dialogue-Aware Similarity and Handshake Recognition for Topic Segmentation in Public-Channel Conversations : Abstract: Dialogue Topic Segmentation (DTS) is crucial for understanding task-oriented public-channel communications, such as maritime VHF dialogues, which feature informal speech and implicit transit...
- Evaluating Large Language Models on Multimodal Chemistry Olympiad Exams : Abstract: Multimodal scientific reasoning remains a significant challenge for large language models (LLMs), particularly in chemistry, where problem-solving relies on symbolic diagrams, molecular stru...
- Multiscale Aggregated Hierarchical Attention (MAHA): A Game Theoretic and Optimization Driven Approach to Efficient Contextual Modeling in Large Language Models : Abstract: The quadratic computational complexity of Multi-Head Self-Attention (MHSA) remains a fundamental bottleneck in scaling Large Language Models (LLMs) for long-context tasks. While sparse and line...
- DrugRAG: Enhancing Pharmacy LLM Performance Through A Novel Retrieval-Augmented Generation Pipeline : Abstract: Objectives: To evaluate large language model (LLM) performance on pharmacy licensure-style question-answering (QA) tasks and develop an external knowledge integration method to improve their...
- Integrating Large Language Models and Knowledge Graphs to Capture Political Viewpoints in News Media : Abstract: News sources play a central role in democratic societies by shaping political and social discourse through specific topics, viewpoints and voices. Understanding these dynamics is essential f...
- T5Gemma 2: Seeing, Reading, and Understanding Longer : Abstract: We introduce T5Gemma 2, the next generation of the T5Gemma family of lightweight open encoder-decoder models, featuring strong multilingual, multimodal and long-context capabilities. T5Gemma...
- Dynamical stability for dense patterns in discrete attractor neural networks : Abstract: Neural networks storing multiple discrete attractors are canonical models of biological memory. Previously, the dynamical stability of such networks could only be guaranteed under highly res...
- Dexterous Manipulation through Imitation Learning: A Survey : Abstract: Dexterous manipulation, which refers to the ability of a robotic hand or multi-fingered end-effector to skillfully control, reorient, and manipulate objects through precise, coordinated fing...
- Robust Tensor Principal Component Analysis: Exact Recovery via Deterministic Model : Abstract: Tensors, also known as multi-dimensional arrays, arise in many applications in signal processing, manufacturing processes, and healthcare, among others. As one of the most popular methods in te...
- Scalable Bayesian Optimization via Focalized Sparse Gaussian Processes : Abstract: Bayesian optimization is an effective technique for black-box optimization, but its applicability is typically limited to low-dimensional and small-budget problems due to the cubic complexit...
- Scalable Temporal Anomaly Causality Discovery in Large Systems: Achieving Computational Efficiency with Binary Anomaly Flag Data : Abstract: Extracting anomaly causality facilitates diagnostics once monitoring systems detect system faults. Identifying anomaly causes in large systems involves investigating a broader set of monitor...
- WaveGNN: Integrating Graph Neural Networks and Transformers for Decay-Aware Classification of Irregular Clinical Time-Series : Abstract: Clinical time series are often irregularly sampled, with varying sensor frequencies, missing observations, and misaligned timestamps. Prior approaches typically address these irregularities ...
- Imbalances in Neurosymbolic Learning: Characterization and Mitigating Strategies : Abstract: We study one of the most popular problems in neurosymbolic learning (NSL), that of learning neural classifiers given only the result of applying a symbolic component $σ$ to the gold labe...
- Over-parameterization and Adversarial Robustness in Neural Networks: An Overview and Empirical Analysis : Abstract: Thanks to their extensive capacity, over-parameterized neural networks exhibit superior predictive capabilities and generalization. However, having a large parameter space is considered one ...
- Variational Continual Test-Time Adaptation : Abstract: The Continual Test-Time Adaptation (CTTA) task investigates effective domain adaptation under the scenario of continuous domain shifts during testing time. Due to the utilization of solely unlab...
- Optimal Prediction Using Expert Advice and Randomized Littlestone Dimension : Abstract: A classical result in online learning characterizes the optimal mistake bound achievable by deterministic learners using the Littlestone dimension (Littlestone '88). We prove an analogous re...
- Predictive Concept Decoders: Training Scalable End-to-End Interpretability Assistants : Abstract: Interpreting the internal activations of neural networks can produce more faithful explanations of their behavior, but is difficult due to the complex structure of activation space. Existing...
- Dynamic Rebatching for Efficient Early-Exit Inference with DREX : Abstract: Early-Exit (EE) is a Large Language Model (LLM) architecture that accelerates inference by allowing easier tokens to be generated using only a subset of the model's layers. However, traditio...
- mimic-video: Video-Action Models for Generalizable Robot Control Beyond VLAs : Abstract: Prevailing Vision-Language-Action Models (VLAs) for robotic manipulation are built upon vision-language backbones pretrained on large-scale, but disconnected static web data. As a result, de...
- High-Dimensional Partial Least Squares: Spectral Analysis and Fundamental Limitations : Abstract: Partial Least Squares (PLS) is a widely used method for data integration, designed to extract latent components shared across paired high-dimensional datasets. Despite decades of practical s...
- Stylized Synthetic Augmentation further improves Corruption Robustness : Abstract: This paper proposes a training data augmentation pipeline that combines synthetic image data with neural style transfer in order to address the vulnerability of deep vision models to common ...
- Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers : Abstract: Large language model (LLM) activations are notoriously difficult to understand, with most existing techniques using complex, specialized methods for interpreting them. Recent work has propos...
- Prospects for quantum advantage in machine learning from the representability of functions : Abstract: Demonstrating quantum advantage in machine learning tasks requires navigating a complex landscape of proposed models and algorithms. To bring clarity to this search, we introduce a framework...
- PPSEBM: An Energy-Based Model with Progressive Parameter Selection for Continual Learning : Abstract: Continual learning remains a fundamental challenge in machine learning, requiring models to learn from a stream of tasks without forgetting previously acquired knowledge. A major obstacle in...
- How Much is Too Much? Exploring LoRA Rank Trade-offs for Retaining Knowledge and Domain Robustness : Abstract: Large language models are increasingly adapted to downstream tasks through fine-tuning. Full supervised fine-tuning (SFT) and parameter-efficient fine-tuning (PEFT) methods, such as Low-Rank...
- Learning continuous SOC-dependent thermal decomposition kinetics for Li-ion cathodes using KA-CRNNs : Abstract: Thermal runaway in lithium-ion batteries is strongly influenced by the state of charge (SOC). Existing predictive models typically infer scalar kinetic parameters at a full SOC or a few disc...
- A Teacher-Student Perspective on the Dynamics of Learning Near the Optimal Point : Abstract: Near an optimal learning point of a neural network, the learning performance of gradient descent dynamics is dictated by the Hessian matrix of the loss function with respect to the network p...
- IMKD: Intensity-Aware Multi-Level Knowledge Distillation for Camera-Radar Fusion : Abstract: High-performance Radar-Camera 3D object detection can be achieved by leveraging knowledge distillation without using LiDAR at inference time. However, existing distillation methods typically...
- Evaluating Large Language Models in Scientific Discovery : Abstract: Large language models (LLMs) are increasingly applied to scientific research, yet prevailing science benchmarks probe decontextualized knowledge and overlook the iterative reasoning, hypothe...
- Photonics-Enhanced Graph Convolutional Networks : Abstract: Photonics can offer a hardware-native route for machine learning (ML). However, efficient deployment of photonics-enhanced ML requires hybrid workflows that integrate optical processing with...
- A Conditioned UNet for Music Source Separation : Abstract: In this paper we propose a conditioned UNet for Music Source Separation (MSS). MSS is generally performed by multi-output neural networks, typically UNets, with each output representing a pa...
- Autonomous Pressure Control in MuVacAS via Deep Reinforcement Learning and Deep Learning Surrogate Models : Abstract: The development of nuclear fusion requires materials that can withstand extreme conditions. The IFMIF-DONES facility, a high-power particle accelerator, is being designed to qualify these ma...
- Attention in Motion: Secure Platooning via Transformer-based Misbehavior Detection : Abstract: Vehicular platooning promises transformative improvements in transportation efficiency and safety through the coordination of multi-vehicle formations enabled by Vehicle-to-Everything (V2X) ...
- Online Partitioned Local Depth for semi-supervised applications : Abstract: We introduce an extension of the partitioned local depth (PaLD) algorithm that is adapted to online applications such as semi-supervised prediction. The new algorithm we present, online PaLD...
- SMART: Semantic Matching Contrastive Learning for Partially View-Aligned Clustering : Abstract: Multi-view clustering has been empirically shown to improve learning performance by leveraging the inherent complementary information across multiple views of data. However, in real-world sc...
- Remotely Detectable Robot Policy Watermarking : Abstract: The success of machine learning for real-world robotic systems has created a new form of intellectual property: the trained policy. This raises a critical need for novel methods that verify ...
- Image Complexity-Aware Adaptive Retrieval for Efficient Vision-Language Models : Abstract: Vision transformers in vision-language models apply uniform computational effort across all images, expending 175.33 GFLOPs (ViT-L/14) whether analysing a straightforward product photograph ...
- Expand and Prune: Maximizing Trajectory Diversity for Effective GRPO in Generative Models : Abstract: Group Relative Policy Optimization (GRPO) is a powerful technique for aligning generative models, but its effectiveness is bottlenecked by the conflict between large group sizes and prohibit...
- Time-Varying Audio Effect Modeling by End-to-End Adversarial Training : Abstract: Deep learning has become a standard approach for the modeling of audio effects, yet strictly black-box modeling remains problematic for time-varying systems. Unlike time-invariant effects, t...
- LLMQ: Efficient Lower-Precision Pretraining for Consumer GPUs : Abstract: We present LLMQ, an end-to-end CUDA/C++ implementation for medium-sized language-model training, e.g. 3B to 32B parameters, on affordable, commodity GPUs. These devices are characterized by ...
- Model inference for ranking from pairwise comparisons : Abstract: We consider the problem of ranking objects from noisy pairwise comparisons, for example, ranking tennis players from the outcomes of matches. We follow a standard approach to this problem an...
- Assessing the Visual Enumeration Abilities of Specialized Counting Architectures and Vision-Language Models : Abstract: Counting the number of items in a visual scene remains a fundamental yet challenging task in computer vision. Traditional approaches to solving this problem rely on domain-specific counting ...
- ColliderML: The First Release of an OpenDataDetector High-Luminosity Physics Benchmark Dataset : Abstract: We introduce ColliderML - a large, open, experiment-agnostic dataset of fully simulated and digitised proton-proton collisions in High-Luminosity Large Hadron Collider conditions ($\sqrt{s}=...
- Label-consistent clustering for evolving data : Abstract: Data analysis often involves an iterative process, where solutions must be continuously refined in response to new data. Typically, as new data becomes available, an existing solution must b...
- BEAT2AASIST model with layer fusion for ESDD 2026 Challenge : Abstract: Recent advances in audio generation have increased the risk of realistic environmental sound manipulation, motivating the ESDD 2026 Challenge as the first large-scale benchmark for Environme...
- Adaptive Weighted Genetic Algorithm-Optimized SVR for Robust Long-Term Forecasting of Global Stock Indices for investment decisions : Abstract: Long-term price forecasting remains a formidable challenge due to the inherent uncertainty over the long term, despite some success in short-term predictions. Nonetheless, accurate long-term...
- The Meta-Prompting Protocol: Orchestrating LLMs via Adversarial Feedback Loops : Abstract: The transition of Large Language Models (LLMs) from stochastic chat interfaces to reliable software components necessitates a fundamental re-engineering of interaction paradigms. Current met...
- SeBERTis: A Framework for Producing Classifiers of Security-Related Issue Reports : Abstract: Monitoring issue tracker submissions is a crucial software maintenance activity. A key goal is the prioritization of high risk, security-related bugs. If such bugs can be recognized early, t...
- Efficient Nudged Elastic Band Method using Neural Network Bayesian Algorithm Execution : Abstract: The discovery of a minimum energy pathway (MEP) between metastable states is crucial for scientific tasks including catalyst and biomolecular design. However, the standard nudged elastic ban...
- Imitation Game: Reproducing Deep Learning Bugs Leveraging an Intelligent Agent : Abstract: Despite their wide adoption in various domains (e.g., healthcare, finance, software engineering), Deep Learning (DL)-based applications suffer from many bugs, failures, and vulnerabilities. ...
- Intrusion Detection in Internet of Vehicles Using Machine Learning : Abstract: The Internet of Vehicles (IoV) has evolved modern transportation through enhanced connectivity and intelligent systems. However, this increased connectivity introduces critical vulnerabiliti...
- Cross-Tokenizer Likelihood Scoring Algorithms for Language Model Distillation : Abstract: Computing next-token likelihood ratios between two language models (LMs) is a standard task in training paradigms such as knowledge distillation. Since this requires both models to share the...
- EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving : Abstract: Reusing KV cache is essential for high efficiency of Large Language Model (LLM) inference systems. With more LLM users, the KV cache footprint can easily exceed GPU memory capacity, so prior...
- Boundary condition enforcement with PINNs: a comparative study and verification on 3D geometries : Abstract: Since their advent nearly a decade ago, physics-informed neural networks (PINNs) have been studied extensively as a novel technique for solving forward and inverse problems in physics and en...
- Cloud Security Leveraging AI: A Fusion-Based AISOC for Malware and Log Behaviour Detection : Abstract: Cloud Security Operations Centers (SOCs) enable cloud governance, risk, and compliance by providing insight, visibility, and control. Cloud SOC triages high-volume, heterogeneous telemetry from ...
- Deep learning water-unsuppressed MRSI at ultra-high field for simultaneous quantitative metabolic, susceptibility and myelin water imaging : Abstract: Purpose: Magnetic Resonance Spectroscopic Imaging (MRSI) maps endogenous brain metabolism while suppressing the overwhelming water signal. Water-unsuppressed MRSI (wu-MRSI) allows simultaneo...
- Parameter Efficient Multimodal Instruction Tuning for Romanian Vision Language Models : Abstract: Focusing on low-resource languages is an essential step toward democratizing generative AI. In this work, we contribute to reducing the multimodal NLP resource gap for Romanian. We translate...
- Audio MultiChallenge: A Multi-Turn Evaluation of Spoken Dialogue Systems on Natural Human Interaction : Abstract: End-to-end (E2E) spoken dialogue systems are increasingly replacing cascaded pipelines for voice-based human-AI interaction, processing raw audio directly without intermediate transcription....
- Incentives or Ontology? A Structural Rebuttal to OpenAI's Hallucination Thesis : Abstract: OpenAI has recently argued that hallucinations in large language models result primarily from misaligned evaluation incentives that reward confident guessing rather than epistemic humility. ...
- Magnification-Aware Distillation (MAD): A Self-Supervised Framework for Unified Representation Learning in Gigapixel Whole-Slide Images : Abstract: Whole-slide images (WSIs) contain tissue information distributed across multiple magnification levels, yet most self-supervised methods treat these scales as independent views. This separati...
- GR-Agent: Adaptive Graph Reasoning Agent under Incomplete Knowledge : Abstract: Large language models (LLMs) achieve strong results on knowledge graph question answering (KGQA), but most benchmarks assume complete knowledge graphs (KGs) where direct supporting triples e...
- CAPE: Capability Achievement via Policy Execution : Abstract: Modern AI systems lack a way to express and enforce requirements. Pre-training produces intelligence, and post-training optimizes preferences, but neither guarantees that models reliably sat...
- Compute the edge p-Laplacian centrality for air traffic network : Abstract: The problem that we would like to solve in this paper is to compute the edge p-Laplacian centrality for the air traffic network. In this problem, instead of computing the edge p-Laplacian ce...
- Quantum-Augmented AI/ML for O-RAN: Hierarchical Threat Detection with Synergistic Intelligence and Interpretability (Technical Report) : Abstract: Open Radio Access Networks (O-RAN) enhance modularity and telemetry granularity but also widen the cybersecurity attack surface across disaggregated control, user and management planes. We p...
- Where to Explore: A Reach and Cost-Aware Approach for Unbiased Data Collection in Recommender Systems : Abstract: Exploration is essential to improve long-term recommendation quality, but it often degrades short-term business performance, especially in remote-first TV environments where users engage pas...
- Attention as Binding: A Vector-Symbolic Perspective on Transformer Reasoning : Abstract: Transformer-based language models display impressive reasoning-like behavior, yet remain brittle on tasks that require stable symbolic manipulation. This paper develops a unified perspective...
- SGEMAS: A Self-Growing Ephemeral Multi-Agent System for Unsupervised Online Anomaly Detection via Entropic Homeostasis : Abstract: Current deep learning approaches for physiological signal monitoring suffer from static topologies and constant energy consumption. We introduce SGEMAS (Self-Growing Ephemeral Multi-Agent Sy...
- The Graph-Embedded Hazard Model (GEHM): Stochastic Network Survival Dynamics on Economic Graphs : Abstract: This paper develops a nonlinear evolution framework for modelling survival dynamics on weighted economic networks by coupling a graph-based $p$-Laplacian diffusion operator with a stochastic...
- Learning Model Parameter Dynamics in a Combination Therapy for Bladder Cancer from Sparse Biological Data : Abstract: In a mathematical model of interacting biological organisms, where external interventions may alter behavior over time, traditional models that assume fixed parameters usually do not capture...
- FrontierCS: Evolving Challenges for Evolving Intelligence : Abstract: We introduce FrontierCS, a benchmark of 156 open-ended problems across diverse areas of computer science, designed and reviewed by experts, including CS PhDs and top-tier competitive program...
- Multi-Modal Semantic Communication : Abstract: Semantic communication aims to transmit information most relevant to a task rather than raw data, offering significant gains in communication efficiency for applications such as telepresence...
- Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning : Abstract: Reinforcement learning has become essential for strengthening the reasoning abilities of large language models, yet current exploration mechanisms remain fundamentally misaligned with how th...
- A Multivariate Statistical Framework for Detection, Classification and Pre-localization of Anomalies in Water Distribution Networks : Abstract: This paper presents a unified framework for the detection, classification, and preliminary localization of anomalies in water distribution networks using multivariate statistical analysis. ...
- SoFlow: Solution Flow Models for One-Step Generative Modeling : Abstract: The multi-step denoising process in diffusion and Flow Matching models causes major efficiency issues, which motivates research on few-step generation. We present Solution Flow Models (SoFlo...
- Behavior Tokens Speak Louder: Disentangled Explainable Recommendation with Behavior Vocabulary : Abstract: Recent advances in explainable recommendations have explored the integration of language models to analyze natural language rationales for user-item interactions. Despite their potential, ex...
- Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction : Abstract: Autoregressive models (ARMs) currently constitute the dominant paradigm for large language models (LLMs). Energy-based models (EBMs) represent another class of models, which have historicall...
- How Smoothing is N-simplicial Attention? : Abstract: Going from pure Multilayer Perceptron (MLP) to a learnable graph message-passing mechanism at each layer has been foundational to state-of-the-art results, despite the computational trade-of...
- Corrective Diffusion Language Models : Abstract: Diffusion language models are structurally well-suited for iterative error correction, as their non-causal denoising dynamics allow arbitrary positions in a sequence to be revised. However, ...
- Joint Learning of Unsupervised Multi-view Feature and Instance Co-selection with Cross-view Imputation : Abstract: Feature and instance co-selection, which aims to reduce both feature dimensionality and sample size by identifying the most informative features and instances, has attracted considerable att...
- Tracking Temporal Dynamics of Vector Sets with Gaussian Process : Abstract: Understanding the temporal evolution of sets of vectors is a fundamental challenge across various domains, including ecology, crime analysis, and linguistics. For instance, ecosystem structu...
- Soft Geometric Inductive Bias for Object Centric Dynamics : Abstract: Equivariance is a powerful prior for learning physical dynamics, yet exact group equivariance can degrade performance if the symmetries are broken. We propose object-centric world models bui...
- Robustness and uncertainty: two complementary aspects of the reliability of the predictions of a classifier : Abstract: We consider two conceptually different approaches for assessing the reliability of the individual predictions of a classifier: Robustness Quantification (RQ) and Uncertainty Quantification (...
- Multi-stage Bayesian optimisation for dynamic decision-making in self-driving labs : Abstract: Self-driving laboratories (SDLs) are combining recent technological advances in robotics, automation, and machine learning based data analysis and decision-making to perform autonomous exper...
- Metanetworks as Regulatory Operators: Learning to Edit for Requirement Compliance : Abstract: As machine learning models are increasingly deployed in high-stakes settings, e.g. as decision support systems in various societal sectors or in critical infrastructure, designers and audito...
- From Risk to Resilience: Towards Assessing and Mitigating the Risk of Data Reconstruction Attacks in Federated Learning : Abstract: Data Reconstruction Attacks (DRA) pose a significant threat to Federated Learning (FL) systems by enabling adversaries to infer sensitive training data from local clients. Despite extensive ...
- Copyright Infringement Risk Reduction via Chain-of-Thought and Task Instruction Prompting : Abstract: Large scale text-to-image generation models can memorize and reproduce their training dataset. Since the training dataset often contains copyrighted material, reproduction of training datase...
- Double Horizon Model-Based Policy Optimization : Abstract: Model-based reinforcement learning (MBRL) reduces the cost of real-environment sampling by generating synthetic trajectories (called rollouts) from a learned dynamics model. However, choosin...
- FM-EAC: Feature Model-based Enhanced Actor-Critic for Multi-Task Control in Dynamic Environments : Abstract: Model-based reinforcement learning (MBRL) and model-free reinforcement learning (MFRL) evolve along distinct paths but converge in the design of Dyna-Q [1]. However, modern RL methods still ...
- Statistics of Min-max Normalized Eigenvalues in Random Matrices : Abstract: Random matrix theory has played an important role in various areas of pure mathematics, mathematical physics, and machine learning. From a practical perspective of data science, input data a...
- FlowBind: Efficient Any-to-Any Generation with Bidirectional Flows : Abstract: Any-to-any generation seeks to translate between arbitrary subsets of modalities, enabling flexible cross-modal synthesis. Despite recent success, existing flow-based approaches are challeng...
- EUBRL: Epistemic Uncertainty Directed Bayesian Reinforcement Learning : Abstract: At the boundary between the known and the unknown, an agent inevitably confronts the dilemma of whether to explore or to exploit. Epistemic uncertainty reflects such boundaries, representing...
- Robustness Evaluation of Machine Learning Models for Fault Classification and Localization In Power System Protection : Abstract: The growing penetration of renewable and distributed generation is transforming power systems and challenging conventional protection schemes that rely on fixed settings and local measuremen...
- A Regime-Aware Fusion Framework for Time Series Classification : Abstract: Kernel-based methods such as Rocket are among the most effective default approaches for univariate time series classification (TSC), yet they do not perform equally well across all datasets....
- Empirical Investigation of the Impact of Phase Information on Fault Diagnosis of Rotating Machinery : Abstract: Predictive maintenance of rotating machinery increasingly relies on vibration signals, yet most learning-based approaches either discard phase during spectral feature extraction or use raw t...
- Bits for Privacy: Evaluating Post-Training Quantization via Membership Inference : Abstract: Deep neural networks are widely deployed with quantization techniques to reduce memory and computational costs by lowering the numerical precision of their parameters. While quantization alt...
- Quantum Machine Learning for Cybersecurity: A Taxonomy and Future Directions : Abstract: The increasing number of cyber threats and rapidly evolving tactics, as well as the high volume of data in recent years, have caused classical machine learning, rules, and signature-based de...
- Topological Metric for Unsupervised Embedding Quality Evaluation : Abstract: Modern representation learning increasingly relies on unsupervised and self-supervised methods trained on large-scale unlabeled data. While these approaches achieve impressive generalization...
- Distillation-Guided Structural Transfer for Continual Learning Beyond Sparse Distributed Memory : Abstract: Sparse neural systems are gaining traction for efficient continual learning due to their modularity and low interference. Architectures such as Sparse Distributed Memory Multi-Layer Perceptr...
- Leveraging Foundational Models and Simple Fusion for Multi-modal Physiological Signal Analysis : Abstract: Physiological signals such as electrocardiograms (ECG) and electroencephalograms (EEG) provide complementary insights into human health and cognition, yet multi-modal integration is challeng...
- O-EENC-SD: Efficient Online End-to-End Neural Clustering for Speaker Diarization : Abstract: We introduce O-EENC-SD: an end-to-end online speaker diarization system based on EEND-EDA, featuring a novel RNN-based stitching mechanism for online prediction. In particular, we develop a ...
- Accelerating High-Throughput Catalyst Screening by Direct Generation of Equilibrium Adsorption Structures : Abstract: The adsorption energy serves as a crucial descriptor for the large-scale screening of catalysts. Nevertheless, the limited distribution of training data for the extensively utilised machine ...
- Chorus: Harmonizing Context and Sensing Signals for Data-Free Model Customization in IoT : Abstract: In real-world IoT applications, sensor data is usually collected under diverse and dynamic contextual conditions where factors such as sensor placements or ambient environments can significa...
- DEER: Draft with Diffusion, Verify with Autoregressive Models : Abstract: Efficiency, as a critical practical challenge for LLM-driven agentic and reasoning systems, is increasingly constrained by the inherent latency of autoregressive (AR) decoding. Speculative d...
- Understanding NTK Variance in Implicit Neural Representations : Abstract: Implicit Neural Representations (INRs) often converge slowly and struggle to recover high-frequency details due to spectral bias. While prior work links this behavior to the Neural Tangent K...
- An Efficient Gradient-Based Inference Attack for Federated Learning : Abstract: Federated Learning is a machine learning setting that reduces direct data exposure, improving the privacy guarantees of machine learning models. Yet, the exchange of model updates between th...
- Generalization and Feature Attribution in Machine Learning Models for Crop Yield and Anomaly Prediction in Germany : Abstract: This study examines the generalization performance and interpretability of machine learning (ML) models used for predicting crop yield and yield anomalies in Germany's NUTS-3 regions. Using ...
- From Isolation to Entanglement: When Do Interpretability Methods Identify and Disentangle Known Concepts? : Abstract: A central goal of interpretability is to recover representations of causally relevant concepts from the activations of neural networks. The quality of these concept representations is typica...
- TrajSyn: Privacy-Preserving Dataset Distillation from Federated Model Trajectories for Server-Side Adversarial Training : Abstract: Deep learning models deployed on edge devices are increasingly used in safety-critical applications. However, their vulnerability to adversarial perturbations poses significant risks, especi...
- Automatic Reward Shaping from Multi-Objective Human Heuristics : Abstract: Designing effective reward functions remains a central challenge in reinforcement learning, especially in multi-objective environments. In this work, we propose Multi-Objective Reward Shapin...
- FADTI: Fourier and Attention Driven Diffusion for Multivariate Time Series Imputation : Abstract: Multivariate time series imputation is fundamental in applications such as healthcare, traffic forecasting, and biological modeling, where sensor failures and irregular sampling lead to perv...
- How Many Heads Make an SSM? A Unified Framework for Attention and State Space Models : Abstract: Sequence modeling has produced diverse architectures -- from classical recurrent neural networks to modern Transformers and state space models (SSMs) -- yet a unified theoretical understandi...
- Feature-Centric Unsupervised Node Representation Learning Without Homophily Assumption : Abstract: Unsupervised node representation learning aims to obtain meaningful node embeddings without relying on node labels. To achieve this, graph convolution, which aggregates information from neig...
- SigMA: Path Signatures and Multi-head Attention for Learning Parameters in fBm-driven SDEs : Abstract: Stochastic differential equations (SDEs) driven by fractional Brownian motion (fBm) are increasingly used to model systems with rough dynamics and long-range dependence, such as those arisin...
- PIP$^2$ Net: Physics-informed Partition Penalty Deep Operator Network : Abstract: Operator learning has become a powerful tool for accelerating the solution of parameterized partial differential equations (PDEs), enabling rapid prediction of full spatiotemporal fields for...
- Neural Modular Physics for Elastic Simulation : Abstract: Learning-based methods have made significant progress in physics simulation, typically approximating dynamics with a monolithic end-to-end optimized neural network. Although these models off...
- The Semantic Architect: How FEAML Bridges Structured Data and LLMs for Multi-Label Tasks : Abstract: Existing feature engineering methods based on large language models (LLMs) have not yet been applied to multi-label learning tasks. They lack the ability to model complex label dependencies ...
- The Semantic Illusion: Certified Limits of Embedding-Based Hallucination Detection in RAG Systems : Abstract: Retrieval-Augmented Generation (RAG) systems remain susceptible to hallucinations despite grounding in retrieved evidence. Current detection methods rely on semantic similarity and natural l...
- EMFusion: Conditional Diffusion Framework for Trustworthy Frequency Selective EMF Forecasting in Wireless Networks : Abstract: The rapid growth in wireless infrastructure has increased the need to accurately estimate and forecast electromagnetic field (EMF) levels to ensure ongoing compliance, assess potential healt...
- Spectral Representation-based Reinforcement Learning : Abstract: In real-world applications with large state and action spaces, reinforcement learning (RL) typically employs function approximations to represent core components like the policies, value fun...
- Epistemic diversity across language models mitigates knowledge collapse : Abstract: The growing use of artificial intelligence (AI) raises concerns of knowledge collapse, i.e., a reduction to the most dominant and central set of ideas. Prior work has demonstrated single-mod...
- Stock Pattern Assistant (SPA): A Deterministic and Explainable Framework for Structural Price Run Extraction and Event Correlation in Equity Markets : Abstract: Understanding how prices evolve over time often requires peeling back the layers of market noise to identify clear, structural behavior. Many of the tools commonly used for this purpose tech...
- DreamPRM-Code: Function-as-Step Process Reward Model with Label Correction for LLM Coding : Abstract: Process Reward Models (PRMs) have become essential for improving Large Language Models (LLMs) via test-time scaling, yet their effectiveness in coding remains limited due to the lack of mean...
- Adaptive Partitioning and Learning for Stochastic Control of Diffusion Processes : Abstract: We study reinforcement learning for controlled diffusion processes with unbounded continuous state spaces, bounded continuous actions, and polynomially growing rewards: settings that arise n...
- Prompt Repetition Improves Non-Reasoning LLMs : Abstract: When not using reasoning, repeating the input prompt improves performance for popular models (Gemini, GPT, Claude, and Deepseek) without increasing the number of generated tokens or latency.
- Softly Constrained Denoisers for Diffusion Models : Abstract: Diffusion models struggle to produce samples that respect constraints, a common requirement in scientific applications. Recent approaches have introduced regularization terms in the loss or ...
- Deep Learning and Elicitability for McKean-Vlasov FBSDEs With Common Noise : Abstract: We present a novel numerical method for solving McKean-Vlasov forward-backward stochastic differential equations (MV-FBSDEs) with common noise, combining Picard iterations, elicitability and...
- Low-rank MMSE filters, Kronecker-product representation, and regularization: a new perspective : Abstract: In this work, we propose a method to efficiently find the regularization parameter for low-rank MMSE filters based on a Kronecker-product representation. We show that the regularization para...
- ATLAS: Adaptive Topology-based Learning at Scale for Homophilic and Heterophilic Graphs : Abstract: We present ATLAS (Adaptive Topology-based Learning at Scale for Homophilic and Heterophilic Graphs), a novel graph learning algorithm that addresses two important challenges in graph neural ...
- Imitation Learning for Multi-turn LM Agents via On-policy Expert Corrections : Abstract: A popular paradigm for training LM agents relies on imitation learning, fine-tuning on expert trajectories. However, we show that the off-policy nature of imitation learning for multi-turn L...
- OLR-WA: Online Weighted Average Linear Regression in Multivariate Data Streams : Abstract: Online learning updates models incrementally with new data, avoiding large storage requirements and costly model recalculations. In this paper, we introduce "OLR-WA; OnLine Regression with W...
- Task Matrices: Linear Maps for Cross-Model Finetuning Transfer : Abstract: Results in interpretability suggest that large vision and language models learn implicit linear encodings when models are biased by in-context prompting. However, the existence of similar li...
- Entropy-Reservoir Bregman Projection: An Information-Geometric Unification of Model Collapse : Abstract: Self-referential learning -- training a model on data it generated itself -- promises boundless scalability but chronically suffers from model collapse: language models degenerate into repet...
- How Does Fourier Analysis Network Work? A Mechanism Analysis and a New Dual-Activation Layer Proposal : Abstract: Fourier Analysis Network (FAN) was recently proposed as a simple way to improve neural network performance by replacing part of ReLU activations with sine and cosine functions. Although seve...
- Unreliable Uncertainty Estimates with Monte Carlo Dropout : Abstract: Reliable uncertainty estimation is crucial for machine learning models, especially in safety-critical domains. While exact Bayesian inference offers a principled approach, it is often comput...
- Evaluating Weather Forecasts from a Decision Maker's Perspective : Abstract: Standard weather forecast evaluations focus on the forecaster's perspective and on a statistical assessment comparing forecasts and observations. In practice, however, forecasts are used to ...
- Guided Discrete Diffusion for Constraint Satisfaction Problems : Abstract: We propose discrete diffusion guidance for constraint satisfaction problems (CSPs) and demonstrate its ability to solve Sudoku puzzles without supervision.
- NoveltyRank: Estimating Conceptual Novelty of AI Papers : Abstract: With the growing ease of academic publishing, the volume of research papers, especially in AI-related fields, has surged dramatically. This flood of publications makes it difficult for truly...
- Inference Time Feature Injection: A Lightweight Approach for Real-Time Recommendation Freshness : Abstract: Many recommender systems in long-form video streaming rely on batch-trained models and batch-updated features, where user features are updated daily and served statically throughout the day...
- INFORM-CT: INtegrating LLMs and VLMs FOR Incidental Findings Management in Abdominal CT : Abstract: Incidental findings in CT scans, though often benign, can have significant clinical implications and should be reported following established guidelines. Traditional manual inspection by rad...
- Semantic Geometry for policy-constrained interpretation : Abstract: We present a geometric framework for policy-constrained semantic interpretation that provably prevents hallucinated commitments in high-stakes domains. Semantic meaning is represented as dir...
- A data-driven approach to inferring travel trajectory during peak hours in urban rail transit systems : Abstract: Refined trajectory inference of urban rail transit is of great significance to the operation organization. In this paper, we develop a fully data-driven approach to inferring individual trav...
- A Critical Perspective on Finite Sample Conformal Prediction Theory in Medical Applications : Abstract: Machine learning (ML) is transforming healthcare, but safe clinical decisions demand reliable uncertainty estimates that standard ML models fail to provide. Conformal prediction (CP) is a po...
- Quantum Decision Transformers (QDT): Synergistic Entanglement and Interference for Offline Reinforcement Learning : Abstract: Offline reinforcement learning enables policy learning from pre-collected datasets without environment interaction, but existing Decision Transformer (DT) architectures struggle with long-ho...
- Generative Urban Flow Modeling: From Geometry to Airflow with Graph Diffusion : Abstract: Urban wind flow modeling and simulation play an important role in air quality assessment and sustainable city planning. A key challenge for modeling and simulation is handling the complex ge...
- HATSolver: Learning Groebner Bases with Hierarchical Attention Transformers : Abstract: At NeurIPS 2024, Kera et al. introduced the use of transformers for computing Groebner bases, a central object in computer algebra with numerous practical applications. In this paper, we imp...
- Automatic Extraction of Rules for Generating Synthetic Patient Data From Real-World Population Data Using Glioblastoma as an Example : Abstract: The generation of synthetic data is a promising technology to make medical data available for secondary use in a privacy-compliant manner. A popular method for creating realistic patient dat...
- Hybrid Attribution Priors for Explainable and Robust Model Training : Abstract: Small language models (SLMs) are widely used in tasks that require low latency and lightweight deployment, particularly classification. As interpretability and robustness gain increasing imp...
- SEED: Spectral Entropy-Guided Evaluation of SpatialTemporal Dependencies for Multivariate Time Series Forecasting : Abstract: Effective multivariate time series forecasting often benefits from accurately modeling complex inter-variable dependencies. However, existing attention- or graph-based methods face three key...
- Is GPT-OSS All You Need? Benchmarking Large Language Models for Financial Intelligence and the Surprising Efficiency Paradox : Abstract: The rapid adoption of large language models in financial services necessitates rigorous evaluation frameworks to assess their performance, efficiency, and practical applicability. This paper...
- How a Bit Becomes a Story: Semantic Steering via Differentiable Fault Injection : Abstract: Hard-to-detect hardware bit flips, from either malicious circuitry or bugs, have already been shown to make transformers vulnerable in non-generative tasks. This work, for the first time, in...
- Improving Underwater Acoustic Classification Through Learnable Gabor Filter Convolution and Attention Mechanisms : Abstract: Remotely detecting and classifying underwater acoustic targets is critical for environmental monitoring and defence. However, the complex nature of ship-radiated and environmental underwater...
- A Bayesian latent class reinforcement learning framework to capture adaptive, feedback-driven travel behaviour : Abstract: Many travel decisions involve a degree of experience formation, where individuals learn their preferences over time. At the same time, there is extensive scope for heterogeneity across indiv...
- SepsisSuite: Beyond Risk Stratification -- A Comparative Analysis of Deep Fusion vs. Expert Stacking for Prescriptive Sepsis AI : Abstract: Sepsis accounts for nearly 20% of global ICU admissions, yet conventional prediction models often fail to effectively integrate heterogeneous data streams, remaining either siloed by modalit...
- Autonomous Source Knowledge Selection in Multi-Domain Adaptation : Abstract: Unsupervised multi-domain adaptation plays a key role in transfer learning by leveraging acquired rich source information from multiple source domains to solve a target task from an unlabeled ...
- LLM as a Neural Architect: Controlled Generation of Image Captioning Models Under Strict API Contracts : Abstract: Neural architecture search (NAS) traditionally requires significant human expertise or automated trial-and-error to design deep learning models. We present NN-Caption, an LLM-guided neural a...
Research Sources: 290 | Generated: 12/18/2025
