AI RESEARCH PAPERS & ACADEMIC SOURCES
- Green Resilience of Cyber-Physical Systems: Doctoral Dissertation : Abstract: Cyber-physical systems (CPS) combine computational and physical components. Online Collaborative AI System (OL-CAIS) is a type of CPS that learns online in collaboration with humans to achiev...
- Zero-Shot Video Translation via Token Warping : Abstract: With the revolution of generative AI, video-related tasks have been widely studied. However, current state-of-the-art video models still lag behind image models in visual quality and user co...
- Adaptive Query Prompting for Multi-Domain Landmark Detection : Abstract: Medical landmark detection is crucial in various medical imaging modalities and procedures. Although deep learning-based methods have achieved promising performance, they are mostly designed ...
- DiffuSyn Bench: Evaluating Vision-Language Models on Real-World Complexities with Diffusion-Generated Synthetic Benchmarks : Abstract: This study assesses the ability of Large Vision-Language Models (LVLMs) to differentiate between AI-generated and human-generated images. It introduces a new automated benchmark construction...
- Unsupervised learning of spatially varying regularization for diffeomorphic image registration : Abstract: Spatially varying regularization accommodates the deformation variations that may be necessary for different anatomical regions during deformable image registration. Historically, optimizati...
- Introducing DEFORMISE: A deep learning framework for dementia diagnosis in the elderly using optimized MRI slice selection : Abstract: Dementia, a debilitating neurological condition affecting millions worldwide, presents significant diagnostic challenges. In this work, we introduce DEFORMISE, a novel DEep learning Framewor...
- BioBench: A Blueprint to Move Beyond ImageNet for Scientific ML Benchmarks : Abstract: ImageNet-1K linear-probe transfer accuracy remains the default proxy for visual representation quality, yet it no longer predicts performance on scientific imagery. Across 46 modern vision m...
- NaTex: Seamless Texture Generation as Latent Color Diffusion : Abstract: We present NaTex, a native texture generation framework that predicts texture color directly in 3D space. In contrast to previous approaches that rely on baking 2D multi-view images synthesi...
- WWE-UIE: A Wavelet & White Balance Efficient Network for Underwater Image Enhancement : Abstract: Underwater Image Enhancement (UIE) aims to restore visibility and correct color distortions caused by wavelength-dependent absorption and scattering. Recent hybrid approaches, which couple d...
- ChangeDINO: DINOv3-Driven Building Change Detection in Optical Remote Sensing Imagery : Abstract: Remote sensing change detection (RSCD) aims to identify surface changes from co-registered bi-temporal images. However, many deep learning-based RSCD methods rely solely on change-map annota...
- Arbitrary-Resolution and Arbitrary-Scale Face Super-Resolution with Implicit Representation Networks : Abstract: Face super-resolution (FSR) is a critical technique for enhancing low-resolution facial images and has significant implications for face-related tasks. However, existing FSR methods are limi...
- Aerial View River Landform Video segmentation: A Weakly Supervised Context-aware Temporal Consistency Distillation Approach : Abstract: The study of terrain and landform classification through UAV remote sensing diverges significantly from ground vehicle patrol tasks. Besides grappling with the complexity of data annotation ...
- CRISTAL: Real-time Camera Registration in Static LiDAR Scans using Neural Rendering : Abstract: Accurate camera localization is crucial for robotics and Extended Reality (XR), enabling reliable navigation and alignment of virtual and real content. Existing visual methods often suffer f...
- Multi-Order Matching Network for Alignment-Free Depth Super-Resolution : Abstract: Recent guided depth super-resolution methods are premised on the assumption of strict spatial alignment between depth and RGB, achieving high-quality depth reconstruction. However, in real...
- DetailSemNet: Elevating Signature Verification through Detail-Semantic Integration : Abstract: Offline signature verification (OSV) is a frequently utilized technology in forensics. This paper proposes a new model, DetailSemNet, for OSV. Unlike previous methods that rely on holistic f...
- CAMS: Towards Compositional Zero-Shot Learning via Gated Cross-Attention and Multi-Space Disentanglement : Abstract: Compositional zero-shot learning (CZSL) aims to learn the concepts of attributes and objects in seen compositions and to recognize their unseen compositions. Most Contrastive Language-Image ...
- End-to-End Motion Capture from Rigid Body Markers with Geodesic Loss : Abstract: Marker-based optical motion capture (MoCap), while long regarded as the gold standard for accuracy, faces practical challenges, such as time-consuming preparation and marker identification a...
- CylinderDepth: Cylindrical Spatial Attention for Multi-View Consistent Self-Supervised Surround Depth Estimation : Abstract: Self-supervised surround-view depth estimation enables dense, low-cost 3D perception with a 360° field of view from multiple minimally overlapping images. Yet, most existing methods suffer f...
- Beyond Visual Cues: Leveraging General Semantics as Support for Few-Shot Segmentation : Abstract: Few-shot segmentation (FSS) aims to segment novel classes under the guidance of limited support samples by a meta-learning paradigm. Existing methods mainly mine references from support imag...
- StreetView-Waste: A Multi-Task Dataset for Urban Waste Management : Abstract: Urban waste management remains a critical challenge for the development of smart cities. Despite the growing number of litter detection datasets, the problem of monitoring overflowing waste ...
- VLA-Pruner: Temporal-Aware Dual-Level Visual Token Pruning for Efficient Vision-Language-Action Inference : Abstract: Vision-Language-Action (VLA) models have shown great promise for embodied AI, yet the heavy computational cost of processing continuous visual streams severely limits their real-time deploym...
- LLaVA$^3$: Representing 3D Scenes like a Cubist Painter to Boost 3D Scene Understanding of VLMs : Abstract: Developing a multi-modal language model capable of understanding 3D scenes remains challenging due to the limited availability of 3D training data, in contrast to the abundance of 2D dataset...
- FastSurfer-CC: A robust, accurate, and comprehensive framework for corpus callosum morphometry : Abstract: The corpus callosum, the largest commissural structure in the human brain, is a central focus in research on aging and neurological diseases. It is also a critical target for interventions s...
- Flow and Depth Assisted Video Prediction with Latent Transformer : Abstract: Video prediction is a fundamental task for various downstream applications, including robotics and world modeling. Although general video prediction models have achieved remarkable performan...
- Physics-Informed Machine Learning for Efficient Sim-to-Real Data Augmentation in Micro-Object Pose Estimation : Abstract: Precise pose estimation of optical microrobots is essential for enabling high-precision object tracking and autonomous biological studies. However, current methods rely heavily on large, hig...
- Acquisition Time-Informed Breast Tumor Segmentation from Dynamic Contrast-Enhanced MRI : Abstract: Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) plays an important role in breast cancer screening, tumor assessment, and treatment planning and monitoring. The dynamic change...
- YOWO: You Only Walk Once to Jointly Map An Indoor Scene and Register Ceiling-mounted Cameras : Abstract: Using ceiling-mounted cameras (CMCs) for indoor visual capturing opens up a wide range of applications. However, registering CMCs to the target scene layout presents a challenging task. Whil...
- BoxingVI: A Multi-Modal Benchmark for Boxing Action Recognition and Localization : Abstract: Accurate analysis of combat sports using computer vision has gained traction in recent years, yet the development of robust datasets remains a major bottleneck due to the dynamic, unstructur...
- Enhancing Multi-Camera Gymnast Tracking Through Domain Knowledge Integration : Abstract: We present a robust multi-camera gymnast tracking, which has been applied at international gymnastics championships for gymnastics judging. Despite considerable progress in multi-camera trac...
- Investigating Optical Flow Computation: From Local Methods to a Multiresolution Horn-Schunck Implementation with Bilinear Interpolation : Abstract: This paper presents an applied analysis of local and global methods, with a focus on the Horn-Schunck algorithm for optical flow computation. We explore the theoretical and practical aspects...
- Supervised Contrastive Learning for Few-Shot AI-Generated Image Detection and Attribution : Abstract: The rapid advancement of generative artificial intelligence has enabled the creation of synthetic images that are increasingly indistinguishable from authentic content, posing significant ch...
- EOGS++: Earth Observation Gaussian Splatting with Internal Camera Refinement and Direct Panchromatic Rendering : Abstract: Recently, 3D Gaussian Splatting has been introduced as a compelling alternative to NeRF for Earth observation, offering competitive reconstruction quality with significantly reduced traini...
- Progressive Supernet Training for Efficient Visual Autoregressive Modeling : Abstract: Visual Auto-Regressive (VAR) models significantly reduce inference steps through the "next-scale" prediction paradigm. However, progressive multi-scale generation incurs substantial memory o...
- Lite Any Stereo: Efficient Zero-Shot Stereo Matching : Abstract: Recent advances in stereo matching have focused on accuracy, often at the cost of significantly increased model size. Traditionally, the community has regarded efficient models as incapable ...
- NutriScreener: Retrieval-Augmented Multi-Pose Graph Attention Network for Malnourishment Screening : Abstract: Child malnutrition remains a global crisis, yet existing screening methods are laborious and poorly scalable, hindering early intervention. In this work, we present NutriScreener, a retrieva...
- POMA-3D: The Point Map Way to 3D Scene Understanding : Abstract: In this paper, we introduce POMA-3D, the first self-supervised 3D representation model learned from point maps. Point maps encode explicit 3D coordinates on a structured 2D grid, preserving ...
- Erase to Retain: Low Rank Adaptation Guided Selective Unlearning in Medical Segmentation Networks : Abstract: The ability to selectively remove knowledge from medical segmentation networks is increasingly important for privacy compliance, ethical deployment, and continual dataset revision. We introd...
- Generative AI for Enhanced Wildfire Detection: Bridging the Synthetic-Real Domain Gap : Abstract: The early detection of wildfires is a critical environmental challenge, with timely identification of smoke plumes being key to mitigating large-scale damage. While deep neural networks have...
- SAM2S: Segment Anything in Surgical Videos via Semantic Long-term Tracking : Abstract: Surgical video segmentation is crucial for computer-assisted surgery, enabling precise localization and tracking of instruments and tissues. Interactive Video Object Segmentation (iVOS) mode...
- Improving Long-Tailed Object Detection with Balanced Group Softmax and Metric Learning : Abstract: Object detection has been widely explored for class-balanced datasets such as COCO. However, real-world scenarios introduce the challenge of long-tailed distributions, where numerous categor...
- SAM 3D: 3Dfy Anything in Images : Abstract: We present SAM 3D, a generative model for visually grounded 3D object reconstruction, predicting geometry, texture, and layout from a single image. SAM 3D excels in natural images, where occ...
- TRIM: Scalable 3D Gaussian Diffusion Inference with Temporal and Spatial Trimming : Abstract: Recent advances in 3D Gaussian diffusion models suffer from time-intensive denoising and post-denoising processing due to the massive number of Gaussian primitives, resulting in slow generat...
- Late-decoupled 3D Hierarchical Semantic Segmentation with Semantic Prototype Discrimination based Bi-branch Supervision : Abstract: 3D hierarchical semantic segmentation (3DHS) is crucial for embodied intelligence applications that demand a multi-grained and multi-hierarchy understanding of 3D scenes. Despite the progres...
- Teacher-Guided One-Shot Pruning via Context-Aware Knowledge Distillation : Abstract: Unstructured pruning remains a powerful strategy for compressing deep neural networks, yet it often demands iterative train-prune-retrain cycles, resulting in significant computational overh...
- PartUV: Part-Based UV Unwrapping of 3D Meshes : Abstract: UV unwrapping flattens 3D surfaces to 2D with minimal distortion, often requiring the complex surface to be decomposed into multiple charts. Although extensively studied, existing UV unwrapp...
- TriDiff-4D: Fast 4D Generation through Diffusion-based Triplane Re-posing : Abstract: With the increasing demand for 3D animation, generating high-fidelity, controllable 4D avatars from textual descriptions remains a significant challenge. Despite notable efforts in 4D genera...
- SceneDesigner: Controllable Multi-Object Image Generation with 9-DoF Pose Manipulation : Abstract: Controllable image generation has attracted increasing attention in recent years, enabling users to manipulate visual content such as identity and style. However, achieving simultaneous cont...
- V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models : Abstract: Recent progress in generative video models, such as Veo-3, has shown surprising zero-shot reasoning abilities, creating a growing need for systematic and reliable evaluation. We introduce V-...
- Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO : Abstract: While language models have become impactful in many real-world applications, video generation remains largely confined to entertainment. Motivated by video's inherent capacity to demonstrate...
- Learning to Think Fast and Slow for Visual Language Models : Abstract: When confronted with complex problems, we tend to think slowly; conversely, for simple questions, we think quickly. Such a two-system thinking mechanism allows us to efficiently allocate cog...
- EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards : Abstract: Recent advances in large multimodal models (LMMs) have enabled impressive reasoning and perception abilities, yet most existing training pipelines still depend on human-curated data or exter...
- NoPo-Avatar: Generalizable and Animatable Avatars from Sparse Inputs without Human Poses : Abstract: We tackle the task of recovering an animatable 3D human avatar from a single or a sparse set of images. For this task, beyond a set of images, many prior state-of-the-art methods use accurat...
- How Modality Shapes Perception and Reasoning: A Study of Error Propagation in ARC-AGI : Abstract: ARC-AGI and ARC-AGI-2 measure generalization-through-composition on small color-quantized grids, and their prize competitions make progress on these harder held-out tasks a meaningful proxy ...
- UniUltra: Interactive Parameter-Efficient SAM2 for Universal Ultrasound Segmentation : Abstract: The Segment Anything Model 2 (SAM2) demonstrates remarkable universal segmentation capabilities on natural images. However, its performance on ultrasound images is significantly degraded due...
- FOOTPASS: A Multi-Modal Multi-Agent Tactical Context Dataset for Play-by-Play Action Spotting in Soccer Broadcast Videos : Abstract: Soccer video understanding has motivated the creation of datasets for tasks such as temporal action localization, spatiotemporal action detection (STAD), or multiobject tracking (MOT). The a...
- How Robot Dogs See the Unseeable : Abstract: Peering, a side-to-side motion used by animals to estimate distance through motion parallax, offers a powerful bio-inspired strategy to overcome a fundamental limitation in robotic vision: p...
- Weakly Supervised Segmentation and Classification of Alpha-Synuclein Aggregates in Brightfield Midbrain Images : Abstract: Parkinson's disease (PD) is a neurodegenerative disorder associated with the accumulation of misfolded alpha-synuclein aggregates, forming Lewy bodies and neuritic shape used for pathology d...
- TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval : Abstract: Neural information retrieval systems excel in high-resource languages but remain underexplored for morphologically rich, lower-resource languages such as Turkish. Dense bi-encoders currently...
- WER is Unaware: Assessing How ASR Errors Distort Clinical Understanding in Patient Facing Dialogue : Abstract: As Automatic Speech Recognition (ASR) is increasingly deployed in clinical dialogue, standard evaluations still rely heavily on Word Error Rate (WER). This paper challenges that standard, in...
- Integrating Symbolic Natural Language Understanding and Language Models for Word Sense Disambiguation : Abstract: Word sense disambiguation is a fundamental challenge in natural language understanding. Current methods are primarily aimed at coarse-grained representations (e.g. WordNet synsets or FrameNe...
- Comparison of Text-Based and Image-Based Retrieval in Multimodal Retrieval Augmented Generation Large Language Model Systems : Abstract: Recent advancements in Retrieval-Augmented Generation (RAG) have enabled Large Language Models (LLMs) to access multimodal knowledge bases containing both text and visual information such as...
- Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs : Abstract: Training a family of large language models targeting multiple scales and deployment objectives is prohibitively expensive, requiring separate training runs for each different size. Recent wo...
- Chain of Summaries: Summarization Through Iterative Questioning : Abstract: Large Language Models (LLMs) are increasingly using external web content. However, much of this content is not easily digestible by LLMs due to LLM-unfriendly formats and limitations of cont...
- Step-Audio-R1 Technical Report : Abstract: Recent advances in reasoning models have demonstrated remarkable success in text and vision domains through extended chain-of-thought deliberation. However, a perplexing phenomenon persists ...
- The Subtle Art of Defection: Understanding Uncooperative Behaviors in LLM based Multi-Agent Systems : Abstract: This paper introduces a novel framework for simulating and analyzing how uncooperative behaviors can destabilize or collapse LLM-based multi-agent systems. Our framework includes two key com...
- JudgeBoard: Benchmarking and Enhancing Small Language Models for Reasoning Evaluation : Abstract: While small language models (SLMs) have shown promise on various reasoning tasks, their ability to judge the correctness of answers remains unclear compared to large language models (LLMs). ...
- CARE-RAG - Clinical Assessment and Reasoning in RAG : Abstract: Access to the right evidence does not guarantee that large language models (LLMs) will reason with it correctly. This gap between retrieval and reasoning is especially concerning in clinical...
- QueryGym: A Toolkit for Reproducible LLM-Based Query Reformulation : Abstract: We present QueryGym, a lightweight, extensible Python toolkit that supports large language model (LLM)-based query reformulation. This is an important tool development since recent work on l...
- SpellForger: Prompting Custom Spell Properties In-Game using BERT supervised-trained model : Abstract: Introduction: The application of Artificial Intelligence in games has evolved significantly, allowing for dynamic content generation. However, its use as a core gameplay co-creation tool rem...
- PSM: Prompt Sensitivity Minimization via LLM-Guided Black-Box Optimization : Abstract: System prompts are critical for guiding the behavior of Large Language Models (LLMs), yet they often contain proprietary logic or sensitive information, making them a prime target for extrac...
- Can MLLMs Read the Room? A Multimodal Benchmark for Assessing Deception in Multi-Party Social Interactions : Abstract: Despite their advanced reasoning capabilities, state-of-the-art Multimodal Large Language Models (MLLMs) demonstrably lack a core component of human intelligence: the ability to `read the ro...
- OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe : Abstract: Recent advancements in large reasoning models have fueled growing interest in extending such capabilities to multimodal domains. However, despite notable progress in visual reasoning, the la...
- TOFA: Training-Free One-Shot Federated Adaptation for Vision-Language Models : Abstract: Efficient and lightweight adaptation of pre-trained Vision-Language Models (VLMs) to downstream tasks through collaborative interactions between local clients and a central server is a rapid...
- Music Recommendation with Large Language Models: Challenges, Opportunities, and Evaluation : Abstract: Music Recommender Systems (MRS) have long relied on an information-retrieval framing, where progress is measured mainly through accuracy on retrieval-oriented subtasks. While effective, this...
- MiMo-Embodied: X-Embodied Foundation Model Technical Report : Abstract: We open-source MiMo-Embodied, the first cross-embodied foundation model to successfully integrate and achieve state-of-the-art performance in both Autonomous Driving and Embodied AI. MiMo-Em...
- D-GARA: A Dynamic Benchmarking Framework for GUI Agent Robustness in Real-World Anomalies : Abstract: Developing intelligent agents capable of operating a wide range of Graphical User Interfaces (GUIs) with human-level proficiency is a key milestone on the path toward Artificial General Inte...
- TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding : Abstract: We introduce TimeViper, a hybrid vision-language model designed to tackle challenges of long video understanding. Processing long videos demands both an efficient model architecture and an e...
- SurvAgent: Hierarchical CoT-Enhanced Case Banking and Dichotomy-Based Multi-Agent System for Multimodal Survival Prediction : Abstract: Survival analysis is critical for cancer prognosis and treatment planning, yet existing methods lack the transparency essential for clinical adoption. While recent pathology agents have demo...
- Codec2Vec: Self-Supervised Speech Representation Learning Using Neural Speech Codecs : Abstract: Recent advancements in neural audio codecs have not only enabled superior audio compression but also enhanced speech synthesis techniques. Researchers are now exploring their potential as un...
- Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation : Abstract: Recent advances in visual generation have increasingly explored the integration of reasoning capabilities. They incorporate textual reasoning, i.e., think, either before (as pre-planning) or...
- GPTopic: Dynamic and Interactive Topic Representations : Abstract: Topic modeling seems to be almost synonymous with generating lists of top words to represent topics within large text corpora. However, deducing a topic from such a list of individual terms ca...
- Crowdsourcing Lexical Diversity : Abstract: Lexical-semantic resources (LSRs), such as online lexicons and wordnets, are fundamental to natural language processing applications as well as to fields such as linguistic anthropology and ...
- Discriminating Form and Meaning in Multilingual Models with Minimal-Pair ABX Tasks : Abstract: We introduce a set of training-free ABX-style discrimination tasks to evaluate how multilingual language models represent language identity (form) and semantic content (meaning). Inspired fr...
- CoBA: Counterbias Text Augmentation for Mitigating Various Spurious Correlations via Semantic Triples : Abstract: Deep learning models often learn and exploit spurious correlations in training data, using these non-target features to inform their predictions. Such reliance leads to performance degradati...
- UniFit: Towards Universal Virtual Try-on with MLLM-Guided Semantic Alignment : Abstract: Image-based virtual try-on (VTON) aims to synthesize photorealistic images of a person wearing specified garments. Despite significant progress, building a universal VTON framework that can ...
- EfficientSAM3: Progressive Hierarchical Distillation for Video Concept Segmentation from SAM1, 2, and 3 : Abstract: The Segment Anything Model 3 (SAM3) advances visual understanding with Promptable Concept Segmentation (PCS) across images and videos, but its unified architecture (shared vision backbone, D...
- Automatic Uncertainty-Aware Synthetic Data Bootstrapping for Historical Map Segmentation : Abstract: The automated analysis of historical documents, particularly maps, has drastically benefited from advances in deep learning and its success across various computer vision applications. Howev...
- RB-FT: Rationale-Bootstrapped Fine-Tuning for Video Classification : Abstract: Vision Language Models (VLMs) are becoming increasingly integral to multimedia understanding; however, they often struggle with domain-specific video classification tasks, particularly in ca...
- Boosting Medical Visual Understanding From Multi-Granular Language Learning : Abstract: Recent advances in image-text pretraining have significantly enhanced visual understanding by aligning visual and textual representations. Contrastive Language-Image Pretraining (CLIP) has p...
- Automated Interpretable 2D Video Extraction from 3D Echocardiography : Abstract: Although the heart has complex three-dimensional (3D) anatomy, conventional medical imaging with cardiac ultrasound relies on a series of 2D videos showing individual cardiac structures. 3D ...
- Click2Graph: Interactive Panoptic Video Scene Graphs from a Single Click : Abstract: State-of-the-art Video Scene Graph Generation (VSGG) systems provide structured visual understanding but operate as closed, feed-forward pipelines with no ability to incorporate human guidan...
- InfoCLIP: Bridging Vision-Language Pretraining and Open-Vocabulary Semantic Segmentation via Information-Theoretic Alignment Transfer : Abstract: Recently, the strong generalization ability of CLIP has facilitated open-vocabulary semantic segmentation, which labels pixels using arbitrary text. However, existing methods that fine-tune ...
- Externally Validated Multi-Task Learning via Consistency Regularization Using Differentiable BI-RADS Features for Breast Ultrasound Tumor Segmentation : Abstract: Multi-task learning can suffer from destructive task interference, where jointly trained models underperform single-task baselines and limit generalization. To improve generalization perform...
- UniDGF: A Unified Detection-to-Generation Framework for Hierarchical Object Visual Recognition : Abstract: Achieving visual semantic understanding requires a unified framework that simultaneously handles object detection, category prediction, and attribute recognition. However, current advanced a...
- Exploiting Inter-Sample Information for Long-tailed Out-of-Distribution Detection : Abstract: Detecting out-of-distribution (OOD) data is essential for safe deployment of deep neural networks (DNNs). This problem becomes particularly challenging in the presence of long-tailed in-dist...
- Physically Realistic Sequence-Level Adversarial Clothing for Robust Human-Detection Evasion : Abstract: Deep neural networks used for human detection are highly vulnerable to adversarial manipulation, creating safety and privacy risks in real surveillance environments. Wearable attacks offer a...
- Mixture of Ranks with Degradation-Aware Routing for One-Step Real-World Image Super-Resolution : Abstract: The demonstrated success of sparsely-gated Mixture-of-Experts (MoE) architectures, exemplified by models such as DeepSeek and Grok, has motivated researchers to investigate their adaptation ...
- CuriGS: Curriculum-Guided Gaussian Splatting for Sparse View Synthesis : Abstract: 3D Gaussian Splatting (3DGS) has recently emerged as an efficient, high-fidelity representation for real-time scene reconstruction and rendering. However, extending 3DGS to sparse-view setti...
- Crossmodal learning for Crop Canopy Trait Estimation : Abstract: Recent advances in plant phenotyping have driven widespread adoption of multi-sensor platforms for collecting crop canopy reflectance data. This includes the collection of heterogeneous data...
- LLMs-based Augmentation for Domain Adaptation in Long-tailed Food Datasets : Abstract: Training a model for food recognition is challenging because the training samples, which are typically crawled from the Internet, are visually different from the pictures captured by users i...
- AMS-KV: Adaptive KV Caching in Multi-Scale Visual Autoregressive Transformers : Abstract: Visual autoregressive modeling (VAR) via next-scale prediction has emerged as a scalable image generation paradigm. While Key and Value (KV) caching in large language models (LLMs) has been ...
- LiSTAR: Ray-Centric World Models for 4D LiDAR Sequences in Autonomous Driving : Abstract: Synthesizing high-fidelity and controllable 4D LiDAR data is crucial for creating scalable simulation environments for autonomous driving. This task is inherently challenging due to the sens...
- VideoSeg-R1: Reasoning Video Object Segmentation via Reinforcement Learning : Abstract: Traditional video reasoning segmentation methods rely on supervised fine-tuning, which limits generalization to out-of-distribution scenarios and lacks explicit reasoning. To address this, w...
- SpectralTrain: A Universal Framework for Hyperspectral Image Classification : Abstract: Hyperspectral image (HSI) classification typically involves large-scale data and computationally intensive training, which limits the practical deployment of deep learning models in real-wor...
- Rad-GS: Radar-Vision Integration for 3D Gaussian Splatting SLAM in Outdoor Environments : Abstract: We present Rad-GS, a 4D radar-camera SLAM system designed for kilometer-scale outdoor environments, utilizing 3D Gaussian as a differentiable spatial representation. Rad-GS combines the adva...
- T2T-VICL: Unlocking the Boundaries of Cross-Task Visual In-Context Learning via Implicit Text-Driven VLMs : Abstract: In large language models (LLMs), in-context learning (ICL) refers to performing new tasks by conditioning on small demonstrations provided in the input context. Recent advances in visual in-c...
- Clustered Error Correction with Grouped 4D Gaussian Splatting : Abstract: Existing 4D Gaussian Splatting (4DGS) methods struggle to accurately reconstruct dynamic scenes, often failing to resolve ambiguous pixel correspondences and inadequate densification in dyna...
- Decoupling Complexity from Scale in Latent Diffusion Model : Abstract: Existing latent diffusion models typically couple scale with content complexity, using more latent tokens to represent higher-resolution images or higher-frame rate videos. However, the late...
- VTinker: Guided Flow Upsampling and Texture Mapping for High-Resolution Video Frame Interpolation : Abstract: Due to large pixel movement and high computational cost, estimating the motion of high-resolution frames is challenging. Thus, most flow-based Video Frame Interpolation (VFI) methods first p...
- How Noise Benefits AI-generated Image Detection : Abstract: The rapid advancement of generative models has made real and synthetic images increasingly indistinguishable. Although extensive efforts have been devoted to detecting AI-generated images, o...
- Degradation-Aware Hierarchical Termination for Blind Quality Enhancement of Compressed Video : Abstract: Existing studies on Quality Enhancement for Compressed Video (QECV) predominantly rely on known Quantization Parameters (QPs), employing distinct enhancement models per QP setting, termed no...
- Real-Time 3D Object Detection with Inference-Aligned Learning : Abstract: Real-time 3D object detection from point clouds is essential for dynamic scene understanding in applications such as augmented reality, robotics and navigation. We introduce a novel Spatial-...
- A Spatial Semantics and Continuity Perception Attention for Remote Sensing Water Body Change Detection : Abstract: Remote sensing Water Body Change Detection (WBCD) aims to detect water body surface changes from bi-temporal images of the same geographic area. Recently, the scarcity of high spatial resolu...
- LEGO-SLAM: Language-Embedded Gaussian Optimization SLAM : Abstract: Recent advances in 3D Gaussian Splatting (3DGS) have enabled Simultaneous Localization and Mapping (SLAM) systems to build photorealistic maps. However, these maps lack the open-vocabulary s...
- Reasoning Guided Embeddings: Leveraging MLLM Reasoning for Improved Multimodal Retrieval : Abstract: Multimodal embeddings are widely used in downstream tasks such as multimodal retrieval, enabling alignment of interleaved modalities in a shared representation space. While recent studies sh...
- Pluggable Pruning with Contiguous Layer Distillation for Diffusion Transformers : Abstract: Diffusion Transformers (DiTs) have shown exceptional performance in image generation, yet their large parameter counts incur high computational costs, impeding deployment in resource-constra...
- Video2Layout: Recall and Reconstruct Metric-Grounded Cognitive Map for Spatial Reasoning : Abstract: Spatial intelligence is a critical frontier for Multimodal Large Language Models (MLLMs), empowering them to comprehend the physical world. Drawing inspiration from human perception mechanis...
- Simba: Towards High-Fidelity and Geometrically-Consistent Point Cloud Completion via Transformation Diffusion : Abstract: Point cloud completion is a fundamental task in 3D vision. A persistent challenge in this field is simultaneously preserving fine-grained details present in the input while ensuring the glob...
- Layer-wise Noise Guided Selective Wavelet Reconstruction for Robust Medical Image Segmentation : Abstract: Clinical deployment requires segmentation models to stay stable under distribution shifts and perturbations. The mainstream solution is adversarial training (AT) to improve robustness; howev...
- An Image Is Worth Ten Thousand Words: Verbose-Text Induction Attacks on VLMs : Abstract: With the remarkable success of Vision-Language Models (VLMs) on multimodal tasks, concerns regarding their deployment efficiency have become increasingly prominent. In particular, the number...
- EvoVLA: Self-Evolving Vision-Language-Action Model : Abstract: Long-horizon robotic manipulation remains challenging for Vision-Language-Action (VLA) models despite recent progress in zero-shot generalization and simulation-to-real-world transfer. Curre...
- Target Refocusing via Attention Redistribution for Open-Vocabulary Semantic Segmentation: An Explainability Perspective : Abstract: Open-vocabulary semantic segmentation (OVSS) employs pixel-level vision-language alignment to associate category-related prompts with corresponding pixels. A key challenge is enhancing the m...
- Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight : Abstract: Recent advances in Vision-Language-Action (VLA) models demonstrate that visual signals can effectively complement sparse action supervisions. However, letting VLA directly predict high-dimen...
- Domain-Shared Learning and Gradual Alignment for Unsupervised Domain Adaptation Visible-Infrared Person Re-Identification : Abstract: Recently, Visible-Infrared person Re-Identification (VI-ReID) has achieved remarkable performance on public datasets. However, due to the discrepancies between public datasets and real-world...
- PrIntMesh: Precise Intersection Surfaces for 3D Organ Mesh Reconstruction : Abstract: Human organs are composed of interconnected substructures whose geometry and spatial relationships constrain one another. Yet, most deep-learning approaches treat these parts independently, ...
- When Alignment Fails: Multimodal Adversarial Attacks on Vision-Language-Action Models : Abstract: Vision-Language-Action models (VLAs) have recently demonstrated remarkable progress in embodied environments, enabling robots to perceive, reason, and act through unified multimodal understa...
- Unsupervised Image Classification with Adaptive Nearest Neighbor Selection and Cluster Ensembles : Abstract: Unsupervised image classification, or image clustering, aims to group unlabeled images into semantically meaningful categories. Early methods integrated representation learning and clusterin...
- SwiTrack: Tri-State Switch for Cross-Modal Object Tracking : Abstract: Cross-modal object tracking (CMOT) is an emerging task that maintains target consistency while the video stream switches between different modalities, with only one modality available in eac...
- Mem-MLP: Real-Time 3D Human Motion Generation from Sparse Inputs : Abstract: Realistic and smooth full-body tracking is crucial for immersive AR/VR applications. Existing systems primarily track head and hands via Head Mounted Devices (HMDs) and controllers, making t...
- TetraSDF: Precise Mesh Extraction with Multi-resolution Tetrahedral Grid : Abstract: Extracting meshes that exactly match the zero-level set of neural signed distance functions (SDFs) remains challenging. Sampling-based methods introduce discretization error, while continuou...
- Building temporally coherent 3D maps with VGGT for memory-efficient Semantic SLAM : Abstract: We present a fast, spatio-temporal scene understanding framework based on Vision Gated Generative Transformers (VGGT). The proposed pipeline is designed to enable efficient, close to real-ti...
- Explainable AI for Diabetic Retinopathy Detection Using Deep Learning with Attention Mechanisms and Fuzzy Logic-Based Interpretability : Abstract: The task of weed detection is an essential element of precision agriculture since accurate species identification allows a farmer to selectively apply herbicides and fits into sustainable ag...
- Optimizing 3D Gaussian Splattering for Mobile GPUs : Abstract: Image-based 3D scene reconstruction, which transforms multi-view images into a structured 3D representation of the surrounding environment, is a common task across many modern applications. ...
- Upsample Anything: A Simple and Hard to Beat Baseline for Feature Upsampling : Abstract: We present Upsample Anything, a lightweight test-time optimization (TTO) framework that restores low-resolution features to high-resolution, pixel-wise outputs without any training....
- An Exterior-Embedding Neural Operator Framework for Preserving Conservation Laws : Abstract: Neural operators have demonstrated considerable effectiveness in accelerating the solution of time-dependent partial differential equations (PDEs) by directly learning governing physical law...
- Synthesis of Safety Specifications for Probabilistic Systems : Abstract: Ensuring that agents satisfy safety specifications can be crucial in safety-critical environments. While methods exist for controller synthesis with safe temporal specifications, most existi...
- Variational Quantum Integrated Sensing and Communication : Abstract: The integration of sensing and communication functionalities within a common system is one of the main innovation drivers for next-generation networks. In this paper, we introduce a quantum ...
- Time dependent loss reweighting for flow matching and diffusion models is theoretically justified : Abstract: This brief note clarifies that, in Generator Matching (which subsumes a large family of flow matching and diffusion models over continuous, manifold, and discrete spaces), both the Bregman d...
- Rate-optimal community detection near the KS threshold via node-robust algorithms : Abstract: We study community detection in the symmetric $k$-stochastic block model, where $n$ nodes are evenly partitioned into $k$ clusters with intra- and inter-cluster connection probabiliti...
- From Polynomials to Databases: Arithmetic Structures in Galois Theory : Abstract: We develop a computational framework for classifying Galois groups of irreducible degree-7 polynomials over $\mathbb{Q}$, combining explicit resolvent methods with machine learning technique...
- Adaptive Guided Upsampling for Low-light Image Enhancement : Abstract: We introduce Adaptive Guided Upsampling (AGU), an efficient method for upscaling low-light images capable of optimizing multiple image quality characteristics at the same time, such as reduc...
- Solving Spatial Supersensing Without Spatial Supersensing : Abstract: Cambrian-S aims to take the first steps towards improving video world models with spatial supersensing by introducing (i) two benchmarks, VSI-Super-Recall (VSR) and VSI-Super-Counting (VSC),...
- Dexterity from Smart Lenses: Multi-Fingered Robot Manipulation with In-the-Wild Human Demonstrations : Abstract: Learning multi-fingered robot policies from humans performing daily tasks in natural environments has long been a grand goal in the robotics community. Achieving this would mark significant ...
- Dataset Distillation for Pre-Trained Self-Supervised Vision Models : Abstract: The task of dataset distillation aims to find a small set of synthetic images such that training a model on them reproduces the performance of the same model trained on a much larger dataset...
- Self-Supervised Discriminative Feature Learning for Deep Multi-View Clustering : Abstract: Multi-view clustering is an important research topic due to its capability to utilize complementary information from multiple views. However, there are few methods to consider the negative i...
- A low-rank non-convex norm method for multiview graph clustering : Abstract: This study introduces a novel technique for multi-view clustering known as the "Consensus Graph-Based Multi-View Clustering Method Using Low-Rank Non-Convex Norm" (CGMVC-NC). Multi-view clus...
- Can LLMs Replace Economic Choice Prediction Labs? The Case of Language-based Persuasion Games : Abstract: Human choice prediction in economic contexts is crucial for applications in marketing, finance, public policy, and more. This task, however, is often constrained by the difficulties in acqui...
- Sparse-PGD: A Unified Framework for Sparse Adversarial Perturbations Generation : Abstract: This work studies sparse adversarial perturbations, including both unstructured and structured ones. We propose a framework based on a white-box PGD-like attack method named Sparse-PGD to ef...
- Structural Disentanglement of Causal and Correlated Concepts : Abstract: Controllable data generation aims to synthesize data by specifying values for target concepts. Achieving this reliably requires modeling the underlying generative factors and their relations...
- Provably Robust Pre-Trained Ensembles for Biomarker-Based Cancer Classification : Abstract: Certain cancer types, notably pancreatic cancer, are difficult to detect at an early stage, motivating robust biomarker-based screening. Liquid biopsies enable non-invasive monitoring of cir...
- Multi-Objective min-max Online Convex Optimization : Abstract: In online convex optimization (OCO), a single loss function sequence is revealed over a time horizon of $T$, and an online algorithm has to choose its action at time $t$, before the loss fun...
- LSAP: Rethinking Inversion Fidelity, Perception and Editability in GAN Latent Space : Abstract: As research on image inversion advances, the process is generally divided into two stages. The first step, Image Embedding, involves using an encoder or optimization procedure to embed an ...
- Spatial-and-Frequency-aware Restoration method for Images based on Diffusion Models : Abstract: Diffusion models have recently emerged as a promising framework for Image Restoration (IR), owing to their ability to produce high-quality reconstructions and their compatibility with establ...
- Bipartite Graph Variational Auto-Encoder with Fair Latent Representation to Account for Sampling Bias in Ecological Networks : Abstract: Citizen science monitoring programs can generate large amounts of valuable data, but are often affected by sampling bias. We focus on a citizen science initiative that records plant-pollinat...
- CardioLab: Laboratory Values Estimation from Electrocardiogram Features - An Exploratory Study : Abstract: Laboratory values represent a cornerstone of medical diagnostics but suffer from slow turnaround times and high costs, and provide information about only a single point in time. The conti...
- Estimation of Cardiac and Non-cardiac Diagnosis from Electrocardiogram Features : Abstract: Ensuring timely and accurate diagnosis of medical conditions is paramount for effective patient care. Electrocardiogram (ECG) signals are fundamental for evaluating a patient's cardiac healt...
- Modelling Global Trade with Optimal Transport : Abstract: Global trade is shaped by a complex mix of factors beyond supply and demand, including tangible variables like transport costs and tariffs, as well as less quantifiable influences such as po...
- LEARNER: Contrastive Pretraining for Learning Fine-Grained Patient Progression from Coarse Inter-Patient Labels : Abstract: Predicting whether a treatment leads to meaningful improvement is a central challenge in personalized medicine, particularly when disease progression manifests as subtle visual changes over ...
- Explainable machine learning for neoplasms diagnosis via electrocardiograms: an externally validated study : Abstract: Background: Neoplasms are a major cause of mortality globally, where early diagnosis is essential for improving outcomes. Current diagnostic methods are often invasive, expensive, and inacce...
- What Really Counts? Examining Step and Token Level Attribution in Multilingual CoT Reasoning : Abstract: This study investigates the attribution patterns underlying Chain-of-Thought (CoT) reasoning in multilingual LLMs. While prior works demonstrate the role of CoT prompting in improving task p...
- Mind the Motions: Benchmarking Theory-of-Mind in Everyday Body Language : Abstract: Our ability to interpret others' mental states through nonverbal cues (NVCs) is fundamental to our survival and social cohesion. While existing Theory of Mind (ToM) benchmarks have primarily...
- TOD-ProcBench: Benchmarking Complex Instruction-Following in Task-Oriented Dialogues : Abstract: In real-world task-oriented dialogue (TOD) settings, agents are required to strictly adhere to complex instructions while conducting multi-turn conversations with customers. These instructio...
- Liars' Bench: Evaluating Lie Detectors for Language Models : Abstract: Prior work has introduced techniques for detecting when large language models (LLMs) lie, that is, generating statements they believe are false. However, these techniques are typically valid...
- Learning Tractable Distributions Of Language Model Continuations : Abstract: Controlled language generation conditions text on sequence-level constraints (for example, syntax, style, or safety). These constraints may depend on future tokens, which makes directly cond...
- Early science acceleration experiments with GPT-5 : Abstract: AI models like GPT-5 are an increasingly valuable tool for scientists, but many remain unaware of the capabilities of frontier AI. We present a collection of short case studies in which GPT-...
- ELPO: Ensemble Learning Based Prompt Optimization for Large Language Models : Abstract: The remarkable performance of Large Language Models (LLMs) highly relies on crafted prompts. However, manual prompt engineering is a laborious process, creating a core bottleneck for practic...
- TS-PEFT: Token-Selective Parameter-Efficient Fine-Tuning with Learnable Threshold Gating : Abstract: In the field of large models (LMs) for natural language processing (NLP) and computer vision (CV), Parameter-Efficient Fine-Tuning (PEFT) has emerged as a resource-efficient method that modi...
- SemanticCite: Citation Verification with AI-Powered Full-Text Analysis and Evidence-Based Reasoning : Abstract: Effective scientific communication depends on accurate citations that validate sources and guide readers to supporting evidence. Yet academic literature faces mounting challenges: semantic c...
- SeSE: A Structural Information-Guided Uncertainty Quantification Framework for Hallucination Detection in LLMs : Abstract: Reliable uncertainty quantification (UQ) is essential for deploying large language models (LLMs) in safety-critical scenarios, as it enables them to abstain from responding when uncertain, t...
- SDA: Steering-Driven Distribution Alignment for Open LLMs without Fine-Tuning : Abstract: With the rapid advancement of large language models (LLMs), their deployment in real-world applications has become increasingly widespread. LLMs are expected to deliver robust performance ac...
- Incorporating Self-Rewriting into Large Language Model Reasoning Reinforcement : Abstract: Through reinforcement learning (RL) with outcome correctness rewards, large reasoning models (LRMs) with scaled inference computation have demonstrated substantial success on complex reasoni...
- NLP Datasets for Idiom and Figurative Language Tasks : Abstract: Idiomatic and figurative language form a large portion of colloquial speech and writing. With social media, this informal language has become more easily observable to people and trainers of...
- Learning from Sufficient Rationales: Analysing the Relationship Between Explanation Faithfulness and Token-level Regularisation Strategies : Abstract: Human explanations of natural language, rationales, form a tool to assess whether models learn a label for the right reasons or rely on dataset-specific shortcuts. Sufficiency is a common me...
- AICC: Parse HTML Finer, Make Models Better -- A 7.3T AI-Ready Corpus Built by a Model-Based HTML Parser : Abstract: While web data quality is crucial for large language models, most curation efforts focus on filtering and deduplication, treating HTML-to-text extraction as a fixed pre-processing step. Exist...
- ESGBench: A Benchmark for Explainable ESG Question Answering in Corporate Sustainability Reports : Abstract: We present ESGBench, a benchmark dataset and evaluation framework designed to assess explainable ESG question answering systems using corporate sustainability reports. The benchmark consists...
- Arctic-Extract Technical Report : Abstract: Arctic-Extract is a state-of-the-art model designed for extracting structural data (question answering, entities and tables) from scanned or digital-born business documents. Despite its SoTA...
- PersonaDrift: A Benchmark for Temporal Anomaly Detection in Language-Based Dementia Monitoring : Abstract: People living with dementia (PLwD) often show gradual shifts in how they communicate, becoming less expressive, more repetitive, or drifting off-topic in subtle ways. While caregivers may no...
- Anatomy of an Idiom: Tracing Non-Compositionality in Language Models : Abstract: We investigate the processing of idiomatic expressions in transformer-based language models using a novel set of techniques for circuit discovery and analysis. First discovering circuits via...
- Optimizing Quantum Key Distribution Network Performance using Graph Neural Networks : Abstract: This paper proposes an optimization of Quantum Key Distribution (QKD) Networks using a Graph Neural Network (GNN) framework. Today, the development of quantum computers threatens the security...
- Contrastive vision-language learning with paraphrasing and negation : Abstract: Contrastive vision-language models continue to be the dominant approach for image and text retrieval. Contrastive Language-Image Pre-training (CLIP) trains two neural networks in contrastive...
- Beyond Tokens in Language Models: Interpreting Activations through Text Genre Chunks : Abstract: Understanding Large Language Models (LLMs) is key to ensuring their safe and beneficial deployment. This task is complicated by the difficulty of interpretability of LLM structures, and the in...
- The Oracle and The Prism: A Decoupled and Efficient Framework for Generative Recommendation Explanation : Abstract: The integration of Large Language Models (LLMs) into explainable recommendation systems often leads to a performance-efficiency trade-off in end-to-end architectures, where joint optimizatio...
- Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning : Abstract: Large Language Model (LLM) Agents, often trained with Reinforcement Learning (RL), are constrained by a dependency on human-curated data, limiting scalability and tethering AI to human knowl...
- Change-of-Basis Pruning via Rotational Invariance : Abstract: Structured pruning removes entire neurons or channels, but its effectiveness depends on how importance is distributed across the representation space. Change-of-basis (CoB) pruning addresses...
- Gauge-Equivariant Graph Networks via Self-Interference Cancellation : Abstract: Graph Neural Networks (GNNs) excel on homophilous graphs but often fail under heterophily due to self-reinforcing and phase-inconsistent signals. We propose a Gauge-Equivariant Graph Network...
- ILoRA: Federated Learning with Low-Rank Adaptation for Heterogeneous Client Aggregation : Abstract: Federated Learning with Low-Rank Adaptation (LoRA) faces three critical challenges under client heterogeneity: (1) Initialization-Induced Instability due to random initialization misaligning...
- A Mathematical Framework for Custom Reward Functions in Job Application Evaluation using Reinforcement Learning : Abstract: Conventional Applicant Tracking Systems (ATS) tend to be inflexible keyword-matchers, and deny gifted candidates a role due to a few minor semantic mismatches. This article describes a new t...
- L-JacobiNet and S-JacobiNet: An Analysis of Adaptive Generalization, Stabilization, and Spectral Domain Trade-offs in GNNs : Abstract: Spectral GNNs, like ChebyNet, are limited by heterophily and over-smoothing due to their static, low-pass filter design. This work investigates the "Adaptive Orthogonal Polynomial Filter" (A...
- AssayMatch: Learning to Select Data for Molecular Activity Models : Abstract: The performance of machine learning models in drug discovery is highly dependent on the quality and consistency of the underlying training data. Due to limitations in dataset sizes, many mod...
- Mitigating Estimation Bias with Representation Learning in TD Error-Driven Regularization : Abstract: Deterministic policy gradient algorithms for continuous control suffer from value estimation biases that degrade performance. While double critics reduce such biases, the exploration potenti...
- HybSpecNet: A Critical Analysis of Architectural Instability in Hybrid-Domain Spectral GNNs : Abstract: Spectral Graph Neural Networks offer a principled approach to graph filtering but face a fundamental "Stability-vs-Adaptivity" trade-off. This trade-off is dictated by the choice of spectral...
- Pathlet Variational Auto-Encoder for Robust Trajectory Generation : Abstract: Trajectory generation has recently drawn growing interest in privacy-preserving urban mobility studies and location-based service applications. Although many studies have used deep learning ...
- An Interpretability-Guided Framework for Responsible Synthetic Data Generation in Emotional Text : Abstract: Emotion recognition from social media is critical for understanding public sentiment, but accessing training data has become prohibitively expensive due to escalating API costs and platform ...
- Labels Matter More Than Models: Quantifying the Benefit of Supervised Time Series Anomaly Detection : Abstract: Time series anomaly detection (TSAD) is a critical data mining task often constrained by label scarcity. Consequently, current research predominantly focuses on Unsupervised Time-series Anom...
- Enhancing Nuclear Reactor Core Simulation through Data-Based Surrogate Models : Abstract: In recent years, there has been an increasing need for Nuclear Power Plants (NPPs) to improve flexibility in order to match the rapid growth of renewable energies. The Operator Assistance Pr...
- Achieving Skilled and Reliable Daily Probabilistic Forecasts of Wind Power at Subseasonal-to-Seasonal Timescales over France : Abstract: Accurate and reliable wind power forecasts are crucial for grid stability, balancing supply and demand, and market risk management. Even though short-term weather forecasts have been thoroug...
- CausalMamba: Interpretable State Space Modeling for Temporal Rumor Causality : Abstract: Rumor detection on social media remains a challenging task due to the complex propagation dynamics and the limited interpretability of existing models. While recent neural architectures capt...
- A Switching Framework for Online Interval Scheduling with Predictions : Abstract: We study online interval scheduling in the irrevocable setting, where each interval must be immediately accepted or rejected upon arrival. The objective is to maximize the total length of ac...
- Causal Synthetic Data Generation in Recruitment : Abstract: The importance of Synthetic Data Generation (SDG) has increased significantly in domains where data quality is poor or access is limited due to privacy and regulatory constraints. One such d...
- Towards Overcoming Data Scarcity in Nuclear Energy: A Study on Critical Heat Flux with Physics-consistent Conditional Diffusion Model : Abstract: Deep generative modeling provides a powerful pathway to overcome data scarcity in energy-related applications where experimental data are often limited, costly, or difficult to obtain. By le...
- Mind the Gap: Bridging Prior Shift in Realistic Few-Shot Crop-Type Classification : Abstract: Real-world agricultural distributions often suffer from severe class imbalance, typically following a long-tailed distribution. Labeled datasets for crop-type classification are inherently s...
- Real-Time Inference for Distributed Multimodal Systems under Communication Delay Uncertainty : Abstract: Connected cyber-physical systems perform inference based on real-time inputs from multiple data streams. Uncertain communication delays across data streams challenge the temporal flow of the...
- Deep SOR Minimax Q-learning for Two-player Zero-sum Game : Abstract: In this work, we consider the problem of a two-player zero-sum game. In the literature, the successive over-relaxation Q-learning algorithm has been developed and implemented, and it is seen...
- Pass@k Metric for RLVR: A Diagnostic Tool of Exploration, But Not an Objective : Abstract: The ability of Large Language Models (LLMs) to perform complex, multi-step reasoning is a central focus of modern AI research. To evaluate and enhance this capability, the pass@k metric, whi...
- GeoPTH: A Lightweight Approach to Category-Based Trajectory Retrieval via Geometric Prototype Trajectory Hashing : Abstract: Trajectory similarity retrieval is an important part of spatiotemporal data mining; however, existing methods have the following limitations: traditional metrics are computationally expensiv...
- Graph Diffusion Counterfactual Explanation : Abstract: Machine learning models that operate on graph-structured data, such as molecular graphs or social networks, often make accurate predictions but offer little insight into why certain predicti...
- Optimizing Operation Recipes with Reinforcement Learning for Safe and Interpretable Control of Chemical Processes : Abstract: Optimal operation of chemical processes is vital for energy, resource, and cost savings in chemical engineering. The problem of optimal operation can be tackled with reinforcement learning, ...
- Learning-Enhanced Observer for Linear Time-Invariant Systems with Parametric Uncertainty : Abstract: This work introduces a learning-enhanced observer (LEO) for linear time-invariant systems with uncertain dynamics. Rather than relying solely on nominal models, the proposed framework treats...
- Beyond Generative AI: World Models for Clinical Prediction, Counterfactuals, and Planning : Abstract: Healthcare requires AI that is predictive, reliable, and data-efficient. However, recent generative models lack physical foundation and temporal reasoning required for clinical decision supp...
- Improving Iterative Gaussian Processes via Warm Starting Sequential Posteriors : Abstract: Scalable Gaussian process (GP) inference is essential for sequential decision-making tasks, yet improving GP scalability remains a challenging problem with many open avenues of research. Thi...
- Are Foundation Models Useful for Bankruptcy Prediction? : Abstract: Foundation models have shown promise across various financial applications, yet their effectiveness for corporate bankruptcy prediction remains systematically unevaluated against established...
- Optimal Fairness under Local Differential Privacy : Abstract: We investigate how to optimally design local differential privacy (LDP) mechanisms that reduce data unfairness and thereby improve fairness in downstream classification. We first derive a cl...
- Collaborative Management for Chronic Diseases and Depression: A Double Heterogeneity-based Multi-Task Learning Method : Abstract: Wearable sensor technologies and deep learning are transforming healthcare management. Yet, most health sensing studies focus narrowly on physical chronic diseases. This overlooks the critic...
- FreqFlow: Long-term forecasting using lightweight flow matching : Abstract: Multivariate time-series (MTS) forecasting is fundamental to applications ranging from urban mobility and resource management to climate modeling. While recent generative models based on den...
- Generative Modeling of Clinical Time Series via Latent Stochastic Differential Equations : Abstract: Clinical time series data from electronic health records and medical registries offer unprecedented opportunities to understand patient trajectories and inform medical decision-making. Howev...
- A Comparison Between Decision Transformers and Traditional Offline Reinforcement Learning Algorithms : Abstract: The field of Offline Reinforcement Learning (RL) aims to derive effective policies from pre-collected datasets without active environment interaction. While traditional offline RL algorithms...
- Limitations of Scalarisation in MORL: A Comparative Study in Discrete Environments : Abstract: Scalarisation functions are widely employed in MORL algorithms to enable intelligent decision-making. However, these functions often struggle to approximate the Pareto front accurately, rend...
- Correlation-Aware Feature Attribution Based Explainable AI : Abstract: Explainable AI (XAI) is increasingly essential as modern models become more complex and high-stakes applications demand transparency, trust, and regulatory compliance. Existing global attrib...
- Large Language Model-Based Reward Design for Deep Reinforcement Learning-Driven Autonomous Cyber Defense : Abstract: Designing rewards for autonomous cyber attack and defense learning agents in a complex, dynamic environment is a challenging task for subject matter experts. We propose a large language mode...
- ODE-ViT: Plug & Play Attention Layer from the Generalization of the ViT as an Ordinary Differential Equation : Abstract: In recent years, increasingly large models have achieved outstanding performance across CV tasks. However, these models demand substantial computational resources and storage, and their grow...
- Loss Functions Robust to the Presence of Label Errors : Abstract: Methods for detecting label errors in training data require models that are robust to label errors (i.e., not fit to erroneously labelled data points). However, acquiring such models often i...
- Saving Foundation Flow-Matching Priors for Inverse Problems : Abstract: Foundation flow-matching (FM) models promise a universal prior for solving inverse problems (IPs), yet today they trail behind domain-specific or even untrained priors. How can we unlock the...
- Dynamic Participation in Federated Learning: Benchmarks and a Knowledge Pool Plugin : Abstract: Federated learning (FL) enables clients to collaboratively train a shared model in a distributed manner, setting it apart from traditional deep learning paradigms. However, most existing FL ...
- FairLRF: Achieving Fairness through Sparse Low Rank Factorization : Abstract: As deep learning (DL) techniques become integral to various applications, ensuring model fairness while maintaining high performance has become increasingly critical, particularly in sensiti...
- Broad stochastic configuration residual learning system for norm-convergent universal approximation : Abstract: Universal approximation serves as the foundation of neural network learning algorithms. However, some networks establish their universal approximation property by demonstrating that the iter...
- Toward Valid Generative Clinical Trial Data with Survival Endpoints : Abstract: Clinical trials face mounting challenges: fragmented patient populations, slow enrollment, and unsustainable costs, particularly for late phase trials in oncology and rare diseases. While ex...
- Boosting Predictive Performance on Tabular Data through Data Augmentation with Latent-Space Flow-Based Diffusion : Abstract: Severe class imbalance is common in real-world tabular learning, where rare but important minority classes are essential for reliable prediction. Existing generative oversampling methods suc...
- ECPv2: Fast, Efficient, and Scalable Global Optimization of Lipschitz Functions : Abstract: We propose ECPv2, a scalable and theoretically grounded algorithm for global optimization of Lipschitz-continuous functions with unknown Lipschitz constants. Building on the Every Call is Pr...
- Almost Sure Convergence Analysis of Differentially Private Stochastic Gradient Methods : Abstract: Differentially private stochastic gradient descent (DP-SGD) has become the standard algorithm for training machine learning models with rigorous privacy guarantees. Despite its widespread us...
- gfnx: Fast and Scalable Library for Generative Flow Networks in JAX : Abstract: In this paper, we present gfnx, a fast and scalable package for training and evaluating Generative Flow Networks (GFlowNets) written in JAX. gfnx provides an extensive set of environments an...
- Toward Artificial Palpation: Representation Learning of Touch on Soft Bodies : Abstract: Palpation, the use of touch in medical examination, is almost exclusively performed by humans. We investigate a proof of concept for an artificial palpation method based on self-supervised l...
- Stabilizing Policy Gradient Methods via Reward Profiling : Abstract: Policy gradient methods, which have been extensively studied in the last decade, offer an effective and efficient framework for reinforcement learning problems. However, their performances c...
- Evolution Strategies at the Hyperscale : Abstract: We introduce Evolution Guided General Optimization via Low-rank Learning (EGGROLL), an evolution strategies (ES) algorithm designed to scale backprop-free optimization to large population si...
- Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter : Abstract: The emergence of Large Language Models (LLMs) with strong reasoning capabilities marks a significant milestone, unlocking new frontiers in complex problem-solving. However, training these re...
- Graph-Memoized Reasoning: Foundations Structured Workflow Reuse in Intelligent Systems : Abstract: Modern large language model-based reasoning systems frequently recompute similar reasoning steps across tasks, wasting computational resources, inflating inference latency, and limiting repr...
- Human-aligned Quantification of Numerical Data : Abstract: Quantifying numerical data involves addressing two key challenges: first, determining whether the data can be naturally quantified, and second, identifying the numerical intervals or ranges ...
- Uncertainty-Resilient Multimodal Learning via Consistency-Guided Cross-Modal Transfer : Abstract: Multimodal learning systems often face substantial uncertainty due to noisy data, low-quality labels, and heterogeneous modality characteristics. These issues become especially critical in h...
- SURFing to the Fundamental Limit of Jet Tagging : Abstract: Beyond the practical goal of improving search and measurement sensitivity through better jet tagging algorithms, there is a deeper question: what are their upper performance limits? Generati...
- Atlas Gaussian processes on restricted domains and point clouds : Abstract: In real-world applications, data often reside in restricted domains with unknown boundaries, or as high-dimensional point clouds lying on a lower-dimensional, nontrivial, unknown manifold. T...
- WALDO: Where Unseen Model-based 6D Pose Estimation Meets Occlusion : Abstract: Accurate 6D object pose estimation is vital for robotics, augmented reality, and scene understanding. For seen objects, high accuracy is often attainable via per-object fine-tuning but gener...
- Box6D: Zero-shot Category-level 6D Pose Estimation of Warehouse Boxes : Abstract: Accurate and efficient 6D pose estimation of novel objects under clutter and occlusion is critical for robotic manipulation across warehouse automation, bin picking, logistics, and e-commerc...
- EEG Emotion Recognition Through Deep Learning : Abstract: An advanced emotion classification model was developed using a CNN-Transformer architecture for emotion recognition from EEG brain wave signals, effectively distinguishing among three emotio...
- Machine Learning vs. Randomness: Challenges in Predicting Binary Options Movements : Abstract: Binary options trading is often marketed as a field where predictive models can generate consistent profits. However, the inherent randomness and stochastic nature of binary options make pri...
- A Primer on Quantum Machine Learning : Abstract: Quantum machine learning (QML) is a computational paradigm that seeks to apply quantum-mechanical resources to solve learning problems. As such, the goal of this framework is to leverage qua...
- Efficient Chromosome Parallelization for Precision Medicine Genomic Workflows : Abstract: Large-scale genomic workflows used in precision medicine can process datasets spanning tens to hundreds of gigabytes per sample, leading to high memory spikes, intensive disk I/O, and task f...
- Fairness in Multi-modal Medical Diagnosis with Demonstration Selection : Abstract: Multimodal large language models (MLLMs) have shown strong potential for medical image reasoning, yet fairness across demographic groups remains a major concern. Existing debiasing methods o...
- Digital Agriculture Sandbox for Collaborative Research : Abstract: Digital agriculture is transforming the way we grow food by utilizing technology to make farming more efficient, sustainable, and productive. This modern approach to agriculture generates a ...
- Towards a Safer and Sustainable Manufacturing Process: Material classification in Laser Cutting Using Deep Learning : Abstract: Laser cutting is a widely adopted technology in material processing across various industries, but it generates a significant amount of dust, smoke, and aerosols during operation, posing a r...
- Operon: Incremental Construction of Ragged Data via Named Dimensions : Abstract: Modern data processing workflows frequently encounter ragged data: collections with variable-length elements that arise naturally in domains like natural language processing, scientific meas...
- Angular Graph Fractional Fourier Transform: Theory and Application : Abstract: Graph spectral representations are fundamental in graph signal processing, offering a rigorous framework for analyzing and processing graph-structured data. The graph fractional Fourier tran...
- Approximation rates of quantum neural networks for periodic functions via Jackson's inequality : Abstract: Quantum neural networks (QNNs) are an analog of classical neural networks in the world of quantum computing, which are represented by a unitary matrix with trainable parameters. Inspired by ...
- MagBotSim: Physics-Based Simulation and Reinforcement Learning Environments for Magnetic Robotics : Abstract: Magnetic levitation is about to revolutionize in-machine material flow in industrial automation. Such systems are flexibly configurable and can include a large number of independently actuat...
- ART: A Graph-based Framework for Investigating Illicit Activity in Monero via Address-Ring-Transaction Structures : Abstract: As Law Enforcement Agencies advance in cryptocurrency forensics, criminal actors aiming to conceal illicit fund movements increasingly turn to "mixin" services or privacy-based cryptocurrenc...
- FlipVQA-Miner: Cross-Page Visual Question-Answer Mining from Textbooks : Abstract: The development of Large Language Models (LLMs) increasingly depends on high-quality supervised data, yet existing instruction-tuning and RL datasets remain costly to curate and often rely o...
- Spectral Identifiability for Interpretable Probe Geometry : Abstract: Linear probes are widely used to interpret and evaluate neural representations, yet their reliability remains unclear, as probes may appear accurate in some regimes but collapse unpredictabl...
- Sparse Autoencoders are Topic Models : Abstract: Sparse autoencoders (SAEs) are used to analyze embeddings, but their role and practical value are debated. We propose a new perspective on SAEs by demonstrating that they can be naturally un...
- VersaPants: A Loose-Fitting Textile Capacitive Sensing System for Lower-Body Motion Capture : Abstract: We present VersaPants, the first loose-fitting, textile-based capacitive sensing system for lower-body motion capture, built on the open-hardware VersaSens platform. By integrating conductiv...
- Reducing Instability in Synthetic Data Evaluation with a Super-Metric in MalDataGen : Abstract: Evaluating the quality of synthetic data remains a persistent challenge in the Android malware domain due to instability and the lack of standardization among existing metrics. This work int...
- Unsupervised Graph Neural Network Framework for Balanced Multipatterning in Advanced Electronic Design Automation Layouts : Abstract: Multipatterning is an essential decomposition strategy in electronic design automation (EDA) that overcomes lithographic limitations when printing dense circuit layouts. Although heuristic-b...
- Classification of worldwide news articles by perceived quality, 2018-2024 : Abstract: This study explored whether supervised machine learning and deep learning models can effectively distinguish perceived lower-quality news articles from perceived higher-quality news articles...
- Graph Neural Networks for Surgical Scene Segmentation : Abstract: Purpose: Accurate identification of hepatocystic anatomy is critical to preventing surgical complications during laparoscopic cholecystectomy. Deep learning models often struggle with occlus...
- Effective Code Membership Inference for Code Completion Models via Adversarial Prompts : Abstract: Membership inference attacks (MIAs) on code completion models offer an effective way to assess privacy risks by inferring whether a given code snippet was part of the training data. Existing...
- Eye Care You: Voice Guidance Application Using Social Robot for Visually Impaired People : Abstract: In this study, a social robot device was designed for visually impaired users, along with a mobile application that provides functions to assist their lives. Both physical and mental c...
- Multi-Aspect Cross-modal Quantization for Generative Recommendation : Abstract: Generative Recommendation (GR) has emerged as a new paradigm in recommender systems. This approach relies on quantized representations to discretize item features, modeling users' historical...
- ItemRAG: Item-Based Retrieval-Augmented Generation for LLM-Based Recommendation : Abstract: Recently, large language models (LLMs) have been widely used as recommender systems, owing to their strong reasoning capability and their effectiveness in handling cold-start items. To bette...
- Can MLLMs Detect Phishing? A Comprehensive Security Benchmark Suite Focusing on Dynamic Threats and Multimodal Evaluation in Academic Environments : Abstract: The rapid proliferation of Multimodal Large Language Models (MLLMs) has introduced unprecedented security challenges, particularly in phishing detection within academic environments. Academi...
- Finetuning LLMs for Automatic Form Interaction on Web-Browser in Selenium Testing Framework : Abstract: Automated web application testing is a critical component of modern software development, with frameworks like Selenium widely adopted for validating functionality through browser automation...
- SWR-Viz: AI-assisted Interactive Visual Analytics Framework for Ship Weather Routing : Abstract: Efficient and sustainable maritime transport increasingly depends on reliable forecasting and adaptive routing, yet operational adoption remains difficult due to forecast latencies and the n...
- Eq.Bot: Enhance Robotic Manipulation Learning via Group Equivariant Canonicalization : Abstract: Robotic manipulation systems are increasingly deployed across diverse domains. Yet existing multi-modal learning frameworks lack inherent guarantees of geometric consistency, struggling to h...
- Taxonomy, Evaluation and Exploitation of IPI-Centric LLM Agent Defense Frameworks : Abstract: Large Language Model (LLM)-based agents with function-calling capabilities are increasingly deployed, but remain vulnerable to Indirect Prompt Injection (IPI) attacks that hijack their tool ...
- PresentCoach: Dual-Agent Presentation Coaching through Exemplars and Interactive Feedback : Abstract: Effective presentation skills are essential in education, professional communication, and public speaking, yet learners often lack access to high-quality exemplars or personalized coaching. ...
- Behavior Trees vs Executable Ontologies: a Comparative Analysis of Robot Control Paradigms : Abstract: This paper compares two distinct approaches to modeling robotic behavior: imperative Behavior Trees (BTs) and declarative Executable Ontologies (EO), implemented through the boldsea framewor...
- Path Planning through Multi-Agent Reinforcement Learning in Dynamic Environments : Abstract: Path planning in dynamic environments is a fundamental challenge in intelligent transportation and robotics, where obstacles and conditions change over time, introducing uncertainty and requ...
- Reflexive Evidence-Based Multimodal Learning for Clean Energy Transitions: Causal Insights on Cooking Fuel Access, Urbanization, and Carbon Emissions : Abstract: Achieving Sustainable Development Goal 7 (Affordable and Clean Energy) requires not only technological innovation but also a deeper understanding of the socioeconomic factors influencing ene...
- RRT*former: Environment-Aware Sampling-Based Motion Planning using Transformer : Abstract: We investigate the sampling-based optimal path planning problem for robotics in complex and dynamic environments. Most existing sampling-based algorithms neglect environmental information or...
- Small Language Models for Phishing Website Detection: Cost, Performance, and Privacy Trade-Offs : Abstract: Phishing websites pose a major cybersecurity threat, exploiting unsuspecting users and causing significant financial and organisational harm. Traditional machine learning approaches for phis...
- Insights from the ICLR Peer Review and Rebuttal Process : Abstract: Peer review is a cornerstone of scientific publishing, including at premier machine learning conferences such as ICLR. As submission volumes increase, understanding the nature and dynamics o...
- Theoretical Closed-loop Stability Bounds for Dynamical System Coupled with Diffusion Policies : Abstract: Diffusion Policy has shown great performance in robotic manipulation tasks under stochastic perturbations, due to its ability to model multimodal action distributions. Nonetheless, its relia...
- B+ANN: A Fast Billion-Scale Disk-based Nearest-Neighbor Index : Abstract: Storing and processing of embedding vectors by specialized Vector databases (VDBs) has become the linchpin in building modern AI pipelines. Most current VDBs employ variants of a graph-based...
- Optimus-Q: Utilizing Federated Learning in Adaptive Robots for Intelligent Nuclear Power Plant Operations through Quantum Cryptography : Abstract: The integration of advanced robotics in nuclear power plants (NPPs) presents a transformative opportunity to enhance safety, efficiency, and environmental monitoring in high-stakes environme...
- Sufficient Explanations in Databases and their Connections to Necessary Explanations and Repairs : Abstract: The notion of cause, as formalized by Halpern and Pearl, has been recently applied to relational databases, to characterize and compute causal explanations for query answers. In this work we...
- Joint Semantic-Channel Coding and Modulation for Token Communications : Abstract: In recent years, the Transformer architecture has achieved outstanding performance across a wide range of tasks and modalities. Token is the unified input and output representation in Transf...
- Intelligent Collaborative Optimization for Rubber Tyre Film Production Based on Multi-path Differentiated Clipping Proximal Policy Optimization : Abstract: The advent of smart manufacturing is addressing the limitations of traditional centralized scheduling and inflexible production line configurations in the rubber tyre industry, especially in...
- Incremental Maintenance of DatalogMTL Materialisations : Abstract: DatalogMTL extends the classical Datalog language with metric temporal logic (MTL), enabling expressive reasoning over temporal data. While existing reasoning approaches, such as materialisa...
- Reward and Guidance through Rubrics: Promoting Exploration to Improve Multi-Domain Reasoning : Abstract: Recent advances in reinforcement learning (RL) have significantly improved the complex reasoning capabilities of large language models (LLMs). Despite these successes, existing methods mainl...
- U2UData+: A Scalable Swarm UAVs Autonomous Flight Dataset for Embodied Long-horizon Tasks : Abstract: Swarm UAV autonomous flight for Embodied Long-Horizon (ELH) tasks is crucial for advancing the low-altitude economy. However, existing methods focus only on specific basic tasks due to datas...
- S-DAG: A Subject-Based Directed Acyclic Graph for Multi-Agent Heterogeneous Reasoning : Abstract: Large Language Models (LLMs) have achieved impressive performance in complex reasoning problems. Their effectiveness highly depends on the specific nature of the task, especially the require...
- MF-Speech: Achieving Fine-Grained and Compositional Control in Speech Generation via Factor Disentanglement : Abstract: Generating expressive and controllable human speech is one of the core goals of generative artificial intelligence, but its progress has long been constrained by two fundamental challenges: ...
- Towards High-Consistency Embodied World Model with Multi-View Trajectory Videos : Abstract: Embodied world models aim to predict and interact with the physical world through visual observations and actions. However, existing models struggle to accurately translate low-level actions...
- Making Evidence Actionable in Adaptive Learning Closing the Diagnostic Pedagogical Loop : Abstract: Adaptive learning often diagnoses precisely yet intervenes weakly, producing help that is mistimed or misaligned. This study presents evidence supporting an instructor-governed feedback loop...
- Extending Test-Time Scaling: A 3D Perspective with Context, Batch, and Turn : Abstract: Reasoning reinforcement learning (RL) has recently revealed a new scaling effect: test-time scaling. Thinking models such as R1 and o1 improve their reasoning accuracy at test time as the le...
- Connecting the Dots: A Machine Learning Ready Dataset for Ionospheric Forecasting Models : Abstract: Operational forecasting of the ionosphere remains a critical space weather challenge due to sparse observations, complex coupling across geospatial layers, and a growing need for timely, acc...
- TB or Not TB: Coverage-Driven Direct Preference Optimization for Verilog Stimulus Generation : Abstract: With the rapid advancement of Large Language Models (LLMs), there is growing interest in applying them to hardware design and verification. Among these stages, design verification remains th...
- TopoReformer: Mitigating Adversarial Attacks Using Topological Purification in OCR Models : Abstract: Adversarially perturbed images of text can cause sophisticated OCR systems to produce misleading or incorrect transcriptions from seemingly invisible changes to humans. Some of these perturb...
- Beyond Tsybakov: Model Margin Noise and $\mathcal{H}$-Consistency Bounds : Abstract: We introduce a new low-noise condition for classification, the Model Margin Noise (MM noise) assumption, and derive enhanced $\mathcal{H}$-consistency bounds under this condition. MM noise i...
- Attention-Based Feature Online Conformal Prediction for Time Series : Abstract: Online conformal prediction (OCP) wraps around any pre-trained predictor to produce prediction sets with coverage guarantees that hold irrespective of temporal dependencies or distribution s...
- Transparent Early ICU Mortality Prediction with Clinical Transformer and Per-Case Modality Attribution : Abstract: Early identification of intensive care patients at risk of in-hospital mortality enables timely intervention and efficient resource allocation. Despite high predictive performance, existing ...
- discretize_distributions: Efficient Quantization of Gaussian Mixtures with Guarantees in Wasserstein Distance : Abstract: We present discretize_distributions, a Python package that efficiently constructs discrete approximations of Gaussian mixture distributions and provides guarantees on the approximation error...
- GLOBE: Accurate and Generalizable PDE Surrogates using Domain-Inspired Architectures and Equivariances : Abstract: We introduce GLOBE, a new neural surrogate for homogeneous PDEs that draws inductive bias from boundary-element methods and equivariant ML. GLOBE represents solutions as superpositions of le...
- Global Resolution: Optimal Multi-Draft Speculative Sampling via Convex Minimization : Abstract: Speculative sampling reduces the latency of autoregressive decoding for target model LLMs without sacrificing inference quality, by using a cheap draft model to suggest a candidate token and...
- Unified all-atom molecule generation with neural fields : Abstract: Generative models for structure-based drug design are often limited to a specific modality, restricting their broader applicability. To address this challenge, we introduce FuncBind, a frame...
- AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization : Abstract: We present AccelOpt, a self-improving large language model (LLM) agentic system that autonomously optimizes kernels for emerging AI accelerators, eliminating the need for expert-provided hard...
- Breaking the Bottleneck with DiffuApriel: High-Throughput Diffusion LMs with Mamba Backbone : Abstract: Diffusion-based language models have recently emerged as a promising alternative to autoregressive generation, yet their reliance on Transformer backbones limits inference efficiency due to ...
- iLTM: Integrated Large Tabular Model : Abstract: Tabular data underpins decisions across science, industry, and public services. Despite rapid progress, advances in deep learning have not fully carried over to the tabular domain, where gra...
- Self-supervised and Multi-fidelity Learning for Extended Predictive Soil Spectroscopy : Abstract: We propose a self-supervised machine learning (SSML) framework for multi-fidelity learning and extended predictive soil spectroscopy based on latent space embeddings. A self-supervised repre...
- Machine Learning Epidemic Predictions Using Agent-based Wireless Sensor Network Models : Abstract: The lack of epidemiological data in wireless sensor networks (WSNs) is a fundamental difficulty in constructing robust models to forecast and mitigate threats such as viruses and worms. Many...
- Descend or Rewind? Stochastic Gradient Descent Unlearning : Abstract: Machine unlearning algorithms aim to remove the impact of selected training data from a model without the computational expenses of retraining from scratch. Two such algorithms are "Descent...
- Synergizing Deconfounding and Temporal Generalization For Time-series Counterfactual Outcome Estimation : Abstract: Estimating counterfactual outcomes from time-series observations is crucial for effective decision-making, e.g. when to administer a life-saving treatment, yet remains significantly challeng...
- Physics-Guided Inductive Spatiotemporal Kriging for PM2.5 with Satellite Gradient Constraints : Abstract: High-resolution mapping of fine particulate matter (PM2.5) is a cornerstone of sustainable urbanism but remains critically hindered by the spatial sparsity of ground monitoring networks. Whi...
- CARE: Turning LLMs Into Causal Reasoning Expert : Abstract: Large language models (LLMs) have recently demonstrated impressive capabilities across a range of reasoning and generation tasks. However, research studies have shown that LLMs lack the abil...
- HGCN2SP: Hierarchical Graph Convolutional Network for Two-Stage Stochastic Programming : Abstract: Two-stage Stochastic Programming (2SP) is a standard framework for modeling decision-making problems under uncertainty. While numerous methods exist, solving such problems with many scenario...
- The Illusion of Procedural Reasoning: Measuring Long-Horizon FSM Execution in LLMs : Abstract: Large language models (LLMs) have achieved remarkable results on tasks framed as reasoning problems, yet their true ability to perform procedural reasoning, executing multi-step, rule-based ...
- Learning Interestingness in Automated Mathematical Theory Formation : Abstract: We take two key steps in automating the open-ended discovery of new mathematical theories, a grand challenge in artificial intelligence. First, we introduce FERMAT, a reinforcement ...
- Ask WhAI: Probing Belief Formation in Role-Primed LLM Agents : Abstract: We present Ask WhAI, a systems-level framework for inspecting and perturbing belief states in multi-agent interactions. The framework records and replays agent interactions, supports out-of-...
- Subnational Geocoding of Global Disasters Using Large Language Models : Abstract: Subnational location data of disaster events are critical for risk assessment and disaster risk reduction. Disaster databases such as EM-DAT often report locations in unstructured textual fo...
- Project Rachel: Can an AI Become a Scholarly Author? : Abstract: This paper documents Project Rachel, an action research study that created and tracked a complete AI academic identity named Rachel So. Through careful publication of AI-generated research p...
- Uncertainty-Aware Measurement of Scenario Suite Representativeness for Autonomous Systems : Abstract: Assuring the trustworthiness and safety of AI systems, e.g., autonomous vehicles (AV), depends critically on the data-related safety properties, e.g., representativeness, completeness, etc.,...
- SafeRBench: A Comprehensive Benchmark for Safety Assessment in Large Reasoning Models : Abstract: Large Reasoning Models (LRMs) improve answer quality through explicit chain-of-thought, yet this very capability introduces new safety risks: harmful content can be subtly injected, surface ...
- HISE-KT: Synergizing Heterogeneous Information Networks and LLMs for Explainable Knowledge Tracing with Meta-Path Optimization : Abstract: Knowledge Tracing (KT) aims to mine students' evolving knowledge states and predict their future question-answering performance. Existing methods based on heterogeneous information networks ...
- As If We've Met Before: LLMs Exhibit Certainty in Recognizing Seen Files : Abstract: The remarkable language ability of Large Language Models (LLMs) stems from extensive training on vast datasets, often including copyrighted material, which raises serious concerns about unau...
- SOLID: a Framework of Synergizing Optimization and LLMs for Intelligent Decision-Making : Abstract: This paper introduces SOLID (Synergizing Optimization and Large Language Models for Intelligent Decision-Making), a novel framework that integrates mathematical optimization with the context...
- Efficiency Will Not Lead to Sustainable Reasoning AI : Abstract: AI research is increasingly moving toward complex problem solving, where models are optimized not only for pattern recognition but for multi-step reasoning. Historically, computing's global ...
- Realist and Pluralist Conceptions of Intelligence and Their Implications on AI Research : Abstract: In this paper, we argue that current AI research operates on a spectrum between two different underlying conceptions of intelligence: Intelligence Realism, which holds that intelligence repr...
- Terra Nova: A Comprehensive Challenge Environment for Intelligent Agents : Abstract: We introduce Terra Nova, a new comprehensive challenge environment (CCE) for reinforcement learning (RL) research inspired by Civilization V. A CCE is a single environment in which multiple ...
- Know Your Intent: An Autonomous Multi-Perspective LLM Agent Framework for DeFi User Transaction Intent Mining : Abstract: As Decentralized Finance (DeFi) develops, understanding user intent behind DeFi transactions is crucial yet challenging due to complex smart contract interactions, multifaceted on-/off-chain...
- Exploring the use of AI authors and reviewers at Agents4Science : Abstract: There is growing interest in using AI agents for scientific research, yet fundamental questions remain about their capabilities as scientists and reviewers. To explore these questions, we or...
- What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity : Abstract: AI research agents offer the promise to accelerate scientific progress by automating the design, implementation, and training of machine learning models. However, the field is still in its i...
- TacEleven: generative tactic discovery for football open play : Abstract: Creating offensive advantages during open play is fundamental to football success. However, due to the highly dynamic and long-sequence nature of open play, the potential tactic space grows ...
- Membership Inference Attack against Large Language Model-based Recommendation Systems: A New Distillation-based Paradigm : Abstract: Membership Inference Attack (MIA) aims to determine if a data sample is used in the training dataset of a target model. Traditional MIA obtains feature of target model via shadow models and ...
- Image-Seeking Intent Prediction for Cross-Device Product Search : Abstract: Large Language Models (LLMs) are transforming personalized search, recommendations, and customer interaction in e-commerce. Customers increasingly shop across multiple devices, from voice-on...
- An LLM-Powered Agent for Real-Time Analysis of the Vietnamese IT Job Market : Abstract: Individuals entering Vietnam's dynamic Information Technology (IT) job market face a critical gap in reliable career guidance. Existing market reports are often outdated, while the manual an...
- Causally-Informed Reinforcement Learning for Adaptive Emotion-Aware Social Media Recommendation : Abstract: Social media recommendation systems play a central role in shaping users' emotional experiences. However, most systems are optimized solely for engagement metrics, such as click rate, viewin...
- ExplainRec: Towards Explainable Multi-Modal Zero-Shot Recommendation with Preference Attribution and Large Language Models : Abstract: Recent advances in Large Language Models (LLMs) have opened new possibilities for recommendation systems, though current approaches such as TALLRec face challenges in explainability and cold...
- Quantifying the Role of OpenFold Components in Protein Structure Prediction : Abstract: Models such as AlphaFold2 and OpenFold have transformed protein structure prediction, yet their inner workings remain poorly understood. We present a methodology to systematically evaluate t...
- Enabling Predictive Maintenance in District Heating Substations: A Labelled Dataset and Fault Detection Evaluation Framework based on Service Data : Abstract: Early detection of faults in district heating substations is imperative to reduce return temperatures and enhance efficiency. However, progress in this domain has been hindered by the limite...
- irace-evo: Automatic Algorithm Configuration Extended With LLM-Based Code Evolution : Abstract: Automatic algorithm configuration tools such as irace efficiently tune parameter values but leave algorithmic code unchanged. This paper introduces a first version of irace-evo, an extension...
- Evaluating Generative AI for CS1 Code Grading: Direct vs Reverse Methods : Abstract: Manual grading of programming assignments in introductory computer science courses can be time-consuming and prone to inconsistencies. While unit testing is commonly used for automatic evalu...
- Scalable and Efficient Large-Scale Log Analysis with LLMs: An IT Software Support Case Study : Abstract: IT environments typically have logging mechanisms to monitor system health and detect issues. However, the huge volume of generated logs makes manual inspection impractical, highlighting the...
- Towards Continuous Assurance with Formal Verification and Assurance Cases : Abstract: Autonomous systems must sustain justified confidence in their correctness and safety across their operational lifecycle, from design and deployment through post-deployment evolution. Traditio...
- Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech : Abstract: Recent advances in expressive text-to-speech (TTS) have introduced diverse methods based on style embedding extracted from reference speech. However, synthesizing high-quality expressive spe...
- PolyKAN: Efficient Fused GPU Operators for Polynomial Kolmogorov-Arnold Network Variants : Abstract: Kolmogorov-Arnold Networks (KANs) promise higher expressive capability and stronger interpretability than Multi-Layer Perceptron, particularly in the domain of AI for Science. However, pract...
- Fifty Shades of Greenwashing: The Political Economy of Climate Change Advertising on Social Media : Abstract: In this paper, we provide a novel measure for greenwashing (i.e., climate-related misinformation) that shows how polluting companies can use social media advertising related to climate c...
- How Should the Law Treat Future AI Systems? Fictional Legal Personhood versus Legal Identity : Abstract: The law draws a sharp distinction between objects and persons, and between two kinds of persons, the "fictional" kind (i.e. corporations), and the "non-fictional" kind (individual or "n...
- Harmful Traits of AI Companions : Abstract: Amid the growing prevalence of human-AI interaction, large language models and other AI-based entities increasingly provide forms of companionship to human users. Such AI companionship --...
- SVBRD-LLM: Self-Verifying Behavioral Rule Discovery for Autonomous Vehicle Identification : Abstract: As more autonomous vehicles operate on public roads, understanding real-world behavior of autonomous vehicles is critical to analyzing traffic safety, making policies, and public acceptance....
- Aligning Generative Music AI with Human Preferences: Methods and Challenges : Abstract: Recent advances in generative AI for music have achieved remarkable fidelity and stylistic diversity, yet these systems often fail to align with nuanced human preferences due to the specific...
- MAIF: Enforcing AI Trust and Provenance with an Artifact-Centric Agentic Paradigm : Abstract: The AI trustworthiness crisis threatens to derail the artificial intelligence revolution, with regulatory barriers, security vulnerabilities, and accountability gaps preventing deployment in...
Research Sources: 347 | Generated: 11/21/2025
