AI Research News Feeds for January 1st, 2026

AI RESEARCH PAPERS & ACADEMIC SOURCES

GaussianImage++: Boosted Image Representation and Compression with 2D Gaussian Splatting : Abstract: Implicit neural representations (INRs) have achieved remarkable success in image representation and compression, but they require substantial training time and memory. Meanwhile, recent 2D G...
Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment : Abstract: Despite Contrastive Language-Image Pretraining (CLIP)'s remarkable capability to retrieve content across modalities, a substantial modality gap persists in its feature space. Intriguingly, w...
OnlineVPO: Align Video Diffusion Model with Online Video-Centric Preference Optimization : Abstract: Video diffusion models (VDMs) have demonstrated remarkable capabilities in text-to-video (T2V) generation. Despite their success, VDMs still suffer from degraded image quality and flickering...
Hierarchical Context Alignment with Disentangled Geometric and Temporal Modeling for Semantic Occupancy Prediction : Abstract: Camera-based 3D Semantic Occupancy Prediction (SOP) is crucial for understanding complex 3D scenes from limited 2D image observations. Existing SOP methods typically aggregate contextual fea...
INST-IT: Boosting Instance Understanding via Explicit Visual Prompt Instruction Tuning : Abstract: Large Multimodal Models (LMMs) have made significant breakthroughs with the advancement of instruction tuning. However, while existing models can understand images and videos at a holistic l...
Reconstructing Hand-Held Objects in 3D from Images and Videos : Abstract: Objects manipulated by the hand (i.e., manipulanda) are particularly challenging to reconstruct from Internet videos. Not only does the hand occlude much of the object, but also the object i...
Matching Semantically Similar Non-Identical Objects : Abstract: Not identical but similar objects are ubiquitous in our world, ranging from four-legged animals such as dogs and cats to cars of different models and flowers of various colors. This study ad...
HIDFlowNet: A Flow-Based Deep Network for Hyperspectral Image Denoising : Abstract: Hyperspectral image (HSI) denoising is essentially ill-posed since a noisy HSI can be degraded from multiple clean HSIs. However, existing deep learning (DL)-based approaches only restore on...
PhysTalk: Language-driven Real-time Physics in 3D Gaussian Scenes : Abstract: Realistic visual simulations are omnipresent, yet their creation requires computing time, rendering, and expert animation knowledge. Open-vocabulary visual effects generation from text input...
Towards autonomous time-calibration of large quantum-dot devices: Detection, real-time feedback, and noise spectroscopy : Abstract: The performance and scalability of semiconductor quantum-dot (QD) qubits are limited by electrostatic drift and charge noise that shift operating points and destabilize qubit parameters. As ...
Training-Free Color-Aware Adversarial Diffusion Sanitization for Diffusion Stegomalware Defense at Security Gateways : Abstract: The rapid expansion of generative AI has normalized large-scale synthetic media creation, enabling new forms of covert communication. Recent generative steganography methods, particularly th...
Geometric Multi-Session Map Merging with Learned Local Descriptors : Abstract: Multi-session map merging is crucial for extended autonomous operations in large-scale environments. In this paper, we present GMLD, a learning-based local descriptor framework for large-sca...
RANGER: A Monocular Zero-Shot Semantic Navigation Framework through Contextual Adaptation : Abstract: Efficiently finding targets in complex environments is fundamental to real-world embodied applications. While recent advances in multimodal foundation models have enabled zero-shot object go...
Targeted Semantic Segmentation of Himalayan Glacial Lakes Using Time-Series SAR: Towards Automated GLOF Early Warning : Abstract: Glacial Lake Outburst Floods (GLOFs) are one of the most devastating climate change induced hazards. Existing remote monitoring approaches often prioritise maximising spatial coverage to tra...
One-Shot Structured Pruning of Quantum Neural Networks via $q$-Group Engineering and Quantum Geometric Metrics : Abstract: Quantum neural networks (QNNs) suffer from severe gate-level redundancy, which hinders their deployment on noisy intermediate-scale quantum (NISQ) devices. In this work, we propose q-iPrune,...
Learning to Feel the Future: DreamTacVLA for Contact-Rich Manipulation : Abstract: Vision-Language-Action (VLA) models have shown remarkable generalization by mapping web-scale knowledge to robotic control, yet they remain blind to physical contact. Consequently, they stru...
GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction : Abstract: Recent advances in 3D reconstruction have achieved remarkable progress in high-quality scene capture from dense multi-view imagery, yet struggle when input views are limited. Various approac...
Edit3r: Instant 3D Scene Editing from Sparse Unposed Images : Abstract: We present Edit3r, a feed-forward framework that reconstructs and edits 3D scenes in a single pass from unposed, view-inconsistent, instruction-edited images. Unlike prior methods requiring ...
FineTec: Fine-Grained Action Recognition Under Temporal Corruption via Skeleton Decomposition and Sequence Completion : Abstract: Recognizing fine-grained actions from temporally corrupted skeleton sequences remains a significant challenge, particularly in real-world scenarios where online pose estimation often yields ...
From Inpainting to Editing: A Self-Bootstrapping Framework for Context-Rich Visual Dubbing : Abstract: Audio-driven visual dubbing aims to synchronize a video's lip movements with new speech, but is fundamentally challenged by the lack of ideal training data: paired videos where only a subjec...
FoundationSLAM: Unleashing the Power of Depth Foundation Models for End-to-End Dense Visual SLAM : Abstract: We present FoundationSLAM, a learning-based monocular dense SLAM system that addresses the absence of geometric consistency in previous flow-based approaches for accurate and robust tracking...
Bi-C2R: Bidirectional Continual Compatible Representation for Re-indexing Free Lifelong Person Re-identification : Abstract: Lifelong person Re-IDentification (L-ReID) exploits sequentially collected data to continuously train and update a ReID model, focusing on the overall performance of all data. Its main chall...
VIPER: Process-aware Evaluation for Generative Video Reasoning : Abstract: Recent breakthroughs in video generation have demonstrated an emerging capability termed Chain-of-Frames (CoF) reasoning, where models resolve complex tasks through the generation of continu...
Semi-Supervised Diversity-Aware Domain Adaptation for 3D Object detection : Abstract: 3D object detectors are fundamental components of perception systems in autonomous vehicles. While these detectors achieve remarkable performance on standard autonomous driving benchmarks, t...
FinMMDocR: Benchmarking Financial Multimodal Reasoning with Scenario Awareness, Document Understanding, and Multi-Step Computation : Abstract: We introduce FinMMDocR, a novel bilingual multimodal benchmark for evaluating multimodal large language models (MLLMs) on real-world financial numerical reasoning. Compared to existing bench...
OFL-SAM2: Prompt SAM2 with Online Few-shot Learner for Efficient Medical Image Segmentation : Abstract: The Segment Anything Model 2 (SAM2) has demonstrated remarkable promptable visual segmentation capabilities in video data, showing potential for extension to medical image segmentation (MIS)...
VLN-MME: Diagnosing MLLMs as Language-guided Visual Navigation agents : Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities across a wide range of vision-language tasks. However, their performance as embodied agents, which requires...
CropTrack: A Tracking with Re-Identification Framework for Precision Agriculture : Abstract: Multiple-object tracking (MOT) in agricultural environments presents major challenges due to repetitive patterns, similar object appearances, sudden illumination changes, and frequent occlus...
UniC-Lift: Unified 3D Instance Segmentation via Contrastive Learning : Abstract: 3D Gaussian Splatting (3DGS) and Neural Radiance Fields (NeRF) have advanced novel-view synthesis. Recent methods extend multi-view 2D segmentation to 3D, enabling instance/semantic segmenta...
Splatwizard: A Benchmark Toolkit for 3D Gaussian Splatting Compression : Abstract: The recent advent of 3D Gaussian Splatting (3DGS) has marked a significant breakthrough in real-time novel view synthesis. However, the rapid proliferation of 3DGS-based algorithms has creat...
EchoFoley: Event-Centric Hierarchical Control for Video Grounded Creative Sound Generation : Abstract: Sound effects build an essential layer of multimodal storytelling, shaping the emotional atmosphere and the narrative semantics of videos. Despite recent advancement in video-text-to-audio (...
FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation : Abstract: In this work, we show that the impact of model capacity varies across timesteps: it is crucial for the early and late stages but largely negligible during the intermediate stage. Accordingly...
From Sequential to Spatial: Reordering Autoregression for Efficient Visual Generation : Abstract: Inspired by the remarkable success of autoregressive models in language modeling, this paradigm has been widely adopted in visual generation. However, the sequential token-by-token decoding ...
FireRescue: A UAV-Based Dataset and Enhanced YOLO Model for Object Detection in Fire Rescue Scenes : Abstract: Object detection in fire rescue scenarios is important for command and decision-making in firefighting operations. However, existing research still suffers from two main limitations. First,...
LLHA-Net: A Hierarchical Attention Network for Two-View Correspondence Learning : Abstract: Establishing the correct correspondence of feature points is a fundamental task in computer vision. However, the presence of numerous outliers among the feature points can significantly affe...
MoniRefer: A Real-world Large-scale Multi-modal Dataset based on Roadside Infrastructure for 3D Visual Grounding : Abstract: 3D visual grounding aims to localize the object in 3D point cloud scenes that semantically corresponds to given natural language sentences. It is very critical for roadside infrastructure sy...
Collaborative Low-Rank Adaptation for Pre-Trained Vision Transformers : Abstract: Low-rank adaptation (LoRA) has achieved remarkable success in fine-tuning pre-trained vision transformers for various downstream tasks. Existing studies mainly focus on exploring more parame...
SliceLens: Fine-Grained and Grounded Error Slice Discovery for Multi-Instance Vision Tasks : Abstract: Systematic failures of computer vision models on subsets with coherent visual patterns, known as error slices, pose a critical challenge for robust model evaluation. Existing slice discovery...
Improving Few-Shot Change Detection Visual Question Answering via Decision-Ambiguity-guided Reinforcement Fine-Tuning : Abstract: Change detection visual question answering (CDVQA) requires answering text queries by reasoning about semantic changes in bi-temporal remote sensing images. A straightforward approach is to ...
RGBT-Ground Benchmark: Visual Grounding Beyond RGB in Complex Real-World Scenarios : Abstract: Visual Grounding (VG) aims to localize specific objects in an image according to natural language expressions, serving as a fundamental task in vision-language understanding. However, existi...
OCP-LS: An Efficient Algorithm for Visual Localization : Abstract: This paper proposes a novel second-order optimization algorithm. It aims to address large-scale optimization problems in deep learning because it incorporates the OCP method and appropriatel...
PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation : Abstract: Recent advances in text-to-video (T2V) generation have achieved good visual quality, yet synthesizing videos that faithfully follow physical laws remains an open challenge. Existing methods ...
Hierarchical Vector-Quantized Latents for Perceptual Low-Resolution Video Compression : Abstract: The exponential growth of video traffic has placed increasing demands on bandwidth and storage infrastructure, particularly for content delivery networks (CDNs) and edge devices. While tradi...
Using Large Language Models To Translate Machine Results To Human Results : Abstract: Artificial intelligence (AI) has transformed medical imaging, with computer vision (CV) systems achieving state-of-the-art performance in classification and detection tasks. However, these s...
Exploring Compositionality in Vision Transformers using Wavelet Representations : Abstract: While insights into the workings of the transformer model have largely emerged by analysing their behaviour on language tasks, this work investigates the representations learnt by the Vision...
AI-Driven Evaluation of Surgical Skill via Action Recognition : Abstract: The development of effective training and evaluation strategies is critical. Conventional methods for assessing surgical proficiency typically rely on expert supervision, either through onsi...
DyStream: Streaming Dyadic Talking Heads Generation via Flow Matching-based Autoregressive Model : Abstract: Generating realistic, dyadic talking head video requires ultra-low latency. Existing chunk-based methods require full non-causal context windows, introducing significant delays. This high la...
RedunCut: Measurement-Driven Sampling and Accuracy Performance Modeling for Low-Cost Live Video Analytics : Abstract: Live video analytics (LVA) runs continuously across massive camera fleets, but inference cost with modern vision models remains high. To address this, dynamic model size selection (DMSS) is ...
Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems : Abstract: The rapid advancement of autonomous systems, including self-driving vehicles and drones, has intensified the need to forge true Spatial Intelligence from multi-modal onboard sensor data. Whi...
The Mechanics of CNN Filtering with Rectification : Abstract: This paper proposes elementary information mechanics as a new model for understanding the mechanical properties of convolutional filtering with rectification, inspired by physical theories o...
Spatial-aware Vision Language Model for Autonomous Driving : Abstract: While Vision-Language Models (VLMs) show significant promise for end-to-end autonomous driving by leveraging the common sense embedded in language models, their reliance on 2D image cues for...
SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning : Abstract: While Vision-Language Models (VLMs) can solve complex tasks through agentic reasoning, their capabilities remain largely constrained to text-oriented chain-of-thought or isolated tool invoca...
Robust Egocentric Referring Video Object Segmentation via Dual-Modal Causal Intervention : Abstract: Egocentric Referring Video Object Segmentation (Ego-RVOS) aims to segment the specific object actively involved in a human action, as described by a language query, within first-person video...
UniAct: Unified Motion Generation and Action Streaming for Humanoid Robots : Abstract: A long-standing objective in humanoid robotics is the realization of versatile agents capable of following diverse multimodal instructions with human-level flexibility. Despite advances in h...
LiftProj: Space Lifting and Projection-Based Panorama Stitching : Abstract: Traditional image stitching techniques have predominantly utilized two-dimensional homography transformations and mesh warping to achieve alignment on a planar surface. While effective for s...
Physically-Grounded Manifold Projection with Foundation Priors for Metal Artifact Reduction in Dental CBCT : Abstract: Metal artifacts in Dental CBCT severely obscure anatomical structures, hindering diagnosis. Current deep learning for Metal Artifact Reduction (MAR) faces limitations: supervised methods suf...
MambaSeg: Harnessing Mamba for Accurate and Efficient Image-Event Semantic Segmentation : Abstract: Semantic segmentation is a fundamental task in computer vision with wide-ranging applications, including autonomous driving and robotics. While RGB-based methods have achieved strong perform...
Mirage: One-Step Video Diffusion for Photorealistic and Coherent Asset Editing in Driving Scenes : Abstract: Vision-centric autonomous driving systems rely on diverse and scalable training data to achieve robust performance. While video object editing offers a promising path for data augmentation, ...
ARM: A Learnable, Plug-and-Play Module for CLIP-based Open-vocabulary Semantic Segmentation : Abstract: Open-vocabulary semantic segmentation (OVSS) is fundamentally hampered by the coarse, image-level representations of CLIP, which lack precise pixel-level details. Existing training-free meth...
CorGi: Contribution-Guided Block-Wise Interval Caching for Training-Free Acceleration of Diffusion Transformers : Abstract: Diffusion transformer (DiT) achieves remarkable performance in visual generation, but its iterative denoising process combined with larger capacity leads to a high inference cost. Recent wor...
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models : Abstract: While recent Multimodal Large Language Models (MLLMs) have made significant strides in multimodal reasoning, their reasoning processes remain predominantly text-centric, leading to subop...
Bayesian Self-Distillation for Image Classification : Abstract: Supervised training of deep neural networks for classification typically relies on hard targets, which promote overconfidence and can limit calibration, generalization, and robustness. Self-...
Towards Open-Vocabulary Industrial Defect Understanding with a Large-Scale Multimodal Dataset : Abstract: We present IMDD-1M, the first large-scale Industrial Multimodal Defect Dataset comprising 1,000,000 aligned image-text pairs, designed to advance multimodal learning for manufacturing and qu...
Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning : Abstract: Recent studies have demonstrated significant progress in aligning text-to-image diffusion models with human preference via Reinforcement Learning from Human Feedback. However, while existing...
GeoBench: Rethinking Multimodal Geometric Problem-Solving via Hierarchical Evaluation : Abstract: Geometric problem solving constitutes a critical branch of mathematical reasoning, requiring precise analysis of shapes and spatial relationships. Current evaluations of geometric reasoning ...
Guided Diffusion-based Generation of Adversarial Objects for Real-World Monocular Depth Estimation Attacks : Abstract: Monocular Depth Estimation (MDE) serves as a core perception module in autonomous driving systems, but it remains highly susceptible to adversarial attacks. Errors in depth estimation may pr...
Think Before You Move: Latent Motion Reasoning for Text-to-Motion Generation : Abstract: Current state-of-the-art paradigms predominantly treat Text-to-Motion (T2M) generation as a direct translation problem, mapping symbolic language directly to continuous poses. While effectiv...
RainFusion2.0: Temporal-Spatial Awareness and Hardware-Efficient Block-wise Sparse Attention : Abstract: In video and image generation tasks, Diffusion Transformer (DiT) models incur extremely high computational costs due to attention mechanisms, which limits their practical applications. Furth...
Balanced Hierarchical Contrastive Learning with Decoupled Queries for Fine-grained Object Detection in Remote Sensing Images : Abstract: Fine-grained remote sensing datasets often use hierarchical label structures to differentiate objects in a coarse-to-fine manner, with each object annotated across multiple levels. However, ...
Neighbor-aware Instance Refining with Noisy Labels for Cross-Modal Retrieval : Abstract: In recent years, Cross-Modal Retrieval (CMR) has made significant progress in the field of multi-modal analysis. However, since it is time-consuming and labor-intensive to collect large-scal...
Reinforced Diffusion: Learning to Push the Limits of Anisotropic Diffusion for Image Denoising : Abstract: Image denoising is an important problem in low-level vision and serves as a critical module for many image recovery tasks. Anisotropic diffusion is a wide family of image denoising approache...
Structure-Guided Allocation of 2D Gaussians for Image Representation and Compression : Abstract: Recent advances in 2D Gaussian Splatting (2DGS) have demonstrated its potential as a compact image representation with millisecond-level decoding. However, existing 2DGS-based pipelines allo...
FitControler: Toward Fit-Aware Virtual Try-On : Abstract: Realistic virtual try-on (VTON) concerns not only faithful rendering of garment details but also coordination of the style. Prior art typically pursues the former, but neglects a key factor ...
On Exact Editing of Flow-Based Diffusion Models : Abstract: Recent methods in flow-based diffusion editing have enabled direct transformations between source and target image distribution without explicit inversion. However, the latent trajectories i...
Bridging the Perception-Cognition Gap: Re-engineering SAM2 with Hilbert-Mamba for Robust VLM-based Medical Diagnosis : Abstract: Recent studies suggest that Visual Language Models (VLMs) hold great potential for tasks such as automated medical diagnosis. However, processing complex three-dimensional (3D) multimodal me...
Improved 3D Gaussian Splatting of Unknown Spacecraft Structure Using Space Environment Illumination Knowledge : Abstract: This work presents a novel pipeline to recover the 3D structure of an unknown target spacecraft from a sequence of images captured during Rendezvous and Proximity Operations (RPO) in space. ...
Bridging Structure and Appearance: Topological Features for Robust Self-Supervised Segmentation : Abstract: Self-supervised semantic segmentation methods often fail when faced with appearance ambiguities. We argue that this is due to an over-reliance on unstable, appearance-based features such as ...
GCA-ResUNet: Medical Image Segmentation Using Grouped Coordinate Attention : Abstract: Accurate segmentation of heterogeneous anatomical structures is pivotal for computer-aided diagnosis and subsequent clinical decision-making. Although U-Net based convolutional neural networ...
Anomaly detection in satellite imagery through temporal inpainting : Abstract: Detecting surface changes from satellite imagery is critical for rapid disaster response and environmental monitoring, yet remains challenging due to the complex interplay between atmospheri...
DriveExplorer: Images-Only Decoupled 4D Reconstruction with Progressive Restoration for Driving View Extrapolation : Abstract: This paper presents an effective solution for view extrapolation in autonomous driving scenarios. Recent approaches focus on generating shifted novel view images from given viewpoints using ...
T2VAttack: Adversarial Attack on Text-to-Video Diffusion Models : Abstract: The rapid evolution of Text-to-Video (T2V) diffusion models has driven remarkable advancements in generating high-quality, temporally coherent videos from natural language descriptions. Desp...
U-Net-Like Spiking Neural Networks for Single Image Dehazing : Abstract: Image dehazing is a critical challenge in computer vision, essential for enhancing image clarity in hazy conditions. Traditional methods often rely on atmospheric scattering models, while re...
Kinematic-Based Assessment of Surgical Actions in Microanastomosis : Abstract: Proficiency in microanastomosis is a critical surgical skill in neurosurgery, where the ability to precisely manipulate fine instruments is crucial to successful outcomes. These procedures r...
Learnable Query Aggregation with KV Routing for Cross-view Geo-localisation : Abstract: Cross-view geo-localisation (CVGL) aims to estimate the geographic location of a query image by matching it with images from a large-scale database. However, the significant view-point discr...
MGML: A Plug-and-Play Meta-Guided Multi-Modal Learning Framework for Incomplete Multimodal Brain Tumor Segmentation : Abstract: Leveraging multimodal information from Magnetic Resonance Imaging (MRI) plays a vital role in lesion segmentation, especially for brain tumors. However, in clinical practice, multimodal MRI ...
Learning to learn skill assessment for fetal ultrasound scanning : Abstract: Traditionally, ultrasound skill assessment has relied on expert supervision and feedback, a process known for its subjectivity and time-intensive nature. Previous works on quantitative and a...
Scaling Remote Sensing Foundation Models: Data Domain Tradeoffs at the Peta-Scale : Abstract: We explore the scaling behaviors of artificial intelligence to establish practical techniques for training foundation models on high-resolution electro-optical (EO) datasets that exceed the ...
MRI-to-CT Synthesis With Cranial Suture Segmentations Using A Variational Autoencoder Framework : Abstract: Quantifying normative pediatric cranial development and suture ossification is crucial for diagnosing and treating growth-related cephalic disorders. Computed tomography (CT) is widely used ...
Pretraining Frame Preservation in Autoregressive Video Memory Compression : Abstract: We present PFP, a neural network structure to compress long videos into short contexts, with an explicit pretraining objective to preserve the high-frequency details of single frames at arbi...
Leveraging Synthetic Priors for Monocular Depth Estimation in Specular Surgical Environments : Abstract: Accurate Monocular Depth Estimation (MDE) is critical for robotic surgery but remains fragile in specular, fluid-filled endoscopic environments. Existing self-supervised methods, typically r...
CascadeNS: Confidence-Cascaded Neurosymbolic Model for Sarcasm Detection : Abstract: Sarcasm detection in product reviews requires balancing domain-specific symbolic pattern recognition with deep semantic understanding. Symbolic representations capture explicit linguistic ph...
Large language models and the entropy of English : Abstract: We use large language models (LLMs) to uncover long-ranged structure in English texts from a variety of sources. The conditional entropy or code length in many cases continues to decrease wi...
CPJ: Explainable Agricultural Pest Diagnosis via Caption-Prompt-Judge with LLM-Judged Refinement : Abstract: Accurate and interpretable crop disease diagnosis is essential for agricultural decision-making, yet existing methods often rely on costly supervised fine-tuning and perform poorly under dom...
Vibe Coding, Interface Flattening : Abstract: Large language models are reshaping programming by enabling 'vibe coding': the development of software through natural-language interaction with model-driven toolchains. This article argues...
Quantum Visual Word Sense Disambiguation: Unraveling Ambiguities Through Quantum Inference Model : Abstract: Visual word sense disambiguation focuses on polysemous words, where candidate images can be easily confused. Traditional methods use classical probability to calculate the likelihood of an i...
MAMA-Memeia! Multi-Aspect Multi-Agent Collaboration for Depressive Symptoms Identification in Memes : Abstract: Over the past years, memes have evolved from being exclusively a medium of humorous exchanges to one that allows users to express a range of emotions freely and easily. With the ever-growing...
BEDA: Belief Estimation as Probabilistic Constraints for Performing Strategic Dialogue Acts : Abstract: Strategic dialogue requires agents to execute distinct dialogue acts, for which belief estimation is essential. While prior work often estimates beliefs accurately, it lacks a principled mec...
Triangulation as an Acceptance Rule for Multilingual Mechanistic Interpretability : Abstract: Multilingual language models achieve strong aggregate performance yet often behave unpredictably across languages, scripts, and cultures. We argue that mechanistic explanations for such mode...
Compute-Accuracy Pareto Frontiers for Open-Source Reasoning Large Language Models : Abstract: Large Language Models (LLMs) are demonstrating rapid improvements on complex reasoning benchmarks, particularly when allowed to utilize intermediate reasoning steps before converging on a fi...
Uncertainty-aware Semi-supervised Ensemble Teacher Framework for Multilingual Depression Detection : Abstract: Detecting depression from social media text is still a challenging task. This is due to different language styles, informal expression, and the lack of annotated data in many languages. To t...
BIOME-Bench: A Benchmark for Biomolecular Interaction Inference and Multi-Omics Pathway Mechanism Elucidation from Scientific Literature : Abstract: Multi-omics studies often rely on pathway enrichment to interpret heterogeneous molecular changes, but pathway enrichment (PE)-based workflows inherit structural limitations of pathway resou...
MUSIC: MUlti-Step Instruction Contrast for Multi-Turn Reward Models : Abstract: Evaluating the quality of multi-turn conversations is crucial for developing capable Large Language Models (LLMs), yet remains a significant challenge, often requiring costly human evaluatio...
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models : Abstract: We introduce Youtu-LLM, a lightweight yet powerful language model that harmonizes high computational efficiency with native agentic intelligence. Unlike typical small models that rely on dis...
Korean Canonical Legal Benchmark: Toward Knowledge-Independent Evaluation of LLMs' Legal Reasoning Capabilities : Abstract: We introduce the Korean Canonical Legal Benchmark (KCL), a benchmark designed to assess language models' legal reasoning capabilities independently of domain-specific knowledge. KCL provides...
HaluNet: Multi-Granular Uncertainty Modeling for Efficient Hallucination Detection in LLM Question Answering : Abstract: Large Language Models (LLMs) excel at question answering (QA) but often generate hallucinations, including factual errors or fabricated content. Detecting hallucinations from internal uncert...
Safe in the Future, Dangerous in the Past: Dissecting Temporal and Linguistic Vulnerabilities in LLMs : Abstract: As Large Language Models (LLMs) integrate into critical global infrastructure, the assumption that safety alignment transfers zero-shot from English to other languages remains a dangerous bl...
Paragraph Segmentation Revisited: Towards a Standard Task for Structuring Speech : Abstract: Automatic speech transcripts are often delivered as unstructured word streams that impede readability and repurposing. We recast paragraph segmentation as the missing structuring step and fi...
IELTS Writing Revision Platform with Automated Essay Scoring and Adaptive Feedback : Abstract: This paper presents the design, development, and evaluation of a proposed revision platform assisting candidates for the International English Language Testing System (IELTS) writing exam. T...
Cleaning English Abstracts of Scientific Publications : Abstract: Scientific abstracts are often used as proxies for the content and thematic focus of research publications. However, a significant share of published abstracts contains extraneous informatio...
World model inspired sarcasm reasoning with large language model agents : Abstract: Sarcasm understanding is a challenging problem in natural language processing, as it requires capturing the discrepancy between the surface meaning of an utterance and the speaker's intentio...
QianfanHuijin Technical Report: A Novel Multi-Stage Training Paradigm for Finance Industrial LLMs : Abstract: Domain-specific enhancement of Large Language Models (LLMs) within the financial context has long been a focal point of industrial application. While previous models such as BloombergGPT and...
Figure It Out: Improving the Frontier of Reasoning with Active Visual Thinking : Abstract: Complex reasoning problems often involve implicit spatial, geometric, and structural relationships that are not explicitly encoded in text. While recent reasoning models have achieved strong...
Automated Analysis of Sustainability Reports: Using Large Language Models for the Extraction and Prediction of EU Taxonomy-Compliant KPIs : Abstract: The manual, resource-intensive process of complying with the EU Taxonomy presents a significant challenge for companies. While Large Language Models (LLMs) offer a path to automation, resear...
Tracing the Flow of Knowledge From Science to Technology Using Deep Learning : Abstract: We develop a language similarity model suitable for working with patents and scientific publications at the same time. In a horse race-style evaluation, we subject eight language (similarity...
LAILA: A Large Trait-Based Dataset for Arabic Automated Essay Scoring : Abstract: Automated Essay Scoring (AES) has gained increasing attention in recent years, yet research on Arabic AES remains limited due to the lack of publicly available datasets. To address this, we ...
MedKGI: Iterative Differential Diagnosis with Medical Knowledge Graphs and Information-Guided Inquiring : Abstract: Recent advancements in Large Language Models (LLMs) have demonstrated significant promise in clinical diagnosis. However, current models struggle to emulate the iterative, diagnostic hypothe...
Training Report of TeleChat3-MoE : Abstract: TeleChat3-MoE is the latest series of TeleChat large language models, featuring a Mixture-of-Experts (MoE) architecture with parameter counts ranging from 105 billion to over one trillion, tr...
Large Emotional World Model : Abstract: World Models serve as tools for understanding the current state of the world and predicting its future dynamics, with broad application potential across numerous fields. As a key component o...
Activation Steering for Masked Diffusion Language Models : Abstract: Masked diffusion language models (MDLMs) generate text through an iterative denoising process. They have recently gained attention due to mask-parallel decoding and competitive performance w...
HY-MT1.5 Technical Report : Abstract: In this report, we introduce our latest translation models, HY-MT1.5-1.8B and HY-MT1.5-7B, a new family of machine translation models developed through a holistic training framework tailored...
WISE: Web Information Satire and Fakeness Evaluation : Abstract: Distinguishing fake or untrue news from satire or humor poses a unique challenge due to their overlapping linguistic features and divergent intent. This study develops WISE (Web Information ...
CEC-Zero: Zero-Supervision Character Error Correction with Self-Generated Rewards : Abstract: Large-scale Chinese spelling correction (CSC) remains critical for real-world text processing, yet existing LLMs and supervised methods lack robustness to novel errors and rely on costly ann...
Disentangling Learning from Judgment: Representation Learning for Open Response Analytics : Abstract: Open-ended responses are central to learning, yet automated scoring often conflates what students wrote with how teachers grade. We present an analytics-first framework that separates conten...
MiMo-Audio: Audio Language Models are Few-Shot Learners : Abstract: Existing audio language models typically rely on task-specific fine-tuning to accomplish particular audio tasks. In contrast, humans are able to generalize to new audio tasks with only a few...
Emergent World Beliefs: Exploring Transformers in Stochastic Games : Abstract: Transformer-based large language models (LLMs) have demonstrated strong reasoning abilities across diverse fields, from solving programming challenges to competing in strategy-intensive game...
Noise-Driven Persona Formation in Reflexive Neural Language Generation : Abstract: This paper introduces the Luca-Noise Reflex Protocol (LN-RP), a computational framework for analyzing noise-driven persona emergence in large language models. By injecting stochastic noise s...
PharmaShip: An Entity-Centric, Reading-Order-Supervised Benchmark for Chinese Pharmaceutical Shipping Documents : Abstract: We present PharmaShip, a real-world Chinese dataset of scanned pharmaceutical shipping documents designed to stress-test pre-trained text-layout models under noisy OCR and heterogeneous temp...
CAT: A Metric-Driven Framework for Analyzing the Consistency-Accuracy Relation of LLMs under Controlled Input Variations : Abstract: We introduce CAT, a framework designed to evaluate and visualize the interplay of accuracy and response consistency of Large Language Models (LLMs) under contro...
Automatic identification of diagnosis from hospital discharge letters via weakly-supervised Natural Language Processing : Abstract: Identifying patient diagnoses from discharge letters is essential to enable large-scale cohort selection and epidemiological research, but traditional supervised approaches rely on extensive...
MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs : Abstract: Spurious bias, a tendency to exploit spurious correlations between superficial input attributes and prediction targets, has revealed a severe robustness pitfall in classical machine learning...
Symmetric Linear Bandits with Hidden Symmetry : Abstract: High-dimensional linear bandits with low-dimensional structure have received considerable attention in recent studies due to their practical significance. The most common structure in the li...
Myopically Verifiable Probabilistic Certificates for Safe Control and Learning : Abstract: This paper addresses the design of safety certificates for stochastic systems, with a focus on ensuring long-term safety through fast real-time control. In stochastic environments, set invar...
Distribution-Dependent Rates for Multi-Distribution Learning : Abstract: To address the needs of modeling uncertainty in sensitive machine learning applications, the setup of distributionally robust optimization (DRO) seeks good performance uniformly across a var...
Multi-fidelity Bayesian Optimization: A Review : Abstract: Residing at the intersection of multi-fidelity optimization (MFO) and Bayesian optimization (BO), MF BO has found a niche in solving expensive engineering design optimization problems, thanks...
Content-based Recommendation Engine for Video Streaming Platform : Abstract: Recommendation engines suggest content, products, or services to the user by using machine learning algorithms. This paper proposes a content-based recommendation engine that provides person...
Generative Modelling of Lévy Area for High Order SDE Simulation : Abstract: It is well understood that, when numerically simulating SDEs with general noise, achieving a strong convergence rate better than $O(\sqrt{h})$ (where $h$ is the step size) requires the use of ...
Machine learning for option pricing: an empirical investigation of network architectures : Abstract: We consider the supervised learning problem of learning the price of an option or the implied volatility given appropriate input data (model parameters) and corresponding output data (option...
Efficient Active Learning with Abstention : Abstract: The goal of active learning is to achieve the same accuracy achievable by passive learning, while using far fewer labels. Exponential savings in terms of label complexity have been proved i...
Towards Privacy-Preserving and Heterogeneity-aware Split Federated Learning via Probabilistic Masking : Abstract: Split Federated Learning (SFL) has emerged as an efficient alternative to traditional Federated Learning (FL) by reducing client-side computation through model partitioning. However, exchang...
Revisiting Agnostic Boosting : Abstract: Boosting is a key method in statistical learning, allowing for converting weak learners into strong ones. While well studied in the realizable case, the statistical properties of weak-to-str...
Private Linear Regression with Differential Privacy and PAC Privacy : Abstract: Linear regression is a fundamental tool for statistical analysis, which has motivated the development of linear regression methods that satisfy provable privacy guarantees so that the learne...
The Generalization Error of Supervised Machine Learning Algorithms : Abstract: In this paper, the method of gaps, a technique for deriving closed-form expressions in terms of information measures for the generalization error of supervised machine learning algorithms is...
Minibatch Optimal Transport and Perplexity Bound Estimation in Discrete Flow Matching : Abstract: Discrete flow matching, a recent framework for modeling categorical data, has shown competitive performance with autoregressive models. However, unlike continuous flow matching, the rectific...
UnPaSt: unsupervised patient stratification by biclustering of omics data : Abstract: Unsupervised patient stratification is essential for disease subtype discovery, yet, despite growing evidence of molecular heterogeneity of non-oncological diseases, popular methods are benc...
Jacobian-Enhanced Neural Networks : Abstract: Jacobian-Enhanced Neural Networks (JENN) are densely connected multi-layer perceptrons, whose training process is modified to predict partial derivatives accurately. Their main benefit is be...
Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling : Abstract: In the rapidly evolving field of deep learning, the demand for models that are both expressive and computationally efficient has never been more critical. This paper introduces Orchid, a nov...
HiGen: Hierarchical Graph Generative Networks : Abstract: Most real-world graphs exhibit a hierarchical structure, which is often overlooked by existing graph generation methods. To address this limitation, we propose a novel graph generative netwo...
The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing : Abstract: We propose ScaledGD($\lambda$), a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill...
Active Learning with Neural Networks: Insights from Nonparametric Statistics : Abstract: Deep neural networks have great representation power, but typically require large numbers of training examples. This motivates deep active learning methods that can significantly reduce the ...
Optimal Approximation -- Smoothness Tradeoffs for Soft-Max Functions : Abstract: A soft-max function has two main efficiency measures: (1) approximation - which corresponds to how well it approximates the maximum function, (2) smoothness - which shows how sensitive it is...
Reliable and Resilient Collective Communication Library for LLM Training and Serving : Abstract: Modern ML training and inference now span tens to tens of thousands of GPUs, where network faults can waste 10-15% of GPU hours due to slow recovery. Common network errors and link fluctua...
Convergence of the generalization error for deep gradient flow methods for PDEs : Abstract: The aim of this article is to provide a firm mathematical foundation for the application of deep gradient flow methods (DGFMs) for the solution of (high-dimensional) partial differential equ...
Basic Inequalities for First-Order Optimization with Applications to Statistical Risk Analysis : Abstract: We introduce basic inequalities for first-order iterative optimization algorithms, forming a simple and versatile framework that connects implicit and explicit regularization. While...
ProDM: Synthetic Reality-driven Property-aware Progressive Diffusion Model for Coronary Calcium Motion Correction in Non-gated Chest CT : Abstract: Coronary artery calcium (CAC) scoring from chest CT is a well-established tool to stratify and refine clinical cardiovascular disease risk estimation. CAC quantification relies on the accura...
Adaptive Dependency-aware Prompt Optimization Framework for Multi-Step LLM Pipeline : Abstract: Multi-step LLM pipelines invoke large language models multiple times in a structured sequence and can effectively solve complex tasks, but their performance heavily depends on the prompts us...
Are First-Order Diffusion Samplers Really Slower? A Fast Forward-Value Approach : Abstract: Higher-order ODE solvers have become a standard tool for accelerating diffusion probabilistic model (DPM) sampling, motivating the widespread view that first-order methods are inherently slo...
Learning Temporally Consistent Turbulence Between Sparse Snapshots via Diffusion Models : Abstract: We investigate the statistical accuracy of temporally interpolated spatiotemporal flow sequences between sparse, decorrelated snapshots of turbulent flow fields using conditional Denoising D...
Limits of quantum generative models with classical sampling hardness : Abstract: Sampling tasks have been successful in establishing quantum advantages both in theory and experiments. This has fueled the use of quantum computers for generative modeling to create samples ...
Nonlinear Noise2Noise for Efficient Monte Carlo Denoiser Training : Abstract: The Noise2Noise method allows for training machine learning-based denoisers with pairs of input and target images where both the input and target can be noisy. This removes the need for trai...
Projection-based Adversarial Attack using Physics-in-the-Loop Optimization for Monocular Depth Estimation : Abstract: Deep neural networks (DNNs) remain vulnerable to adversarial attacks that cause misclassification when specific perturbations are added to input images. This vulnerability also threatens the...
Sparse Offline Reinforcement Learning with Corruption Robustness : Abstract: We investigate robustness to strong data corruption in offline sparse reinforcement learning (RL). In our setting, an adversary may arbitrarily perturb a fraction of the collected trajectori...
Fairness-Aware Insurance Pricing: A Multi-Objective Optimization Approach : Abstract: Machine learning improves predictive accuracy in insurance pricing but exacerbates trade-offs between competing fairness criteria across different discrimination measures, challenging regula...
A New Decomposition Paradigm for Graph-structured Nonlinear Programs via Message Passing : Abstract: We study finite-sum nonlinear programs whose decision variables interact locally according to a graph or hypergraph. We propose MP-Jacobi (Message Passing-Jacobi), a graph-compliant decentra...
Soliton profiles: Classical Numerical Schemes vs. Neural Network - Based Solvers : Abstract: We present a comparative study of classical numerical solvers, such as Petviashvili's method or finite difference with Newton iterations, and neural network-based methods for computing groun...
3D Semantic Segmentation for Post-Disaster Assessment : Abstract: The increasing frequency of natural disasters poses severe threats to human lives and leads to substantial economic losses. While 3D semantic segmentation is crucial for post-disaster assess...
MultiRisk: Multiple Risk Control via Iterative Score Thresholding : Abstract: As generative AI systems are increasingly deployed in real-world applications, regulating multiple dimensions of model behavior has become essential. We focus on test-time filtering: a light...
Robust Bayesian Dynamic Programming for On-policy Risk-sensitive Reinforcement Learning : Abstract: We propose a novel framework for risk-sensitive reinforcement learning (RSRL) that incorporates robustness against transition uncertainty. We define two distinct yet coupled risk measures: a...
Probabilistic Computers for Neural Quantum States : Abstract: Neural quantum states efficiently represent many-body wavefunctions with neural networks, but the cost of Monte Carlo sampling limits their scaling to large system sizes. Here we address thi...
A Graph Neural Network with Auxiliary Task Learning for Missing PMU Data Reconstruction : Abstract: In wide-area measurement systems (WAMS), phasor measurement unit (PMU) measurement is prone to data missingness due to hardware failures, communication delays, and cyber-attacks. Existing da...
Improving the stability of the covariance-controlled adaptive Langevin thermostat for large-scale Bayesian sampling : Abstract: Stochastic gradient Langevin dynamics and its variants approximate the likelihood of an entire dataset, via random (and typically much smaller) subsets, in the setting of Bayesian sampling. ...
Spectral and Spatial Graph Learning for Multispectral Solar Image Compression : Abstract: High-fidelity compression of multispectral solar imagery remains challenging for space missions, where limited bandwidth must be balanced against preserving fine spectral and spatial details...
Towards mechanistic understanding in a data-driven weather model: internal activations reveal interpretable physical features : Abstract: Large data-driven physics models like DeepMind's weather model GraphCast have empirically succeeded in parameterizing time operators for complex dynamical systems with an accuracy reaching o...
Virasoro Symmetry in Neural Network Field Theories : Abstract: Neural Network Field Theories (NN-FTs) can realize global conformal symmetries via embedding space architectures. These models describe Generalized Free Fields (GFFs) in the infinite width l...
Implicit score matching meets denoising score matching: improved rates of convergence and log-density Hessian estimation : Abstract: We study the problem of estimating the score function using both implicit score matching and denoising score matching. Assuming that the data distribution exhibits a low-dimensional struct...
Deep Learning in Geotechnical Engineering: A Critical Assessment of PINNs and Operator Learning : Abstract: Deep learning methods -- physics-informed neural networks (PINNs), deep operator networks (DeepONet), and graph network simulators (GNS) -- are increasingly proposed for geotechnical problem...
OptiVote: Non-Coherent FSO Over-the-Air Majority Vote for Communication-Efficient Distributed Federated Learning in Space Data Centers : Abstract: The rapid deployment of mega-constellations is driving the long-term vision of space data centers (SDCs), where interconnected satellites form in-orbit distributed computing and learning inf...
Topological Spatial Graph Coarsening : Abstract: Spatial graphs are particular graphs for which the nodes are localized in space (e.g., public transport network, molecules, branching biological structures). In this work, we consider the pr...
MaRCA: Multi-Agent Reinforcement Learning for Dynamic Computation Allocation in Large-Scale Recommender Systems : Abstract: Modern recommender systems face significant computational challenges due to growing model complexity and traffic scale, making efficient computation allocation critical for maximizing busine...
Fast reconstruction-based ROI triggering via anomaly detection in the CYGNO optical TPC : Abstract: Optical-readout Time Projection Chambers (TPCs) produce megapixel-scale images whose fine-grained topological information is essential for rare-event searches, but whose size challenges real...
Joint Selection for Large-Scale Pre-Training Data via Policy Gradient-based Mask Learning : Abstract: A fine-grained data recipe is crucial for pre-training large language models, as it can significantly enhance training efficiency and model performance. One important ingredient in the recip...
MotivNet: Evolving Meta-Sapiens into an Emotionally Intelligent Foundation Model : Abstract: In this paper, we introduce MotivNet, a generalizable facial emotion recognition model for robust real-world application. Current state-of-the-art FER models tend to have weak generalization...
Medical Image Classification on Imbalanced Data Using ProGAN and SMA-Optimized ResNet: Application to COVID-19 : Abstract: The challenge of imbalanced data is prominent in medical image classification. This challenge arises when there is a significant disparity in the number of images belonging to a particular c...
Guiding a Diffusion Transformer with the Internal Dynamics of Itself : Abstract: The diffusion model presents a powerful ability to capture the entire (conditional) data distribution. However, due to the lack of sufficient training and data to learn to cover low-probabil...
Variational Quantum Brushes : Abstract: Quantum brushes are computational arts software introduced by Ferreira et al. (2025) that leverage quantum behavior to generate novel artistic effects. In this outreach paper, we introduce th...
Deep Global Clustering for Hyperspectral Image Segmentation: Concepts, Applications, and Open Challenges : Abstract: Hyperspectral imaging (HSI) analysis faces computational bottlenecks due to massive data volumes that exceed available memory. While foundation models pre-trained on large remote sensing dat...
Score-based sampling without diffusions: Guidance from a simple and modular scheme : Abstract: Sampling based on score diffusions has led to striking empirical results, and has attracted considerable attention from various research communities. It depends on availability of (approxima...
Quantitative Understanding of PDF Fits and their Uncertainties : Abstract: Parton Distribution Functions (PDFs) play a central role in describing experimental data at colliders and provide insight into the structure of nucleons. As the LHC enters an era of high-pre...
Constructive Approximation of Random Process via Stochastic Interpolation Neural Network Operators : Abstract: In this paper, we construct a class of stochastic interpolation neural network operators (SINNOs) with random coefficients activated by sigmoidal functions. We establish their boundedness, i...
Training a Huggingface Model on AWS Sagemaker (Without Tears) : Abstract: The development of Large Language Models (LLMs) has primarily been driven by resource-rich research groups and industry partners. Due to the lack of on-premise computing resources required f...
Policy Mirror Descent with Temporal Difference Learning: Sample Complexity under Online Markov Data : Abstract: This paper studies the policy mirror descent (PMD) method, which is a general policy optimization framework in reinforcement learning and can cover a wide range of policy gradient methods by...
RepetitionCurse: Measuring and Understanding Router Imbalance in Mixture-of-Experts LLMs under DoS Stress : Abstract: Mixture-of-Experts architectures have become the standard for scaling large language models due to their superior parameter efficiency. To accommodate the growing number of experts in practi...
Fundamental limits for weighted empirical approximations of tilted distributions : Abstract: Consider the task of generating samples from a tilted distribution of a random vector whose underlying distribution is unknown, but samples from it are available. This finds applications in ...
Exploring the Potential of Spiking Neural Networks in UWB Channel Estimation : Abstract: Although existing deep learning-based Ultra-Wide Band (UWB) channel estimation methods achieve high accuracy, their computational intensity clashes sharply with the resource constraints of l...
Implicit geometric regularization in flow matching via density weighted Stein operators : Abstract: Flow Matching (FM) has emerged as a powerful paradigm for continuous normalizing flows, yet standard FM implicitly performs an unweighted $L^2$ regression over the entire ambient space. In h...
Statistical Guarantees in the Search for Less Discriminatory Algorithms : Abstract: Recent scholarship has argued that firms building data-driven decision systems in high-stakes domains like employment, credit, and housing should search for "less discriminatory algorithms" ...
Assessing generative modeling approaches for free energy estimates in condensed matter : Abstract: The accurate estimation of free energy differences between two states is a long-standing challenge in molecular simulations. Traditional approaches generally rely on sampling multiple interm...
Stationary Reweighting Yields Local Convergence of Soft Fitted Q-Iteration : Abstract: Fitted Q-iteration (FQI) and its entropy-regularized variant, soft FQI, are central tools for value-based model-free offline reinforcement learning, but can behave poorly under function appr...
Tensor Computing Interface: An Application-Oriented, Lightweight Interface for Portable High-Performance Tensor Network Applications : Abstract: Tensor networks (TNs) are a central computational tool in quantum science and artificial intelligence. However, the lack of unified software interface across tensor-computing frameworks seve...
Integrating Domain Knowledge for Financial QA: A Multi-Retriever RAG Approach with LLMs : Abstract: This research project addresses the errors of financial numerical reasoning Question Answering (QA) tasks due to the lack of domain knowledge in finance. Despite recent advances in Large Lan...
A Test of Lookahead Bias in LLM Forecasts : Abstract: We develop a statistical test to detect lookahead bias in economic forecasts generated by large language models (LLMs). Using state-of-the-art pre-training data detection techniques, we esti...
Deep learning methods for inverse problems using connections between proximal operators and Hamilton-Jacobi equations : Abstract: Inverse problems are important mathematical problems that seek to recover model parameters from noisy data. Since inverse problems are often ill-posed, they require regularization or incorpo...
Energy-Tweedie: Score meets Score, Energy meets Energy : Abstract: Denoising and score estimation have long been known to be linked via the classical Tweedie's formula. In this work, we first extend the latter to a wider range of distributions often called ...
Fitted Q Evaluation Without Bellman Completeness via Stationary Weighting : Abstract: Fitted Q-evaluation (FQE) is a central method for off-policy evaluation in reinforcement learning, but it generally requires Bellman completeness: that the hypothesis class is closed under t...
Governing Cloud Data Pipelines with Agentic AI : Abstract: Cloud data pipelines increasingly operate under dynamic workloads, evolving schemas, cost constraints, and strict governance requirements. Despite advances in cloud-native orchestration fram...
Spike-Timing-Dependent Plasticity for Bernoulli Message Passing : Abstract: Bayesian inference provides a principled framework for understanding brain function, while neural activity in the brain is inherently spike-based. This paper bridges these two perspectives b...
Scaling Open-Ended Reasoning to Predict the Future : Abstract: High-stakes decision making involves reasoning under uncertainty about the future. In this work, we train language models to make predictions on open-ended forecasting questions. To scale up...
Many Minds from One Model: Bayesian Transformers for Population Intelligence : Abstract: Despite their scale and success, modern transformers are almost universally trained as single-minded systems: optimization produces one deterministic set of parameters, representing a single...
On the geometry and topology of representations: the manifolds of modular addition : Abstract: The Clock and Pizza interpretations, associated with architectures differing in either uniform or learnable attention, were introduced to argue that different architectural designs can yield...
ResponseRank: Data-Efficient Reward Modeling through Preference Strength Learning : Abstract: Binary choices, as often used for reinforcement learning from human feedback (RLHF), convey only the direction of a preference. A person may choose apples over oranges and bananas over grape...
Diffusion Language Models are Provably Optimal Parallel Samplers : Abstract: Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive models for faster inference via parallel token generation. We provide a rigorous foundation for thi...
Efficiently Estimating Data Efficiency for Language Model Fine-tuning : Abstract: While large language models (LLMs) demonstrate reasonable zero-shot capability across many downstream tasks, fine-tuning is a common practice to improve their performance. However, a task's ...
Attribution-Guided Distillation of Matryoshka Sparse Autoencoders : Abstract: Sparse autoencoders (SAEs) aim to disentangle model activations into monosemantic, human-interpretable features. In practice, learned features are often redundant and vary across training ru...
Frequent subgraph-based persistent homology for graph classification : Abstract: Persistent homology (PH) has recently emerged as a powerful tool for extracting topological features. Integrating PH into machine learning and deep learning models enhances topology awarenes...
Spectral Graph Neural Networks for Cognitive Task Classification in fMRI Connectomes : Abstract: Cognitive task classification using machine learning plays a central role in decoding brain states from neuroimaging data. By integrating machine learning with brain network analysis, comple...
PRISM: A hierarchical multiscale approach for time series forecasting : Abstract: Forecasting is critical in areas such as finance, biology, and healthcare. Despite the progress in the field, making accurate forecasts remains challenging because real-world time series con...
Characterization of Transfer Using Multi-task Learning Curves : Abstract: Transfer effects manifest themselves both during training using a fixed data set and in inductive inference using accumulating data. We hypothesize that perturbing the data set by including ...
AODDiff: Probabilistic Reconstruction of Aerosol Optical Depth via Diffusion-based Bayesian Inference : Abstract: High-quality reconstruction of Aerosol Optical Depth (AOD) fields is critical for atmosphere monitoring, yet current models remain constrained by the scarcity of complete training data and a...
Discovering Coordinated Joint Options via Inter-Agent Relative Dynamics : Abstract: Temporally extended actions improve the ability to explore and plan in single-agent settings. In multi-agent settings, the exponential growth of the joint state space with the number of agen...
Unregularized Linear Convergence in Zero-Sum Game from Preference Feedback : Abstract: Aligning large language models (LLMs) with human preferences has proven effective for enhancing model capabilities, yet standard preference modeling using the Bradley-Terry model assumes tra...
DTI-GP: Bayesian operations for drug-target interactions using deep kernel Gaussian processes : Abstract: Precise probabilistic information about drug-target interaction (DTI) predictions is vital for understanding limitations and boosting predictive performance. Gaussian processes (GP) offer a ...
Self-Supervised Neural Architecture Search for Multimodal Deep Neural Networks : Abstract: Neural architecture search (NAS), which automates the architectural design process of deep neural networks (DNN), has attracted increasing attention. Multimodal DNNs that necessitate feature...
Gradient Descent as Implicit EM in Distance-Based Neural Models : Abstract: Neural networks trained with standard objectives exhibit behaviors characteristic of probabilistic inference: soft clustering, prototype specialization, and Bayesian uncertainty tracking. Th...
From Trial to Deployment: A SEM Analysis of Traveler Adoptions to Fully Operational Autonomous Taxis : Abstract: Autonomous taxi services represent a transformative advancement in urban mobility, offering safety, efficiency, and round-the-clock operations. While existing literature has explored user ac...
FPGA Co-Design for Efficient N:M Sparse and Quantized Model Inference : Abstract: Large language models (LLMs) have demonstrated remarkable performance across a wide range of language processing tasks. However, this success comes at the cost of substantial computation and...
Causal Discovery with Mixed Latent Confounding via Precision Decomposition : Abstract: We study causal discovery from observational data in linear Gaussian systems affected by mixed latent confounding, where some unobserved factors act broadly across many variables whil...
Mobility-Assisted Decentralized Federated Learning: Convergence Analysis and A Data-Driven Approach : Abstract: Decentralized Federated Learning (DFL) has emerged as a privacy-preserving machine learning paradigm that enables collaborative training among users without relying on a central server. Howe...
HeteroHBA: A Generative Structure-Manipulating Backdoor Attack on Heterogeneous Graphs : Abstract: Heterogeneous graph neural networks (HGNNs) have achieved strong performance in many real-world applications, yet targeted backdoor poisoning on heterogeneous graphs remains less studied. We...
A Scalable Framework for logP Prediction: From Terabyte-Scale Data Integration to Interpretable Ensemble Modeling : Abstract: This study presents a large-scale predictive modeling framework for logP prediction using 426850 bioactive compounds rigorously curated from the intersection of three authoritative chemical ...
CPR: Causal Physiological Representation Learning for Robust ECG Analysis under Distribution Shifts : Abstract: Deep learning models for Electrocardiogram (ECG) diagnosis have achieved remarkable accuracy but exhibit fragility against adversarial perturbations, particularly Smooth Adversarial Perturba...
From Perception to Punchline: Empowering VLM with the Art of In-the-wild Meme : Abstract: Generating humorous memes is a challenging multimodal task that moves beyond direct image-to-caption supervision. It requires a nuanced reasoning over visual content, contextual cues, and su...
Generalising E-prop to Deep Networks : Abstract: Recurrent networks are typically trained with backpropagation through time (BPTT). However, BPTT requires storing the history of all states in the network and then replaying them sequentiall...
Generative forecasting with joint probability models : Abstract: Chaotic dynamical systems exhibit strong sensitivity to initial conditions and often contain unresolved multiscale processes, making deterministic forecasting fundamentally limited. Generati...
Adaptive Learning Guided by Bias-Noise-Alignment Diagnostics : Abstract: Learning systems deployed in nonstationary and safety-critical environments often suffer from instability, slow convergence, or brittle adaptation when learning dynamics evolve over time. Wh...
Sparse classification with positive-confidence data in high dimensions : Abstract: High-dimensional learning problems, where the number of features exceeds the sample size, often require sparse regularization for effective prediction and variable selection. While establish...
Efficient Inference for Inverse Reinforcement Learning and Dynamic Discrete Choice Models : Abstract: Inverse reinforcement learning (IRL) and dynamic discrete choice (DDC) models explain sequential decision-making by recovering reward functions that rationalize observed behavior. Flexible I...
Lifting Vision: Ground to Aerial Localization with Reasoning Guided Planning : Abstract: Multimodal intelligence development has recently shown strong progress in visual understanding and high-level reasoning. However, most reasoning systems still rely on textual information as the ma...
Early Prediction of Sepsis using Heart Rate Signals and Genetic Optimized LSTM Algorithm : Abstract: Sepsis, characterized by a dysregulated immune response to infection, results in significant mortality, morbidity, and healthcare costs. The timely prediction of sepsis progression is crucia...
Micro-Macro Tensor Neural Surrogates for Uncertainty Quantification in Collisional Plasma : Abstract: Plasma kinetic equations exhibit pronounced sensitivity to microscopic perturbations in model parameters and data, making reliable and efficient uncertainty quantification (UQ) essential for...
Paired Seed Evaluation: Statistical Reliability for Learning-Based Simulators : Abstract: Machine learning systems appear stochastic but are deterministically random, as seeded pseudorandom number generators produce identical realisations across executions. Learning-based simulat...
Colorful Pinball: Density-Weighted Quantile Regression for Conditional Guarantee of Conformal Prediction : Abstract: While conformal prediction provides robust marginal coverage guarantees, achieving reliable conditional coverage for specific inputs remains challenging. Although exact distribution-free con...
Autoregressivity in the Latent Space of a GP-VAE Language Model: An Empirical Ablation Study : Abstract: This paper provides an ablation-based analysis of latent autoregression in GP-VAE models, building upon our previous work introducing the architecture. Language models typically rely on an a...
Multi-Scenario Highway Lane-Change Intention Prediction: A Temporal Physics-Informed Multi-Modal Framework : Abstract: Lane-change intention prediction is safety-critical for autonomous driving and ADAS, but remains difficult in naturalistic traffic due to noisy kinematics, severe class imbalance, and limite...
Time-varying Mixing Matrix Design for Energy-efficient Decentralized Federated Learning : Abstract: We consider the design of mixing matrices to minimize the operation cost for decentralized federated learning (DFL) in wireless networks, with focus on minimizing the maximum per-node energy...
How and Why LLMs Generalize: A Fine-Grained Analysis of LLM Reasoning from Cognitive Behaviors to Low-Level Patterns : Abstract: Large Language Models (LLMs) display strikingly different generalization behaviors: supervised fine-tuning (SFT) often narrows capability, whereas reinforcement-learning (RL) tuning tends to...
Hyperspherical Graph Representation Learning via Adaptive Neighbor-Mean Alignment and Uniformity : Abstract: Graph representation learning (GRL) aims to encode structural and semantic dependencies of graph-structured data into low-dimensional embeddings. However, existing GRL methods often rely on ...
Information-Theoretic Quality Metric of Low-Dimensional Embeddings : Abstract: In this work we study the quality of low-dimensional embeddings from an explicitly information-theoretic perspective. We begin by noting that classical evaluation metrics such as stress, ran...
Assured Autonomy: How Operations Research Powers and Orchestrates Generative AI Systems : Abstract: Generative artificial intelligence (GenAI) is shifting from conversational assistants toward agentic systems -- autonomous decision-making systems that sense, decide, and act within operatio...
DivQAT: Enhancing Robustness of Quantized Convolutional Neural Networks against Model Extraction Attacks : Abstract: Convolutional Neural Networks (CNNs) and their quantized counterparts are vulnerable to extraction attacks, posing a significant threat of IP theft. Yet, the robustness of quantized models a...
Improved Balanced Classification with Theoretically Grounded Loss Functions : Abstract: The balanced loss is a widely adopted objective for multi-class classification under class imbalance. By assigning equal importance to all classes, regardless of their frequency, it promotes...
Constraint Breeds Generalization: Temporal Dynamics as an Inductive Bias : Abstract: Conventional deep learning prioritizes unconstrained optimization, yet biological systems operate under strict metabolic constraints. We propose that these physical constraints shape dynamic...
Rethinking Dense Linear Transformations: Stagewise Pairwise Mixing (SPM) for Near-Linear Training in Neural Networks : Abstract: Dense linear layers are a dominant source of computational and parametric cost in modern machine learning models, despite their quadratic complexity and often being misaligned with the compo...
Max-Entropy Reinforcement Learning with Flow Matching and A Case Study on LQR : Abstract: Soft actor-critic (SAC) is a popular algorithm for max-entropy reinforcement learning. In practice, the energy-based policies in SAC are often approximated using simple policy classes for ef...
Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding : Abstract: Speculative decoding improves LLM inference by generating and verifying multiple tokens in parallel, but existing systems suffer from suboptimal performance due to a mismatch between dynamic...
Flow Matching Neural Processes : Abstract: Neural processes (NPs) are a class of models that learn stochastic processes directly from data and can be used for inference, sampling and conditional sampling. We introduce a new NP model ...
Trellis: Learning to Compress Key-Value Memory in Attention Models : Abstract: Transformers, while powerful, suffer from quadratic computational complexity and the ever-growing Key-Value (KV) cache of the attention mechanism. This paper introduces Trellis, a novel Tran...
Exploiting the Prior of Generative Time Series Imputation : Abstract: Time series imputation, i.e., filling the missing values of a time recording, finds various applications in electricity, finance, and weather modelling. Previous methods have introduced gene...
MS-SSM: A Multi-Scale State Space Model for Efficient Sequence Modeling : Abstract: State-space models (SSMs) have recently gained attention as an efficient alternative to computationally expensive attention-based models for sequence modeling. They rely on linear recurrences to in...
TabMixNN: A Unified Deep Learning Framework for Structural Mixed Effects Modeling on Tabular Data : Abstract: We present TabMixNN, a flexible PyTorch-based deep learning framework that synthesizes classical mixed-effects modeling with modern neural network architectures for tabular data analysis. Ta...
A Granular Grassmannian Clustering Framework via the Schubert Variety of Best Fit : Abstract: In many classification and clustering tasks, it is useful to compute a geometric representative for a dataset or a cluster, such as a mean or median. When datasets are represented by subspac...
Exploring Cumulative Effects in Survival Data Using Deep Learning Networks : Abstract: In epidemiological research, modeling the cumulative effects of time-dependent exposures on survival outcomes presents a challenge due to their intricate temporal dynamics. Conventional spli...
Neural Optimal Design of Experiment for Inverse Problems : Abstract: We introduce Neural Optimal Design of Experiments, a learning-based framework for optimal experimental design in inverse problems that avoids classical bilevel optimization and indirect spar...
Learning Coupled System Dynamics under Incomplete Physical Constraints and Missing Data : Abstract: Advances in data acquisition and computational methods have accelerated the use of differential equation based modelling for complex systems. Such systems are often described by coupled (or ...
A Review of Diffusion-based Simulation-Based Inference: Foundations and Applications in Non-Ideal Data Scenarios : Abstract: For complex simulation problems, inferring parameters of scientific interest often precludes the use of classical likelihood-based techniques due to intractable likelihood functions. Simulat...
A Comprehensive Study of Deep Learning Model Fixing Approaches : Abstract: Deep Learning (DL) has been widely adopted in diverse industrial domains, including autonomous driving, intelligent healthcare, and aided programming. Like traditional software, DL systems a...
Network Traffic Analysis with Process Mining: The UPSIDE Case Study : Abstract: Online gaming is a popular activity involving the adoption of complex systems and network infrastructures. The relevance of gaming, which generates large amounts of market revenue, drove res...
Benchmarking LLMs for Fine-Grained Code Review with Enriched Context in Practice : Abstract: Code review is a cornerstone of software quality assurance, and recent advances in Large Language Models (LLMs) have shown promise in its automation. However, existing benchmarks for LLM-bas...
Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation : Abstract: We introduce Bielik 7B v0.1, a 7-billion-parameter generative text model for Polish language processing. Trained on curated Polish corpora, this model addresses key challenges in language mo...
A Systematic Survey on Large Language Models for Algorithm Design : Abstract: Algorithm design is crucial for effective problem-solving across various domains. The advent of Large Language Models (LLMs) has notably enhanced the automation and innovation within this fi...
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities : Abstract: Model merging is an efficient empowerment technique in the machine learning community that does not require the collection of raw training data and does not require expensive computation. As...
Transfer learning of state-based potential games for process optimization in decentralized manufacturing systems : Abstract: This paper presents a novel online transfer learning approach in state-based potential games (TL-SbPGs) for distributed self-optimization in manufacturing systems. The approach targets pract...
LTLBench: Towards Benchmarks for Evaluating Temporal Logic Reasoning in Large Language Models : Abstract: Temporal Reasoning (TR) is a critical ability for LLMs to understand and reason over temporal information and relationships between events. To study the TR ability in LLMs, prior works provi...
FEDSTR: Money-In AI-Out | A Decentralized Marketplace for Federated Learning and LLM Training on the NOSTR Protocol : Abstract: NOSTR is a communication protocol for the social web, based on the W3C WebSockets standard. Although it is still in its infancy, it is well known as a social media protocol, with thousan...
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time : Abstract: We present SpaceTimePilot, a video diffusion model that disentangles space and time for controllable generative rendering. Given a monocular video, SpaceTimePilot can independently alter the...
Coordinated Humanoid Manipulation with Choice Policies : Abstract: Humanoid robots hold great promise for operating in human-centric environments, yet achieving robust whole-body coordination across the head, hands, and legs remains a major challenge. We pr...
Vulcan: Instance-Optimal Systems Heuristics Through LLM-Driven Search : Abstract: Resource-management tasks in modern operating and distributed systems continue to rely primarily on hand-designed heuristics for tasks such as scheduling, caching, or active queue management...
AdaGReS: Adaptive Greedy Context Selection via Redundancy-Aware Scoring for Token-Budgeted RAG : Abstract: Retrieval-augmented generation (RAG) is highly sensitive to the quality of selected context, yet standard top-k retrieval often returns redundant or near-duplicate chunks that waste token bu...
Generative Classifiers Avoid Shortcut Solutions : Abstract: Discriminative approaches to classification often learn shortcuts that hold in-distribution but fail even under minor distribution shift. This failure mode stems from an overreliance on feat...
Modeling Language as a Sequence of Thoughts : Abstract: Transformer language models can generate strikingly natural text by modeling language as a sequence of tokens. Yet, by relying primarily on surface-level co-occurrence statistics, they fail ...
Classifying long legal documents using short random chunks : Abstract: Classifying legal documents is a challenge: besides their specialized vocabulary, they can sometimes be very long. This means that feeding full documents to Transformer-based models for c...
DarkEQA: Benchmarking Vision-Language Models for Embodied Question Answering in Low-Light Indoor Environments : Abstract: Vision Language Models (VLMs) are increasingly adopted as central reasoning modules for embodied agents. Existing benchmarks evaluate their capabilities under ideal, well-lit conditions, yet...
A Modal Logic for Possibilistic Reasoning with Fuzzy Formal Contexts : Abstract: We introduce a two-sort weighted modal logic for possibilistic reasoning with fuzzy formal contexts. The syntax of the logic includes two types of weighted modal operators corresponding to c...
SymSeqBench: a unified framework for the generation and analysis of rule-based symbolic sequences and datasets : Abstract: Sequential structure is a key feature of multiple domains of natural cognition and behavior, such as language, movement and decision-making. Likewise, it is also a central property of tasks ...
Evaluating the Impact of Compression Techniques on the Robustness of CNNs under Natural Corruptions : Abstract: Compressed deep learning models are crucial for deploying computer vision systems on resource-constrained devices. However, model compression may affect robustness, especially under natural ...
The Impact of LLMs on Online News Consumption and Production : Abstract: Large language models (LLMs) change how consumers acquire information online; their bots also crawl news publishers' websites for training data and to answer consumer queries; and they provi...
ShowUI-$\pi$: Flow-based Generative Models as GUI Dexterous Hands : Abstract: Building intelligent agents capable of dexterous manipulation is essential for achieving human-like automation in both robotics and digital environments. However, existing GUI agents rely on...
Semi-overlapping Multi-bandit Best Arm Identification for Sequential Support Network Learning : Abstract: Many modern AI and ML problems require evaluating partners' contributions through shared yet asymmetric, computationally intensive processes and the simultaneous selection of the most benefi...
MSACL: Multi-Step Actor-Critic Learning with Lyapunov Certificates for Exponentially Stabilizing Control : Abstract: Achieving provable stability in model-free reinforcement learning (RL) remains a challenge, particularly in balancing exploration with rigorous safety. This article introduces MSACL, a frame...
HaineiFRDM: Explore Diffusion to Restore Defects in Fast-Movement Films : Abstract: Existing open-source film restoration methods show limited performance compared to commercial methods due to training with low-quality synthetic data and employing noisy optical flows. In ad...
RAIR: A Rule-Aware Benchmark Uniting Challenging Long-Tail and Visual Salience Subset for E-commerce Relevance Assessment : Abstract: Search relevance plays a central role in web e-commerce. While large language models (LLMs) have shown significant results on relevance tasks, existing benchmarks lack sufficient complexity f...
AI-Driven Cloud Resource Optimization for Multi-Cluster Environments : Abstract: Modern cloud-native systems increasingly rely on multi-cluster deployments to support scalability, resilience, and geographic distribution. However, existing resource management approaches r...
mHC: Manifold-Constrained Hyper-Connections : Abstract: Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by expanding the residual stream width and ...
Encyclo-K: Evaluating LLMs with Dynamically Composed Knowledge Statements : Abstract: Benchmarks play a crucial role in tracking the rapid advancement of large language models (LLMs) and identifying their capability boundaries. However, existing benchmarks predominantly curat...
Big AI is accelerating the metacrisis: What can we do? : Abstract: The world is in the grip of ecological, meaning, and language crises which are converging into a metacrisis. Big AI is accelerating them all. Language engineers are playing a central role, p...
PrivacyBench: A Conversational Benchmark for Evaluating Privacy in Personalized AI : Abstract: Personalized AI agents rely on access to a user's digital footprint, which often includes sensitive data from private emails, chats and purchase histories. Yet this access creates a fundamen...
Video and Language Alignment in 2D Systems for 3D Multi-object Scenes with Multi-Information Derivative-Free Control : Abstract: Cross-modal systems trained on 2D visual inputs are presented with a dimensional shift when processing 3D scenes. An in-scene camera bridges the dimensionality gap but requires learning a co...
Practising responsibility: Ethics in NLP as a hands-on course : Abstract: As Natural Language Processing (NLP) systems become more pervasive, integrating ethical considerations into NLP education has become essential. However, this presents inherent challenges in ...
LeanCat: A Benchmark Suite for Formal Category Theory in Lean (Part I: 1-Categories) : Abstract: Large language models (LLMs) have made rapid progress in formal theorem proving, yet current benchmarks under-measure the kind of abstraction and library-mediated reasoning that organizes mo...
HiGR: Efficient Generative Slate Recommendation via Hierarchical Planning and Multi-Objective Preference Alignment : Abstract: Slate recommendation, where users are presented with a ranked list of items simultaneously, is widely adopted in online platforms. Recent advances in generative models have shown promise in ...
Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow : Abstract: Generative video modeling has emerged as a compelling tool to zero-shot reason about plausible physical interactions for open-world manipulation. Yet, it remains a challenge to translate suc...
AstroReview: An LLM-driven Multi-Agent Framework for Telescope Proposal Peer Review and Refinement : Abstract: Competitive access to modern observatories has intensified as proposal volumes outpace available telescope time, making timely, consistent, and transparent peer review a critical bottleneck ...
LSRE: Latent Semantic Rule Encoding for Real-Time Semantic Risk Detection in Autonomous Driving : Abstract: Real-world autonomous driving must adhere to complex human social rules that extend beyond legally codified traffic regulations. Many of these semantic constraints, such as yielding to emerg...
BandiK: Efficient Multi-Task Decomposition Using a Multi-Bandit Framework : Abstract: The challenge of effectively transferring knowledge across multiple tasks is of critical importance and is also present in downstream tasks with foundation models. However, the nature of tra...
Evolving, Not Training: Zero-Shot Reasoning Segmentation via Evolutionary Prompting : Abstract: Reasoning Segmentation requires models to interpret complex, context-dependent linguistic queries to achieve pixel-level localization. Current dominant approaches rely heavily on Supervised ...
Nested Learning: The Illusion of Deep Learning Architectures : Abstract: Despite recent progress, particularly in developing Language Models, there are fundamental challenges and unanswered questions about how such models can continually learn/memorize, sel...
R-Debater: Retrieval-Augmented Debate Generation through Argumentative Memory : Abstract: We present R-Debater, an agentic framework for generating multi-turn debates built on argumentative memory. Grounded in rhetoric and memory studies, the system views debate as a process of r...
An Adaptive, Disentangled Representation for Multidimensional MRI Reconstruction : Abstract: We present a new approach for representing and reconstructing multidimensional magnetic resonance imaging (MRI) data. Our method builds on a novel, learned feature-based image representation...
VLA-RAIL: A Real-Time Asynchronous Inference Linker for VLA Models and Robots : Abstract: Vision-Language-Action (VLA) models have achieved remarkable breakthroughs in robotics, with the action chunk playing a dominant role in these advances. Given the real-time and continuous na...
Renormalization Group Guided Tensor Network Structure Search : Abstract: Tensor network structure search (TN-SS) aims to automatically discover optimal network topologies and rank configurations for efficient tensor decomposition in high-dimensional data represen...
Do Large Language Models Know What They Are Capable Of? : Abstract: We investigate whether large language models (LLMs) can predict whether they will succeed on a given task and whether their predictions improve as they progress through multi-step tasks. We ...
Hybrid Motion Planning with Deep Reinforcement Learning for Mobile Robot Navigation : Abstract: Autonomous mobile robots operating in complex, dynamic environments face the dual challenge of navigating large-scale, structurally diverse spaces with static obstacles while safely interact...
DynaFix: Iterative Automated Program Repair Driven by Execution-Level Dynamic Information : Abstract: Automated Program Repair (APR) aims to automatically generate correct patches for buggy programs. Recent approaches leveraging large language models (LLMs) have shown promise but face limita...
AI-Driven Acoustic Voice Biomarker-Based Hierarchical Classification of Benign Laryngeal Voice Disorders from Sustained Vowels : Abstract: Benign laryngeal voice disorders affect nearly one in five individuals and often manifest as dysphonia, while also serving as non-invasive indicators of broader physiological dysfunction. We...
AutoFed: Manual-Free Federated Traffic Prediction via Personalized Prompt : Abstract: Accurate traffic prediction is essential for Intelligent Transportation Systems, including ride-hailing, urban road planning, and vehicle fleet management. However, due to significant privac...
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space : Abstract: Large Language Models (LLMs) apply uniform computation to all tokens, despite language exhibiting highly non-uniform information density. This token-uniform regime wastes capacity on locally...
Chat-Driven Optimal Management for Virtual Network Services : Abstract: This paper proposes a chat-driven network management framework that integrates natural language processing (NLP) with optimization-based virtual network allocation, enabling intuitive and re...
Understanding and Steering the Cognitive Behaviors of Reasoning Models at Test-Time : Abstract: Large Language Models (LLMs) often rely on long chain-of-thought (CoT) reasoning to solve complex tasks. While effective, these trajectories are frequently inefficient, leading to high laten...
SynRAG: A Large Language Model Framework for Executable Query Generation in Heterogeneous SIEM System : Abstract: Security Information and Event Management (SIEM) systems are essential for large enterprises to monitor their IT infrastructure by ingesting and analyzing millions of logs and events daily. ...
Localized Calibrated Uncertainty in Code Language Models : Abstract: Large Language models (LLMs) can generate complicated source code from natural language prompts. However, LLMs can generate output that deviates from what the user wants, requiring supervisi...
More Than Bits: Multi-Envelope Double Binary Factorization for Extreme Quantization : Abstract: For extreme low-bit quantization of large language models (LLMs), Double Binary Factorization (DBF) is attractive as it enables efficient inference without sacrificing accuracy. However, the...
Generative AI-enhanced Sector-based Investment Portfolio Construction : Abstract: This paper investigates how Large Language Models (LLMs) from leading providers (OpenAI, Google, Anthropic, DeepSeek, and xAI) can be applied to quantitative sector-based portfolio construct...
Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice : Abstract: Data teams at frontier AI companies routinely train small proxy models to make critical decisions about pretraining data recipes for full-scale training runs. However, the community has a li...
Automated Classification of First-Trimester Fetal Heart Views Using Ultrasound-Specific Self-Supervised Learning : Abstract: Congenital heart disease remains the most common congenital anomaly and a leading cause of neonatal morbidity and mortality. Although first-trimester fetal echocardiography offers an opportu...
HOLOGRAPH: Active Causal Discovery via Sheaf-Theoretic Alignment of Large Language Model Priors : Abstract: Causal discovery from observational data remains fundamentally limited by identifiability constraints. Recent work has explored leveraging Large Language Models (LLMs) as sources of prior ca...
F2IDiff: Real-world Image Super-resolution using Feature to Image Diffusion Foundation Model : Abstract: With the advent of Generative AI, Single Image Super-Resolution (SISR) quality has seen substantial improvement, as the strong priors learned by Text-2-Image Diffusion (T2IDiff) Foundation M...
Foundation models on the bridge: Semantic hazard detection and safety maneuvers for maritime autonomy with vision-language models : Abstract: The draft IMO MASS Code requires autonomous and remotely supervised maritime vessels to detect departures from their operational design domain, enter a predefined fallback that notifies the ...
Privacy-Preserving Semantic Communications via Multi-Task Learning and Adversarial Perturbations : Abstract: Semantic communication conveys task-relevant meaning rather than focusing solely on message reconstruction, improving bandwidth efficiency and robustness for next-generation wireless system...
PackKV: Reducing KV Cache Memory Footprint through LLM-Aware Lossy Compression : Abstract: Transformer-based large language models (LLMs) have demonstrated remarkable potential across a wide range of practical applications. However, long-context inference remains a significant cha...
Comparing Approaches to Automatic Summarization in Less-Resourced Languages : Abstract: Automatic text summarization has achieved high performance in high-resourced languages like English, but comparatively less attention has been given to summarization in less-resourced langua...
Fast and Realistic Automated Scenario Simulations and Reporting for an Autonomous Racing Stack : Abstract: In this paper, we describe the automated simulation and reporting pipeline implemented for our autonomous racing stack, ur.autopilot. The backbone of the simulation is based on a high-fideli...
FAST-IDS: A Fast Two-Stage Intrusion Detection System with Hybrid Compression for Real-Time Threat Detection in Connected and Autonomous Vehicles : Abstract: We have implemented a multi-stage IDS for CAVs that can be deployed to resource-constrained environments after hybrid model compression.
Tubular Riemannian Laplace Approximations for Bayesian Neural Networks : Abstract: Laplace approximations are among the simplest and most practical methods for approximate Bayesian inference in neural networks, yet their Euclidean formulation struggles with the highly anis...
Skim-Aware Contrastive Learning for Efficient Document Representation : Abstract: Although transformer-based models have shown strong performance in word- and sentence-level tasks, effectively representing long documents, especially in fields like law and medicine, remain...
FedSecureFormer: A Fast, Federated and Secure Transformer Framework for Lightweight Intrusion Detection in Connected and Autonomous Vehicles : Abstract: This work presents an encoder-only transformer built with minimal layers for intrusion detection in the domain of Connected and Autonomous Vehicles using Federated Learning.
DermaVQA-DAS: Dermatology Assessment Schema (DAS) & Datasets for Closed-Ended Question Answering & Segmentation in Patient-Generated Dermatology Images : Abstract: Recent advances in dermatological image analysis have been driven by large-scale annotated datasets; however, most existing benchmarks focus on dermatoscopic images and lack patient-authored...
Empower Low-Altitude Economy: A Reliability-Aware Dynamic Weighting Allocation for Multi-modal UAV Beam Prediction : Abstract: The low-altitude economy (LAE) is rapidly expanding driven by urban air mobility, logistics drones, and aerial sensing, while fast and accurate beam prediction in uncrewed aerial vehicles (U...
Generative Video Compression: Towards 0.01% Compression Rate for Video Transmission : Abstract: Can a video be compressed at an extreme compression rate as low as 0.01%? To this end, we achieve a compression rate of 0.02% in some cases by introducing Generative Video Compress...
Virtual-Eyes: Quantitative Validation of a Lung CT Quality-Control Pipeline for Foundation-Model Cancer Risk Prediction : Abstract: Robust preprocessing is rarely quantified in deep-learning pipelines for low-dose CT (LDCT) lung cancer screening. We develop and validate Virtual-Eyes, a clinically motivated 16-bit CT qual...
DRL-TH: Jointly Utilizing Temporal Graph Attention and Hierarchical Fusion for UGV Navigation in Crowded Environments : Abstract: Deep reinforcement learning (DRL) methods have demonstrated potential for autonomous navigation and obstacle avoidance of unmanned ground vehicles (UGVs) in crowded environments. Most existi...
One-shot synthesis of rare gastrointestinal lesions improves diagnostic accuracy and clinical training : Abstract: Rare gastrointestinal lesions are infrequently encountered in routine endoscopy, restricting the data available for developing reliable artificial intelligence (AI) models and training novic...
Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation : Abstract: Multimodal Large Language Models (MLLMs) have made remarkable progress in video understanding. However, they suffer from a critical vulnerability: an over-reliance on language priors, which ...
PointRAFT: 3D deep learning for high-throughput prediction of potato tuber weight from partial point clouds : Abstract: Potato yield is a key indicator for optimizing cultivation practices in agriculture. Potato yield can be estimated on harvesters using RGB-D cameras, which capture three-dimensional (3D) inf...
Developing controlled natural language for formal specification patterns using AI assistants : Abstract: Using an AI assistant, we developed a method for systematically constructing controlled natural language for requirements based on formal specification patterns containing logical attributes...
GARDO: Reinforcing Diffusion Models without Reward Hacking : Abstract: Fine-tuning diffusion models via online reinforcement learning (RL) has shown great potential for enhancing text-to-image alignment. However, since precisely specifying a ground-truth object...
Unified Embodied VLM Reasoning with Robotic Action via Autoregressive Discretized Pre-training : Abstract: General-purpose robotic systems operating in open-world environments must achieve both broad generalization and high-precision action execution, a combination that remains challenging for ex...
OptRot: Mitigating Weight Outliers via Data-Free Rotations for Post-Training Quantization : Abstract: The presence of outliers in Large Language Models (LLMs) weights and activations makes them difficult to quantize. Recent work has leveraged rotations to mitigate these outliers. In this wor...
Enhancing LLM-Based Neural Network Generation: Few-Shot Prompting and Efficient Validation for Automated Architecture Design : Abstract: Automated neural network architecture design remains a significant challenge in computer vision. Task diversity and computational constraints require both effective architectures and efficie...
Multilevel Fair Allocation : Abstract: We introduce the concept of multilevel fair allocation of resources with tree-structured hierarchical relations among agents. While at each level it is possible to consider the problem local...
Enhancing LLM Planning Capabilities through Intrinsic Self-Critique : Abstract: We demonstrate an approach for LLMs to critique their own answers with the goal of enhancing their performance that leads to significant improvements over established planning benchma...
Factorized Learning for Temporally Grounded Video-Language Models : Abstract: Recent video-language models have shown great potential for video understanding, but still struggle with accurate temporal grounding for event-level perception. We observe that two main fact...
FedLiTeCAN: A Federated Lightweight Transformer for Fast and Robust CAN Bus Intrusion Detection : Abstract: This work implements a lightweight Transformer model for IDS in the domain of Connected and Autonomous Vehicles.
Random Multiplexing : Abstract: As wireless communication applications evolve from traditional multipath environments to high-mobility scenarios like unmanned aerial vehicles, multiplexing techniques have advanced accordin...
Pathology Context Recalibration Network for Ocular Disease Recognition : Abstract: Pathology context and expert experience play significant roles in clinical ocular disease diagnosis. Although deep neural networks (DNNs) have good ocular disease recognition results, they o...
Beyond Hallucinations: A Composite Score for Measuring Reliability in Open-Source Large Language Models : Abstract: Large Language Models (LLMs) like LLaMA, Mistral, and Gemma are increasingly used in decision-critical domains such as healthcare, law, and finance, yet their reliability remains uncertain. ...
AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives : Abstract: Although Large Audio-Language Models (LALMs) deliver state-of-the-art (SOTA) performance, they frequently suffer from hallucinations, e.g. generating text not grounded in the audio input. We...
Jailbreaking Attacks vs. Content Safety Filters: How Far Are We in the LLM Safety Arms Race? : Abstract: As large language models (LLMs) are increasingly deployed, ensuring their safe use is paramount. Jailbreaking, adversarial prompts that bypass model alignment to trigger harmful outputs, pre...
Kidney Exchange: Faster Parameterized Algorithms and Tighter Lower Bounds : Abstract: The kidney exchange mechanism allows many patient-donor pairs who are otherwise incompatible with each other to come together and exchange kidneys along a cycle. However, due to infrastructu...
PipeFlow: Pipelined Processing and Motion-Aware Frame Selection for Long-Form Video Editing : Abstract: Long-form video editing poses unique challenges due to the exponential increase in the computational cost from joint editing and Denoising Diffusion Implicit Models (DDIM) inversion across e...
RSAgent: Learning to Reason and Act for Text-Guided Segmentation via Multi-Turn Tool Invocations : Abstract: Text-guided object segmentation requires both cross-modal reasoning and pixel grounding abilities. Most recent methods treat text-guided segmentation as one-shot grounding, where the model p...
FUSE-RSVLM: Feature Fusion Vision-Language Model for Remote Sensing : Abstract: Large vision-language models (VLMs) exhibit strong performance across various tasks. However, these VLMs encounter significant challenges when applied to the remote sensing domain due to the...
iCLP: Large Language Model Reasoning with Implicit Cognition Latent Planning : Abstract: Large language models (LLMs), when guided by explicit textual plans, can perform reliable step-by-step reasoning during problem-solving. However, generating accurate and effective textual pl...
TESO Tabu Enhanced Simulation Optimization for Noisy Black Box Problems : Abstract: Simulation optimization (SO) is frequently challenged by noisy evaluations, high computational costs, and complex, multimodal search landscapes. This paper introduces Tabu-Enhanced Simulatio...
Tracing the Heart's Pathways: ECG Representation Learning from a Cardiac Conduction Perspective : Abstract: The multi-lead electrocardiogram (ECG) stands as a cornerstone of cardiac diagnosis. Recent strides in electrocardiogram self-supervised learning (eSSL) have brightened prospects for enhanci...
PhyAVBench: A Challenging Audio Physics-Sensitivity Benchmark for Physically Grounded Text-to-Audio-Video Generation : Abstract: Text-to-audio-video (T2AV) generation underpins a wide range of applications demanding realistic audio-visual content, including virtual reality, world modeling, gaming, and filmmaking. Howe...
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process : Abstract: Despite the growing reasoning capabilities of recent large language models (LLMs), their internal mechanisms during the reasoning process remain underexplored. Prior approaches often rely on...
MeLeMaD: Adaptive Malware Detection via Chunk-wise Feature Selection and Meta-Learning : Abstract: Confronting the substantial challenges of malware detection in cybersecurity necessitates solutions that are both robust and adaptable to the ever-evolving threat environment. The paper intr...
Coding With AI: From a Reflection on Industrial Practices to Future Computer Science and Software Engineering Education : Abstract: Recent advances in large language models (LLMs) have introduced new paradigms in software development, including vibe coding, AI-assisted coding, and agentic coding, fundamentally reshaping ...
Causify DataFlow: A Framework For High-performance Machine Learning Stream Computing : Abstract: We present DataFlow, a computational framework for building, testing, and deploying high-performance machine learning systems on unbounded time-series data. Traditional data science workflow...
A Community-Aware Framework for Influence Maximization with Explicit Accounting for Inter-Community Influence : Abstract: Influence Maximization (IM) seeks to identify a small set of seed nodes in a social network to maximize expected information spread under a diffusion model. While community-based approaches ...
Efficient Context Scaling with LongCat ZigZag Attention : Abstract: We introduce LongCat ZigZag Attention (LoZA), which is a sparse attention scheme designed to transform any existing full-attention models into sparse versions with rather limited compute bud...
Physics-informed Graph Neural Networks for Operational Flood Modeling : Abstract: Flood models inform strategic disaster management by simulating the spatiotemporal hydrodynamics of flooding. While physics-based numerical flood models are accurate, their substantial compu...
An Comparative Analysis about KYC on a Recommendation System Toward Agentic Recommendation System : Abstract: This research presents a cutting-edge recommendation system utilizing agentic AI for KYC (Know Your Customer in the financial domain), and its evaluation across five distinct content vertica...
Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling : Abstract: Multi-step retrieval-augmented generation (RAG) has become a widely adopted strategy for enhancing large language models (LLMs) on tasks that demand global comprehension and intensive reason...
Interactive Machine Learning: From Theory to Scale : Abstract: Machine learning has achieved remarkable success across a wide range of applications, yet many of its most effective methods rely on access to large amounts of labeled data or extensive onli...
A multimodal Transformer for InSAR-based ground deformation forecasting with cross-site generalization across Europe : Abstract: Near-real-time regional-scale monitoring of ground deformation is increasingly required to support urban planning, critical infrastructure management, and natural hazard mitigation. While In...
Efficient Deep Learning for Short-Term Solar Irradiance Time Series Forecasting: A Benchmark Study in Ho Chi Minh City : Abstract: Reliable forecasting of Global Horizontal Irradiance (GHI) is essential for mitigating the variability of solar energy in power grids. This study presents a comprehensive benchmark of ten de...
How Large Language Models Systematically Misrepresent American Climate Opinions : Abstract: Federal agencies and researchers increasingly use large language models to analyze and simulate public opinion. When AI mediates between the public and policymakers, accuracy across intersec...
Autoregressive long-horizon prediction of plasma edge dynamics : Abstract: Accurate modeling of scrape-off layer (SOL) and divertor-edge dynamics is vital for designing plasma-facing components in fusion devices. High-fidelity edge fluid/neutral codes such as SOLPS...
Breaking Audio Large Language Models by Attacking Only the Encoder: A Universal Targeted Latent-Space Audio Attack : Abstract: Audio-language models combine audio encoders with large language models to enable multimodal reasoning, but they also introduce new security vulnerabilities. We propose a universal targeted ...
Probing the Limits of Compressive Memory: A Study of Infini-Attention in Small-Scale Pretraining : Abstract: This study investigates small-scale pretraining for Small Language Models (SLMs) to enable efficient use of limited data and compute, improve accessibility in low-resource settings and reduc...
Lifelong Domain Adaptive 3D Human Pose Estimation : Abstract: 3D Human Pose Estimation (3D HPE) is vital in various applications, from person re-identification and action recognition to virtual reality. However, the reliance on annotated 3D data collec...
Seeking Late Night Life Lines: Experiences of Conversational AI Use in Mental Health Crisis : Abstract: Online, people often recount their experiences turning to conversational AI agents (e.g., ChatGPT, Claude, Copilot) for mental health support -- going so far as to replace their therapists. ...
Security Without Detection: Economic Denial as a Primitive for Edge and IoT Defense : Abstract: Detection-based security fails against sophisticated attackers using encryption, stealth, and low-rate techniques, particularly in IoT/edge environments where resource constraints preclude M...
From Correctness to Collaboration: Toward a Human-Centered Framework for Evaluating AI Agent Behavior in Software Engineering : Abstract: As Large Language Models (LLMs) evolve from code generators into collaborative partners for software engineers, our methods for evaluation are lagging. Current benchmarks, focused on code co...
Adversarial Lens: Exploiting Attention Layers to Generate Adversarial Examples for Evaluation : Abstract: Recent advances in mechanistic interpretability suggest that intermediate attention layers encode token-level hypotheses that are iteratively refined toward the final output. In this work, w...
Retrieval Augmented Question Answering: When Should LLMs Admit Ignorance? : Abstract: The success of expanded context windows in Large Language Models (LLMs) has driven increased use of broader context in retrieval-augmented generation. We investigate the use of LLMs for retr...
Explaining News Bias Detection: A Comparative SHAP Analysis of Transformer Model Decision Mechanisms : Abstract: Automated bias detection in news text is heavily used to support journalistic analysis and media accountability, yet little is known about how bias detection models arrive at their decisions...
Artificial Intelligence for All? Brazilian Teachers on Ethics, Equity, and the Everyday Challenges of AI in Education : Abstract: This study examines the perceptions of Brazilian K-12 education teachers regarding the use of AI in education, specifically General Purpose AI. This investigation employs a quantitative anal...
Video-Based Performance Evaluation for ECR Drills in Synthetic Training Environments : Abstract: Effective urban warfare training requires situational awareness and muscle memory, developed through repeated practice in realistic yet controlled environments. A key drill, Enter and Clear ...
Quantum Error Mitigation with Attention Graph Transformers for Burgers Equation Solvers on NISQ Hardware : Abstract: We present a hybrid quantum-classical framework augmented with learned error mitigation for solving the viscous Burgers equation on noisy intermediate-scale quantum (NISQ) hardware. Using th...
Improved Bounds for Private and Robust Alignment : Abstract: In this paper, we study the private and robust alignment of language models from a theoretical perspective by establishing upper bounds on the suboptimality gap in both offline and online se...
StressRoBERTa: Cross-Condition Transfer Learning from Depression, Anxiety, and PTSD to Stress Detection : Abstract: The prevalence of chronic stress represents a significant public health concern, with social media platforms like Twitter serving as important venues for individuals to share their experienc...
Zero-Trust Agentic Federated Learning for Secure IIoT Defense Systems : Abstract: Recent attacks on critical infrastructure, including the 2021 Oldsmar water treatment breach and 2023 Danish energy sector compromises, highlight urgent security gaps in Industrial IoT (IIoT...
Prompt-Induced Over-Generation as Denial-of-Service: A Black-Box Attack-Side Benchmark : Abstract: Large language models (LLMs) can be driven into over-generation, emitting thousands of tokens before producing an end-of-sequence (EOS) token. This degrades answer quality, inflates latency ...
A Survey on Graph Neural Networks for Fraud Detection in Ride Hailing Platforms : Abstract: This study investigates fraud detection in ride hailing platforms through Graph Neural Networks (GNNs), focusing on the effectiveness of various models. By analyzing prevalent fraudulent acti...
FineFT: Efficient and Risk-Aware Ensemble Reinforcement Learning for Futures Trading : Abstract: Futures are contracts obligating the exchange of an asset at a predetermined date and price, notable for their high leverage and liquidity and, therefore, thrive in the Crypto market. RL has...
Safety-Biased Policy Optimisation: Towards Hard-Constrained Reinforcement Learning via Trust Regions : Abstract: Reinforcement learning (RL) in safety-critical domains requires agents to maximise rewards while strictly adhering to safety constraints. Existing approaches, such as Lagrangian and projecti...
Uncovering Discrimination Clusters: Quantifying and Explaining Systematic Fairness Violations : Abstract: Fairness in algorithmic decision-making is often framed in terms of individual fairness, which requires that similar individuals receive similar outcomes. A system violates individual fairne...
Enabling Physical AI at the Edge: Hardware-Accelerated Recovery of System Dynamics : Abstract: Physical AI at the edge -- enabling autonomous systems to understand and predict real-world dynamics in real time -- requires hardware-efficient learning and inference. Model recovery (MR), ...
Entropy-Aware Speculative Decoding Toward Improved LLM Reasoning : Abstract: Speculative decoding (SD) accelerates large language model (LLM) reasoning by using a small draft model to generate candidate tokens, which the target LLM either accepts directly or regenera...
Drift-Based Dataset Stability Benchmark : Abstract: Machine learning (ML) represents an efficient and popular approach for network traffic classification. However, network traffic classification is a challenging domain, and trained models may...
Audited Skill-Graph Self-Improvement for Agentic LLMs via Verifiable Rewards, Experience Synthesis, and Continual Memory : Abstract: Reinforcement learning is increasingly used to transform large language models into agentic systems that act over long horizons, invoke tools, and manage memory under partial observability. ...
Leveraging Machine Learning for Early Detection of Lung Diseases : Abstract: A combination of traditional image processing methods with advanced neural networks concretes a predictive and preventive healthcare paradigm. This study offers rapid, accurate, and non-inva...
HINTS: Extraction of Human Insights from Time-Series Without External Sources : Abstract: Human decision-making, emotions, and collective psychology are complex factors that shape the temporal dynamics observed in financial and economic systems. Many recent time series forecastin...
Generalized Regularized Evidential Deep Learning Models: Theory and Comprehensive Evaluation : Abstract: Evidential deep learning (EDL) models, based on Subjective Logic, introduce a principled and computationally efficient way to make deterministic neural networks uncertainty-aware. The result...
Geometric Scaling of Bayesian Inference in LLMs : Abstract: Recent work has shown that small transformers trained in controlled "wind-tunnel" settings can implement exact Bayesian inference, and that their training dynamics produce a geometric subst...
Coordinate Matrix Machine: A Human-level Concept Learning to Classify Very Similar Documents : Abstract: Human-level concept learning argues that humans typically learn new concepts from a single example, whereas machine learning algorithms typically require hundreds of samples to learn a singl...
State-of-the-art Small Language Coder Model: Mify-Coder : Abstract: We present Mify-Coder, a 2.5B-parameter code model trained on 4.2T tokens using a compute-optimal strategy built on the Mify-2.5B foundation model. Mify-Coder achieves comparable accuracy an...
Hybrid-Code: A Privacy-Preserving, Redundant Multi-Agent Framework for Reliable Local Clinical Coding : Abstract: Clinical coding automation using cloud-based Large Language Models (LLMs) poses privacy risks and latency bottlenecks, rendering them unsuitable for on-premise healthcare deployment. We intr...
AgenticTCAD: A LLM-based Multi-Agent Framework for Automated TCAD Code Generation and Device Optimization : Abstract: With the continued scaling of advanced technology nodes, the design-technology co-optimization (DTCO) paradigm has become increasingly critical, rendering efficient device design and optimiz...
Towards representation agnostic probabilistic programming : Abstract: Current probabilistic programming languages and tools tightly couple model representations with specific inference algorithms, preventing experimentation with novel representations or mixed ...
Break Out the Silverware -- Semantic Understanding of Stored Household Items : Abstract: "Bring me a plate." For domestic service robots, this simple command reveals a complex challenge: inferring where everyday items are stored, often out of sight in drawers, cabinets, or clo...
Enforcing Temporal Constraints for LLM Agents : Abstract: LLM-based agents are deployed in safety-critical applications, yet current guardrail systems fail to prevent violations of temporal safety policies, requirements that govern the ordering and...
When in Doubt, Deliberate: Confidence-Based Routing to Expert Debate for Sexism Detection : Abstract: Sexist content online increasingly appears in subtle, context-dependent forms that evade traditional detection methods. Its interpretation often depends on overlapping linguistic, psychologi...
q3-MuPa: Quick, Quiet, Quantitative Multi-Parametric MRI using Physics-Informed Diffusion Models : Abstract: The 3D fast silent multi-parametric mapping sequence with zero echo time (MuPa-ZTE) is a novel quantitative MRI (qMRI) acquisition that enables nearly silent scanning by using a 3D phyllotax...
A Survey of AI Methods for Geometry Preparation and Mesh Generation in Engineering Simulation : Abstract: Artificial intelligence is beginning to ease long-standing bottlenecks in the CAD-to-mesh pipeline. This survey reviews recent advances where machine learning aids part classification, mesh ...
HarmTransform: Transforming Explicit Harmful Queries into Stealthy via Multi-Agent Debate : Abstract: Large language models (LLMs) are equipped with safety mechanisms to detect and block harmful queries, yet current alignment approaches primarily focus on overtly dangerous content and overlo...
PyBangla at BLP-2025 Task 2: Enhancing Bangla-to-Python Code Generation with Iterative Self-Correction and Multilingual Agents : Abstract: LLMs excel at code generation from English prompts, but this progress has not extended to low-resource languages. We address Bangla-to-Python code generation by introducing BanglaCodeAct, an...
STED and Consistency Scoring: A Framework for Evaluating LLM Structured Output Reliability : Abstract: Large Language Models (LLMs) are increasingly deployed for structured data generation, yet output consistency remains critical for production applications. We introduce a comprehensive frame...
Enriching Historical Records: An OCR and AI-Driven Approach for Database Integration : Abstract: This research digitizes and analyzes the Leidse hoogleraren en lectoren 1575-1815 books written between 1983 and 1985, which contain biographic data about professors and curators of Leiden U...
Context-aware LLM-based AI Agents for Human-centered Energy Management Systems in Smart Buildings : Abstract: This study presents a conceptual framework and a prototype assessment for Large Language Model (LLM)-based Building Energy Management System (BEMS) AI agents to facilitate context-aware ener...
AMAP Agentic Planning Technical Report : Abstract: We present STAgent, an agentic large language model tailored for spatio-temporal understanding, designed to solve complex tasks such as constrained point-of-interest discovery and itinerary ...
Iterative Deployment Improves Planning Skills in LLMs : Abstract: We show that iterative deployment of large language models (LLMs), each fine-tuned on data carefully curated by users from the previous models' deployment, can significantly change the prope...
Semi-Automated Data Annotation in Multisensor Datasets for Autonomous Vehicle Testing : Abstract: This report presents the design and implementation of a semi-automated data annotation pipeline developed within the DARTS project, whose goal is to create a large-scale, multimodal dataset ...
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem : Abstract: Agentic crafting requires LLMs to operate in real-world environments over multiple turns by taking actions, observing outcomes, and iteratively refining artifacts. Despite its importance, th...
A study on constraint extraction and exception exclusion in care worker scheduling : Abstract: Technologies for automatically generating work schedules have been extensively studied; however, in long-term care facilities, the conditions vary between facilities, making it essential to ...
GenZ: Foundational models as latent variable generators within traditional statistical models : Abstract: We present GenZ, a hybrid model that bridges foundational models and statistical modeling through interpretable semantic features. While large language models possess broad domain knowledge,...
Explaining Why Things Go Where They Go: Interpretable Constructs of Human Organizational Preferences : Abstract: Robotic systems for household object rearrangement often rely on latent preference models inferred from human demonstrations. While effective at prediction, these models offer limited insigh...
BatteryAgent: Synergizing Physics-Informed Interpretation with LLM Reasoning for Intelligent Battery Fault Diagnosis : Abstract: Fault diagnosis of lithium-ion batteries is critical for system safety. While existing deep learning methods exhibit superior detection accuracy, their "black-box" nature hinders interpretab...
Multi-modal cross-domain mixed fusion model with dual disentanglement for fault diagnosis under unseen working conditions : Abstract: Intelligent fault diagnosis has become an indispensable technique for ensuring machinery reliability. However, existing methods suffer significant performance decline in real-world scenarios...
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization : Abstract: Existing Large Language Model (LLM) agent frameworks face two significant challenges: high configuration costs and static capabilities. Building a high-quality agent often requires extensive...
Group Deliberation Oriented Multi-Agent Conversational Model for Complex Reasoning : Abstract: This paper proposes a group deliberation oriented multi-agent conversational model to address the limitations of single large language models in complex reasoning tasks. The model adopts a t...
Reinforcement Learning-Augmented LLM Agents for Collaborative Decision Making and Performance Optimization : Abstract: Large Language Models (LLMs) perform well in language tasks but often lack collaborative awareness and struggle to optimize global performance in multi-agent settings. We present a reinforce...
Recursive Language Models : Abstract: We study allowing large language models (LLMs) to process arbitrarily long prompts through the lens of inference-time scaling. We propose Recursive Language Models (RLMs), a general inferenc...
MCPAgentBench: A Real-world Task Benchmark for Evaluating LLM Agent MCP Tool Use : Abstract: Large Language Models (LLMs) are increasingly serving as autonomous agents, and their utilization of external tools via the Model Context Protocol (MCP) is considered a future trend. Current...
From Building Blocks to Planning: Multi-Step Spatial Reasoning in LLMs with Reinforcement Learning : Abstract: Spatial reasoning in large language models (LLMs) has gained increasing attention due to applications in navigation and planning. Despite strong general language capabilities, LLMs still str...
Evaluating the Reasoning Abilities of LLMs on Underrepresented Mathematics Competition Problems : Abstract: Understanding the limitations of Large Language Models, or LLMs, in mathematical reasoning has been the focus of several recent studies. However, the majority of these studies use the same d...
Thinking on Maps: How Foundation Model Agents Explore, Remember, and Reason Map Environments : Abstract: Map environments provide a fundamental medium for representing spatial structure. Understanding how foundation model (FM) agents understand and act in such environments is therefore critical...
What Drives Success in Physical Planning with Joint-Embedding Predictive World Models? : Abstract: A long-standing challenge in AI is to develop agents capable of solving a wide range of physical tasks and generalizing to new, unseen tasks and environments. A popular recent approach invol...
Align While Search: Belief-Guided Exploratory Inference for World-Grounded Embodied Agents : Abstract: In this paper, we propose a test-time adaptive agent that performs exploratory inference through posterior-guided belief refinement without relying on gradient-based updates or additional tr...
Constrained Language Model Policy Optimization via Risk-aware Stepwise Alignment : Abstract: When fine-tuning pre-trained Language Models (LMs) to exhibit desired behaviors, maintaining control over risk is critical for ensuring both safety and trustworthiness. Most existing safety ...
Deep Reinforcement Learning for Solving the Fleet Size and Mix Vehicle Routing Problem : Abstract: The Fleet Size and Mix Vehicle Routing Problem (FSMVRP) is a prominent variant of the Vehicle Routing Problem (VRP), extensively studied in operations research and computational science. FSM...
SCP: Accelerating Discovery with a Global Web of Autonomous Scientific Agents : Abstract: We introduce SCP: the Science Context Protocol, an open-source standard designed to accelerate discovery by enabling a global network of autonomous scientific agents. SCP is built on two fou...
Graph-Based Exploration for ARC-AGI-3 Interactive Reasoning Tasks : Abstract: We present a training-free graph-based approach for solving interactive reasoning tasks in the ARC-AGI-3 benchmark. ARC-AGI-3 comprises game-like tasks where agents must infer task mechanics...
CogRec: A Cognitive Recommender Agent Fusing Large Language Models and Soar for Explainable Recommendation : Abstract: Large Language Models (LLMs) have demonstrated a remarkable capacity in understanding user preferences for recommendation systems. However, they are constrained by several critical challenge...
LoongFlow: Directed Evolutionary Search via a Cognitive Plan-Execute-Summarize Paradigm : Abstract: The transition from static Large Language Models (LLMs) to self-improving agents is hindered by the lack of structured reasoning in traditional evolutionary approaches. Existing methods ofte...
ROAD: Reflective Optimization via Automated Debugging for Zero-Shot Agent Alignment : Abstract: Automatic Prompt Optimization (APO) has emerged as a critical technique for enhancing Large Language Model (LLM) performance, yet current state-of-the-art methods typically rely on large, la...
SPARK: Search Personalization via Agent-Driven Retrieval and Knowledge-sharing : Abstract: Personalized search demands the ability to model users' evolving, multi-dimensional information needs; a challenge for systems constrained by static profiles or monolithic retrieval pipeline...
A Proof-of-Concept for Explainable Disease Diagnosis Using Large Language Models and Answer Set Programming : Abstract: Accurate disease prediction is vital for timely intervention, effective treatment, and reducing medical complications. While symbolic AI has been applied in healthcare, its adoption remains ...
CASCADE: Cumulative Agentic Skill Creation through Autonomous Development and Evolution : Abstract: Large language model (LLM) agents currently depend on predefined tools or brittle tool generation, constraining their capability and adaptability to complex scientific tasks. We introduce CA...
The Drill-Down and Fabricate Test (DDFT): A Protocol for Measuring Epistemic Robustness in Language Models : Abstract: Current language model evaluations measure what models know under ideal conditions but not how robustly they know it under realistic stress. Static benchmarks like MMLU and TruthfulQA cannot...

Research Sources: 450 | Generated: 1/1/2026