AI RESEARCH PAPERS & ACADEMIC SOURCES
- BikeScenes: Online LiDAR Semantic Segmentation for Bicycles : Abstract: The vulnerability of cyclists, exacerbated by the rising popularity of faster e-bikes, motivates adapting automotive perception technologies for bicycle safety. We use our multi-sensor 'Sens...
- Generative Image Restoration and Super-Resolution using Physics-Informed Synthetic Data for Scanning Tunneling Microscopy : Abstract: Scanning tunnelling microscopy (STM) enables atomic-resolution imaging and atom manipulation, but its utility is often limited by tip degradation and slow serial data acquisition. Fabricatio...
- SplitFlow: Flow Decomposition for Inversion-Free Text-to-Image Editing : Abstract: Rectified flow models have become a de facto standard in image generation due to their stable sampling trajectories and high-fidelity outputs. Despite their strong generative capabilities, t...
- Fine-tuning Segment Anything for Real-Time Tumor Tracking in Cine-MRI : Abstract: In this work, we address the TrackRAD2025 challenge of real-time tumor tracking in cine-MRI sequences of the thoracic and abdominal regions under strong data scarcity constraints. Two comple...
- Larger Hausdorff Dimension in Scanning Pattern Facilitates Mamba-Based Methods in Low-Light Image Enhancement : Abstract: We propose an innovative enhancement to the Mamba framework by increasing the Hausdorff dimension of its scanning pattern through a novel Hilbert Selective Scan mechanism. This mechanism exp...
- Enhancing Temporal Understanding in Video-LLMs through Stacked Temporal Attention in Vision Encoders : Abstract: Despite significant advances in Multimodal Large Language Models (MLLMs), understanding complex temporal dynamics in videos remains a major challenge. Our experiments show that current Video...
- FlexICL: A Flexible Visual In-context Learning Framework for Elbow and Wrist Ultrasound Segmentation : Abstract: Elbow and wrist fractures are the most common fractures in pediatric populations. Automatic segmentation of musculoskeletal structures in ultrasound (US) can improve diagnostic accuracy and ...
- OracleAgent: A Multimodal Reasoning Agent for Oracle Bone Script Research : Abstract: As one of the earliest writing systems, Oracle Bone Script (OBS) preserves the cultural and intellectual heritage of ancient civilizations. However, current OBS research faces two major chal...
- JOGS: Joint Optimization of Pose Estimation and 3D Gaussian Splatting : Abstract: Traditional novel view synthesis methods heavily rely on external camera pose estimation tools such as COLMAP, which often introduce computational bottlenecks and propagate errors. To addres...
- Exploring Object-Aware Attention Guided Frame Association for RGB-D SLAM : Abstract: Attention models have recently emerged as a powerful approach, demonstrating significant progress in various fields. Visualization techniques, such as class activation mapping, provide visua...
- FullPart: Generating each 3D Part at Full Resolution : Abstract: Part-based 3D generation holds great potential for various applications. Previous part generators that represent parts using implicit vector-set tokens often suffer from insufficient geometr...
- BasicAVSR: Arbitrary-Scale Video Super-Resolution via Image Priors and Enhanced Motion Compensation : Abstract: Arbitrary-scale video super-resolution (AVSR) aims to enhance the resolution of video frames, potentially at various scaling factors, which presents several challenges regarding spatial deta...
- Detecting Unauthorized Vehicles using Deep Learning for Smart Cities: A Case Study on Bangladesh : Abstract: Modes of transportation vary across countries depending on geographical location and cultural context. In South Asian countries rickshaws are among the most common means of local transport. ...
- CRAG-MM: Multi-modal Multi-turn Comprehensive RAG Benchmark : Abstract: Wearable devices such as smart glasses are transforming the way people interact with their surroundings, enabling users to seek information regarding entities in their view. Multi-Modal Retr...
- MoTDiff: High-resolution Motion Trajectory estimation from a single blurred image using Diffusion models : Abstract: Accurate estimation of motion information is crucial in diverse computational imaging and computer vision applications. Researchers have investigated various methods to extract motion inform...
- Sketch2PoseNet: Efficient and Generalized Sketch to 3D Human Pose Prediction : Abstract: 3D human pose estimation from sketches has broad applications in computer animation and film production. Unlike traditional human pose estimation, this task presents unique challenges due to...
- Developing a Multi-task Ensemble Geometric Deep Network for Supply Chain Sustainability and Risk Management : Abstract: The sustainability of supply chain plays a key role in achieving optimal performance in controlling the supply chain. The management of risks that occur in a supply chain is a fundamental pr...
- OmniLayout: Enabling Coarse-to-Fine Learning with LLMs for Universal Document Layout Generation : Abstract: Document AI has advanced rapidly and is attracting increasing attention. Yet, while most efforts have focused on document layout analysis (DLA), its generative counterpart, document layout g...
- Revisiting Generative Infrared and Visible Image Fusion Based on Human Cognitive Laws : Abstract: Existing infrared and visible image fusion methods often face the dilemma of balancing modal information. Generative fusion methods reconstruct fused images by learning from data distributio...
- Exploring Complementarity and Explainability in CNNs for Periocular Verification Across Acquisition Distances : Abstract: We study the complementarity of different CNNs for periocular verification at different distances on the UBIPr database. We train three architectures of increasing complexity (SqueezeNet, Mo...
- Beyond Imitation: Constraint-Aware Trajectory Generation with Flow Matching For End-to-End Autonomous Driving : Abstract: Planning is a critical component of end-to-end autonomous driving. However, prevailing imitation learning methods often suffer from mode collapse, failing to produce diverse trajectory hypot...
- Leveraging Large-Scale Face Datasets for Deep Periocular Recognition via Ocular Cropping : Abstract: We focus on ocular biometrics, specifically the periocular region (the area around the eye), which offers high discrimination and minimal acquisition constraints. We evaluate three Convoluti...
- Towards Realistic Earth-Observation Constellation Scheduling: Benchmark and Methodology : Abstract: Agile Earth Observation Satellites (AEOSs) constellations offer unprecedented flexibility for monitoring the Earth's surface, but their scheduling remains challenging under large-scale scena...
- Exploring the correlation between the type of music and the emotions evoked: A study using subjective questionnaires and EEG : Abstract: The subject of this work is to check how different types of music affect human emotions. While listening to music, a subjective survey and brain activity measurements were carried out using ...
- A Hybrid Framework Bridging CNN and ViT based on Theory of Evidence for Diabetic Retinopathy Grading : Abstract: Diabetic retinopathy (DR) is a leading cause of vision loss among middle-aged and elderly people, which significantly impacts their daily lives and mental health. To improve the efficiency o...
- EEG-Driven Image Reconstruction with Saliency-Guided Diffusion Models : Abstract: Existing EEG-driven image reconstruction methods often overlook spatial attention mechanisms, limiting fidelity and semantic coherence. To address this, we propose a dual-conditioning framew...
- A-TPT: Angular Diversity Calibration Properties for Test-Time Prompt Tuning of Vision-Language Models : Abstract: Test-time prompt tuning (TPT) has emerged as a promising technique for adapting large vision-language models (VLMs) to unseen tasks without relying on labeled data. However, the lack of disp...
- PointSt3R: Point Tracking through 3D Grounded Correspondence : Abstract: Recent advances in foundational 3D reconstruction models, such as DUSt3R and MASt3R, have shown great potential in 2D and 3D correspondence in static scenes. In this paper, we propose to ada...
- Towards Fine-Grained Vision-Language Alignment for Few-Shot Anomaly Detection : Abstract: Few-shot anomaly detection (FSAD) methods identify anomalous regions with few known normal samples. Most existing methods rely on the generalization ability of pre-trained vision-language mo...
- Analysis of the Robustness of an Edge Detector Based on Cellular Automata Optimized by Particle Swarm : Abstract: The edge detection task is essential in image processing aiming to extract relevant information from an image. One recurring problem in this task is the weaknesses found in some detectors, s...
- SA$^{2}$Net: Scale-Adaptive Structure-Affinity Transformation for Spine Segmentation from Ultrasound Volume Projection Imaging : Abstract: Spine segmentation, based on ultrasound volume projection imaging (VPI), plays a vital role for intelligent scoliosis diagnosis in clinical applications. However, this task faces several sig...
- AdSum: Two-stream Audio-visual Summarization for Automated Video Advertisement Clipping : Abstract: Advertisers commonly need multiple versions of the same advertisement (ad) at varying durations for a single campaign. The traditional approach involves manually selecting and re-editing sho...
- Dynamic Context-Aware Scene Reasoning Using Vision-Language Alignment in Zero-Shot Real-World Scenarios : Abstract: In real-world environments, AI systems often face unfamiliar scenarios without labeled data, creating a major challenge for conventional scene understanding models. The inability to generali...
- CATCH: A Modular Cross-domain Adaptive Template with Hook : Abstract: Recent advances in Visual Question Answering (VQA) have demonstrated impressive performance in natural image domains, with models like LLaVA leveraging large language models (LLMs) for open-...
- Emu3.5: Native Multimodal Models are World Learners : Abstract: We introduce Emu3.5, a large-scale multimodal world model that natively predicts the next state across vision and language. Emu3.5 is pre-trained end-to-end with a unified next-token predict...
- Spiking Patches: Asynchronous, Sparse, and Efficient Tokens for Event Cameras : Abstract: We propose tokenization of events and present a tokenizer, Spiking Patches, specifically designed for event cameras. Given a stream of asynchronous and spatially sparse events, our goal is t...
- PT-DETR: Small Target Detection Based on Partially-Aware Detail Focus : Abstract: To address the challenges in UAV object detection, such as complex backgrounds, severe occlusion, dense small objects, and varying lighting conditions, this paper proposes PT-DETR based on RT...
- All You Need for Object Detection: From Pixels, Points, and Prompts to Next-Gen Fusion and Multimodal LLMs/VLMs in Autonomous Vehicles : Abstract: Autonomous Vehicles (AVs) are transforming the future of transportation through advances in intelligent perception, decision-making, and control systems. However, their success is tied to on...
- Towards Reliable Sea Ice Drift Estimation in the Arctic Deep Learning Optical Flow on RADARSAT-2 : Abstract: Accurate estimation of sea ice drift is critical for Arctic navigation, climate research, and operational forecasting. While optical flow, a computer vision technique for estimating pixel wi...
- Improving Classification of Occluded Objects through Scene Context : Abstract: The presence of occlusions has provided substantial challenges to typically-powerful object recognition algorithms. Additional sources of information can be extremely valuable to reduce erro...
- The Impact and Outlook of 3D Gaussian Splatting : Abstract: Since its introduction, 3D Gaussian Splatting (3DGS) has rapidly transformed the landscape of 3D scene representations, inspiring an extensive body of associated research. Follow-up work inc...
- ChartAB: A Benchmark for Chart Grounding & Dense Alignment : Abstract: Charts play an important role in visualization, reasoning, data analysis, and the exchange of ideas among humans. However, existing vision-language models (VLMs) still lack accurate percepti...
- The Quest for Generalizable Motion Generation: Data, Model, and Evaluation : Abstract: Despite recent advances in 3D human motion generation (MoGen) on standard benchmarks, existing models still face a fundamental bottleneck in their generalization capability. In contrast, adj...
- SEE4D: Pose-Free 4D Generation via Auto-Regressive Video Inpainting : Abstract: Immersive applications call for synthesizing spatiotemporal 4D content from casual videos without costly 3D supervision. Existing video-to-4D methods typically rely on manually annotated cam...
- Masked Diffusion Captioning for Visual Feature Learning : Abstract: We learn visual features by captioning images with an image-conditioned masked diffusion language model, a formulation we call masked diffusion captioning (MDC). During training, text tokens...
- Groupwise Registration with Physics-Informed Test-Time Adaptation on Multi-parametric Cardiac MRI : Abstract: Multiparametric mapping MRI has become a viable tool for myocardial tissue characterization. However, misalignment between multiparametric maps makes pixel-wise analysis challenging. To addr...
- StructLayoutFormer: Conditional Structured Layout Generation via Structure Serialization and Disentanglement : Abstract: Structured layouts are preferable in many 2D visual contents (e.g., GUIs, webpages) since the structural information allows convenient layout editing. Computational frameworks can help create...
- Self-localization on a 3D map by fusing global and local features from a monocular camera : Abstract: Self-localization on a 3D map by using an inexpensive monocular camera is required to realize autonomous driving. Self-localization based on a camera often uses a convolutional neural networ...
- AgriGS-SLAM: Orchard Mapping Across Seasons via Multi-View Gaussian Splatting SLAM : Abstract: Autonomous robots in orchards require real-time 3D scene understanding despite repetitive row geometry, seasonal appearance changes, and wind-driven foliage motion. We present AgriGS-SLAM, a...
- Comparative Analysis of Deep Learning Models for Olive Tree Crown and Shadow Segmentation Towards Biovolume Estimation : Abstract: Olive tree biovolume estimation is a key task in precision agriculture, supporting yield prediction and resource management, especially in Mediterranean regions severely impacted by climate-...
- SAMRI: Segment Anything Model for MRI : Abstract: Accurate magnetic resonance imaging (MRI) segmentation is crucial for clinical decision-making, but remains labor-intensive when performed manually. Convolutional neural network (CNN)-based ...
- BRIQA: Balanced Reweighting in Image Quality Assessment of Pediatric Brain MRI : Abstract: Assessing the severity of artifacts in pediatric brain Magnetic Resonance Imaging (MRI) is critical for diagnostic accuracy, especially in low-field systems where the signal-to-noise ratio i...
- ProstNFound+: A Prospective Study using Medical Foundation Models for Prostate Cancer Detection : Abstract: Purpose: Medical foundation models (FMs) offer a path to build high-performance diagnostic systems. However, their application to prostate cancer (PCa) detection from micro-ultrasound (μ...
- MORE: Multi-Organ Medical Image REconstruction Dataset : Abstract: CT reconstruction provides radiologists with images for diagnosis and treatment, yet current deep learning methods are typically limited to specific anatomies and datasets, hindering general...
- Quality-Aware Prototype Memory for Face Representation Learning : Abstract: Prototype Memory is a powerful model for face representation learning. It enables training face recognition models on datasets of any size by generating prototypes (classifier weights) on th...
- Dynamic Traceback Learning for Medical Report Generation : Abstract: Automated medical report generation has demonstrated the potential to significantly reduce the workload associated with time-consuming medical reporting. Recent generative representation lea...
- EmoAttack: Emotion-to-Image Diffusion Models for Emotional Backdoor Generation : Abstract: Text-to-image diffusion models can generate realistic images based on textual inputs, enabling users to convey their opinions visually through language. Meanwhile, within language, emotion p...
- NerfBaselines: Consistent and Reproducible Evaluation of Novel View Synthesis Methods : Abstract: Novel view synthesis is an important problem with many applications, including AR/VR, gaming, and robotic simulations. With the recent rapid development of Neural Radiance Fields (NeRFs) and...
- Static for Dynamic: Towards a Deeper Understanding of Dynamic Facial Expressions Using Static Expression Data : Abstract: Dynamic facial expression recognition (DFER) infers emotions from the temporal evolution of expressions, unlike static facial expression recognition (SFER), which relies solely on a single s...
- A Continuous and Interpretable Morphometric for Robust Quantification of Dynamic Biological Shapes : Abstract: We introduce the Push-Forward Signed Distance Morphometric (PF-SDM) for shape quantification in biomedical imaging. The PF-SDM compactly encodes geometric and topological properties of close...
- Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning : Abstract: Multimodal contrastive learning models (e.g., CLIP) can learn high-quality representations from large-scale image-text datasets, while they exhibit significant vulnerabilities to backdoor at...
- Language-guided Open-world Video Anomaly Detection under Weak Supervision : Abstract: Video anomaly detection (VAD) aims to detect anomalies that deviate from what is expected. In open-world scenarios, the expected events may change as requirements change. For example, not we...
- SD-ReID: View-aware Stable Diffusion for Aerial-Ground Person Re-Identification : Abstract: Aerial-Ground Person Re-IDentification (AG-ReID) aims to retrieve specific persons across cameras with different viewpoints. Previous works focus on designing discriminative models to mainta...
- DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution : Abstract: Diffusion models have demonstrated promising performance in real-world video super-resolution (VSR). However, the dozens of sampling steps they require make inference extremely slow. Sampli...
- Unleashing Diffusion Transformers for Visual Correspondence by Modulating Massive Activations : Abstract: Pre-trained stable diffusion models (SD) have shown great advances in visual correspondence. In this paper, we investigate the capabilities of Diffusion Transformers (DiTs) for accurate dens...
- LODGE: Level-of-Detail Large-Scale Gaussian Splatting with Efficient Rendering : Abstract: In this work, we present a novel level-of-detail (LOD) method for 3D Gaussian Splatting that enables real-time rendering of large-scale scenes on memory-constrained devices. Our approach int...
- RRCANet: Recurrent Reusable-Convolution Attention Network for Infrared Small Target Detection : Abstract: Infrared small target detection is a challenging task due to its unique characteristics (e.g., small, dim, shapeless and changeable). Recently published CNN-based methods have achieved promi...
- MoralCLIP: Contrastive Alignment of Vision-and-Language Representations with Moral Foundations Theory : Abstract: Recent advances in vision-language models have enabled rich semantic understanding across modalities. However, these encoding methods lack the ability to interpret or reason about the moral ...
- Boosting Generative Adversarial Transferability with Self-supervised Vision Transformer Features : Abstract: The ability of deep neural networks (DNNs) comes from extracting and interpreting features from the data provided. By exploiting intermediate features in DNNs instead of relying on hard label...
- DDL: A Large-Scale Datasets for Deepfake Detection and Localization in Diversified Real-World Scenarios : Abstract: Recent advances in AIGC have exacerbated the misuse of malicious deepfake content, making the development of reliable deepfake detection methods an essential means to address this challenge....
- ScoreAdv: Score-based Targeted Generation of Natural Adversarial Examples via Diffusion Models : Abstract: Despite the success of deep learning across various domains, it remains vulnerable to adversarial attacks. Although many existing adversarial attack methods achieve high success rates, they ...
- From One to More: Contextual Part Latents for 3D Generation : Abstract: Recent advances in 3D generation have transitioned from multi-view 2D rendering approaches to 3D-native latent diffusion frameworks that exploit geometric priors in ground truth data. Despit...
- Disentangled 4D Gaussian Splatting: Rendering High-Resolution Dynamic World at 343 FPS : Abstract: While dynamic novel view synthesis from 2D videos has seen progress, achieving efficient reconstruction and rendering of dynamic scenes remains a challenging task. In this paper, we introduc...
- CronusVLA: Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling : Abstract: Recent vision-language-action (VLA) models built on pretrained vision-language models (VLMs) have demonstrated strong performance in robotic manipulation. However, these models remain constr...
- Hierarchical Graph Networks for Accurate Weather Forecasting via Lightweight Training : Abstract: Climate events arise from intricate, multivariate dynamics governed by global-scale drivers, profoundly impacting food, energy, and infrastructure. Yet, accurate weather prediction remains e...
- GSE: Group-wise Sparse and Explainable Adversarial Attacks : Abstract: Sparse adversarial attacks fool deep neural networks (DNNs) through minimal pixel perturbations, often regularized by the $\ell_0$ norm. Recent efforts have replaced this norm with a structu...
- SafEDMD: A Koopman-based data-driven controller design framework for nonlinear dynamical systems : Abstract: The Koopman operator serves as the theoretical backbone for machine learning of dynamical control systems, where the operator is heuristically approximated by extended dynamic mode decomposi...
- Random pairing MLE for estimation of item parameters in Rasch model : Abstract: The Rasch model, a classical model in the item response theory, is widely used in psychometrics to model the relationship between individuals' latent traits and their binary responses to ass...
- Infinite-dimensional Mahalanobis Distance with Applications to Kernelized Novelty Detection : Abstract: The Mahalanobis distance is a classical tool used to measure the covariance-adjusted distance between points in $\mathbb{R}^d$. In this work, we extend the concept of Mahalanobis distance to separ...
- Diffusion Map Autoencoder : Abstract: Diffusion-Map-AutoEncoder (DMAE) pairs a diffusion-map encoder (using the Nyström method) with linear or RBF Gaussian-Process latent mean decoders, yielding closed-form inductive mappings ...
- Integrating Protein Sequence and Expression Level to Analysis Molecular Characterization of Breast Cancer Subtypes : Abstract: Breast cancer's complexity and variability pose significant challenges in understanding its progression and guiding effective treatment. This study aims to integrate protein sequence data wi...
- Unified Error Correction Code Transformer with Low Complexity : Abstract: Channel coding is vital for reliable sixth-generation (6G) data transmission, employing diverse error correction codes for various application scenarios. Traditional decoders require dedicat...
- OnlyFlow: Optical Flow based Motion Conditioning for Video Diffusion Models : Abstract: We consider the problem of text-to-video generation tasks with precise control for various applications such as camera movement control and video-to-video editing. Most methods tackling this ...
- Beyond likelihood ratio bias: Nested multi-time-scale stochastic approximation for likelihood-free parameter estimation : Abstract: We study parameter inference in simulation-based stochastic models where the analytical form of the likelihood is unknown. The main difficulty is that score evaluation as a ratio of noisy Mo...
- HyGen: Efficient LLM Serving via Elastic Online-Offline Request Co-location : Abstract: Large language models (LLMs) have facilitated a wide range of applications with distinct service-level objectives (SLOs), from latency-sensitive online tasks like interactive chatbots to thr...
- Model Provenance Testing for Large Language Models : Abstract: Large language models are increasingly customized through fine-tuning and other adaptations, creating challenges in enforcing licensing terms and managing downstream impacts. Tracking model ...
- On the Impact of Performative Risk Minimization for Binary Random Variables : Abstract: Performativity, the phenomenon where outcomes are influenced by predictions, is particularly prevalent in social contexts where individuals strategically respond to a deployed model. In orde...
- Improving LLM Safety Alignment with Dual-Objective Optimization : Abstract: Existing training-time safety alignment techniques for large language models (LLMs) remain vulnerable to jailbreak attacks. Direct preference optimization (DPO), a widely deployed alignment ...
- Accurate predictive model of band gap with selected important features based on explainable machine learning : Abstract: In the rapidly advancing field of materials informatics, nonlinear machine learning models have demonstrated exceptional predictive capabilities for material properties. However, their black...
- Cybersecurity threat detection based on a UEBA framework using Deep Autoencoders : Abstract: User and Entity Behaviour Analytics (UEBA) is a broad branch of data analytics that attempts to build a normal behavioural profile in order to detect anomalous events. Among the techniques u...
- Optimal Online Change Detection via Random Fourier Features : Abstract: This article studies the problem of online non-parametric change point detection in multivariate data streams. We approach the problem through the lens of kernel-based two-sample testing and...
- Partially-Supervised Neural Network Model For Quadratic Multiparametric Programming : Abstract: Neural Networks (NN) with ReLU activation functions are used to model multiparametric quadratic optimization problems (mp-QP) in diverse engineering applications. Researchers have suggested ...
- Predictive Causal Inference via Spatio-Temporal Modeling and Penalized Empirical Likelihood : Abstract: This study introduces an integrated framework for predictive causal inference designed to overcome limitations inherent in conventional single model approaches. Specifically, we combine a Hi...
- Machine-learning competition to grade EEG background patterns in newborns with hypoxic-ischaemic encephalopathy : Abstract: Machine learning (ML) has the potential to support and improve expert performance in monitoring the brain function of at-risk newborns. Developing accurate and reliable ML models depends on ...
- Direct Debiased Machine Learning via Bregman Divergence Minimization : Abstract: We develop a direct debiased machine learning framework comprising Neyman targeted estimation and generalized Riesz regression. Our framework unifies Riesz regression for automatic debiased ...
- LISTEN to Your Preferences: An LLM Framework for Multi-Objective Selection : Abstract: Human experts often struggle to select the best option from a large set of items with multiple competing objectives, a process bottlenecked by the difficulty of formalizing complex, implicit...
- Beyond Length: Quantifying Long-Range Information for Long-Context LLM Pretraining Data : Abstract: Long-context language models unlock advanced capabilities in reasoning, code generation, and document summarization by leveraging dependencies across extended spans of text. However, a signi...
- Ideology-Based LLMs for Content Moderation : Abstract: Large language models (LLMs) are increasingly used in content moderation systems, where ensuring fairness and neutrality is essential. In this study, we examine how persona adoption influenc...
- A Survey on Efficient Large Language Model Training: From Data-centric Perspectives : Abstract: Post-training of Large Language Models (LLMs) is crucial for unlocking their task generalization potential and domain-specific capabilities. However, the current LLM post-training paradigm f...
- RECAP: Reproducing Copyrighted Data from LLMs Training with an Agentic Pipeline : Abstract: If we cannot inspect the training data of a large language model (LLM), how can we ever know what it has seen? We believe the most compelling evidence arises when the model itself freely rep...
- Semantic Label Drift in Cross-Cultural Translation : Abstract: Machine Translation (MT) is widely employed to address resource scarcity in low-resource languages by generating synthetic data from high-resource counterparts. While sentiment preservation ...
- SymCode: A Neurosymbolic Approach to Mathematical Reasoning via Verifiable Code Generation : Abstract: Large Language Models (LLMs) often struggle with complex mathematical reasoning, where prose-based generation leads to unverified and arithmetically unsound solutions. Current prompting stra...
- NeuronMM: High-Performance Matrix Multiplication for LLM Inference on AWS Trainium : Abstract: AI accelerators, customized to AI workloads, provide cost-effective and high-performance solutions for training and inference. Trainium, an AI accelerator recently developed by Amazon Web Se...
- QCoder Benchmark: Bridging Language Generation and Quantum Hardware through Simulator-Based Feedback : Abstract: Large language models (LLMs) have increasingly been applied to automatic programming code generation. This task can be viewed as a language generation task that bridges natural language, hum...
- Reasoning Path Divergence: A New Metric and Curation Strategy to Unlock LLM Diverse Thinking : Abstract: While Test-Time Scaling (TTS) has proven effective in improving the reasoning ability of large language models (LLMs), low diversity in model outputs often becomes a bottleneck; this is part...
- On the Influence of Discourse Relations in Persuasive Texts : Abstract: This paper investigates the relationship between Persuasion Techniques (PTs) and Discourse Relations (DRs) by leveraging Large Language Models (LLMs) and prompt engineering. Since no dataset...
- MossNet: Mixture of State-Space Experts is a Multi-Head Attention : Abstract: Large language models (LLMs) have significantly advanced generative applications in natural language processing (NLP). Recent trends in model architectures revolve around efficient variants ...
- Similarity-Distance-Magnitude Language Models : Abstract: We introduce Similarity-Distance-Magnitude (SDM) language models (LMs), which are sequence prediction models fine-tuned to maximize the proportion of generations in the well-calibrated, high...
- RCScore: Quantifying Response Consistency in Large Language Models : Abstract: Current LLM evaluations often rely on a single instruction template, overlooking models' sensitivity to instruction style, a critical aspect for real-world deployments. We present RCScore, a ...
- Pragmatic Theories Enhance Understanding of Implied Meanings in LLMs : Abstract: The ability to accurately interpret implied meanings plays a crucial role in human communication and language use, and language models are also expected to possess this capability. This stud...
- Language Models Are Borrowing-Blind: A Multilingual Evaluation of Loanword Identification across 10 Languages : Abstract: Throughout language history, words are borrowed from one language to another and gradually become integrated into the recipient's lexicon. Speakers can often differentiate these loanwords fr...
- Distilling Multilingual Vision-Language Models: When Smaller Models Stay Multilingual : Abstract: Vision-language models (VLMs) exhibit uneven performance across languages, a problem that is often exacerbated when the model size is reduced. While knowledge distillation (KD) demonstrates ...
- Do LLMs Signal When They're Right? Evidence from Neuron Agreement : Abstract: Large language models (LLMs) commonly boost reasoning via sample-evaluate-ensemble decoders, achieving label-free gains without ground truth. However, prevailing strategies score candidates ...
- SCRIBE: Structured Chain Reasoning for Interactive Behaviour Explanations using Tool Calling : Abstract: Language models can be used to provide interactive, personalized student feedback in educational settings. However, real-world deployment faces three key challenges: privacy concerns, limite...
- On the Role of Context for Discourse Relation Classification in Scientific Writing : Abstract: With the increasing use of generative Artificial Intelligence (AI) methods to support science workflows, we are interested in the use of discourse-level information to find supporting eviden...
- OmniEduBench: A Comprehensive Chinese Benchmark for Evaluating Large Language Models in Education : Abstract: With the rapid development of large language models (LLMs), various LLM-based works have been widely applied in educational fields. However, most existing LLMs and their benchmarks focus pri...
- 1+1>2: A Synergistic Sparse and Low-Rank Compression Method for Large Language Models : Abstract: Large Language Models (LLMs) have demonstrated remarkable proficiency in language comprehension and generation; however, their widespread adoption is constrained by substantial bandwidth and...
- A Multi-agent Large Language Model Framework to Automatically Assess Performance of a Clinical AI Triage Tool : Abstract: Purpose: The purpose of this study was to determine if an ensemble of multiple LLM agents could be used collectively to provide a more reliable assessment of a pixel-based AI triage tool tha...
- Hebrew Diacritics Restoration using Visual Representation : Abstract: Diacritics restoration in Hebrew is a fundamental task for ensuring accurate word pronunciation and disambiguating textual meaning. Despite the language's high degree of ambiguity when unvoc...
- SlideAgent: Hierarchical Agentic Framework for Multi-Page Visual Document Understanding : Abstract: Multi-page visual documents such as manuals, brochures, presentations, and posters convey key information through layout, colors, icons, and cross-slide references. While large language mode...
- Encoder-Decoder or Decoder-Only? Revisiting Encoder-Decoder Large Language Model : Abstract: Recent large language model (LLM) research has undergone an architectural shift from encoder-decoder modeling to the now-dominant decoder-only modeling. This rapid transition, however, ...
- Enhancing Underwater Object Detection through Spatio-Temporal Analysis and Spatial Attention Networks : Abstract: This study examines the effectiveness of spatio-temporal modeling and the integration of spatial attention mechanisms in deep learning models for underwater object detection. Specifically, i...
- FakeZero: Real-Time, Privacy-Preserving Misinformation Detection for Facebook and X : Abstract: Social platforms distribute information at unprecedented speed, which in turn accelerates the spread of misinformation and threatens public discourse. We present FakeZero, a fully client-sid...
- CAVE: Detecting and Explaining Commonsense Anomalies in Visual Environments : Abstract: Humans can naturally identify, reason about, and explain anomalies in their environment. In computer vision, this long-standing challenge remains limited to industrial defects or unrealistic...
- ORBIT - Open Recommendation Benchmark for Reproducible Research with Hidden Tests : Abstract: Recommender systems are among the most impactful AI applications, interacting with billions of users every day, guiding them to relevant products, services, or information tailored to their ...
- SP-MCQA: Evaluating Intelligibility of TTS Beyond the Word Level : Abstract: The evaluation of intelligibility for TTS has reached a bottleneck, as existing assessments heavily rely on word-by-word accuracy metrics such as WER, which fail to capture the complexity of...
- Which Way Does Time Flow? A Psychophysics-Grounded Evaluation for Vision-Language Models : Abstract: Modern vision-language models (VLMs) excel at many multimodal tasks, yet their grasp of temporal information in video remains weak and, crucially, under-evaluated. We probe this gap with a d...
- Nexus: Execution-Grounded Multi-Agent Test Oracle Synthesis : Abstract: Test oracle generation in non-regression testing is a longstanding challenge in software engineering, where the goal is to produce oracles that can accurately determine whether a function un...
- Rethinking Text-to-SQL: Dynamic Multi-turn SQL Interaction for Real-world Database Exploration : Abstract: Recent advances in Text-to-SQL have achieved strong results in static, single-turn tasks, where models generate SQL queries from natural language questions. However, these systems fall short...
- Are LLMs Rigorous Logical Reasoners? Empowering Natural Language Proof Generation by Stepwise Decoding with Contrastive Learning : Abstract: Logical reasoning is a pivotal component in the field of artificial intelligence. Proof planning, particularly in contexts requiring the validation of explanation accuracy, continues to pres...
- The LSCD Benchmark: a Testbed for Diachronic Word Meaning Tasks : Abstract: Lexical Semantic Change Detection (LSCD) is a complex, lemma-level task, which is usually operationalized based on two subsequently applied usage-level tasks: First, Word-in-Context (WiC) la...
- Unstructured Evidence Attribution for Long Context Query Focused Summarization : Abstract: Large language models (LLMs) are capable of generating coherent summaries from very long contexts given a user query, and extracting and citing evidence spans helps improve the trustworthine...
- Dependency Structure Augmented Contextual Scoping Framework for Multimodal Aspect-Based Sentiment Analysis : Abstract: Multimodal Aspect-Based Sentiment Analysis (MABSA) seeks to extract fine-grained information from image-text pairs to identify aspect terms and determine their sentiment polarity. However, e...
- ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models : Abstract: Chart understanding presents a unique challenge for large vision-language models (LVLMs), as it requires the integration of sophisticated textual and visual reasoning capabilities. However, ...
- ClueAnchor: Clue-Anchored Knowledge Reasoning Exploration and Optimization for Retrieval-Augmented Generation : Abstract: Retrieval-Augmented Generation (RAG) augments Large Language Models (LLMs) with external knowledge to improve factuality. However, existing RAG systems frequently underutilize the retrieved ...
- AI Debate Aids Assessment of Controversial Claims : Abstract: As AI grows more powerful, it will increasingly shape how we understand the world. But with this influence comes the risk of amplifying misinformation and deepening social divides, especially...
- SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat : Abstract: We propose SPARTA ALIGNMENT, an algorithm to collectively align multiple LLMs through competition and combat. To complement a single model's lack of diversity in generation and biases in eva...
- Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text : Abstract: The increasing capabilities of Large Language Models (LLMs) have raised concerns about their misuse in AI-generated plagiarism and social engineering. While various AI-generated text detecto...
- Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens : Abstract: Previous research has primarily focused on the cognitive error detection capabilities of Large Language Models (LLMs), often prompting them to analyze mistakes in reasoning chains. However, ...
- Comparing human and LLM politeness strategies in free production : Abstract: Polite speech poses a fundamental alignment challenge for large language models (LLMs). Humans deploy a rich repertoire of linguistic strategies to balance informational and social goals -- ...
- Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality : Abstract: Supervised fine-tuning (SFT) is a critical step in aligning large language models (LLMs) with human instructions and values, yet many aspects of SFT remain poorly understood. We trained a wi...
- Beyond Isolated Dots: Benchmarking Structured Table Construction as Deep Knowledge Extraction : Abstract: With the emergence of large language models (LLMs), there is an expectation that LLMs can effectively extract explicit information from complex real-world documents (e.g., papers, reports). ...
- Enhancing Reasoning Skills in Small Persian Medical Language Models Can Outperform Large-Scale Data Training : Abstract: Enhancing reasoning capabilities in small language models is critical for specialized applications such as medical question answering, particularly in underrepresented languages like Persian...
- Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model : Abstract: Integrating audio and visual data for training multimodal foundational models remains a challenge. The Audio-Video Vector Alignment (AVVA) framework addresses this by considering AV scene al...
- GradEscape: A Gradient-Based Evader Against AI-Generated Text Detectors : Abstract: In this paper, we introduce GradEscape, the first gradient-based evader designed to attack AI-generated text (AIGT) detectors. GradEscape overcomes the undifferentiable computation problem, ...
- Bridging the Gap between Empirical Welfare Maximization and Conditional Average Treatment Effect Estimation in Policy Learning : Abstract: The goal of policy learning is to train a policy function that recommends a treatment given covariates to maximize population welfare. There are two major approaches in policy learning: the ...
- SteerVLM: Robust Model Control through Lightweight Activation Steering for Vision Language Models : Abstract: This work introduces SteerVLM, a lightweight steering module designed to guide Vision-Language Models (VLMs) towards outputs that better adhere to desired instructions. Our approach learns f...
- Surpassing state of the art on AMD area estimation from RGB fundus images through careful selection of U-Net architectures and loss functions for class imbalance : Abstract: Age-related macular degeneration (AMD) is one of the leading causes of irreversible vision impairment in people over the age of 60. This research focuses on semantic segmentation for AMD les...
- A Unified Theory for Causal Inference: Direct Debiased Machine Learning via Bregman-Riesz Regression : Abstract: This note introduces a unified theory for causal inference that integrates Riesz regression, covariate balancing, density-ratio estimation (DRE), targeted maximum likelihood estimation (TMLE...
- HEIR: Learning Graph-Based Motion Hierarchies : Abstract: Hierarchical structures of motion exist across research fields, including computer vision, graphics, and robotics, where complex dynamics typically arise from coordinated interactions among ...
- Scaling Image Geo-Localization to Continent Level : Abstract: Determining the precise geographic location of an image at a global scale remains an unsolved challenge. Standard image retrieval techniques are inefficient due to the sheer volume of images...
- OmniX: From Unified Panoramic Generation and Perception to Graphics-Ready 3D Scenes : Abstract: There are two prevalent ways of constructing 3D scenes: procedural generation and 2D lifting. Among them, panorama-based 2D lifting has emerged as a promising technique, leveraging powerful ...
- Time Weaver: A Conditional Time Series Generation Model : Abstract: Imagine generating a city's electricity demand pattern based on weather, the presence of an electric vehicle, and location, which could be used for capacity planning during a winter freeze. ...
- Parallel Unlearning in Inherited Model Networks : Abstract: Unlearning is challenging in generic learning frameworks with the continuous growth and updates of models exhibiting complex inheritance relationships. This paper presents a novel unlearning...
- Dolphin: A Programmable Framework for Scalable Neurosymbolic Learning : Abstract: Neurosymbolic learning enables the integration of symbolic reasoning with deep learning but faces significant challenges in scaling to complex symbolic programs, large datasets, or both. We ...
- Stability and Sharper Risk Bounds with Convergence Rate $\tilde{O}(1/n^2)$ : Abstract: Prior work (Klochkov & Zhivotovskiy, 2021) establishes at most $O\left(\log (n)/n\right)$ excess risk bounds via algorithmic stability for strongly-convex learners with high probability. ...
- Hysteresis Activation Function for Efficient Inference : Abstract: The widely used ReLU is favored for its hardware efficiency, as the implementation at inference is a one bit sign case, yet suffers from issues such as the "dying ReLU" problem, where du...
- HoGA: Higher-Order Graph Attention via Diversity-Aware k-Hop Sampling : Abstract: Graphs model latent variable relationships in many real-world systems, and Message Passing Neural Networks (MPNNs) are widely used to learn such structures for downstream tasks. While edge-b...
- Omni-Mol: Multitask Molecular Model for Any-to-any Modalities : Abstract: In the molecular domain, numerous studies have explored the use of multimodal large language models (LLMs) to construct a general-purpose, multi-task molecular model. However, these efforts ...
- Decoding for Punctured Convolutional and Turbo Codes: A Deep Learning Solution for Protocols Compliance : Abstract: Neural network-based decoding methods show promise in enhancing error correction performance but face challenges with punctured codes. In particular, existing methods struggle to adapt to va...
- Experiments with Optimal Model Trees : Abstract: Model trees provide an appealing way to perform interpretable machine learning for both classification and regression problems. In contrast to "classic" decision trees with constant values...
- Explainable post-training bias mitigation with distribution-based fairness metrics : Abstract: We develop a novel bias mitigation framework with distribution-based fairness constraints suitable for producing demographically blind and explainable machine-learning models across a wide r...
- Smart Exploration in Reinforcement Learning using Bounded Uncertainty Models : Abstract: Reinforcement learning (RL) is a powerful framework for decision-making in uncertain environments, but it often requires large amounts of data to learn an optimal policy. We address this cha...
- Advancing Local Clustering on Graphs via Compressive Sensing: Semi-supervised and Unsupervised Methods : Abstract: Local clustering aims to identify specific substructures within a large graph without any additional structural information of the graph. These substructures are typically small compared to ...
- AnomalyMatch: Discovering Rare Objects of Interest with Semi-supervised and Active Learning : Abstract: Anomaly detection in large datasets is essential in astronomy and computer vision. However, due to a scarcity of labelled data, it is often infeasible to apply supervised methods to anomaly ...
- A Robust and Non-Iterative Tensor Decomposition Method with Automatic Thresholding : Abstract: Recent advances in IoT and biometric sensing technologies have led to the generation of massive and high-dimensional tensor data, yet achieving accurate and efficient low-rank approximation ...
- Improving the Euclidean Diffusion Generation of Manifold Data by Mitigating Score Function Singularity : Abstract: Euclidean diffusion models have achieved remarkable success in generative modeling across diverse domains, and they have been extended to manifold cases in recent advances. Instead of explic...
- Is Grokking a Computational Glass Relaxation? : Abstract: Understanding neural networks' (NN) generalizability remains a central question in deep learning research. The special phenomenon of grokking, where NNs abruptly generalize long after the tr...
- Neurosymbolic Diffusion Models : Abstract: Neurosymbolic (NeSy) predictors combine neural perception with symbolic reasoning to solve tasks like visual reasoning. However, standard NeSy predictors assume conditional independence betw...
- On the creation of narrow AI: hierarchy and nonlocality of neural network skills : Abstract: We study the problem of creating strong, yet narrow, AI systems. While recent AI progress has been driven by the training of large general-purpose foundation models, the creation of smaller ...
- C-LoRA: Contextual Low-Rank Adaptation for Uncertainty Estimation in Large Language Models : Abstract: Low-Rank Adaptation (LoRA) offers a cost-effective solution for fine-tuning large language models (LLMs), but it often produces overconfident predictions in data-scarce few-shot settings. To...
- TabSTAR: A Tabular Foundation Model for Tabular Data with Text Fields : Abstract: While deep learning has achieved remarkable success across many domains, it has historically underperformed on tabular learning tasks, which remain dominated by gradient boosting decision tr...
- Oryx: a Scalable Sequence Model for Many-Agent Coordination in Offline MARL : Abstract: A key challenge in offline multi-agent reinforcement learning (MARL) is achieving effective many-agent multi-step coordination in complex environments. In this work, we propose Oryx, a novel...
- Rethinking Neural Combinatorial Optimization for Vehicle Routing Problems with Different Constraint Tightness Degrees : Abstract: Recent neural combinatorial optimization (NCO) methods have shown promising problem-solving ability without requiring domain-specific expertise. Most existing NCO methods use training and te...
- Improving Generalization of Neural Combinatorial Optimization for Vehicle Routing Problems via Test-Time Projection Learning : Abstract: Neural Combinatorial Optimization (NCO) has emerged as a promising learning-based paradigm for addressing Vehicle Routing Problems (VRPs) by minimizing the need for extensive manual engineer...
- MTL-KD: Multi-Task Learning Via Knowledge Distillation for Generalizable Neural Vehicle Routing Solver : Abstract: Multi-Task Learning (MTL) in Neural Combinatorial Optimization (NCO) is a promising approach to train a unified model capable of solving multiple Vehicle Routing Problem (VRP) variants. Howe...
- AANet: Virtual Screening under Structural Uncertainty via Alignment and Aggregation : Abstract: Virtual screening (VS) is a critical component of modern drug discovery, yet most existing methods--whether physics-based or deep learning-based--are developed around holo protein structures...
- Mind the Gap: Removing the Discretization Gap in Differentiable Logic Gate Networks : Abstract: Modern neural networks demonstrate state-of-the-art performance on numerous existing benchmarks; however, their high computational requirements and energy consumption prompt researchers to s...
- When Kernels Multiply, Clusters Unify: Fusing Embeddings with the Kronecker Product : Abstract: State-of-the-art embeddings often capture distinct yet complementary discriminative features: For instance, one image embedding model may excel at distinguishing fine-grained textures, while...
- A geometric framework for momentum-based optimizers for low-rank training : Abstract: Low-rank pre-training and fine-tuning have recently emerged as promising techniques for reducing the computational and storage costs of large neural networks. Training low-rank parameterizat...
- LLM-Driven Treatment Effect Estimation Under Inference Time Text Confounding : Abstract: Estimating treatment effects is crucial for personalized decision-making in medicine, but this task faces unique challenges in clinical practice. At training time, models for estimating trea...
- Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update : Abstract: We study the generalized linear bandit (GLB) problem, a contextual multi-armed bandit framework that extends the classical linear model by incorporating a non-linear link function, thereby m...
- FEDONet : Fourier-Embedded DeepONet for Spectrally Accurate Operator Learning : Abstract: Deep Operator Networks (DeepONets) have recently emerged as powerful data-driven frameworks for learning nonlinear operators, particularly suited for approximating solutions to partial diffe...
- Linearized Optimal Transport for Analysis of High-Dimensional Point-Cloud and Single-Cell Data : Abstract: Single-cell technologies generate high-dimensional point clouds of cells, enabling detailed characterization of complex patient states and treatment responses. Yet each patient is represente...
- Ditch the Denoiser: Emergence of Noise Robustness in Self-Supervised Learning from Data Curriculum : Abstract: Self-Supervised Learning (SSL) has become a powerful solution to extract rich representations from unlabeled data. Yet, SSL research is mostly focused on clean, curated and high-quality data...
- Curriculum Abductive Learning : Abstract: Abductive Learning (ABL) integrates machine learning with logical reasoning in a loop: a learning model predicts symbolic concept labels from raw inputs, which are revised through abduction ...
- Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space : Abstract: Reasoning ability, a core component of human intelligence, continues to pose a significant challenge for Large Language Models (LLMs) in the pursuit of AGI. Although model performance has im...
- Learning to Insert for Constructive Neural Vehicle Routing Solver : Abstract: Neural Combinatorial Optimisation (NCO) is a promising learning-based approach for solving Vehicle Routing Problems (VRPs) without extensive manual design. While existing constructive NCO me...
- Nek Minit: Harnessing Pragmatic Metacognitive Prompting for Explainable Sarcasm Detection of Australian and Indian English : Abstract: Sarcasm is a challenge to sentiment analysis because of the incongruity between stated and implied sentiment. The challenge is exacerbated when the implication may be relevant to a specific ...
- StyleGuard: Preventing Text-to-Image-Model-based Style Mimicry Attacks by Style Perturbations : Abstract: Recently, text-to-image diffusion models have been widely used for style mimicry and personalized customization through methods such as DreamBooth and Textual Inversion. This has raised conc...
- Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers : Abstract: Academic poster generation is a crucial yet challenging task in scientific communication, requiring the compression of long-context interleaved documents into a single, visually coherent pag...
- Learning World Models for Interactive Video Generation : Abstract: Foundational world models must be both interactive and preserve spatiotemporal coherence for effective future planning with action choices. However, present models for long video generation ...
- Towards Predicting Any Human Trajectory In Context : Abstract: Predicting accurate future trajectories of pedestrians is essential for autonomous systems but remains a challenging task due to the need for adaptability in different environments and domai...
- Efficient Regression-Based Training of Normalizing Flows for Boltzmann Generators : Abstract: Simulation-free training frameworks have been at the forefront of the generative modelling revolution in continuous spaces, leading to large-scale diffusion and flow matching models. However...
- Incentivizing LLMs to Self-Verify Their Answers : Abstract: Large Language Models (LLMs) have demonstrated remarkable progress in complex reasoning tasks through both post-training and test-time scaling laws. While prevalent test-time scaling approac...
- UniSite: The First Cross-Structure Dataset and Learning Framework for End-to-End Ligand Binding Site Detection : Abstract: The detection of ligand binding sites for proteins is a fundamental step in Structure-Based Drug Design. Despite notable advances in recent years, existing methods, datasets, and evaluation ...
- GenIR: Generative Visual Feedback for Mental Image Retrieval : Abstract: Vision-language models (VLMs) have shown strong performance on text-to-image retrieval benchmarks. However, bridging this success to real-world applications remains a challenge. In practice,...
- Human-assisted Robotic Policy Refinement via Action Preference Optimization : Abstract: Establishing a reliable and iteratively refined robotic system is essential for deploying real-world applications. While Vision-Language-Action (VLA) models are widely recognized as the foun...
- SAFE: Multitask Failure Detection for Vision-Language-Action Models : Abstract: While vision-language-action models (VLAs) have shown promising robotic behaviors across a diverse set of manipulation tasks, they achieve limited success rates when deployed on novel tasks ...
- SPARKE: Scalable Prompt-Aware Diversity and Novelty Guidance in Diffusion Models via RKE Score : Abstract: Diffusion models have demonstrated remarkable success in high-fidelity image synthesis and prompt-guided generative modeling. However, ensuring adequate diversity in generated samples of pro...
- The Scales of Justitia: A Comprehensive Survey on Safety Evaluation of LLMs : Abstract: With the rapid advancement of artificial intelligence, Large Language Models (LLMs) have shown remarkable capabilities in Natural Language Processing (NLP), including content generation, hum...
- Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study : Abstract: Large language models (LLMs) have shown impressive capabilities across tasks such as mathematics, coding, and reasoning, yet their learning ability, which is crucial for adapting to dynamic ...
- AIMeter: Measuring, Analyzing, and Visualizing Energy and Carbon Footprint of AI Workloads : Abstract: The rapid advancement of AI, particularly large language models (LLMs), has raised significant concerns about the energy use and carbon emissions associated with model training and inference...
- Controlling Thinking Speed in Reasoning Models : Abstract: Human cognition is theorized to operate in two modes: fast, intuitive System 1 thinking and slow, deliberate System 2 thinking. While current Large Reasoning Models (LRMs) excel at System 2 ...
- Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It : Abstract: Does vision-and-language (VL) training change the linguistic representations of language models in meaningful ways? Most results in the literature have shown inconsistent or marginal differe...
- DSDE: Dynamic Speculative Decoding with KLD Stability for Real-World Serving : Abstract: Speculative decoding accelerates large language model inference, but its reliance on a fixed speculation length is suboptimal in large-batch serving environments with diverse requests. This ...
- FASL-Seg: Anatomy and Tool Segmentation of Surgical Scenes : Abstract: The growing popularity of robotic minimally invasive surgeries has made deep learning-based surgical training a key area of research. A thorough understanding of the surgical scene component...
- SHA-256 Infused Embedding-Driven Generative Modeling of High-Energy Molecules in Low-Data Regimes : Abstract: High-energy materials (HEMs) are critical for propulsion and defense domains, yet their discovery remains constrained by experimental data and restricted access to testing facilities. This w...
- Optimal Information Combining for Multi-Agent Systems Using Adaptive Bias Learning : Abstract: Modern multi-agent systems ranging from sensor networks monitoring critical infrastructure to crowdsourcing platforms aggregating human intelligence can suffer significant performance degrad...
- FreIE: Low-Frequency Spectral Bias in Neural Networks for Time-Series Tasks : Abstract: The inherent autocorrelation of time series data presents an ongoing challenge to multivariate time series prediction. Recently, a widely adopted approach has been the incorporation of frequ...
- Mixture-of-Experts Operator Transformer for Large-Scale PDE Pre-Training : Abstract: Pre-training has proven effective in addressing data scarcity and performance limitations in solving PDE problems with neural operators. However, challenges remain due to the heterogeneity o...
- PRESTO: Preimage-Informed Instruction Optimization for Prompting Black-Box LLMs : Abstract: Large language models (LLMs) have achieved remarkable success across diverse domains, due to their strong instruction-following capabilities. This has led to increasing interest in optimizin...
- MedVLSynther: Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs : Abstract: Large Multimodal Models (LMMs) are increasingly capable of answering medical questions that require joint reasoning over images and text, yet training general medical VQA systems is impeded ...
- $\pi_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models : Abstract: Vision-Language-Action (VLA) models enable robots to understand and perform complex tasks from multimodal input. Although recent work explores using reinforcement learning (RL) to automate t...
- Topology-Aware Active Learning on Graphs : Abstract: We propose a graph-topological approach to active learning that directly targets the core challenge of exploration versus exploitation under scarce label budgets. To guide exploration, we in...
- Active Learning with Task-Driven Representations for Messy Pools : Abstract: Active learning has the potential to be especially useful for messy, uncurated pools where datapoints vary in relevance to the target task. However, state-of-the-art approaches to this probl...
- Robust GNN Watermarking via Implicit Perception of Topological Invariants : Abstract: Graph Neural Networks (GNNs) are valuable intellectual property, yet many watermarks rely on backdoor triggers that break under common model edits and create ownership ambiguity. We present ...
- Modular Linear Tokenization (MLT) : Abstract: This paper introduces Modular Linear Tokenization (MLT), a reversible and deterministic technique for encoding high-cardinality categorical identifiers into compact numerical vectors. Unlike...
- On the Dataless Training of Neural Networks : Abstract: This paper surveys studies on the use of neural networks for optimization in the training-data-free setting. Specifically, we examine the dataless application of neural network architectures...
- Contrastive Predictive Coding Done Right for Mutual Information Estimation : Abstract: The InfoNCE objective, originally introduced for contrastive representation learning, has become a popular choice for mutual information (MI) estimation, despite its indirect connection to M...
- A General and Streamlined Differentiable Optimization Framework : Abstract: Differentiating through constrained optimization problems is increasingly central to learning, control, and large-scale decision-making systems, yet practical integration remains challenging...
- Efficient Online Learning with Predictive Coding Networks: Exploiting Temporal Correlations : Abstract: Robotic systems operating at the edge require efficient online learning algorithms that can continuously adapt to changing environments while processing streaming sensory data. Traditional b...
- Infrequent Exploration in Linear Bandits : Abstract: We study the problem of infrequent exploration in linear bandits, addressing a significant yet overlooked gap between fully adaptive exploratory methods (e.g., UCB and Thompson Sampling), wh...
- Exploring Human-AI Conceptual Alignment through the Prism of Chess : Abstract: Do AI systems truly understand human concepts or merely mimic surface patterns? We investigate this through chess, where human creativity meets precise strategic concepts. Analyzing a 270M-p...
- Towards Scaling Laws for Symbolic Regression : Abstract: Symbolic regression (SR) aims to discover the underlying mathematical expressions that explain observed data. This holds promise for both gaining scientific insight and for producing inheren...
- New Money: A Systematic Review of Synthetic Data Generation for Finance : Abstract: Synthetic data generation has emerged as a promising approach to address the challenges of using sensitive financial data in machine learning applications. By leveraging generative models, s...
- LLMBisect: Breaking Barriers in Bug Bisection with A Comparative Analysis Pipeline : Abstract: Bug bisection has been an important security task that aims to understand the range of software versions impacted by a bug, i.e., identifying the commit that introduced the bug. However, tra...
- Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error : Abstract: Reinforcement learning with verifiable rewards (RLVR) has significantly boosted the reasoning capability of large language models (LLMs) recently. However, existing RLVR approaches merely tr...
- maxVSTAR: Maximally Adaptive Vision-Guided CSI Sensing with Closed-Loop Edge Model Adaptation for Robust Human Activity Recognition : Abstract: WiFi Channel State Information (CSI)-based human activity recognition (HAR) provides a privacy-preserving, device-free sensing solution for smart environments. However, its deployment on edg...
- STAR: A Privacy-Preserving, Energy-Efficient Edge AI Framework for Human Activity Recognition via Wi-Fi CSI in Mobile and Pervasive Computing Environments : Abstract: Human Activity Recognition (HAR) via Wi-Fi Channel State Information (CSI) presents a privacy-preserving, contactless sensing approach suitable for smart homes, healthcare monitoring, and mo...
- A Game-Theoretic Spatio-Temporal Reinforcement Learning Framework for Collaborative Public Resource Allocation : Abstract: Public resource allocation involves the efficient distribution of resources, including urban infrastructure, energy, and transportation, to effectively meet societal demands. However, existi...
- Likely Interpolants of Generative Models : Abstract: Interpolation in generative models allows for controlled generation, model inspection, and more. Unfortunately, most generative models lack a principled notion of interpolants without restric...
- Empirical Bayesian Multi-Bandit Learning : Abstract: Multi-task learning in contextual bandits has attracted significant research interest due to its potential to enhance decision-making across multiple related tasks by leveraging shared struc...
- Offline Clustering of Preference Learning with Active-data Augmentation : Abstract: Preference learning from pairwise feedback is a widely adopted framework in applications such as reinforcement learning with human feedback and recommendations. In many practical settings, h...
- Model Inversion with Layer-Specific Modeling and Alignment for Data-Free Continual Learning : Abstract: Continual learning (CL) aims to incrementally train a model on a sequence of tasks while retaining performance on prior ones. However, storing and replaying data is often infeasible due to p...
- On the Impact of Weight Discretization in QUBO-Based SVM Training : Abstract: Training Support Vector Machines (SVMs) can be formulated as a QUBO problem, enabling the use of quantum annealing for model optimization. In this work, we study how the number of qubits - l...
- Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections : Abstract: Enabling continual learning in LLMs remains a key unresolved research challenge. In a recent announcement, a frontier LLM company made a step towards this by introducing Agent Skills, a fram...
- UnifiedFL: A Dynamic Unified Learning Framework for Equitable Federation : Abstract: Federated learning (FL) has emerged as a key paradigm for collaborative model training across multiple clients without sharing raw data, enabling privacy-preserving applications in areas suc...
- Towards Explainable and Reliable AI in Finance : Abstract: Financial forecasting increasingly uses large neural network models, but their opacity raises challenges for trust and regulatory compliance. We present several approaches to explainable and...
- CorVS: Person Identification via Video Trajectory-Sensor Correspondence in a Real-World Warehouse : Abstract: Worker location data is key to higher productivity in industrial sites. Cameras are a promising tool for localization in logistics warehouses since they also offer valuable environmental con...
- Efficient Generative AI Boosts Probabilistic Forecasting of Sudden Stratospheric Warmings : Abstract: Sudden Stratospheric Warmings (SSWs) are key sources of subseasonal predictability and major drivers of extreme winter weather. Yet, their accurate and efficient forecast remains a persisten...
- Adaptive Context Length Optimization with Low-Frequency Truncation for Multi-Agent Reinforcement Learning : Abstract: Recently, deep multi-agent reinforcement learning (MARL) has demonstrated promising performance for solving challenging tasks, such as long-term dependencies and non-Markovian environments. ...
- Multi-Task Learning Based on Support Vector Machines and Twin Support Vector Machines: A Comprehensive Survey : Abstract: Multi-task learning (MTL) enables simultaneous training across related tasks, leveraging shared information to improve generalization, efficiency, and robustness, especially in data-scarce o...
- Co-Evolving Latent Action World Models : Abstract: Adapting pre-trained video generation models into controllable world models via latent actions is a promising step towards creating generalist world models. The dominant paradigm adopts a tw...
- ReSpec: Towards Optimizing Speculative Decoding in Reinforcement Learning Systems : Abstract: Adapting large language models (LLMs) via reinforcement learning (RL) is often bottlenecked by the generation stage, which can consume over 75% of the training time. Speculative decoding (S...
- Quantum Gated Recurrent GAN with Gaussian Uncertainty for Network Anomaly Detection : Abstract: Anomaly detection in time-series data is a critical challenge with significant implications for network security. Recent quantum machine learning approaches, such as quantum kernel methods a...
- Data-Efficient RLVR via Off-Policy Influence Guidance : Abstract: Data selection is a critical aspect of Reinforcement Learning with Verifiable Rewards (RLVR) for enhancing the reasoning capabilities of large language models (LLMs). Current data selection ...
- Enhancing ECG Classification Robustness with Lightweight Unsupervised Anomaly Detection Filters : Abstract: Continuous electrocardiogram (ECG) monitoring via wearables offers significant potential for early cardiovascular disease (CVD) detection. However, deploying deep learning models for automat...
- LLMs as In-Context Meta-Learners for Model and Hyperparameter Selection : Abstract: Model and hyperparameter selection are critical but challenging in machine learning, typically requiring expert intuition or expensive automated search. We investigate whether large language...
- Think Outside the Policy: In-Context Steered Policy Optimization : Abstract: Existing Reinforcement Learning from Verifiable Rewards (RLVR) methods, such as Group Relative Policy Optimization (GRPO), have achieved remarkable progress in improving the reasoning capabi...
- Polybasic Speculative Decoding Through a Theoretical Perspective : Abstract: Inference latency stands as a critical bottleneck in the large-scale deployment of Large Language Models (LLMs). Speculative decoding methods have recently shown promise in accelerating infe...
- Higher-Order Regularization Learning on Hypergraphs : Abstract: Higher-Order Hypergraph Learning (HOHL) was recently introduced as a principled alternative to classical hypergraph regularization, enforcing higher-order smoothness via powers of multiscale...
- A Three-Stage Bayesian Transfer Learning Framework to Improve Predictions in Data-Scarce Domains : Abstract: The use of ML in engineering has grown steadily to support a wide array of applications. Among these methods, deep neural networks have been widely adopted due to their performance and acces...
- Boosted Trees on a Diet: Compact Models for Resource-Constrained Devices : Abstract: Deploying machine learning models on compute-constrained devices has become a key building block of modern IoT applications. In this work, we present a compression scheme for boosted decisio...
- On Measuring Localization of Shortcuts in Deep Networks : Abstract: Shortcuts, spurious rules that perform well during training but fail to generalize, present a major challenge to the reliability of deep networks (Geirhos et al., 2020). However, the impact ...
- Wasserstein Regression as a Variational Approximation of Probabilistic Trajectories through the Bernstein Basis : Abstract: This paper considers the problem of regression over distributions, which is becoming increasingly important in machine learning. Existing approaches often ignore the geometry of the probabil...
- Omnipresent Yet Overlooked: Heat Kernels in Combinatorial Bayesian Optimization : Abstract: Bayesian Optimization (BO) has the potential to solve various combinatorial tasks, ranging from materials science to neural architecture search. However, BO requires specialized kernels to e...
- MSAD: A Deep Dive into Model Selection for Time series Anomaly Detection : Abstract: Anomaly detection is a fundamental task for time series analytics with important implications for the downstream performance of many applications. Despite increasing academic interest and th...
- Curly Flow Matching for Learning Non-gradient Field Dynamics : Abstract: Modeling the transport dynamics of natural processes from population-level observations is a ubiquitous problem in the natural sciences. Such models rely on key assumptions about the underly...
- Tight Differentially Private PCA via Matrix Coherence : Abstract: We revisit the task of computing the span of the top $r$ singular vectors $u_1, \ldots, u_r$ of a matrix under differential privacy. We show that a simple and efficient algorithm -- based on...
- LoRAQuant: Mixed-Precision Quantization of LoRA to Ultra-Low Bits : Abstract: Low-Rank Adaptation (LoRA) has become a popular technique for parameter-efficient fine-tuning of large language models (LLMs). In many real-world scenarios, multiple adapters are loaded simu...
- How Regularization Terms Make Invertible Neural Networks Bayesian Point Estimators : Abstract: Can regularization terms in the training of invertible neural networks lead to known Bayesian point estimators in reconstruction? Invertible networks are attractive for inverse problems due ...
- Budgeted Multiple-Expert Deferral : Abstract: Learning to defer uncertain predictions to costly experts offers a powerful strategy for improving the accuracy and efficiency of machine learning systems. However, standard training procedu...
- An All-Reduce Compatible Top-K Compressor for Communication-Efficient Distributed Learning : Abstract: Communication remains a central bottleneck in large-scale distributed machine learning, and gradient sparsification has emerged as a promising strategy to alleviate this challenge. However, ...
- LSM-MS2: A Foundation Model Bridging Spectral Identification and Biological Interpretation : Abstract: A vast majority of mass spectrometry data remains uncharacterized, leaving much of its biological and chemical information untapped. Recent advances in machine learning have begun to address...
- On Purely Private Covariance Estimation : Abstract: We present a simple perturbation mechanism for the release of $d$-dimensional covariance matrices $\Sigma$ under pure differential privacy. For large datasets with at least $n\geq d^2/\varep...
- Pre-trained Forecasting Models: Strong Zero-Shot Feature Extractors for Time Series Classification : Abstract: Recent research on time series foundation models has primarily focused on forecasting, leaving it unclear how generalizable their learned representations are. In this study, we examine wheth...
- Learning Pseudorandom Numbers with Transformers: Permuted Congruential Generators, Curricula, and Interpretability : Abstract: We study the ability of Transformer models to learn sequences generated by Permuted Congruential Generators (PCGs), a widely used family of pseudo-random number generators (PRNGs). PCGs intr...
- RNAGenScape: Property-guided Optimization and Interpolation of mRNA Sequences with Manifold Langevin Dynamics : Abstract: mRNA design and optimization are important in synthetic biology and therapeutic development, but remain understudied in machine learning. Systematic optimization of mRNAs is hindered by the ...
- Pulsar Detection with Deep Learning : Abstract: Pulsar surveys generate millions of candidates per run, overwhelming manual inspection. This thesis builds a deep learning pipeline for radio pulsar candidate selection that fuses array-deri...
- StreetMath: Study of LLMs' Approximation Behaviors : Abstract: There is a substantial body of literature examining the mathematical reasoning capabilities of large language models (LLMs), particularly their performance on precise arithmetic operations i...
- Review Based Entity Ranking using Fuzzy Logic Algorithmic Approach: Analysis : Abstract: Opinion mining, also called sentiment analysis, is the field of study that analyzes people's opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as pro...
- Attention Augmented GNN RNN-Attention Models for Advanced Cybersecurity Intrusion Detection : Abstract: In this paper, we propose a novel hybrid deep learning architecture that synergistically combines Graph Neural Networks (GNNs), Recurrent Neural Networks (RNNs), and multi-head attention mec...
- Discovering Interpretable Biological Concepts in Single-cell RNA-seq Foundation Models : Abstract: Single-cell RNA-seq foundation models achieve strong performance on downstream tasks but remain black boxes, limiting their utility for biological discovery. Recent work has shown that spars...
- Flex-GAD: Flexible Graph Anomaly Detection : Abstract: Detecting anomalous nodes in attributed networks, where each node is associated with both structural connections and descriptive attributes, is essential for identifying fraud, misinformatio...
- Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms : Abstract: We consider a stochastic multi-armed bandit problem with i.i.d. rewards where the expected reward function is multimodal with at most m modes. We propose the first known computationally trac...
- Optimizing Mirror-Image Peptide Sequence Design for Data Storage via Peptide Bond Cleavage Prediction : Abstract: Traditional non-biological storage media, such as hard drives, face limitations in both storage density and lifespan due to the rapid growth of data in the big data era. Mirror-image peptide...
- Beyond Long Context: When Semantics Matter More than Tokens : Abstract: Electronic Health Records (EHR) store clinical documentation as base64 encoded attachments in FHIR DocumentReference resources, which makes semantic question answering difficult. Traditional...
- Debate2Create: Robot Co-design via Large Language Model Debates : Abstract: Automating the co-design of a robot's morphology and control is a long-standing challenge due to the vast design space and the tight coupling between body and behavior. We introduce Debate2C...
- MIRO: MultI-Reward cOnditioned pretraining improves T2I quality and efficiency : Abstract: Current text-to-image generative models are trained on large uncurated datasets to enable diverse generation capabilities. However, this does not align well with user preferences. Recently, ...
- InputDSA: Demixing then Comparing Recurrent and Externally Driven Dynamics : Abstract: In control problems and basic scientific modeling, it is important to compare observations with dynamical simulations. For example, comparing two neural systems can shed light on the nature ...
- Risks and Opportunities in Human-Machine Teaming in Operationalizing Machine Learning Target Variables : Abstract: Predictive modeling has the potential to enhance human decision-making. However, many predictive models fail in practice due to problematic problem formulation in cases where the prediction ...
- AttnCache: Accelerating Self-Attention Inference for LLM Prefill via Attention Cache : Abstract: Large Language Models (LLMs) are widely used in generative applications such as chatting, code generation, and reasoning. However, many real-world workloads such as classification, question a...
- Enabling Fast and Accurate Neutral Atom Readout through Image Denoising : Abstract: Neutral atom quantum computers hold promise for scaling up to hundreds of thousands of qubits, but their progress is constrained by slow qubit readout. Measuring qubits currently takes milli...
- Detecting Anomalies in Machine Learning Infrastructure via Hardware Telemetry : Abstract: Modern machine learning (ML) has grown into a tightly coupled, full-stack ecosystem that combines hardware, software, network, and applications. Many users rely on cloud providers for elasti...
- Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation : Abstract: Reliable uncertainty quantification is crucial for reinforcement learning (RL) in high-stakes settings. We propose a unified conformal prediction framework for infinite-horizon policy evalua...
- Accelerating Real-World Overtaking in F1TENTH Racing Employing Reinforcement Learning Methods : Abstract: While autonomous racing performance in Time-Trial scenarios has seen significant progress and development, autonomous wheel-to-wheel racing and overtaking are still severely limited. These l...
- $L_1$-norm Regularized Indefinite Kernel Logistic Regression : Abstract: Kernel logistic regression (KLR) is a powerful classification method widely applied across diverse domains. In many real-world scenarios, indefinite kernels capture more domain-specific stru...
- Bias-Corrected Data Synthesis for Imbalanced Learning : Abstract: Imbalanced data, where the positive samples represent only a small proportion compared to the negative samples, makes it challenging for classification problems to balance the false positive...
- ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models : Abstract: Recent advances in Audio-Language Models (ALMs) have significantly improved multimodal understanding capabilities. However, the introduction of the audio modality also brings new and unique ...
- Robust Super-Capacity SRS Channel Inpainting via Diffusion Models : Abstract: Accurate channel state information (CSI) is essential for reliable multiuser MIMO operation. In 5G NR, reciprocity-based beamforming via uplink Sounding Reference Signals (SRS) faces resource...
- Uncertainty-Aware Diagnostics for Physics-Informed Machine Learning : Abstract: Physics-informed machine learning (PIML) integrates prior physical information, often in the form of differential equation constraints, into the process of fitting machine learning models to...
- PVMark: Enabling Public Verifiability for LLM Watermarking Schemes : Abstract: Watermarking schemes for large language models (LLMs) have been proposed to identify the source of the generated text, mitigating the potential threats that emerge from model theft. However, cur...
- A Survey of Heterogeneous Graph Neural Networks for Cybersecurity Anomaly Detection : Abstract: Anomaly detection is a critical task in cybersecurity, where identifying insider threats, access violations, and coordinated attacks is essential for ensuring system resilience. Graph-based ...
- SABER: Symbolic Regression-based Angle of Arrival and Beam Pattern Estimator : Abstract: Accurate Angle-of-arrival (AoA) estimation is essential for next-generation wireless communication systems to enable reliable beamforming, high-precision localization, and integrated sensing...
- Multi-Output Robust and Conjugate Gaussian Processes : Abstract: Multi-output Gaussian process (MOGP) regression allows modelling dependencies among multiple correlated response variables. Similarly to standard Gaussian processes, MOGPs are sensitive to m...
- Vectorized Context-Aware Embeddings for GAT-Based Collaborative Filtering : Abstract: Recommender systems often struggle with data sparsity and cold-start scenarios, limiting their ability to provide accurate suggestions for new or infrequent users. This paper presents a Grap...
- Representation-Level Counterfactual Calibration for Debiased Zero-Shot Recognition : Abstract: Object-context shortcuts remain a persistent challenge in vision-language models, undermining zero-shot reliability when test-time scenes differ from familiar training co-occurrences. We rec...
- Inference-Cost-Aware Dynamic Tree Construction for Efficient Inference in Large Language Models : Abstract: Large Language Models (LLMs) face significant inference latency challenges stemming from their autoregressive design and large size. To address this, speculative decoding emerges as a soluti...
- Physics-Informed Mixture Models and Surrogate Models for Precision Additive Manufacturing : Abstract: In this study, we leverage a mixture model learning approach to identify defects in laser-based Additive Manufacturing (AM) processes. By incorporating physics-based principles, we also ensu...
- Hybrid Physical-Neural Simulator for Fast Cosmological Hydrodynamics : Abstract: Cosmological field-level inference requires differentiable forward models that solve the challenging dynamics of gas and dark matter under hydrodynamics and gravity. We propose a hybrid appr...
- CYPRESS: Crop Yield Prediction via Regression on Prithvi's Encoder for Satellite Sensing : Abstract: Accurate and timely crop yield prediction is crucial for global food security and modern agricultural management. Traditional methods often lack the scalability and granularity required for ...
- Heuristic Adaptation of Potentially Misspecified Domain Support for Likelihood-Free Inference in Stochastic Dynamical Systems : Abstract: In robotics, likelihood-free inference (LFI) can provide the domain distribution that adapts a learnt agent in a parametric set of deployment conditions. LFI assumes an arbitrary support for...
- Action-Driven Processes for Continuous-Time Control : Abstract: At the heart of reinforcement learning are actions -- decisions made in response to observations of the environment. Actions are equally fundamental in the modeling of stochastic processes, ...
- FlowQ-Net: A Generative Framework for Automated Quantum Circuit Design : Abstract: Designing efficient quantum circuits is a central bottleneck to exploring the potential of quantum computing, particularly for noisy intermediate-scale quantum (NISQ) devices, where circuit ...
- Kimi Linear: An Expressive, Efficient Attention Architecture : Abstract: We introduce Kimi Linear, a hybrid linear attention architecture that, for the first time, outperforms full attention under fair comparisons across various scenarios -- including short-conte...
- Assessment of the conditional exchangeability assumption in causal machine learning models: a simulation study : Abstract: Observational studies developing causal machine learning (ML) models for the prediction of individualized treatment effects (ITEs) seldom conduct empirical evaluations to assess the conditio...
- Value Drifts: Tracing Value Alignment During LLM Post-Training : Abstract: As LLMs occupy an increasingly important role in society, they are more and more confronted with questions that require them not only to draw on their general knowledge but also to align wit...
- Reflection on Data Storytelling Tools in the Generative AI Era from the Human-AI Collaboration Perspective : Abstract: Human-AI collaborative tools attract attention from the data storytelling community to lower the expertise barrier and streamline the workflow. The recent advance in large-scale generative ...
- Language Models can Self-Improve at State-Value Estimation for Better Search : Abstract: Collecting ground-truth rewards or human demonstrations for multi-step reasoning tasks is often prohibitively expensive, particularly in interactive domains such as web tasks. We introduce S...
- MindGYM: What Matters in Question Synthesis for Thinking-Centric Fine-Tuning? : Abstract: Large foundation models face challenges in acquiring transferable, structured thinking abilities, especially when supervised with rigid templates or crowd-annotated instruction datasets. Unl...
- Guided Model Merging for Hybrid Data Learning: Leveraging Centralized Data to Refine Decentralized Models : Abstract: Current network training paradigms primarily focus on either centralized or decentralized data regimes. However, in practice, data availability often exhibits a hybrid nature, where both reg...
- Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models : Abstract: As language models improve and become capable of performing more complex tasks across modalities, evaluating them automatically becomes increasingly challenging. Developing strong and robust...
- M-Prometheus: A Suite of Open Multilingual LLM Judges : Abstract: The use of language models for automatically evaluating long-form text (LLM-as-a-judge) is becoming increasingly common, yet most LLM judges are optimized exclusively for English, with strat...
- Empowering Agentic Video Analytics Systems with Video Language Models : Abstract: AI-driven video analytics has become increasingly important across diverse domains. However, existing systems are often constrained to specific, predefined tasks, limiting their adaptability...
- Toward a Public and Secure Generative AI: A Comparative Analysis of Open and Closed LLMs : Abstract: Generative artificial intelligence (Gen AI) systems represent a critical technology with far-reaching implications across multiple domains of society. However, their deployment entails a ran...
- Multi-Agent Reinforcement Learning for Market Making: Competition without Collusion : Abstract: Algorithmic collusion has emerged as a central question in AI: Will the interaction between different AI agents deployed in markets lead to collusion? More generally, understanding how emerg...
- A Process Mining-Based System For The Analysis and Prediction of Software Development Workflows : Abstract: CodeSight is an end-to-end system designed to anticipate deadline compliance in software development workflows. It captures development and deployment data directly from GitHub, transforming...
- Revisiting Multilingual Data Mixtures in Language Model Pretraining : Abstract: The impact of different multilingual data mixtures in pretraining large language models (LLMs) has been a topic of ongoing debate, often raising concerns about potential trade-offs between l...
- Application and Validation of Geospatial Foundation Model Data for the Prediction of Health Facility Programmatic Outputs -- A Case Study in Malawi : Abstract: The reliability of routine health data in low- and middle-income countries (LMICs) is often constrained by reporting delays and incomplete coverage, necessitating the exploration of novel dat...
- WaveVerif: Acoustic Side-Channel based Verification of Robotic Workflows : Abstract: In this paper, we present a framework that uses acoustic side-channel analysis (ASCA) to monitor and verify whether a robot correctly executes its intended commands. We develop and evaluate ...
- Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer : Abstract: Reconstructing images seen by people from their fMRI brain recordings provides a non-invasive window into the human brain. Despite recent progress enabled by diffusion models, current method...
- Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning : Abstract: Large Language Models (LLMs) often struggle with problems that require multi-step reasoning. For small-scale open-source models, Reinforcement Learning with Verifiable Rewards (RLVR) fails w...
- DARTS: A Drone-Based AI-Powered Real-Time Traffic Incident Detection System : Abstract: Rapid and reliable incident detection is critical for reducing crash-related fatalities, injuries, and congestion. However, conventional methods, such as closed-circuit television, dashcam f...
- The Quest for Reliable Metrics of Responsible AI : Abstract: The development of Artificial Intelligence (AI), including AI in Science (AIS), should be done following the principles of responsible AI. Progress in responsible AI is often quantified thro...
- Dual Mixture-of-Experts Framework for Discrete-Time Survival Analysis : Abstract: Survival analysis is a task to model the time until an event of interest occurs, widely used in clinical and biomedical research. A key challenge is to model patient heterogeneity while also...
- Climate Adaptation-Aware Flood Prediction for Coastal Cities Using Deep Learning : Abstract: Climate change and sea-level rise (SLR) pose escalating threats to coastal cities, intensifying the need for efficient and accurate methods to predict potential flood hazards. Traditional ph...
- RADRON: Cooperative Localization of Ionizing Radiation Sources by MAVs with Compton Cameras : Abstract: We present a novel approach to localizing radioactive material by cooperating Micro Aerial Vehicles (MAVs). Our approach utilizes a state-of-the-art single-detector Compton camera as a highl...
- PORTool: Tool-Use LLM Training with Rewarded Tree : Abstract: Current tool-use large language models (LLMs) are trained on static datasets, enabling them to interact with external tools and perform multi-step, tool-integrated reasoning, which produces ...
- Rethinking Cross-lingual Alignment: Balancing Transfer and Cultural Erasure in Multilingual LLMs : Abstract: Cross-lingual alignment (CLA) aims to align multilingual representations, enabling Large Language Models (LLMs) to seamlessly transfer knowledge across languages. While intuitive, we hypothe...
- Artificial Intelligence-Enabled Analysis of Radiology Reports: Epidemiology and Consequences of Incidental Thyroid Findings : Abstract: Importance: Incidental thyroid findings (ITFs) are increasingly detected on imaging performed for non-thyroid indications. Their prevalence, features, and clinical consequences remain undefin...
- SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning : Abstract: The ability of LLM agents to plan and invoke tools exposes them to new safety risks, making a comprehensive red-teaming system crucial for discovering vulnerabilities and ensuring their safe...
- Do Students Debias Like Teachers? On the Distillability of Bias Mitigation Methods : Abstract: Knowledge distillation (KD) is an effective method for model compression and transferring knowledge between models. However, its effect on a model's robustness against spurious correlations th...
- Dynamic VLM-Guided Negative Prompting for Diffusion Models : Abstract: We propose a novel approach for dynamic negative prompting in diffusion models that leverages Vision-Language Models (VLMs) to adaptively generate negative prompts during the denoising proce...
- Data-driven Projection Generation for Efficiently Solving Heterogeneous Quadratic Programming Problems : Abstract: We propose a data-driven framework for efficiently solving quadratic programming (QP) problems by reducing the number of variables in high-dimensional QPs using instance-specific projection....
- Learning Geometry: A Framework for Building Adaptive Manifold Models through Metric Optimization : Abstract: This paper proposes a novel paradigm for machine learning that moves beyond traditional parameter optimization. Unlike conventional approaches that search for optimal parameters within a fix...
- Nirvana: A Specialized Generalist Model With Task-Aware Memory Mechanism : Abstract: Specialized Generalist Models (SGMs) aim to preserve broad capabilities while achieving expert-level performance in target domains. However, traditional LLM structures including Transformer,...
- Network-Constrained Policy Optimization for Adaptive Multi-agent Vehicle Routing : Abstract: Traffic congestion in urban road networks leads to longer trip times and higher emissions, especially during peak periods. While the Shortest Path First (SPF) algorithm is optimal for a sing...
- SAFE: A Novel Approach to AI Weather Evaluation through Stratified Assessments of Forecasts over Earth : Abstract: The dominant paradigm in machine learning is to assess model performance based on average loss across all samples in some test set. This amounts to averaging performance geospatially across ...
- Security Risk of Misalignment between Text and Image in Multi-modal Model : Abstract: Despite the notable advancements and versatility of multi-modal diffusion models, such as text-to-image models, their susceptibility to adversarial inputs remains underexplored. Contrary to ...
- EgoExo-Con: Exploring View-Invariant Video Temporal Understanding : Abstract: Can Video-LLMs achieve consistent temporal understanding when videos capture the same event from different viewpoints? To study this, we introduce EgoExo-Con (Consistency), a benchmark of co...
- WOD-E2E: Waymo Open Dataset for End-to-End Driving in Challenging Long-tail Scenarios : Abstract: Vision-based end-to-end (E2E) driving has garnered significant interest in the research community due to its scalability and synergy with multimodal large language models (MLLMs). However, c...
- Beyond Synthetic Benchmarks: Evaluating LLM Performance on Real-World Class-Level Code Generation : Abstract: Large language models (LLMs) have advanced code generation at the function level, yet their ability to produce correct class-level implementations in authentic software projects remains poor...
- MV-MLM: Bridging Multi-View Mammography and Language for Breast Cancer Diagnosis and Risk Prediction : Abstract: Large annotated datasets are essential for training robust Computer-Aided Diagnosis (CAD) models for breast cancer detection or risk prediction. However, acquiring such datasets with fine-de...
- Bridging the Gap Between Molecule and Textual Descriptions via Substructure-aware Alignment : Abstract: Molecule and text representation learning has gained increasing interest due to its potential for enhancing the understanding of chemical information. However, existing models often struggle...
- Segmentation over Complexity: Evaluating Ensemble and Hybrid Approaches for Anomaly Detection in Industrial Time Series : Abstract: In this study, we investigate the effectiveness of advanced feature engineering and hybrid model architectures for anomaly detection in a multivariate industrial time series, focusing on a s...
- Learning to Manage Investment Portfolios beyond Simple Utility Functions : Abstract: While investment funds publicly disclose their objectives in broad terms, their managers optimize for complex combinations of competing goals that go beyond simple risk-return trade-offs. Tr...
- Linking Heterogeneous Data with Coordinated Agent Flows for Social Media Analysis : Abstract: Social media platforms generate massive volumes of heterogeneous data, capturing user behaviors, textual content, temporal dynamics, and network structures. Analyzing such data is crucial fo...
- Accumulative SGD Influence Estimation for Data Attribution : Abstract: Modern data-centric AI needs precise per-sample influence. Standard SGD-IE approximates leave-one-out effects by summing per-epoch surrogates and ignores cross-epoch compounding, which misra...
- ConceptScope: Characterizing Dataset Bias via Disentangled Visual Concepts : Abstract: Dataset bias, where data points are skewed to certain concepts, is ubiquitous in machine learning datasets. Yet, systematically identifying these biases is challenging without costly, fine-g...
- Predicting All-Cause Hospital Readmissions from Medical Claims Data of Hospitalised Patients : Abstract: Reducing preventable hospital readmissions is a national priority for payers, providers, and policymakers seeking to improve health care and lower costs. The rate of readmission is being use...
- Don't Let It Fade: Preserving Edits in Diffusion Language Models via Token Timestep Allocation : Abstract: While diffusion language models (DLMs) enable fine-grained refinement, their practical controllability remains fragile. We identify and formally characterize a central failure mode called up...
- What's In My Human Feedback? Learning Interpretable Descriptions of Preference Data : Abstract: Human feedback can alter language models in unpredictable and undesirable ways, as practitioners lack a clear understanding of what feedback data encodes. While prior work studies preference...
- Towards Global Retrieval Augmented Generation: A Benchmark for Corpus-Level Reasoning : Abstract: Retrieval-augmented generation (RAG) has emerged as a leading approach to reducing hallucinations in large language models (LLMs). Current RAG evaluation benchmarks primarily focus on what w...
- Hybrid LLM and Higher-Order Quantum Approximate Optimization for CSA Collateral Management : Abstract: We address finance-native collateral optimization under ISDA Credit Support Annexes (CSAs), where integer lots, Schedule A haircuts, RA/MTA gating, and issuer/currency/class caps create rugg...
- Test-Time Alignment of LLMs via Sampling-Based Optimal Control in pre-logit space : Abstract: Test-time alignment of large language models (LLMs) attracts attention because fine-tuning LLMs requires high computational costs. In this paper, we propose a new test-time alignment method ...
- MPRU: Modular Projection-Redistribution Unlearning as Output Filter for Classification Pipelines : Abstract: As a new and promising approach, existing machine unlearning (MU) works typically emphasize theoretical formulations or optimization objectives to achieve knowledge removal. However, when de...
- Angular Steering: Behavior Control via Rotation in Activation Space : Abstract: Controlling specific behaviors in large language models while preserving their general capabilities is a central challenge for safe and reliable artificial intelligence deployment. Current s...
- A Research Roadmap for Augmenting Software Engineering Processes and Software Products with Generative AI : Abstract: Generative AI (GenAI) is rapidly transforming software engineering (SE) practices, influencing how SE processes are executed, as well as how software systems are developed, operated, and evo...
- Distributional Multi-objective Black-box Optimization for Diffusion-model Inference-time Multi-Target Generation : Abstract: Diffusion models have been successful in learning complex data distributions. This capability has driven their application to high-dimensional multi-objective black-box optimization problem....
- Unravelling the Mechanisms of Manipulating Numbers in Language Models : Abstract: Recent work has shown that different large language models (LLMs) converge to similar and accurate input embedding representations for numbers. These findings conflict with the documented pr...
- Can Agent Conquer Web? Exploring the Frontiers of ChatGPT Atlas Agent in Web Games : Abstract: OpenAI's ChatGPT Atlas introduces new capabilities for web interaction, enabling the model to analyze webpages, process user intents, and execute cursor and keyboard inputs directly within t...
- Understanding Hardness of Vision-Language Compositionality from A Token-level Causal Lens : Abstract: Contrastive Language-Image Pre-training (CLIP) delivers strong cross modal generalization by aligning images and texts in a shared embedding space, yet it persistently fails at compositional...
- Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime : Abstract: Adam [Kingma and Ba, 2015] is the de facto optimizer in deep learning, yet its theoretical understanding remains limited. Prior analyses show that Adam favors solutions aligned with $\ell_\i...
- Posterior Sampling by Combining Diffusion Models with Annealed Langevin Dynamics : Abstract: Given a noisy linear measurement $y = Ax + \xi$ of a distribution $p(x)$, and a good approximation to the prior $p(x)$, when can we sample from the posterior $p(x \mid y)$? Posterior samplin...
- From Amateur to Master: Infusing Knowledge into LLMs via Automated Curriculum Learning : Abstract: Large Language Models (LLMs) excel at general tasks but underperform in specialized domains like economics and psychology, which require deep, principled understanding. To address this, we i...
- GLYPH-SR: Can We Achieve Both High-Quality Image Super-Resolution and High-Fidelity Text Recovery via VLM-guided Latent Diffusion Model? : Abstract: Image super-resolution (SR) is fundamental to many vision systems, from surveillance and autonomy to document analysis and retail analytics, because recovering high-frequency details, especially...
- Linear Causal Discovery with Interventional Constraints : Abstract: Incorporating causal knowledge and mechanisms is essential for refining causal models and improving downstream tasks such as designing new treatments. In this paper, we introduce a novel con...
- MisSynth: Improving MISSCI Logical Fallacies Classification with Synthetic Data : Abstract: Health-related misinformation is very prevalent and potentially harmful. It is difficult to identify, especially when claims distort or misinterpret scientific findings. We investigate the i...
- Reinforcement Learning for Pollution Detection in a Randomized, Sparse and Nonstationary Environment with an Autonomous Underwater Vehicle : Abstract: Reinforcement learning (RL) algorithms are designed to optimize problem-solving by learning actions that maximize rewards, a task that becomes particularly challenging in random and nonstati...
- The Geometry of Dialogue: Graphing Language Models to Reveal Synergistic Teams for Multi-Agent Collaboration : Abstract: While a multi-agent approach based on large language models (LLMs) represents a promising strategy to surpass the capabilities of single models, its success is critically dependent on synerg...
- SPG-CDENet: Spatial Prior-Guided Cross Dual Encoder Network for Multi-Organ Segmentation : Abstract: Multi-organ segmentation is a critical task in computer-aided diagnosis. While recent deep learning methods have achieved remarkable success in image segmentation, huge variations in organ s...
- Human-in-the-loop Online Rejection Sampling for Robotic Manipulation : Abstract: Reinforcement learning (RL) is widely used to produce robust robotic manipulation policies, but fine-tuning vision-language-action (VLA) models with RL can be unstable due to inaccurate valu...
- LoCoT2V-Bench: A Benchmark for Long-Form and Complex Text-to-Video Generation : Abstract: Recently, text-to-video generation has made impressive progress in producing short, high-quality clips, but evaluating long-form outputs remains a major challenge, especially when processing c...
- SSCL-BW: Sample-Specific Clean-Label Backdoor Watermarking for Dataset Ownership Verification : Abstract: The rapid advancement of deep neural networks (DNNs) heavily relies on large-scale, high-quality datasets. However, unauthorized commercial use of these datasets severely violates the intell...
- Personalized Treatment Outcome Prediction from Scarce Data via Dual-Channel Knowledge Distillation and Adaptive Fusion : Abstract: Personalized treatment outcome prediction based on trial data for small-sample and rare patient groups is critical in precision medicine. However, the costly trial data limit the prediction ...
- Robust Graph Condensation via Classification Complexity Mitigation : Abstract: Graph condensation (GC) has gained significant attention for its ability to synthesize smaller yet informative graphs. However, existing studies often overlook the robustness of GC in scenar...
- SecureReviewer: Enhancing Large Language Models for Secure Code Review through Secure-aware Fine-tuning : Abstract: Identifying and addressing security issues during the early phase of the development lifecycle is critical for mitigating the long-term negative impacts on software systems. Code review serv...
- Counteracting Matthew Effect in Self-Improvement of LVLMs through Head-Tail Re-balancing : Abstract: Self-improvement has emerged as a mainstream paradigm for advancing the reasoning capabilities of large vision-language models (LVLMs), where models explore and learn from successful traject...
- Bayesian Network Fusion of Large Language Models for Sentiment Analysis : Abstract: Large language models (LLMs) continue to advance, with an increasing number of domain-specific variants tailored for specialised tasks. However, these models often lack transparency and expl...
- Simulating and Experimenting with Social Media Mobilization Using LLM Agents : Abstract: Online social networks have transformed the ways in which political mobilization messages are disseminated, raising new questions about how peer influence operates at scale. Building on the ...
- Inside CORE-KG: Evaluating Structured Prompting and Coreference Resolution for Knowledge Graphs : Abstract: Human smuggling networks are increasingly adaptive and difficult to analyze. Legal case documents offer critical insights but are often unstructured, lexically dense, and filled with ambiguo...
- The Structure of Relation Decoding Linear Operators in Large Language Models : Abstract: This paper investigates the structure of linear operators introduced in Hernandez et al. [2023] that decode specific relational facts in transformer language models. We extend their single-r...
- Adaptive Inverse Kinematics Framework for Learning Variable-Length Tool Manipulation in Robotics : Abstract: Conventional robots possess a limited understanding of their kinematics and are confined to preprogrammed tasks, hindering their ability to leverage tools efficiently. Driven by the essentia...
- Multiclass Local Calibration With the Jensen-Shannon Distance : Abstract: Developing trustworthy Machine Learning (ML) models requires their predicted probabilities to be well-calibrated, meaning they should reflect true-class frequencies. Among calibration notion...
- InfoFlow: Reinforcing Search Agent Via Reward Density Optimization : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is a promising approach for enhancing agentic deep search. However, its application is often hindered by low Reward Density in ...
- Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems : Abstract: While Multi-Agent Systems (MAS) excel at complex tasks, their growing autonomy with operational complexity often leads to critical inefficiencies, such as excessive token consumption and fai...
- ResMatching: Noise-Resilient Computational Super-Resolution via Guided Conditional Flow Matching : Abstract: Computational Super-Resolution (CSR) in fluorescence microscopy has, despite being an ill-posed problem, a long history. At its very core, CSR is about finding a prior that can be used to ex...
- Aeolus: A Multi-structural Flight Delay Dataset : Abstract: We introduce Aeolus, a large-scale Multi-modal Flight Delay Dataset designed to advance research on flight delay prediction and support the development of foundation models for tabular data....
- Hybrid DQN-TD3 Reinforcement Learning for Autonomous Navigation in Dynamic Environments : Abstract: This paper presents a hierarchical path-planning and control framework that combines a high-level Deep Q-Network (DQN) for discrete sub-goal selection with a low-level Twin Delayed Deep Dete...
- Evontree: Ontology Rule-Guided Self-Evolution of Large Language Models : Abstract: Large language models (LLMs) have demonstrated exceptional capabilities across multiple domains by leveraging massive pre-training and curated fine-tuning data. However, in data-sensitive fi...
- Process Integrated Computer Vision for Real-Time Failure Prediction in Steel Rolling Mill : Abstract: We present a long-term deployment study of a machine vision-based anomaly detection system for failure prediction in a steel rolling mill. The system integrates industrial cameras to monitor...
- The End of Manual Decoding: Towards Truly End-to-End Language Models : Abstract: The "end-to-end" label for LLMs is a misnomer. In practice, they depend on a non-differentiable decoding process that requires laborious hand-tuning of hyperparameters like temperature and ...
- On the limitation of evaluating machine unlearning using only a single training seed : Abstract: Machine unlearning (MU) aims to remove the influence of certain data points from a trained model without costly retraining. Most practical MU algorithms are only approximate and their perfor...
- Non-Convex Over-the-Air Heterogeneous Federated Learning: A Bias-Variance Trade-off : Abstract: Over-the-air (OTA) federated learning (FL) has been well recognized as a scalable paradigm that exploits the waveform superposition of the wireless multiple-access channel to aggregate model...
- ExpertFlow: Adaptive Expert Scheduling and Memory Coordination for Efficient MoE Inference : Abstract: The expansion of large language models is increasingly limited by the constrained memory capacity of modern GPUs. To mitigate this, Mixture-of-Experts (MoE) architectures activate only a sma...
- A General Incentives-Based Framework for Fairness in Multi-agent Resource Allocation : Abstract: We introduce the General Incentives-based Framework for Fairness (GIFF), a novel approach for fair multi-agent resource allocation that infers fair decision-making from standard value functi...
- Deep sequence models tend to memorize geometrically; it is unclear why : Abstract: In sequence modeling, the parametric memory of atomic facts has been predominantly abstracted as a brute-force lookup of co-occurrences between entities. We contrast this associative view ag...
- AMO-Bench: Large Language Models Still Struggle in High School Math Competitions : Abstract: We present AMO-Bench, an Advanced Mathematical reasoning benchmark with Olympiad-level or even higher difficulty, comprising 50 human-crafted problems. Existing benchmarks have widely levera...
- STaMP: Sequence Transformation and Mixed Precision for Low-Precision Activation Quantization : Abstract: Quantization is the key method for reducing inference latency, power and memory footprint of generative AI models. However, accuracy often degrades sharply when activations are quantized bel...
- Faithful and Fast Influence Function via Advanced Sampling : Abstract: How can we explain the influence of training data on black-box models? Influence functions (IFs) offer a post-hoc solution by utilizing gradients and Hessians. However, computing the Hessian...
- Clone Deterministic 3D Worlds with Geometrically-Regularized World Models : Abstract: A world model is an internal model that simulates how the world evolves. Given past observations and actions, it predicts the future of both the embodied agent and its environment. Accurate ...
- Remote Labor Index: Measuring AI Automation of Remote Work : Abstract: AIs have made rapid progress on research-oriented benchmarks of knowledge and reasoning, but it remains unclear how these gains translate into economic value and automation. To measure this,...
- Defeating the Training-Inference Mismatch via FP16 : Abstract: Reinforcement learning (RL) fine-tuning of large language models (LLMs) often suffers from instability due to the numerical mismatch between the training and inference policies. While prior ...
- Gistify! Codebase-Level Understanding via Runtime Execution : Abstract: As coding agents are increasingly deployed in large codebases, the need to automatically design challenging, codebase-level evaluation is central. We propose Gistify, a task where a coding L...
- Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark : Abstract: Recent video generation models can produce high-fidelity, temporally coherent videos, indicating that they may encode substantial world knowledge. Beyond realistic synthesis, they also exhib...
- Plasticity as the Mirror of Empowerment : Abstract: Agents are minimally entities that are influenced by their past observations and act to influence future observations. This latter capacity is captured by empowerment, which has served as a ...
- Rethinking Optimal Verification Granularity for Compute-Efficient Test-Time Scaling : Abstract: Test-time scaling (TTS) has proven effective in enhancing the reasoning capabilities of large language models (LLMs). Verification plays a key role in TTS, simultaneously influencing (1) rea...
- MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks : Abstract: The rapid advancement of Large Language Models (LLMs) has stimulated interest in multi-agent collaboration for addressing complex medical tasks. However, the practical advantages of multi-ag...
- Self-Evolving Curriculum for LLM Reasoning : Abstract: Reinforcement learning (RL) has proven effective for fine-tuning large language models (LLMs), significantly enhancing their reasoning abilities in domains such as mathematics and code gener...
- Embracing Contradiction: Theoretical Inconsistency Will Not Impede the Road of Building Responsible AI Systems : Abstract: This position paper argues that the theoretical inconsistency often observed among Responsible AI (RAI) metrics, such as differing fairness definitions or tradeoffs between accuracy and priv...
- TERAG: Token-Efficient Graph-Based Retrieval-Augmented Generation : Abstract: Graph-based Retrieval-augmented generation (RAG) has become a widely studied approach for improving the reasoning, accuracy, and factuality of Large Language Models (LLMs). However, many exi...
- Reward Collapse in Aligning Large Language Models : Abstract: The extraordinary capabilities of large language models (LLMs) such as ChatGPT and GPT-4 are in part unleashed by aligning them with reward models that are trained on human preferences, whic...
- VerifIoU - Robustness of Object Detection to Perturbations : Abstract: We introduce a novel Interval Bound Propagation (IBP) approach for the formal verification of object detection models, specifically targeting the Intersection over Union (IoU) metric. The ap...
- Chaos-based reinforcement learning with TD3 : Abstract: Chaos-based reinforcement learning (CBRL) is a method in which the agent's internal chaotic dynamics drives exploration. However, the learning algorithms in CBRL have not been thoroughly dev...
- Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models : Abstract: Large Language Models (LLMs), especially those accessed via APIs, have demonstrated impressive capabilities across various domains. However, users without technical expertise often turn to (...
- A mathematical certification for positivity conditions in Neural Networks with applications to partial monotonicity and Trustworthy AI : Abstract: Artificial Neural Networks (ANNs) have become a powerful tool for modeling complex relationships in large-scale datasets. However, their black-box nature poses trustworthiness challenges. In...
- AI's Social Forcefield: Reshaping Distributed Cognition in Human-AI Teams : Abstract: AI is not only a neutral tool in team settings; it actively reshapes the social and cognitive fabric of collaboration. We advance a unified framework of alignment in distributed cognition in...
- Speak & Spell: LLM-Driven Controllable Phonetic Error Augmentation for Robust Dialogue State Tracking : Abstract: Dialogue State Tracking (DST) is a key part of task-oriented dialogue systems, identifying important information in conversations. However, its accuracy drops significantly in spoken dialogu...
- Constrained Posterior Sampling: Time Series Generation with Hard Constraints : Abstract: Generating realistic time series samples is crucial for stress-testing models and protecting user privacy by using synthetic data. In engineering and safety-critical applications, these samp...
- Language Model Preference Evaluation with Multiple Weak Evaluators : Abstract: Despite the remarkable success of Large Language Models (LLMs), evaluating their outputs' quality regarding preference remains a critical challenge. While existing works usually leverage a s...
- Vital Insight: Assisting Experts' Context-Driven Sensemaking of Multi-modal Personal Tracking Data Using Visualization and Human-In-The-Loop LLM : Abstract: Passive tracking methods, such as phone and wearable sensing, have become dominant in monitoring human behaviors in modern ubiquitous computing studies. While there have been significant adv...
- In Defence of Post-hoc Explainability : Abstract: This position paper defends post-hoc explainability methods as legitimate tools for scientific knowledge production in machine learning. Addressing criticism of these methods' reliability an...
- UV-Attack: Physical-World Adversarial Attacks for Person Detection via Dynamic-NeRF-based UV Mapping : Abstract: In recent research, adversarial attacks on person detectors using patches or static 3D model-based texture modifications have struggled with low success rates due to the flexible nature of h...
- Diversity as a Reward: Fine-Tuning LLMs on a Mixture of Domain-Undetermined Data : Abstract: Fine-tuning large language models (LLMs) using diverse datasets is crucial for enhancing their overall performance across various domains. In practical scenarios, existing methods based on m...
- More of the Same: Persistent Representational Harms Under Increased Representation : Abstract: To recognize and mitigate the harms of generative AI systems, it is crucial to consider who is represented in the outputs of generative AI systems and how people are represented. A critical ...
- Towards Piece-by-Piece Explanations for Chess Positions with SHAP : Abstract: Contemporary chess engines offer precise yet opaque evaluations, typically expressed as centipawn scores. While effective for decision-making, these outputs obscure the underlying contributi...
- An Agentic Framework for Rapid Deployment of Edge AI Solutions in Industry 5.0 : Abstract: We present a novel framework for Industry 5.0 that simplifies the deployment of AI models on edge devices in various industrial settings. The design reduces latency and avoids external data ...
- Symbolically Scaffolded Play: Designing Role-Sensitive Prompts for Generative NPC Dialogue : Abstract: Large Language Models (LLMs) promise to transform interactive games by enabling non-player characters (NPCs) to sustain unscripted dialogue. Yet it remains unclear whether constrained prompt...
- Through the Judge's Eyes: Inferred Thinking Traces Improve Reliability of LLM Raters : Abstract: Large language models (LLMs) are increasingly used as raters for evaluation tasks. However, their reliability is often limited for subjective tasks, when human judgments involve subtle reaso...
- The Information-Theoretic Imperative: Compression and the Epistemic Foundations of Intelligence : Abstract: Existing frameworks converge on the centrality of compression to intelligence but leave underspecified why this process enforces the discovery of causal structure rather than superficial sta...
- Approximating Human Preferences Using a Multi-Judge Learned System : Abstract: Aligning LLM-based judges with human preferences is a significant challenge, as they are difficult to calibrate and often suffer from rubric sensitivity, bias, and instability. Overcoming th...
- SciTrust 2.0: A Comprehensive Framework for Evaluating Trustworthiness of Large Language Models in Scientific Applications : Abstract: Large language models (LLMs) have demonstrated transformative potential in scientific research, yet their deployment in high-stakes contexts raises significant trustworthiness concerns. Here...
- FinOps Agent -- A Use-Case for IT Infrastructure and Cost Optimization : Abstract: FinOps (Finance + Operations) represents an operational framework and cultural practice which maximizes cloud business value through collaborative financial accountability across engineering...
- Humains-Junior: A 3.8B Language Model Achieving GPT-4o-Level Factual Accuracy by Directed Exoskeleton Reasoning : Abstract: We introduce Humans-Junior, a 3.8B model that matches GPT-4o on the FACTS Grounding public subset within a $\pm 5$ pp equivalence margin. Results: On Q1–Q500 under identical judges, GPT-4...
- Estimating cognitive biases with attention-aware inverse planning : Abstract: People's goal-directed behaviors are influenced by their cognitive biases, and autonomous systems that interact with people should be aware of this. For example, people's attention to object...
- From Queries to Insights: Agentic LLM Pipelines for Spatio-Temporal Text-to-SQL : Abstract: Natural-language-to-SQL (NL-to-SQL) systems hold promise for democratizing access to structured data, allowing users to query databases without learning SQL. Yet existing systems struggle wi...
- AutoSurvey2: Empowering Researchers with Next Level Automated Literature Surveys : Abstract: The rapid growth of research literature, particularly in large language models (LLMs), has made producing comprehensive and current survey papers increasingly difficult. This paper introduce...
- Large Language Model-assisted Autonomous Vehicle Recovery from Immobilization : Abstract: Despite significant advancements in recent decades, autonomous vehicles (AVs) continue to face challenges in navigating certain traffic scenarios where human drivers excel. In such situation...
- Can AI be Accountable? : Abstract: The AI we use is powerful, and its power is increasing rapidly. If this powerful AI is to serve the needs of consumers, voters, and decision makers, then it is imperative that the AI is acco...
- Lean4Physics: Comprehensive Reasoning Framework for College-level Physics in Lean4 : Abstract: We present Lean4PHYS, a comprehensive reasoning framework for college-level physics problems in Lean4. Lean4PHYS includes LeanPhysBench, a college-level benchmark for formal physic...
- GUI Knowledge Bench: Revealing the Knowledge Gap Behind VLM Failures in GUI Tasks : Abstract: Large vision language models (VLMs) have advanced graphical user interface (GUI) task automation but still lag behind humans. We hypothesize this gap stems from missing core GUI knowledge, w...
- Beyond Benchmarks: The Economics of AI Inference : Abstract: The inference cost of Large Language Models (LLMs) has become a critical factor in determining their commercial viability and widespread adoption. This paper introduces a quantitative "econ...
- Reasoning Curriculum: Bootstrapping Broad LLM Reasoning from Math : Abstract: Reinforcement learning (RL) can elicit strong reasoning in large language models (LLMs), yet most open efforts focus on math and code. We propose Reasoning Curriculum, a simple two-stage cur...
- The FM Agent : Abstract: Large language models (LLMs) are catalyzing the development of autonomous AI research agents for scientific and engineering discovery. We present FM Agent, a novel and general-purpose multi-...
- One Model to Critique Them All: Rewarding Agentic Tool-Use via Efficient Reasoning : Abstract: Reward models (RMs) play a critical role in aligning large language models (LLMs) with human preferences. Yet in the domain of tool learning, the lack of RMs specifically designed for functi...
- Questionnaire meets LLM: A Benchmark and Empirical Study of Structural Skills for Understanding Questions and Responses : Abstract: Millions of people take surveys every day, from market polls and academic studies to medical questionnaires and customer feedback forms. These datasets capture valuable insights, but their s...
- Retrieval Augmented Generation-Enhanced Distributed LLM Agents for Generalizable Traffic Signal Control with Emergency Vehicles : Abstract: With increasing urban traffic complexity, Traffic Signal Control (TSC) is essential for optimizing traffic flow and improving road safety. Large Language Models (LLMs) emerge as promising ap...
- Graph-Enhanced Policy Optimization in LLM Agent Training : Abstract: Group-based reinforcement learning (RL) has shown impressive results on complex reasoning and mathematical tasks. Yet, when applied to train multi-turn, interactive LLM agents, these methods...
- GraphCompliance: Aligning Policy and Context Graphs for LLM-Based Regulatory Compliance : Abstract: Compliance at web scale poses practical challenges: each request may require a regulatory assessment. Regulatory texts (e.g., the General Data Protection Regulation, GDPR) are cross-referent...
- Discovering State Equivalences in UCT Search Trees By Action Pruning : Abstract: One approach to enhance Monte Carlo Tree Search (MCTS) is to improve its sample efficiency by grouping/abstracting states or state-action pairs and sharing statistics within a group. Though ...
- BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning : Abstract: Reinforcement finetuning (RFT) is a key technique for aligning Large Language Models (LLMs) with human preferences and enhancing reasoning, yet its effectiveness is highly sensitive to which...
- AI Mathematician as a Partner in Advancing Mathematical Discovery - A Case Study in Homogenization Theory : Abstract: Artificial intelligence (AI) has demonstrated impressive progress in mathematical reasoning, yet its integration into the practice of mathematical research remains limited. In this study, we...
- Scales++: Compute Efficient Evaluation Subset Selection with Cognitive Scales Embeddings : Abstract: The prohibitive cost of evaluating large language models (LLMs) on comprehensive benchmarks necessitates the creation of small yet representative data subsets (i.e., tiny benchmarks) that en...
- A Pragmatic View of AI Personhood : Abstract: The emergence of agentic Artificial Intelligence (AI) is set to trigger a "Cambrian explosion" of new kinds of personhood. This paper proposes a pragmatic framework for navigating this diver...
- Autograder+: A Multi-Faceted AI Framework for Rich Pedagogical Feedback in Programming Education : Abstract: The rapid growth of programming education has outpaced traditional assessment tools, leaving faculty with limited means to provide meaningful, scalable feedback. Conventional autograders, wh...
- MedSAE: Dissecting MedCLIP Representations with Sparse Autoencoders : Abstract: Artificial intelligence in healthcare requires models that are accurate and interpretable. We advance mechanistic interpretability in medical vision by applying Medical Sparse Autoencoders (...
- Chain-of-Thought Hijacking : Abstract: Large reasoning models (LRMs) achieve higher task performance by allocating more inference-time compute, and prior works suggest this scaled reasoning may also strengthen safety by improving...
- Who Has The Final Say? Conformity Dynamics in ChatGPT's Selections : Abstract: Large language models (LLMs) such as ChatGPT are increasingly integrated into high-stakes decision-making, yet little is known about their susceptibility to social influence. We conducted th...
- LINK-KG: LLM-Driven Coreference-Resolved Knowledge Graphs for Human Smuggling Networks : Abstract: Human smuggling networks are complex and constantly evolving, making them difficult to analyze comprehensively. Legal case documents offer rich factual and procedural insights into these net...
- Context Engineering 2.0: The Context of Context Engineering : Abstract: Karl Marx once wrote that "the human essence is the ensemble of social relations", suggesting that individuals are not isolated entities but are fundamentally shaped by their interactions ...
- Human-AI Complementarity: A Goal for Amplified Oversight : Abstract: Human feedback is critical for aligning AI systems to human values. As AI capabilities improve and AI is used to tackle more challenging tasks, verifying quality and safety becomes increasin...
- EdgeRunner 20B: Military Task Parity with GPT-5 while Running on the Edge : Abstract: We present EdgeRunner 20B, a fine-tuned version of gpt-oss-20b optimized for military tasks. EdgeRunner 20B was trained on 1.6M high-quality records curated from military documentation and w...
- Agentic AI Home Energy Management System: A Large Language Model Framework for Residential Load Scheduling : Abstract: The electricity sector transition requires substantial increases in residential demand response capacity, yet Home Energy Management Systems (HEMS) adoption remains limited by user interacti...
- Normative Reasoning in Large Language Models: A Comparative Benchmark from Logical and Modal Perspectives : Abstract: Normative reasoning is a type of reasoning that involves normative or deontic modality, such as obligation and permission. While large language models (LLMs) have demonstrated remarkable per...
- The Era of Agentic Organization: Learning to Organize with Language Models : Abstract: We envision a new era of AI, termed agentic organization, where agents solve complex problems by working collaboratively and concurrently, enabling outcomes beyond individual intelligence. T...
- Delegated Authorization for Agents Constrained to Semantic Task-to-Scope Matching : Abstract: Authorizing Large Language Model driven agents to dynamically invoke tools and access protected resources introduces significant risks, since current methods for delegating authorization gra...
- Unveiling Intrinsic Text Bias in Multimodal Large Language Models through Attention Key-Space Analysis : Abstract: Multimodal large language models (MLLMs) exhibit a pronounced preference for textual inputs when processing vision-language data, limiting their ability to reason effectively from visual evi...
- Cross-Platform Evaluation of Reasoning Capabilities in Foundation Models : Abstract: This paper presents a comprehensive cross-platform evaluation of reasoning capabilities in contemporary foundation models, establishing an infrastructure-agnostic benchmark across three comp...
- The Oversight Game: Learning to Cooperatively Balance an AI Agent's Safety and Autonomy : Abstract: As increasingly capable agents are deployed, a central safety question is how to retain meaningful human control without modifying the underlying system. We study a minimal control interface...
- LLMs Process Lists With General Filter Heads : Abstract: We investigate the mechanisms underlying a range of list-processing tasks in LLMs, and we find that LLMs have learned to encode a compact, causal representation of a general filtering operat...
- Magentic Marketplace: An Open-Source Environment for Studying Agentic Markets : Abstract: As LLM agents advance, they are increasingly mediating economic decisions, ranging from product discovery to transactions, on behalf of users. Such applications promise benefits but also rai...
- A Practitioner's Guide to Kolmogorov-Arnold Networks : Abstract: Kolmogorov-Arnold Networks (KANs) have recently emerged as a promising alternative to traditional Multilayer Perceptrons (MLPs), inspired by the Kolmogorov-Arnold representation theorem. Unl...
- LASTIST: LArge-Scale Target-Independent STance dataset : Abstract: Stance detection has emerged as an area of research in the field of artificial intelligence. However, most research is currently centered on the target-dependent stance detection task, which...
- zFLoRA: Zero-Latency Fused Low-Rank Adapters : Abstract: Large language models (LLMs) are increasingly deployed with task-specific adapters catering to multiple downstream applications. In such a scenario, the additional compute associated with th...
- HiMAE: Hierarchical Masked Autoencoders Discover Resolution-Specific Structure in Wearable Time Series : Abstract: Wearable sensors provide abundant physiological time series, yet the principles governing their predictive utility remain unclear. We hypothesize that temporal resolution is a fundamental ax...
- BlackboxNLP-2025 MIB Shared Task: Improving Circuit Faithfulness via Better Edge Selection : Abstract: One of the main challenges in mechanistic interpretability is circuit discovery, determining which parts of a model perform a given task. We build on the Mechanistic Interpretability Benchma...
- Unsupervised local learning based on voltage-dependent synaptic plasticity for resistive and ferroelectric synapses : Abstract: The deployment of AI on edge computing devices faces significant challenges related to energy consumption and functionality. These devices could greatly benefit from brain-inspired learning ...
- The Kinetics of Reasoning: How Chain-of-Thought Shapes Learning in Transformers? : Abstract: Chain-of-thought (CoT) supervision can substantially improve transformer performance, yet the mechanisms by which models learn to follow and benefit from CoT remain poorly understood. We inv...
- Non-myopic Matching and Rebalancing in Large-Scale On-Demand Ride-Pooling Systems Using Simulation-Informed Reinforcement Learning : Abstract: Ride-pooling, also known as ride-sharing, shared ride-hailing, or microtransit, is a service wherein passengers share rides. This service can reduce costs for both passengers and operators a...
- MemEIC: A Step Toward Continual and Compositional Knowledge Editing : Abstract: The dynamic nature of information necessitates continuously updating large vision-language models (LVLMs). While recent knowledge editing techniques hint at promising directions, they often ...
- Metis-SPECS: Decoupling Multimodal Learning via Self-distilled Preference-based Cold Start : Abstract: Reinforcement learning (RL) with verifiable rewards has recently catalyzed a wave of "MLLM-r1" approaches that bring RL to vision language models. Most representative paradigms begin with a ...
- ScaleDiff: Higher-Resolution Image Synthesis via Efficient and Model-Agnostic Diffusion : Abstract: Text-to-image diffusion models often exhibit degraded performance when generating images beyond their training resolution. Recent training-free methods can mitigate this limitation, but they...
- Identity Management for Agentic AI: The new frontier of authorization, authentication, and security for an AI agent world : Abstract: The rapid rise of AI agents presents urgent challenges in authentication, authorization, and identity management. Current agent-centric protocols (like MCP) highlight the demand for clarifie...
- AAGATE: A NIST AI RMF-Aligned Governance Platform for Agentic AI : Abstract: This paper introduces the Agentic AI Governance Assurance & Trust Engine (AAGATE), a Kubernetes-native control plane designed to address the unique security and governance challenges posed b...
- PRISM: Proof-Carrying Artifact Generation through LLM x MDE Synergy and Stratified Constraints : Abstract: PRISM unifies Large Language Models with Model-Driven Engineering to generate regulator-ready artifacts and machine-checkable evidence for safety- and compliance-critical domains. PRISM inte...
- Evaluating the Impact of LLM-Assisted Annotation in a Perspectivized Setting: the Case of FrameNet Annotation : Abstract: The use of LLM-based applications as a means to accelerate and/or substitute human labor in the creation of language resources and datasets is a reality. Nonetheless, despite the potential of...
- Transferring Causal Effects using Proxies : Abstract: We consider the problem of estimating a causal effect in a multi-domain setting. The causal effect of interest is confounded by an unobserved confounder and can change between the different ...
Research Sources: 488 | Generated: 10/31/2025
