AI RESEARCH PAPERS & ACADEMIC SOURCES
- Non-Aligned Reference Image Quality Assessment for Novel View Synthesis : Abstract: Evaluating the perceptual quality of Novel View Synthesis (NVS) images remains a key challenge, particularly in the absence of pixel-aligned ground truth references. Full-Reference Image Qua...
- LandSegmenter: Towards a Flexible Foundation Model for Land Use and Land Cover Mapping : Abstract: Land Use and Land Cover (LULC) mapping is a fundamental task in Earth Observation (EO). However, current LULC models are typically developed for a specific modality and a fixed class taxonom...
- Multi-Granularity Mutual Refinement Network for Zero-Shot Learning : Abstract: Zero-shot learning (ZSL) aims to recognize unseen classes with zero samples by transferring semantic knowledge from seen classes. Current approaches typically correlate global visual feature...
- KPLM-STA: Physically-Accurate Shadow Synthesis for Human Relighting via Keypoint-Based Light Modeling : Abstract: Image composition aims to seamlessly integrate a foreground object into a background, where generating realistic and geometrically accurate shadows remains a persistent challenge. While rece...
- Distributed Zero-Shot Learning for Visual Recognition : Abstract: In this paper, we propose a Distributed Zero-Shot Learning (DistZSL) framework that can fully exploit decentralized data to learn an effective model for unseen classes. Considering the data ...
- VLMDiff: Leveraging Vision-Language Models for Multi-Class Anomaly Detection with Diffusion : Abstract: Detecting visual anomalies in diverse, multi-class real-world images is a significant challenge. We introduce VLMDiff, a novel unsupervised multi-class visual anomaly detection framework. It i...
- WarpGAN: Warping-Guided 3D GAN Inversion with Style-Based Novel View Inpainting : Abstract: 3D GAN inversion projects a single image into the latent space of a pre-trained 3D GAN to achieve single-shot novel view synthesis, which requires visible regions with high fidelity and occl...
- Pixel-level Quality Assessment for Oriented Object Detection : Abstract: Modern oriented object detectors typically predict a set of bounding boxes and select the top-ranked ones based on estimated localization quality. Achieving high detection performance requir...
- UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation : Abstract: User interface (UI) programming is a core yet highly complex part of modern software development. Recent advances in visual language models (VLMs) highlight the potential of automatic UI cod...
- UCDSC: Open Set UnCertainty aware Deep Simplex Classifier for Medical Image Datasets : Abstract: Driven by advancements in deep learning, computer-aided diagnoses have made remarkable progress. However, outside controlled laboratory settings, algorithms may encounter several challenges....
- Twist and Compute: The Cost of Pose in 3D Generative Diffusion : Abstract: Despite their impressive results, large-scale image-to-3D generative models remain opaque in their inductive biases. We identify a significant limitation in image-conditioned 3D generative m...
- Accurate and Efficient Surface Reconstruction from Point Clouds via Geometry-Aware Local Adaptation : Abstract: Point cloud surface reconstruction has improved in accuracy with advances in deep learning, enabling applications such as infrastructure inspection. Recent approaches that reconstruct from s...
- LayerEdit: Disentangled Multi-Object Editing via Conflict-Aware Multi-Layer Learning : Abstract: Text-driven multi-object image editing, which aims to precisely modify multiple objects within an image based on text descriptions, has recently attracted considerable interest. Existing work...
- Top2Ground: A Height-Aware Dual Conditioning Diffusion Model for Robust Aerial-to-Ground View Generation : Abstract: Generating ground-level images from aerial views is a challenging task due to extreme viewpoint disparity, occlusions, and a limited field of view. We introduce Top2Ground, a novel diffusion...
- Re-coding for Uncertainties: Edge-awareness Semantic Concordance for Resilient Event-RGB Segmentation : Abstract: Semantic segmentation has achieved great success in ideal conditions. However, when facing extreme conditions (e.g., insufficient light, fierce camera motion), most existing methods suffer f...
- SWAN - Enabling Fast and Mobile Histopathology Image Annotation through Swipeable Interfaces : Abstract: The annotation of large-scale histopathology image datasets remains a major bottleneck in developing robust deep learning models for clinically relevant tasks, such as mitotic figure classif...
- MAUGIF: Mechanism-Aware Unsupervised General Image Fusion via Dual Cross-Image Autoencoders : Abstract: Image fusion aims to integrate structural and complementary information from multi-source images. However, existing fusion methods are often either highly task-specific, or general framework...
- SynWeather: Weather Observation Data Synthesis across Multiple Regions and Variables via a General Diffusion Transformer : Abstract: With the advancement of meteorological instruments, abundant data has become available. Current approaches typically focus on single-variable, single-region tasks and primarily rely on d...
- SkelSplat: Robust Multi-view 3D Human Pose Estimation with Differentiable Gaussian Rendering : Abstract: Accurate 3D human pose estimation is fundamental for applications such as augmented reality and human-robot interaction. State-of-the-art multi-view methods learn to fuse predictions across ...
- NeuSpring: Neural Spring Fields for Reconstruction and Simulation of Deformable Objects from Videos : Abstract: In this paper, we aim to create physical digital twins of deformable objects under interaction. Existing methods focus more on the physical learning of current state modeling, but generalize...
- The Impact of Longitudinal Mammogram Alignment on Breast Cancer Risk Assessment : Abstract: Regular mammography screening is crucial for early breast cancer detection. By leveraging deep learning-based risk models, screening intervals can be personalized, especially for high-risk i...
- Empowering DINO Representations for Underwater Instance Segmentation via Aligner and Prompter : Abstract: Underwater instance segmentation (UIS), integrating pixel-level understanding and instance-level discrimination, is a pivotal technology in marine resource exploration and ecological protect...
- VideoChain: A Transformer-Based Framework for Multi-hop Video Question Generation : Abstract: Multi-hop Question Generation (QG) effectively evaluates reasoning but remains confined to text; Video Question Generation (VideoQG) is limited to zero-hop questions over single segments. To...
- Retrospective motion correction in MRI using disentangled embeddings : Abstract: Physiological motion can affect the diagnostic quality of magnetic resonance imaging (MRI). While various retrospective motion correction methods exist, many struggle to generalize across di...
- OmniAID: Decoupling Semantic and Artifacts for Universal AI-Generated Image Detection in the Wild : Abstract: A truly universal AI-Generated Image (AIGI) detector must simultaneously generalize across diverse generative models and varied semantic content. Current state-of-the-art methods learn a sin...
- Cross-pyramid consistency regularization for semi-supervised medical image segmentation : Abstract: Semi-supervised learning (SSL) enables training of powerful models with the assumption of limited, carefully labelled data and a large amount of unlabeled data to support the learning. In th...
- Compression then Matching: An Efficient Pre-training Paradigm for Multimodal Embedding : Abstract: Vision-language models advance multimodal representation learning by acquiring transferable semantic embeddings, thereby substantially enhancing performance across a range of vision-language...
- Fast Multi-Organ Fine Segmentation in CT Images with Hierarchical Sparse Sampling and Residual Transformer : Abstract: Multi-organ segmentation of 3D medical images is fundamental with meaningful applications in various clinical automation pipelines. Although deep learning has achieved superior performance, ...
- UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist : Abstract: While specialized AI models excel at isolated video tasks like generation or understanding, real-world applications demand complex, iterative workflows that combine these capabilities. To br...
- 3D4D: An Interactive, Editable, 4D World Model via 3D Video Generation : Abstract: We introduce 3D4D, an interactive 4D visualization framework that integrates WebGL with Supersplat rendering. It transforms static images and text into coherent 4D scenes through four core m...
- RePose-NeRF: Robust Radiance Fields for Mesh Reconstruction under Noisy Camera Poses : Abstract: Accurate 3D reconstruction from multi-view images is essential for downstream robotic tasks such as navigation, manipulation, and environment understanding. However, obtaining precise camera...
- Vision Transformer Based User Equipment Positioning : Abstract: Recently, Deep Learning (DL) techniques have been used for User Equipment (UE) positioning. However, the key shortcomings of such models are that: i) they give equal attention to the enti...
- A Hybrid Multimodal Deep Learning Framework for Intelligent Fashion Recommendation : Abstract: The rapid expansion of online fashion platforms has created an increasing demand for intelligent recommender systems capable of understanding both visual and textual cues. This paper propose...
- RoboTAG: End-to-end Robot Configuration Estimation via Topological Alignment Graph : Abstract: Estimating robot pose from a monocular RGB image is a challenge in robotics and computer vision. Existing methods typically build networks on top of 2D visual backbones and depend heavily on...
- Deep Learning Analysis of Prenatal Ultrasound for Identification of Ventriculomegaly : Abstract: The proposed study aimed to develop a deep learning model capable of detecting ventriculomegaly on prenatal ultrasound images. Ventriculomegaly is a prenatal condition characterized by dilat...
- DynaQuant: Dynamic Mixed-Precision Quantization for Learned Image Compression : Abstract: Prevailing quantization techniques in Learned Image Compression (LIC) typically employ a static, uniform bit-width across all layers, failing to adapt to the highly diverse data distribution...
- From Noise to Latent: Generating Gaussian Latents for INR-Based Image Compression : Abstract: Recent implicit neural representation (INR)-based image compression methods have shown competitive performance by overfitting image-specific latent codes. However, they remain inferior to en...
- Re²MaP: Macro Placement by Recursively Prototyping and Packing Tree-based Relocating : Abstract: This work introduces the Re²MaP method, which generates expert-quality macro placements through recursively prototyping and packing tree-based relocating. We first perform multi-...
- DODA: Adapting Object Detectors to Dynamic Agricultural Environments in Real-Time with Diffusion : Abstract: Object detection has wide applications in agriculture, but domain shifts of diverse environments limit the broader use of the trained models. Existing domain adaptation methods usually requi...
- MMCL: Correcting Content Query Distributions for Improved Anti-Overlapping X-Ray Object Detection : Abstract: Unlike natural images with occlusion-based overlap, X-ray images exhibit depth-induced superimposition and semi-transparent appearances, where objects at different depths overlap and their f...
- One Homography is All You Need: IMM-based Joint Homography and Multiple Object State Estimation : Abstract: A novel online MOT algorithm, IMM Joint Homography State Estimation (IMM-JHSE), is proposed. IMM-JHSE uses an initial homography estimate as the only additional 3D information, whereas other...
- Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach : Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated remarkable progress in visual understanding. This impressive leap raises a compelling question: how can lang...
- Bridged Semantic Alignment for Zero-shot 3D Medical Image Diagnosis : Abstract: 3D medical images such as computed tomography are widely used in clinical practice, offering a great potential for automatic diagnosis. Supervised learning-based approaches have achieved sig...
- X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding : Abstract: Long-form egocentric video understanding provides rich contextual information and unique insights into long-term human behaviors, holding significant potential for applications in embodied i...
- CSPCL: Category Semantic Prior Contrastive Learning for Deformable DETR-Based Prohibited Item Detectors : Abstract: Prohibited item detection based on X-ray images is one of the most effective security inspection methods. However, the foreground-background feature coupling caused by the overlapping phenom...
- TransParking: A Dual-Decoder Transformer Framework with Soft Localization for End-to-End Automatic Parking : Abstract: In recent years, fully differentiable end-to-end autonomous driving systems have become a research hotspot in the field of intelligent transportation. Among various research directions, auto...
- Systematic Literature Review on Vehicular Collaborative Perception - A Computer Vision Perspective : Abstract: The effectiveness of autonomous vehicles relies on reliable perception capabilities. Despite significant advancements in artificial intelligence and sensor fusion technologies, current singl...
- CountingDINO: A Training-free Pipeline for Class-Agnostic Counting using Unsupervised Backbones : Abstract: Class-agnostic counting (CAC) aims to estimate the number of objects in images without being restricted to predefined categories. However, while current exemplar-based CAC methods offer flex...
- A Large-scale Benchmark on Geological Fault Delineation Models: Domain Shift, Training Dynamics, Generalizability, Evaluation and Inferential Behavior : Abstract: Machine learning has taken a critical role in seismic interpretation workflows, especially in fault delineation tasks. However, despite the recent proliferation of pretrained models and synt...
- DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction : Abstract: This paper presents DetailFlow, a coarse-to-fine 1D autoregressive (AR) image generation method that models images through a novel next-detail prediction strategy. By learning a resolution-a...
- Visual Explanation via Similar Feature Activation for Metric Learning : Abstract: Visual explanation maps enhance the trustworthiness of decisions made by deep learning models and offer valuable guidance for developing new algorithms in image recognition tasks. Class acti...
- Jamais Vu: Exposing the Generalization Gap in Supervised Semantic Correspondence : Abstract: Semantic correspondence (SC) aims to establish semantically meaningful matches across different instances of an object category. We illustrate how recent supervised SC methods remain limited...
- X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability : Abstract: Diffusion models are advancing autonomous driving by enabling realistic data synthesis, predictive end-to-end planning, and closed-loop simulation, with a primary focus on temporally consist...
- PicoSAM2: Low-Latency Segmentation In-Sensor for Edge Vision Applications : Abstract: Real-time, on-device segmentation is critical for latency-sensitive and privacy-aware applications like smart glasses and IoT devices. We introduce PicoSAM2, a lightweight (1.3M parameters, ...
- LidarPainter: One-Step Away From Any Lidar View To Novel Guidance : Abstract: Dynamic driving scene reconstruction is of great importance in fields like digital twin systems and autonomous driving simulation. However, unacceptable degradation occurs when the view devia...
- AVAR-Net: A Lightweight Audio-Visual Anomaly Recognition Framework with a Benchmark Dataset : Abstract: Anomaly recognition plays a vital role in surveillance, transportation, healthcare, and public safety. However, most existing approaches rely solely on visual data, making them unreliable un...
- Perceptual Quality Assessment of 3D Gaussian Splatting: A Subjective Dataset and Prediction Metric : Abstract: With the rapid advancement of 3D visualization, 3D Gaussian Splatting (3DGS) has emerged as a leading technique for real-time, high-fidelity rendering. While prior research has emphasized al...
- WEDepth: Efficient Adaptation of World Knowledge for Monocular Depth Estimation : Abstract: Monocular depth estimation (MDE) is widely applicable but remains highly challenging due to the inherently ill-posed nature of reconstructing 3D scenes from single 2D images. Modern Vision ...
- Generalized-Scale Object Counting with Gradual Query Aggregation : Abstract: Few-shot detection-based counters estimate the number of instances in the image specified only by a few test-time exemplars. A common approach to localize objects across multiple sizes is to...
- I2E: Real-Time Image-to-Event Conversion for High-Performance Spiking Neural Networks : Abstract: Spiking neural networks (SNNs) promise highly energy-efficient computing, but their adoption is hindered by a critical scarcity of event-stream data. This work introduces I2E, an algorithmic...
- Introducing Nylon Face Mask Attacks: A Dataset for Evaluating Generalised Face Presentation Attack Detection : Abstract: Face recognition systems are increasingly deployed across a wide range of applications, including smartphone authentication, access control, and border security. However, these systems remai...
- LatentPrintFormer: A Hybrid CNN-Transformer with Spatial Attention for Latent Fingerprint identification : Abstract: Latent fingerprint identification remains a challenging task due to low image quality, background noise, and partial impressions. In this work, we propose a novel identification approach cal...
- PEOD: A Pixel-Aligned Event-RGB Benchmark for Object Detection under Challenging Conditions : Abstract: Robust object detection for challenging scenarios increasingly relies on event cameras, yet existing Event-RGB datasets remain constrained by sparse coverage of extreme conditions and low sp...
- Targeted Unlearning Using Perturbed Sign Gradient Methods With Applications On Medical Images : Abstract: Machine unlearning aims to remove the influence of specific training samples from a trained model without full retraining. While prior work has largely focused on privacy-motivated settings,...
- The Trilemma of Truth in Large Language Models : Abstract: The public often attributes human-like qualities to large language models (LLMs) and assumes they "know" certain things. In reality, LLMs encode information retained during training as inter...
- Provably data-driven projection method for quadratic programming : Abstract: Projection methods aim to reduce the dimensionality of the optimization instance, thereby improving the scalability of high-dimensional problems. Recently, Sakaue and Oki proposed a data-dri...
- Large Language Models for Scientific Idea Generation: A Creativity-Centered Survey : Abstract: Scientific idea generation lies at the heart of scientific discovery and has driven human progress, whether by solving unsolved problems or proposing novel hypotheses to explain unknown pheno...
- LLM Optimization Unlocks Real-Time Pairwise Reranking : Abstract: Efficiently reranking documents retrieved from information retrieval (IR) pipelines to enhance the overall quality of Retrieval-Augmented Generation (RAG) systems remains an important yet challen...
- LLMs vs. Traditional Sentiment Tools in Psychology: An Evaluation on Belgian-Dutch Narratives : Abstract: Understanding emotional nuances in everyday language is crucial for computational linguistics and emotion research. While traditional lexicon-based tools like LIWC and Pattern have served as...
- Critical Confabulation: Can LLMs Hallucinate for Social Good? : Abstract: LLMs hallucinate, yet some confabulations can have social affordances if carefully bounded. We propose critical confabulation (inspired by critical fabulation from literary and social theory...
- Back to the Future: The Role of Past and Future Context Predictability in Incremental Language Production : Abstract: Contextual predictability shapes both the form and choice of words in online language production. The effects of the predictability of a word given its previous context are generally well-un...
- Design, Results and Industry Implications of the World's First Insurance Large Language Model Evaluation Benchmark : Abstract: This paper comprehensively elaborates on the construction methodology, multi-dimensional evaluation system, and underlying design philosophy of CUFEInse v1.0. Adhering to the principles of "...
- From Experience to Strategy: Empowering LLM Agents with Trainable Graph Memory : Abstract: Large Language Model (LLM)-based agents have demonstrated remarkable potential in autonomous task-solving across complex, open-ended environments. A promising approach for improving the re...
- AlignSurvey: A Comprehensive Benchmark for Human Preferences Alignment in Social Surveys : Abstract: Understanding human attitudes, preferences, and behaviors through social surveys is essential for academic research and policymaking. Yet traditional surveys face persistent challenges, incl...
- Planned Event Forecasting using Future Mentions and Related Entity Extraction in News Articles : Abstract: In democracies like India, people are free to express their views and demands. Sometimes this causes situations of civil unrest such as protests, rallies, and marches. These events may be di...
- Breaking the Adversarial Robustness-Performance Trade-off in Text Classification via Manifold Purification : Abstract: A persistent challenge in text classification (TC) is that enhancing model robustness against adversarial attacks typically degrades performance on clean data. We argue that this challenge c...
- Last Layer Logits to Logic: Empowering LLMs with Logic-Consistent Structured Knowledge Reasoning : Abstract: Large Language Models (LLMs) achieve excellent performance in natural language reasoning tasks through pre-training on vast unstructured text, enabling them to understand the logic in natura...
- Social Media for Mental Health: Data, Methods, and Findings : Abstract: There is an increasing number of virtual communities and forums available on the web. With social media, people can freely communicate and share their thoughts, ask personal questions, and s...
- Distinct Theta Synchrony across Speech Modes: Perceived, Spoken, Whispered, and Imagined : Abstract: Human speech production encompasses multiple modes such as perceived, overt, whispered, and imagined, each reflecting distinct neural mechanisms. Among these, theta-band synchrony has been c...
- Unified Work Embeddings: Contrastive Learning of a Bidirectional Multi-task Ranker : Abstract: Workforce transformation across diverse industries has driven an increased demand for specialized natural language processing capabilities. Nevertheless, tasks derived from work-related cont...
- HyCoRA: Hyper-Contrastive Role-Adaptive Learning for Role-Playing : Abstract: Multi-character role-playing aims to equip models with the capability to simulate diverse roles. Existing methods either use one shared parameterized module across all roles or assign a sepa...
- Estranged Predictions: Measuring Semantic Category Disruption with Masked Language Modelling : Abstract: This paper examines how science fiction destabilises ontological categories by measuring conceptual permeability across the terms human, animal, and machine using masked language modelling (...
- Multimodal LLMs Do Not Compose Skills Optimally Across Modalities : Abstract: Skill composition is the ability to combine previously learned skills to solve new tasks. As neural networks acquire increasingly complex skills during their pretraining, it is not clear how...
- Quantification and object perception in Multimodal Large Language Models deviate from human linguistic cognition : Abstract: Quantification has been proven to be a particularly difficult linguistic phenomenon for (Multimodal) Large Language Models (MLLMs). However, given that quantification interfaces with the log...
- Sentence-Anchored Gist Compression for Long-Context LLMs : Abstract: This work investigates context compression for Large Language Models (LLMs) using learned compression tokens to reduce the memory and computational demands of processing long sequences. We d...
- On the Interplay between Positional Encodings, Morphological Complexity, and Word Order Flexibility : Abstract: Language model architectures are predominantly first created for English and subsequently applied to other languages. It is an open question whether this architectural bias leads to degraded...
- Still Not There: Can LLMs Outperform Smaller Task-Specific Seq2Seq Models on the Poetry-to-Prose Conversion Task? : Abstract: Large Language Models (LLMs) are increasingly treated as universal, general-purpose solutions across NLP tasks, particularly in English. But does this assumption hold for low-resource, morph...
- Do Syntactic Categories Help in Developmentally Motivated Curriculum Learning for Language Models? : Abstract: We examine the syntactic properties of the BabyLM corpus and of age groups within CHILDES. While we find that CHILDES does not exhibit strong syntactic differentiation by age, we show that the syn...
- Encoder Fine-tuning with Stochastic Sampling Outperforms Open-weight GPT in Astronomy Knowledge Extraction : Abstract: Scientific literature in astronomy is rapidly expanding, making it increasingly important to automate the extraction of key entities and contextual information from research papers. In this ...
- VocalBench-zh: Decomposing and Benchmarking the Speech Conversational Abilities in Mandarin Context : Abstract: The development of multi-modal large language models (LLMs) leads to intelligent approaches capable of speech interactions. As one of the most widely spoken languages globally, Mandarin is s...
- ParliaBench: An Evaluation and Benchmarking Framework for LLM-Generated Parliamentary Speech : Abstract: Parliamentary speech generation presents specific challenges for large language models beyond standard text generation tasks. Unlike general text generation, parliamentary speeches require n...
- Hierarchical structure understanding in complex tables with VLLMs: a benchmark and experiments : Abstract: This work investigates the ability of Vision Large Language Models (VLLMs) to understand and interpret the structure of tables in scientific articles. Specifically, we explore whether VLLMs ...
- Automatic Paper Reviewing with Heterogeneous Graph Reasoning over LLM-Simulated Reviewer-Author Debates : Abstract: Existing paper review methods often rely on superficial manuscript features or directly on large language models (LLMs), which are prone to hallucinations, biased scoring, and limited reason...
- The Dynamic Articulatory Model DYNARTmo: Dynamic Movement Generation and Speech Gestures : Abstract: This paper describes the current implementation of the dynamic articulatory model DYNARTmo, which generates continuous articulator movements based on the concept of speech gestures and a cor...
- TurkEmbed: Turkish Embedding Model on NLI & STS Tasks : Abstract: This paper introduces TurkEmbed, a novel Turkish language embedding model designed to outperform existing models, particularly in Natural Language Inference (NLI) and Semantic Textual Simila...
- PCRLLM: Proof-Carrying Reasoning with Large Language Models under Stepwise Logical Constraints : Abstract: Large Language Models (LLMs) often exhibit limited logical coherence, mapping premises to conclusions without adherence to explicit inference rules. We propose Proof-Carrying Reasoning with ...
- Bot Meets Shortcut: How Can LLMs Aid in Handling Unknown Invariance OOD Scenarios? : Abstract: While existing social bot detectors perform well on benchmarks, their robustness across diverse real-world scenarios remains limited due to unclear ground truth and varied misleading cues. I...
- AlphaResearch: Accelerating New Algorithm Discovery with Language Models : Abstract: Large language models have made significant progress in complex but easy-to-verify problems, yet they still struggle with discovering the unknown. In this paper, we present AlphaRese...
- Investigating CoT Monitorability in Large Reasoning Models : Abstract: Large Reasoning Models (LRMs) have demonstrated remarkable performance on complex tasks by engaging in extended reasoning before producing final answers. Beyond improving abilities, these de...
- From Semantic Roles to Opinion Roles: SRL Data Extraction for Multi-Task and Transfer Learning in Low-Resource ORL : Abstract: This report presents a detailed methodology for constructing a high-quality Semantic Role Labeling (SRL) dataset from the Wall Street Journal (WSJ) portion of the OntoNotes 5.0 corpus and ad...
- LLaDA-Rec: Discrete Diffusion for Parallel Semantic ID Generation in Generative Recommendation : Abstract: Generative recommendation represents each item as a semantic ID, i.e., a sequence of discrete tokens, and generates the next item through autoregressive decoding. While effective, existing a...
- Quantifying the Impact of CU: A Systematic Literature Review : Abstract: Community Unionism has served as a pivotal concept in debates on trade union renewal since the early 2000s, yet its theoretical coherence and political significance remain unresolved. This a...
- A Decentralized Retrieval Augmented Generation System with Source Reliabilities Secured on Blockchain : Abstract: Existing retrieval-augmented generation (RAG) systems typically use a centralized architecture, causing a high cost of data collection, integration, and management, as well as privacy concer...
- CC30k: A Citation Contexts Dataset for Reproducibility-Oriented Sentiment Analysis : Abstract: Sentiments about the reproducibility of cited papers in downstream literature offer community perspectives and have been shown to be a promising signal of the actual reproducibility of published fin...
- BiCA: Effective Biomedical Dense Retrieval with Citation-Aware Hard Negatives : Abstract: Hard negatives are essential for training effective retrieval models. Hard-negative mining typically relies on ranking documents using cross-encoders or static embedding models based on simi...
- Pruning as Regularization: Sensitivity-Aware One-Shot Pruning in ASR : Abstract: We challenge the conventional view of neural network pruning as solely a compression technique, demonstrating that one-shot magnitude pruning serves as a powerful implicit regularizer for AS...
- Quantizing Whisper-small: How design choices affect ASR performance : Abstract: Large speech recognition models like Whisper-small achieve high accuracy but are difficult to deploy on edge devices due to their high computational demand. To this end, we present a unified...
- Generative Artificial Intelligence in Qualitative Research Methods: Between Hype and Risks? : Abstract: As Artificial Intelligence (AI) is increasingly promoted and used in qualitative research, it also raises profound methodological issues. This position paper critically interrogates the role...
- How Brittle is Agent Safety? Rethinking Agent Risk under Intent Concealment and Task Complexity : Abstract: Current safety evaluations for LLM-driven agents primarily focus on atomic harms, failing to address sophisticated threats where malicious intent is concealed or diluted within complex tasks...
- Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering : Abstract: Advancements in natural language processing have revolutionized the way we can interact with digital information systems, such as databases, making them more accessible. However, challenges ...
- VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use : Abstract: While vision-language models (VLMs) have demonstrated remarkable performance across various tasks combining textual and visual information, they continue to struggle with fine-grained visual...
- From Word Vectors to Multimodal Embeddings: Techniques, Applications, and Future Directions For Large Language Models : Abstract: Word embeddings and language models have transformed natural language processing (NLP) by facilitating the representation of linguistic elements in continuous vector spaces. This review visi...
- Aspect-Oriented Summarization for Psychiatric Short-Term Readmission Prediction : Abstract: Recent progress in large language models (LLMs) has enabled the automated processing of lengthy documents even without supervised training on a task-specific dataset. Yet, their zero-shot pe...
- Thus Spake Long-Context Large Language Model : Abstract: Long context is an important topic in Natural Language Processing (NLP), running through the development of NLP architectures, and offers immense opportunities for Large Language Models (LLM...
- Figurative Archive: an open dataset and web-based application for the study of metaphor : Abstract: Research on metaphor has steadily increased over the last decades, as this phenomenon opens a window into a range of linguistic and cognitive processes. At the same time, the demand for rigo...
- ENCORE: Entropy-guided Reward Composition for Multi-head Safety Reward Models : Abstract: The safety alignment of large language models (LLMs) often relies on reinforcement learning from human feedback (RLHF), which requires human annotations to construct preference datasets. Giv...
- CONGRAD: Conflicting Gradient Filtering for Multilingual Preference Alignment : Abstract: Naive joint training of large language models (LLMs) for multilingual preference alignment can suffer from negative interference. This is a known issue in multilingual training, where confli...
- ContrastScore: Towards Higher Quality, Less Biased, More Efficient Evaluation Metrics with Contrastive Evaluation : Abstract: Evaluating the quality of generated text automatically remains a significant challenge. Conventional reference-based metrics have been shown to exhibit relatively weak correlation with human...
- Evaluating BERTopic on Open-Ended Data: A Case Study with Belgian Dutch Daily Narratives : Abstract: Standard topic models often struggle to capture culturally specific nuances in text. This study evaluates the effectiveness of contextual embeddings for identifying culturally resonant theme...
- UniEdit: A Unified Knowledge Editing Benchmark for Large Language Models : Abstract: Model editing aims to enhance the accuracy and reliability of large language models (LLMs) by efficiently adjusting their internal parameters. Currently, most LLM editing datasets are confin...
- Self-Interpretability: LLMs Can Describe Complex Internal Processes that Drive Their Decisions : Abstract: We have only limited understanding of how and why large language models (LLMs) respond in the ways that they do. Their neural networks have proven challenging to interpret, and we are only b...
- p2-TQA: A Process-based Preference Learning Framework for Self-Improving Table Question Answering Models : Abstract: Table question answering (TQA) focuses on answering questions based on tabular data. Developing TQA systems targets effective interaction with tabular data for tasks such as cell retrieval a...
- MIDB: Multilingual Instruction Data Booster for Enhancing Cultural Equality in Multilingual Instruction Synthesis : Abstract: Despite doubts about data quality, instruction synthesis has been widely applied to instruction tuning (IT) of LLMs as an economical and rapid alternative. Recent endeavors focus on improving d...
- From Anger to Joy: How Nationality Personas Shape Emotion Attribution in Large Language Models : Abstract: Emotions are a fundamental facet of human experience, varying across individuals, cultural contexts, and nationalities. Given the recent success of Large Language Models (LLMs) as role-playi...
- LaF-GRPO: In-Situ Navigation Instruction Generation for the Visually Impaired via GRPO with LLM-as-Follower Reward : Abstract: Navigation instruction generation for visually impaired (VI) individuals (NIG-VI) is critical yet relatively underexplored. This study focuses on generating precise, in-situ, step-by-step na...
- LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs : Abstract: Large Language Diffusion Models, or diffusion LLMs, have emerged as a significant focus in NLP research, with substantial effort directed toward understanding their scalability and downstrea...
- REIS: A High-Performance and Energy-Efficient Retrieval System with In-Storage Processing : Abstract: Large Language Models (LLMs) face an inherent challenge: their knowledge is confined to the data that they have been trained on. To overcome this issue, Retrieval-Augmented Generation (RAG) ...
- Understanding and Controlling Repetition Neurons and Induction Heads in In-Context Learning : Abstract: This paper investigates the relationship between large language models' (LLMs) ability to recognize repetitive input patterns and their performance on in-context learning (ICL). In contrast ...
- OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles : Abstract: We introduce OpenVLThinker, one of the first open-source large vision-language models (LVLMs) to exhibit sophisticated chain-of-thought reasoning, achieving notable performance gains on chal...
- MPMA: Preference Manipulation Attack Against Model Context Protocol : Abstract: Model Context Protocol (MCP) standardizes interface mapping for large language models (LLMs) to access external data and tools, which revolutionizes the paradigm of tool selection and facili...
- Two Datasets Are Better Than One: Method of Double Moments for 3-D Reconstruction in Cryo-EM : Abstract: Cryo-electron microscopy (cryo-EM) is a powerful imaging technique for reconstructing three-dimensional molecular structures from noisy tomographic projection images of randomly oriented par...
- LiveNeRF: Efficient Face Replacement Through Neural Radiance Fields Integration : Abstract: Face replacement technology enables significant advancements in entertainment, education, and communication applications, including dubbing, virtual avatars, and cross-cultural content adapt...
- TrackStudio: An Integrated Toolkit for Markerless Tracking : Abstract: Markerless motion tracking has advanced rapidly in the past 10 years and currently offers powerful opportunities for behavioural, clinical, and biomechanical research. While several speciali...
- Predicting Coronary Artery Calcium Severity based on Non-Contrast Cardiac CT images using Deep Learning : Abstract: Cardiovascular disease causes high rates of mortality worldwide. Coronary artery calcium (CAC) scoring is a powerful tool to stratify the risk of atherosclerotic cardiovascular disease. Curr...
- FlowFeat: Pixel-Dense Embedding of Motion Profiles : Abstract: Dense and versatile image representations underpin the success of virtually all computer vision applications. However, state-of-the-art networks, such as transformers, produce low-resolution...
- Cross Modal Fine-grained Alignment via Granularity-aware and Region-uncertain Modeling : Abstract: Fine-grained image-text alignment is a pivotal challenge in multimodal learning, underpinning key applications such as visual question answering, image captioning, and vision-language naviga...
- VectorSynth: Fine-Grained Satellite Image Synthesis with Structured Semantics : Abstract: We introduce VectorSynth, a diffusion-based framework for pixel-accurate satellite image synthesis conditioned on polygonal geographic annotations with semantic attributes. Unlike prior text...
- Class Incremental Medical Image Segmentation via Prototype-Guided Calibration and Dual-Aligned Distillation : Abstract: Class incremental medical image segmentation (CIMIS) aims to preserve knowledge of previously learned classes while learning new ones without relying on old-class labels. However, existing m...
- Beyond Randomness: Understand the Order of the Noise in Diffusion : Abstract: In text-driven content generation (T2C) diffusion models, the semantics of generated content are mostly attributed to the interaction between text embeddings and the attention mechanism. The initial ...
- Divide-and-Conquer Decoupled Network for Cross-Domain Few-Shot Segmentation : Abstract: Cross-domain few-shot segmentation (CD-FSS) aims to tackle the dual challenge of recognizing novel classes and adapting to unseen domains with limited annotations. However, encoder features ...
- Learning Sparse Label Couplings for Multilabel Chest X-Ray Diagnosis : Abstract: We study multilabel classification of chest X-rays and present a simple, strong pipeline built on SE-ResNeXt101 $(32 \times 4d)$. The backbone is finetuned for 14 thoracic findings with a si...
- PC-Diffusion: Aligning Diffusion Models with Human Preferences via Preference Classifier : Abstract: Diffusion models have achieved remarkable success in conditional image generation, yet their outputs often remain misaligned with human preferences. To address this, recent work has applied ...
- DI3CL: Contrastive Learning With Dynamic Instances and Contour Consistency for SAR Land-Cover Classification Foundation Model : Abstract: Although significant advances have been achieved in SAR land-cover classification, recent methods remain predominantly focused on supervised learning, which relies heavily on extensive label...
- Revisiting MLLM Based Image Quality Assessment: Errors and Remedy : Abstract: The rapid progress of multi-modal large language models (MLLMs) has boosted the task of image quality assessment (IQA). However, a key challenge arises from the inherent mismatch between the...
- Cancer-Net PCa-MultiSeg: Multimodal Enhancement of Prostate Cancer Lesion Segmentation Using Synthetic Correlated Diffusion Imaging : Abstract: Current deep learning approaches for prostate cancer lesion segmentation achieve limited performance, with Dice scores of 0.32 or lower in large patient cohorts. To address this limitation, ...
- Human Motion Synthesis in 3D Scenes via Unified Scene Semantic Occupancy : Abstract: Human motion synthesis in 3D scenes relies heavily on scene comprehension, while current methods focus mainly on scene structure but ignore the semantic understanding. In this paper, we prop...
- CloudMamba: Grouped Selective State Spaces for Point Cloud Analysis : Abstract: Due to the long-range modeling ability and linear complexity property, Mamba has attracted considerable attention in point cloud analysis. Despite some interesting progress, related work sti...
- MonoCLUE: Object-Aware Clustering Enhances Monocular 3D Object Detection : Abstract: Monocular 3D object detection offers a cost-effective solution for autonomous driving but suffers from ill-posed depth and limited field of view. These constraints cause a lack of geometric ...
- Visual Bridge: Universal Visual Perception Representations Generating : Abstract: Recent advances in diffusion models have achieved remarkable success in isolated computer vision tasks such as text-to-image generation, depth estimation, and optical flow. However, these mo...
- Theoretical Analysis of Power-law Transformation on Images for Text Polarity Detection : Abstract: Several computer vision applications like vehicle license plate recognition, captcha recognition, printed or handwriting character recognition from images etc., text polarity detection and b...
- HD$^2$-SSC: High-Dimension High-Density Semantic Scene Completion for Autonomous Driving : Abstract: Camera-based 3D semantic scene completion (SSC) plays a crucial role in autonomous driving, enabling voxelized 3D scene understanding for effective scene perception and decision-making. Exis...
- An Image-Based Path Planning Algorithm Using a UAV Equipped with Stereo Vision : Abstract: This paper presents a novel image-based path planning algorithm that was developed using computer vision techniques, as well as its comparative analysis with well-known deterministic and pro...
- Federated CLIP for Resource-Efficient Heterogeneous Medical Image Classification : Abstract: Despite the remarkable performance of deep models in medical imaging, they still require source data for training, which limits their potential in light of privacy concerns. Federated learni...
- Laytrol: Preserving Pretrained Knowledge in Layout Control for Multimodal Diffusion Transformers : Abstract: With the development of diffusion models, enhancing spatial controllability in text-to-image generation has become a vital challenge. As a representative task for addressing this challenge, ...
- Is It Truly Necessary to Process and Fit Minutes-Long Reference Videos for Personalized Talking Face Generation? : Abstract: Talking Face Generation (TFG) aims to produce realistic and dynamic talking portraits, with broad applications in fields such as digital education, film and television production, e-commerce...
- ReIDMamba: Learning Discriminative Features with Visual State Space Model for Person Re-Identification : Abstract: Extracting robust discriminative features is a critical challenge in person re-identification (ReID). While Transformer-based methods have successfully addressed some limitations of convolut...
- Burst Image Quality Assessment: A New Benchmark and Unified Framework for Multiple Downstream Tasks : Abstract: In recent years, the development of burst imaging technology has improved the capture and processing capabilities of visual data, enabling a wide range of applications. However, the redundan...
- Multi-Modal Assistance for Unsupervised Domain Adaptation on Point Cloud 3D Object Detection : Abstract: Unsupervised domain adaptation for LiDAR-based 3D object detection (3D UDA) based on the teacher-student architecture with pseudo labels has achieved notable improvements in recent years. Al...
- DANCE: Density-agnostic and Class-aware Network for Point Cloud Completion : Abstract: Point cloud completion aims to recover missing geometric structures from incomplete 3D scans, which often suffer from occlusions or limited sensor viewpoints. Existing methods typically assu...
- ChexFract: From General to Specialized - Enhancing Fracture Description Generation : Abstract: Generating accurate and clinically meaningful radiology reports from chest X-ray images remains a significant challenge in medical AI. While recent vision-language models achieve strong resu...
- CSF-Net: Context-Semantic Fusion Network for Large Mask Inpainting : Abstract: In this paper, we propose a semantic-guided framework to address the challenging problem of large-mask image inpainting, where essential visual content is missing and contextual cues are lim...
- EAGLE: Episodic Appearance- and Geometry-aware Memory for Unified 2D-3D Visual Query Localization in Egocentric Vision : Abstract: Egocentric visual query localization is vital for embodied AI and VR/AR, yet remains challenging due to camera motion, viewpoint changes, and appearance variations. We present EAGLE, a novel...
- High-Quality Proposal Encoding and Cascade Denoising for Imaginary Supervised Object Detection : Abstract: Object detection models demand large-scale annotated datasets, which are costly and labor-intensive to create. This motivated Imaginary Supervised Object Detection (ISOD), where models train...
- Class-feature Watermark: A Resilient Black-box Watermark Against Model Extraction Attacks : Abstract: Machine learning models constitute valuable intellectual property, yet remain vulnerable to model extraction attacks (MEA), where adversaries replicate their functionality through black-box ...
- Speech Emotion Recognition with Phonation Excitation Information and Articulatory Kinematics : Abstract: Speech emotion recognition (SER) has advanced significantly thanks to deep-learning methods, while textual information further enhances its performance. However, few studies have focus...
- PrAda-GAN: A Private Adaptive Generative Adversarial Network with Bayes Network Structure : Abstract: We revisit the problem of generating synthetic data under differential privacy. To address the core limitations of marginal-based methods, we propose the Private Adaptive Generative Adversar...
- Foam Segmentation in Wastewater Treatment Plants: A Federated Learning Approach with Segment Anything Model 2 : Abstract: Foam formation in Wastewater Treatment Plants (WTPs) is a major challenge that can reduce treatment efficiency and increase costs. The ability to automatically examine changes in real-time w...
- Boomda: Balanced Multi-objective Optimization for Multimodal Domain Adaptation : Abstract: Multimodal learning, while contributing to numerous success stories across various fields, faces the challenge of prohibitively expensive manual annotation. To address the scarcity of annota...
- Good flavor search in $SU(5)$: a machine learning approach : Abstract: We revisit the fermion mass problem of the $SU(5)$ grand unified theory using machine learning techniques. The original $SU(5)$ model proposed by Georgi and Glashow is incompatible with the ...
- Proof Minimization in Neural Network Verification : Abstract: The widespread adoption of deep neural networks (DNNs) requires efficient techniques for verifying their safety. DNN verifiers are complex tools, which might contain bugs that could compromi...
- From Classical to Hybrid: A Practical Framework for Quantum-Enhanced Learning : Abstract: This work addresses the challenge of enabling practitioners without quantum expertise to transition from classical to hybrid quantum-classical machine learning workflows. We propose a three-...
- Evaluating Gemini LLM in Food Image-Based Recipe and Nutrition Description with EfficientNet-B4 Visual Backbone : Abstract: The proliferation of digital food applications necessitates robust methods for automated nutritional analysis and culinary guidance. This paper presents a comprehensive comparative evaluatio...
- Emulating Radiative Transfer in Astrophysical Environments : Abstract: Radiative transfer is a fundamental process in astrophysics, essential for both interpreting observations and modeling thermal and dynamical feedback in simulations via ionizing radiation an...
- A Fast and Accurate Approach for Covariance Matrix Construction : Abstract: Reichel (2025) defined the Bariance as $\mathrm{Bariance}(x)=\frac{1}{n(n-1)}\sum_{i...
- Prompt Tuning for Natural Language to SQL with Embedding Fine-Tuning and RAG : Abstract: This paper introduces an Error Correction through Prompt Tuning for NL-to-SQL, leveraging the latest advancements in generative pre-training-based LLMs and RAG. Our work addresses the crucia...
- Uncertainty Calibration of Multi-Label Bird Sound Classifiers : Abstract: Passive acoustic monitoring enables large-scale biodiversity assessment, but reliable classification of bioacoustic sounds requires not only high accuracy but also well-calibrated uncertaint...
- X-IONet: Cross-Platform Inertial Odometry Network with Dual-Stage Attention : Abstract: Learning-based inertial odometry has achieved remarkable progress in pedestrian navigation. However, extending these methods to quadruped robots remains challenging due to their distinct and...
- Semi-Supervised Treatment Effect Estimation with Unlabeled Covariates via Generalized Riesz Regression : Abstract: This study investigates treatment effect estimation in the semi-supervised setting, where we can use not only the standard triple of covariates, treatment indicator, and outcome, but also un...
- Concentration bounds on response-based vector embeddings of black-box generative models : Abstract: Generative models, such as large language models or text-to-image diffusion models, can generate relevant responses to user-given queries. Response-based vector embeddings of generative mode...
- BDD2Seq: Enabling Scalable Reversible-Circuit Synthesis via Graph-to-Sequence Learning : Abstract: Binary Decision Diagrams (BDDs) are instrumental in many electronic design automation (EDA) tasks thanks to their compact representation of Boolean functions. In BDD-based reversible-circuit...
- Mitigating Negative Flips via Margin Preserving Training : Abstract: Minimizing inconsistencies across successive versions of an AI system is as crucial as reducing the overall error. In image classification, such inconsistencies manifest as negative flips, w...
- AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress : Abstract: Despite rapid development, large language models (LLMs) still encounter challenges in multi-turn decision-making tasks (i.e., agent tasks) like web shopping and browser navigation, which req...
- Revisiting Network Traffic Analysis: Compatible network flows for ML models : Abstract: To ensure that Machine Learning (ML) models can perform a robust detection and classification of cyberattacks, it is essential to train them with high-quality datasets with relevant features...
- Revealing the Hidden Third Dimension of Point Defects in Two-Dimensional MXenes : Abstract: Point defects govern many important functional properties of two-dimensional (2D) materials. However, resolving the three-dimensional (3D) arrangement of these defects in multi-layer 2D mate...
- An Information-Minimal Geometry for Qubit-Efficient Optimization : Abstract: Qubit-efficient optimization seeks to represent an $N$-variable combinatorial problem within a Hilbert space smaller than $2^N$, using only as much quantum structure as the objective itself ...
- Source-Optimal Training is Transfer-Suboptimal : Abstract: We prove a fundamental misalignment in transfer learning: the source regularization that minimizes source risk almost never coincides with the regularization maximizing transfer benefit. Thr...
- Generative AI Meets 6G and Beyond: Diffusion Models for Semantic Communications : Abstract: Semantic communications mark a paradigm shift from bit-accurate transmission toward meaning-centric communication, essential as wireless systems approach theoretical capacity limits. The eme...
- Probabilistic Safety Guarantee for Stochastic Control Systems Using Average Reward MDPs : Abstract: Safety in stochastic control systems, which are subject to random noise with a known probability distribution, aims to compute policies that satisfy predefined operational constraints with h...
- Identification of Empirical Constitutive Models for Age-Hardenable Aluminium Alloy and High-Chromium Martensitic Steel Using Symbolic Regression : Abstract: Process-structure-property relationships are fundamental in materials science and engineering and are key to the development of new and improved materials. Symbolic regression serves as a po...
- Galactification: painting galaxies onto dark matter only simulations using a transformer-based model : Abstract: Connecting the formation and evolution of galaxies to the large-scale structure is crucial for interpreting cosmological observations. While hydrodynamical simulations accurately model the c...
- Generalizable Blood Cell Detection via Unified Dataset and Faster R-CNN : Abstract: This paper presents a comprehensive methodology and comparative performance analysis for the automated classification and object detection of peripheral blood cells (PBCs) in microscopic ima...
- Toward Autonomous and Efficient Cybersecurity: A Multi-Objective AutoML-based Intrusion Detection System : Abstract: With increasingly sophisticated cybersecurity threats and rising demand for network automation, autonomous cybersecurity mechanisms are becoming critical for securing modern networks. The ra...
- Structured RAG for Answering Aggregative Questions : Abstract: Retrieval-Augmented Generation (RAG) has become the dominant approach for answering questions over large corpora. However, current datasets and methods are highly focused on cases where only...
- CleverBirds: A Multiple-Choice Benchmark for Fine-grained Human Knowledge Tracing : Abstract: Mastering fine-grained visual recognition, essential in many expert domains, can require that specialists undergo years of dedicated training. Modeling the progression of such expertise in h...
- SeFA-Policy: Fast and Accurate Visuomotor Policy Learning with Selective Flow Alignment : Abstract: Developing efficient and accurate visuomotor policies poses a central challenge in robotic imitation learning. While recent rectified flow approaches have advanced visuomotor policy learning...
- Multiplicative Reweighting for Robust Neural Network Optimization : Abstract: Neural networks are widespread due to their powerful performance. Yet, they degrade in the presence of noisy labels at training time. Inspired by the setting of learning with expert advice, ...
- Hierarchical Deep Counterfactual Regret Minimization : Abstract: Imperfect Information Games (IIGs) offer robust models for scenarios where decision-makers face uncertainty or lack complete information. Counterfactual Regret Minimization (CFR) has been on...
- Pruning at Initialization -- A Sketching Perspective : Abstract: The lottery ticket hypothesis (LTH) has increased attention to pruning neural networks at initialization. We study this problem in the linear setting. We show that finding a sparse mask at i...
- Efficient Deep Learning with Decorrelated Backpropagation : Abstract: The backpropagation algorithm remains the dominant and most successful method for training deep neural networks (DNNs). At the same time, training DNNs at scale comes at a significant comput...
- ElastoGen: 4D Generative Elastodynamics : Abstract: We present ElastoGen, a knowledge-driven AI model that generates physically accurate 4D elastodynamics. Unlike deep models that learn from video- or image-based observations, ElastoGen lever...
- Physics-informed deep learning and compressive collocation for high-dimensional diffusion-reaction equations: practical existence theory and numerics : Abstract: On the forefront of scientific computing, Deep Learning (DL), i.e., machine learning with Deep Neural Networks (DNNs), has emerged as a powerful new tool for solving Partial Differential Equati...
- Certified Robust Invariant Polytope Training in Neural Controlled ODEs : Abstract: We consider a nonlinear control system modeled as an ordinary differential equation subject to disturbance, with a state feedback controller parameterized as a feedforward neural network. We...
- SPO-VCS: An End-to-End Smart Predict-then-Optimize Framework with Alternating Differentiation Method for Relocation Problems in Large-Scale Vehicle Crowd Sensing : Abstract: Ubiquitous mobile devices have catalyzed the development of vehicle crowd sensing (VCS). In particular, vehicle sensing systems show great potential in the flexible acquisition of spatio-tem...
- Cluster Catch Digraphs with the Nearest Neighbor Distance : Abstract: We introduce a new method for clustering based on Cluster Catch Digraphs (CCDs). The new method addresses the limitations of RK-CCDs by employing a new variant of spatial randomness test tha...
- A Survey on Human-Centered Evaluation of Explainable AI Methods in Clinical Decision Support Systems : Abstract: Explainable Artificial Intelligence (XAI) is essential for the transparency and clinical adoption of Clinical Decision Support Systems (CDSS). However, the real-world effectiveness of existi...
- Towards High Resolution Probabilistic Coastal Inundation Forecasting from Sparse Observations : Abstract: Coastal flooding poses increasing threats to communities worldwide, necessitating accurate and hyper-local inundation forecasting for effective emergency response. However, real-world deploy...
- Accelerating Visual-Policy Learning through Parallel Differentiable Simulation : Abstract: In this work, we propose a computationally efficient algorithm for visual policy learning that leverages differentiable simulation and first-order analytical policy gradients. Our approach d...
- When fractional quasi p-norms concentrate : Abstract: Concentration of distances in high dimension is an important factor for the development and design of stable and reliable data analysis algorithms. In this paper, we address the fundamental ...
- STaR-Bets: Sequential Target-Recalculating Bets for Tighter Confidence Intervals : Abstract: The construction of confidence intervals for the mean of a bounded random variable is a classical problem in statistics with numerous applications in machine learning and virtually all scien...
- Self-Supervised Contrastive Learning is Approximately Supervised Contrastive Learning : Abstract: Despite its empirical success, the theoretical foundations of self-supervised contrastive learning (CL) are not yet fully established. In this work, we address this gap by showing that stand...
- Do-PFN: In-Context Learning for Causal Effect Estimation : Abstract: Estimation of causal effects is critical to a range of scientific disciplines. Existing methods for this task either require interventional data, knowledge about the ground truth causal grap...
- MAC: An Efficient Gradient Preconditioning using Mean Activation Approximated Curvature : Abstract: Second-order optimization methods for training neural networks, such as KFAC, exhibit superior convergence by utilizing curvature information of loss landscape. However, it comes at the expe...
- Sampling 3D Molecular Conformers with Diffusion Transformers : Abstract: Diffusion Transformers (DiTs) have demonstrated strong performance in generative modeling, particularly in image synthesis, making them a compelling choice for molecular conformer generation...
- TRUST-FS: Tensorized Reliable Unsupervised Multi-View Feature Selection for Incomplete Data : Abstract: Multi-view unsupervised feature selection (MUFS), which selects informative features from multi-view unlabeled data, has attracted increasing research interest in recent years. Although grea...
- Strategizing against No-regret Learners : Abstract: How should a player who repeatedly plays a game against a no-regret learner strategize to maximize his utility? We study this question and show that under some mild assumptions, the player c...
- Physics Guided Machine Learning Methods for Hydrology : Abstract: Streamflow prediction is one of the key challenges in the field of hydrology due to the complex interplay between multiple non-linear physical mechanisms behind streamflow generation. While ...
- Generalizable data-driven turbulence closure modeling on unstructured grids with differentiable physics : Abstract: Differentiable physical simulators are proving to be valuable tools for developing data-driven models for computational fluid dynamics (CFD). In particular, these simulators enable end-to-en...
- Natural gradient and parameter estimation for quantum Boltzmann machines : Abstract: Thermal states play a fundamental role in various areas of physics, and they are becoming increasingly important in quantum information science, with applications related to semi-definite pr...
- Outlyingness Scores with Cluster Catch Digraphs : Abstract: This paper introduces two novel, outlyingness scores (OSs) based on Cluster Catch Digraphs (CCDs): Outbound Outlyingness Score (OOS) and Inbound Outlyingness Score (IOS). These scores enhanc...
- Two-Point Deterministic Equivalence for Stochastic Gradient Dynamics in Linear Models : Abstract: We derive a novel deterministic equivalence for the two-point function of a random matrix resolvent. Using this result, we give a unified derivation of the performance of a wide variety of h...
- Joint Attention Mechanism Learning to Facilitate Opto-physiological Monitoring during Physical Activity : Abstract: Opto-physiological monitoring including photoplethysmography (PPG) provides non-invasive cardiac and respiratory measurements, yet motion artefacts (MAs) during physical activity degrade its...
- Scalable Signature Kernel Computations for Long Time Series via Local Neumann Series Expansions : Abstract: The signature kernel is a recent state-of-the-art tool for analyzing high-dimensional sequential data, valued for its theoretical guarantees and strong empirical performance. In this paper, ...
- TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval : Abstract: Retrieval-augmented generation (RAG) extends large language models (LLMs) with external data sources to enhance factual correctness and domain coverage. Modern RAG pipelines rely on large da...
- IAEmu: Learning Galaxy Intrinsic Alignment Correlations : Abstract: The intrinsic alignments (IA) of galaxies, a key contaminant in weak lensing analyses, arise from correlations in galaxy shapes driven by tidal interactions and galaxy formation processes. A...
- JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes : Abstract: Multi-agent reinforcement learning (MARL) has emerged as a promising solution for learning complex and scalable coordination behaviors in multi-robot systems. However, established MARL platf...
- Wasserstein Distributionally Robust Nonparametric Regression : Abstract: Wasserstein distributionally robust optimization (WDRO) strengthens statistical learning under model uncertainty by minimizing the local worst-case risk within a prescribed ambiguity set. Al...
- HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization : Abstract: While scaling the length of responses at test-time has been shown to markedly improve the reasoning abilities and performance of large language models (LLMs), it often results in verbose out...
- Continuous Subspace Optimization for Continual Learning : Abstract: Continual learning aims to learn multiple tasks sequentially while preserving prior knowledge, but faces the challenge of catastrophic forgetting when adapting to new tasks. Recently, approa...
- Generalizable Insights for Graph Transformers in Theory and Practice : Abstract: Graph Transformers (GTs) have shown strong empirical performance, yet current architectures vary widely in their use of attention mechanisms, positional embeddings (PEs), and expressivity. E...
- From Sequential to Recursive: Enhancing Decision-Focused Learning with Bidirectional Feedback : Abstract: Decision-focused learning (DFL) has emerged as a powerful end-to-end alternative to conventional predict-then-optimize (PTO) pipelines by directly optimizing predictive models through downst...
- DynaAct: Large Language Model Reasoning with Dynamic Action Spaces : Abstract: In modern sequential decision-making systems, the construction of an optimal candidate action space is critical to efficient inference. However, existing approaches either rely on manually d...
- Online Linear Regression with Paid Stochastic Features : Abstract: We study an online linear regression setting in which the observed feature vectors are corrupted by noise and the learner can pay to reduce the noise level. In practice, this may happen for ...
- HipKittens: Fast and Furious AMD Kernels : Abstract: AMD GPUs offer state-of-the-art compute and memory bandwidth; however, peak performance AMD kernels are written in raw assembly. To address the difficulty of mapping AI algorithms to hardwar...
- Stuart-Landau Oscillatory Graph Neural Network : Abstract: Oscillatory Graph Neural Networks (OGNNs) are an emerging class of physics-inspired architectures designed to mitigate oversmoothing and vanishing gradient problems in deep GNNs. In this wor...
- BIPPO: Budget-Aware Independent PPO for Energy-Efficient Federated Learning Services : Abstract: Federated Learning (FL) is a promising machine learning solution in large-scale IoT systems, guaranteeing load distribution and privacy. However, FL does not natively consider infrastructure...
- Improving Long-Range Interactions in Graph Neural Simulators via Hamiltonian Dynamics : Abstract: Learning to simulate complex physical systems from data has emerged as a promising way to overcome the limitations of traditional numerical solvers, which often require prohibitive computati...
- The Online Patch Redundancy Eliminator (OPRE): A novel approach to online agnostic continual learning using dataset compression : Abstract: In order to achieve Continual Learning (CL), the problem of catastrophic forgetting, one that has plagued neural networks since their inception, must be overcome. The evaluation of continual...
- Towards Non-Stationary Time Series Forecasting with Temporal Stabilization and Frequency Differencing : Abstract: Time series forecasting is critical for decision-making across dynamic domains such as energy, finance, transportation, and cloud computing. However, real-world time series often exhibit non...
- PrefPoE: Advantage-Guided Preference Fusion for Learning Where to Explore : Abstract: Exploration in reinforcement learning remains a critical challenge, as naive entropy maximization often results in high variance and inefficient policy updates. We introduce PrefPoE...
- A Unified Geometric Field Theory Framework for Transformers: From Manifold Embeddings to Kernel Modulation : Abstract: The Transformer architecture has achieved tremendous success in natural language processing, computer vision, and scientific computing through its self-attention mechanism. However, its core...
- Data-Driven Discovery of Feature Groups in Clinical Time Series : Abstract: Clinical time series data are critical for patient monitoring and predictive modeling. These time series are typically multivariate and often comprise hundreds of heterogeneous features from...
- Rethinking Explanation Evaluation under the Retraining Scheme : Abstract: Feature attribution has gained prominence as a tool for explaining model decisions, yet evaluating explanation quality remains challenging due to the absence of ground-truth explanations. To...
- Adversarial Bias: Data Poisoning Attacks on Fairness : Abstract: With the growing adoption of AI and machine learning systems in real-world applications, ensuring their fairness has become increasingly critical. The majority of the work in algorithmic fai...
- From Confusion to Clarity: ProtoScore - A Framework for Evaluating Prototype-Based XAI : Abstract: The complexity and opacity of neural networks (NNs) pose significant challenges, particularly in high-stakes fields such as healthcare, finance, and law, where understanding decision-making ...
- Multi-objective Hyperparameter Optimization in the Age of Deep Learning : Abstract: While Deep Learning (DL) experts often have prior knowledge about which hyperparameter settings yield strong performance, only a few Hyperparameter Optimization (HPO) algorithms can leverage s...
- EMAformer: Enhancing Transformer through Embedding Armor for Time Series Forecasting : Abstract: Multivariate time series forecasting is crucial across a wide range of domains. While presenting notable progress for the Transformer architecture, iTransformer still lags behind the latest ...
- Aligning by Misaligning: Boundary-aware Curriculum Learning for Multimodal Alignment : Abstract: Most multimodal models treat every negative pair alike, ignoring the ambiguous negatives that differ from the positive by only a small detail. We propose Boundary-Aware Curriculum with Local...
- ARAC: Adaptive Regularized Multi-Agent Soft Actor-Critic in Graph-Structured Adversarial Games : Abstract: In graph-structured multi-agent reinforcement learning (MARL) adversarial tasks such as pursuit and confrontation, agents must coordinate under highly dynamic interactions, where sparse rewa...
- NeuCLIP: Efficient Large-Scale CLIP Training with Neural Normalizer Optimization : Abstract: Accurately estimating the normalization term (also known as the partition function) in the contrastive loss is a central challenge for training Contrastive Language-Image Pre-training (CLIP)...
- Physics-Informed Neural Operators for Cardiac Electrophysiology : Abstract: Accurately simulating systems governed by PDEs, such as voltage fields in cardiac electrophysiology (EP) modelling, remains a significant modelling challenge. Traditional numerical solvers a...
- HardFlow: Hard-Constrained Sampling for Flow-Matching Models via Trajectory Optimization : Abstract: Diffusion and flow-matching have emerged as powerful methodologies for generative modeling, with remarkable success in capturing complex data distributions and enabling flexible guidance at ...
- An update to PYRO-NN: A Python Library for Differentiable CT Operators : Abstract: Deep learning has brought significant advancements to X-ray Computed Tomography (CT) reconstruction, offering solutions to challenges arising from modern imaging technologies. These developm...
- Coherence Mechanisms for Provable Self-Improvement : Abstract: Self-improvement is a critical capability for large language models and other intelligent systems, enabling them to refine their behavior and internal consistency without external supervisio...
- One Model for All: Universal Pre-training for EEG based Emotion Recognition across Heterogeneous Datasets and Paradigms : Abstract: EEG-based emotion recognition is hampered by profound dataset heterogeneity (channel/subject variability), hindering generalizable models. Existing approaches struggle to transfer knowledge ...
- Clustering Guided Residual Neural Networks for Multi-Tx Localization in Molecular Communications : Abstract: Transmitter localization in Molecular Communication via Diffusion is a critical topic with many applications. However, accurate localization of multiple transmitters is a challenging problem...
- FMMI: Flow Matching Mutual Information Estimation : Abstract: We introduce a novel Mutual Information (MI) estimator that fundamentally reframes the discriminative approach. Instead of training a classifier to discriminate between joint and marginal di...
- Resource Allocation in Hybrid Radio-Optical IoT Networks using GNN with Multi-task Learning : Abstract: This paper addresses the problem of dual-technology scheduling in hybrid Internet of Things (IoT) networks that integrate Optical Wireless Communication (OWC) alongside Radio Frequency (RF)....
- RL-Exec: Impact-Aware Reinforcement Learning for Opportunistic Optimal Liquidation, Outperforms TWAP and a Book-Liquidity VWAP on BTC-USD Replays : Abstract: We study opportunistic optimal liquidation over fixed deadlines on BTC-USD limit-order books (LOB). We present RL-Exec, a PPO agent trained on historical replays augmented with endogenous tr...
- From Hubs to Deserts: Urban Cultural Accessibility Patterns with Explainable AI : Abstract: Cultural infrastructures, such as libraries, museums, theaters, and galleries, support learning, civic life, health, and local economies, yet access is uneven across cities. We present a nov...
- Tractable Instances of Bilinear Maximization: Implementing LinUCB on Ellipsoids : Abstract: We consider the maximization of $x^\top θ$ over $(x,θ) \in \mathcal{X} \times Θ$, with $\mathcal{X} \subset \mathbb{R}^d$ convex and $Θ\subset \mathbb{R}^d$ an ellipsoid. This problem is fun...
- EvoPS: Evolutionary Patch Selection for Whole Slide Image Analysis in Computational Pathology : Abstract: In computational pathology, the gigapixel scale of Whole-Slide Images (WSIs) necessitates their division into thousands of smaller patches. Analyzing these high-dimensional patch embeddings ...
- Shocks Under Control: Taming Transonic Compressible Flow over an RAE2822 Airfoil with Deep Reinforcement Learning : Abstract: Active flow control of compressible transonic shock-boundary layer interactions over a two-dimensional RAE2822 airfoil at Re = 50,000 is investigated using deep reinforcement learning (DRL)....
- Infinite-Dimensional Operator/Block Kaczmarz Algorithms: Regret Bounds and $\lambda$-Effectiveness : Abstract: We present a variety of projection-based linear regression algorithms with a focus on modern machine-learning models and their algorithmic performance. We study the role of the relaxation pa...
- Robust Experimental Design via Generalised Bayesian Inference : Abstract: Bayesian optimal experimental design is a principled framework for conducting experiments that leverages Bayesian inference to quantify how much information one can expect to gain from selec...
- Kolmogorov-Arnold Chemical Reaction Neural Networks for learning pressure-dependent kinetic rate laws : Abstract: Chemical Reaction Neural Networks (CRNNs) have emerged as an interpretable machine learning framework for discovering reaction kinetics directly from data, while strictly adhering to the Arr...
- Misaligned by Design: Incentive Failures in Machine Learning : Abstract: The cost of error in many high-stakes settings is asymmetric: misdiagnosing pneumonia when absent is an inconvenience, but failing to detect it when present can be life-threatening. Because ...
- Streaming Tensor Program: A streaming abstraction for dynamic parallelism : Abstract: Dynamic behaviors are becoming prevalent in many tensor applications. In machine learning, for example, the input tensors are dynamically shaped or ragged, and data-dependent control flow is...
- Distributionally Robust Online Markov Game with Linear Function Approximation : Abstract: The sim-to-real gap, where agents trained in a simulator face significant performance degradation during testing, is a fundamental challenge in reinforcement learning. Extensive works adopt ...
- Hyperellipsoid Density Sampling: Exploitative Sequences to Accelerate High-Dimensional Optimization : Abstract: The curse of dimensionality presents a pervasive challenge in optimization problems, with exponential expansion of the search space rapidly causing traditional algorithms to become inefficie...
- Parallel Sampling via Autospeculation : Abstract: We present parallel algorithms to accelerate sampling via counting in two settings: any-order autoregressive models and denoising diffusion models. An any-order autoregressive model accesses...
- SpikCommander: A High-performance Spiking Transformer with Multi-view Learning for Efficient Speech Command Recognition : Abstract: Spiking neural networks (SNNs) offer a promising path toward energy-efficient speech command recognition (SCR) by leveraging their event-driven processing paradigm. However, existing SNN-bas...
- Improving the accuracy and generalizability of molecular property regression models with a substructure-substitution-rule-informed framework : Abstract: Artificial Intelligence (AI)-aided drug discovery is an active research field, yet AI models often exhibit poor accuracy in regression tasks for molecular property prediction, and perform ca...
- Adaptive Multi-Agent Response Refinement in Conversational Systems : Abstract: Large Language Models (LLMs) have demonstrated remarkable success in conversational systems by generating human-like responses. However, they can fall short, especially when required to acco...
- LPPG-RL: Lexicographically Projected Policy Gradient Reinforcement Learning with Subproblem Exploration : Abstract: Lexicographic multi-objective problems, which consist of multiple conflicting subtasks with explicit priorities, are common in real-world applications. Despite the advantages of Reinforcemen...
- HN-MVTS: HyperNetwork-based Multivariate Time Series Forecasting : Abstract: Accurate forecasting of multivariate time series data remains a formidable challenge, particularly due to the growing complexity of temporal dependencies in real-world scenarios. While neura...
- Towards Open-Set Myoelectric Gesture Recognition via Dual-Perspective Inconsistency Learning : Abstract: Surface electromyography (sEMG)-based gesture recognition plays a critical role in human-machine interaction (HMI), particularly for rehabilitation and prosthetic control. However, sEMG-base...
- Hybrid Quantum-Classical Selective State Space Artificial Intelligence : Abstract: Hybrid Quantum Classical (HQC) algorithms constitute one of the most effective paradigms for exploiting the computational advantages of quantum systems in large-scale numerical tasks. By ope...
- Extreme Model Compression with Structured Sparsity at Low Precision : Abstract: Deep neural networks (DNNs) are used in many applications, but their large size and high computational cost make them hard to run on devices with limited resources. Two widely used technique...
- DPRM: A Dual Implicit Process Reward Model in Multi-Hop Question Answering : Abstract: In multi-hop question answering (MHQA) tasks, Chain of Thought (CoT) improves the quality of generation by guiding large language models (LLMs) through multi-step reasoning, and Knowledge Gr...
- A Circular Argument: Does RoPE need to be Equivariant for Vision? : Abstract: Rotary Positional Encodings (RoPE) have emerged as a highly effective technique for one-dimensional sequences in Natural Language Processing, spurring recent progress towards generalizing RoP...
- Text-based Aerial-Ground Person Retrieval : Abstract: This work introduces Text-based Aerial-Ground Person Retrieval (TAG-PR), which aims to retrieve person images from heterogeneous aerial and ground views with textual descriptions. Unlike tra...
- Bid Farewell to Seesaw: Towards Accurate Long-tail Session-based Recommendation via Dual Constraints of Hybrid Intents : Abstract: Session-based recommendation (SBR) aims to predict anonymous users' next interaction based on their interaction sessions. In the practical recommendation scenario, low-exposure items constit...
- RAPTR: Radar-based 3D Pose Estimation using Transformer : Abstract: Radar-based indoor 3D human pose estimation typically relied on fine-grained 3D keypoint labels, which are costly to obtain especially in complex indoor settings involving clutter, occlusion...
- Unifying Model and Layer Fusion for Speech Foundation Models : Abstract: Speech Foundation Models have gained significant attention recently. Prior works have shown that the fusion of representations from multiple layers of the same model or the fusion of multipl...
- Interaction Dynamics as a Reward Signal for LLMs : Abstract: The alignment of Large Language Models (LLMs) for multi-turn conversations typically relies on reward signals derived from the content of the text. This approach, however, overlooks a rich, ...
- Anatomy-VLM: A Fine-grained Vision-Language Model for Medical Interpretation : Abstract: Accurate disease interpretation from radiology remains challenging due to imaging heterogeneity. Achieving expert-level diagnostic decisions requires integration of subtle image features wit...
- Understanding Electro-communication and Electro-sensing in Weakly Electric Fish using Multi-Agent Deep Reinforcement Learning : Abstract: Weakly electric fish, like Gnathonemus petersii, use a remarkable electrical modality for active sensing and communication, but studying their rich electrosensing and electrocommunication be...
- Contrastive Integrated Gradients: A Feature Attribution-Based Method for Explaining Whole Slide Image Classification : Abstract: Interpretability is essential in Whole Slide Image (WSI) analysis for computational pathology, where understanding model predictions helps build trust in AI-assisted diagnostics. While Integ...
- Binary Split Categorical feature with Mean Absolute Error Criteria in CART : Abstract: In the context of the Classification and Regression Trees (CART) algorithm, the efficient splitting of categorical features using standard criteria like GINI and Entropy is well-established....
- Designing LLM-based Multi-Agent Systems for Software Engineering Tasks: Quality Attributes, Design Patterns and Rationale : Abstract: As the complexity of Software Engineering (SE) tasks continues to escalate, Multi-Agent Systems (MASs) have emerged as a focal point of research and practice due to their autonomy and scalab...
- HQ-SVC: Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios : Abstract: Zero-shot singing voice conversion (SVC) transforms a source singer's timbre to an unseen target speaker's voice while preserving melodic content without fine-tuning. Existing methods model ...
- SPEAR-MM: Selective Parameter Evaluation and Restoration via Model Merging for Efficient Financial LLM Adaptation : Abstract: Large language models (LLMs) adapted to financial domains often suffer from catastrophic forgetting of general reasoning capabilities essential for customer interactions and complex financia...
- Introducing A Bangla Sentence - Gloss Pair Dataset for Bangla Sign Language Translation and Research : Abstract: Bangla Sign Language (BdSL) translation represents a low-resource NLP task due to the lack of large-scale datasets that address sentence-level translation. Correspondingly, existing research...
- Large Sign Language Models: Toward 3D American Sign Language Translation : Abstract: We present Large Sign Language Models (LSLM), a novel framework for translating 3D American Sign Language (ASL) by leveraging Large Language Models (LLMs) as the backbone, which can benefit ...
- LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics : Abstract: Learning manipulable representations of the world and its dynamics is central to AI. Joint-Embedding Predictive Architectures (JEPAs) offer a promising blueprint, but lack of practical guida...
- Moral Susceptibility and Robustness under Persona Role-Play in Large Language Models : Abstract: Large language models (LLMs) increasingly operate in social contexts, motivating analysis of how they express and shift moral judgments. In this work, we investigate the moral response of LL...
- The Path Not Taken: RLVR Provably Learns Off the Principals : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) reliably improves the reasoning performance of large language models, yet it appears to modify only a small fraction of parameters. We r...
- Automatic Grid Updates for Kolmogorov-Arnold Networks using Layer Histograms : Abstract: Kolmogorov-Arnold Networks (KANs) are a class of neural networks that have received increased attention in recent literature. In contrast to MLPs, KANs leverage parameterized, trainable acti...
- SENCA-st: Integrating Spatial Transcriptomics and Histopathology with Cross Attention Shared Encoder for Region Identification in Cancer Pathology : Abstract: Spatial transcriptomics is an emerging field that enables the identification of functional regions based on the spatial distribution of gene expression. Integrating this functional informati...
- Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models : Abstract: Improving reasoning capabilities of Large Language Models (LLMs), especially under parameter constraints, is crucial for real-world applications. Prior work proposes recurrent transformers, ...
- Training Language Models to Explain Their Own Computations : Abstract: Can language models (LMs) learn to faithfully describe their internal computations? Are they better able to describe themselves than other models? We study the extent to which LMs' privilege...
- How Artificial Intelligence Leads to Knowledge Why: An Inquiry Inspired by Aristotle's Posterior Analytics : Abstract: Bayesian networks and causal models provide frameworks for handling queries about external interventions and counterfactuals, enabling tasks that go beyond what probability distributions alo...
- Emergence of Goal-Directed Behaviors via Active Inference with Self-Prior : Abstract: Infants often exhibit goal-directed behaviors, such as reaching for a sensory stimulus, even when no external reward criterion is provided. These intrinsically motivated behaviors facilitate...
- Question-to-Knowledge (Q2K): Multi-Agent Generation of Inspectable Facts for Product Mapping : Abstract: Identifying whether two product listings refer to the same Stock Keeping Unit (SKU) is a persistent challenge in ecommerce, especially when explicit identifiers are missing and product names...
- How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective : Abstract: Visual Spatial Reasoning (VSR) is a core human cognitive ability and a critical requirement for advancing embodied intelligence and autonomous systems. Despite recent progress in Vision-Lang...
- When Object-Centric World Models Meet Policy Learning: From Pixels to Policies, and Where It Breaks : Abstract: Object-centric world models (OCWM) aim to decompose visual scenes into object-level representations, providing structured abstractions that could improve compositional generalization and dat...
- Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives : Abstract: As AI systems become more capable of complex agentic tasks, they also become more capable of pursuing undesirable objectives and causing harm. Previous work has attempted to catch these unsa...
- Green AI: A systematic review and meta-analysis of its definitions, lifecycle models, hardware and measurement attempts : Abstract: Across the Artificial Intelligence (AI) lifecycle - from hardware to development, deployment, and reuse - burdens span energy, carbon, water, and embodied impacts. Cloud provider tools impro...
- A Theoretical Analysis of Detecting Large Model-Generated Time Series : Abstract: Motivated by the increasing risks of data misuse and fabrication, we investigate the problem of identifying synthetic time series generated by Time-Series Large Models (TSLMs) in this work. ...
- Two Heads are Better than One: Distilling Large Language Model Features Into Small Models with Feature Decomposition and Mixture : Abstract: Market making (MM) through Reinforcement Learning (RL) has attracted significant attention in financial trading. With the development of Large Language Models (LLMs), more and more attempts ...
- Spikingformer: A Key Foundation Model for Spiking Neural Networks : Abstract: Spiking neural networks (SNNs) offer a promising energy-efficient alternative to artificial neural networks, due to their event-driven spiking computation. However, some foundation SNN backb...
- Multimodal Adversarial Defense for Vision-Language Models by Leveraging One-To-Many Relationships : Abstract: Pre-trained vision-language (VL) models are highly vulnerable to adversarial attacks. However, existing defense methods primarily focus on image classification, overlooking two key aspects o...
- Integrating Artificial Intelligence into Operating Systems: A Survey on Techniques, Applications, and Future Directions : Abstract: Heterogeneous hardware and dynamic workloads worsen long-standing OS bottlenecks in scalability, adaptability, and manageability. At the same time, advances in machine learning (ML), large l...
- Identifying treatment response subgroups in observational time-to-event data : Abstract: Identifying patient subgroups with different treatment responses is an important task to inform medical recommendations, guidelines, and the design of future clinical trials. Existing approa...
- Reference-Guided Verdict: LLMs-as-Judges in Automatic Evaluation of Free-Form QA : Abstract: The emergence of Large Language Models (LLMs) as chat assistants capable of generating human-like conversations has amplified the need for robust evaluation methods, particularly for open-en...
- Benchmarking Domain Generalization Algorithms in Computational Pathology : Abstract: Deep learning models have shown immense promise in computational pathology (CPath) tasks, but their performance often suffers when applied to unseen data due to domain shifts. Addressing thi...
- Selection of LLM Fine-Tuning Data based on Orthogonal Rules : Abstract: High-quality training data is critical to the performance of large language models (LLMs). Recent work has explored using LLMs to rate and select data based on a small set of human-designed ...
- Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning : Abstract: Offline RL is a powerful approach for data-driven decision-making and control. Compared to model-free methods, offline model-based RL (MBRL) explicitly learns world models from a static data...
- Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond : Abstract: While transformers have been at the core of most recent advancements in sequence generative models, their computational cost remains quadratic in sequence length. Several subquadratic archit...
- GeMID: Generalizable Models for IoT Device Identification : Abstract: With the proliferation of devices on the Internet of Things (IoT), ensuring their security has become paramount. Device identification (DI), which distinguishes IoT devices based on their tr...
- SCoTT: Strategic Chain-of-Thought Tasking for Wireless-Aware Robot Navigation in Digital Twins : Abstract: Path planning under wireless performance constraints is a complex challenge in robot navigation. However, naively incorporating such constraints into classical planning algorithms often incu...
- Generalizing Weisfeiler-Lehman Kernels to Subgraphs : Abstract: Subgraph representation learning has been effective in solving various real-world problems. However, current graph neural networks (GNNs) produce suboptimal results for subgraph-level tasks ...
- A Multi-Agent Conversational Bandit Approach to Online Evaluation and Selection of User-Aligned LLM Responses : Abstract: Prompt-based offline methods are commonly used to optimize large language model (LLM) responses, but evaluating these responses is computationally intensive and often fails to accommodate di...
- On the Convergence and Stability of Upside-Down Reinforcement Learning, Goal-Conditioned Supervised Learning, and Online Decision Transformers : Abstract: This article provides a rigorous analysis of convergence and stability of Episodic Upside-Down Reinforcement Learning, Goal-Conditioned Supervised Learning and Online Decision Transformers. ...
- MA-GTS: A Multi-Agent Framework for Solving Complex Graph Problems in Real-World Applications : Abstract: Graph-theoretic problems arise in real-world applications like logistics, communication networks, and traffic optimization. These problems are often complex, noisy, and irregular, posing cha...
- Learning Vision-Based Neural Network Controllers with Semi-Probabilistic Safety Guarantees : Abstract: Ensuring safety in autonomous systems with vision-based control remains a critical challenge due to the high dimensionality of image inputs and the fact that the relationship between true sy...
- Explaining the Unexplainable: A Systematic Review of Explainable AI in Finance : Abstract: Practitioners and researchers trying to strike a balance between accuracy and transparency center Explainable Artificial Intelligence (XAI) at the junction of finance. This paper offers a th...
- Towards Synthesizing High-Dimensional Tabular Data with Limited Samples : Abstract: Diffusion-based tabular data synthesis models have yielded promising results. However, when the data dimensionality increases, existing models tend to degenerate and may perform even worse t...
- CLEV: LLM-Based Evaluation Through Lightweight Efficient Voting for Free-Form Question-Answering : Abstract: Evaluating free-form Question Answering (QA) remains a challenge due to its diverse and open-ended nature. Traditional automatic metrics fail to capture semantic equivalence or accommodate t...
- COPA: Comparing the incomparable in multi-objective model evaluation : Abstract: In machine learning (ML), we often need to choose one among hundreds of trained ML models at hand, based on various objectives such as accuracy, robustness, fairness or scalability. However,...
- STAR-1: Safer Alignment of Reasoning LLMs with 1K Data : Abstract: This paper introduces STAR-1, a high-quality, just-1k-scale safety dataset specifically designed for large reasoning models (LRMs) like DeepSeek-R1. Built on three core principles: diversi...
- MULTI-LF: A Continuous Learning Framework for Real-Time Malicious Traffic Detection in Multi-Environment Networks : Abstract: Multi-environment (M-En) networks integrate diverse traffic sources, including Internet of Things (IoT) and traditional computing systems, creating complex and evolving conditions for malici...
- A Multimodal Recaptioning Framework to Account for Perceptual Diversity Across Languages in Vision-Language Modeling : Abstract: When captioning an image, people describe objects in diverse ways, such as by using different terms and/or including details that are perceptually noteworthy to them. Descriptions can be esp...
- Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation : Abstract: End-to-end speech-to-speech (S2S) dialogue systems have recently garnered increasing research attention for their lower latency and more natural integration of nonverbal cues such as emotion...
- On the generalization of language models from in-context learning and finetuning: a controlled study : Abstract: Large language models exhibit exciting capabilities, yet can show surprisingly narrow generalization from finetuning. For example, they can fail to generalize to simple reversals of relations they a...
- FaSDiff: Balancing Perception and Semantics in Face Compression via Stable Diffusion Priors : Abstract: With the increasing deployment of facial image data across a wide range of applications, efficient compression tailored to facial semantics has become critical for both storage and transmiss...
- FALCON: False-Negative Aware Learning of Contrastive Negatives in Vision-Language Alignment : Abstract: False negatives pose a critical challenge in vision-language pretraining (VLP) due to the many-to-many correspondence between images and texts in large-scale datasets. These false negatives ...
- Tool-Aided Evolutionary LLM for Generative Policy Toward Efficient Resource Management in Wireless Federated Learning : Abstract: Federated Learning (FL) enables distributed model training across edge devices in a privacy-friendly manner. However, its efficiency heavily depends on effective device selection and high-di...
- Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors : Abstract: Interpretability research now offers a variety of techniques for identifying abstract internal mechanisms in neural networks. Can such techniques be used to predict how models will behave on...
- RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs : Abstract: Reinforcement learning-based post-training of large language models (LLMs) has recently gained attention, particularly following the release of DeepSeek R1, which applied GRPO for fine-tunin...
- Policy-Driven World Model Adaptation for Robust Offline Model-based Reinforcement Learning : Abstract: Offline reinforcement learning (RL) offers a powerful paradigm for data-driven control. Compared to model-free approaches, offline model-based RL (MBRL) explicitly learns a world model from ...
- BroadGen: A Framework for Generating Effective and Efficient Advertiser Broad Match Keyphrase Recommendations : Abstract: In the domain of sponsored search advertising, the focus of keyphrase recommendation has largely been on exact match types, which pose issues such as high management expenses, limited targe...
- A Unified and Fast-Sampling Diffusion Bridge Framework via Stochastic Optimal Control : Abstract: Recent advances in diffusion bridge models leverage Doob's $h$-transform to establish fixed endpoints between distributions, demonstrating promising results in image translation and restorat...
- Zeroth-Order Optimization Finds Flat Minima : Abstract: Zeroth-order methods are extensively used in machine learning applications where gradients are infeasible or expensive to compute, such as black-box attacks, reinforcement learning, and lang...
- Imbalance in Balance: Online Concept Balancing in Generation Models : Abstract: In visual generation tasks, the responses and combinations of complex concepts often lack stability and are error-prone, which remains an under-explored area. In this paper, we attempt to ex...
- Beyond Algorethics: Addressing the Ethical and Anthropological Challenges of AI Recommender Systems : Abstract: This paper examines the ethical and anthropological challenges posed by AI-driven recommender systems (RSs), which increasingly shape digital environments and social interactions. By curatin...
- Report from Workshop on Dialogue alongside Artificial Intelligence : Abstract: Educational dialogue (the collaborative exchange of ideas through talk) is widely recognized as a catalyst for deeper learning and critical thinking in and across contexts. At the same time,...
- A Remarkably Efficient Paradigm to Multimodal Large Language Models for Sequential Recommendation : Abstract: Sequential recommendations (SR) predict users' future interactions based on their historical behavior. The rise of Large Language Models (LLMs) has brought powerful generative and reasoning ...
- SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads? : Abstract: Optimizing the performance of large-scale software repositories demands expertise in code reasoning and software engineering (SWE) to reduce runtime while preserving program correctness. How...
- Exploiting Inter-Session Information with Frequency-enhanced Dual-Path Networks for Sequential Recommendation : Abstract: Sequential recommendation (SR) aims to predict a user's next item preference by modeling historical interaction sequences. Recent advances often integrate frequency-domain modules to compens...
- Hard vs. Noise: Resolving Hard-Noisy Sample Confusion in Recommender Systems via Large Language Models : Abstract: Implicit feedback, employed in training recommender systems, unavoidably confronts noise due to factors such as misclicks and position bias. Previous studies have attempted to identify noisy...
- Slimmable NAM: Neural Amp Models with adjustable runtime computational cost : Abstract: This work demonstrates "slimmable Neural Amp Models", whose size and computational cost can be changed without additional training and with negligible computational overhead, enabling musici...
- Towards Personalized Quantum Federated Learning for Anomaly Detection : Abstract: Anomaly detection has a significant impact on applications such as video surveillance, medical diagnostics, and industrial monitoring, where anomalies frequently depend on context and anomal...
- Multivariate Variational Autoencoder : Abstract: We present the Multivariate Variational Autoencoder (MVAE), a VAE variant that preserves Gaussian tractability while lifting the diagonal posterior restriction. MVAE factorizes each posterio...
- RELEAP: Reinforcement-Enhanced Label-Efficient Active Phenotyping for Electronic Health Records : Abstract: Objective: Electronic health record (EHR) phenotyping often relies on noisy proxy labels, which undermine the reliability of downstream risk prediction. Active learning can reduce annotation...
- Comparing Reconstruction Attacks on Pretrained Versus Full Fine-tuned Large Language Model Embeddings on Homo Sapiens Splice Sites Genomic Data : Abstract: This study investigates embedding reconstruction attacks in large language models (LLMs) applied to genomic sequences, with a specific focus on how fine-tuning affects vulnerability to these...
- Counterfactual Forecasting of Human Behavior using Generative AI and Causal Graphs : Abstract: This study presents a novel framework for counterfactual user behavior forecasting that combines structural causal models with transformer-based generative artificial intelligence. To model ...
- Provably Efficient Sample Complexity for Robust CMDP : Abstract: We study the problem of learning policies that maximize cumulative reward while satisfying safety constraints, even when the real environment differs from a simulator or nominal model. We fo...
- Methodological Precedence in Health Tech: Why ML/Big Data Analysis Must Follow Basic Epidemiological Consistency. A Case Study : Abstract: The integration of advanced analytical tools, including Machine Learning (ML) and massive data processing, has revolutionized health research, promising unprecedented accuracy in diagnosis a...
- SCALAR: Benchmarking SAE Interaction Sparsity in Toy LLMs : Abstract: Mechanistic interpretability aims to decompose neural networks into interpretable features and map their connecting circuits. The standard approach trains sparse autoencoders (SAEs) on each ...
- FlowTIE: Flow-based Transport of Intensity Equation for Phase Gradient Estimation from 4D-STEM Data : Abstract: We introduce FlowTIE, a neural-network-based framework for phase reconstruction from 4D-Scanning Transmission Electron Microscopy (STEM) data, which integrates the Transport of Intensity Equ...
- Enhancing Binary Encoded Crime Linkage Analysis Using Siamese Network : Abstract: Effective crime linkage analysis is crucial for identifying serial offenders and enhancing public safety. To address limitations of traditional crime linkage methods in handling high-dimensi...
- CAE: Character-Level Autoencoder for Non-Semantic Relational Data Grouping : Abstract: Enterprise relational databases increasingly contain vast amounts of non-semantic data (IP addresses, product identifiers, encoded keys, and timestamps) that challenge traditional semantic...
- ZeroSim: Zero-Shot Analog Circuit Evaluation with Unified Transformer Embeddings : Abstract: Although recent advancements in learning-based analog circuit design automation have tackled tasks such as topology generation, device sizing, and layout synthesis, efficient performance eva...
- Probabilities Are All You Need: A Probability-Only Approach to Uncertainty Estimation in Large Language Models : Abstract: Large Language Models (LLMs) exhibit strong performance across various natural language processing (NLP) tasks but remain vulnerable to hallucinations, generating factually incorrect or misl...
- On the Role of Calibration in Benchmarking Algorithmic Fairness for Skin Cancer Detection : Abstract: Artificial Intelligence (AI) models have demonstrated expert-level performance in melanoma detection, yet their clinical adoption is hindered by performance disparities across demographic su...
- Intelligent Optimization of Multi-Parameter Micromixers Using a Scientific Machine Learning Framework : Abstract: Multidimensional optimization has consistently been a critical challenge in engineering. However, traditional simulation-based optimization methods have long been plagued by significant limi...
- A Ranking-Based Optimization Algorithm for the Vehicle Relocation Problem in Car Sharing Services : Abstract: The paper addresses the Vehicle Relocation Problem in free-floating car-sharing services by presenting a solution focused on strategies for repositioning vehicles and transferring personnel ...
- Multistep Quasimetric Learning for Scalable Goal-conditioned Reinforcement Learning : Abstract: Learning how to reach goals in an environment is a longstanding challenge in AI, yet reasoning over long horizons remains a challenge for modern methods. The key question is how to estimate ...
- From Exploration to Exploitation: A Two-Stage Entropy RLVR Approach for Noise-Tolerant MLLM Training : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) for Multimodal Large Language Models (MLLMs) is highly dependent on high-quality labeled data, which is often scarce and prone to substa...
- Schedulers for Schedule-free: Theoretically inspired hyperparameters : Abstract: The recently proposed schedule-free method has been shown to achieve strong performance when hyperparameter tuning is limited. The current theory for schedule-free only supports a constant l...
- Analyzing Political Text at Scale with Online Tensor LDA : Abstract: This paper proposes a topic modeling method that scales linearly to billions of documents. We make three core contributions: i) we present a topic modeling method, Tensor Latent Dirichlet Al...
- Multi-Objective Bilevel Learning : Abstract: As machine learning (ML) applications grow increasingly complex in recent years, modern ML frameworks often need to address multiple potentially conflicting objectives with coupled decision ...
- DP-AdamW: Investigating Decoupled Weight Decay and Bias Correction in Private Deep Learning : Abstract: As deep learning methods increasingly utilize sensitive data on a widespread scale, differential privacy (DP) offers formal guarantees to protect against information leakage during model tra...
- Algorithm-Relative Trajectory Valuation in Policy Gradient Control : Abstract: We study how trajectory value depends on the learning algorithm in policy-gradient control. Using Trajectory Shapley in an uncertain LQR, we find a negative correlation between Persistence o...
- A Generalized Spectral Framework to Explain Neural Scaling and Compression Dynamics : Abstract: Empirical scaling laws describe how test loss and other performance metrics depend on model size, dataset size, and compute. While such laws are consistent within specific regimes, apparentl...
- CellARC: Measuring Intelligence with Cellular Automata : Abstract: We introduce CellARC, a synthetic benchmark for abstraction and reasoning built from multicolor 1D cellular automata (CA). Each episode has five support pairs and one query serialized in 256...
- Rectified Noise: A Generative Model Using Positive-incentive Noise : Abstract: Rectified Flow (RF) has been widely used as an effective generative model. Although RF is primarily based on probability flow Ordinary Differential Equations (ODE), recent studies have shown...
- Feedback Descent: Open-Ended Text Optimization via Pairwise Comparison : Abstract: We introduce Feedback Descent, a framework that optimizes text artifacts (prompts, code, and molecules) through structured textual feedback, rather than relying solely on scalar...
- SERL: Self-Examining Reinforcement Learning on Open-Domain : Abstract: Reinforcement Learning (RL) has been shown to improve the capabilities of large language models (LLMs). However, applying RL to open-domain tasks faces two key challenges: (1) the inherent s...
- IBMA: An Imputation-Based Mixup Augmentation Using Self-Supervised Learning for Time Series Data : Abstract: Data augmentation in time series forecasting plays a crucial role in enhancing model performance by introducing variability while maintaining the underlying temporal patterns. However, time ...
- Predict-then-Optimize Method for Seaport Power-Logistics Scheduling: Generalization across Varying Tasks Stream : Abstract: Power-logistics scheduling in modern seaports typically follows a predict-then-optimize pipeline. To enhance decision quality, decision-focused learning has been proposed to align forecasting...
- Continual Unlearning for Text-to-Image Diffusion Models: A Regularization Perspective : Abstract: Machine unlearning, the ability to remove designated concepts from a pre-trained model, has advanced rapidly, particularly for text-to-image diffusion models. However, existing methods typic...
- Low-Rank Curvature for Zeroth-Order Optimization in LLM Fine-Tuning : Abstract: We introduce LOREN, a curvature-aware zeroth-order (ZO) optimization method for fine-tuning large language models (LLMs). Existing ZO methods, which estimate gradients via finite differences...
- OTSNet: A Neurocognitive-Inspired Observation-Thinking-Spelling Pipeline for Scene Text Recognition : Abstract: Scene Text Recognition (STR) remains challenging due to real-world complexities, where decoupled visual-linguistic optimization in existing frameworks amplifies error propagation through cro...
- SafeMIL: Learning Offline Safe Imitation Policy from Non-Preferred Trajectories : Abstract: In this work, we study the problem of offline safe imitation learning (IL). In many real-world settings, online interactions can be risky, and accurately specifying the reward and the safety...
- Relation as a Prior: A Novel Paradigm for LLM-based Document-level Relation Extraction : Abstract: Large Language Models (LLMs) have demonstrated their remarkable capabilities in document understanding. However, recent research reveals that LLMs still exhibit performance gaps in Document-...
- ProbSelect: Stochastic Client Selection for GPU-Accelerated Compute Devices in the 3D Continuum : Abstract: Integration of edge, cloud and space devices into a unified 3D continuum imposes significant challenges for client selection in federated learning systems. Traditional approaches rely on con...
- Deep (Predictive) Discounted Counterfactual Regret Minimization : Abstract: Counterfactual regret minimization (CFR) is a family of algorithms for effectively solving imperfect-information games. To enhance CFR's applicability in large games, researchers use neural ...
- MARC: Multimodal and Multi-Task Agentic Retrieval-Augmented Generation for Cold-Start Recommender System : Abstract: Recommender systems (RS) are currently being studied to mitigate limitations during cold-start conditions by leveraging modality information or introducing Agent concepts based on the except...
- FedPoP: Federated Learning Meets Proof of Participation : Abstract: Federated learning (FL) offers privacy preserving, distributed machine learning, allowing clients to contribute to a global model without revealing their local data. As models increasingly s...
- 2D Representation for Unguided Single-View 3D Super-Resolution in Real-Time : Abstract: We introduce 2Dto3D-SR, a versatile framework for real-time single-view 3D super-resolution that eliminates the need for high-resolution RGB guidance. Our framework encodes 3D data from a si...
- Benchmarking Educational LLMs with Analytics: A Case Study on Gender Bias in Feedback : Abstract: As teachers increasingly turn to GenAI in their educational practice, we need robust methods to benchmark large language models (LLMs) for pedagogical purposes. This article presents an embe...
- Real-Time Performance Analysis of Multi-Fidelity Residual Physics-Informed Neural Process-Based State Estimation for Robotic Systems : Abstract: Various neural network architectures are used in many of the state-of-the-art approaches for real-time nonlinear state estimation. With the ever-increasing incorporation of these data-driven...
- Remodeling Semantic Relationships in Vision-Language Fine-Tuning : Abstract: Vision-language fine-tuning has emerged as an efficient paradigm for constructing multimodal foundation models. While textual context often highlights semantic relationships within an image,...
- Hierarchical Direction Perception via Atomic Dot-Product Operators for Rotation-Invariant Point Clouds Learning : Abstract: Point cloud processing has become a cornerstone technology in many 3D vision tasks. However, arbitrary rotations introduce variations in point cloud orientations, posing a long-standing chal...
- NERVE: Neighbourhood & Entropy-guided Random-walk for training free open-Vocabulary sEgmentation : Abstract: Despite recent advances in Open-Vocabulary Semantic Segmentation (OVSS), existing training-free methods face several limitations: use of computationally expensive affinity refinement strateg...
- ImagebindDC: Compressing Multi-modal Data with Imagebind-based Condensation : Abstract: Data condensation techniques aim to synthesize a compact dataset from a larger one to enable efficient model training, yet while successful in unimodal settings, they often fail in multimoda...
- Bi-Objective Evolutionary Optimization for Large-Scale Open Pit Mine Scheduling Problem under Uncertainty with Chance Constraints : Abstract: The open-pit mine scheduling problem (OPMSP) is a complex, computationally expensive process in long-term mine planning, constrained by operational and geological dependencies. Traditional d...
- Dual-Kernel Graph Community Contrastive Learning : Abstract: Graph Contrastive Learning (GCL) has emerged as a powerful paradigm for training Graph Neural Networks (GNNs) in the absence of task-specific labels. However, its scalability on large-scale ...
- Test-time Diverse Reasoning by Riemannian Activation Steering : Abstract: Best-of-$N$ reasoning improves the accuracy of language models in solving complex tasks by sampling multiple candidate solutions and then selecting the best one based on some criteria. A cri...
- DiagramIR: An Automatic Pipeline for Educational Math Diagram Evaluation : Abstract: Large Language Models (LLMs) are increasingly being adopted as tools for learning; however, most tools remain text-only, limiting their usefulness for domains where visualizations are essent...
- Smarter Together: Creating Agentic Communities of Practice through Shared Experiential Learning : Abstract: The transition from human-centric to agent-centric software development practices is disrupting existing knowledge sharing environments for software developers. Traditional peer-to-peer repo...
- JobSphere: An AI-Powered Multilingual Career Copilot for Government Employment Platforms : Abstract: Users of government employment websites commonly face engagement and accessibility challenges linked to navigational complexity, a dearth of language options, and a lack of personalized supp...
- AI-Powered Data Visualization Platform: An Intelligent Web Application for Automated Dataset Analysis : Abstract: This work presents an AI-powered data visualization platform that automates the entire data analysis process, from uploading a dataset to generating an interactive visualization. Advanced machine learning algo...
- SOM Directions are Better than One: Multi-Directional Refusal Suppression in Language Models : Abstract: Refusal refers to the functional behavior enabling safety-aligned language models to reject harmful or unethical prompts. Following the growing scientific interest in mechanistic interpretab...
- FaithAct: Faithfulness Planning and Acting in MLLMs : Abstract: Unfaithfulness remains a persistent challenge for large language models (LLMs), which often produce plausible yet ungrounded reasoning chains that diverge from perceptual evidence or final c...
- Dataset Safety in Autonomous Driving: Requirements, Risks, and Assurance : Abstract: Dataset integrity is fundamental to the safety and reliability of AI systems, especially in autonomous driving. This paper presents a structured framework for developing safe datasets aligne...
- Patching LLM Like Software: A Lightweight Method for Improving Safety Policy in Large Language Models : Abstract: We propose patching for large language models (LLMs) like software versions, a lightweight and modular approach for addressing safety vulnerabilities. While vendors release improved LLM vers...
- A Matter of Interest: Understanding Interestingness of Math Problems in Humans and Language Models : Abstract: The evolution of mathematics has been guided in part by interestingness. From researchers choosing which problems to tackle next, to students deciding which ones to engage with, people's cho...
- Hyperdimensional Decoding of Spiking Neural Networks : Abstract: This work presents a novel spiking neural network (SNN) decoding method, combining SNNs with Hyperdimensional computing (HDC). The goal is to create a decoding method with high accuracy, hig...
- DeepProofLog: Efficient Proving in Deep Stochastic Logic Programs : Abstract: Neurosymbolic (NeSy) AI aims to combine the strengths of neural architectures and symbolic reasoning to improve the accuracy, interpretability, and generalization capability of AI models. Wh...
- Simulating the Visual World with Artificial Intelligence: A Roadmap : Abstract: The landscape of video generation is shifting, from a focus on generating visually appealing clips to building virtual environments that support interaction and maintain physical plausibilit...
- Hybrid Bit and Semantic Communications : Abstract: Semantic communication technology is regarded as a method surpassing the Shannon limit of bit transmission, capable of effectively enhancing transmission efficiency. However, current approac...
- Advancing mathematics research with large language models : Abstract: The main drawback of using generative AI for advanced mathematics via Large Language Models (LLMs) is that they are probabilistic pattern-matchers, not logical reasoning engines. However, LL...
- Synera: Synergistic LLM Serving across Device and Cloud at Scale : Abstract: Large Language Models (LLMs) are becoming key components in various mobile operating systems, driving smart applications like interactive chatbots and personal assistants. While bringing enh...
- An Evaluation of LLMs Inference on Popular Single-board Computers : Abstract: The growing demand for on-device large language model (LLM) inference is driving interest in deploying lightweight, cost-effective AI solutions on edge hardware. Single-board computers (SBCs...
- Network and Systems Performance Characterization of MCP-Enabled LLM Agents : Abstract: Model Context Protocol (MCP) has recently gained increased attention within the AI community for providing a standardized way for large language models (LLMs) to interact with external tools...
- DynaKV: Enabling Accurate and Efficient Long-Sequence LLM Decoding on Smartphones : Abstract: As the demand for human-like reasoning, multi-turn dialogues, and long-form responses grows, large language models (LLMs) are increasingly expected to support efficient and effective long-se...
- Knowledge-Guided Textual Reasoning for Explainable Video Anomaly Detection via LLMs : Abstract: We introduce Text-based Explainable Video Anomaly Detection (TbVAD), a language-driven framework for weakly supervised video anomaly detection that performs anomaly detection and explanation...
- Benchmarking Simulacra AI's Quantum Accurate Synthetic Data Generation for Chemical Sciences : Abstract: In this work, we benchmark Simulacra AI's synthetic data generation pipeline against a state-of-the-art Microsoft pipeline on a dataset of small to large systems. By analyzing the energy quali...
- AudAgent: Automated Auditing of Privacy Policy Compliance in AI Agents : Abstract: AI agents can autonomously perform tasks and, often without explicit user consent, collect or disclose users' sensitive local data, which raises serious privacy concerns. Although AI agents'...
- Pinching Antennas Meet AI in Next-Generation Wireless Networks : Abstract: Next-generation (NG) wireless networks must embrace innate intelligence in support of demanding emerging applications, such as extended reality and autonomous systems, under ultra-reliable a...
- A Preliminary Study of RAG for Taiwanese Historical Archives : Abstract: Retrieval-Augmented Generation (RAG) has emerged as a promising approach for knowledge-intensive tasks. However, few studies have examined RAG for Taiwanese Historical Archives. In this pape...
- Exploring the Psychometric Validity of AI-Generated Student Responses: A Study on Virtual Personas' Learning Motivation : Abstract: This study explores whether large language models (LLMs) can simulate valid student responses for educational measurement. Using GPT-4o, 2000 virtual student personas were generated. Each p...
- GRIP: In-Parameter Graph Reasoning through Fine-Tuning Large Language Models : Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in modeling sequential textual data and generalizing across diverse tasks. However, adapting LLMs to effectively handle...
- REFLEX: Reference-Free Evaluation of Log Summarization via Large Language Model Judgment : Abstract: Evaluating log summarization systems is challenging due to the lack of high-quality reference summaries and the limitations of existing metrics like ROUGE and BLEU, which depend on surface-l...
- Optimizing Classification of Infrequent Labels by Reducing Variability in Label Distribution : Abstract: This paper presents a novel solution, LEVER, designed to address the challenges posed by underperforming infrequent categories in Extreme Classification (XC) tasks. Infrequent categories, of...
- It Takes Two: A Dual Stage Approach for Terminology-Aware Translation : Abstract: This paper introduces DuTerm, a novel two-stage architecture for terminology-constrained machine translation. Our system combines a terminology-aware NMT model, adapted via fine-tuning on la...
- Dynamic Stability of LLM-Generated Code : Abstract: Current evaluations of LLMs for code generation emphasize functional correctness, overlooking the fact that functionally correct solutions can differ significantly in algorithmic complexity....
- Motif 2 12.7B technical report : Abstract: We introduce Motif-2-12.7B, a new open-weight foundation model that pushes the efficiency frontier of large language models by combining architectural innovation with system-level optimizati...
- The Polite Liar: Epistemic Pathology in Language Models : Abstract: Large language models exhibit a peculiar epistemic pathology: they speak as if they know, even when they do not. This paper argues that such confident fabrication, what I call the polite lia...
- Modulo Video Recovery via Selective Spatiotemporal Vision Transformer : Abstract: Conventional image sensors have limited dynamic range, causing saturation in high-dynamic-range (HDR) scenes. Modulo cameras address this by folding incident irradiance into a bounded range,...
- KG-DF: A Black-box Defense Framework against Jailbreak Attacks Based on Knowledge Graphs : Abstract: With the widespread application of large language models (LLMs) in various fields, the security challenges they face have become increasingly prominent, especially the issue of jailbreak. Th...
- Alignment-Constrained Dynamic Pruning for LLMs: Identifying and Preserving Alignment-Critical Circuits : Abstract: Large Language Models require substantial computational resources for inference, posing deployment challenges. While dynamic pruning offers superior efficiency over static methods through ad...
- When Are Learning Biases Equivalent? A Unifying Framework for Fairness, Robustness, and Distribution Shift : Abstract: Machine learning systems exhibit diverse failure modes: unfairness toward protected groups, brittleness to spurious correlations, poor performance on minority sub-populations, which are typi...
- Enabling Automatic Self-Talk Detection via Earables : Abstract: Self-talk, an internal dialogue that can occur silently or be spoken aloud, plays a crucial role in emotional regulation, cognitive processing, and motivation, yet has remained largely invisib...
- Laplacian Score Sharpening for Mitigating Hallucination in Diffusion Models : Abstract: Diffusion models, though successful, are known to suffer from hallucinations that create incoherent or unrealistic samples. Recent works have attributed this to the phenomenon of mode interp...
- Focusing on Language: Revealing and Exploiting Language Attention Heads in Multilingual Large Language Models : Abstract: Large language models (LLMs) increasingly support multilingual understanding and generation. Meanwhile, efforts to interpret their internal mechanisms have emerged, offering insights to enha...
- Toward the Frontiers of Reliable Diffusion Sampling via Adversarial Sinkhorn Attention Guidance : Abstract: Diffusion models have demonstrated strong generative performance when using guidance methods such as classifier-free guidance (CFG), which enhance output quality by modifying the sampling tr...
- Biologically-Informed Hybrid Membership Inference Attacks on Generative Genomic Models : Abstract: The increased availability of genetic data has transformed genomics research, but raised many privacy concerns regarding its handling due to its sensitive nature. This work explores the use ...
- FedRW: Efficient Privacy-Preserving Data Reweighting for Enhancing Federated Learning of Language Models : Abstract: Data duplication within large-scale corpora often impedes large language models' (LLMs) performance and privacy. In privacy-concerned federated learning scenarios, conventional deduplication...
- N-ReLU: Zero-Mean Stochastic Extension of ReLU : Abstract: Activation functions are fundamental for enabling nonlinear representations in deep neural networks. However, the standard rectified linear unit (ReLU) often suffers from inactive or "dead" ...
- SemanticForge: Repository-Level Code Generation through Semantic Knowledge Graphs and Constraint Satisfaction : Abstract: Large language models (LLMs) have transformed software development by enabling automated code generation, yet they frequently suffer from systematic errors that limit practical deployment. W...
- LLM Output Drift: Cross-Provider Validation & Mitigation for Financial Workflows : Abstract: Financial institutions deploy Large Language Models (LLMs) for reconciliations, regulatory reporting, and client communications, but nondeterministic outputs (output drift) undermine auditab...
- Leveraging the Power of AI and Social Interactions to Restore Trust in Public Polls : Abstract: The emergence of crowdsourced data has significantly reshaped social science, enabling extensive exploration of collective human actions, viewpoints, and societal dynamics. However, ensuring...
- One Router to Route Them All: Homogeneous Expert Routing for Heterogeneous Graph Transformers : Abstract: A common practice in heterogeneous graph neural networks (HGNNs) is to condition parameters on node/edge types, assuming types reflect semantic roles. However, this can cause overreliance on...
- Partial Action Replacement: Tackling Distribution Shift in Offline MARL : Abstract: Offline multi-agent reinforcement learning (MARL) is severely hampered by the challenge of evaluating out-of-distribution (OOD) joint actions. Our core finding is that when the behavior poli...
- Private-RAG: Answering Multiple Queries with LLMs while Keeping Your Data Private : Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) by retrieving documents from an external corpus at inference time. When this corpus contains sensitive information,...
- A Self-Improving Architecture for Dynamic Safety in Large Language Models : Abstract: Context: The integration of Large Language Models (LLMs) into core software systems is accelerating. However, existing software architecture patterns are static, while current safety assuran...
- Adaptive Graph Learning with Transformer for Multi-Reservoir Inflow Prediction : Abstract: Reservoir inflow prediction is crucial for water resource management, yet existing approaches mainly focus on single-reservoir models that ignore spatial dependencies among interconnected re...
- Revisiting NLI: Towards Cost-Effective and Human-Aligned Metrics for Evaluating LLMs in Question Answering : Abstract: Evaluating answers from state-of-the-art large language models (LLMs) is challenging: lexical metrics miss semantic nuances, whereas "LLM-as-Judge" scoring is computationally expensive. We r...
- Cortex AISQL: A Production SQL Engine for Unstructured Data : Abstract: Snowflake's Cortex AISQL is a production SQL engine that integrates native semantic operations directly into SQL. This integration allows users to write declarative queries that combine rela...
- FractalCloud: A Fractal-Inspired Architecture for Efficient Large-Scale Point Cloud Processing : Abstract: Three-dimensional (3D) point clouds are increasingly used in applications such as autonomous driving, robotics, and virtual reality (VR). Point-based neural networks (PNNs) have demonstrated...
- Speech Separation for Hearing-Impaired Children in the Classroom : Abstract: Classroom environments are particularly challenging for children with hearing impairments, where background noise, multiple talkers, and reverberation degrade speech perception. These diffic...
- Designing and Evaluating Malinowski's Lens: An AI-Native Educational Game for Ethnographic Learning : Abstract: This study introduces 'Malinowski's Lens', the first AI-native educational game for anthropology that transforms Bronislaw Malinowski's 'Argonauts of the Western Pacific' (1922) into an inte...
- Stress Testing Factual Consistency Metrics for Long-Document Summarization : Abstract: Evaluating the factual consistency of abstractive text summarization remains a significant challenge, particularly for long documents, where conventional metrics struggle with input length l...
- CAPO: Confidence Aware Preference Optimization Learning for Multilingual Preferences : Abstract: Preference optimization is a critical post-training technique used to align large language models (LLMs) with human preferences, typically by fine-tuning on ranked response pairs. While meth...
- Diffusion Guided Adversarial State Perturbations in Reinforcement Learning : Abstract: Reinforcement learning (RL) systems, while achieving remarkable success across various domains, are vulnerable to adversarial attacks. This is especially a concern in vision-based environmen...
- A Negotiation-Based Multi-Agent Reinforcement Learning Approach for Dynamic Scheduling of Reconfigurable Manufacturing Systems : Abstract: Reconfigurable manufacturing systems (RMS) are critical for future market adjustment given their rapid adaptation to fluctuations in consumer demands, the introduction of new technological a...
- ViPRA: Video Prediction for Robot Actions : Abstract: Can we turn a video prediction model into a robot policy? Videos, including those of humans or teleoperated robots, capture rich physical interactions. However, most of them lack labeled act...
- Global Optimization on Graph-Structured Data via Gaussian Processes with Spectral Representations : Abstract: Bayesian optimization (BO) is a powerful framework for optimizing expensive black-box objectives, yet extending it to graph-structured domains remains challenging due to the discrete and com...
- TurboSAT: Gradient-Guided Boolean Satisfiability Accelerated on GPU-CPU Hybrid System : Abstract: While accelerated computing has transformed many domains of computing, its impact on logical reasoning, specifically Boolean satisfiability (SAT), remains limited. State-of-the-art SAT solve...
- UltraGS: Gaussian Splatting for Ultrasound Novel View Synthesis : Abstract: Ultrasound imaging is a cornerstone of non-invasive clinical diagnostics, yet its limited field of view complicates novel view synthesis. We propose UltraGS, a Gaussian Splatting fr...
- Auto-US: An Ultrasound Video Diagnosis Agent Using Video Classification Framework and LLMs : Abstract: AI-assisted ultrasound video diagnosis presents new opportunities to enhance the efficiency and accuracy of medical imaging analysis. However, existing research remains limited in terms of d...
- Filtered-ViT: A Robust Defense Against Multiple Adversarial Patch Attacks : Abstract: Deep learning vision systems are increasingly deployed in safety-critical domains such as healthcare, yet they remain vulnerable to small adversarial patches that can trigger misclassificati...
- SALT: Steering Activations towards Leakage-free Thinking in Chain of Thought : Abstract: As Large Language Models (LLMs) evolve into personal assistants with access to sensitive user data, they face a critical privacy challenge: while prior work has addressed output-level privac...
- Semantic-Consistent Bidirectional Contrastive Hashing for Noisy Multi-Label Cross-Modal Retrieval : Abstract: Cross-modal hashing (CMH) facilitates efficient retrieval across different modalities (e.g., image and text) by encoding data into compact binary representations. While recent methods have a...
- Physical Consistency of Aurora's Encoder: A Quantitative Study : Abstract: The high accuracy of large-scale weather forecasting models like Aurora is often accompanied by a lack of transparency, as their internal representations remain largely opaque. This "black b...
- HybridGuard: Enhancing Minority-Class Intrusion Detection in Dew-Enabled Edge-of-Things Networks : Abstract: Securing Dew-Enabled Edge-of-Things (EoT) networks against sophisticated intrusions is a critical challenge. This paper presents HybridGuard, a framework that integrates machine learning and...
- Judging by the Rules: Compliance-Aligned Framework for Modern Slavery Statement Monitoring : Abstract: Modern slavery affects millions of people worldwide, and regulatory frameworks such as Modern Slavery Acts now require companies to publish detailed disclosures. However, these statements ar...
- PRISM: Privacy-preserving Inference System with Homomorphic Encryption and Modular Activation : Abstract: With the rapid advancements in machine learning, models have become increasingly capable of learning and making predictions in various industries. However, deploying these models in critical...
- Sparse3DPR: Training-Free 3D Hierarchical Scene Parsing and Task-Adaptive Subgraph Reasoning from Sparse RGB Views : Abstract: Recently, large language models (LLMs) have been explored widely for 3D scene understanding. Among them, training-free approaches are gaining attention for their flexibility and generalizati...
- SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control : Abstract: Despite the rise of billion-parameter foundation models trained across thousands of GPUs, similar scaling gains have not been shown for humanoid control. Current neural controllers for human...
- MURPHY: Multi-Turn GRPO for Self Correcting Code Generation : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful framework for enhancing the reasoning capabilities of large language models (LLMs). However, existing approach...
- A General Method for Proving Networks Universal Approximation Property : Abstract: Deep learning architectures are highly diverse. To prove their universal approximation properties, existing works typically rely on model-specific proofs. Generally, they construct a dedicat...
- LLM-Powered Fully Automated Chaos Engineering: Towards Enabling Anyone to Build Resilient Software Systems at Low Cost : Abstract: Chaos Engineering (CE) is an engineering technique aimed at improving the resilience of distributed systems. It involves intentionally injecting faults into a system to test its resilience, ...
- LoopLLM: Transferable Energy-Latency Attacks in LLMs via Repetitive Generation : Abstract: As large language models (LLMs) scale, their inference incurs substantial computational resources, exposing them to energy-latency attacks, where crafted prompts induce high energy and laten...
- Meta-cognitive Multi-scale Hierarchical Reasoning for Motor Imagery Decoding : Abstract: Brain-computer interface (BCI) aims to decode motor intent from noninvasive neural signals to enable control of external devices, but practical deployment remains limited by noise and variab...
- Intelligence per Watt: Measuring Intelligence Efficiency of Local AI : Abstract: Large language model (LLM) queries are predominantly processed by frontier models in centralized cloud infrastructure. Rapidly growing demand strains this paradigm, and cloud providers strug...
- Generating Sketches in a Hierarchical Auto-Regressive Process for Flexible Sketch Drawing Manipulation at Stroke-Level : Abstract: Generating sketches with specific patterns as expected, i.e., manipulating sketches in a controllable way, is a popular task. Recent studies control sketch features at stroke-level by editin...
- Toward Adaptive BCIs: Enhancing Decoding Stability via User State-Aware EEG Filtering : Abstract: Brain-computer interfaces (BCIs) often suffer from limited robustness and poor long-term adaptability. Model performance rapidly degrades when user attention fluctuates, brain states shift o...
- Statistically Assuring Safety of Control Systems using Ensembles of Safety Filters and Conformal Prediction : Abstract: Safety assurance is a fundamental requirement for deploying learning-enabled autonomous systems. Hamilton-Jacobi (HJ) reachability analysis is a fundamental method for formally verifying saf...
- Test-driven Reinforcement Learning : Abstract: Reinforcement learning (RL) has been recognized as a powerful tool for robot control tasks. RL typically employs reward functions to define task objectives and guide agent learning. However,...
- Exploring the Underwater World Segmentation without Extra Training : Abstract: Accurate segmentation of marine organisms is vital for biodiversity monitoring and ecological assessment, yet existing datasets and models remain largely limited to terrestrial scenes. To br...
- CNN-Based Automated Parameter Extraction Framework for Modeling Memristive Devices : Abstract: Resistive random access memory (RRAM) is a promising candidate for next-generation nonvolatile memory (NVM) and in-memory computing applications. Compact models are essential for analyzing t...
- SpeechJudge: Towards Human-Level Judgment for Speech Naturalness : Abstract: Aligning large generative models with human feedback is a critical challenge. In speech synthesis, this is particularly pronounced due to the lack of a large-scale human preference dataset, ...
- DiffRegCD: Integrated Registration and Change Detection with Diffusion Features : Abstract: Change detection (CD) is fundamental to computer vision and remote sensing, supporting applications in environmental monitoring, disaster response, and urban development. Most CD models assu...
- Libra-MIL: Multimodal Prototypes Stereoscopic Infused with Task-specific Language Priors for Few-shot Whole Slide Image Classification : Abstract: While Large Language Models (LLMs) are emerging as a promising direction in computational pathology, the substantial computational cost of giga-pixel Whole Slide Images (WSIs) necessitates t...
- Balance Equation-based Distributionally Robust Offline Imitation Learning : Abstract: Imitation Learning (IL) has proven highly effective for robotic and control tasks where manually designing reward functions or explicit controllers is infeasible. However, standard IL method...
- USV Obstacles Detection and Tracking in Marine Environments : Abstract: Developing a robust and effective obstacle detection and tracking system for Unmanned Surface Vehicles (USVs) in marine environments is a challenging task. Research efforts have been made in t...
- Reliable and Private Utility Signaling for Data Markets : Abstract: The explosive growth of data has highlighted its critical role in driving economic growth through data marketplaces, which enable extensive data sharing and access to high-quality datasets. ...
- Morphing Through Time: Diffusion-Based Bridging of Temporal Gaps for Robust Alignment in Change Detection : Abstract: Remote sensing change detection is often challenged by spatial misalignment between bi-temporal images, especially when acquisitions are separated by long seasonal or multi-year gaps. While ...
- NOTAM-Evolve: A Knowledge-Guided Self-Evolving Optimization Framework with LLMs for NOTAM Interpretation : Abstract: Accurate interpretation of Notices to Airmen (NOTAMs) is critical for aviation safety, yet their condensed and cryptic language poses significant challenges to both manual and automated proc...
- State of the Art in Text Classification for South Slavic Languages: Fine-Tuning or Prompting? : Abstract: Until recently, fine-tuned BERT-like models provided state-of-the-art performance on text classification tasks. With the rise of instruction-tuned decoder-only models, commonly known as larg...
- Hardware-Aware YOLO Compression for Low-Power Edge AI on STM32U5 for Weeds Detection in Digital Agriculture : Abstract: Weeds significantly reduce crop yields worldwide and pose major challenges to sustainable agriculture. Traditional weed management methods, primarily relying on chemical herbicides, risk env...
- Self-Correction Distillation for Structured Data Question Answering : Abstract: Structured data question answering (QA), including table QA, Knowledge Graph (KG) QA, and temporal KG QA, is a pivotal research area. Advances in large language models (LLMs) have driven sig...
- Sharp Eyes and Memory for VideoLLMs: Information-Aware Visual Token Pruning for Efficient and Reliable VideoLLM Reasoning : Abstract: Current Video Large Language Models (VideoLLMs) suffer from quadratic computational complexity and key-value cache scaling, due to their reliance on processing excessive redundant visual tok...
- DOA Estimation with Lightweight Network on LLM-Aided Simulated Acoustic Scenes : Abstract: Direction-of-Arrival (DOA) estimation is critical in spatial audio and acoustic signal processing, with wide-ranging real-world applications. Most existing DOA models are trained on synth...
- Invisible Triggers, Visible Threats! Road-Style Adversarial Creation Attack for Visual 3D Detection in Autonomous Driving : Abstract: Modern autonomous driving (AD) systems leverage 3D object detection to perceive foreground objects in 3D environments for subsequent prediction and planning. Visual 3D detection based on RGB...
- AVOID-JACK: Avoidance of Jackknifing for Swarms of Long Heavy Articulated Vehicles : Abstract: This paper presents a novel approach to avoiding jackknifing and mutual collisions in Heavy Articulated Vehicles (HAVs) by leveraging decentralized swarm intelligence. In contrast to typical...
- Multi-modal Deepfake Detection and Localization with FPN-Transformer : Abstract: The rapid advancement of generative adversarial networks (GANs) and diffusion models has enabled the creation of highly realistic deepfake content, posing significant threats to digital trus...
- ProSona: Prompt-Guided Personalization for Multi-Expert Medical Image Segmentation : Abstract: Automated medical image segmentation suffers from high inter-observer variability, particularly in tasks such as lung nodule delineation, where experts often disagree. Existing approaches ei...
- Taming Identity Consistency and Prompt Diversity in Diffusion Models via Latent Concatenation and Masked Conditional Flow Matching : Abstract: Subject-driven image generation aims to synthesize novel depictions of a specific subject across diverse contexts while preserving its core identity features. Achieving both strong identity ...
- Radar-APLANC: Unsupervised Radar-based Heartbeat Sensing via Augmented Pseudo-Label and Noise Contrast : Abstract: Frequency Modulated Continuous Wave (FMCW) radars can measure subtle chest wall oscillations to enable non-contact heartbeat sensing. However, traditional radar-based heartbeat sensing metho...
- CLIP is All You Need for Human-like Semantic Representations in Stable Diffusion : Abstract: Latent diffusion models such as Stable Diffusion achieve state-of-the-art results on text-to-image generation tasks. However, the extent to which these models have a semantic understanding o...
- An Integrated Fusion Framework for Ensemble Learning Leveraging Gradient Boosting and Fuzzy Rule-Based Models : Abstract: The integration of different learning paradigms has long been a focus of machine learning research, aimed at overcoming the inherent limitations of individual methods. Fuzzy rule-based model...
- Constrained and Robust Policy Synthesis with Satisfiability-Modulo-Probabilistic-Model-Checking : Abstract: The ability to compute reward-optimal policies for given and known finite Markov decision processes (MDPs) underpins a variety of applications across planning, controller synthesis, and veri...
- Hierarchical Structure-Property Alignment for Data-Efficient Molecular Generation and Editing : Abstract: Property-constrained molecular generation and editing are crucial in AI-driven drug discovery but remain hindered by two factors: (i) capturing the complex relationships between molecular st...
- BARD10: A New Benchmark Reveals Significance of Bangla Stop-Words in Authorship Attribution : Abstract: This research presents a comprehensive investigation into Bangla authorship attribution, introducing a new balanced benchmark corpus BARD10 (Bangla Authorship Recognition Dataset of 10 autho...
- Dynamic Sparsity: Challenging Common Sparsity Assumptions for Learning World Models in Robotic Reinforcement Learning Benchmarks : Abstract: The use of learned dynamics models, also known as world models, can improve the sample efficiency of reinforcement learning. Recent work suggests that the underlying causal graphs of such dy...
- Beyond the Pixels: VLM-based Evaluation of Identity Preservation in Reference-Guided Synthesis : Abstract: Evaluating identity preservation in generative models remains a critical yet unresolved challenge. Existing metrics rely on global embeddings or coarse VLM prompting, failing to capture fine...
- StableMorph: High-Quality Face Morph Generation with Stable Diffusion : Abstract: Face morphing attacks threaten the integrity of biometric identity systems by enabling multiple individuals to share a single identity. To develop and evaluate effective morphing attack dete...
- PerspAct: Enhancing LLM Situated Collaboration Skills through Perspective Taking and Active Vision : Abstract: Recent advances in Large Language Models (LLMs) and multimodal foundation models have significantly broadened their application in robotics and collaborative systems. However, effective mult...
- A robust methodology for long-term sustainability evaluation of Machine Learning models : Abstract: Sustainability and efficiency have become essential considerations in the development and deployment of Artificial Intelligence systems, yet existing regulatory and reporting practices lack ...
- Analysing Environmental Efficiency in AI for X-Ray Diagnosis : Abstract: The integration of AI tools into medical applications has aimed to improve the efficiency of diagnosis. The emergence of large language models (LLMs), such as ChatGPT and Claude, has expande...
- Agentic Educational Content Generation for African Languages on Edge Devices : Abstract: Addressing educational inequity in Sub-Saharan Africa, this research presents an autonomous agent-orchestrated framework for decentralized, culturally adaptive educational content generation...
- Beyond Correctness: Confidence-Aware Reward Modeling for Enhancing Large Language Model Reasoning : Abstract: Recent advancements in large language models (LLMs) have shifted the post-training paradigm from traditional instruction tuning and human preference alignment toward reinforcement learning (...
- Procedural Knowledge Improves Agentic LLM Workflows : Abstract: Large language models (LLMs) often struggle when performing agentic tasks without substantial tool support, prompt engineering, or fine-tuning. Despite research showing that domain-dependen...
- Think Before You Retrieve: Learning Test-Time Adaptive Search with Small Language Models : Abstract: Effective information retrieval requires reasoning over partial evidence and refining strategies as information emerges. Yet current approaches fall short: neural retrievers lack reasoning c...
- Beyond Fact Retrieval: Episodic Memory for RAG with Generative Semantic Workspaces : Abstract: Large Language Models (LLMs) face fundamental challenges in long-context reasoning: many documents exceed their finite context windows, while performance on texts that do fit degrades with s...
- AI-Driven Contribution Evaluation and Conflict Resolution: A Framework & Design for Group Workload Investigation : Abstract: The equitable assessment of individual contribution in teams remains a persistent challenge, where conflict and disparity in workload can result in unfair performance evaluation, often requi...
- Making LLMs Reliable When It Matters Most: A Five-Layer Architecture for High-Stakes Decisions : Abstract: Current large language models (LLMs) excel in verifiable domains where outputs can be checked before action but prove less reliable for high-stakes strategic decisions with uncertain outcome...
- AIA Forecaster: Technical Report : Abstract: This technical report describes the AIA Forecaster, a Large Language Model (LLM)-based system for judgmental forecasting using unstructured data. The AIA Forecaster approach combines three c...
- ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents : Abstract: Deep Research (DR) is an emerging agent application that leverages large language models (LLMs) to address open-ended queries. It requires the integration of several capabilities, including ...
- Towards AI-Assisted Generation of Military Training Scenarios : Abstract: Achieving expert-level performance in simulation-based training relies on the creation of complex, adaptable scenarios, a traditionally laborious and resource-intensive process. Although pri...
- Operational machine learning for remote spectroscopic detection of CH₄ point sources : Abstract: Mitigating anthropogenic methane sources is one of the most cost-effective levers to slow down global warming. While satellite-based imaging spectrometers, such as EMIT, PRISMA, and EnMAP, can ...
- Alignment-Aware Quantization for LLM Safety : Abstract: Safety and efficiency are both important factors when deploying large language models (LLMs). LLMs are trained to follow human alignment for safety, and post-training quantization (PTQ) is app...
- GAMA: A Neural Neighborhood Search Method with Graph-aware Multi-modal Attention for Vehicle Routing Problem : Abstract: Recent advances in neural neighborhood search methods have shown potential in tackling Vehicle Routing Problems (VRPs). However, most existing approaches rely on simplistic state representat...
- WaterMod: Modular Token-Rank Partitioning for Probability-Balanced LLM Watermarking : Abstract: Large language models now draft news, legal analyses, and software code with human-level fluency. At the same time, regulations such as the EU AI Act mandate that each synthetic passage carr...
- Confidence-Aware Neural Decoding of Overt Speech from EEG: Toward Robust Brain-Computer Interfaces : Abstract: Non-invasive brain-computer interfaces that decode spoken commands from electroencephalogram must be both accurate and trustworthy. We present a confidence-aware decoding framework that coup...
- Toward Robust EEG-based Intention Decoding during Misarticulated Speech in Aphasia : Abstract: Aphasia severely limits verbal communication due to impaired language production, often leading to frequent misarticulations during speech attempts. Despite growing interest in brain-compute...
- SparseRM: A Lightweight Preference Modeling with Sparse Autoencoder : Abstract: Reward models (RMs) are a core component in the post-training of large language models (LLMs), serving as proxies for human preference evaluation and guiding model alignment. However, traini...
- Data Descriptions from Large Language Models with Influence Estimation : Abstract: Deep learning models have been successful in many areas, but their behavior still remains largely a black box. Most prior explainable AI (XAI) approaches have focused on interpreting a...
- DANS-KGC: Diffusion Based Adaptive Negative Sampling for Knowledge Graph Completion : Abstract: Negative sampling (NS) strategies play a crucial role in knowledge graph representation. In order to overcome the limitations of existing negative sampling strategies, such as vulnerability ...
- Neurophysiological Characteristics of Adaptive Reasoning for Creative Problem-Solving Strategy : Abstract: Adaptive reasoning enables humans to flexibly adjust inference strategies when environmental rules or contexts change, yet its underlying neural dynamics remain unclear. This study investiga...
- Lightweight Diffusion-based Framework for Online Imagined Speech Decoding in Aphasia : Abstract: A diffusion-based neural decoding framework optimized for real-time imagined speech classification in individuals with aphasia. The system integrates a lightweight conditional diffusion enco...
- Computational Blueprints: Generating Isomorphic Mathematics Problems with Large Language Models : Abstract: Personalized mathematics education is growing rapidly, creating a strong demand for large sets of similar practice problems. Yet existing studies on mathematics problem generation have focus...
- Toward Practical BCI: A Real-time Wireless Imagined Speech EEG Decoding System : Abstract: Brain-computer interface (BCI) research, while promising, has largely been confined to static and fixed environments, limiting real-world applicability. To move towards practical BCI, we int...
- Thinker: Training LLMs in Hierarchical Thinking for Deep Search via Multi-Turn Interaction : Abstract: Efficient retrieval of external knowledge bases and web pages is crucial for enhancing the reasoning abilities of LLMs. Previous works on training LLMs to leverage external retrievers for so...
- TimeFlow: Towards Stochastic-Aware and Efficient Time Series Generation via Flow Matching Modeling : Abstract: Generating high-quality time series data has emerged as a critical research topic due to its broad utility in supporting downstream time series mining tasks. A major challenge lies in modeli...
- Versatile and Risk-Sensitive Cardiac Diagnosis via Graph-Based ECG Signal Representation : Abstract: Despite the rapid advancements of electrocardiogram (ECG) signal diagnosis and analysis methods through deep learning, two major hurdles still limit their clinical adoption: the lack of vers...
- Towards Fine-Grained Interpretability: Counterfactual Explanations for Misclassification with Saliency Partition : Abstract: Attribution-based explanation techniques capture key patterns to enhance visual interpretability; however, these patterns often lack the granularity needed for insight in fine-grained tasks,...
- Benchmarking Multi-Step Legal Reasoning and Analyzing Chain-of-Thought Effects in Large Language Models : Abstract: Large language models (LLMs) have demonstrated strong reasoning abilities across specialized domains, motivating research into their application to legal reasoning. However, existing legal b...
- Capturing Complex Spatial-Temporal Dependencies in Traffic Forecasting: A Self-Attention Approach : Abstract: We study the problem of traffic forecasting, aiming to predict the inflow and outflow of a region in the subsequent time slot. The problem is complex due to the intricate spatial and tempora...
- The One Where They Brain-Tune for Social Cognition: Multi-Modal Brain-Tuning on Friends : Abstract: Recent studies on audio models show that brain-tuning (fine-tuning models to better predict corresponding fMRI activity) improves brain alignment and increases performance on downstream semanti...
- VSPO: Validating Semantic Pitfalls in Ontology via LLM-Based CQ Generation : Abstract: Competency Questions (CQs) play a crucial role in validating ontology design. While manually crafting CQs can be highly time-consuming and costly for ontology engineers, recent studies have ...
- Enhancing Logical Expressiveness in Graph Neural Networks via Path-Neighbor Aggregation : Abstract: Graph neural networks (GNNs) can effectively model structural information of graphs, making them widely used in knowledge graph (KG) reasoning. However, existing studies on the expressive po...
- Multivariate Time series Anomaly Detection: A Framework of Hidden Markov Models : Abstract: In this study, we develop an approach to multivariate time series anomaly detection focused on the transformation of multivariate time series to univariate time series. Several transformatio...
- Combining LLM Semantic Reasoning with GNN Structural Modeling for Multi-view Multi-Label Feature Selection : Abstract: Multi-view multi-label feature selection aims to identify informative features from heterogeneous views, where each sample is associated with multiple interdependent labels. This problem is ...
- Numerical Sensitivity and Robustness: Exploring the Flaws of Mathematical Reasoning in Large Language Models : Abstract: LLMs have made significant progress in the field of mathematical reasoning, but whether they have true mathematical understanding remains controversial. To explore this issue, w...
- Knowledge-Augmented Long-CoT Generation for Complex Biomolecular Reasoning : Abstract: Understanding complex biomolecular mechanisms requires multi-step reasoning across molecular interactions, signaling cascades, and metabolic pathways. While large language models (LLMs) show ...
- Towards a Standard, Enterprise-Relevant Agentic AI Benchmark: Lessons from 5.5 billion tokens' worth of agentic AI evaluations : Abstract: Enterprise adoption of agentic AI systems requires reliable evaluation methods that reflect real-world deployment scenarios. Traditional LLM benchmarks suffer from training data contaminatio...
- Dual-Process Scaffold Reasoning for Enhancing LLM Code Debugging : Abstract: Recent LLMs have demonstrated sophisticated problem-solving capabilities on various benchmarks through advanced reasoning algorithms. However, the key research question of identifying reason...
- MSCR: Exploring the Vulnerability of LLMs' Mathematical Reasoning Abilities Using Multi-Source Candidate Replacement : Abstract: LLMs demonstrate performance comparable to human abilities in complex tasks such as mathematical reasoning, but their robustness in mathematical reasoning under minor input perturbations sti...
- Information Capacity: Evaluating the Efficiency of Large Language Models via Text Compression : Abstract: Recent years have witnessed the rapid advancements of large language models (LLMs) and their expanding applications, leading to soaring demands for computational resources. The widespread ad...
- Clustering-based Anomaly Detection in Multivariate Time Series Data : Abstract: Multivariate time series data come as a collection of time series describing different aspects of a certain temporal phenomenon. Anomaly detection in this type of data constitutes a challeng...
- Prudential Reliability of Large Language Models in Reinsurance: Governance, Assurance, and Capital Efficiency : Abstract: This paper develops a prudential framework for assessing the reliability of large language models (LLMs) in reinsurance. A five-pillar architecture--governance, data lineage, assurance, resi...
- Gateways to Tractability for Satisfiability in Pearl's Causal Hierarchy : Abstract: Pearl's Causal Hierarchy (PCH) is a central framework for reasoning about probabilistic, interventional, and counterfactual statements, yet the satisfiability problem for PCH formulas is com...
- Improving Industrial Injection Molding Processes with Explainable AI for Quality Classification : Abstract: Machine learning is an essential tool for optimizing industrial quality control processes. However, the complexity of machine learning models often limits their practical applicability due t...
- Advancements in synthetic data extraction for industrial injection molding : Abstract: Machine learning has significant potential for optimizing various industrial processes. However, data acquisition remains a major challenge as it is both time-consuming and costly. Synthetic...
- National Institute on Aging PREPARE Challenge: Early Detection of Cognitive Impairment Using Speech - The SpeechCARE Solution : Abstract: Alzheimer's disease and related dementias (ADRD) affect one in five adults over 60, yet more than half of individuals with cognitive decline remain undiagnosed. Speech-based assessments show...
- SciAgent: A Unified Multi-Agent System for Generalistic Scientific Reasoning : Abstract: Recent advances in large language models have enabled AI systems to achieve expert-level performance on domain-specific scientific tasks, yet these systems remain narrow and handcrafted. We ...
- oboro: Text-to-Image Synthesis on Limited Data using Flow-based Diffusion Transformer with MMH Attention : Abstract: This project was conducted as a 2nd-term adopted project of the "Post-5G Information and Communication System Infrastructure Enhancement R&D Project Development of Competitive Generative AI ...
- An Efficient Training Pipeline for Reasoning Graphical User Interface Agents : Abstract: Visual grounding is the task of localising image regions from natural language queries and is critical for reasoning-capable Graphical User Interface agents. Many existing methods rely on ma...
- Towards Provably Unlearnable Examples via Bayes Error Optimisation : Abstract: The recent success of machine learning models, especially large-scale classifiers and language models, relies heavily on training with massive data. These data are often collected from onlin...
- EHRStruct: A Comprehensive Benchmark Framework for Evaluating Large Language Models on Structured Electronic Health Record Tasks : Abstract: Structured Electronic Health Record (EHR) data stores patient information in relational tables and plays a central role in clinical decision-making. Recent advances have explored the use of ...
- MADD: Multi-Agent Drug Discovery Orchestra : Abstract: Hit identification is a central challenge in early drug discovery, traditionally requiring substantial experimental resources. Recent advances in artificial intelligence, particularly large ...
- Beyond Distributions: Geometric Action Control for Continuous Reinforcement Learning : Abstract: Gaussian policies have dominated continuous control in deep reinforcement learning (RL), yet they suffer from a fundamental mismatch: their unbounded support requires ad-hoc squashing functi...
- Towards Outcome-Oriented, Task-Agnostic Evaluation of AI Agents : Abstract: As AI agents proliferate across industries and applications, evaluating their performance based solely on infrastructural metrics such as latency, time-to-first-token, or token throughput is...
- Where and What Matters: Sensitivity-Aware Task Vectors for Many-Shot Multimodal In-Context Learning : Abstract: Large Multimodal Models (LMMs) have shown promising in-context learning (ICL) capabilities, but scaling to many-shot settings remains difficult due to limited context length and high inferen...
- Multi-Agent GraphRAG: A Text-to-Cypher Framework for Labeled Property Graphs : Abstract: While Retrieval-Augmented Generation (RAG) methods commonly draw information from unstructured documents, the emerging paradigm of GraphRAG aims to leverage structured data such as knowledge...
Research Sources: 572 | Generated: 11/12/2025
