AI RESEARCH PAPERS & ACADEMIC SOURCES
- SpatialActor: Exploring Disentangled Spatial Representations for Robust Robotic Manipulation : Abstract: Robotic manipulation requires precise spatial understanding to interact with objects in the real world. Point-based methods suffer from sparse sampling, leading to the loss of fine-grained s...
- ACDC: The Adverse Conditions Dataset with Correspondences for Robust Semantic Driving Scene Perception : Abstract: Level-5 driving automation requires a robust visual perception system that can parse input images under any condition. However, existing driving datasets for dense semantic perception are ei...
- Adjacent-view Transformers for Supervised Surround-view Depth Estimation : Abstract: Depth estimation has been widely studied and serves as the fundamental step of 3D perception for robotics and autonomous driving. Though significant progress has been made in monocular depth...
- Exploring the Adversarial Robustness of Face Forgery Detection with Decision-based Black-box Attacks : Abstract: Face forgery generation technologies generate vivid faces, which have raised public concerns about security and privacy. Many intelligent systems, such as electronic payment and identity ver...
- Improving Adversarial Transferability with Neighbourhood Gradient Information : Abstract: Deep neural networks (DNNs) are known to be susceptible to adversarial examples, leading to significant performance degradation. In black-box attack scenarios, a considerable attack performa...
- LangPose: Language-Aligned Motion for Robust 3D Human Pose Estimation : Abstract: 2D-to-3D human pose lifting is an ill-posed problem due to depth ambiguity and occlusion. Existing methods relying on spatial and temporal consistency alone are insufficient to resolve these...
- Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models : Abstract: Scalable Vector Graphics (SVG) has become the de facto standard for vector graphics in digital design, offering resolution independence and precise control over individual elements. Despite ...
- The Visual Counter Turing Test (VCT2): A Benchmark for Evaluating AI-Generated Image Detection and the Visual AI Index (VAI) : Abstract: The rapid progress and widespread availability of text-to-image (T2I) generative models have heightened concerns about the misuse of AI-generated visuals, particularly in the context of misi...
- CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models : Abstract: The rapid advancement of Large Vision-Language Models (VLMs), both general-domain models and those specifically tailored for remote sensing, has demonstrated exceptional perception and reaso...
- Robust Bayesian Scene Reconstruction with Retrieval-Augmented Priors for Precise Grasping and Planning : Abstract: Constructing 3D representations of object geometry is critical for many robotics tasks, particularly manipulation problems. These representations must be built from potentially noisy partial...
- Redundant Queries in DETR-Based 3D Detection Methods: Unnecessary and Prunable : Abstract: Query-based models are extensively used in 3D object detection tasks, with a wide range of pre-trained checkpoints readily available online. However, despite their popularity, these models o...
- Synth-Align: Improving Trustworthiness in Vision-Language Model with Synthetic Preference Data Alignment : Abstract: Large Vision-Language Models (LVLMs) have shown promising capabilities in understanding and generating information by integrating both visual and textual data. However, current models are st...
- Domain Adaptation from Generated Multi-Weather Images for Unsupervised Maritime Object Classification : Abstract: The classification and recognition of maritime objects are crucial for enhancing maritime safety, monitoring, and intelligent sea environment prediction. However, existing unsupervised metho...
- Improved Wildfire Spread Prediction with Time-Series Data and the WSTS+ Benchmark : Abstract: Recent research has demonstrated the potential of deep neural networks (DNNs) to accurately predict wildfire spread on a given day based upon high-dimensional explanatory data from a single ...
- Surgical AI Copilot: Energy-Based Fourier Gradient Low-Rank Adaptation for Surgical LLM Agent Reasoning and Planning : Abstract: Image-guided surgery demands adaptive, real-time decision support, yet static AI models struggle with structured task planning and providing interactive guidance. Large language models (LLMs...
- ArchCAD-400K: A Large-Scale CAD drawings Dataset and New Baseline for Panoptic Symbol Spotting : Abstract: Recognizing symbols in architectural CAD drawings is critical for various advanced engineering applications. In this paper, we propose a novel CAD data annotation engine that leverages intri...
- HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration : Abstract: Single-image human reconstruction is vital for digital human modeling applications but remains an extremely challenging task. Current approaches rely on generative models to synthesize multi...
- Sim-to-Real: An Unsupervised Noise Layer for Screen-Camera Watermarking Robustness : Abstract: Unauthorized screen capturing and dissemination pose severe security threats such as data leakage and information theft. Several studies propose robust watermarking methods to track the copy...
- DG-DETR: Toward Domain Generalized Detection Transformer : Abstract: End-to-end Transformer-based detectors (DETRs) have demonstrated strong detection performance. However, domain generalization (DG) research has primarily focused on convolutional neural netw...
- Clear Nights Ahead: Towards Multi-Weather Nighttime Image Restoration : Abstract: Restoring nighttime images affected by multiple adverse weather conditions is a practical yet under-explored research problem, as multiple weather conditions often coexist in the real world ...
- LayerPeeler: Autoregressive Peeling for Layer-wise Image Vectorization : Abstract: Image vectorization is a powerful technique that converts raster images into vector graphics, enabling enhanced flexibility and interactivity. However, popular image vectorization tools stru...
- LBMamba: Locally Bi-directional Mamba : Abstract: Mamba, a State Space Model (SSM) that accelerates training by recasting recurrence as a parallel scan, has recently emerged as a linearly-scaling alternative to self-attention. Because of it...
- A Synthetic Benchmark for Collaborative 3D Semantic Occupancy Prediction in V2X Autonomous Driving : Abstract: 3D semantic occupancy prediction is an emerging perception paradigm in autonomous driving, providing a voxel-level representation of both geometric details and semantic categories. However, ...
- evMLP: An Efficient Event-Driven MLP Architecture for Vision : Abstract: Deep neural networks have achieved remarkable results in computer vision tasks. In the early days, Convolutional Neural Networks (CNNs) were the mainstream architecture. In recent years, Vis...
- DMAT: An End-to-End Framework for Joint Atmospheric Turbulence Mitigation and Object Detection : Abstract: Atmospheric Turbulence (AT) degrades the clarity and accuracy of surveillance imagery, posing challenges not only for visualization quality but also for object classification and scene track...
- Geo-Registration of Terrestrial LiDAR Point Clouds with Satellite Images without GNSS : Abstract: Accurate geo-registration of LiDAR point clouds remains a significant challenge in urban environments where Global Navigation Satellite System (GNSS) signals are denied or degraded. Existing...
- Knowledge-Guided Brain Tumor Segmentation via Synchronized Visual-Semantic-Topological Prior Fusion : Abstract: Background: Brain tumor segmentation requires precise delineation of hierarchical structures from multi-sequence MRI. However, existing deep learning methods primarily rely on visual feature...
- Procedure Learning via Regularized Gromov-Wasserstein Optimal Transport : Abstract: We study self-supervised procedure learning, which discovers key steps and their order from a set of unlabeled videos. Previous methods typically learn frame-to-frame correspondences between...
- Multi-scale Cascaded Foundation Model for Whole-body Organs-at-risk Segmentation : Abstract: Accurate segmentation of organs-at-risk (OARs) is vital for safe and precise radiotherapy and surgery. Most existing studies segment only a limited set of organs or regions, lacking a system...
- UltraSam: A Foundation Model for Ultrasound using Large Open-Access Segmentation Datasets : Abstract: Purpose: Automated ultrasound image analysis is challenging due to anatomical complexity and limited annotated data. To tackle this, we take a data-centric approach, assembling the largest p...
- vS-Graphs: Tightly Coupling Visual SLAM and 3D Scene Graphs Exploiting Hierarchical Scene Understanding : Abstract: Current Visual Simultaneous Localization and Mapping (VSLAM) systems often struggle to create maps that are both semantically rich and easily interpretable. While incorporating semantic scen...
- SasMamba: A Lightweight Structure-Aware Stride State Space Model for 3D Human Pose Estimation : Abstract: Recently, the Mamba architecture based on State Space Models (SSMs) has gained attention in 3D human pose estimation due to its linear complexity and strong global modeling capability. Howev...
- Improve Contrastive Clustering Performance by Multiple Fusing-Augmenting ViT Blocks : Abstract: In the field of image clustering, the widely used contrastive learning networks improve clustering performance by maximizing the similarity between positive pairs and the dissimilarity of ne...
- Improving VisNet for Object Recognition : Abstract: Object recognition plays a fundamental role in how biological organisms perceive and interact with their environment. While the human visual system performs this task with remarkable efficie...
- Asymmetric Cross-Modal Knowledge Distillation: Bridging Modalities with Weak Semantic Consistency : Abstract: Cross-modal Knowledge Distillation has demonstrated promising performance on paired modalities with strong semantic connections, referred to as Symmetric Cross-modal Knowledge Distillation (...
- LLM-Guided Probabilistic Fusion for Label-Efficient Document Layout Analysis : Abstract: Document layout understanding remains data-intensive despite advances in semi-supervised learning. We present a framework that enhances semi-supervised detection by fusing visual predictions...
- Consistency Change Detection Framework for Unsupervised Remote Sensing Change Detection : Abstract: Unsupervised remote sensing change detection aims to monitor and analyze changes from multi-temporal remote sensing images in the same geometric region at different times, without the need f...
- HitoMi-Cam: A Shape-Agnostic Person Detection Method Using the Spectral Characteristics of Clothing : Abstract: While convolutional neural network (CNN)-based object detection is widely used, it exhibits a shape dependency that degrades performance for postures not included in the training data. Build...
- Negative Entity Suppression for Zero-Shot Captioning with Synthetic Images : Abstract: Text-only training provides an attractive approach to address data scarcity challenges in zero-shot image captioning (ZIC), avoiding the expense of collecting paired image-text annotations. ...
- SPEED-Q: Staged Processing with Enhanced Distillation towards Efficient Low-bit On-device VLM Quantization : Abstract: Deploying Vision-Language Models (VLMs) on edge devices (e.g., smartphones and robots) is crucial for enabling low-latency and privacy-preserving intelligent applications. Given the resource...
- Machines Serve Human: A Novel Variable Human-machine Collaborative Compression Framework : Abstract: Human-machine collaborative compression has been receiving increasing research efforts for reducing image/video data, serving as the basis for both human perception and machine intelligence....
- From Structure to Detail: Hierarchical Distillation for Efficient Diffusion Model : Abstract: The inference latency of diffusion models remains a critical barrier to their real-time application. While trajectory-based and distribution-based step distillation methods offer solutions, ...
- Neural B-frame Video Compression with Bi-directional Reference Harmonization : Abstract: Neural video compression (NVC) has made significant progress in recent years, while neural B-frame video compression (NBVC) remains underexplored compared to P-frame compression. NBVC can ad...
- FGM-HD: Boosting Generation Diversity of Fractal Generative Models through Hausdorff Dimension Induction : Abstract: Improving the diversity of generated results while maintaining high visual quality remains a significant challenge in image generation tasks. Fractal Generative Models (FGMs) are efficient i...
- AuthSig: Safeguarding Scanned Signatures Against Unauthorized Reuse in Paperless Workflows : Abstract: With the deepening trend of paperless workflows, signatures as a means of identity authentication are gradually shifting from traditional ink-on-paper to electronic formats. Despite the avail...
- Efficient and Effective In-context Demonstration Selection with Coreset : Abstract: In-context learning (ICL) has emerged as a powerful paradigm for Large Visual Language Models (LVLMs), enabling them to leverage a few examples directly from input contexts. However, the eff...
- WDT-MD: Wavelet Diffusion Transformers for Microaneurysm Detection in Fundus Images : Abstract: Microaneurysms (MAs), the earliest pathognomonic signs of Diabetic Retinopathy (DR), present as sub-60 μm lesions in fundus images with highly variable photometric and morphological charac...
- An ICTM-RMSAV Framework for Bias-Field Aware Image Segmentation under Poisson and Multiplicative Noise : Abstract: Image segmentation is a core task in image processing, yet many methods degrade when images are heavily corrupted by noise and exhibit intensity inhomogeneity. Within the iterative-convoluti...
- T-Rex-Omni: Integrating Negative Visual Prompt in Generic Object Detection : Abstract: Object detection methods have evolved from closed-set to open-set paradigms over the years. Current open-set object detectors, however, remain constrained by their exclusive reliance on posi...
- Causally-Grounded Dual-Path Attention Intervention for Object Hallucination Mitigation in LVLMs : Abstract: Object hallucination remains a critical challenge in Large Vision-Language Models (LVLMs), where models generate content inconsistent with visual inputs. Existing language-decoder based miti...
- Dense Cross-Scale Image Alignment With Fully Spatial Correlation and Just Noticeable Difference Guidance : Abstract: Existing unsupervised image alignment methods exhibit limited accuracy and high computational complexity. To address these challenges, we propose a dense cross-scale image alignment model. I...
- USF-Net: A Unified Spatiotemporal Fusion Network for Ground-Based Remote Sensing Cloud Image Sequence Extrapolation : Abstract: Ground-based remote sensing cloud image sequence extrapolation is a key research area in the development of photovoltaic power systems. However, existing approaches exhibit several limitatio...
- 4KDehazeFlow: Ultra-High-Definition Image Dehazing via Flow Matching : Abstract: Ultra-High-Definition (UHD) image dehazing faces challenges such as limited scene adaptability in prior-based methods and high computational complexity with color distortion in deep learning...
- VietMEAgent: Culturally-Aware Few-Shot Multimodal Explanation for Vietnamese Visual Question Answering : Abstract: Contemporary Visual Question Answering (VQA) systems remain constrained when confronted with culturally specific content, largely because cultural knowledge is under-represented in training ...
- Diversifying Counterattacks: Orthogonal Exploration for Robust CLIP Inference : Abstract: Vision-language pre-training models (VLPs) demonstrate strong multimodal understanding and zero-shot generalization, yet remain vulnerable to adversarial examples, raising concerns about the...
- Composition-Incremental Learning for Compositional Generalization : Abstract: Compositional generalization has achieved substantial progress in computer vision on pre-collected training data. Nonetheless, real-world data continually emerges, with possible compositions...
- Ultra-Light Test-Time Adaptation for Vision-Language Models : Abstract: Vision-Language Models (VLMs) such as CLIP achieve strong zero-shot recognition by comparing image embeddings to text-derived class prototypes. However, under domain shift, they suffer from ...
- DKDS: A Benchmark Dataset of Degraded Kuzushiji Documents with Seals for Detection and Binarization : Abstract: Kuzushiji, a pre-modern Japanese cursive script, can currently be read and understood by only a few thousand trained experts in Japan. With the rapid development of deep learning, researcher...
- PIFF: A Physics-Informed Generative Flow Model for Real-Time Flood Depth Mapping : Abstract: Flood mapping is crucial for assessing and mitigating flood impacts, yet traditional methods like numerical modeling and aerial photography face limitations in efficiency and reliability. To...
- MACEval: A Multi-Agent Continual Evaluation Network for Large Models : Abstract: Hundreds of benchmarks dedicated to evaluating large models from multiple perspectives have been presented over the past few years. Albeit substantial efforts, most of them remain closed-end...
- PressTrack-HMR: Pressure-Based Top-Down Multi-Person Global Human Mesh Recovery : Abstract: Multi-person global human mesh recovery (HMR) is crucial for understanding crowd dynamics and interactions. Traditional vision-based HMR methods sometimes face limitations in real-world scen...
- HOTFLoc++: End-to-End Hierarchical LiDAR Place Recognition, Re-Ranking, and 6-DoF Metric Localisation in Forests : Abstract: This article presents HOTFLoc++, an end-to-end framework for LiDAR place recognition, re-ranking, and 6-DoF metric localisation in forests. Leveraging an octree-based transformer, our approa...
- DBINDS - Can Initial Noise from Diffusion Model Inversion Help Reveal AI-Generated Videos? : Abstract: AI-generated video has advanced rapidly and poses serious challenges to content security and forensic analysis. Existing detectors rely mainly on pixel-level visual cues and generalize poorl...
- Towards Trustworthy Dermatology MLLMs: A Benchmark and Multimodal Evaluator for Diagnostic Narratives : Abstract: Multimodal large language models (LLMs) are increasingly used to generate dermatology diagnostic narratives directly from images. However, reliable evaluation remains the primary bottleneck ...
- Spatial Information Bottleneck for Interpretable Visual Recognition : Abstract: Deep neural networks typically learn spatially entangled representations that conflate discriminative foreground features with spurious background correlations, thereby undermining model int...
- GRACE: Designing Generative Face Video Codec via Agile Hardware-Centric Workflow : Abstract: The Animation-based Generative Codec (AGC) is an emerging paradigm for talking-face video compression. However, deploying its intricate decoder on resource and power-constrained edge devices...
- Deep Learning for Metabolic Rate Estimation from Biosignals: A Comparative Study of Architectures and Signal Selection : Abstract: Energy expenditure estimation aims to infer human metabolic rate from physiological signals such as heart rate, respiration, or accelerometer data, and has been studied primarily with classi...
- Enriching Knowledge Distillation with Cross-Modal Teacher Fusion : Abstract: Multi-teacher knowledge distillation (KD), a more effective technique than traditional single-teacher methods, transfers knowledge from expert teachers to a compact student model using logit...
- DensiCrafter: Physically-Constrained Generation and Fabrication of Self-Supporting Hollow Structures : Abstract: The rise of 3D generative models has enabled automatic 3D geometry and texture synthesis from multimodal inputs (e.g., text or images). However, these methods often ignore physical constrain...
- DualFete: Revisiting Teacher-Student Interactions from a Feedback Perspective for Semi-supervised Medical Image Segmentation : Abstract: The teacher-student paradigm has emerged as a canonical framework in semi-supervised learning. When applied to medical image segmentation, the paradigm faces challenges due to inherent image...
- FQ-PETR: Fully Quantized Position Embedding Transformation for Multi-View 3D Object Detection : Abstract: Camera-based multi-view 3D detection is crucial for autonomous driving. PETR and its variants (PETRs) excel in benchmarks but face deployment challenges due to high computational cost and me...
- Spatio-Temporal Context Learning with Temporal Difference Convolution for Moving Infrared Small Target Detection : Abstract: Moving infrared small target detection (IRSTD) plays a critical role in practical applications, such as surveillance of unmanned aerial vehicles (UAVs) and UAV-based search system. Moving IR...
- Learning by Neighbor-Aware Semantics, Deciding by Open-form Flows: Towards Robust Zero-Shot Skeleton Action Recognition : Abstract: Recognizing unseen skeleton action categories remains highly challenging due to the absence of corresponding skeletal priors. Existing approaches generally follow an "align-then-classify" pa...
- OUGS: Active View Selection via Object-aware Uncertainty Estimation in 3DGS : Abstract: Recent advances in 3D Gaussian Splatting (3DGS) have achieved state-of-the-art results for novel view synthesis. However, efficiently capturing high-fidelity reconstructions of specific obje...
- BronchOpt: Vision-Based Pose Optimization with Fine-Tuned Foundation Models for Accurate Bronchoscopy Navigation : Abstract: Accurate intra-operative localization of the bronchoscope tip relative to patient anatomy remains challenging due to respiratory motion, anatomical variability, and CT-to-body divergence tha...
- Hand Held Multi-Object Tracking Dataset in American Football : Abstract: Multi-Object Tracking (MOT) plays a critical role in analyzing player behavior from videos, enabling performance evaluation. Current MOT methods are often evaluated using publicly available ...
- Revisiting Cross-Architecture Distillation: Adaptive Dual-Teacher Transfer for Lightweight Video Models : Abstract: Vision Transformers (ViTs) have achieved strong performance in video action recognition, but their high computational cost limits their practicality. Lightweight CNNs are more efficient but ...
- DreamPose3D: Hallucinative Diffusion with Prompt Learning for 3D Human Pose Estimation : Abstract: Accurate 3D human pose estimation remains a critical yet unresolved challenge, requiring both temporal coherence across frames and fine-grained modeling of joint relationships. However, most...
- vMFCoOp: Towards Equilibrium on a Unified Hyperspherical Manifold for Prompting Biomedical VLMs : Abstract: Recent advances in context optimization (CoOp) guided by large language model (LLM)-distilled medical semantic priors offer a scalable alternative to manual prompt engineering and full fine-...
- RF-DETR: Neural Architecture Search for Real-Time Detection Transformers : Abstract: Open-vocabulary detectors achieve impressive performance on COCO, but often fail to generalize to real-world datasets with out-of-distribution classes not typically found in their pre-traini...
- Moving pattern-based modeling using a new type of interval ARX model : Abstract: In this paper, firstly, to overcome the shortcoming of traditional ARX model, a new operator between an interval number and a real matrix is defined, and then it is applied to the traditional ...
- SAMora: Enhancing SAM through Hierarchical Self-Supervised Pre-Training for Medical Images : Abstract: The Segment Anything Model (SAM) has demonstrated significant potential in medical image segmentation. Yet, its performance is limited when only a small amount of labeled data is available, ...
- Fluence Map Prediction with Deep Learning: A Transformer-based Approach : Abstract: Accurate fluence map prediction is essential in intensity-modulated radiation therapy (IMRT) to maximize tumor coverage while minimizing dose to healthy tissues. Conventional optimization is...
- 3D-TDA - Topological feature extraction from 3D images for Alzheimer's disease classification : Abstract: Now that disease-modifying therapies for Alzheimer disease have been approved by regulatory agencies, the early, objective, and accurate clinical diagnosis of AD based on the lowest-cost mea...
- Stabilizing Direct Training of Spiking Neural Networks: Membrane Potential Initialization and Threshold-robust Surrogate Gradient : Abstract: Recent advancements in the direct training of Spiking Neural Networks (SNNs) have demonstrated high-quality outputs even at early timesteps, paving the way for novel energy-efficient AI para...
- OG-PCL: Efficient Sparse Point Cloud Processing for Human Activity Recognition : Abstract: Human activity recognition (HAR) with millimeter-wave (mmWave) radar offers a privacy-preserving and robust alternative to camera- and wearable-based approaches. In this work, we propose the...
- "It's trained by non-disabled people": Evaluating How Image Quality Affects Product Captioning with VLMs : Abstract: Vision-Language Models (VLMs) are increasingly used by blind and low-vision (BLV) people to identify and understand products in their everyday lives, such as food, personal products, and hou...
- ROI-based Deep Image Compression with Implicit Bit Allocation : Abstract: Region of Interest (ROI)-based image compression has rapidly developed due to its ability to maintain high fidelity in important regions while reducing data redundancy. However, existing com...
- Expand Your SCOPE: Semantic Cognition over Potential-Based Exploration for Embodied Visual Navigation : Abstract: Embodied visual navigation remains a challenging task, as agents must explore unknown environments with limited knowledge. Existing zero-shot studies have shown that incorporating memory mec...
- Plug-and-Play Clarifier: A Zero-Shot Multimodal Framework for Egocentric Intent Disambiguation : Abstract: The performance of egocentric AI agents is fundamentally limited by multimodal intent ambiguity. This challenge arises from a combination of underspecified language, imperfect visual data, a...
- Spatio-Temporal Data Enhanced Vision-Language Model for Traffic Scene Understanding : Abstract: Nowadays, navigation and ride-sharing apps have collected numerous images with spatio-temporal data. A core technology for analyzing such images, associated with spatiotemporal information, ...
- UniMM-V2X: MoE-Enhanced Multi-Level Fusion for End-to-End Cooperative Autonomous Driving : Abstract: Autonomous driving holds transformative potential but remains fundamentally constrained by the limited perception and isolated decision-making with standalone intelligence. While recent mult...
- RadHARSimulator V2: Video to Doppler Generator : Abstract: Radar-based human activity recognition (HAR) still lacks a comprehensive simulation method. Existing software is developed based on models or motion-captured data, resulting in limited flexi...
- SMF-VO: Direct Ego-Motion Estimation via Sparse Motion Fields : Abstract: Traditional Visual Odometry (VO) and Visual Inertial Odometry (VIO) methods rely on a 'pose-centric' paradigm, which computes absolute camera poses from the local map thus requires large-sca...
- Augment to Augment: Diverse Augmentations Enable Competitive Ultra-Low-Field MRI Enhancement : Abstract: Ultra-low-field (ULF) MRI promises broader accessibility but suffers from low signal-to-noise ratio (SNR), reduced spatial resolution, and contrasts that deviate from high-field standards. I...
- SPIDER: Scalable Physics-Informed Dexterous Retargeting : Abstract: Learning dexterous and agile policy for humanoid and dexterous hand control requires large-scale demonstrations, but collecting robot-specific data is prohibitively expensive. In contrast, a...
- MAP-VLA: Memory-Augmented Prompting for Vision-Language-Action Model in Robotic Manipulation : Abstract: Pre-trained Vision-Language-Action (VLA) models have achieved remarkable success in improving robustness and generalization for end-to-end robotic manipulation. However, these models struggl...
- EVADE: LLM-Based Explanation Generation and Validation for Error Detection in NLI : Abstract: High-quality datasets are critical for training and evaluating reliable NLP models. In tasks like natural language inference (NLI), human label variation (HLV) arises when multiple labels ar...
- SpiralThinker: Latent Reasoning through an Iterative Process with Text-Latent Interleaving : Abstract: Recent advances in large reasoning models have been driven by reinforcement learning and test-time scaling, accompanied by growing interest in latent rather than purely textual reasoning. Ho...
- Detecting Emotional Dynamic Trajectories: An Evaluation Framework for Emotional Support in Language Models : Abstract: Emotional support is a core capability in human-AI interaction, with applications including psychological counseling, role play, and companionship. However, existing evaluations of large lan...
- MM-CRITIC: A Holistic Evaluation of Large Multimodal Models as Multimodal Critique : Abstract: The ability of critique is vital for models to self-improve and serve as reliable AI assistants. While extensively studied in language-only settings, multimodal critique of Large Multimodal ...
- Context-Aware Dynamic Chunking for Streaming Tibetan Speech Recognition : Abstract: In this work, we propose a streaming speech recognition framework for Amdo Tibetan, built upon a hybrid CTC/Attention architecture with a context-aware dynamic chunking mechanism. The propo...
- Thinking Forward and Backward: Multi-Objective Reinforcement Learning for Retrieval-Augmented Reasoning : Abstract: Retrieval-augmented generation (RAG) has proven to be effective in mitigating hallucinations in large language models, yet its effectiveness remains limited in complex, multi-step reasoning ...
- Assessing the Capabilities of LLMs in Humor: A Multi-dimensional Analysis of Oogiri Generation and Evaluation : Abstract: Computational humor is a frontier for creating advanced and engaging natural language processing (NLP) applications, such as sophisticated dialogue systems. While previous studies have bench...
- One-Topic-Doesn't-Fit-All: Transcreating Reading Comprehension Test for Personalized Learning : Abstract: Personalized learning has gained attention in English as a Foreign Language (EFL) education, where engagement and motivation play crucial roles in reading comprehension. We propose a novel a...
- DoPE: Denoising Rotary Position Embedding : Abstract: Rotary Position Embedding (RoPE) in Transformer models has inherent limits that weaken length extrapolation. We reinterpret the attention map with positional encoding as a noisy feature map,...
- A Hybrid Search for Complex Table Question Answering in Securities Report : Abstract: Recently, Large Language Models (LLMs) are gaining increased attention in the domain of Table Question Answering (TQA), particularly for extracting information from tables in documents. Howe...
- Context is Enough: Empirical Validation of Sequentiality on Essays : Abstract: Recent work has proposed using Large Language Models (LLMs) to quantify narrative flow through a measure called sequentiality, which combines topic and contextual terms. A recent critique ar...
- The Learning Dynamics of Subword Segmentation for Morphologically Diverse Languages : Abstract: Subword segmentation is typically applied in preprocessing and stays fixed during training. Alternatively, it can be learned during training to optimise the training objective. In this paper...
- Pretraining Finnish ModernBERTs : Abstract: This paper reports on pretraining ModernBERT encoder models in six different sizes, ranging from 51M to 475M parameters, with a focus on limited multilingualism, emphasizing languages releva...
- Stabilizing Reinforcement Learning for Honesty Alignment in Language Models on Deductive Reasoning : Abstract: Reinforcement learning with verifiable rewards (RLVR) has recently emerged as a promising framework for aligning language models with complex reasoning objectives. However, most existing met...
- POTSA: A Cross-Lingual Speech Alignment Framework for Low Resource Speech-to-Text Translation : Abstract: Speech Large Language Models (SpeechLLMs) have achieved breakthroughs in multilingual speech-to-text translation (S2TT). However, existing approaches often overlook semantic commonalities ac...
- C$^3$TG: Conflict-aware, Composite, and Collaborative Controlled Text Generation : Abstract: Recent advancements in large language models (LLMs) have demonstrated remarkable text generation capabilities. However, controlling specific attributes of generated text remains challenging ...
- LiteraryTaste: A Preference Dataset for Creative Writing Personalization : Abstract: People have different creative writing preferences, and large language models (LLMs) for these tasks can benefit from adapting to each user's preferences. However, these models are often tra...
- Towards Explainable Khmer Polarity Classification : Abstract: Khmer polarity classification is a fundamental natural language processing task that assigns a positive, negative, or neutral label to a given Khmer text input. Existing Khmer models typical...
- mmJEE-Eval: A Bilingual Multimodal Benchmark for Evaluating Scientific Reasoning in Vision-Language Models : Abstract: Contemporary vision-language models (VLMs) perform well on existing multimodal reasoning benchmarks (78-85% accuracy on MMMU, MathVista). Yet, these results fail to sufficiently distinguish...
- Seer Self-Consistency: Advance Budget Estimation for Adaptive Test-Time Scaling : Abstract: Test-time scaling improves the inference performance of Large Language Models (LLMs) but also incurs substantial computational costs. Although recent studies have reduced token consumption t...
- Spider4SSC & S2CLite: A text-to-multi-query-language dataset using lightweight ontology-agnostic SPARQL to Cypher parser : Abstract: We present Spider4SSC dataset and S2CLite parsing tool. S2CLite is a lightweight, ontology-agnostic parser that translates SPARQL queries into Cypher queries, enabling both in-situ and large...
- MTQ-Eval: Multilingual Text Quality Evaluation for Language Models : Abstract: The use of large language models (LLMs) for evaluating outputs is becoming an increasingly effective and scalable approach. However, it remains uncertain whether this capability extends beyo...
- Self-Correcting Large Language Models: Generation vs. Multiple Choice : Abstract: Large language models have recently demonstrated remarkable abilities to self-correct their responses through iterative refinement, often referred to as self-consistency or self-reflection. ...
- AMaPO: Adaptive Margin-attached Preference Optimization for Language Model Alignment : Abstract: Offline preference optimization offers a simpler and more stable alternative to RLHF for aligning language models. However, their effectiveness is critically dependent on ranking accuracy, a...
- Multimodal Large Language Models for Low-Resource Languages: A Case Study for Basque : Abstract: Current Multimodal Large Language Models exhibit very strong performance for several demanding tasks. While commercial MLLMs deliver acceptable performance in low-resource languages, compara...
- CARE-Bench: A Benchmark of Diverse Client Simulations Guided by Expert Principles for Evaluating LLMs in Psychological Counseling : Abstract: The mismatch between the growing demand for psychological counseling and the limited availability of services has motivated research into the application of Large Language Models (LLMs) in t...
- GSAP-ERE: Fine-Grained Scholarly Entity and Relation Extraction Focused on Machine Learning : Abstract: Research in Machine Learning (ML) and AI evolves rapidly. Information Extraction (IE) from scientific publications makes it possible to identify information about research concepts and resources on a ...
- Readability Measures and Automatic Text Simplification: In the Search of a Construct : Abstract: Readability is a key concept in the current era of abundant written information. To help make texts more readable and make information more accessible to everyone, a line of research aim...
- SynClaimEval: A Framework for Evaluating the Utility of Synthetic Data in Long-Context Claim Verification : Abstract: Large Language Models (LLMs) with extended context windows promise direct reasoning over long documents, reducing the need for chunking or retrieval. Constructing annotated resources for tra...
- Conversational Agents for Building Energy Efficiency -- Advising Housing Cooperatives in Stockholm on Reducing Energy Consumption : Abstract: Housing cooperative is a common type of multifamily building ownership in Sweden. Although this ownership structure grants decision-making autonomy, it places a burden of responsibility on c...
- AI-generated podcasts: Synthetic Intimacy and Cultural Translation in NotebookLM's Audio Overviews : Abstract: This paper analyses AI-generated podcasts produced by Google's NotebookLM, which generates audio podcasts with two chatty AI hosts discussing whichever documents a user uploads. While AI-gen...
- The Double Contingency Problem: AI Recursion and the Limits of Interspecies Understanding : Abstract: Current bioacoustic AI systems achieve impressive cross-species performance by processing animal communication through transformer architectures, foundation model paradigms, and other comput...
- AI Founding Fathers: A Case Study of GIS Search in Multi-Agent Pipelines : Abstract: Although Large Language Models (LLMs) show exceptional fluency, efforts persist to extract stronger reasoning capabilities from them. Drawing on search-based interpretations of LLM computati...
- Solving a Million-Step LLM Task with Zero Errors : Abstract: LLMs have achieved remarkable breakthroughs in reasoning, insights, and tool use, but chaining these abilities into extended processes at the scale of those routinely executed by humans, org...
- History-Aware Reasoning for GUI Agents : Abstract: Advances in Multimodal Large Language Models have significantly enhanced Graphical User Interface (GUI) automation. Equipping GUI agents with reliable episodic reasoning capabilities is esse...
- Taming Object Hallucinations with Verified Atomic Confidence Estimation : Abstract: Multimodal Large Language Models (MLLMs) often suffer from hallucinations, particularly errors in object existence, attributes, or relations, which undermine their reliability. We introduce ...
- End-to-end Contrastive Language-Speech Pretraining Model For Long-form Spoken Question Answering : Abstract: Significant progress has been made in spoken question answering (SQA) in recent years. However, many existing methods, including large audio language models, struggle with processing long au...
- Not Everything That Counts Can Be Counted: A Case for Safe Qualitative AI : Abstract: Artificial intelligence (AI) and large language models (LLM) are reshaping science, with most recent advances culminating in fully-automated scientific discovery pipelines. But qualitative r...
- NaturalTurn: A Method to Segment Speech into Psychologically Meaningful Conversational Turns : Abstract: Conversation is a subject of increasing interest in the social, cognitive, and computational sciences. Yet as conversational datasets continue to increase in size and complexity, researchers...
- Evaluating Deep Unlearning in Large Language Models : Abstract: Machine unlearning has emerged as an important component in developing safe and trustworthy models. Prior work on fact unlearning in LLMs has mostly focused on removing a specified target fa...
- Large Language Model Benchmarks in Medical Tasks : Abstract: With the increasing application of large language models (LLMs) in the medical domain, evaluating these models' performance using benchmark datasets has become crucial. This paper presents a...
- OpenGenAlign: A Preference Dataset and Benchmark for Trustworthy Reward Modeling in Open-Ended, Long-Context Generation : Abstract: Reward Modeling is critical in evaluating and improving the generation of Large Language Models (LLMs). While numerous recent works have shown its feasibility in improving safety, helpfulnes...
- How Linguistics Learned to Stop Worrying and Love the Language Models : Abstract: Language models can produce fluent, grammatical text. Nonetheless, some maintain that language models don't really learn language and also that, even if they did, that would not be informati...
- Leveraging Small LLMs for Argument Mining in Education: Argument Component Identification, Classification, and Assessment : Abstract: Argument mining algorithms analyze the argumentative structure of essays, making them a valuable tool for enhancing education by providing targeted feedback on the students' argumentation sk...
- Semantic Volume: Quantifying and Detecting both External and Internal Uncertainty in LLMs : Abstract: Large language models (LLMs) have demonstrated remarkable performance across diverse tasks by encoding vast amounts of factual knowledge. However, they are still prone to hallucinations, gen...
- SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors : Abstract: While voice technologies increasingly serve aging populations, current systems exhibit significant performance gaps due to inadequate training data capturing elderly-specific vocal character...
- MARS: Multi-Agent Adaptive Reasoning with Socratic Guidance for Automated Prompt Optimization : Abstract: Large language models (LLMs) typically operate in a question-answering paradigm, where the quality of the input prompt critically affects the response. Automated Prompt Optimization (APO) ai...
- IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models : Abstract: Large language models (LLMs) have demonstrated strong instruction-following capabilities in text-based tasks. However, this ability often deteriorates in multimodal models after alignment wi...
- LLMs Struggle to Reject False Presuppositions when Misinformation Stakes are High : Abstract: This paper examines how LLMs handle false presuppositions and whether certain linguistic factors influence their responses to falsely presupposed content. Presuppositions subtly introduce in...
- Detecting Stealthy Backdoor Samples based on Intra-class Distance for Large Language Models : Abstract: Stealthy data poisoning during fine-tuning can backdoor large language models (LLMs), threatening downstream safety. Existing detectors either use classifier-style probability signals--ill-s...
- anyECG-chat: A Generalist ECG-MLLM for Flexible ECG Input and Multi-Task Understanding : Abstract: The advent of multimodal large language models (MLLMs) has sparked interest in their application to electrocardiogram (ECG) analysis. However, existing ECG-focused MLLMs primarily focus on r...
- ReliableMath: Benchmark of Reliable Mathematical Reasoning on Large Language Models : Abstract: Although demonstrating remarkable performance on reasoning tasks, Large Language Models (LLMs) still tend to fabricate unreliable responses when confronted with problems that are unsolvable ...
- Positional Bias in Long-Document Ranking: Impact, Assessment, and Mitigation : Abstract: We tested over 20 Transformer models for ranking long documents (including recent LongP models trained with FlashAttention and RankGPT models "powered" by OpenAI and Anthropic cloud APIs). W...
- LLM4AD: Large Language Models for Autonomous Driving - Concept, Review, Benchmark, Experiments, and Future Trends : Abstract: With the broader adoption and highly successful development of Large Language Models (LLMs), there has been growing interest and demand for applying LLMs to autonomous driving technology. Dr...
- Privacy-Preserving Retrieval-Augmented Generation with Differential Privacy : Abstract: With the recent remarkable advancement of large language models (LLMs), there has been a growing interest in utilizing them in the domains with highly sensitive data that lies outside their ...
- Prompt-OT: An Optimal Transport Regularization Paradigm for Knowledge Preservation in Vision-Language Model Adaptation : Abstract: Vision-language models (VLMs) such as CLIP demonstrate strong performance but struggle when adapted to downstream tasks. Prompt learning has emerged as an efficient and effective strategy to...
- Assessing Identity Leakage in Talking Face Generation: Metrics and Evaluation Framework : Abstract: Inpainting-based talking face generation aims to preserve video details such as pose, lighting, and gestures while modifying only lip motion, often using an identity reference image to maint...
- Learning Topology-Driven Multi-Subspace Fusion for Grassmannian Deep Network : Abstract: Grassmannian manifold offers a powerful carrier for geometric representation learning by modelling high-dimensional data as low-dimensional subspaces. However, existing approaches predominan...
- CADIC: Continual Anomaly Detection Based on Incremental Coreset : Abstract: The primary objective of Continual Anomaly Detection (CAD) is to learn the normal patterns of new tasks under dynamic data distribution assumptions while mitigating catastrophic forgetting. ...
- Predict and Resist: Long-Term Accident Anticipation under Sensor Noise : Abstract: Accident anticipation is essential for proactive and safe autonomous driving, where even a brief advance warning can enable critical evasive actions. However, two key challenges hinder real-...
- RS-Net: Context-Aware Relation Scoring for Dynamic Scene Graph Generation : Abstract: Dynamic Scene Graph Generation (DSGG) models how object relations evolve over time in videos. However, existing methods are trained only on annotated object pairs and lack guidance for non-r...
- Privacy Beyond Pixels: Latent Anonymization for Privacy-Preserving Video Understanding : Abstract: We introduce a novel formulation of visual privacy preservation for video foundation models that operates entirely in the latent space. While spatio-temporal features learned by foundation m...
- Harnessing Diffusion-Generated Synthetic Images for Fair Image Classification : Abstract: Image classification systems often inherit biases from uneven group representation in training data. For example, in face datasets for hair color classification, blond hair may be disproport...
- WiCV at CVPR 2025: The Women in Computer Vision Workshop : Abstract: The Women in Computer Vision Workshop (WiCV@CVPR 2025) was held in conjunction with the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025) in Nashville, Tennessee, Un...
- Adaptive graph Kolmogorov-Arnold network for 3D human pose estimation : Abstract: Graph convolutional network (GCN)-based methods have shown strong performance in 3D human pose estimation by leveraging the natural graph structure of the human skeleton. However, their loca...
- SIFT-Graph: Benchmarking Multimodal Defense Against Image Adversarial Attacks With Robust Feature Graph : Abstract: Adversarial attacks expose a fundamental vulnerability in modern deep vision models by exploiting their dependence on dense, pixel-level representations that are highly sensitive to impercep...
- DT-NVS: Diffusion Transformers for Novel View Synthesis : Abstract: Generating novel views of a natural scene, e.g., every-day scenes both indoors and outdoors, from a single view is an under-explored problem, even though it is an organic extension to the ob...
- Enhancing Rotation-Invariant 3D Learning with Global Pose Awareness and Attention Mechanisms : Abstract: Recent advances in rotation-invariant (RI) learning for 3D point clouds typically replace raw coordinates with handcrafted RI features to ensure robustness under arbitrary rotations. However...
- The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability? : Abstract: The concept of causal abstraction got recently popularised to demystify the opaque decision-making processes of machine learning models; in short, a neural network can be abstracted as a hig...
- Simulation as Supervision: Mechanistic Pretraining for Scientific Discovery : Abstract: Scientific modeling faces a tradeoff: mechanistic models provide scientific grounding but struggle with real-world complexity, while machine learning models achieve strong predictive perform...
- Generalization Bounds for Rank-sparse Neural Networks : Abstract: It has been recently observed in much of the literature that neural networks exhibit a bottleneck rank property: for larger depths, the activation and weights of neural networks trained with...
- Background Invariance Testing According to Semantic Proximity : Abstract: In many applications, machine-learned (ML) models are required to hold some invariance qualities, such as rotation, size, and intensity invariance. Among these, testing for background invari...
- Arc travel time and path choice model estimation subsumed : Abstract: We address the problem of simultaneously estimating arc travel times in a network and parameters of route choice models for strategic and tactical network planning purposes. Hitherto,...
- Online Ensemble Learning for Sector Rotation: A Gradient-Free Framework : Abstract: We propose a gradient-free online ensemble learning algorithm that dynamically combines forecasts from a heterogeneous set of machine learning models based on their recent predictive perform...
- Formalizing and Benchmarking Prompt Injection Attacks and Defenses : Abstract: A prompt injection attack aims to inject malicious instruction/data into the input of an LLM-Integrated Application such that it produces results as an attacker desires. Existing works are l...
- Bandit Convex Optimisation : Abstract: Bandit convex optimisation is a fundamental framework for studying zeroth-order convex optimisation. This book covers the many tools used for this problem, including cutting plane methods, i...
- Proximal Oracles for Optimization and Sampling : Abstract: We consider convex optimization with non-smooth objective function and log-concave sampling with non-smooth potential (negative log density). In particular, we study two specific settings wh...
- Simulating Non-Markovian Open Quantum Dynamics with Neural Quantum States : Abstract: Reducing computational scaling for simulating non-Markovian dissipative dynamics using artificial neural networks is both a major focus and formidable challenge in open quantum systems. To e...
- Waveform Design for Over-the-Air Computing : Abstract: In response to the increasing number of devices expected in next-generation networks, a shift to over-the-air (OTA) computing has been proposed. By leveraging the superposition of multiple a...
- ElicitationGPT: Text Elicitation Mechanisms via Language Models : Abstract: Scoring rules evaluate probabilistic forecasts of an unknown state against the realized state and are a fundamental building block in the incentivized elicitation of information. This paper ...
- SmileyLlama: Modifying Large Language Models for Directed Chemical Space Exploration : Abstract: Here we show that a general-purpose large language model (LLM) chatbot, Llama-3.1-8B-Instruct, can be transformed via supervised fine-tuning of engineered prompts into a chemical language mo...
- Dataset-Free Weight-Initialization on Restricted Boltzmann Machine : Abstract: In feed-forward neural networks, dataset-free weight-initialization methods such as LeCun, Xavier (or Glorot), and He initializations have been developed. These methods randomly determine th...
- A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms : Abstract: Large language models (LLMs) have achieved remarkable advancements in natural language processing, showcasing exceptional performance across various tasks. However, the expensive memory and ...
- Testing and Improving the Robustness of Amortized Bayesian Inference for Cognitive Models : Abstract: Contaminant observations and outliers often cause problems when estimating the parameters of cognitive models, which are statistical models representing cognitive processes. In this study, w...
- Federated Variational Inference for Bayesian Mixture Models : Abstract: We present a federated learning approach for Bayesian model-based clustering of large-scale binary and categorical datasets. We introduce a principled 'divide and conquer' inference procedur...
- UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning : Abstract: Large Language Model (LLM) agents equipped with external tools have become increasingly powerful for complex tasks such as web shopping, automated email replies, and financial trading. Howev...
- Learning conformational ensembles of proteins based on backbone geometry : Abstract: Deep generative models have recently been proposed for sampling protein conformations from the Boltzmann distribution, as an alternative to often prohibitively expensive Molecular Dynamics s...
- CrystalFormer-RL: Reinforcement Fine-Tuning for Materials Design : Abstract: Reinforcement fine-tuning played an instrumental role in enhancing the instruction-following and reasoning abilities of large language models. In this work, we employ reinforcement fine-tuni...
- Limits of Discrete Energy of Families of Increasing Sets : Abstract: The Hausdorff dimension of a set can be detected using the Riesz energy. Here, we consider situations where a sequence of points, $\{x_n\}$, "fills in" a set $E \subset \mathbb{R}^d$ in an...
- A Bayesian Approach to Segmentation with Noisy Labels via Spatially Correlated Distributions : Abstract: In semantic segmentation, the accuracy of models heavily depends on the high-quality annotations. However, in many practical scenarios, such as medical imaging and remote sensing, obtaining ...
- Continuous Symmetry Discovery and Enforcement Using Infinitesimal Generators of Multi-parameter Group Actions : Abstract: Symmetry-informed machine learning can exhibit advantages over machine learning which fails to account for symmetry. In the context of continuous symmetry detection, current state of the art...
- FS-DAG: Few Shot Domain Adapting Graph Networks for Visually Rich Document Understanding : Abstract: In this work, we propose Few Shot Domain Adapting Graph (FS-DAG), a scalable and efficient model architecture for visually rich document understanding (VRDU) in few-shot settings. FS-DAG lev...
- Tight Bounds for Answering Adaptively Chosen Concentrated Queries : Abstract: Most work on adaptive data analysis assumes that samples in the dataset are independent. When correlations are allowed, even the non-adaptive setting can become intractable, unless some stru...
- Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper : Abstract: Handheld grippers are increasingly used to collect human demonstrations due to their ease of deployment and versatility. However, most existing designs lack tactile sensing, despite the crit...
- Where did you get that? Towards Summarization Attribution for Analysts : Abstract: Analysts require attribution, as nothing can be reported without knowing the source of the information. In this paper, we will focus on automatic methods for attribution, linking each senten...
- The Collective Turing Test: Large Language Models Can Generate Realistic Multi-User Discussions
- Knowledge Graph Analysis of Legal Understanding and Violations in LLMs
- Diverse Preference Learning for Capabilities and Alignment : Abstract: The ability of LLMs to represent diverse perspectives is critical as they increasingly impact society. However, recent studies reveal that alignment algorithms such as RLHF and DPO significa...
- Chopping Trees: Semantic Similarity Based Dynamic Pruning for Tree-of-Thought Reasoning
- What About the Scene with the Hitler Reference? HAUNT: A Framework to Probe LLMs' Self-consistency Via Adversarial Nudge
- Self-HarmLLM: Can Large Language Model Harm Itself?
- OKBench: Democratizing LLM Evaluation with Fully Automated, On-Demand, Open Knowledge Benchmarking : Abstract: Knowledge-intensive question answering is central to large language models (LLMs) and is typically assessed using static benchmarks derived from sources like Wikipedia and textbooks. However...
- Retrieval-Augmented Generation of Pediatric Speech-Language Pathology vignettes: A Proof-of-Concept Study
- Evaluating DisCoCirc in Translation Tasks & its Limitations: A Comparative Study Between Bengali & English
- Mina: A Multilingual LLM-Powered Legal Assistant Agent for Bangladesh for Empowering Access to Justice
- A Super-Learner with Large Language Models for Medical Emergency Advising
- Structured Uncertainty guided Clarification for LLM Agents
- Toward Automated Cognitive Assessment in Parkinson's Disease Using Pretrained Language Models : Abstract: Understanding how individuals with Parkinson's disease (PD) describe cognitive experiences in their daily lives can offer valuable insights into disease-related cognitive and emotional chang...
- BNLI: A Linguistically-Refined Bengali Dataset for Natural Language Inference : Abstract: Despite the growing progress in Natural Language Inference (NLI) research, resources for the Bengali language remain extremely limited. Existing Bengali NLI datasets exhibit several inconsis...
- Beyond Task-Oriented and Chitchat Dialogues: Proactive and Transition-Aware Conversational Agents : Abstract: Conversational agents have traditionally been developed for either task-oriented dialogue (TOD) or open-ended chitchat, with limited progress in unifying the two. Yet, real-world conversatio...
- BioVerge: A Comprehensive Benchmark and Study of Self-Evaluating Agents for Biomedical Hypothesis Generation : Abstract: Hypothesis generation in biomedical research has traditionally centered on uncovering hidden relationships within vast scientific literature, often using methods like Literature-Based Discov...
- Hallucinate or Memorize? The Two Sides of Probabilistic Learning in Large Language Models : Abstract: Large language models (LLMs) have been increasingly applied to a wide range of tasks, from natural language understanding to code generation. While they have also been used to assist in cita...
- HalluClean: A Unified Framework to Combat Hallucinations in LLMs : Abstract: Large language models (LLMs) have achieved impressive performance across a wide range of natural language processing tasks, yet they often produce hallucinated content that undermines factua...
- TiDAR: Think in Diffusion, Talk in Autoregression : Abstract: Diffusion language models hold the promise of fast parallel generation, while autoregressive (AR) models typically excel in quality due to their causal structure aligning naturally with lang...
- Mixture of Scope Experts at Test: Generalizing Deeper Graph Neural Networks with Shallow Variants : Abstract: Heterophilous graphs, where dissimilar nodes tend to connect, pose a challenge for graph neural networks (GNNs). Increasing the GNN depth can expand the scope (i.e., receptive field), potent...
- ExDBN: Learning Dynamic Bayesian Networks using Extended Mixed-Integer Programming Formulations : Abstract: Causal learning from data has received much attention recently. Bayesian networks can be used to capture causal relationships. There, one recovers a weighted directed acyclic graph in which ...
- Conditional Distribution Learning for Graph Classification : Abstract: Leveraging the diversity and quantity of data provided by various graph-structured data augmentations while preserving intrinsic semantic information is challenging. Additionally, successive...
- Certified Training with Branch-and-Bound for Lyapunov-stable Neural Control : Abstract: We study the problem of learning verifiably Lyapunov-stable neural controllers that provably satisfy the Lyapunov asymptotic stability condition within a region-of-attraction (ROA). Unlike p...
- A Physics-Constrained Neural Differential Equation Framework for Data-Driven Snowpack Simulation : Abstract: This paper presents a physics-constrained neural differential equation framework for parameterization, and employs it to model the time evolution of seasonal snow depth given hydrometeorolog...
- Trustworthy Transfer Learning: A Survey : Abstract: Transfer learning aims to transfer knowledge or information from a source domain to a relevant target domain. In this paper, we understand transfer learning from the perspectives of knowledg...
- AutoG: Towards automatic graph construction from tabular data : Abstract: Recent years have witnessed significant advancements in graph machine learning (GML), with its applications spanning numerous domains. However, the focus of GML has predominantly been on dev...
- Graph Contrastive Learning for Connectome Classification : Abstract: With recent advancements in non-invasive techniques for measuring brain activity, such as magnetic resonance imaging (MRI), the study of structural and functional brain networks through grap...
- Contextual Thompson Sampling via Generation of Missing Data : Abstract: We introduce a framework for Thompson sampling (TS) contextual bandit algorithms, in which the algorithm's ability to quantify uncertainty and make decisions depends on the quality of a gene...
- Mixture of Message Passing Experts with Routing Entropy Regularization for Node Classification : Abstract: Graph neural networks (GNNs) have achieved significant progress in graph-based learning tasks, yet their performance often deteriorates when facing heterophilous structures where connected n...
- Ultrametric Cluster Hierarchies: I Want 'em All! : Abstract: Hierarchical clustering is a powerful tool for exploratory data analysis, organizing data into a tree of clusterings from which a partition can be chosen. This paper generalizes these ideas ...
- A Simple and Effective Reinforcement Learning Method for Text-to-Image Diffusion Fine-tuning : Abstract: Reinforcement learning (RL)-based fine-tuning has emerged as a powerful approach for aligning diffusion models with black-box objectives. Proximal policy optimization (PPO) is the most popul...
- How Well Can Differential Privacy Be Audited in One Run? : Abstract: Recent methods for auditing the privacy of machine learning algorithms have improved computational efficiency by simultaneously intervening on multiple training examples in a single training...
- What's Producible May Not Be Reachable: Measuring the Steerability of Generative Models : Abstract: How should we evaluate the quality of generative models? Many existing metrics focus on a model's producibility, i.e. the quality and breadth of outputs it can generate. However, the actual ...
- Evolutionary Policy Optimization : Abstract: On-policy reinforcement learning (RL) algorithms are widely used for their strong asymptotic performance and training stability, but they struggle to scale with larger batch sizes, as additi...
- A Causal Framework to Measure and Mitigate Non-binary Treatment Discrimination : Abstract: Fairness studies of algorithmic decision-making systems often simplify complex decision processes, such as bail or loan approvals, into binary classification tasks. However, these approaches...
- TAMIS: Tailored Membership Inference Attacks on Synthetic Data : Abstract: Membership Inference Attacks (MIA) make it possible to empirically assess the privacy of a machine learning algorithm. In this paper, we propose TAMIS, a novel MIA against differentially-private synth...
- Beyond the Hype: Embeddings vs. Prompting for Multiclass Classification Tasks : Abstract: Are traditional classification approaches irrelevant in this era of AI hype? We show that there are multiclass classification problems where predictive models holistically outperform LLM pro...
- GraphRAFT: Retrieval Augmented Fine-Tuning for Knowledge Graphs on Graph Databases : Abstract: Large language models have shown remarkable language processing and reasoning ability but are prone to hallucinate when asked about private data. Retrieval-augmented generation (RAG) retriev...
- Repetitive Contrastive Learning Enhances Mamba's Selectivity in Time Series Prediction : Abstract: Long sequence prediction is a key challenge in time series forecasting. While Mamba-based models have shown strong performance due to their sequence selection capabilities, they still strugg...
- Integration Matters for Learning PDEs with Backwards SDEs : Abstract: Backward stochastic differential equation (BSDE)-based deep learning methods provide an alternative to Physics-Informed Neural Networks (PINNs) for solving high-dimensional partial different...
- RefiDiff: Progressive Refinement Diffusion for Efficient Missing Data Imputation : Abstract: Missing values in high-dimensional, mixed-type datasets pose significant challenges for data imputation, particularly under Missing Not At Random (MNAR) mechanisms. Existing methods struggle...
- Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity : Abstract: Accelerating large language model (LLM) inference is critical for real-world deployments requiring high throughput and low latency. Contextual sparsity, where each token dynamically activate...
- Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning : Abstract: Policy-based methods currently dominate reinforcement learning (RL) pipelines for large language model (LLM) reasoning, leaving value-based approaches largely unexplored. We revisit the clas...
- Solver-Free Decision-Focused Learning for Linear Optimization Problems : Abstract: Mathematical optimization is a fundamental tool for decision-making in a wide range of applications. However, in many real-world scenarios, the parameters of the optimization problem are not...
- An empirical study of task and feature correlations in the reuse of pre-trained models : Abstract: Pre-trained neural networks are commonly used and reused in the machine learning community. Alice trains a model for a particular task, and a part of her neural network is reused by Bob for ...
- RiemannFormer: A Framework for Attention in Curved Spaces : Abstract: This research endeavors to offer insights into unlocking the further potential of transformer-based architectures. One of the primary motivations is to offer a geometric interpretation for t...
- STOAT: Spatial-Temporal Probabilistic Causal Inference Network : Abstract: Spatial-temporal causal time series (STC-TS) involve region-specific temporal observations driven by causally relevant covariates and interconnected across geographic or network-based spaces...
- From Decision Trees to Boolean Logic: A Fast and Unified SHAP Algorithm : Abstract: SHapley Additive exPlanations (SHAP) is a key tool for interpreting decision tree ensembles by assigning contribution values to features. It is widely used in finance, advertising, medicine,...
- Diffusion-based Sinogram Interpolation for Limited Angle PET : Abstract: Accurate PET imaging increasingly requires methods that support unconstrained detector layouts from walk-through designs to long-axial rings where gaps and open sides lead to severely unders...
- Potent but Stealthy: Rethink Profile Pollution against Sequential Recommendation via Bi-level Constrained Reinforcement Paradigm
- Abstract Gradient Training: A Unified Certification Framework for Data Poisoning, Unlearning, and Differential Privacy
- Spatio-Temporal Graph Unlearning
- Probing then Editing: A Push-Pull Framework for Retain-Free Machine Unlearning in Industrial IoT
- Transformer Semantic Genetic Programming for d-dimensional Symbolic Regression Problems : Abstract: Transformer Semantic Genetic Programming (TSGP) is a semantic search approach that uses a pre-trained transformer model as a variation operator to generate offspring programs with controlled...
- Several Supporting Evidences for the Adaptive Feature Program
- Group Equivariance Meets Mechanistic Interpretability: Equivariant Sparse Autoencoders
- LLM-Guided Dynamic-UMAP for Personalized Federated Graph Learning
- How does the Performance of the Data-driven Traffic Flow Forecasting Models deteriorate with Increasing Forecasting Horizon? An Extensive Approach Considering Statistical, Machine Learning and Deep Learning Models : Abstract: With rapid urbanization in recent decades, traffic congestion has intensified due to increased movement of people and goods. As planning shifts from demand-based to supply-oriented strategie...
- Enhancing Explainability in Solar Energetic Particle Event Prediction: A Global Feature Mapping Approach : Abstract: Solar energetic particle (SEP) events, as one of the most prominent manifestations of solar activity, can generate severe hazardous radiation when accelerated by solar flares or shock waves ...
- Latent Planning via Embedding Arithmetic: A Contrastive Approach to Strategic Reasoning : Abstract: Planning in high-dimensional decision spaces is increasingly being studied through the lens of learned representations. Rather than training policies or value heads, we investigate whether p...
- AdaCuRL: Adaptive Curriculum Reinforcement Learning with Invalid Sample Mitigation and Historical Revisiting : Abstract: Reinforcement learning (RL) has demonstrated considerable potential for enhancing reasoning in large language models (LLMs). However, existing methods suffer from Gradient Starvation and Pol...
- PDAC: Efficient Coreset Selection for Continual Learning via Probability Density Awareness : Abstract: Rehearsal-based Continual Learning (CL) maintains a limited memory buffer to store replay samples for knowledge retention, making these approaches heavily reliant on the quality of the store...
- AutoSynth: Automated Workflow Optimization for High-Quality Synthetic Dataset Generation via Monte Carlo Tree Search : Abstract: Supervised fine-tuning (SFT) of large language models (LLMs) for specialized tasks requires high-quality datasets, but manual curation is prohibitively expensive. Synthetic data generation o...
- Quasi-Newton Compatible Actor-Critic for Deterministic Policies : Abstract: In this paper, we propose a second-order deterministic actor-critic framework in reinforcement learning that extends the classical deterministic policy gradient method to exploit curvature i...
- GenePheno: Interpretable Gene Knockout-Induced Phenotype Abnormality Prediction from Gene Sequences : Abstract: Exploring how genetic sequences shape phenotypes is a fundamental challenge in biology and a key step toward scalable, hypothesis-driven experimentation. The task is complicated by the large...
- Event-Driven Digital-Time-Domain Inference Architectures for Tsetlin Machines : Abstract: Machine learning fits model parameters to approximate input-output mappings, predicting unknown samples. However, these models often require extensive arithmetic computations during inferenc...
- SiDGen: Structure-informed Diffusion for Generative modeling of Ligands for Proteins : Abstract: Designing ligands that are both chemically valid and structurally compatible with protein binding pockets is a key bottleneck in computational drug discovery. Existing approaches either igno...
- NSL-MT: Linguistically Informed Negative Samples for Efficient Machine Translation in Low-Resource Languages : Abstract: We introduce Negative Space Learning MT (NSL-MT), a training method that teaches models what not to generate by encoding linguistic constraints as severity-weighted penalties in the loss fun...
- Extrapolation to infinite model space of no-core shell model calculations using machine learning : Abstract: An ensemble of neural networks is employed to extrapolate no-core shell model (NCSM) results to infinite model space for light nuclei. We present a review of our neural network extrapolation...
- Explainable Federated Learning for U.S. State-Level Financial Distress Modeling : Abstract: We present the first application of federated learning (FL) to the U.S. National Financial Capability Study, introducing an interpretable framework for predicting consumer financial distress...
- GMTRouter: Personalized LLM Router over Multi-turn User Interactions : Abstract: Large Language Model (LLM) routing has demonstrated strong capability in balancing response quality with computational cost. As users exhibit diverse preferences, personalization has attract...
- Case Study: Transformer-Based Solution for the Automatic Digitization of Gas Plants : Abstract: The energy transition is a key theme of the last decades to determine a future of eco-sustainability, and an area of such importance cannot disregard digitization, innovation and the new tec...
- MoE-GraphSAGE-Based Integrated Evaluation of Transient Rotor Angle and Voltage Stability in Power Systems : Abstract: The large-scale integration of renewable energy and power electronic devices has increased the complexity of power system stability, making transient stability assessment more challenging. C...
- Learning based Modelling of Throttleable Engine Dynamics for Lunar Landing Mission : Abstract: Typical lunar landing missions involve multiple phases of braking to achieve soft-landing. The propulsion system configuration for these missions consists of throttleable engines. This confi...
- A Multi-Drone Multi-View Dataset and Deep Learning Framework for Pedestrian Detection and Tracking : Abstract: Multi-drone surveillance systems offer enhanced coverage and robustness for pedestrian tracking, yet existing approaches struggle with dynamic camera positions and complex occlusions. This p...
- Reasoning on Time-Series for Financial Technical Analysis : Abstract: While Large Language Models have been used to produce interpretable stock forecasts, they mainly focus on analyzing textual reports but not historical price data, also known as Technical Ana...
- , Forget Less: A Gradient-Aware Data Selection Approach for LLM : Abstract: Although large language models (LLMs) have achieved impressive results across numerous tasks, supervised fine-tuning (SFT) remains essential for adapting these models to specialized doma...
- Multi-period Learning for Financial Time Series Forecasting : Abstract: Time series forecasting is important in the finance domain. Financial time series (TS) patterns are influenced by both short-term public opinions and medium-/long-term policy and market trends. ...
- Cross-Field Interface-Aware Neural Operators for Multiphase Flow Simulation : Abstract: Multiphase flow systems, with their complex dynamics, field discontinuities, and interphase interactions, pose significant computational challenges for traditional numerical solvers. While n...
- Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising : Abstract: Diffusion-based video generation can create realistic videos, yet existing image- and text-based conditioning fails to offer precise motion control. Prior methods for motion-conditioned synt...
- Detecting Suicidal Ideation in Text with Interpretable Deep Learning: A CNN-BiGRU with Attention Mechanism : Abstract: Worldwide, suicide is the second leading cause of death for adolescents, with past suicide attempts being an important predictor of future suicides. While some people with suicidal...
- Pattern Recognition of Scrap Plastic Misclassification in Global Trade Data : Abstract: We propose an interpretable machine learning framework to help identify trade data discrepancies that are challenging to detect with traditional methods. Our system analyzes trade data to fi...
- Compact Artificial Neural Network Models for Predicting Protein Residue - RNA Base Binding : Abstract: Large Artificial Neural Network (ANN) models have demonstrated success in various domains, including general text and image generation, drug discovery, and protein-RNA (ribonucleic acid) bin...
- "It Looks All the Same to Me": Cross-index Training for Long-term Financial Series Prediction : Abstract: We investigate a number of Artificial Neural Network architectures (well-known and more "exotic") in application to the long-term financial time-series forecasts of indexes on different gl...
- Practical and Performant Enhancements for Maximization of Algebraic Connectivity : Abstract: Long-term state estimation over graphs remains challenging as current graph estimation methods scale poorly on large, long-term graphs. To address this, our work advances a current state-of-...
- Automated Hardware Trojan Insertion in Industrial-Scale Designs : Abstract: Industrial Systems-on-Chips (SoCs) often comprise hundreds of thousands to millions of nets and millions to tens of millions of connectivity edges, making empirical evaluation of hardware-Tr...
- Rethinking generative image pretraining: How far are we from scaling up next-pixel prediction? : Abstract: This paper investigates the scaling properties of autoregressive next-pixel prediction, a simple, end-to-end yet under-explored framework for unified vision models. Starting with images at r...
- Optimal Control of the Future via Prospective Foraging : Abstract: Optimal control of the future is the next frontier for AI. Current approaches to this problem are typically rooted in either reinforcement learning or online learning. While powerful, these ...
- Practical considerations when designing an online learning algorithm for an app-based mHealth intervention : Abstract: The ubiquitous nature of mobile health (mHealth) technology has expanded opportunities for the integration of reinforcement learning into traditional clinical trial designs, allowing researc...
- Intuitive Programming, Adaptive Task Planning, and Dynamic Role Allocation in Human-Robot Collaboration : Abstract: Remarkable capabilities have been achieved by robotics and AI, mastering complex tasks and environments. Yet, humans often remain passive observers, fascinated but uncertain how to engage. R...
- A Deep Learning-Based Method for Fully Coupled Non-Markovian FBSDEs with Applications : Abstract: In this work, we extend deep learning-based numerical methods to fully coupled forward-backward stochastic differential equations (FBSDEs) within a non-Markovian framework. Error estimates a...
- Vector Symbolic Algebras for the Abstraction and Reasoning Corpus : Abstract: The Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI) is a generative, few-shot fluid intelligence benchmark. Although humans effortlessly solve ARC-AGI, it rema...
- WATSON-Net: Vetting, Validation, and Analysis of Transits from Space Observations with Neural Networks : Abstract: Context. As the number of detected transiting exoplanet candidates continues to grow, the need for robust and scalable automated tools to prioritize or validate them has become increasingly ...
- The Probably Approximately Correct Learning Model in Computational Learning Theory : Abstract: This survey paper gives an overview of various known results on learning classes of Boolean functions in Valiant's Probably Approximately Correct (PAC) learning model and its commonly studie...
- Effects of label noise on the classification of outlier observations : Abstract: This study investigates the impact of adding noise to the training set classes in classification tasks using the BCOPS algorithm (Balanced and Conformal Optimized Prediction Sets), proposed ...
- A Neural-Operator Preconditioned Newton Method for Accelerated Nonlinear Solvers : Abstract: We propose a novel neural preconditioned Newton (NP-Newton) method for solving parametric nonlinear systems of equations. To overcome the stagnation or instability of Newton iterations cause...
- Learning-based Radio Link Failure Prediction Based on Measurement Dataset in Railway Environments : Abstract: In this paper, a measurement-driven framework is proposed for early radio link failure (RLF) prediction in 5G non-standalone (NSA) railway environments. Using 10 Hz metro-train traces with s...
- DRL-Based Beam Positioning for LEO Satellite Constellations with Weighted Least Squares : Abstract: In this paper, we propose a reinforcement learning based beam weighting framework that couples a policy network with an augmented weighted least squares (WLS) estimator for accurate and low-...
- When is a System Discoverable from Data? Discovery Requires Chaos : Abstract: The deep learning revolution has spurred a rise in advances of using AI in sciences. Within physical sciences the main focus has been on discovery of dynamical systems from observational dat...
- Classifying Histopathologic Glioblastoma Sub-regions with EfficientNet : Abstract: Glioblastoma (GBM) is the most common aggressive, fast-growing brain tumor, with a grim prognosis. Despite clinical diagnostic advancements, there have not been any substantial improvements ...
- Boosting Adversarial Transferability via Ensemble Non-Attention : Abstract: Ensemble attacks integrate the outputs of surrogate models with diverse architectures, which can be combined with various gradient-based attacks to improve adversarial transferability. Howev...
- MicroEvoEval: A Systematic Evaluation Framework for Image-Based Microstructure Evolution Prediction : Abstract: Simulating microstructure evolution (MicroEvo) is vital for materials design but demands high numerical accuracy, efficiency, and physical fidelity. Although recent studies on deep learning ...
- A Finite Difference Approximation of Second Order Regularization of Neural-SDFs : Abstract: We introduce a finite-difference framework for curvature regularization in neural signed distance field (SDF) learning. Existing approaches enforce curvature priors using full Hessian inform...
- DeepTracer: Tracing Stolen Model via Deep Coupled Watermarks : Abstract: Model watermarking techniques can embed watermark information into the protected model for ownership declaration by constructing specific input-output pairs. However, existing watermarks are...
- Robust Sampling for Active Statistical Inference : Abstract: Active statistical inference is a new method for inference with AI-assisted data collection. Given a budget on the number of labeled data points that can be collected and assuming access to ...
- Generalisable prediction model of surgical case duration: multicentre development and temporal validation : Abstract: Background: Accurate prediction of surgical case duration underpins operating room (OR) scheduling, yet existing models often depend on site- or surgeon-specific inputs and rarely undergo ex...
- Convergence and Stability Analysis of Self-Consuming Generative Models with Heterogeneous Human Curation : Abstract: Self-consuming generative models have received significant attention over the last few years. In this paper, we study a self-consuming generative model with heterogeneous preferences that is...
- A Neurosymbolic Approach to Natural Language Formalization and Verification : Abstract: Large Language Models perform well at natural language interpretation and reasoning, but their inherent stochasticity limits their adoption in regulated industries like finance and healthcar...
- Assumed Density Filtering and Smoothing with Neural Network Surrogate Models : Abstract: The Kalman filter and Rauch-Tung-Striebel (RTS) smoother are optimal for state estimation in linear dynamic systems. With nonlinear systems, the challenge consists in how to propagate uncert...
- DeepVRegulome: DNABERT-based deep-learning framework for predicting the functional impact of short genomic variants on the human regulome : Abstract: Whole-genome sequencing (WGS) has revealed numerous non-coding short variants whose functional impacts remain poorly understood. Despite recent advances in deep-learning genomic approaches, ...
- PAN: A World Model for General, Interactable, and Long-Horizon World Simulation : Abstract: A world model enables an intelligent agent to imagine, predict, and reason about how the world evolves in response to its actions, and accordingly to plan and strategize. While recent video ...
- VAE-Based Synthetic EMG Generation with Mix-Consistency Loss for Recognizing Unseen Motion Combinations : Abstract: Electromyogram (EMG)-based motion classification using machine learning has been widely employed in applications such as prosthesis control. While previous studies have explored generating s...
- Learning to Validate Generative Models: a Goodness-of-Fit Approach : Abstract: Generative models are increasingly central to scientific workflows, yet their systematic use and interpretation require a proper understanding of their limitations through rigorous validatio...
- LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls : Abstract: Augmenting Large Language Models (LLMs) with external tools enables them to execute complex, multi-step tasks. However, tool learning is hampered by the static synthetic data pipelines where...
- Scalable Mixed-Integer Optimization with Neural Constraints via Dual Decomposition : Abstract: Embedding deep neural networks (NNs) into mixed-integer programs (MIPs) is attractive for decision making with learned constraints, yet state-of-the-art monolithic linearisations blow up in ...
- Resource-Efficient Variational Quantum Classifier : Abstract: Quantum computing promises a revolution in information processing, with significant potential for machine learning and classification tasks. However, achieving this potential requires overco...
- Robust Least-Squares Optimization for Data-Driven Predictive Control: A Geometric Approach : Abstract: The paper studies a geometrically robust least-squares problem that extends classical and norm-based robust formulations. Rather than minimizing residual error for fixed or perturbed data, w...
- From Model Training to Model Raising - A call to reform AI model training paradigms from post-hoc alignment to intrinsic, identity-based development : Abstract: Current AI training methods align models with human values only after their core capabilities have been established, resulting in models that are easily misaligned and lack deep-rooted value...
- AdaptDel: Adaptable Deletion Rate Randomized Smoothing for Certified Robustness : Abstract: We consider the problem of certified robustness for sequence classification against edit distance perturbations. Naturally occurring inputs of varying lengths (e.g., sentences in natural lan...
- Routesplain: Towards Faithful and Intervenable Routing for Software-related Tasks : Abstract: LLMs now tackle a wide range of software-related tasks, yet we show that their performance varies markedly both across and within these tasks. Routing user queries to the appropriate LLMs ca...
- The 2025 Planning Performance of Frontier Large Language Models : Abstract: The capacity of Large Language Models (LLMs) for reasoning remains an active area of research, with the capabilities of frontier models continually advancing. We provide an updated evaluatio...
- BIG5-TPoT: Predicting BIG Five Personality Traits, Facets, and Items Through Targeted Preselection of Texts : Abstract: Predicting an individual's personality from their generated texts is a challenging task, especially when the text volume is large. In this paper, we introduce a straightforward yet effecti...
- Adversarially and Distributionally Robust Virtual Energy Storage Systems via the Scenario Approach : Abstract: We propose an optimization model where a parking lot manager (PLM) can aggregate parked EV batteries to provide virtual energy storage services that are provably robust under uncertain EV de...
- MCAD: Multimodal Context-Aware Audio Description Generation For Soccer : Abstract: Audio Descriptions (AD) are essential for making visual content accessible to individuals with visual impairments. Recent works have shown a promising step towards automating AD, but they ha...
- Branching Flows: Discrete, Continuous, and Manifold Flow Matching with Splits and Deletions : Abstract: Diffusion and flow matching approaches to generative modeling have shown promise in domains where the state space is continuous, such as image generation or protein folding & design, and dis...
- A general framework for adaptive nonparametric dimensionality reduction : Abstract: Dimensionality reduction is a fundamental task in modern data science. Several projection methods specifically tailored to take into account the non-linearity of the data via local embedding...
- Consensus Sampling for Safer Generative AI : Abstract: Many approaches to AI safety rely on inspecting model outputs or activations, yet certain risks are inherently undetectable by inspection alone. We propose a complementary, architecture-agno...
- Distributional Shrinkage I: Universal Denoisers in Multi-Dimensions : Abstract: We revisit the problem of denoising from noisy measurements where only the noise level is known, not the noise distribution. In multi-dimensions, independent noise $Z$ corrupts the signal $X...
- LLM Inference Beyond a Single Node: From Bottlenecks to Mitigations with Fast All-Reduce Communication : Abstract: As large language models (LLMs) continue to grow in size, distributed inference has become increasingly important. Model-parallel strategies must now efficiently scale not only across multip...
- IFG: Internet-Scale Guidance for Functional Grasping Generation : Abstract: Large Vision Models trained on internet-scale data have demonstrated strong capabilities in segmenting and semantically understanding object parts, even in cluttered, crowded scenes. However...
- ReactionTeam: Teaming Experts for Divergent Thinking Beyond Typical Reaction Patterns : Abstract: Reaction prediction, a critical task in synthetic chemistry, is to predict the outcome of a reaction based on given reactants. Generative models like Transformer have typically been employed...
- CSAI: Conditional Self-Attention Imputation for Healthcare Time-series : Abstract: We introduce the Conditional Self-Attention Imputation (CSAI) model, a novel recurrent neural network architecture designed to address the challenges of complex missing data patterns in mult...
- Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback : Abstract: Learning from human feedback plays an important role in aligning generative models, such as large language models (LLM). However, the effectiveness of this approach can be influenced by adve...
- Adaptive Data Analysis for Growing Data : Abstract: Reuse of data in adaptive workflows poses challenges regarding overfitting and the statistical validity of results. Previous work has demonstrated that interacting with data via differential...
- An Information Theoretic Evaluation Metric For Strong Unlearning : Abstract: Machine unlearning (MU) aims to remove the influence of specific data from trained models, addressing privacy concerns and ensuring compliance with regulations such as the "right to be forg...
- TeVAE: A Variational Autoencoder Approach for Discrete Online Anomaly Detection in Variable-state Multivariate Time-series Data : Abstract: As attention to recorded data grows in the realm of automotive testing and manual evaluation reaches its limits, there is a growing need for automatic online anomaly detection. This real-wor...
- A Lightweight CNN-Attention-BiLSTM Architecture for Multi-Class Arrhythmia Classification on Standard and Wearable ECGs : Abstract: Early and accurate detection of cardiac arrhythmias is vital for timely diagnosis and intervention. We propose a lightweight deep learning model combining 1D Convolutional Neural Networks (C...
- Accelerating Training Speed of Tiny Recursive Models via Curriculum Guided Adaptive Recursion : Abstract: Recursive reasoning models achieve remarkable performance on complex reasoning tasks through iterative refinement, enabling tiny networks to match large language models thousands of times th...
- Learning the Basis: A Kolmogorov-Arnold Network Approach Embedding Green's Function Priors : Abstract: The Method of Moments (MoM) is constrained by the usage of static, geometry-defined basis functions, such as the Rao-Wilton-Glisson (RWG) basis. This letter reframes electromagnetic modeling...
- TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models : Abstract: The first tabular foundation model, TabPFN, and its successor TabPFNv2 have impacted tabular AI substantially, with dozens of methods building on it and hundreds of applications across diffe...
- PEGNet: A Physics-Embedded Graph Network for Long-Term Stable Multiphysics Simulation : Abstract: Accurate and efficient simulations of physical phenomena governed by partial differential equations (PDEs) are important for scientific and engineering progress. While traditional numerical ...
- FAIRPLAI: A Human-in-the-Loop Approach to Fair and Private Machine Learning : Abstract: As machine learning systems move from theory to practice, they are increasingly tasked with decisions that affect healthcare access, financial opportunities, hiring, and public services. In ...
- Benevolent Dictators? On LLM Agent Behavior in Dictator Games : Abstract: In behavioral sciences, experiments such as the ultimatum game are conducted to assess preferences for fairness or self-interest of study participants. In the dictator game, a simplified ver...
- Macroscopic Emission Modeling of Urban Traffic Using Probe Vehicle Data: A Machine Learning Approach : Abstract: Urban congestions cause inefficient movement of vehicles and exacerbate greenhouse gas emissions and urban air pollution. Macroscopic emission fundamental diagram (eMFD) captures an orderly r...
- Gromov-Wasserstein Graph Coarsening : Abstract: We study the problem of graph coarsening within the Gromov-Wasserstein geometry. Specifically, we propose two algorithms that leverage a novel representation of the distortion induced by mer...
- Hey Pentti, We Did (More of) It!: A Vector-Symbolic Lisp With Residue Arithmetic : Abstract: Using Frequency-domain Holographic Reduced Representations (FHRRs), we extend a Vector-Symbolic Architecture (VSA) encoding of Lisp 1.5 with primitives for arithmetic operations using Residu...
- A Generalized Bias-Variance Decomposition for Bregman Divergences : Abstract: The bias-variance decomposition is a central result in statistics and machine learning, but is typically presented only for the squared error. We present a generalization of the bias-varianc...
- BayesQ: Uncertainty-Guided Bayesian Quantization : Abstract: We present BayesQ, an uncertainty-guided post-training quantization framework that is the first to optimize quantization under the posterior expected loss. BayesQ fits a lightweight Gaussian...
- Physics-Informed Machine Learning for Characterizing System Stability : Abstract: In the design and operation of complex dynamical systems, it is essential to ensure that all state trajectories of the dynamical system converge to a desired equilibrium within a guaranteed ...
- TIGER-MARL: Enhancing Multi-Agent Reinforcement Learning with Temporal Information through Graph-based Embeddings and Representations : Abstract: In this paper, we propose capturing and utilizing Temporal Information through Graph-based Embeddings and Representations, or TIGER, to enhance multi-agent reinforcement lear...
- Enhancing DPSGD via Per-Sample Momentum and Low-Pass Filtering : Abstract: Differentially Private Stochastic Gradient Descent (DPSGD) is widely used to train deep neural networks with formal privacy guarantees. However, the addition of differential privacy (DP) oft...
- On topological descriptors for graph products : Abstract: Topological descriptors have been increasingly utilized for capturing multiscale structural information in relational data. In this work, we consider various filtrations on the (box) product...
- Rethinking Graph Super-resolution: Dual Frameworks for Topological Fidelity : Abstract: Graph super-resolution, the task of inferring high-resolution (HR) graphs from low-resolution (LR) counterparts, is an underexplored yet crucial research direction that circumvents the need ...
- Decomposition of Small Transformer Models : Abstract: Recent work in mechanistic interpretability has shown that decomposing models in parameter space may yield clean handles for analysis and intervention. Previous methods have demonstrated suc...
- ForeSWE: Forecasting Snow-Water Equivalent with an Uncertainty-Aware Attention Model : Abstract: Various complex water management decisions are made in snow-dominant watersheds with the knowledge of Snow-Water Equivalent (SWE) -- a key measure widely used to estimate the water content o...
- EEG-X: Device-Agnostic and Noise-Robust Foundation Model for EEG : Abstract: Foundation models for EEG analysis are still in their infancy, limited by two key challenges: (1) variability across datasets caused by differences in recording devices and configurations, a...
- Transformer-Based Sleep Stage Classification Enhanced by Clinical Information : Abstract: Manual sleep staging from polysomnography (PSG) is labor-intensive and prone to inter-scorer variability. While recent deep learning models have advanced automated staging, most rely solely ...
- Covariance Scattering Transforms : Abstract: Machine learning and data processing techniques relying on covariance information are widespread as they identify meaningful patterns in unsupervised and unlabeled settings. As a prominent e...
- Spectral Predictability as a Fast Reliability Indicator for Time Series Forecasting Model Selection : Abstract: Practitioners deploying time series forecasting models face a dilemma: exhaustively validating dozens of models is computationally prohibitive, yet choosing the wrong model risks poor perfor...
- FAST-CAD: A Fairness-Aware Framework for Non-Contact Stroke Diagnosis : Abstract: Stroke is an acute cerebrovascular disease, and timely diagnosis significantly improves patient survival. However, existing automated diagnosis methods suffer from fairness issues across dem...
- Weaver: Kronecker Product Approximations of Spatiotemporal Attention for Traffic Network Forecasting : Abstract: Spatiotemporal forecasting on transportation networks is a complex task that requires understanding how traffic nodes interact within a dynamic, evolving system dictated by traffic flow dyna...
- DeepDR: an integrated deep-learning model web server for drug repositioning : Abstract: Background: Identifying new indications for approved drugs is a complex and time-consuming process that requires extensive knowledge of pharmacology, clinical data, and advanced computationa...
- Diffusion Policies with Value-Conditional Optimization for Offline Reinforcement Learning : Abstract: In offline reinforcement learning, value overestimation caused by out-of-distribution (OOD) actions significantly limits policy performance. Recently, diffusion models have been leveraged fo...
- TransactionGPT : Abstract: We present TransactionGPT (TGPT), a foundation model for consumer transaction data within one of the world's largest payment networks. TGPT is designed to understand and generate transaction tra...
- QIBONN: A Quantum-Inspired Bilevel Optimizer for Neural Networks on Tabular Classification : Abstract: Hyperparameter optimization (HPO) for neural networks on tabular data is critical to a wide range of applications, yet it remains challenging due to large, non-convex search spaces and the c...
- Robust Backdoor Removal by Reconstructing Trigger-Activated Changes in Latent Representation : Abstract: Backdoor attacks pose a critical threat to machine learning models, causing them to behave normally on clean data but misclassify poisoned data into a poisoned class. Existing defenses often...
- Improving Conditional VAE with approximation using Normalizing Flows : Abstract: Variational Autoencoders and Generative Adversarial Networks remained the state-of-the-art (SOTA) generative models until 2022. Now they are superseded by diffusion-based models. Efforts to ...
- Bayesian Mixture of Experts For Large Language Models : Abstract: We present Bayesian Mixture of Experts (Bayesian-MoE), a post-hoc uncertainty estimation framework for fine-tuned large language models (LLMs) based on Mixture-of-Experts architectures. Our ...
- Selective Sinkhorn Routing for Improved Sparse Mixture of Experts : Abstract: Sparse Mixture-of-Experts (SMoE) has gained prominence as a scalable and computationally efficient architecture, enabling significant growth in model capacity without incurring additional in...
- Data reuse enables cost-efficient randomized trials of medical AI models : Abstract: Randomized controlled trials (RCTs) are indispensable for establishing the clinical value of medical artificial-intelligence (AI) tools, yet their high cost and long timelines hinder timely ...
- Fast k-means clustering in Riemannian manifolds via Fréchet maps: Applications to large-dimensional SPD matrices : Abstract: We introduce a novel, efficient framework for clustering data on high-dimensional, non-Euclidean manifolds that overcomes the computational challenges associated with standard intrinsic meth...
- FLAD: Federated Learning for LLM-based Autonomous Driving in Vehicle-Edge-Cloud Networks : Abstract: Large Language Models (LLMs) have impressive data fusion and reasoning capabilities for autonomous driving (AD). However, training LLMs for AD faces significant challenges including high com...
- FedSDWC: Federated Synergistic Dual-Representation Weak Causal Learning for OOD : Abstract: Amid growing demands for data privacy and advances in computational infrastructure, federated learning (FL) has emerged as a prominent distributed learning paradigm. Nevertheless, difference...
- Fairness-Aware Few-Shot Learning for Audio-Visual Stress Detection : Abstract: Fairness in AI-driven stress detection is critical for equitable mental healthcare, yet existing models frequently exhibit gender bias, particularly in data-scarce scenarios. To address this...
- GeoGNN: Quantifying and Mitigating Semantic Drift in Text-Attributed Graphs : Abstract: Graph neural networks (GNNs) on text-attributed graphs (TAGs) typically encode node texts using pretrained language models (PLMs) and propagate these embeddings through linear neighborhood ...
- Preference is More Than Comparisons: Rethinking Dueling Bandits with Augmented Human Feedback : Abstract: Interactive preference elicitation (IPE) aims to substantially reduce human effort while acquiring human preferences in wide personalization systems. Dueling bandit (DB) algorithms enable op...
- Guaranteeing Conservation of Integrals with Projection in Physics-Informed Neural Networks : Abstract: We propose a novel projection method that guarantees the conservation of integral quantities in Physics-Informed Neural Networks (PINNs). While the soft constraint that PINNs use to enforce ...
- Break the Tie: Learning Cluster-Customized Category Relationships for Categorical Data Clustering : Abstract: Categorical attributes with qualitative values are ubiquitous in cluster analysis of real datasets. Unlike the Euclidean distance of numerical attributes, the categorical attributes lack wel...
- Human-Corrected Labels Learning: Enhancing Labels Quality via Human Correction of VLMs Discrepancies : Abstract: Vision-Language Models (VLMs), with their powerful content generation capabilities, have been successfully applied to data annotation processes. However, the VLM-generated labels exhibit dua...
- Factorization-in-Loop: Proximal Fill-in Minimization for Sparse Matrix Reordering : Abstract: Fill-ins are new nonzero elements in the summation of the upper and lower triangular factors generated during LU factorization. For large sparse matrices, they will increase the memory usage...
- FedPM: Federated Learning Using Second-order Optimization with Preconditioned Mixing of Local Parameters : Abstract: We propose Federated Preconditioned Mixing (FedPM), a novel Federated Learning (FL) method that leverages second-order optimization. Prior methods, such as LocalNewton, LTDA, and FedSophia, ...
- Cost-Minimized Label-Flipping Poisoning Attack to LLM Alignment : Abstract: Large language models (LLMs) are increasingly deployed in real-world systems, making it critical to understand their vulnerabilities. While data poisoning attacks during RLHF/DPO alignment h...
- Towards a Generalisable Cyber Defence Agent for Real-World Computer Networks : Abstract: Recent advances in deep reinforcement learning for autonomous cyber defence have resulted in agents that can successfully defend simulated computer networks against cyber-attacks. However, m...
- Trusted Multi-view Learning for Long-tailed Classification : Abstract: Class imbalance has been extensively studied in single-view scenarios; however, addressing this challenge in multi-view contexts remains an open problem, with even scarcer research focusing ...
- Practical Global and Local Bounds in Gaussian Process Regression via Chaining : Abstract: Gaussian process regression (GPR) is a popular nonparametric Bayesian method that provides predictive uncertainty estimates and is widely used in safety-critical applications. While prior re...
- Enabling Agents to Communicate Entirely in Latent Space : Abstract: While natural language is the de facto communication medium for LLM-based agents, it presents a fundamental constraint. The process of downsampling rich, internal latent states into discrete...
- Unsupervised Feature Selection Through Group Discovery : Abstract: Unsupervised feature selection (FS) is essential for high-dimensional learning tasks where labels are not available. It helps reduce noise, improve generalization, and enhance interpretabili...
- Compact Memory for Continual Logistic Regression : Abstract: Despite recent progress, continual learning still does not match the performance of batch training. To avoid catastrophic forgetting, we need to build compact memory of essential past knowle...
- Data Fusion-Enhanced Decision Transformer for Stable Cross-Domain Generalization : Abstract: Cross-domain shifts present a significant challenge for decision transformer (DT) policies. Existing cross-domain policy adaptation methods typically rely on a single simple filtering criter...
- FSampler: Training Free Acceleration of Diffusion Sampling via Epsilon Extrapolation : Abstract: FSampler is a training-free, sampler-agnostic execution layer that accelerates diffusion sampling by reducing the number of function evaluations (NFE). FSampler maintains a short history of ...
- Iterated Population Based Training with Task-Agnostic Restarts : Abstract: Hyperparameter Optimization (HPO) can lift the burden of tuning hyperparameters (HPs) of neural networks. HPO algorithms from the Population Based Training (PBT) family are efficient thanks ...
- Sure! Here's a short and concise title for your paper: "Contamination in Generated Text Detection Benchmarks" : Abstract: Large language models are increasingly used for many applications. To prevent illicit use, it is desirable to be able to detect AI-generated text. Training and evaluation of such detectors c...
- Stochastic Mean-Shift Clustering : Abstract: We present a stochastic version of the mean-shift clustering algorithm. In this stochastic version a randomly chosen sequence of data points move according to partial gradient ascent steps o...
- CoCo-MILP: Inter-Variable Contrastive and Intra-Constraint Competitive MILP Solution Prediction : Abstract: Mixed-Integer Linear Programming (MILP) is a cornerstone of combinatorial optimization, yet solving large-scale instances remains a significant computational challenge. Recently, Graph Neura...
- Parameter-Free Clustering via Self-Supervised Consensus Maximization (Extended Version) : Abstract: Clustering is a fundamental task in unsupervised learning, but most existing methods heavily rely on hyperparameters such as the number of clusters or other sensitive settings, limiting thei...
- Controllable protein design through Feynman-Kac steering : Abstract: Diffusion-based models have recently enabled the generation of realistic and diverse protein structures, yet they remain limited in their ability to steer outcomes toward specific functional...
- Planning in Branch-and-Bound: Model-Based Reinforcement Learning for Exact Combinatorial Optimization : Abstract: Mixed-Integer Linear Programming (MILP) lies at the core of many real-world combinatorial optimization (CO) problems, traditionally solved by branch-and-bound (B&B). A key driver influencing...
- A Distributed Training Architecture For Combinatorial Optimization : Abstract: In recent years, graph neural networks (GNNs) have been widely applied in tackling combinatorial optimization problems. However, existing methods still suffer from limited accuracy when addr...
- Multi-step Predictive Coding Leads To Simplicity Bias : Abstract: Predictive coding is a framework for understanding the formation of low-dimensional internal representations mirroring the environment's latent structure. The conditions under which such rep...
- GuardFed: A Trustworthy Federated Learning Framework Against Dual-Facet Attacks : Abstract: Federated learning (FL) enables privacy-preserving collaborative model training but remains vulnerable to adversarial behaviors that compromise model utility or fairness across sensitive gro...
- Efficiently Transforming Neural Networks into Decision Trees: A Path to Ground Truth Explanations with RENTT : Abstract: Although neural networks are a powerful tool, their widespread use is hindered by the opacity of their decisions and their black-box nature, which result in a lack of trustworthiness. To all...
- A Tensor Residual Circuit Neural Network Factorized with Matrix Product Operation : Abstract: It is challenging to reduce the complexity of neural networks while maintaining their generalization ability and robustness, especially for practical applications. Conventional solutions for...
- Mixture-of-Channels: Exploiting Sparse FFNs for Efficient LLMs Pre-Training and Inference : Abstract: Large language models (LLMs) have demonstrated remarkable success across diverse artificial intelligence tasks, driven by scaling laws that correlate model size and training data with perfor...
- MARBLE: Multi-Armed Restless Bandits in Latent Markovian Environment : Abstract: Restless Multi-Armed Bandits (RMABs) are powerful models for decision-making under uncertainty, yet classical formulations typically assume fixed dynamics, an assumption often violated in no...
- GAMMA_FLOW: Guided Analysis of Multi-label spectra by MAtrix Factorization for Lightweight Operational Workflows : Abstract: GAMMA_FLOW is an open-source Python package for real-time analysis of spectral data. It supports classification, denoising, decomposition, and outlier detection of both single- and multi-com...
- Distribution-Based Feature Attribution for Explaining the Predictions of Any Classifier : Abstract: The proliferation of complex, black-box AI models has intensified the need for techniques that can explain their decisions. Feature attribution methods have become a popular solution for pro...
Research Sources: 398 | Generated: 11/13/2025
