AI RESEARCH PAPERS & ACADEMIC SOURCES
- SuperFlow: Training Flow Matching Models with RL on the Fly : Abstract: Recent progress in flow-based generative models and reinforcement learning (RL) has improved text-image alignment and visual quality. However, current RL training for flow models still has t...
- GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts : Abstract: Low-light enhancement has wide applications in autonomous driving, 3D reconstruction, remote sensing, surveillance, and so on, which can significantly improve information utilization. Howeve...
- CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting : Abstract: Recent works in 3D multimodal learning have made remarkable progress. However, typically 3D multimodal models are only capable of handling point clouds. Compared to the emerging 3D represent...
- SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians : Abstract: 3D Gaussian Splatting has recently gained traction for its efficient training and real-time rendering. While the vanilla Gaussian Splatting representation is mainly designed for view synthes...
- TRASE: Tracking-free 4D Segmentation and Editing : Abstract: Understanding dynamic 3D scenes is crucial for extended reality (XR) and autonomous driving. Incorporating semantic information into 3D reconstruction enables holistic scene representations,...
- TLRN: Temporal Latent Residual Networks For Large Deformation Image Registration : Abstract: This paper presents a novel approach, termed Temporal Latent Residual Network (TLRN), to predict a sequence of deformation fields in time-series image registration. The challenge of re...
- HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models : Abstract: High-resolution inputs enable Large Vision-Language Models (LVLMs) to discern finer visual details, enhancing their comprehension capabilities. To reduce the training and computation costs c...
- SpectralKAN: Weighted Activation Distribution Kolmogorov-Arnold Network for Hyperspectral Image Change Detection : Abstract: Kolmogorov-Arnold networks (KANs) represent data features by learning the activation functions and demonstrate superior accuracy with fewer parameters, FLOPs, GPU memory usage (Memory), shor...
- Pengembangan Model untuk Mendeteksi Kerusakan pada Terumbu Karang dengan Klasifikasi Citra (Development of a Model for Detecting Damage to Coral Reefs Using Image Classification) : Abstract: The rich biodiversity of coral reefs in Indonesian waters represents a valuable asset that must be preserved. Rapid climate change and uncontrolled human activities have caused significant d...
- JoIN: Joint GANs Inversion for Intrinsic Image Decomposition : Abstract: Intrinsic Image Decomposition (IID) is a challenging inverse problem that seeks to decompose a natural image into its underlying intrinsic components such as albedo and shading. While recent...
- SecureCAI: Injection-Resilient LLM Assistants for Cybersecurity Operations : Abstract: Large Language Models have emerged as transformative tools for Security Operations Centers, enabling automated log analysis, phishing triage, and malware explanation; however, deployment in ...
- A Multimodal Dataset of Student Oral Presentations with Sensors and Evaluation Data : Abstract: Oral presentation skills are a critical component of higher education, yet comprehensive datasets capturing real-world student performance across multiple modalities remain scarce. To addres...
- Fast Multi-Stack Slice-to-Volume Reconstruction via Multi-Scale Unrolled Optimization : Abstract: Fully convolutional networks have become the backbone of modern medical imaging due to their ability to learn multi-scale representations and perform end-to-end inference. Yet their potentia...
- HERE: Hierarchical Active Exploration of Radiance Field with Epistemic Uncertainty Minimization : Abstract: We present HERE, an active 3D scene reconstruction framework based on neural radiance fields, enabling high-fidelity implicit mapping. Our approach centers around an active learning strategy...
- BlindU: Blind Machine Unlearning without Revealing Erasing Data : Abstract: Machine unlearning enables data holders to remove the contribution of their specified samples from trained models to protect their privacy. However, it is paradoxical that most unlearning me...
- SC-MII: Infrastructure LiDAR-based 3D Object Detection on Edge Devices for Split Computing with Multiple Intermediate Outputs Integration : Abstract: 3D object detection using LiDAR-based point cloud data and deep neural networks is essential in autonomous driving technology. However, deploying state-of-the-art models on edge devices pres...
- ObjSplat: Geometry-Aware Gaussian Surfels for Active Object Reconstruction : Abstract: Autonomous high-fidelity object reconstruction is fundamental for creating digital assets and bridging the simulation-to-reality gap in robotics. We present ObjSplat, an active reconstructio...
- AutoTour: Automatic Photo Tour Guide with Smartphones and LLMs : Abstract: We present AutoTour, a system that enhances user exploration by automatically generating fine-grained landmark annotations and descriptive narratives for photos captured by users. The key id...
- USFetal: Tools for Fetal Brain Ultrasound Compounding : Abstract: Ultrasound offers a safe, cost-effective, and widely accessible technology for fetal brain imaging, making it especially suitable for routine clinical use. However, it suffers from view-depe...
- Hard Thresholding Pursuit Algorithms for Least Absolute Deviations Problem : Abstract: Least absolute deviations (LAD) is a statistical optimality criterion widely utilized in scenarios where a minority of measurements are contaminated by outliers of arbitrary magnitudes. In t...
- Precision Meets Art: Autonomous Multi-UAV System for Large Scale Mural Drawing : Abstract: The integration of autonomous unmanned aerial vehicles (UAVs) into large-scale artistic projects has emerged as a new application in robotics. This paper presents the design, deployment, and...
- R$^3$D: Regional-guided Residual Radar Diffusion : Abstract: Millimeter-wave radar enables robust environment perception in autonomous systems under adverse conditions yet suffers from sparse, noisy point clouds with low angular resolution. Existing d...
- VIPER Strike: Defeating Visual Reasoning CAPTCHAs via Structured Vision-Language Inference : Abstract: Visual Reasoning CAPTCHAs (VRCs) combine visual scenes with natural-language queries that demand compositional inference over objects, attributes, and spatial relations. They are increasingl...
- CulinaryCut-VLAP: A Vision-Language-Action-Physics Framework for Food Cutting via a Force-Aware Material Point Method : Abstract: Food cutting is a highly practical yet underexplored application at the intersection of vision and robotic manipulation. The task remains challenging because interactions between the knife a...
- Semantic Enrichment of CAD-Based Industrial Environments via Scene Graphs for Simulation and Reasoning : Abstract: Utilizing functional elements in an industrial environment, such as displays and interactive valves, provides effective possibilities for robot training. When preparing simulations for robots...
- From Easy to Hard++: Promoting Differentially Private Image Synthesis Through Spatial-Frequency Curriculum : Abstract: To improve the quality of differentially private (DP) synthetic images, most studies have focused on improving the core optimization techniques (e.g., DP-SGD). Recently, we have witnessed a ...
- Performance Analysis of DCT, Hadamard, and PCA in Block-Based Image Compression : Abstract: Block based image compression relies on transform coding to concentrate signal energy into a small number of coefficients. While classical codecs use fixed transforms such as the Discrete Co...
- Gamma2Patterns: Deep Cognitive Attention Region Identification and Gamma-Alpha Pattern Analysis : Abstract: Deep cognitive attention is characterized by heightened gamma oscillations and coordinated visual behavior. Despite the physiological importance of these mechanisms, computational studies ra...
- Real-Time Image Processing Algorithms for Embedded Systems : Abstract: Embedded vision systems need efficient and robust image processing algorithms to perform in real time on resource-constrained hardware. This research investigates image processing algorithms...
- Leveraging Membership Inference Attacks for Privacy Measurement in Federated Learning for Remote Sensing Images : Abstract: Federated Learning (FL) enables collaborative model training while keeping training data localized, allowing us to preserve privacy in various domains including remote sensing. However, rece...
- Deep Joint Source-Channel Coding for Wireless Video Transmission with Asymmetric Context : Abstract: In this paper, we propose a high-efficiency deep joint source-channel coding (JSCC) method for video transmission based on conditional coding with asymmetric context. The conditional coding-...
- Using street view images and visual LLMs to predict heritage values for governance support: Risks, ethics, and policy implications : Abstract: During 2025 and 2026, the Energy Performance of Buildings Directive is being implemented in the European Union member states, requiring all member states to have National Building Renovation...
- Investigating Anthropometric Fidelity in SAM 3D Body : Abstract: The recent release of SAM 3D Body marks a significant milestone in human mesh recovery, demonstrating state-of-the-art performance in producing clean, topologically cohe...
- Tuning-free Visual Effect Transfer across Videos : Abstract: We present RefVFX, a new framework that transfers complex temporal effects from a reference video onto a target video or image in a feed-forward manner. While existing methods excel at promp...
- MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head : Abstract: While the Transformer architecture dominates many fields, its quadratic self-attention complexity hinders its use in large-scale applications. Linear attention offers an efficient alternativ...
- More Images, More Problems? A Controlled Analysis of VLM Failure Modes : Abstract: Large Vision Language Models (LVLMs) have demonstrated remarkable capabilities, yet their proficiency in understanding and reasoning over multiple images remains largely unexplored. While ex...
- Exchange Is All You Need for Remote Sensing Change Detection : Abstract: Remote sensing change detection fundamentally relies on the effective fusion and discrimination of bi-temporal features. Prevailing paradigms typically utilize Siamese encoders bridged by ex...
- Vision-Language Model for Accurate Crater Detection : Abstract: The European Space Agency (ESA), driven by its ambitions on planned lunar missions with the Argonaut lander, has a profound interest in reliable crater detection, since craters pose a risk t...
- Beyond External Guidance: Unleashing the Semantic Richness Inside Diffusion Transformers for Improved Training : Abstract: Recent works such as REPA have shown that guiding diffusion models with external semantic features (e.g., DINO) can significantly accelerate the training of diffusion transformers (DiTs). Ho...
- Video Evidence to Reasoning Efficient Video Understanding via Explicit Evidence Grounding : Abstract: Large Vision-Language Models (LVLMs) face a fundamental dilemma in video reasoning: they are caught between the prohibitive computational costs of verbose reasoning and the hallucination ris...
- On the application of the Wasserstein metric to 2D curves classification : Abstract: In this work we analyse a number of variants of the Wasserstein distance which allow the classification to focus on prescribed parts (fragments) of the classified 2D curves. These variants a...
- Evaluating the encoding competence of visual language models using uncommon actions : Abstract: We propose UAIT (Uncommon-sense Action Image-Text) dataset, a new evaluation benchmark designed to test the semantic understanding ability of visual language models (VLMs) in uncommon-sense ...
- FMAC: a Fair Fiducial Marker Accuracy Comparison Software : Abstract: This paper presents a method for carrying out fair comparisons of the accuracy of pose estimation using fiducial markers. These comparisons rely on large sets of high-fidelity synthetic images e...
- Smooth Operator: Smooth Verifiable Reward Activates Spatial Reasoning Ability of Vision-Language Model : Abstract: Vision-Language Models (VLMs) face a critical bottleneck in achieving precise numerical prediction for 3D scene understanding. Traditional reinforcement learning (RL) approaches, primarily b...
- Leveraging 3D Representation Alignment and RGB Pretrained Priors for LiDAR Scene Generation : Abstract: LiDAR scene synthesis is an emerging solution to scarcity in 3D data for robotic tasks such as autonomous driving. Recent approaches employ diffusion or flow matching models to generate real...
- Advancing Multinational License Plate Recognition Through Synthetic and Real Data Fusion: A Comprehensive Evaluation : Abstract: Automatic License Plate Recognition is a frequent research topic due to its wide-ranging practical applications. While recent studies use synthetic images to improve License Plate Recognitio...
- Variational Contrastive Learning for Skeleton-based Action Recognition : Abstract: In recent years, self-supervised representation learning for skeleton-based action recognition has advanced with the development of contrastive learning methods. However, most of contrastive...
- StdGEN++: A Comprehensive System for Semantic-Decomposed 3D Character Generation : Abstract: We present StdGEN++, a novel and comprehensive system for generating high-fidelity, semantically decomposed 3D characters from diverse inputs. Existing 3D generative methods often produce mo...
- GeoMotionGPT: Geometry-Aligned Motion Understanding with Large Language Models : Abstract: Discrete motion tokenization has recently enabled Large Language Models (LLMs) to serve as versatile backbones for motion understanding and motion-language reasoning. However, existing pipel...
- PARL: Position-Aware Relation Learning Network for Document Layout Analysis : Abstract: Document layout analysis aims to detect and categorize structural elements (e.g., titles, tables, figures) in scanned or digital documents. Popular methods often rely on high-quality Optical...
- UIKA: Fast Universal Head Avatar from Pose-Free Images : Abstract: We present UIKA, a feed-forward animatable Gaussian head model from an arbitrary number of unposed inputs, including a single image, multi-view captures, and smartphone-captured videos. Unli...
- Diffusion in SPAD Signals : Abstract: We derive the likelihood of a raw signal in a single photon avalanche diode (SPAD), given a fixed photon flux. The raw signal comprises timing of detection events, which are nonlinearly rela...
- Robust Multicentre Detection and Classification of Colorectal Liver Metastases on CT: Application of Foundation Models : Abstract: Colorectal liver metastases (CRLM) are a major cause of cancer-related mortality, and reliable detection on CT remains challenging in multi-centre settings. We developed a foundation model-b...
- BenchSeg: A Large-Scale Dataset and Benchmark for Multi-View Food Video Segmentation : Abstract: Food image segmentation is a critical task for dietary analysis, enabling accurate estimation of food volume and nutrients. However, current methods suffer from limited multi-view data and p...
- ViewMorpher3D: A 3D-aware Diffusion Framework for Multi-Camera Novel View Synthesis in Autonomous Driving : Abstract: Autonomous driving systems rely heavily on multi-view images to ensure accurate perception and robust decision-making. To effectively develop and evaluate perception stacks and planning algo...
- Mon3tr: Monocular 3D Telepresence with Pre-built Gaussian Avatars as Amortization : Abstract: Immersive telepresence aims to transform human interaction in AR/VR applications by enabling lifelike full-body holographic representations for enhanced remote collaboration. However, existi...
- Anatomy Aware Cascade Network: Bridging Epistemic Uncertainty and Geometric Manifold for 3D Tooth Segmentation : Abstract: Accurate three-dimensional (3D) tooth segmentation from Cone-Beam Computed Tomography (CBCT) is a prerequisite for digital dental workflows. However, achieving high-fidelity segmentation rem...
- FocalOrder: Focal Preference Optimization for Reading Order Detection : Abstract: Reading order detection is the foundation of document understanding. Most existing methods rely on uniform supervision, implicitly assuming a constant difficulty distribution across layout r...
- From Sketch to Fresco: Efficient Diffusion Transformer with Progressive Resolution : Abstract: Diffusion Transformers achieve impressive generative quality but remain computationally expensive due to iterative sampling. Recently, dynamic resolution sampling has emerged as a promising ...
- PanoSAMic: Panoramic Image Segmentation from SAM Feature Encoding and Dual View Fusion : Abstract: Existing image foundation models are not optimized for spherical images having been trained primarily on perspective images. PanoSAMic integrates the pre-trained Segment Anything (SAM) encod...
- SDHSI-Net: Learning Better Representations for Hyperspectral Images via Self-Distillation : Abstract: Hyperspectral image (HSI) classification presents unique challenges due to its high spectral dimensionality and limited labeled data. Traditional deep learning models often suffer from overf...
- Forecast the Principal, Stabilize the Residual: Subspace-Aware Feature Caching for Efficient Diffusion Transformers : Abstract: Diffusion Transformer (DiT) models have achieved unprecedented quality in image and video generation, yet their iterative sampling process remains computationally prohibitive. To accelerate ...
- Learning Dynamic Collaborative Network for Semi-supervised 3D Vessel Segmentation : Abstract: In this paper, we present a new dynamic collaborative network for semi-supervised 3D vessel segmentation, termed DiCo. Conventional mean teacher (MT) methods typically employ a static approa...
- HiVid-Narrator: Hierarchical Video Narrative Generation with Scene-Primed ASR-anchored Compression : Abstract: Generating structured narrations for real-world e-commerce videos requires models to perceive fine-grained visual details and organize them into coherent, high-level stories--capabilities th...
- Seeing Right but Saying Wrong: Inter- and Intra-Layer Refinement in MLLMs without Training : Abstract: Multimodal Large Language Models (MLLMs) have demonstrated strong capabilities across a variety of vision-language tasks. However, their internal reasoning often exhibits a critical inconsis...
- PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis : Abstract: Recent advances in medical multi-modal models focus on specialized image analysis like dermatology, pathology, or radiology. However, they do not fully capture the complexity of real-world c...
- Reconstruction Guided Few-shot Network For Remote Sensing Image Classification : Abstract: Few-shot remote sensing image classification is challenging due to limited labeled samples and high variability in land-cover types. We propose a reconstruction-guided few-shot network (RGFS...
- OSCAR: Open-Set CAD Retrieval from a Language Prompt and a Single Image : Abstract: 6D object pose estimation plays a crucial role in scene understanding for applications such as robotics and augmented reality. To support the needs of ever-changing object sets in such conte...
- Revisiting the Ordering of Channel and Spatial Attention: A Comprehensive Study on Sequential and Parallel Designs : Abstract: Attention mechanisms have become a core component of deep learning models, with Channel Attention and Spatial Attention being the two most representative architectures. Current research on t...
- Mimic Human Cognition, Master Multi-Image Reasoning: A Meta-Action Framework for Enhanced Visual Understanding : Abstract: While Multimodal Large Language Models (MLLMs) excel at single-image understanding, they exhibit significantly degraded performance in multi-image reasoning scenarios. Multi-image reasoning ...
- Inference-Time Scaling for Visual AutoRegressive modeling by Searching Representative Samples : Abstract: While inference-time scaling has significantly enhanced generative quality in large language and diffusion models, its application to vector-quantized (VQ) visual autoregressive modeling (VA...
- A Visual Semantic Adaptive Watermark grounded by Prefix-Tuning for Large Vision-Language Model : Abstract: Watermarking has emerged as a pivotal solution for content traceability and intellectual property protection in Large Vision-Language Models (LVLMs). However, vision-agnostic watermarks intr...
- VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding : Abstract: This paper presents VideoLoom, a unified Video Large Language Model (Video LLM) for joint spatial-temporal understanding. To facilitate the development of fine-grained spatial and temporal l...
- Focal Guidance: Unlocking Controllability from Semantic-Weak Layers in Video Diffusion Models : Abstract: The task of Image-to-Video (I2V) generation aims to synthesize a video from a reference image and a text prompt. This requires diffusion models to reconcile high-frequency visual constraints...
- GenDet: Painting Colored Bounding Boxes on Images via Diffusion Model for Object Detection : Abstract: This paper presents GenDet, a novel framework that redefines object detection as an image generation task. In contrast to traditional approaches, GenDet adopts a pioneering approach by lever...
- PALUM: Part-based Attention Learning for Unified Motion Retargeting : Abstract: Retargeting motion between characters with different skeleton structures is a fundamental challenge in computer animation. When source and target characters have vastly different bone arrang...
- From Landslide Conditioning Factors to Satellite Embeddings: Evaluating the Utilisation of Google AlphaEarth for Landslide Susceptibility Mapping using Deep Learning : Abstract: Data-driven landslide susceptibility mapping (LSM) typically relies on landslide conditioning factors (LCFs), whose availability, heterogeneity, and preprocessing-related uncertainties can c...
- Universal Adversarial Purification with DDIM Metric Loss for Stable Diffusion : Abstract: Stable Diffusion (SD) often produces degraded outputs when the training dataset contains adversarial noise. Adversarial purification offers a promising solution by removing adversarial noise...
- Language-Grounded Multi-Domain Image Translation via Semantic Difference Guidance : Abstract: Multi-domain image-to-image translation requires grounding semantic differences expressed in natural language prompts into corresponding visual transformations, while preserving unrelated ...
- VENUS: Visual Editing with Noise Inversion Using Scene Graphs : Abstract: State-of-the-art text-based image editing models often struggle to balance background preservation with semantic consistency, frequently resulting either in the synthesis of entirely new ima...
- SceneNAT: Masked Generative Modeling for Language-Guided Indoor Scene Synthesis : Abstract: We present SceneNAT, a single-stage masked non-autoregressive Transformer that synthesizes complete 3D indoor scenes from natural language instructions through only a few parallel decoding p...
- SIRR-LMM: Single-image Reflection Removal via Large Multimodal Model : Abstract: Glass surfaces create complex interactions of reflected and transmitted light, making single-image reflection removal (SIRR) challenging. Existing datasets suffer from limited physical reali...
- ShowUI-Aloha: Human-Taught GUI Agent : Abstract: Graphical User Interfaces (GUIs) are central to human-computer interaction, yet automating complex GUI tasks remains a major challenge for autonomous agents, largely due to a lack of scalabl...
- DIVER: Dynamic Iterative Visual Evidence Reasoning for Multimodal Fake News Detection : Abstract: Multimodal fake news detection is crucial for mitigating adversarial misinformation. Existing methods, relying on static fusion or LLMs, face computational redundancy and hallucination risks...
- Test-time Adaptive Hierarchical Co-enhanced Denoising Network for Reliable Multimodal Classification : Abstract: Reliable learning on low-quality multimodal data is a widely concerning issue, especially in safety-critical applications. However, multimodal noise poses a major challenge in this domain an...
- Motion Focus Recognition in Fast-Moving Egocentric Video : Abstract: From Vision-Language-Action (VLA) systems to robotics, existing egocentric datasets primarily focus on action recognition tasks, while largely overlooking the inherent role of motion analysi...
- Few-shot Class-Incremental Learning via Generative Co-Memory Regularization : Abstract: Few-shot class-incremental learning (FSCIL) aims to incrementally learn models from a small amount of novel data, which requires strong representation and adaptation ability of models learne...
- MEDVISTAGYM: A Scalable Training Environment for Thinking with Medical Images via Tool-Integrated Reinforcement Learning : Abstract: Vision language models (VLMs) achieve strong performance on general image understanding but struggle to think with medical images, especially when performing multi-step reasoning through ite...
- 3D Wavelet-Based Structural Priors for Controlled Diffusion in Whole-Body Low-Dose PET Denoising : Abstract: Low-dose Positron Emission Tomography (PET) imaging reduces patient radiation exposure but suffers from increased noise that degrades image quality and diagnostic reliability. Although diffu...
- Efficient Visual Question Answering Pipeline for Autonomous Driving via Scene Region Compression : Abstract: Autonomous driving increasingly relies on Visual Question Answering (VQA) to enable vehicles to understand complex surroundings by analyzing visual inputs and textual queries. Currently, a p...
- Billboard in Focus: Estimating Driver Gaze Duration from a Single Image : Abstract: Roadside billboards represent a central element of outdoor advertising, yet their presence may contribute to driver distraction and accident risk. This study introduces a fully automated pip...
- Adversarial Attacks on Medical Hyperspectral Imaging Exploiting Spectral-Spatial Dependencies and Multiscale Features : Abstract: Medical hyperspectral imaging (HSI) enables accurate disease diagnosis by capturing rich spectral-spatial tissue information, but recent advances in deep learning have exposed its vulnerabil...
- Spatial Multi-Task Learning for Breast Cancer Molecular Subtype Prediction from Single-Phase DCE-MRI : Abstract: Accurate molecular subtype classification is essential for personalized breast cancer treatment, yet conventional immunohistochemical analysis relies on invasive biopsies and is prone to sam...
- Can Textual Reasoning Improve the Performance of MLLMs on Fine-grained Visual Classification? : Abstract: Multi-modal large language models (MLLMs) exhibit strong general-purpose capabilities, yet still struggle on Fine-Grained Visual Classification (FGVC), a core perception task that requires s...
- Unified Personalized Understanding, Generating and Editing : Abstract: Unified large multimodal models (LMMs) have achieved remarkable progress in general-purpose multimodal understanding and generation. However, they still operate under a "one-size-fits-all"...
- SketchJudge: A Diagnostic Benchmark for Grading Hand-drawn Diagrams with Multimodal Large Language Models : Abstract: While Multimodal Large Language Models (MLLMs) have achieved remarkable progress in visual understanding, they often struggle when faced with the unstructured and ambiguous nature of human-g...
- Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning : Abstract: In real-world video question answering scenarios, videos often provide only localized visual cues, while verifiable answers are distributed across the open web; models therefore need to join...
- RenderFlow: Single-Step Neural Rendering via Flow Matching : Abstract: Conventional physically based rendering (PBR) pipelines generate photorealistic images through computationally intensive light transport simulations. Although recent deep learning approaches...
- UDPNet: Unleashing Depth-based Priors for Robust Image Dehazing : Abstract: Image dehazing has witnessed significant advancements with the development of deep learning models. However, a few methods predominantly focus on single-modal RGB features, neglecting the in...
- CLIMP: Contrastive Language-Image Mamba Pretraining : Abstract: Contrastive Language-Image Pre-training (CLIP) relies on Vision Transformers whose attention mechanism is susceptible to spurious correlations, and scales quadratically with resolution. To a...
- MixRI: Mixing Features of Reference Images for Novel Object Pose Estimation : Abstract: We present MixRI, a lightweight network that solves the CAD-based novel object pose estimation problem in RGB images. It can be instantly applied to a novel object at test time without finet...
- Unsupervised Domain Adaptation with SAM-RefiSeR for Enhanced Brain Tumor Segmentation
- MVGGT: Multimodal Visual Geometry Grounded Transformer for Multiview 3D Referring Expression Segmentation : Abstract: Most existing 3D referring expression segmentation (3DRES) methods rely on dense, high-quality point clouds, while real-world agents such as robots and mobile phones operate with only a few ...
- MedGround: Bridging the Evidence Gap in Medical Vision-Language Models with Verified Grounding Data : Abstract: Vision-Language Models (VLMs) can generate convincing clinical narratives, yet frequently struggle to visually ground their statements. We posit this limitation arises from the scarcity of h...
- PRISM: Color-Stratified Point Cloud Sampling : Abstract: We present PRISM, a novel color-guided stratified sampling method for RGB-LiDAR point clouds. Our approach is motivated by the observation that unique scene features often exhibit chromatic ...
- OSCAR: Optical-aware Semantic Control for Aleatoric Refinement in Sar-to-Optical Translation : Abstract: Synthetic Aperture Radar (SAR) provides robust all-weather imaging capabilities; however, translating SAR observations into photo-realistic optical images remains a fundamentally ill-posed p...
- Enhancing Low-resolution Image Representation Through Normalizing Flows : Abstract: Low-resolution image representation is a special form of sparse representation that retains only low-frequency information while discarding high-frequency components. This property reduces s...
- SARA: Scene-Aware Reconstruction Accelerator : Abstract: We present SARA (Scene-Aware Reconstruction Accelerator), a geometry-driven pair selection module for Structure-from-Motion (SfM). Unlike conventional pipelines that select pairs based on vi...
- SpatialNav: Leveraging Spatial Scene Graphs for Zero-Shot Vision-and-Language Navigation : Abstract: Although learning-based vision-and-language navigation (VLN) agents can learn spatial knowledge implicitly from large-scale training data, zero-shot VLN agents lack this process, relying pri...
- The Normalized Difference Layer: A Differentiable Spectral Index Formulation for Deep Learning : Abstract: Normalized difference indices have been a staple in remote sensing for decades. They stay reliable under lighting changes, produce bounded values, and connect well to biophysical signals. Even...
- When Humans Judge Irises: Pupil Size Normalization as an Aid and Synthetic Irises as a Challenge : Abstract: Iris recognition is a mature biometric technology offering remarkable precision and speed, and allowing for large-scale deployments to populations exceeding a billion enrolled users (e.g., A...
- Quantification and Classification of Carbon Nanotubes in Electron Micrographs using Vision Foundation Models : Abstract: Accurate characterization of carbon nanotube morphologies in electron microscopy images is vital for exposure assessment and toxicological studies, yet current workflows rely on slow, subjec...
- eSkiTB: A Synthetic Event-based Dataset for Tracking Skiers : Abstract: Tracking skiers in RGB broadcast footage is challenging due to motion blur, static overlays, and clutter that obscure the fast-moving athlete. Event cameras, with their asynchronous contrast...
- Boosting Overlapping Organoid Instance Segmentation Using Pseudo-Label Unmixing and Synthesis-Assisted Learning : Abstract: Organoids, sophisticated in vitro models of human tissues, are crucial for medical research due to their ability to simulate organ functions and assess drug responses accurately. Accurate or...
- Sissi: Zero-shot Style-guided Image Synthesis via Semantic-style Integration : Abstract: Text-guided image generation has advanced rapidly with large-scale diffusion models, yet achieving precise stylization with visual exemplars remains difficult. Existing approaches often depe...
- APEX: Learning Adaptive Priorities for Multi-Objective Alignment in Vision-Language Generation : Abstract: Multi-objective alignment for text-to-image generation is commonly implemented via static linear scalarization, but fixed weights often fail under heterogeneous rewards, leading to optimizat...
- QCaption: Video Captioning and Q&A through Fusion of Large Multimodal Models : Abstract: This paper introduces QCaption, a novel video captioning and Q&A pipeline that enhances video analytics by fusing three models: key frame extraction, a Large Multimodal Model (LMM) for image...
- ArrowGEV: Grounding Events in Video via Learning the Arrow of Time : Abstract: Grounding events in videos serves as a fundamental capability in video analysis. While Vision-Language Models (VLMs) are increasingly employed for this task, existing approaches predominantl...
- LLMTrack: Semantic Multi-Object Tracking with Multi-modal Large Language Models : Abstract: Traditional Multi-Object Tracking (MOT) systems have achieved remarkable precision in localization and association, effectively answering *where* and *who*. However, they often...
- Towards Egocentric 3D Hand Pose Estimation in Unseen Domains : Abstract: We present V-HPOT, a novel approach for improving the cross-domain performance of 3D hand pose estimation from egocentric images across diverse, unseen domains. State-of-the-art methods demo...
- Toward Generalizable Deblurring: Leveraging Massive Blur Priors with Linear Attention for Real-World Scenarios : Abstract: Image deblurring has advanced rapidly with deep learning, yet most methods exhibit poor generalization beyond their training datasets, with performance dropping significantly in real-world s...
- Bridging Robustness and Efficiency: Real-Time Low-Light Enhancement via Attention U-Net GAN : Abstract: Recent advancements in Low-Light Image Enhancement (LLIE) have focused heavily on Diffusion Probabilistic Models, which achieve high perceptual quality but suffer from significant computatio...
- 3D CoCa v2: Contrastive Learners with Test-Time Search for Generalizable Spatial Intelligence : Abstract: Spatial intelligence refers to the ability to perceive, reason about, and describe objects and their relationships within three-dimensional environments, forming a foundation for embodied pe...
- Learning Domain Agnostic Latent Embeddings of 3D Faces for Zero-shot Animal Expression Transfer : Abstract: We present a zero-shot framework for transferring human facial expressions to 3D animal face meshes. Our method combines intrinsic geometric descriptors (HKS/WKS) with a mesh-agnostic latent...
- SRFlow: A Dataset and Regularization Model for High-Resolution Facial Optical Flow via Splatting Rasterization : Abstract: Facial optical flow supports a wide range of tasks in facial motion analysis. However, the lack of high-resolution facial optical flow datasets has hindered progress in this area. In this pa...
- VVTRec: Radio Interferometric Reconstruction through Visual and Textual Modality Enrichment : Abstract: Radio astronomy is an indispensable discipline for observing distant celestial objects. Measurements of wave signals from radio telescopes, called visibility, need to be transformed into ima...
- SparseOccVLA: Bridging Occupancy and Vision-Language Models via Sparse Queries for Unified 4D Scene Understanding and Planning : Abstract: In autonomous driving, Vision Language Models (VLMs) excel at high-level reasoning, whereas semantic occupancy provides fine-grained details. Despite significant progress in individual fiel...
- On the Adversarial Robustness of 3D Large Vision-Language Models : Abstract: 3D Vision-Language Models (VLMs), such as PointLLM and GPT4Point, have shown strong reasoning and generalization abilities in 3D understanding tasks. However, their adversarial robustness re...
- How to Build Robust, Scalable Models for GSV-Based Indicators in Neighborhood Research : Abstract: A substantial body of health research demonstrates a strong link between neighborhood environments and health outcomes. Recently, there has been increasing interest in leveraging advances in...
- WHU-PCPR: A cross-platform heterogeneous point cloud dataset for place recognition in complex urban scenes : Abstract: Point Cloud-based Place Recognition (PCPR) demonstrates considerable potential in applications such as autonomous driving, robot localization and navigation, and map update. In practical app...
- GlobalPaint: Spatiotemporal Coherent Video Outpainting with Global Feature Guidance : Abstract: Video outpainting extends a video beyond its original boundaries by synthesizing missing border content. Compared with image outpainting, it requires not only per-frame spatial plausibility ...
- Context Matters: Peer-Aware Student Behavioral Engagement Measurement via VLM Action Parsing and LLM Sequence Classification : Abstract: Understanding student behavior in the classroom is essential to improve both pedagogical quality and student engagement. Existing methods for predicting student engagement typically require ...
- Object-WIPER: Training-Free Object and Associated Effect Removal in Videos : Abstract: In this paper, we introduce Object-WIPER, a training-free framework for removing dynamic objects and their associated visual effects from videos, and inpainting them with semantically consis...
- VideoWeave: A Data-Centric Approach for Efficient Video Understanding : Abstract: Training video-language models is often prohibitively expensive due to the high cost of processing long frame sequences and the limited availability of annotated long videos. We present Vide...
- Perception Test 2025: Challenge Summary and a Unified VQA Extension : Abstract: The Third Perception Test challenge was organised as a full-day workshop alongside the IEEE/CVF International Conference on Computer Vision (ICCV) 2025. Its primary goal is to benchmark stat...
- NAS-GS: Noise-Aware Sonar Gaussian Splatting : Abstract: Underwater sonar imaging plays a crucial role in various applications, including autonomous navigation in murky water, marine archaeology, and environmental monitoring. However, the unique c...
- EyeTheia: A Lightweight and Accessible Eye-Tracking Toolbox : Abstract: We introduce EyeTheia, a lightweight and open deep learning pipeline for webcam-based gaze estimation, designed for browser-based experimental platforms and real-world cognitive and clinical...
- A survey of facial recognition techniques : Abstract: As multimedia content is quickly growing, the field of facial recognition has become one of the major research fields, particularly in recent years. The most problematic area to research...
- Synthetic FMCW Radar Range Azimuth Maps Augmentation with Generative Diffusion Model : Abstract: The scarcity and low diversity of well-annotated automotive radar datasets often limit the performance of deep-learning-based environmental perception. To overcome these challenges, we propo...
- Ground What You See: Hallucination-Resistant MLLMs via Caption Feedback, Diversity-Aware Sampling, and Conflict Regularization : Abstract: While Multimodal Large Language Models (MLLMs) have achieved remarkable success across diverse tasks, their practical deployment is severely hindered by hallucination issues, which become pa...
- SAPL: Semantic-Agnostic Prompt Learning in CLIP for Weakly Supervised Image Manipulation Localization : Abstract: Malicious image manipulation threatens public safety and requires efficient localization methods. Existing approaches depend on costly pixel-level annotations which make training expensive. ...
- Two-step Authentication: Multi-biometric System Using Voice and Facial Recognition : Abstract: We present a cost-effective two-step authentication system that integrates face identification and speaker verification using only a camera and microphone available on common devices. The pi...
- Akasha 2: Hamiltonian State Space Duality and Visual-Language Joint Embedding Predictive Architectur : Abstract: We present Akasha 2, a state-of-the-art multimodal architecture that integrates Hamiltonian State Space Duality (H-SSD) with Visual-Language Joint Embedding Predictive Architecture (VL-JEPA)...
- When Imbalance Comes Twice: Active Learning under Simulated Class Imbalance and Label Shift in Binary Semantic Segmentation : Abstract: The aim of Active Learning is to select the most informative samples from an unlabelled set of data. This is useful in cases where the amount of data is large and labelling is expensive, suc...
- Cascading multi-agent anomaly detection in surveillance systems via vision-language models and embedding-based classification : Abstract: Intelligent anomaly detection in dynamic visual environments requires reconciling real-time performance with semantic interpretability. Conventional approaches address only fragments of this...
- QwenStyle: Content-Preserving Style Transfer with Qwen-Image-Edit : Abstract: Content-Preserving Style transfer, given content and style references, remains challenging for Diffusion Transformers (DiTs) due to its internal entangled content and style features. In this...
- How Does India Cook Biryani? : Abstract: Biryani, one of India's most celebrated dishes, exhibits remarkable regional diversity in its preparation, ingredients, and presentation. With the growing availability of online cooking vide...
- A Unified Attention U-Net Framework for Cross-Modality Tumor Segmentation in MRI and CT : Abstract: This study presents a unified Attention U-Net architecture trained jointly on MRI (BraTS 2021) and CT (LIDC-IDRI) datasets to investigate the generalizability of a single model across divers...
- TIR-Flow: Active Video Search and Reasoning with Frozen VLMs : Abstract: While Large Video-Language Models (Video-LLMs) have achieved remarkable progress in perception, their reasoning capabilities remain a bottleneck. Existing solutions typically resort to a hea...
- Think Bright, Diffuse Nice: Enhancing T2I-ICL via Inductive-Bias Hint Instruction and Query Contrastive Decoding : Abstract: Text-to-Image In-Context Learning (T2I-ICL) enables customized image synthesis via interleaved text-image examples but faces two mutually reinforcing bottlenecks, compliance failure and prio...
- Analyzing the Structure of Handwritten Digits: A Comparative Study of PCA, Factor Analysis, and UMAP : Abstract: Handwritten digit images lie in a high-dimensional pixel space but exhibit strong geometric and statistical structure. This paper investigates the latent organization of handwritten digits i...
- B-FIRE: Binning-Free Diffusion Implicit Neural Representation for Hyper-Accelerated Motion-Resolved MRI : Abstract: Accelerated dynamic volumetric magnetic resonance imaging (4DMRI) is essential for applications relying on motion resolution. Existing 4DMRI produces acceptable artifacts of averaged breathi...
- What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models : Abstract: Current vision-language benchmarks predominantly feature well-structured questions with clear, explicit prompts. However, real user queries are often informal and underspecified. Users natur...
- Low-Back Pain Physical Rehabilitation by Movement Analysis in Clinical Trial : Abstract: To allow the development and assessment of physical rehabilitation by an intelligent tutoring system, we propose a medical dataset of clinical patients carrying out low back-pain rehabilitat...
- Semantic Event Graphs for Long-Form Video Question Answering : Abstract: Long-form video question answering remains challenging for modern vision-language models, which struggle to reason over hour-scale footage without exceeding practical token and compute budge...
- OptFormer: Optical Flow-Guided Attention and Phase Space Reconstruction for SST Forecasting : Abstract: Sea Surface Temperature (SST) prediction plays a vital role in climate modeling and disaster forecasting. However, it remains challenging due to its nonlinear spatiotemporal dynamics and ext...
- HyperTopo-Adapters: Geometry- and Topology-Aware Segmentation of Leaf Lesions on Frozen Encoders : Abstract: Leaf-lesion segmentation is topology-sensitive: small merges, splits, or false holes can be biologically meaningful descriptors of biochemical pathways, yet they are weakly penalized by stan...
- Cross-Domain Transfer and Few-Shot Learning for Personal Identifiable Information Recognition : Abstract: Accurate recognition of personally identifiable information (PII) is central to automated text anonymization. This paper investigates the effectiveness of cross-domain model transfer, multi-...
- Perspectives in Play: A Multi-Perspective Approach for More Inclusive NLP Systems : Abstract: In the realm of Natural Language Processing (NLP), common approaches for handling human disagreement consist of aggregating annotators' viewpoints to establish a single ground truth. However...
- Correcting misinformation on social media with a large language model : Abstract: Real-world information, often multimodal, can be misinformed or potentially misleading due to factual errors, outdated claims, missing context, misinterpretation, and more. Such "misinformat...
- Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion : Abstract: With the bloom of Large Language Models (LLMs), Multimodal Large Language Models (MLLMs) that incorporate LLMs with pre-trained vision models have recently demonstrated impressive performanc...
- Efficient Continual Pre-training for Building Domain Specific Large Language Models : Abstract: Large language models (LLMs) have demonstrated remarkable open-domain capabilities. LLMs tailored for a domain are typically trained entirely on domain corpus to excel at handling domain-spe...
- OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent : Abstract: While Vision-Language Models (VLMs) have significantly advanced Computer-Using Agents (CUAs), current frameworks struggle with robustness in long-horizon workflows and generalization in nove...
- Reasoning Models Will Blatantly Lie About Their Reasoning : Abstract: It has been shown that Large Reasoning Models (LRMs) may not *say what they think*: they do not always volunteer information about how certain parts of the input influence their reasoning. B...
- Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning : Abstract: The central challenge of AI for Science is not reasoning alone, but the ability to create computational methods in an open-ended scientific world. Existing LLM-based agents rely on static, p...
- On Narrative: The Rhetorical Mechanisms of Online Polarisation : Abstract: Polarisation research has demonstrated how people cluster in homogeneous groups with opposing opinions. However, this effect emerges not only through interaction between people, limiting com...
- LRAS: Advanced Legal Reasoning with Agentic Search : Abstract: While Large Reasoning Models (LRMs) have demonstrated exceptional logical capabilities in mathematical domains, their application to the legal field remains hindered by the strict requiremen...
- Lost in the Noise: How Reasoning Models Fail with Contextual Distractors : Abstract: Recent advances in reasoning models and agentic AI systems have led to an increased reliance on diverse external information. However, this shift introduces input contexts that are inherentl...
- Rewarding Creativity: A Human-Aligned Generative Reward Model for Reinforcement Learning in Storytelling : Abstract: While Large Language Models (LLMs) can generate fluent text, producing high-quality creative stories remains challenging. Reinforcement Learning (RL) offers a promising solution but faces tw...
- ReinPool: Reinforcement Learning Pooling Multi-Vector Embeddings for Retrieval System : Abstract: Multi-vector embedding models have emerged as a powerful paradigm for document retrieval, preserving fine-grained visual and textual details through token-level representations. However, thi...
- FinCARDS: Card-Based Analyst Reranking for Financial Document Question Answering : Abstract: Financial question answering (QA) over long corporate filings requires evidence to satisfy strict constraints on entities, financial metrics, fiscal periods, and numeric values. However, exi...
- Measuring Social Bias in Vision-Language Models with Face-Only Counterfactuals from Real Photos : Abstract: Vision-Language Models (VLMs) are increasingly deployed in socially consequential settings, raising concerns about social bias driven by demographic cues. A central challenge in measuring su...
- TagSpeech: End-to-End Multi-Speaker ASR and Diarization with Fine-Grained Temporal Grounding : Abstract: We present TagSpeech, a unified LLM-based framework that utilizes Temporal Anchor Grounding for joint multi-speaker ASR and diarization. The framework is built on two key designs: (1) decoup...
- An Ubuntu-Guided Large Language Model Framework for Cognitive Behavioral Mental Health Dialogue : Abstract: South Africa's escalating mental health crisis, compounded by limited access to culturally responsive care, calls for innovative and contextually grounded interventions. While large language...
- Speak While Watching: Unleashing TRUE Real-Time Video Understanding Capability of Multimodal Large Language Models : Abstract: Multimodal Large Language Models (MLLMs) have achieved strong performance across many tasks, yet most systems remain limited to offline inference, requiring complete inputs before generating...
- Benchmarking Egocentric Clinical Intent Understanding Capability for Medical Multimodal Large Language Models : Abstract: Medical Multimodal Large Language Models (Med-MLLMs) require egocentric clinical intent understanding for real-world deployment, yet existing benchmarks fail to evaluate this critical capabi...
- L-RAG: Balancing Context and Retrieval with Entropy-Based Lazy Loading : Abstract: Retrieval-Augmented Generation (RAG) has emerged as the predominant paradigm for grounding Large Language Model outputs in factual knowledge, effectively mitigating hallucinations. However, ...
- BabyVision: Visual Reasoning Beyond Language : Abstract: While humans develop core visual skills long before acquiring language, contemporary Multimodal LLMs (MLLMs) still rely heavily on linguistic priors to compensate for their fragile visual un...
- Tone Matters: The Impact of Linguistic Tone on Hallucination in VLMs : Abstract: Vision-Language Models (VLMs) are increasingly used in safety-critical applications that require reliable visual grounding. However, these models often hallucinate details that are not prese...
- BizFinBench.v2: A Unified Dual-Mode Bilingual Benchmark for Expert-Level Financial Capability Alignment : Abstract: Large language models have undergone rapid evolution, emerging as a pivotal technology for intelligence in financial operations. However, existing benchmarks are often constrained by pitfall...
- Classroom AI: Large Language Models as Grade-Specific Teachers : Abstract: Large Language Models (LLMs) offer a promising solution to complement traditional teaching and address global teacher shortages that affect hundreds of millions of children, but they fail to...
- Political Alignment in Large Language Models: A Multidimensional Audit of Psychometric Identity and Behavioral Bias : Abstract: As large language models (LLMs) are increasingly integrated into social decision-making, understanding their political positioning and alignment behavior is critical for safety and fairness....
- Attention Mechanism and Heuristic Approach: Context-Aware File Ranking Using Multi-Head Self-Attention : Abstract: The identification and ranking of impacted files within software repositories is a key challenge in change impact analysis. Existing deterministic approaches that combine heuristic signals,...
- An evaluation of LLMs for political bias in Western media: Israel-Hamas and Ukraine-Russia wars : Abstract: Political bias in media plays a critical role in shaping public opinion, voter behaviour, and broader democratic discourse. Subjective opinions and political bias can be found in media sourc...
- Structure-Aware Diversity Pursuit as an AI Safety Strategy against Homogenization : Abstract: Generative AI models reproduce the biases in the training data and can further amplify them through mode collapse. We refer to the resulting harmful loss of diversity as homogenization. Our ...
- From RLHF to Direct Alignment: A Theoretical Unification of Preference Learning for Large Language Models : Abstract: Aligning large language models (LLMs) with human preferences has become essential for safe and beneficial AI deployment. While Reinforcement Learning from Human Feedback (RLHF) established t...
- Comment on arXiv:2511.21731v1: Identifying Quantum Structure in AI Language: Evidence for Evolutionary Convergence of Human and Artificial Cognition : Abstract: This note is a friendly technical check of arXiv:2511.21731v1. I highlight a few places where the manuscript's interpretation of (i) the reported CHSH/Bell-type calculations and (ii) Bose--E...
- La norme technique comme catalyseur de transfert de connaissances : la francophonie à l'œuvre dans le domaine de l'éducation : Abstract: Standards are adopted in a wide range of fields, both technical and industrial, as well as socio-economic, cultural and linguistic. They are presented explicitly as laws and regulations, tec...
- Why Slop Matters : Abstract: AI-generated "slop" is often seen as digital pollution. We argue that this dismissal of the topic risks missing important aspects of AI Slop that deserve rigorous study. AI Slop serves a soc...
- "They parted illusions -- they parted disclaim marinade": Misalignment as structural fidelity in LLMs : Abstract: The prevailing technical literature in AI Safety interprets scheming and sandbagging behaviors in large language models (LLMs) as indicators of deceptive agency or hidden objectives. This tr...
- Reference Games as a Testbed for the Alignment of Model Uncertainty and Clarification Requests : Abstract: In human conversation, both interlocutors play an active role in maintaining mutual understanding. When addressees are uncertain about what speakers mean, for example, they can request clari...
- Learning Through Dialogue: Unpacking the Dynamics of Human-LLM Conversations on Political Issues : Abstract: Large language models (LLMs) are increasingly used as conversational partners for learning, yet the interactional dynamics supporting users' learning and engagement are understudied. We anal...
- Kinship Data Benchmark for Multi-hop Reasoning : Abstract: Large language models (LLMs) are increasingly evaluated on their ability to perform multi-hop reasoning, i.e., to combine multiple pieces of information into a coherent inference. We introdu...
- Beyond Single-Shot: Multi-step Tool Retrieval via Query Planning : Abstract: LLM agents operating over massive, dynamic tool libraries rely on effective retrieval, yet standard single-shot dense retrievers struggle with complex requests. These failures primarily stem...
- Enhancing Self-Correction in Large Language Models through Multi-Perspective Reflection : Abstract: While Chain-of-Thought (CoT) prompting advances LLM reasoning, challenges persist in consistency, accuracy, and self-correction, especially for complex or ethically sensitive tasks. Existing...
- Contrastive Learning with Narrative Twins for Modeling Story Salience : Abstract: Understanding narratives requires identifying which events are most salient for a story's progression. We present a contrastive learning framework for modeling narrative salience that learns...
- Structure First, Reason Next: Enhancing a Large Language Model using Knowledge Graph for Numerical Reasoning in Financial Documents : Abstract: Numerical reasoning is an important task in the analysis of financial documents. It helps in understanding and performing numerical predictions with logical conclusions for the given query s...
- Is Agentic RAG worth it? An experimental comparison of RAG approaches : Abstract: Retrieval-Augmented Generation (RAG) systems are usually defined by the combination of a generator and a retrieval component that extracts textual context from a knowledge base to answer use...
- Emotional Support Evaluation Framework via Controllable and Diverse Seeker Simulator : Abstract: As emotional support chatbots have recently gained significant traction across both research and industry, a common evaluation strategy has emerged: use help-seeker simulators to interact wi...
- Exploring the Meta-level Reasoning of Large Language Models via a Tool-based Multi-hop Tabular Question Answering Task : Abstract: Recent advancements in Large Language Models (LLMs) are increasingly focused on "reasoning" ability, a concept with many overlapping definitions in the LLM discourse. We take a more structur...
- Order in the Evaluation Court: A Critical Analysis of NLG Evaluation Trends : Abstract: Despite advances in Natural Language Generation (NLG), evaluation remains challenging. Although various new metrics and LLM-as-a-judge (LaaJ) methods are proposed, human judgment persists as...
- PlaM: Training-Free Plateau-Guided Model Merging for Better Visual Grounding in MLLMs : Abstract: Multimodal Large Language Models (MLLMs) rely on strong linguistic reasoning inherited from their base language models. However, multimodal instruction fine-tuning paradoxically degrades thi...
- Integrating Machine-Generated Short Descriptions into the Wikipedia Android App: A Pilot Deployment of Descartes : Abstract: Short descriptions are a key part of the Wikipedia user experience, but their coverage remains uneven across languages and topics. In previous work, we introduced Descartes, a multilingual m...
- Proof of Time: A Benchmark for Evaluating Scientific Idea Judgments : Abstract: Large language models are increasingly being used to assess and forecast research ideas, yet we lack scalable ways to evaluate the quality of models' judgments about these scientific ideas. ...
- ES-Mem: Event Segmentation-Based Memory for Long-Term Dialogue Agents : Abstract: Memory is critical for dialogue agents to maintain coherence and enable continuous adaptation in long-term interactions. While existing memory mechanisms offer basic storage and retrieval ca...
- A Unified Framework for Emotion Recognition and Sentiment Analysis via Expert-Guided Multimodal Fusion with Large Language Models : Abstract: Multimodal emotion understanding requires effective integration of text, audio, and visual modalities for both discrete emotion recognition and continuous sentiment analysis. We present EGMF...
- From RAG to Agentic RAG for Faithful Islamic Question Answering : Abstract: LLMs are increasingly used for Islamic question answering, where ungrounded responses may carry serious religious consequences. Yet standard MCQ/MRC-style evaluations do not capture key real...
- Thinking Before Constraining: A Unified Decoding Framework for Large Language Models : Abstract: Natural generation allows Language Models (LMs) to produce free-form responses with rich reasoning, but the lack of guaranteed structure makes outputs difficult to parse or verify. Structure...
- High-Rank Structured Modulation for Parameter-Efficient Fine-Tuning : Abstract: As the number of model parameters increases, parameter-efficient fine-tuning (PEFT) has become the go-to choice for tailoring pre-trained large language models. Low-rank Adaptation (LoRA) us...
- Judging Against the Reference: Uncovering Knowledge-Driven Failures in LLM-Judges on QA Evaluation : Abstract: While large language models (LLMs) are increasingly used as automatic judges for question answering (QA) and other reference-conditioned evaluation tasks, little is known about their ability...
- KALE: Enhancing Knowledge Manipulation in Large Language Models via Knowledge-aware Learning : Abstract: Despite the impressive performance of large language models (LLMs) pretrained on vast knowledge corpora, advancing their knowledge manipulation, the ability to effectively recall, reason, and...
- SAD: A Large-Scale Strategic Argumentative Dialogue Dataset : Abstract: Argumentation generation has attracted substantial research interest due to its central role in human reasoning and decision-making. However, most existing argumentative corpora focus on non...
- Two Pathways to Truthfulness: On the Intrinsic Encoding of LLM Hallucinations : Abstract: Despite their impressive capabilities, large language models (LLMs) frequently generate hallucinations. Previous work shows that their internal states encode rich signals of truthfulness, ye...
- GROKE: Vision-Free Navigation Instruction Evaluation via Graph Reasoning on OpenStreetMap : Abstract: The evaluation of navigation instructions remains a persistent challenge in Vision-and-Language Navigation (VLN) research. Traditional reference-based metrics such as BLEU and ROUGE fail to ...
- Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models : Abstract: While Mixture-of-Experts (MoE) scales capacity via conditional computation, Transformers lack a native primitive for knowledge lookup, forcing them to inefficiently simulate retrieval throug...
- Interpretable Text Classification Applied to the Detection of LLM-generated Creative Writing : Abstract: We consider the problem of distinguishing human-written creative fiction (excerpts from novels) from similar text generated by an LLM. Our results show that, while human observers perform po...
- Semantic Compression of LLM Instructions via Symbolic Metalanguages : Abstract: We introduce MetaGlyph, a symbolic language for compressing prompts by encoding instructions as mathematical symbols rather than prose. Unlike systems requiring explicit decoding rules, Meta...
- TALON: Confidence-Aware Speculative Decoding with Adaptive Token Trees : Abstract: Speculative decoding (SD) has become a standard technique for accelerating LLM inference without sacrificing output quality. Recent advances in speculative decoding have shifted from sequent...
- Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models : Abstract: Diffusion Language Models (DLMs) offer a promising alternative for language modeling by enabling parallel decoding through iterative refinement. However, most DLMs rely on hard binary maskin...
- Reward Modeling from Natural Language Human Feedback : Abstract: Reinforcement Learning with Verifiable Reward (RLVR) on preference data has become the mainstream approach for training Generative Reward Models (GRMs). Typically in pairwise rewarding tasks...
- Controlled Self-Evolution for Algorithmic Code Optimization : Abstract: Self-evolution methods enhance code generation through iterative "generate-verify-refine" cycles, yet existing approaches suffer from low exploration efficiency, failing to discover solution...
- DiffER: Diffusion Entity-Relation Modeling for Reversal Curse in Diffusion Large Language Models : Abstract: The "reversal curse" refers to the phenomenon where large language models (LLMs) exhibit predominantly unidirectional behavior when processing logically bidirectional relationships. Prior wo...
- Beyond Literal Mapping: Benchmarking and Improving Non-Literal Translation Evaluation : Abstract: Large Language Models (LLMs) have significantly advanced Machine Translation (MT), applying them to linguistically complex domains-such as Social Network Services, literature etc. In these s...
- BayesRAG: Probabilistic Mutual Evidence Corroboration for Multimodal Retrieval-Augmented Generation : Abstract: Retrieval-Augmented Generation (RAG) has become a pivotal paradigm for Large Language Models (LLMs), yet current approaches struggle with visually rich documents by treating text and images ...
- How to predict creativity ratings from written narratives: A comparison of co-occurrence and textual forma mentis networks : Abstract: This tutorial paper provides a step-by-step workflow for building and analysing semantic networks from short creative texts. We introduce and compare two widely used text-to-network approach...
- Mitrasamgraha: A Comprehensive Classical Sanskrit Machine Translation Dataset : Abstract: While machine translation is regarded as a "solved problem" for many high-resource languages, close analysis quickly reveals that this is not the case for content that shows challenges such ...
- PsyCLIENT: Client Simulation via Conversational Trajectory Modeling for Trainee Practice and Model Evaluation in Mental Health Counseling : Abstract: LLM-based client simulation has emerged as a promising tool for training novice counselors and evaluating automated counseling systems. However, existing client simulation approaches face th...
- ReasonTabQA: A Comprehensive Benchmark for Table Question Answering from Real World Industrial Scenarios : Abstract: Recent advancements in Large Language Models (LLMs) have significantly catalyzed table-based question answering (TableQA). However, existing TableQA benchmarks often overlook the intricacies...
- Towards Comprehensive Semantic Speech Embeddings for Chinese Dialects : Abstract: Despite having hundreds of millions of speakers, Chinese dialects lag behind Mandarin in speech and language technologies. Most varieties are primarily spoken, making dialect-to-Mandarin spe...
- Document-Level Zero-Shot Relation Extraction with Entity Side Information : Abstract: Document-Level Zero-Shot Relation Extraction (DocZSRE) aims to predict unseen relation labels in text documents without prior training on specific relations. Existing approaches rely on Larg...
- The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents : Abstract: Autonomous agents based on large language models (LLMs) are rapidly evolving to handle multi-turn tasks, but ensuring their trustworthiness remains a critical challenge. A fundamental pillar...
- ActiShade: Activating Overshadowed Knowledge to Guide Multi-Hop Reasoning in Large Language Models : Abstract: In multi-hop reasoning, multi-round retrieval-augmented generation (RAG) methods typically rely on LLM-generated content as the retrieval query. However, these approaches are inherently vuln...
- The Roots of Performance Disparity in Multilingual Language Models: Intrinsic Modeling Difficulty or Design Choices? : Abstract: Multilingual language models (LMs) promise broader NLP access, yet current systems deliver uneven performance across the world's languages. This survey examines why these gaps persist and wh...
- MI-PRUN: Optimize Large Language Model Pruning via Mutual Information : Abstract: Large Language Models (LLMs) have become indispensable across various domains, but this comes at the cost of substantial computational and memory resources. Model pruning addresses this by r...
- Relink: Constructing Query-Driven Evidence Graph On-the-Fly for GraphRAG : Abstract: Graph-based Retrieval-Augmented Generation (GraphRAG) mitigates hallucinations in Large Language Models (LLMs) by grounding them in structured knowledge. However, current GraphRAG methods ar...
- Structured Reasoning for Large Language Models : Abstract: Large language models (LLMs) achieve strong performance by generating long chains of thought, but longer traces always introduce redundant or ineffective reasoning steps. One typical behavio...
- Can Large Language Models Understand, Reason About, and Generate Code-Switched Text? : Abstract: Code-switching is a pervasive phenomenon in multilingual communication, yet the robustness of large language models (LLMs) in mixed-language settings remains insufficiently understood. In th...
- Measuring Iterative Temporal Reasoning with TimePuzzles : Abstract: We introduce TimePuzzles, a constraint-based date inference task for evaluating iterative temporal reasoning. Each puzzle combines factual temporal anchors with (cross-cultural) calendar rel...
- ReMIND: Orchestrating Modular Large Language Models for Controllable Serendipity A REM-Inspired System Design for Emergent Creative Ideation : Abstract: Large language models (LLMs) are used not only for problem solving but also for creative ideation; however, eliciting serendipitous insights that are both novel and internally coherent remai...
- The Need for a Socially-Grounded Persona Framework for User Simulation : Abstract: Synthetic personas are widely used to condition large language models (LLMs) for social simulation, yet most personas are still constructed from coarse sociodemographic attributes or summari...
- Engineering of Hallucination in Generative AI: It's not a Bug, it's a Feature : Abstract: Generative artificial intelligence (AI) is conquering our lives at lightning speed. Large language models such as ChatGPT answer our questions or write texts for us, large computer vision mo...
- When Abundance Conceals Weakness: Knowledge Conflict in Multilingual Models : Abstract: Large Language Models (LLMs) encode vast world knowledge across multiple languages, yet their internal beliefs are often unevenly distributed across linguistic spaces. When external evidence...
- Task Arithmetic with Support Languages for Low-Resource ASR : Abstract: The development of resource-constrained approaches to automatic speech recognition (ASR) is of great interest due to its broad applicability to many low-resource languages for which there is...
- Codified Foreshadowing-Payoff Text Generation : Abstract: Foreshadowing and payoff are ubiquitous narrative devices through which authors introduce commitments early in a story and resolve them through concrete, observable outcomes. However, despit...
- Solar Open Technical Report : Abstract: We introduce Solar Open, a 102B-parameter bilingual Mixture-of-Experts language model for underserved languages. Solar Open demonstrates a systematic methodology for building competitive LLM...
- TurkBench: A Benchmark for Evaluating Turkish Large Language Models : Abstract: With the recent surge in the development of large language models, the need for comprehensive and language-specific evaluation benchmarks has become critical. While significant progress has ...
- Lexicalized Constituency Parsing for Middle Dutch: Low-resource Training and Cross-Domain Generalization : Abstract: Recent years have seen growing interest in applying neural networks and contextualized word embeddings to the parsing of historical languages. However, most advances have focused on dependen...
- MedTutor: A Retrieval-Augmented LLM System for Case-Based Medical Education : Abstract: The learning process for medical residents presents significant challenges, demanding both the ability to interpret complex case reports and the rapid acquisition of accurate medical knowled...
- UETQuintet at BioCreative IX - MedHopQA: Enhancing Biomedical QA with Selective Multi-hop Reasoning and Contextual Retrieval : Abstract: Biomedical Question Answering systems play a critical role in processing complex medical queries, yet they often struggle with the intricate nature of medical data and the demand for multi-h...
- LLMs Can't Play Hangman: On the Necessity of a Private Working Memory for Language Agents : Abstract: As LLMs move from text completion toward autonomous agents, they remain constrained by the standard chat interface, which lacks private working memory. This raises a fundamental question: ca...
- Categorize Early, Integrate Late: Divergent Processing Strategies in Automatic Speech Recognition : Abstract: In speech language modeling, two architectures dominate the frontier: the Transformer and the Conformer. However, it remains unknown whether their comparable performance stems from convergen...
- RealMem: Benchmarking LLMs in Real-World Memory-Driven Interaction : Abstract: As Large Language Models (LLMs) evolve from static dialogue interfaces to autonomous general agents, effective memory is paramount to ensuring long-term consistency. However, existing benchm...
- Symphonym: Universal Phonetic Embeddings for Cross-Script Toponym Matching via Teacher-Student Distillation : Abstract: Linking place names across languages and writing systems is a fundamental challenge in digital humanities and geographic information retrieval. Existing approaches rely on language-specific ...
- TreePS-RAG: Tree-based Process Supervision for Reinforcement Learning in Agentic RAG : Abstract: Agentic retrieval-augmented generation (RAG) formulates question answering as a multi-step interaction between reasoning and information retrieval, and has recently been advanced by reinforc...
- Fine-grained Verbal Attack Detection via a Hierarchical Divide-and-Conquer Framework : Abstract: In the digital era, effective identification and analysis of verbal attacks are essential for maintaining online civility and ensuring social security. However, existing research is limited ...
- BiasLab: A Multilingual, Dual-Framing Framework for Robust Measurement of Output-Level Bias in Large Language Models : Abstract: Large Language Models (LLMs) are increasingly deployed in high-stakes contexts where their outputs influence real-world decisions. However, evaluating bias in LLM outputs remains methodologi...
- Explainable Multimodal Aspect-Based Sentiment Analysis with Dependency-guided Large Language Model : Abstract: Multimodal aspect-based sentiment analysis (MABSA) aims to identify aspect-level sentiments by jointly modeling textual and visual information, which is essential for fine-grained opinion un...
- PDR: A Plug-and-Play Positional Decay Framework for LLM Pre-training Data Detection : Abstract: Detecting pre-training data in Large Language Models (LLMs) is crucial for auditing data privacy and copyright compliance, yet it remains challenging in black-box, zero-shot settings where c...
- AgentHallu: Benchmarking Automated Hallucination Attribution of LLM-based Agents : Abstract: As LLM-based agents operate over sequential multi-step reasoning, hallucinations arising at intermediate steps risk propagating along the trajectory, thus degrading overall reliability. Unli...
- Forest Before Trees: Latent Superposition for Efficient Visual Reasoning : Abstract: While Chain-of-Thought empowers Large Vision-Language Models with multi-step reasoning, explicit textual rationales suffer from an information bandwidth bottleneck, where continuous visual d...
- Doing More with Less: Data Augmentation for Sudanese Dialect Automatic Speech Recognition : Abstract: Although many Automatic Speech Recognition (ASR) systems have been developed for Modern Standard Arabic (MSA) and Dialectal Arabic (DA), few studies have focused on dialect-specific implemen...
- CIRAG: Construction-Integration Retrieval and Adaptive Generation for Multi-hop Question Answering : Abstract: Triple-based Iterative Retrieval-Augmented Generation (iRAG) mitigates document-level noise for multi-hop question answering. However, existing methods still face limitations: (i) greedy sin...
- Garbage Attention in Large Language Models: BOS Sink Heads and Sink-aware Pruning : Abstract: Large Language Models (LLMs) are known to contain significant redundancy, yet a systematic explanation for why certain components, particularly in higher layers, are more redundant has remai...
- EpiCaR: Knowing What You Don't Know Matters for Better Reasoning in LLMs : Abstract: Improving the reasoning abilities of large language models (LLMs) has largely relied on iterative self-training with model-generated data. While effective at boosting accuracy, existing appr...
- Multi-Stage Evolutionary Model Merging with Meta Data Driven Curriculum Learning for Sentiment-Specialized Large Language Modeling : Abstract: The emergence of large language models (LLMs) has significantly transformed natural language processing (NLP), enabling more generalized models to perform various tasks with minimal training...
- MTMCS-Bench: Evaluating Contextual Safety of Multimodal Large Language Models in Multi-Turn Dialogues : Abstract: Multimodal large language models (MLLMs) are increasingly deployed as assistants that interact through text and images, making it crucial to evaluate contextual safety when risk depends on b...
- Towards Computational Chinese Paleography : Abstract: Chinese paleography, the study of ancient Chinese writing, is undergoing a computational turn powered by artificial intelligence. This position paper charts the trajectory of this emerging f...
- Evaluating Accounting Reasoning Capabilities of Large Language Models : Abstract: Large language models are transforming learning, cognition, and research across many fields. Effectively integrating them into professional domains, such as accounting, is a key challenge fo...
- GRASP LoRA: GRPO Guided Adapter Sparsity Policy for Cross Lingual Transfer : Abstract: Parameter efficient fine tuning is a way to adapt LLMs to new languages when compute or data are limited, yet adapter pipelines usually choose a global prune ratio by grid search. This pract...
- Characterising Toxicity in Generative Large Language Models : Abstract: In recent years, the advent of the attention mechanism has significantly advanced the field of natural language processing (NLP), revolutionizing text processing and text generation. This ha...
- IDRBench: Interactive Deep Research Benchmark : Abstract: Deep research agents powered by Large Language Models (LLMs) can perform multi-step reasoning, web exploration, and long-form report generation. However, most existing systems operate in an ...
- Evaluating Cross-Lingual Unlearning in Multilingual Language Models : Abstract: We present the first comprehensive evaluation of cross-lingual unlearning in multilingual LLMs. Using translated TOFU benchmarks in seven language/script variants, we test major unlearning a...
- Will it Merge? On The Causes of Model Mergeability : Abstract: Model merging has emerged as a promising technique for combining multiple fine-tuned models into a single multitask model without retraining. However, the factors that determine whether merg...
- InFi-Check: Interpretable and Fine-Grained Fact-Checking of LLMs : Abstract: Large language models (LLMs) often hallucinate, yet most existing fact-checking methods treat factuality evaluation as a binary classification problem, offering limited interpretability and ...
- What makes for an enjoyable protagonist? An analysis of character warmth and competence : Abstract: Drawing on psychological and literary theory, we investigated whether the warmth and competence of movie protagonists predict IMDb ratings, and whether these effects vary across genres. Usin...
- Do Language Models Reason Across Languages? : Abstract: The real-world information sources are inherently multilingual, which naturally raises a question about whether language models can synthesize information across languages. In this paper, we...
- Efficient Aspect Term Extraction using Spiking Neural Network : Abstract: Aspect Term Extraction (ATE) identifies aspect terms in review sentences, a key subtask of sentiment analysis. While most existing approaches use energy-intensive deep neural networks (DNNs)...
- MedEinst: Benchmarking the Einstellung Effect in Medical LLMs through Counterfactual Differential Diagnosis : Abstract: Despite achieving high accuracy on medical benchmarks, LLMs exhibit the Einstellung Effect in clinical diagnosis--relying on statistical shortcuts rather than patient-specific evidence, caus...
- Labels have Human Values: Value Calibration of Subjective Tasks : Abstract: Building NLP systems for subjective tasks requires one to ensure their alignment to contrasting human values. We propose the MultiCalibrated Subjective Task Learner framework (MC-STL), which...
- Efficient and Reliable Estimation of Named Entity Linking Quality: A Case Study on GutBrainIE : Abstract: Named Entity Linking (NEL) is a core component of biomedical Information Extraction (IE) pipelines, yet assessing its quality at scale is challenging due to the high cost of expert annotatio...
- N2N-GQA: Noise-to-Narrative for Graph-Based Table-Text Question Answering Using LLMs : Abstract: Multi-hop question answering over hybrid table-text data requires retrieving and reasoning across multiple evidence pieces from large corpora, but standard Retrieval-Augmented Generation (RA...
- Probing Multimodal Large Language Models on Cognitive Biases in Chinese Short-Video Misinformation : Abstract: Short-video platforms have become major channels for misinformation, where deceptive claims frequently leverage visual experiments and social cues. While Multimodal Large Language Models (ML...
- How Context Shapes Truth: Geometric Transformations of Statement-level Truth Representations in LLMs : Abstract: Large Language Models (LLMs) often encode whether a statement is true as a vector in their residual stream activations. These vectors, also known as truth vectors, have been studied in prior...
- Stylistic Evolution and LLM Neutrality in Singlish Language : Abstract: Singlish is a creole rooted in Singapore's multilingual environment and continues to evolve alongside social and technological change. This study investigates the evolution of Singlish over ...
- Are Emotions Arranged in a Circle? Geometric Analysis of Emotion Representations via Hyperspherical Contrastive Learning : Abstract: Psychological research has long utilized circumplex models to structure emotions, placing similar emotions adjacently and opposing ones diagonally. Although frequently used to interpret deep...
- EVM-QuestBench: An Execution-Grounded Benchmark for Natural-Language Transaction Code Generation : Abstract: Large language models are increasingly applied to various development scenarios. However, in on-chain transaction scenarios, even a minor error can cause irreversible loss for users. Existin...
- CSR-RAG: An Efficient Retrieval System for Text-to-SQL on the Enterprise Scale : Abstract: Natural language to SQL translation (Text-to-SQL) is one of the long-standing problems that has recently benefited from advances in Large Language Models (LLMs). While most academic Text-to-...
- Exposía: Academic Writing Assessment of Exposés and Peer Feedback : Abstract: We present Exposía, the first public dataset that connects writing and feedback assessment in higher education, enabling research on educationally grounded approaches to academic writing eva...
- Atomic-SNLI: Fine-Grained Natural Language Inference through Atomic Fact Decomposition : Abstract: Current Natural Language Inference (NLI) systems primarily operate at the sentence level, providing black-box decisions that lack explanatory power. While atomic-level NLI offers a promising...
- MedRAGChecker: Claim-Level Verification for Biomedical Retrieval-Augmented Generation : Abstract: Biomedical retrieval-augmented generation (RAG) can ground LLM answers in medical literature, yet long-form outputs often contain isolated unsupported or contradictory claims with safety imp...
- Spec-o3: A Tool-Augmented Vision-Language Agent for Rare Celestial Object Candidate Vetting via Automated Spectral Inspection : Abstract: Due to the limited generalization and interpretability of deep learning classifiers, the final vetting of rare celestial object candidates still relies on expert visual inspection--a manuall...
- IndRegBias: A Dataset for Studying Indian Regional Biases in English and Code-Mixed Social Media Comments : Abstract: Warning: This paper consists of examples representing regional biases in Indian regions that might be offensive towards a particular region. While social biases corresponding to gender, race...
- LitVISTA: A Benchmark for Narrative Orchestration in Literary Text : Abstract: Computational narrative analysis aims to capture rhythm, tension, and emotional dynamics in literary texts. Existing large language models can generate long stories but overly focus on causa...
- Time Travel Engine: A Shared Latent Chronological Manifold Enables Historical Navigation in Large Language Models : Abstract: Time functions as a fundamental dimension of human cognition, yet the mechanisms by which Large Language Models (LLMs) encode chronological progression remain opaque. We demonstrate that tem...
- NC-Bench: An LLM Benchmark for Evaluating Conversational Competence : Abstract: The Natural Conversation Benchmark (NC-Bench) introduces a new approach to evaluating the general conversational competence of large language models (LLMs). Unlike prior benchmarks that focus...
- Can a Unimodal Language Agent Provide Preferences to Tune a Multimodal Vision-Language Model? : Abstract: To explore a more scalable path for adding multimodal capabilities to existing LLMs, this paper addresses a fundamental question: Can a unimodal LLM, relying solely on text, reason about its...
- Structured Episodic Event Memory : Abstract: Current approaches to memory in Large Language Models (LLMs) predominantly rely on static Retrieval-Augmented Generation (RAG), which often results in scattered retrieval and fails to captur...
- Value of Information: A Framework for Human-Agent Communication : Abstract: Large Language Model (LLM) agents deployed for real-world tasks face a fundamental dilemma: user requests are underspecified, yet agents must decide whether to act on incomplete information ...
- Steer Model beyond Assistant: Controlling System Prompt Strength via Contrastive Decoding : Abstract: Large language models excel at complex instructions yet struggle to deviate from their helpful assistant persona, as post-training instills strong priors that resist conflicting instructions...
- MITRA: A Large-Scale Parallel Corpus and Multilingual Pretrained Language Model for Machine Translation and Semantic Retrieval for Pāli, Sanskrit, Buddhist Chinese, and Tibetan : Abstract: Ancient Buddhist literature features frequent, yet often unannotated, textual parallels spread across diverse languages: Sanskrit, Pāli, Buddhist Chinese, Tibetan, and more. The scale of thi...
- AfriqueLLM: How Data Mixing and Model Architecture Impact Continued Pre-training for African Languages : Abstract: Large language models (LLMs) are increasingly multilingual, yet open models continue to underperform relative to proprietary systems, with the gap most pronounced for African languages. Cont...
- Talking to Extraordinary Objects: Folktales Offer Analogies for Interacting with Technology : Abstract: Speech and language are valuable for interacting with technology. It would be ideal to be able to decouple their use from anthropomorphization, which has recently met an important moment of ...
- Average shortest-path length in word-adjacency networks: Chinese versus English : Abstract: Complex networks provide powerful tools for analyzing and understanding the intricate structures present in various systems, including natural language. Here, we analyze topology of growing ...
- What Matters When Building Universal Multilingual Named Entity Recognition Models? : Abstract: Recent progress in universal multilingual named entity recognition (NER) has been driven by advances in multilingual transformer models and task-specific architectures, loss functions, and t...
- On the Fallacy of Global Token Perplexity in Spoken Language Model Evaluation : Abstract: Generative spoken language models pretrained on large-scale raw audio can continue a speech prompt with appropriate content while preserving attributes like speaker and emotion, serving as f...
- Annotating Dimensions of Social Perception in Text: The First Sentence-Level Dataset of Warmth and Competence : Abstract: Warmth (W) (often further broken down into Trust (T) and Sociability (S)) and Competence (C) are central dimensions along which people evaluate individuals and social groups (Fiske, 2018). W...
- A Rising Tide Lifts All Boats: MTQE Rewards for Idioms Improve General Translation Quality : Abstract: Non-compositional expressions (e.g., idioms, proverbs, and metaphors) pose significant challenges for neural machine translation systems because their meanings cannot be derived from individ...
- SyntaxMind at BLP-2025 Task 1: Leveraging Attention Fusion of CNN and GRU for Hate Speech Detection : Abstract: This paper describes our system used in the BLP-2025 Task 1: Hate Speech Detection. We participated in Subtask 1A and Subtask 1B, addressing hate speech classification in Bangla text. Our ap...
- Why LoRA Fails to Forget: Regularized Low-Rank Adaptation Against Backdoors in Language Models : Abstract: Low-Rank Adaptation (LoRA) is widely used for parameter-efficient fine-tuning of large language models, but it is notably ineffective at removing backdoor behaviors from poisoned pretrained ...
- Amory: Building Coherent Narrative-Driven Agent Memory through Agentic Reasoning : Abstract: Long-term conversational agents face a fundamental scalability challenge as interactions extend over time: repeatedly processing entire conversation histories becomes computationally prohibi...
- AzeroS: Extending LLM to Speech with Self-Generated Instruction-Free Tuning : Abstract: Extending large language models (LLMs) to the speech domain has recently gained significant attention. A typical approach connects a pretrained LLM with an audio encoder through a projection...
- A Multi-Stage Workflow for the Review of Marketing Content with Reasoning Large Language Models : Abstract: Reasoning Large Language Models (LLMs) have shown promising results when tasked with solving complex problems. In this paper, we propose and evaluate a multi-stage workflow that leverages th...
- Lexical and Statistical Analysis of Bangla Newspaper and Literature: A Corpus-Driven Study on Diversity, Readability, and NLP Adaptation : Abstract: In this paper, we present a comprehensive corpus-driven analysis of Bangla literary and newspaper texts to investigate their lexical diversity, structural complexity and readability. We unde...
- Operation Veja: Fixing Fundamental Concepts Missing from Modern Roleplaying Training Paradigms : Abstract: Modern roleplaying models are increasingly sophisticated, yet they consistently struggle to capture the essence of believable, engaging characters. We argue this failure stems from training ...
- TeleMem: Building Long-Term and Multimodal Memory for Agentic AI : Abstract: Large language models (LLMs) excel at many NLP tasks but struggle to sustain long-term interactions due to limited attention over extended dialogue histories. Retrieval-augmented generation ...
- DB3 Team's Solution For Meta KDD Cup'25 : Abstract: This paper presents the db3 team's winning solution for the Meta CRAG-MM Challenge 2025 at KDD Cup'25. Addressing the challenge's unique multi-modal, multi-turn question answering benchmark ...
- Berezinskii–Kosterlitz–Thouless transition in a context-sensitive random language model : Abstract: Several power-law critical properties involving different statistics in natural languages -- reminiscent of scaling properties of physical systems at or near phase transitions -- have been d...
- Memory-Efficient Training for Text-Dependent SV with Independent Pre-trained Models : Abstract: This paper presents our submission to the Iranian division of the Text-Dependent Speaker Verification Challenge (TdSV) 2024. Conventional TdSV approaches typically jointly model speaker and ...
- Point processes with event time uncertainty : Abstract: Point processes are widely used statistical models for continuous-time discrete event data, such as medical records, crime reports, and social network interactions, to capture the influence ...
- Low-Dimensional Federated Knowledge Graph Embedding via Knowledge Distillation : Abstract: Federated Knowledge Graph Embedding (FKGE) aims to facilitate collaborative learning of entity and relation embeddings from distributed Knowledge Graphs (KGs) across multiple clients, while ...
- Hierarchic Flows to Estimate and Sample High-dimensional Probabilities : Abstract: Finding low-dimensional interpretable models of complex physical fields such as turbulence remains an open question, 80 years after the pioneer work of Kolmogorov. Estimating high-dimensiona...
- Low-Rank Online Dynamic Assortment with Dual Contextual Information : Abstract: As e-commerce expands, delivering real-time personalized recommendations from vast catalogs poses a critical challenge for retail platforms. Maximizing revenue requires careful consideration...
- Reimagining Anomalies: What If Anomalies Were Normal? : Abstract: Deep learning-based methods have achieved a breakthrough in image anomaly detection, but their complexity introduces a considerable challenge to understanding why an instance is predicted to...
- Learning Operators with Stochastic Gradient Descent in General Hilbert Spaces : Abstract: This study investigates leveraging stochastic gradient descent (SGD) to learn operators between general Hilbert spaces. We propose weak and strong regularity conditions for the target operat...
- A Convex Framework for Confounding Robust Inference : Abstract: We study policy evaluation of offline contextual bandits subject to unobserved confounders. Sensitivity analysis methods are commonly used to estimate the policy value under the worst-case c...
- The Interpolating Information Criterion for Overparameterized Models : Abstract: The problem of model selection is considered for the setting of interpolating estimators, where the number of model parameters exceeds the size of the dataset. Classical information criteria...
- Generative Modeling via Hierarchical Tensor Sketching : Abstract: We propose a hierarchical tensor-network approach for approximating high-dimensional probability density via empirical distribution. This leverages randomized singular value decomposition (S...
- Approximating Persistent Homology for Large Datasets : Abstract: Persistent homology is an important methodology in topological data analysis which adapts theory from algebraic topology to data settings. Computing persistent homology produces persistence ...
- Accumulation of Sub-Sampling Matrices with Applications to Statistical Computation : Abstract: With appropriately chosen sampling probabilities, sampling-based random projection can be used to implement large-scale statistical methods, substantially reducing computational cost while m...
- Integrated Multivariate Segmentation Tree for Heterogeneous Credit Data Analysis in Small- and Medium-Sized Enterprises : Abstract: Traditional decision tree models, which rely exclusively on numerical variables, often face challenges in handling high-dimensional data and are limited in their ability to incorporate textu...
- Canopy: Property-Driven Learning for Congestion Control : Abstract: Learning-based congestion controllers offer better adaptability compared to traditional heuristics. However, the unreliability of learning techniques can cause learning-based controllers to ...
- $\texttt{skwdro}$: a library for Wasserstein distributionally robust machine learning : Abstract: We present skwdro, a Python library for training robust machine learning models. The library is based on distributionally robust optimization using Wasserstein distances, popular in optimal ...
- EMP: Enhance Memory in Data Pruning : Abstract: Recently, large language and vision models have shown strong performance, but due to high pre-training and fine-tuning costs, research has shifted towards faster training via dataset pruning...
- Finite-Time Analysis of Simultaneous Double Q-learning : Abstract: $Q$-learning is one of the most fundamental reinforcement learning (RL) algorithms. Despite its widespread success in various applications, it is prone to overestimation bias in the $Q$-lear...
- Multiple-policy Evaluation via Density Estimation : Abstract: We study the multiple-policy evaluation problem where we are given a set of $K$ policies and the goal is to evaluate their performance (expected total reward over a fixed horizon) to an accu...
- A Complete Decomposition of Stochastic Differential Equations : Abstract: We show that any stochastic differential equation with prescribed time-dependent marginal distributions admits a decomposition into three components: a unique scalar field governing marginal...
- Failure-Aware RL: Reliable Offline-to-Online Reinforcement Learning with Self-Recovery for Real-World Manipulation : Abstract: Post-training algorithms based on deep reinforcement learning can push the limits of robotic models for specific objectives, such as generalizability, accuracy, and robustness. However, Inte...
- The Confidence Trap: Gender Bias and Predictive Certainty in LLMs : Abstract: The increased use of Large Language Models (LLMs) in sensitive domains leads to growing interest in how their confidence scores correspond to fairness and bias. This study examines the align...
- Learning to bin: differentiable and Bayesian optimization for multi-dimensional discriminants in high-energy physics : Abstract: Categorizing events using discriminant observables is central to many high-energy physics analyses. Yet, bin boundaries are often chosen by hand. A simple, popular choice is to apply argmax ...
- Riesz Representer Fitting under Bregman Divergence: A Unified Framework for Debiased Machine Learning : Abstract: Estimating the Riesz representer is a central problem in debiased machine learning for causal and structural parameter estimation. Various methods for Riesz representer estimation have been ...
- PFT: Phonon Fine-tuning for Machine Learned Interatomic Potentials : Abstract: Many materials properties depend on higher-order derivatives of the potential energy surface, yet machine learned interatomic potentials (MLIPs) trained with a standard loss on ener...
- Backward Reconstruction of the Chafee--Infante Equation via Physics-Informed WGAN-GP : Abstract: We present a physics-informed Wasserstein GAN with gradient penalty (WGAN-GP) for solving the inverse Chafee--Infante problem on two-dimensional domains with Dirichlet boundary conditions. T...
- Hidden Monotonicity: Explaining Deep Neural Networks via their DC Decomposition : Abstract: It has been demonstrated in various contexts that monotonicity leads to better explainability in neural networks. However, not every function can be well approximated by a monotone neural ne...
- A Framework for Feature Discovery in Intracranial Pressure Monitoring Data Using Neural Network Attention : Abstract: We present a novel framework for analyzing intracranial pressure monitoring data by applying interpretability principles. Intracranial pressure monitoring data was collected from 60 patients...
- Physics-Informed Singular-Value Learning for Cross-Covariances Forecasting in Financial Markets : Abstract: A new wave of work on covariance cleaning and nonlinear shrinkage has delivered asymptotically optimal analytical solutions for large covariance matrices. Building on this progress, these id...
- Self-Creating Random Walks for Decentralized Learning under Pac-Man Attacks : Abstract: Random walk (RW)-based algorithms have long been popular in distributed systems due to low overheads and scalability, with recent growing applications in decentralized learning. However, the...
- Adaptive Layer Selection for Layer-Wise Token Pruning in LLM Inference : Abstract: Due to the prevalence of large language models (LLMs), key-value (KV) cache reduction for LLM inference has received remarkable attention. Among numerous works that have been proposed in rec...
- Learning to accelerate Krasnosel'skii-Mann fixed-point iterations with guarantees : Abstract: We introduce a principled learning to optimize (L2O) framework for solving fixed-point problems involving general nonexpansive mappings. Our idea is to deliberately inject summable perturbat...
- Active Evaluation of General Agents: Problem Definition and Comparison of Baseline Algorithms : Abstract: As intelligent agents become more generally-capable, i.e. able to master a wide variety of tasks, the complexity and cost of properly evaluating them rises significantly. Tasks that assess s...
- Studying the Role of Synthetic Data for Machine Learning-based Wireless Networks Traffic Forecasting : Abstract: Synthetic data generation is an appealing tool for augmenting and enriching datasets, playing a crucial role in advancing artificial intelligence (AI) and machine learning (ML). Not only doe...
- Dual-Level Models for Physics-Informed Multi-Step Time Series Forecasting : Abstract: This paper develops an approach for multi-step forecasting of dynamical systems by integrating probabilistic input forecasting with physics-informed output prediction. Accurate multi-step fo...
- Reinforcement Learning for Micro-Level Claims Reserving : Abstract: Outstanding claim liabilities are revised repeatedly as claims develop, yet most modern reserving models are trained as one-shot predictors and typically learn only from settled claims. We f...
- Learning About Learning: A Physics Path from Spin Glasses to Artificial Intelligence : Abstract: The Hopfield model, originally inspired by spin-glass physics, occupies a central place at the intersection of statistical mechanics, neural networks, and modern artificial intelligence. Des...
- GRPO with State Mutations: Improving LLM-Based Hardware Test Plan Generation : Abstract: RTL design often relies heavily on ad-hoc testbench creation early in the design cycle. While large language models (LLMs) show promise for RTL code generation, their ability to reason about...
- Temporal-Aligned Meta-Learning for Risk Management: A Stacking Approach for Multi-Source Credit Scoring : Abstract: This paper presents a meta-learning framework for credit risk assessment of Italian Small and Medium Enterprises (SMEs) that explicitly addresses the temporal misalignment of credit scoring ...
- Machine learning nonequilibrium phase transitions in charge-density wave insulators : Abstract: Nonequilibrium electronic forces play a central role in voltage-driven phase transitions but are notoriously expensive to evaluate in dynamical simulations. Here we develop a machine learnin...
- Large Language Models for Physics Instrument Design : Abstract: We study the use of large language models (LLMs) for physics instrument design and compare their performance to reinforcement learning (RL). Using only prompting, LLMs are given task constra...
- An adjoint method for training data-driven reduced-order models : Abstract: Reduced-order modeling lies at the interface of numerical analysis and data-driven scientific computing, providing principled ways to compress high-fidelity simulations in science and engine...
- Nonparametric Kernel Clustering with Bandit Feedback : Abstract: Clustering with bandit feedback refers to the problem of partitioning a set of items, where the clustering algorithm can sequentially query the items to receive noisy observations. The probl...
- Controlling Multimodal Conversational Agents with Coverage-Enhanced Latent Actions : Abstract: Vision-language models are increasingly employed as multimodal conversational agents (MCAs) for diverse conversational tasks. Recently, reinforcement learning (RL) has been widely explored f...
- The Secretary Problem with Predictions and a Chosen Order : Abstract: We study a learning-augmented variant of the secretary problem, recently introduced by Fujii and Yoshida (2023), in which the decision-maker has access to machine-learned predictions of cand...
- Puzzle it Out: Local-to-Global World Model for Offline Multi-Agent Reinforcement Learning : Abstract: Offline multi-agent reinforcement learning (MARL) aims to solve cooperative decision-making problems in multi-agent systems using pre-collected datasets. Existing offline MARL methods primar...
- Improving Video Question Answering through query-based frame selection : Abstract: Video Question Answering (VideoQA) models enhance understanding and interaction with audiovisual content, making it more accessible, searchable, and useful for a wide range of fields such as...
- PIDT: Physics-Informed Digital Twin for Optical Fiber Parameter Estimation : Abstract: We propose physics-informed digital twin (PIDT): a fiber parameter estimation approach that combines a parameterized split-step method with a physics-informed loss. PIDT improves accuracy an...
- Position: Don't be Afraid of Over-Smoothing And Over-Squashing : Abstract: Over-smoothing and over-squashing have been extensively studied in the literature on Graph Neural Networks (GNNs) over the past years. We challenge this prevailing focus in GNN research, arg...
- Outcome-Grounded Advantage Reshaping for Fine-Grained Credit Assignment in Mathematical Reasoning : Abstract: Group Relative Policy Optimization (GRPO) has emerged as a promising critic-free reinforcement learning paradigm for reasoning tasks. However, standard GRPO employs a coarse-grained credit a...
- SEE: Signal Embedding Energy for Quantifying Noise Interference in Large Audio Language Models : Abstract: Large Audio Language Models (LALMs) have been widely applied in real-time scenarios, such as in-car assistants and online meeting comprehension. In practice, audio inputs are often corrupted...
- Convergence Rate Analysis of the AdamW-Style Shampoo: Unifying One-sided and Two-Sided Preconditioning : Abstract: This paper studies the AdamW-style Shampoo optimizer, an effective implementation of classical Shampoo that notably won the external tuning track of the AlgoPerf neural network training algo...
- Variational Approximations for Robust Bayesian Inference via Rho-Posteriors : Abstract: The $ρ$-posterior framework provides universal Bayesian estimation with explicit contamination rates and optimal convergence guarantees, but has remained computationally difficult due to an ...
- ARM: Role-Conditioned Neuron Transplantation for Training-Free Generalist LLM Agent Merging : Abstract: Interactive large language model agents have advanced rapidly, but most remain specialized to a single environment and fail to adapt robustly to other environments. Model merging offers a tr...
- Covariance-Driven Regression Trees: Reducing Overfitting in CART : Abstract: Decision trees are powerful machine learning algorithms, widely used in fields such as economics and medicine for their simplicity and interpretability. However, decision trees such as CART ...
- A High-Recall Cost-Sensitive Machine Learning Framework for Real-Time Online Banking Transaction Fraud Detection : Abstract: Fraudulent activities on digital banking services are becoming more intricate by the day, challenging existing defenses. While older rule driven methods struggle to keep pace, even precision...
- Multi-environment Invariance Learning with Missing Data : Abstract: Learning models that can handle distribution shifts is a key challenge in domain generalization. Invariance learning, an approach that focuses on identifying features invariant across enviro...
- Learning to Trust the Crowd: A Multi-Model Consensus Reasoning Engine for Large Language Models : Abstract: Large language models (LLMs) achieve strong average performance yet remain unreliable at the instance level, with frequent hallucinations, brittle failures, and poorly calibrated confidenc...
- Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration : Abstract: While Hybrid Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) has become the standard paradigm for training LLM agents, effective mechanisms for data allocation between t...
- On Lie Groups Preserving Subspaces of Degenerate Clifford Algebras : Abstract: This paper introduces Lie groups in degenerate geometric (Clifford) algebras that preserve four fundamental subspaces determined by the grade involution and reversion under the adjoint and t...
- AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units : Abstract: To meet the ever-increasing demand for computational efficiency, Neural Processing Units (NPUs) have become critical in modern AI infrastructure. However, unlocking their full potential requ...
- Optimal Transport under Group Fairness Constraints : Abstract: Ensuring fairness in matching algorithms is a key challenge in allocating scarce resources and positions. Focusing on Optimal Transport (OT), we introduce a novel notion of group fairness re...
- Proof of Reasoning for Privacy Enhanced Federated Blockchain Learning at the Edge : Abstract: Consensus mechanisms are the core of any blockchain system. However, the majority of these mechanisms do not target federated learning directly nor do they aid in the aggregation step. This ...
- Enhancing Cloud Network Resilience via a Robust LLM-Empowered Multi-Agent Reinforcement Learning Framework : Abstract: While virtualization and resource pooling empower cloud networks with structural flexibility and elastic scalability, they inevitably expand the attack surface and challenge cyber resilience...
- Robust Bayesian Optimization via Tempered Posteriors : Abstract: Bayesian optimization (BO) iteratively fits a Gaussian process (GP) surrogate to accumulated evaluations and selects new queries via an acquisition function such as expected improvement (EI)...
- XBTorch: A Unified Framework for Modeling and Co-Design of Crossbar-Based Deep Learning Accelerators : Abstract: Emerging memory technologies have gained significant attention as a promising pathway to overcome the limitations of conventional computing architectures in deep learning applications. By en...
- Robust Mean Estimation under Quantization : Abstract: We consider the problem of mean estimation under quantization and adversarial corruption. We construct multivariate robust estimators that are optimal up to logarithmic factors in two differ...
- Local EGOP for Continuous Index Learning : Abstract: We introduce the setting of continuous index learning, in which a function of many variables varies only along a small number of directions at each point. For efficient estimation, it is ben...
- Fine-Tuning vs. RAG for Multi-Hop Question Answering with Novel Knowledge : Abstract: Multi-hop question answering is widely used to evaluate the reasoning capabilities of large language models (LLMs), as it requires integrating multiple pieces of supporting knowledge to arri...
- Mid-Think: Training-Free Intermediate-Budget Reasoning via Token-Level Triggers : Abstract: Hybrid reasoning language models are commonly controlled through high-level Think/No-think instructions to regulate reasoning behavior, yet we found that such mode switching is largely drive...
- Conditional Normalizing Flows for Forward and Backward Joint State and Parameter Estimation : Abstract: Traditional filtering algorithms for state estimation -- such as classical Kalman filtering, unscented Kalman filtering, and particle filters -- show performance degradation when applied to n...
- Unity Forests: Improving Interaction Modelling and Interpretability in Random Forests : Abstract: Random forests (RFs) are widely used for prediction and variable importance analysis and are often believed to capture any type of interaction via recursive splitting. However, since the s...
- Match Made with Matrix Completion: Efficient Learning under Matching Interference : Abstract: Matching markets face increasing needs to learn the matching qualities between demand and supply for effective design of matching policies. In practice, the matching rewards are high-dimensi...
- Generalization Bounds for Transformer Channel Decoders : Abstract: Transformer channel decoders, such as the Error Correction Code Transformer (ECCT), have shown strong empirical performance in channel decoding, yet their generalization behavior remains the...
- The Impact of Anisotropic Covariance Structure on the Training Dynamics and Generalization Error of Linear Networks : Abstract: The success of deep neural networks largely depends on the statistical structure of the training data. While learning dynamics and generalization on isotropic data are well-established, the ...
- X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests : Abstract: Competitive programming presents great challenges for Code LLMs due to its intensive reasoning demands and high logical complexity. However, current Code LLMs still rely heavily on real-worl...
- mind_call: A Dataset for Mental Health Function Calling with Large Language Models : Abstract: Large Language Model (LLM)-based systems increasingly rely on function calling to enable structured and controllable interaction with external data sources, yet existing datasets do not addr...
- Distributional Clarity: The Hidden Driver of RL-Friendliness in Large Language Models : Abstract: Language model families exhibit striking disparity in their capacity to benefit from reinforcement learning: under identical training, models like Qwen achieve substantial gains, while other...
- Paraphrasing Adversarial Attack on LLM-as-a-Reviewer : Abstract: The use of large language models (LLMs) in peer review systems has attracted growing attention, making it essential to examine their potential vulnerabilities. Prior attacks rely on prompt i...
- Applying Embedding-Based Retrieval to Airbnb Search : Abstract: The goal of Airbnb search is to match guests with the ideal accommodation that fits their travel needs. This is a challenging problem, as popular search locations can have around a hundred t...
- qAttCNN - Self Attention Mechanism for Video QoE Prediction in Encrypted Traffic : Abstract: The rapid growth of multimedia consumption, driven by major advances in mobile devices since the mid-2000s, has led to widespread use of video conferencing applications (VCAs) such as Zoom a...
- Deep Learning Based Channel Extrapolation for Dual-Band Massive MIMO Systems : Abstract: Future wireless communication systems will increasingly rely on the integration of millimeter wave (mmWave) and sub-6 GHz bands to meet heterogeneous demands on high-speed data transmission ...
- †DAGGER: Distractor-Aware Graph Generation for Executable Reasoning in Math Problems : Abstract: Chain-of-Thought (CoT) prompting is widely adopted for mathematical problem solving, including in low-resource languages, yet its behavior under irrelevant context remains underexplored. To ...
- Constrained Density Estimation via Optimal Transport : Abstract: A novel framework for density estimation under expectation constraints is proposed. The framework minimizes the Wasserstein distance between the estimated density and a prior, subject to the...
- Thinking with Deltas: Incentivizing Reinforcement Learning via Differential Visual Reasoning Policy : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has significantly advanced reasoning capabilities in Large Language Models. However, adapting RLVR to multimodal domains suffers from a ...
- CliffordNet: All You Need is Geometric Algebra : Abstract: Modern computer vision architectures, from CNNs to Transformers, predominantly rely on the stacking of heuristic modules: spatial mixers (Attention/Conv) followed by channel mixers (FFNs). I...
- Dimension-reduced outcome-weighted learning for estimating individualized treatment regimes in observational studies : Abstract: Individualized treatment regimes (ITRs) aim to improve clinical outcomes by assigning treatment based on patient-specific characteristics. However, existing methods often struggle with high-...
- ALFA: A Safe-by-Design Approach to Mitigate Quishing Attacks Launched via Fancy QR Codes : Abstract: Phishing with Quick Response (QR) codes is termed Quishing. The attackers exploit this method to manipulate individuals into revealing their confidential data. Recently, we see the colorf...
- GanitLLM: Difficulty-Aware Bengali Mathematical Reasoning through Curriculum-GRPO : Abstract: We present a Bengali mathematical reasoning model called GanitLLM (named after the Bangla word for mathematics, "Ganit"), together with a new difficulty-aware Bengali math corpus and a curri...
- Comparative Separation: Evaluating Separation on Comparative Judgment Test Data : Abstract: This research seeks to benefit the software engineering society by proposing comparative separation, a novel group fairness notion to evaluate the fairness of machine learning software on co...
- A Backpropagation-Free Feedback-Hebbian Network for Continual Learning Dynamics : Abstract: Feedback-rich neural architectures can regenerate earlier representations and inject temporal context, making them a natural setting for strictly local synaptic plasticity. We ask whether a ...
- Logic-Driven Semantic Communication for Resilient Multi-Agent Systems : Abstract: The advent of 6G networks is accelerating autonomy and intelligence in large-scale, decentralized multi-agent systems (MAS). While this evolution enables adaptive behavior, it also heightens...
- DS-CIM: Digital Stochastic Computing-In-Memory Featuring Accurate OR-Accumulation via Sample Region Remapping for Edge AI Models : Abstract: Stochastic computing (SC) offers hardware simplicity but suffers from low throughput, while high-throughput Digital Computing-in-Memory (DCIM) is bottlenecked by costly adder logic for matri...
- Diffusion Models with Heavy-Tailed Targets: Score Estimation and Sampling Guarantees : Abstract: Score-based diffusion models have become a powerful framework for generative modeling, with score estimation as a central statistical bottleneck. Existing guarantees for score estimation lar...
- A Multimodal Deep Learning Framework for Predicting ICU Deterioration: Integrating ECG Waveforms with Clinical Data and Clinician Benchmarking : Abstract: Artificial intelligence holds strong potential to support clinical decision making in intensive care units where timely and accurate risk assessment is critical. However, many existing model...
- Lower Bounds for the Algorithmic Complexity of Learned Indexes : Abstract: Learned index structures aim to accelerate queries by training machine learning models to approximate the rank function associated with a database attribute. While effective in practice, the...
- Cross-Border Data Security and Privacy Risks in Large Language Models and IoT Systems : Abstract: The reliance of Large Language Models and Internet of Things systems on massive, globally distributed data flows creates systemic security and privacy challenges. When data traverses borders...
- Pragya: An AI-Based Semantic Recommendation System for Sanskrit Subhasitas : Abstract: Sanskrit Subhasitas encapsulate centuries of cultural and philosophical wisdom, yet remain underutilized in the digital age due to linguistic and contextual barriers. In this work, we presen...
- Object-Centric World Models Meet Monte Carlo Tree Search : Abstract: In this paper, we introduce ObjectZero, a novel reinforcement learning (RL) algorithm that leverages the power of object-level representations to model dynamic environments more effectively....
- UMLoc: Uncertainty-Aware Map-Constrained Inertial Localization with Quantified Bounds : Abstract: Inertial localization is particularly valuable in GPS-denied environments such as indoors. However, localization using only Inertial Measurement Units (IMUs) suffers from drift caused by mot...
- Detecting LLM-Generated Text with Performance Guarantees : Abstract: Large language models (LLMs) such as GPT, Claude, Gemini, and Grok have been deeply integrated into our daily life. They now support a wide range of tasks -- from dialogue and email drafting...
- SimLLM: Fine-Tuning Code LLMs for SimPy-Based Queueing System Simulation : Abstract: The Python package SimPy is widely used for modeling queueing systems due to its flexibility, simplicity, and smooth integration with modern data analysis and optimization frameworks. Recent...
- Pareto-Optimal Model Selection for Low-Cost, Single-Lead EMG Control in Embedded Systems : Abstract: Consumer-grade biosensors offer a cost-effective alternative to medical-grade electromyography (EMG) systems, reducing hardware costs from thousands of dollars to approximately $13. However,...
- Inference-Time Alignment for Diffusion Models via Doob's Matching : Abstract: Inference-time alignment for diffusion models aims to adapt a pre-trained diffusion model toward a target distribution without retraining the base score network, thereby preserving the gener...
- Hybrid LSTM-UKF Framework: Ankle Angle and Ground Reaction Force Estimation : Abstract: Accurate prediction of joint kinematics and kinetics is essential for advancing gait analysis and developing intelligent assistive systems such as prosthetics and exoskeletons. This study pr...
- PRISP: Privacy-Safe Few-Shot Personalization via Lightweight Adaptation : Abstract: Large language model (LLM) personalization aims to adapt general-purpose models to individual users. Most existing methods, however, are developed under data-rich and resource-abundant setti...
- Physics-informed Gaussian Process Regression in Solving Eigenvalue Problem of Linear Operators : Abstract: Applying Physics-Informed Gaussian Process Regression to the eigenvalue problem $(\mathcal{L}-λ)u = 0$ poses a fundamental challenge, where the null source term results in a trivial predicti...
- PixRec: Leveraging Visual Context for Next-Item Prediction in Sequential Recommendation : Abstract: Large Language Models (LLMs) have recently shown strong potential for usage in sequential recommendation tasks through text-only models, which combine advanced prompt design, contrastive ali...
- On a Gradient Approach to Chebyshev Center Problems with Applications to Function Learning : Abstract: We introduce $\textsf{gradOL}$, the first gradient-based optimization framework for solving Chebyshev center problems, a fundamental challenge in optimal function learning and geometric opti...
- Continual Quantum Architecture Search with Tensor-Train Encoding: Theory and Applications to Signal Processing : Abstract: We introduce CL-QAS, a continual quantum architecture search framework that mitigates the challenges of costly amplitude encoding and catastrophic forgetting in variational quantum circuits....
- Supervised and Unsupervised Neural Network Solver for First Order Hyperbolic Nonlinear PDEs : Abstract: We present a neural network-based method for learning scalar hyperbolic conservation laws. Our method replaces the traditional numerical flux in finite volume schemes with a trainable neural...
- Computational Mapping of Reactive Stroma in Prostate Cancer Yields Interpretable, Prognostic Biomarkers : Abstract: Current histopathological grading of prostate cancer relies primarily on glandular architecture, largely overlooking the tumor microenvironment. Here, we present PROTAS, a deep learning fram...
- Circuit Mechanisms for Spatial Relation Generation in Diffusion Transformers : Abstract: Diffusion Transformers (DiTs) have greatly advanced text-to-image generation, but models still struggle to generate the correct spatial relations between objects as specified in the text pro...
- $\texttt{AMEND++}$: Benchmarking Eligibility Criteria Amendments in Clinical Trials : Abstract: Clinical trial amendments frequently introduce delays, increased costs, and administrative burden, with eligibility criteria being the most commonly amended component. We introduce \textit{e...
- How well can off-the-shelf LLMs elucidate molecular structures from mass spectra using chain-of-thought reasoning? : Abstract: Mass spectrometry (MS) is a powerful analytical technique for identifying small molecules, yet determining complete molecular structures directly from tandem mass spectra (MS/MS) remains a l...
- Hard Constraint Projection in a Physics Informed Neural Network : Abstract: In this work, we embed hard constraints in a physics informed neural network (PINN) which predicts solutions to the 2D incompressible Navier Stokes equations. We extend the hard constraint m...
- Cyber Threat Detection and Vulnerability Assessment System using Generative AI and Large Language Model : Abstract: Background: Cyber-attacks have evolved rapidly in recent years, and many individuals and business owners have been affected by cyber-attacks in various ways. Cyber-attacks include various threat...
- Towards Public Administration Research Based on Interpretable Machine Learning : Abstract: Causal relationships play a pivotal role in research within the field of public administration. Ensuring reliable causal inference requires validating the predictability of these relationshi...
- Rational Synthesizers or Heuristic Followers? Analyzing LLMs in RAG-based Question-Answering : Abstract: Retrieval-Augmented Generation (RAG) is the prevailing paradigm for grounding Large Language Models (LLMs), yet the mechanisms governing how models integrate groups of conflicting retrieved ...
- Neuro-Symbolic Compliance: Integrating LLMs and SMT Solvers for Automated Financial Legal Analysis : Abstract: Financial regulations are increasingly complex, hindering automated compliance, especially the maintenance of logical consistency with minimal human oversight. We introduce a Neuro-Symbolic C...
- Performance of models for monitoring sustainable development goals from remote sensing: A three-level meta-regression : Abstract: Machine learning (ML) is a tool to exploit remote sensing data for the monitoring and implementation of the United Nations' Sustainable Development Goals (SDGs). In this paper, we report on ...
- Forget-It-All: Multi-Concept Machine Unlearning via Concept-Aware Neuron Masking : Abstract: The widespread adoption of text-to-image (T2I) diffusion models has raised concerns about their potential to generate copyrighted, inappropriate, or sensitive imagery learned from massive tr...
- Is Sanskrit the most token-efficient language? A quantitative study using GPT, Gemini, and SentencePiece : Abstract: Tokens are the basic units of Large Language Models (LLMs). LLMs rely on tokenizers to segment text into these tokens, and tokenization is the primary determinant of computational and infere...
- COVR: Collaborative Optimization of VLMs and RL Agent for Visual-Based Control : Abstract: Visual reinforcement learning (RL) suffers from poor sample efficiency due to high-dimensional observations in complex tasks. While existing works have shown that vision-language models (VLM...
- CBMAS: Cognitive Behavioral Modeling via Activation Steering : Abstract: Large language models (LLMs) often encode cognitive behaviors unpredictably across prompts, layers, and contexts, making them difficult to diagnose and control. We present CBMAS, a diagnosti...
- Dynamic Intelligence Ceilings: Measuring Long-Horizon Limits of Planning and Creativity in Artificial Systems : Abstract: Recent advances in artificial intelligence have produced systems capable of remarkable performance across a wide range of tasks. These gains, however, are increasingly accompanied by concern...
- PriceSeer: Evaluating Large Language Models in Real-Time Stock Prediction : Abstract: Stock prediction, a subject closely related to people's investment activities in fully dynamic and live environments, has been widely studied. Current large language models (LLMs) have shown...
- One if by Land, Two if by Sea, Three if by Four Seas, and More to Come -- Values of Perception, Prediction, Communication, and Common Sense in Decision Making : Abstract: This work aims to rigorously define the values of perception, prediction, communication, and common sense in decision making. The defined quantities are decision-theoretic, but have informat...
- Reinforcement Learning for Chain of Thought Compression with One-Domain-to-All Generalization : Abstract: Chain-of-thought reasoning in large language models often creates an "overthinking trap," leading to excessive computational cost and latency for unreliable accuracy gains. Prior work has ty...
- Leveraging Foundation Models for Calibration-Free c-VEP BCIs : Abstract: Foundation Models (FMs) have surged in popularity over the past five years, with applications spanning fields from computer vision to natural language processing. Brain-Computer Interfaces (...
- Optimal Learning Rate Schedule for Balancing Effort and Performance : Abstract: Learning how to learn efficiently is a fundamental challenge for biological agents and a growing concern for artificial ones. To learn effectively, an agent must regulate its learning speed,...
- DT-ICU: Towards Explainable Digital Twins for ICU Patient Monitoring via Multi-Modal and Multi-Task Iterative Inference : Abstract: We introduce DT-ICU, a multimodal digital twin framework for continuous risk estimation in intensive care. DT-ICU integrates variable-length clinical time series with static patient informat...
- Are LLM Decisions Faithful to Verbal Confidence? : Abstract: Large Language Models (LLMs) can produce surprisingly sophisticated estimates of their own uncertainty. However, it remains unclear to what extent this expressed confidence is tied to the re...
- Free-RBF-KAN: Kolmogorov-Arnold Networks with Adaptive Radial Basis Functions for Efficient Function Learning : Abstract: Kolmogorov-Arnold Networks (KANs) have shown strong potential for efficiently approximating complex nonlinear functions. However, the original KAN formulation relies on B-spline basis functi...
- Improving Domain Generalization in Contrastive Learning using Adaptive Temperature Control : Abstract: Self-supervised pre-training with contrastive learning is a powerful method for learning from sparsely labeled data. However, performance can drop considerably when there is a shift in the d...
- Tab-TRM: Tiny Recursive Model for Insurance Pricing on Tabular Data : Abstract: We introduce Tab-TRM (Tabular-Tiny Recursive Model), a network architecture that adapts the recursive latent reasoning paradigm of Tiny Recursive Models (TRMs) to insurance modeling. Drawing...
- Beyond Sharpness: A Flatness Decomposition Framework for Efficient Continual Learning : Abstract: Continual Learning (CL) aims to enable models to sequentially learn multiple tasks without forgetting previous knowledge. Recent studies have shown that optimizing towards flatter loss minim...
- Neural Architecture for Fast and Reliable Coagulation Assessment in Clinical Settings: Leveraging Thromboelastography : Abstract: In an ideal medical environment, real-time coagulation monitoring can enable early detection and prompt remediation of risks. However, traditional Thromboelastography (TEG), a widely employe...
- d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation : Abstract: Diffusion large language models (dLLMs) offer capabilities beyond those of autoregressive (AR) LLMs, such as parallel decoding and random-order generation. However, realizing these benefits ...
- TFEC: Multivariate Time-Series Clustering via Temporal-Frequency Enhanced Contrastive Learning : Abstract: Multivariate Time-Series (MTS) clustering is crucial for signal processing and data analysis. Although deep learning approaches, particularly those leveraging Contrastive Learning (CL), are ...
- Contextual Discrepancy-Aware Contrastive Learning for Robust Medical Time Series Diagnosis in Small-Sample Scenarios : Abstract: Medical time series data, such as EEG and ECG, are vital for diagnosing neurological and cardiovascular diseases. However, their precise interpretation faces significant challenges due to hi...
- Near-Optimal Private Linear Regression via Iterative Hessian Mixing : Abstract: We study differentially private ordinary least squares (DP-OLS) with bounded data. The dominant approach, adaptive sufficient-statistics perturbation (AdaSSP), adds an adaptively chosen pert...
- Stagewise Reinforcement Learning and the Geometry of the Regret Landscape : Abstract: Singular learning theory characterizes Bayesian learning as an evolving tradeoff between accuracy and complexity, with transitions between qualitatively different solutions as sample size in...
- Land-then-transport: A Flow Matching-Based Generative Decoder for Wireless Image Transmission : Abstract: Due to strict rate and reliability demands, wireless image transmission remains difficult for both classical layered designs and joint source-channel coding (JSCC), especially under low late...
- FROAV: A Framework for RAG Observation and Agent Verification - Lowering the Barrier to LLM Agent Research : Abstract: The rapid advancement of Large Language Models (LLMs) and their integration into autonomous agent systems has created unprecedented opportunities for document analysis, decision support, and...
- Graph Inference Towards ICD Coding : Abstract: Automated ICD coding involves assigning standardized diagnostic codes to clinical narratives. The vast label space and extreme class imbalance continue to challenge precise prediction. To ad...
- ARCQuant: Boosting NVFP4 Quantization with Augmented Residual Channels for LLMs : Abstract: The emergence of fine-grained numerical formats like NVFP4 presents new opportunities for efficient Large Language Model (LLM) inference. However, it is difficult to adapt existing Post-Trai...
- Task Prototype-Based Knowledge Retrieval for Multi-Task Learning from Partially Annotated Data : Abstract: Multi-task learning (MTL) is critical in real-world applications such as autonomous driving and robotics, enabling simultaneous handling of diverse tasks. However, obtaining fully annotated ...
- AntiPaSTO: Self-Supervised Steering of Moral Reasoning : Abstract: As models grow more capable, human supervision breaks down: labels don't scale, outputs can be gamed, and training doesn't generalize. Scalable oversight requires steering methods that are i...
- Surrogate-based Optimization via Clustering for Box-Constrained Problems : Abstract: Global optimization of large-scale, complex systems such as multi-physics black-box simulations and real-world industrial systems is important but challenging. This work presents a novel Sur...
- Variational Autoencoder with Normalizing flow for X-ray spectral fitting : Abstract: Black hole X-ray binaries (BHBs) can be studied with spectral fitting to provide physical constraints on accretion in extreme gravitational environments. Traditional methods of spectral fitt...
- PLANET v2.0: A comprehensive Protein-Ligand Affinity Prediction Model Based on Mixture Density Network : Abstract: Drug discovery represents a time-consuming and financially intensive process, and virtual screening can accelerate it. Scoring functions, as one of the tools guiding virtual screening, have ...
- The Practicality of Normalizing Flow Test-Time Training in Bayesian Inference for Agent-Based Models : Abstract: Agent-Based Models (ABMs) are gaining great popularity in economics and social science because of their strong flexibility to describe the realistic and heterogeneous decisions and interacti...
- SCALPEL: Selective Capability Ablation via Low-rank Parameter Editing for Large Language Model Interpretability Analysis : Abstract: Large language models excel across diverse domains, yet their deployment in healthcare, legal systems, and autonomous decision-making remains limited by incomplete understanding of their int...
- OceanSAR-2: A Universal Feature Extractor for SAR Ocean Observation : Abstract: We present OceanSAR-2, the second generation of our foundation model for SAR-based ocean observation. Building on our earlier release, which pioneered self-supervised learning on Sentinel-1 ...
- On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training : Abstract: Post-training of large language models routinely interleaves supervised fine-tuning (SFT) with reinforcement learning (RL). These two methods have different objectives: SFT minimizes the cro...
- Computing patient similarity based on unstructured clinical notes : Abstract: Clinical notes hold rich yet unstructured details about diagnoses, treatments, and outcomes that are vital to precision medicine but hard to exploit at scale. We introduce a method that repr...
- CompNO: A Novel Foundation Model approach for solving Partial Differential Equations : Abstract: Partial differential equations (PDEs) govern a wide range of physical phenomena, but their numerical solution remains computationally demanding, especially when repeated simulations are requ...
- Segmental Advantage Estimation: Enhancing PPO for Long-Context LLM Training : Abstract: Training Large Language Models (LLMs) for reasoning tasks is increasingly driven by Reinforcement Learning with Verifiable Rewards (RLVR), where Proximal Policy Optimization (PPO) provides a...
- BEAT-Net: Injecting Biomimetic Spatio-Temporal Priors for Interpretable ECG Classification : Abstract: Although deep learning has advanced automated electrocardiogram (ECG) diagnosis, prevalent supervised methods typically treat recordings as undifferentiated one-dimensional (1D) signals or t...
- Explaining Machine Learning Predictive Models through Conditional Expectation Methods : Abstract: The rapid adoption of complex Artificial Intelligence (AI) and Machine Learning (ML) models has led to their characterization as black boxes due to the difficulty of explaining their interna...
- Kernel Alignment-based Multi-view Unsupervised Feature Selection with Sample-level Adaptive Graph Learning : Abstract: Although multi-view unsupervised feature selection (MUFS) has demonstrated success in dimensionality reduction for unlabeled multi-view data, most existing methods reduce feature redundancy ...
- Pseudodata-guided Invariant Representation Learning Boosts the Out-of-Distribution Generalization in Enzymatic Kinetic Parameter Prediction : Abstract: Accurate prediction of enzyme kinetic parameters is essential for understanding catalytic mechanisms and guiding enzyme engineering. However, existing deep learning-based enzyme-substrate int...
- Simulated Annealing-based Candidate Optimization for Batch Acquisition Functions : Abstract: Bayesian Optimization with multi-objective acquisition functions such as q-Expected Hypervolume Improvement (qEHVI) requires efficient candidate optimization to maximize acquisition function...
- Innovation Capacity of Dynamical Learning Systems : Abstract: In noisy physical reservoirs, the classical information-processing capacity $C_{\mathrm{ip}}$ quantifies how well a linear readout can realize tasks measurable from the input history, yet $C...
- DDT: A Dual-Masking Dual-Expert Transformer for Energy Time-Series Forecasting : Abstract: Accurate energy time-series forecasting is crucial for ensuring grid stability and promoting the integration of renewable energy, yet it faces significant challenges from complex temporal de...
- MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization : Abstract: Group-Relative Policy Optimization (GRPO) has emerged as an efficient paradigm for aligning Large Language Models (LLMs), yet its efficacy is primarily confined to domains with verifiable gr...
- CalPro: Prior-Aware Evidential--Conformal Prediction with Structure-Aware Guarantees for Protein Structures : Abstract: Deep protein structure predictors such as AlphaFold provide confidence estimates (e.g., pLDDT) that are often miscalibrated and degrade under distribution shifts across experimental modaliti...
- Safeguarding LLM Fine-tuning via Push-Pull Distributional Alignment : Abstract: The inherent safety alignment of Large Language Models (LLMs) is prone to erosion during fine-tuning, even when using seemingly innocuous datasets. While existing defenses attempt to mitigat...
- Forward versus Backward: Comparing Reasoning Objectives in Direct Preference Optimization : Abstract: Large language models exhibit impressive reasoning capabilities yet frequently generate plausible but incorrect solutions, a phenomenon commonly termed hallucination. This paper investigates...
- Beyond Variance: Knowledge-Aware LLM Compression via Fisher-Aligned Subspace Diagnostics : Abstract: Post-training activation compression is essential for deploying Large Language Models (LLMs) on resource-constrained hardware. However, standard methods like Singular Value Decomposition (SV...
- Standardization of Post-Publication Code Verification by Journals is Possible with the Support of the Community : Abstract: Reproducibility remains a challenge in machine learning research. While code and data availability requirements have become increasingly common, post-publication verification in journals is ...
- PRPO: Aligning Process Reward with Outcome Reward in Policy Optimization : Abstract: Policy optimization for large language models often suffers from sparse reward signals in multi-step reasoning tasks. Critic-free methods like GRPO assign a single normalized outcome reward ...
- Offline Meta-Reinforcement Learning with Flow-Based Task Inference and Adaptive Correction of Feature Overgeneralization : Abstract: Offline meta-reinforcement learning (OMRL) combines the strengths of learning from diverse datasets in offline RL with the adaptability to new tasks of meta-RL, promising safe and efficient ...
- Stable On-Policy Distillation through Adaptive Target Reformulation : Abstract: Knowledge distillation (KD) is a widely adopted technique for transferring knowledge from large language models to smaller student models; however, conventional supervised KD often suffers f...
- Generating readily synthesizable small molecule fluorophore scaffolds with reinforcement learning : Abstract: Developing new fluorophores for advanced imaging techniques requires exploring new chemical space. While generative AI approaches have shown promise in designing novel dye scaffolds, prior e...
- Towards Automated Diagnosis of Inherited Arrhythmias: Combined Arrhythmia Classification Using Lead-Aware Spatial Attention Networks : Abstract: Arrhythmogenic right ventricular cardiomyopathy (ARVC) and long QT syndrome (LQTS) are inherited arrhythmia syndromes associated with sudden cardiac death. Deep learning shows promise for EC...
- Reward-Preserving Attacks For Robust Reinforcement Learning : Abstract: Adversarial robustness in RL is difficult because perturbations affect entire trajectories: strong attacks can break learning, while weak attacks yield little robustness, and the appropriate...
- When Should We Introduce Safety Interventions During Pretraining? : Abstract: Ensuring the safety of language models in high-stakes settings remains a pressing challenge, as aligned behaviors are often brittle and easily undone by adversarial pressure or downstream fi...
- Hallucinations Live in Variance : Abstract: Benchmarks measure whether a model is correct. They do not measure whether a model is reliable. This distinction is largely academic for single-shot inference, but becomes critical for agent...
- Explainable Deep Radiogenomic Molecular Imaging for MGMT Methylation Prediction in Glioblastoma : Abstract: Glioblastoma (GBM) is a highly aggressive primary brain tumor with limited therapeutic options and poor prognosis. The methylation status of the O6-methylguanine-DNA methyltransferase (MGMT)...
- Tight Analysis of Decentralized SGD: A Markov Chain Perspective : Abstract: We propose a novel analysis of the Decentralized Stochastic Gradient Descent (DSGD) algorithm with constant step size, interpreting the iterates of the algorithm as a Markov chain. We show t...
- A Robust Certified Machine Unlearning Method Under Distribution Shift : Abstract: The Newton method has been widely adopted to achieve certified unlearning. A critical assumption in existing approaches is that the data requested for unlearning are selected i.i.d. (independ...
- HAS-VQ: Hessian-Adaptive Sparse Vector Quantization for High-Fidelity LLM Compression : Abstract: Post-training quantization is essential for deploying Large Language Models (LLMs) on resource-constrained devices. However, standard integer quantization (e.g., INT4) fundamentally degrade...
- Towards Operational Streamflow Forecasting in the Limpopo River Basin using Long Short-Term Memory Networks : Abstract: Robust hydrological simulation is key for sustainable development, water management strategies, and climate change adaptation. In recent years, deep learning methods have been demonstrated t...
- Forgetting Similar Samples: Can Machine Unlearning Do it Better? : Abstract: Machine unlearning, a process enabling pre-trained models to remove the influence of specific training samples, has attracted significant attention in recent years. Although extensive resear...
- Active Learning Strategies for Efficient Machine-Learned Interatomic Potentials Across Diverse Material Systems : Abstract: Efficient discovery of new materials demands strategies to reduce the number of costly first-principles calculations required to train predictive machine learning models. We develop and vali...
- Tractable Multinomial Logit Contextual Bandits with Non-Linear Utilities : Abstract: We study the multinomial logit (MNL) contextual bandit problem for sequential assortment selection. Although most existing research assumes utility functions to be linear in item features, t...
- DaQ-MSA: Denoising and Qualifying Diffusion Augmentations for Multimodal Sentiment Analysis : Abstract: Multimodal large language models (MLLMs) have demonstrated strong performance on vision-language tasks, yet their effectiveness on multimodal sentiment analysis remains constrained by the sc...
- U-MASK: User-adaptive Spatio-Temporal Masking for Personalized Mobile AI Applications : Abstract: Personalized mobile artificial intelligence applications are widely deployed, yet they are expected to infer user behavior from sparse and irregular histories under a continuously evolving s...
- MoE-DisCo: Low Economy Cost Training Mixture-of-Experts Models : Abstract: Training large-scale Mixture-of-Experts (MoE) models typically requires high-memory, high-bandwidth GPUs (e.g., A100), and their high cost has become a major barrier to large-model training....
- Variational decomposition autoencoding improves disentanglement of latent representations : Abstract: Understanding the structure of complex, nonstationary, high-dimensional time-evolving signals is a central challenge in scientific data analysis. In many domains, such as speech and biomedic...
- Analyzing the effect of prediction accuracy on the distributionally-robust competitive ratio : Abstract: The field of algorithms with predictions aims to improve algorithm performance by integrating machine learning predictions into algorithm design. A central question in this area is how predi...
- WFR-FM: Simulation-Free Dynamic Unbalanced Optimal Transport : Abstract: The Wasserstein-Fisher-Rao (WFR) metric extends dynamic optimal transport (OT) by coupling displacement with change of mass, providing a principled geometry for modeling unbalanced snapshot ...
- Graph Neural Network with One-side Edge Sampling for Fraud Detection : Abstract: Financial fraud is always a major problem in the field of finance, as it can cause significant consequences. As a result, many approaches have been designed to detect it, and lately Graph Ne...
- Cross-Modal Computational Model of Brain-Heart Interactions via HRV and EEG Feature : Abstract: The electroencephalogram (EEG) has been the gold standard for quantifying mental workload; however, due to its complexity and non-portability, it can be constraining. ECG signals, which are ...
- Artificial Entanglement in the Fine-Tuning of Large Language Models : Abstract: Large language models (LLMs) can be adapted to new tasks using parameter-efficient fine-tuning (PEFT) methods that modify only a small number of trainable parameters, often through low-rank ...
- Structure-preserving learning and prediction in optimal control of collective motion : Abstract: Wide-spread adoption of unmanned vehicle technologies requires the ability to predict the motion of the combined vehicle operation from observations. While the general prediction of such mot...
- Federated Continual Learning for Privacy-Preserving Hospital Imaging Classification : Abstract: Deep learning models for radiology interpretation increasingly rely on multi-institutional data, yet privacy regulations and distribution shift across hospitals limit central data pooling. F...
- Why are there many equally good models? An Anatomy of the Rashomon Effect : Abstract: The Rashomon effect -- the existence of multiple, distinct models that achieve nearly equivalent predictive performance -- has emerged as a fundamental phenomenon in modern machine learning ...
- Predicting Student Success with Heterogeneous Graph Deep Learning and Machine Learning Models : Abstract: Early identification of student success is crucial for enabling timely interventions, reducing dropout rates, and promoting on time graduation. In educational settings, AI powered systems ha...
- Beyond Perfect Scores: Proof-by-Contradiction for Trustworthy Machine Learning : Abstract: Machine learning (ML) models show strong promise for new biomedical prediction tasks, but concerns about trustworthiness have hindered their clinical adoption. In particular, it is often unc...
- Explainability of Complex AI Models with Correlation Impact Ratio : Abstract: Complex AI systems make better predictions but often lack transparency, limiting trustworthiness, interpretability, and safe deployment. Common post hoc AI explainers, such as LIME, SHAP, HS...
- Plasticity vs. Rigidity: The Impact of Low-Rank Adapters on Reasoning on a Micro-Budget : Abstract: Recent advances in mathematical reasoning typically rely on massive scale, yet the question remains: can strong reasoning capabilities be induced in small language models ($\leq1.5\text{B}$)...
- Reinforcement Learning-Guided Dynamic Multi-Graph Fusion for Evacuation Traffic Prediction : Abstract: Real-time traffic prediction is critical for managing transportation systems during hurricane evacuations. Although data-driven graph-learning models have demonstrated strong capabilities in...
- Revisiting Training Scale: An Empirical Study of Token Count, Power Consumption, and Parameter Efficiency : Abstract: Research in machine learning has questioned whether increases in training token counts reliably produce proportional performance gains in large language models. Building on prior work introd...
- Leveraging Soft Prompts for Privacy Attacks in Federated Prompt Tuning : Abstract: Membership inference attack (MIA) poses a significant privacy threat in federated learning (FL) as it allows adversaries to determine whether a client's private dataset contains a specific d...
- KASER: Knowledge-Aligned Student Error Simulator for Open-Ended Coding Tasks : Abstract: Open-ended tasks, such as coding problems that are common in computer science education, provide detailed insights into student knowledge. However, training large language models (LLMs) to s...
- CEDAR: Context Engineering for Agentic Data Science : Abstract: We demonstrate CEDAR, an application for automating data science (DS) tasks with an agentic setup. Solving DS problems with LLMs is an underexplored area that has immense market value. The c...
- Implicit bias as a Gauge correction: Theory and Inverse Design : Abstract: A central problem in machine learning theory is to characterize how learning dynamics select particular solutions among the many compatible with the training objective, a phenomenon, called ...
- Softly Induced Functional Simplicity Implications for Neural Network Generalisation, Robustness, and Distillation : Abstract: Learning robust and generalisable abstractions from high-dimensional input data is a central challenge in machine learning and its applications to high-energy physics (HEP). Solutions of low...
- Hellinger Multimodal Variational Autoencoders : Abstract: Multimodal variational autoencoders (VAEs) are widely used for weakly supervised generative learning with multiple modalities. Predominant methods aggregate unimodal inference distributions ...
- Mosaic: Unlocking Long-Context Inference for Diffusion LLMs via Global Memory Planning and Dynamic Peak Taming : Abstract: Diffusion-based large language models (dLLMs) have emerged as a promising paradigm, utilizing simultaneous denoising to enable global planning and iterative refinement. While these capabilit...
- Short-term electricity load forecasting with multi-frequency reconstruction diffusion : Abstract: Diffusion models have emerged as a powerful method in various applications. However, their application to Short-Term Electricity Load Forecasting (STELF) -- a typical scenario in energy syst...
- Improving Day-Ahead Grid Carbon Intensity Forecasting by Joint Modeling of Local-Temporal and Cross-Variable Dependencies Across Different Frequencies : Abstract: Accurate forecasting of the grid carbon intensity factor (CIF) is critical for enabling demand-side management and reducing emissions in modern electricity systems. Leveraging multiple inter...
- A novel RF-enabled Non-Destructive Inspection Method through Machine Learning and Programmable Wireless Environments : Abstract: Contemporary industrial Non-Destructive Inspection (NDI) methods require sensing capabilities that operate in occluded, hazardous, or access restricted environments. Yet, the current visual ...
- Neural Nonmyopic Bayesian Optimization in Dynamic Cost Settings : Abstract: Bayesian optimization (BO) is a common framework for optimizing black-box functions, yet most existing methods assume static query costs and rely on myopic acquisition strategies. We introdu...
- ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking : Abstract: Reinforcement learning has substantially improved the performance of LLM agents on tasks with verifiable outcomes, but it still struggles on open-ended agent tasks with vast solution spaces ...
- Deriving Decoder-Free Sparse Autoencoders from First Principles : Abstract: Gradient descent on log-sum-exp (LSE) objectives performs implicit expectation-maximization (EM): the gradient with respect to each component output equals its responsibility. The same theo...
- StablePDENet: Enhancing Stability of Operator Learning for Solving Differential Equations : Abstract: Learning solution operators for differential equations with neural networks has shown great potential in scientific computing, but ensuring their stability under input perturbations remains ...
- Gecko: An Efficient Neural Architecture Inherently Processing Sequences with Arbitrary Lengths : Abstract: Designing a unified neural network to efficiently and inherently process sequential data with arbitrary lengths is a central and challenging problem in sequence modeling. The design choices ...
- Physics-Informed Tree Search for High-Dimensional Computational Design : Abstract: High-dimensional design spaces underpin a wide range of physics-based modeling and computational design tasks in science and engineering. These problems are commonly formulated as constraine...
- FlexAct: Why Learn when you can Pick? : Abstract: Learning activation functions has emerged as a promising direction in deep learning, allowing networks to adapt activation mechanisms to task-specific demands. In this work, we introduce a n...
- Certified Unlearning in Decentralized Federated Learning : Abstract: Driven by the right to be forgotten (RTBF), machine unlearning has become an essential requirement for privacy-preserving machine learning. However, its realization in decentralized federate...
- A Unified Shape-Aware Foundation Model for Time Series Classification : Abstract: Foundation models pre-trained on large-scale source datasets are reshaping the traditional training paradigm for time series classification. However, existing time series foundation models p...
- Teach Diffusion Language Models to Learn from Their Own Mistakes : Abstract: Masked Diffusion Language Models (DLMs) achieve significant speed by generating multiple tokens in parallel. However, this parallel sampling approach, especially when using fewer inference s...
- One-Shot Hierarchical Federated Clustering : Abstract: Driven by the growth of Web-scale decentralized services, Federated Clustering (FC) aims to extract knowledge from heterogeneous clients in an unsupervised manner while preserving the client...
- Hierarchical Pooling and Explainability in Graph Neural Networks for Tumor and Tissue-of-Origin Classification Using RNA-seq Data : Abstract: This study explores the use of graph neural networks (GNNs) with hierarchical pooling and multiple convolution layers for cancer classification based on RNA-seq data. We combine gene express...
- Monkey Jump : MoE-Style PEFT for Efficient Multi-Task Learning : Abstract: Mixture-of-experts variants of parameter-efficient fine-tuning enable per-token specialization, but they introduce additional trainable routers and expert parameters, increasing memory usage...
- A Fast and Effective Method for Euclidean Anticlustering: The Assignment-Based-Anticlustering Algorithm : Abstract: The anticlustering problem is to partition a set of objects into K equal-sized anticlusters such that the sum of distances within anticlusters is maximized. The anticlustering problem is NP-...
- Federated Learning and Class Imbalances : Abstract: Federated Learning (FL) enables collaborative model training across decentralized devices while preserving data privacy. However, real-world FL deployments face critical challenges such as d...
- Evaluating Robustness of Large Language Models in Enterprise Applications: Benchmarks for Perturbation Consistency Across Formats and Languages : Abstract: Enterprise LLM applications require consistently high quality and reliable performance across diverse scenarios, demanding robustness to minor variations. Existing research shows that even s...
- Future-as-Label: Scalable Supervision from Real-World Outcomes : Abstract: Many real-world prediction problems lack labels observable at prediction time, creating a temporal gap between prediction and outcome that yields supervision only after events resolve. To ad...
- SourceNet: Interpretable Sim-to-Real Inference on Variable-Geometry Sensor Arrays for Earthquake Source Inversion : Abstract: Inferring high-dimensional physical states from sparse, ad-hoc sensor arrays is a fundamental challenge across AI for Science, as they are complicated by irregular geometries and the profoun...
- AIConfigurator: Lightning-Fast Configuration Optimization for Multi-Framework LLM Serving : Abstract: Optimizing Large Language Model (LLM) inference in production systems is increasingly difficult due to dynamic workloads, stringent latency/throughput targets, and a rapidly expanding config...
- SPINAL -- Scaling-law and Preference Integration in Neural Alignment Layers : Abstract: Direct Preference Optimization (DPO) is a principled, scalable alternative to RLHF for aligning large language models from pairwise preferences, but its internal geometric footprint remains ...
- Triadic Concept Analysis for Logic Interpretation of Simple Artificial Networks : Abstract: An artificial neural network (ANN) is a numerical method used to solve complex classification problems. Due to its high classification power, the ANN method often outperforms other classific...
- When Smaller Wins: Dual-Stage Distillation and Pareto-Guided Compression of Liquid Neural Networks for Edge Battery Prognostics : Abstract: Battery management systems increasingly require accurate battery health prognostics under strict on-device constraints. This paper presents DLNet, a practical framework with dual-stage disti...
- Projecting Out the Malice: A Global Subspace Approach to LLM Detoxification : Abstract: Large language models (LLMs) exhibit exceptional performance but pose inherent risks of generating toxic content, restricting their safe deployment. While traditional methods (e.g., alignmen...
- LDTC: Lifelong deep temporal clustering for multivariate time series : Abstract: Clustering temporal and dynamically changing multivariate time series from real-world fields, called temporal clustering for short, has been a major challenge due to inherent complexities. A...
- Breaking Model Lock-in: Cost-Efficient Zero-Shot LLM Routing via a Universal Latent Space : Abstract: The rapid proliferation of Large Language Models (LLMs) has led to a fragmented and inefficient ecosystem, a state of "model lock-in" where seamlessly integrating novel models remains a si...
- CEEMDAN-Based Multiscale CNN for Wind Turbine Gearbox Fault Detection : Abstract: Wind turbines play a critical role in the shift toward sustainable energy generation. Their operation relies on multiple interconnected components, and a failure in any of these can compromi...
- Dynamics-inspired Structure Hallucination for Protein-protein Interaction Modeling : Abstract: Protein-protein interaction (PPI) represents a central challenge within the biology field, and accurately predicting the consequences of mutations in this context is crucial for drug design ...
- Manifold-based Sampling for In-Context Hallucination Detection in Large Language Models : Abstract: Large language models (LLMs) frequently generate factually incorrect or unsupported content, commonly referred to as hallucinations. Prior work has explored decoding strategies, retrieval au...
- EntroLnn: Entropy-Guided Liquid Neural Networks for Operando Refinement of Battery Capacity Fade Trajectories : Abstract: Battery capacity degradation prediction has long been a central topic in battery health analytics, and most studies focus on state of health (SoH) estimation and end of life (EoL) prediction...
- MLB: A Scenario-Driven Benchmark for Evaluating Large Language Models in Clinical Applications : Abstract: The proliferation of Large Language Models (LLMs) presents transformative potential for healthcare, yet practical deployment is hindered by the absence of frameworks that assess real-world c...
- TimeGNN-Augmented Hybrid-Action MARL for Fine-Grained Task Partitioning and Energy-Aware Offloading in MEC : Abstract: With the rapid growth of IoT devices and latency-sensitive applications, the demand for both real-time and energy-efficient computing has surged, placing significant pressure on traditional ...
- Time-Series Anomaly Classification for Launch Vehicle Propulsion Systems: Fast Statistical Detectors Enhancing LSTM Accuracy and Data Quality : Abstract: Supporting Go/No-Go decisions prior to launch requires assessing real-time telemetry data against redline limits established during the design qualification phase. Family data from ground te...
- Data-Driven Reduced-Complexity Modeling of Fluid Flows: A Community Challenge : Abstract: We introduce a community challenge designed to facilitate direct comparisons between data-driven methods for compression, forecasting, and sensing of complex aerospace flows. The challenge i...
- MixDPO: Modeling Preference Strength for Pluralistic Alignment : Abstract: Preference based alignment objectives implicitly assume that all human preferences are expressed with equal strength. In practice, however, preference strength varies across individuals and ...
- Parent-Guided Adaptive Reliability (PGAR): A Behavioural Meta-Learning Framework for Stable and Trustworthy AI : Abstract: Parent-Guided Adaptive Reliability (PGAR) is a lightweight behavioural meta-learning framework that adds a supervisory "parent" layer on top of a standard learner to improve stability, calib...
- Forget Many, Forget Right: Scalable and Precise Concept Unlearning in Diffusion Models : Abstract: Text-to-image diffusion models have achieved remarkable progress, yet their use raises copyright and misuse concerns, prompting research into machine unlearning. However, extending multi-con...
- Can we Improve Prediction of Psychotherapy Outcomes Through Pretraining With Simulated Data? : Abstract: In the context of personalized medicine, machine learning algorithms are growing in popularity. These algorithms require substantial information, which can be acquired effectively through th...
- ECLIPTICA - A Framework for Switchable LLM Alignment via CITA - Contrastive Instruction-Tuned Alignment : Abstract: Alignment in large language models (LLMs) is still largely static: after training, the policy is frozen. DPO, GRPO methods typically imprint one behavior into the weights, leaving little run...
- PromptPort: A Reliability Layer for Cross-Model Structured Extraction : Abstract: Structured extraction with LLMs fails in production not because models lack understanding, but because output formatting is unreliable across models and prompts. A prompt that returns clean ...
- A Foundation Model Approach for Fetal Stress Prediction During Labor From cardiotocography (CTG) recordings : Abstract: Intrapartum cardiotocography (CTG) is widely used for fetal monitoring during labor, yet its interpretation suffers from high inter-observer variability and limited predictive accuracy. Deep...
- LLM Flow Processes for Text-Conditioned Regression : Abstract: Meta-learning methods for regression like Neural (Diffusion) Processes achieve impressive results, but with these models it can be difficult to incorporate expert prior knowledge and informa...
- Causal and Federated Multimodal Learning for Cardiovascular Risk Prediction under Heterogeneous Populations : Abstract: Cardiovascular disease (CVD) continues to be the major cause of death globally, calling for predictive models that not only handle diverse and high-dimensional biomedical signals but also ma...
- RainBalance: Alleviating Dual Imbalance in GNSS-based Precipitation Nowcasting via Continuous Probability Modeling : Abstract: Global navigation satellite systems (GNSS) station-based Precipitation Nowcasting aims to predict rainfall within the next 0-6 hours by leveraging a GNSS station's historical observations of...
- Attention in Geometry: Scalable Spatial Modeling via Adaptive Density Fields and FAISS-Accelerated Kernels : Abstract: This work introduces Adaptive Density Fields (ADF), a geometric attention framework that formulates spatial aggregation as a query-conditioned, metric-induced attention operator in continuou...
- DeeperBrain: A Neuro-Grounded EEG Foundation Model Towards Universal BCI : Abstract: Electroencephalography (EEG) foundation models hold significant promise for universal Brain-Computer Interfaces (BCIs). However, existing approaches often rely on end-to-end fine-tuning and ...
- A Review of Online Diffusion Policy RL Algorithms for Scalable Robotic Control : Abstract: Diffusion policies have emerged as a powerful approach for robotic control, demonstrating superior expressiveness in modeling multimodal action distributions compared to conventional policy ...
- AIS-CycleGen: A CycleGAN-Based Framework for High-Fidelity Synthetic AIS Data Generation and Augmentation : Abstract: Automatic Identification System (AIS) data are vital for maritime domain awareness, yet they often suffer from domain shifts, data sparsity, and class imbalance, which hinder the performance...
- Learning Minimally-Congested Drive Times from Sparse Open Networks: A Lightweight RF-Based Estimator for Urban Roadway Operations : Abstract: Accurate roadway travel-time prediction is foundational to transportation systems analysis, yet widespread reliance on either data-intensive congestion models or overly naïve heuristics limi...
- Latent Space Communication via K-V Cache Alignment : Abstract: Solving increasingly complex problems with large language models (LLMs) necessitates a move beyond individual models and towards multi-model systems that can effectively collaborate. While t...
- L2CU: Learning to Complement Unseen Users : Abstract: Recent research highlights the potential of machine learning models to learn to complement (L2C) human strengths; however, generalizing this capability to unseen users remains a significant ...
- Stress Testing Machine Learning at $10^{10}$ Scale: A Comprehensive Study of Adversarial Robustness on Algebraically Structured Integer Streams : Abstract: This paper presents a large-scale stress test of machine learning systems using structured mathematical data as a benchmark. We evaluate the robustness of tree-based classifiers at an unprec...
- GroupSegment-SHAP: Shapley Value Explanations with Group-Segment Players for Multivariate Time Series : Abstract: Multivariate time-series models achieve strong predictive performance in healthcare, industry, energy, and finance, but how they combine cross-variable interactions with temporal dynamics re...
- Judge Model for Large-scale Multimodality Benchmarks : Abstract: We propose a dedicated multimodal Judge Model designed to provide reliable, explainable evaluation across a diverse suite of tasks. Our benchmark spans text, audio, image, and video modaliti...
- Australian Bushfire Intelligence with AI-Driven Environmental Analytics : Abstract: Bushfires are among the most destructive natural hazards in Australia, causing significant ecological, economic, and social damage. Accurate prediction of bushfire intensity is therefore ess...
- The Impact of Post-training on Data Contamination : Abstract: We present a controlled study of how dataset contamination interacts with the post-training stages now standard in large language model training pipelines. Starting from clean checkpoints of...
- Filtering Beats Fine Tuning: A Bayesian Kalman View of In Context Learning in LLMs : Abstract: We present a theory-first framework that interprets inference-time adaptation in large language models (LLMs) as online Bayesian state estimation. Rather than modeling rapid adaptation as im...
- The Hessian of tall-skinny networks is easy to invert : Abstract: We describe an exact algorithm for solving linear systems $Hx=b$ where $H$ is the Hessian of a deep net. The method computes Hessian-inverse-vector products without storing the Hessian or it...
- Enabling Long FFT Convolutions on Memory-Constrained FPGAs via Chunking : Abstract: The need for long-context reasoning has led to alternative neural network architectures besides Transformers and self-attention, a popular model being Hyena, which employs causal 1D-convolut...
- CrossTrafficLLM: A Human-Centric Framework for Interpretable Traffic Intelligence via Large Language Model : Abstract: While accurate traffic forecasting is vital for Intelligent Transportation Systems (ITS), effectively communicating predicted conditions via natural language for human-centric decision suppo...
- Tree-Preconditioned Differentiable Optimization and Axioms as Layers : Abstract: This paper introduces a differentiable framework that embeds the axiomatic structure of Random Utility Models (RUM) directly into deep neural networks. Although projecting empirical choice d...
Research Sources: 592 | Generated: 1/13/2026
