AI RESEARCH PAPERS & ACADEMIC SOURCES
- Improved Visually Prompted Keyword Localisation in Real Low-Resource Settings : Abstract: Given an image query, visually prompted keyword localisation (VPKL) aims to find occurrences of the depicted word in a speech collection. This can be useful when transcriptions are not avail...
- Endo-G$^{2}$T: Geometry-Guided & Temporally Aware Time-Embedded 4DGS For Endoscopic Scenes : Abstract: Endoscopic (endo) video exhibits strong view-dependent effects such as specularities, wet reflections, and occlusions. Pure photometric supervision misaligns with geometry and triggers early...
- Thinking With Bounding Boxes: Enhancing Spatio-Temporal Video Grounding via Reinforcement Fine-Tuning : Abstract: Spatio-temporal video grounding (STVG) requires localizing a target object in untrimmed videos both temporally and spatially from natural language descriptions. Despite their strong language...
- DiverseVAR: Balancing Diversity and Quality of Next-Scale Visual Autoregressive Models : Abstract: We introduce DiverseVAR, a framework that enhances the diversity of text-conditioned visual autoregressive models (VAR) at test time without requiring retraining, fine-tuning, or substantial...
- E-M3RF: An Equivariant Multimodal 3D Re-assembly Framework : Abstract: 3D reassembly is a fundamental geometric problem, and in recent years it has increasingly been challenged by deep learning methods rather than classical optimization. While learning approach...
- MobileI2V: Fast and High-Resolution Image-to-Video on Mobile Devices : Abstract: Recently, video generation has witnessed rapid advancements, drawing increasing attention to image-to-video (I2V) synthesis on mobile devices. However, the substantial computational complexi...
- CanKD: Cross-Attention-based Non-local operation for Feature-based Knowledge Distillation : Abstract: We propose Cross-Attention-based Non-local Knowledge Distillation (CanKD), a novel feature-based knowledge distillation framework that leverages cross-attention mechanisms to enhance the kno...
- Generalized Design Choices for Deepfake Detectors : Abstract: The effectiveness of deepfake detection methods often depends less on their core design and more on implementation details such as data preprocessing, augmentation strategies, and optimizati...
- Self-Paced Learning for Images of Antinuclear Antibodies : Abstract: Antinuclear antibody (ANA) testing is a crucial method for diagnosing autoimmune disorders, including lupus, Sjögren's syndrome, and scleroderma. Despite its importance, manual ANA detection...
- EoS-FM: Can an Ensemble of Specialist Models act as a Generalist Feature Extractor? : Abstract: Recent advances in foundation models have shown great promise in domains such as natural language processing and computer vision, and similar efforts are now emerging in the Earth Observatio...
- The Age-specific Alzheimer's Disease Prediction with Characteristic Constraints in Nonuniform Time Span : Abstract: Alzheimer's disease is a debilitating disorder marked by a decline in cognitive function. Timely identification of the disease is essential for the development of personalized treatment stra...
- Video Generation Models Are Good Latent Reward Models : Abstract: Reward feedback learning (ReFL) has proven effective for aligning image generation with human preferences. However, its extension to video generation faces significant challenges. Existing v...
- UAVLight: A Benchmark for Illumination-Robust 3D Reconstruction in Unmanned Aerial Vehicle (UAV) Scenes : Abstract: Illumination inconsistency is a fundamental challenge in multi-view 3D reconstruction. Variations in sunlight direction, cloud cover, and shadows break the constant-lighting assumption under...
- Enhanced Landmark Detection Model in Pelvic Fluoroscopy using 2D/3D Registration Loss : Abstract: Automated landmark detection offers an efficient approach for medical professionals to understand patient anatomic structure and positioning using intra-operative imaging. While current dete...
- Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy : Abstract: The synthesis of synchronized audio-visual content is a key challenge in generative AI, with open-source models facing challenges in robust audio-video alignment. Our analysis reveals that t...
- Deep Learning-Based Multiclass Classification of Oral Lesions with Stratified Augmentation : Abstract: Oral cancer is highly common across the globe and is mostly diagnosed during the later stages due to the close visual similarity to benign, precancerous, and malignant lesions in the oral ca...
- MoGAN: Improving Motion Quality in Video Diffusion via Few-Step Motion Adversarial Post-Training : Abstract: Video diffusion models achieve strong frame-level fidelity but still struggle with motion coherence, dynamics and realism, often producing jitter, ghosting, or implausible dynamics. A key li...
- ReSAM: Refine, Requery, and Reinforce: Self-Prompting Point-Supervised Segmentation for Remote Sensing Images : Abstract: Interactive segmentation models such as the Segment Anything Model (SAM) have demonstrated remarkable generalization on natural images, but perform suboptimally on remote sensing imagery (RS...
- Active Learning for GCN-based Action Recognition : Abstract: Despite the notable success of graph convolutional networks (GCNs) in skeleton-based action recognition, their performance often depends on large volumes of labeled data, which are frequentl...
- CaFlow: Enhancing Long-Term Action Quality Assessment with Causal Counterfactual Flow : Abstract: Action Quality Assessment (AQA) predicts fine-grained execution scores from action videos and is widely applied in sports, rehabilitation, and skill evaluation. Long-term AQA, as in figure s...
- Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following : Abstract: Large multimodal models (LMMs) are increasingly adopted as judges in multimodal evaluation systems due to their strong instruction following and consistency with human preferences. However, ...
- Revolutionizing Glioma Segmentation & Grading Using 3D MRI - Guided Hybrid Deep Learning Models : Abstract: Gliomas are brain tumor types that have a high mortality rate, which means early and accurate diagnosis is important for therapeutic intervention. To address this difficulty, t...
- Seeing without Pixels: Perception from Camera Trajectories : Abstract: Can one perceive a video's content without seeing its pixels, just from the camera trajectory, the path it carves through space? This paper is the first to systematically investigate this see...
- Canvas-to-Image: Compositional Image Generation with Multimodal Controls : Abstract: While modern diffusion models excel at generating high-quality and diverse images, they still struggle with high-fidelity compositional and multimodal control, particularly when users simult...
- A Fractional Variational Approach to Spectral Filtering Using the Fourier Transform : Abstract: The interference of fluorescence signals and noise remains a significant challenge in Raman spectrum analysis, often obscuring subtle spectral features that are critical for accurate analysi...
- Prompt-Aware Adaptive Elastic Weight Consolidation for Continual Learning in Medical Vision-Language Models : Abstract: Medical AI systems face catastrophic forgetting when deployed in clinical settings, where models must learn new imaging protocols while retaining prior diagnostic capabilities. This challeng...
- Automated Histopathologic Assessment of Hirschsprung Disease Using a Multi-Stage Vision Transformer Framework : Abstract: Hirschsprung Disease is characterized by the absence of ganglion cells in the myenteric plexus. Therefore, their correct identification is crucial for diagnosing Hirschsprung disease. We int...
- Deep Parameter Interpolation for Scalar Conditioning : Abstract: We propose deep parameter interpolation (DPI), a general-purpose method for transforming an existing deep neural network architecture into one that accepts an additional scalar input. Recent...
- AerialMind: Towards Referring Multi-Object Tracking in UAV Scenarios : Abstract: Referring Multi-Object Tracking (RMOT) aims to achieve precise object detection and tracking through natural language instructions, representing a fundamental capability for intelligent robo...
- STAR: Smartphone-analogous Typing in Augmented Reality : Abstract: While text entry is an essential and frequent task in Augmented Reality (AR) applications, devising an efficient and easy-to-use text entry method for AR remains an open challenge. This rese...
- AV-Edit: Multimodal Generative Sound Effect Editing via Audio-Visual Semantic Joint Control : Abstract: Sound effect editing (modifying audio by adding, removing, or replacing elements) remains constrained by existing approaches that rely solely on low-level signal processing or coarse text prom...
- Multi-Reward GRPO for Stable and Prosodic Single-Codebook TTS LLMs at Scale : Abstract: Recent advances in Large Language Models (LLMs) have transformed text-to-speech (TTS) synthesis, inspiring autoregressive frameworks that represent speech as sequences of discrete codec toke...
- Bangla Sign Language Translation: Dataset Creation Challenges, Benchmarking and Prospects : Abstract: Bangla Sign Language Translation (BdSLT) has been severely constrained so far as the language itself is very low resource. Standard sentence level dataset creation for BdSLT is of immense im...
- PFF-Net: Patch Feature Fitting for Point Cloud Normal Estimation : Abstract: Estimating the normal of a point requires constructing a local patch to provide center-surrounding context, but determining the appropriate neighborhood size is difficult when dealing with d...
- AMLP: Adjustable Masking Lesion Patches for Self-Supervised Medical Image Segmentation : Abstract: Self-supervised masked image modeling (MIM) methods have shown promising performances on analyzing natural images. However, directly applying such methods to medical image segmentation tasks...
- Restoration-Oriented Video Frame Interpolation with Region-Distinguishable Priors from SAM : Abstract: In existing restoration-oriented Video Frame Interpolation (VFI) approaches, the motion estimation between neighboring frames plays a crucial role. However, the estimation accuracy in existi...
- A Simple Framework Towards Vision-based Traffic Signal Control with Microscopic Simulation : Abstract: Traffic signal control (TSC) is crucial for reducing traffic congestion leading to smoother traffic flow, reduced idle time, and mitigated CO2 emissions. In this paper, we explore the comput...
- Activator: GLU Activation Function as the Core Component of a Vision Transformer : Abstract: The transformer architecture has driven many successes in a variety of tasks within the field of deep learning, in particular the recent advances in natural language processing (NLP) culmina...
- Uncertainty Quantification for Visual Object Pose Estimation : Abstract: Quantifying the uncertainty of an object's pose estimate is essential for robust control and planning. Although pose estimation is a well-studied robotics problem, attaching statistically ri...
- Open Vocabulary Monocular 3D Object Detection : Abstract: We propose and study open-vocabulary monocular 3D detection, a novel task that aims to detect objects of any category in metric 3D space from a single RGB image. Existing 3D object detector...
- Active Negative Loss: A Robust Framework for Learning with Noisy Labels : Abstract: Deep supervised learning has achieved remarkable success across a wide range of tasks, yet it remains susceptible to overfitting when confronted with noisy labels. To address this issue, noi...
- Unsupervised Segmentation by Diffusing, Walking and Cutting : Abstract: We propose an unsupervised image segmentation method using features from pre-trained text-to-image diffusion models. Inspired by classic spectral clustering approaches, we construct adjacenc...
- Gen-3Diffusion: Realistic Image-to-3D Generation via 2D & 3D Diffusion Synergy : Abstract: Creating realistic 3D objects and clothed avatars from a single RGB image is an attractive yet challenging problem. Due to its ill-posed nature, recent works leverage powerful prior from 2D ...
- Interactive Occlusion Boundary Estimation through Exploitation of Synthetic Data : Abstract: Occlusion boundaries (OBs) geometrically localize occlusion events in 2D images and provide critical cues for scene understanding. In this paper, we present the first systematic study of Int...
- PG-ControlNet: A Physics-Guided ControlNet for Generative Spatially Varying Image Deblurring : Abstract: Spatially varying image deblurring remains a fundamentally ill-posed problem, especially when degradations arise from complex mixtures of motion and other forms of blur under significant noi...
- MUSE: Manipulating Unified Framework for Synthesizing Emotions in Images via Test-Time Optimization : Abstract: Images evoke emotions that profoundly influence perception, often prioritized over content. Current Image Emotional Synthesis (IES) approaches artificially separate generation and editing ta...
- Long-Term Alzheimers Disease Prediction: A Novel Image Generation Method Using Temporal Parameter Estimation with Normal Inverse Gamma Distribution on Uneven Time Series : Abstract: Image generation can provide physicians with an imaging diagnosis basis in the prediction of Alzheimer's Disease (AD). Recent research has shown that long-term AD predictions by image genera...
- MIRA: Multimodal Iterative Reasoning Agent for Image Editing : Abstract: Instruction-guided image editing offers an intuitive way for users to edit images with natural language. However, diffusion-based editing models often struggle to accurately interpret comple...
- CLRecogEye : Curriculum Learning towards exploiting convolution features for Dynamic Iris Recognition : Abstract: Iris authentication algorithms have achieved impressive recognition performance, making them highly promising for real-world applications such as border control, citizen identification, and ...
- Scaling Foundation Models for Radar Scene Understanding : Abstract: Radar sensors provide reliable perception across adverse weather, lighting, and long-range conditions. Recent advances in foundation models have transformed visual and language understanding...
- EM-KD: Distilling Efficient Multimodal Large Language Model with Unbalanced Vision Tokens : Abstract: Efficient Multimodal Large Language Models (MLLMs) compress vision tokens to reduce resource consumption, but the loss of visual information can degrade comprehension capabilities. Although ...
- FaithFusion: Harmonizing Reconstruction and Generation via Pixel-wise Information Gain : Abstract: In controllable driving-scene reconstruction and 3D scene generation, maintaining geometric fidelity while synthesizing visually plausible appearance under large viewpoint shifts is crucial....
- CtrlVDiff: Controllable Video Generation via Unified Multimodal Video Diffusion : Abstract: We tackle the dual challenges of video understanding and controllable video generation within a unified diffusion framework. Our key insights are two-fold: geometry-only cues (e.g., depth, e...
- DeepRFTv2: Kernel-level Learning for Image Deblurring : Abstract: It is well-known that if a network aims to learn how to deblur, it should understand the blur process. Blurring is naturally caused by the convolution of the sharp image with the blur kernel...
- Referring Video Object Segmentation with Cross-Modality Proxy Queries : Abstract: Referring video object segmentation (RVOS) is an emerging cross-modality task that aims to generate pixel-level maps of the target objects referred by given textual expressions. The main con...
- TEAR: Temporal-aware Automated Red-teaming for Text-to-Video Models : Abstract: Text-to-Video (T2V) models are capable of synthesizing high-quality, temporally coherent dynamic video content, but the diverse generation also inherently introduces critical safety challeng...
- AnchorOPT: Towards Optimizing Dynamic Anchors for Adaptive Prompt Learning : Abstract: Existing prompt learning methods, which are built upon CLIP models, leverage textual tokens as anchors to guide the learnable soft tokens. This guidance improves CLIP generalizations. Howeve...
- Scenes as Tokens: Multi-Scale Normal Distributions Transform Tokenizer for General 3D Vision-Language Understanding : Abstract: Recent advances in 3D vision-language models (VLMs) highlight a strong potential for 3D scene understanding and reasoning. However, effectively tokenizing 3D scenes into holistic scene token...
- You Can Trust Your Clustering Model: A Parameter-free Self-Boosting Plug-in for Deep Clustering : Abstract: Recent deep clustering models have produced impressive clustering performance. However, a common issue with existing methods is the disparity between global and local feature structures. Whi...
- Towards an Effective Action-Region Tracking Framework for Fine-grained Video Action Recognition : Abstract: Fine-grained action recognition (FGAR) aims to identify subtle and distinctive differences among fine-grained action categories. However, current recognition methods often capture coarse-gra...
- 3-Tracer: A Tri-level Temporal-Aware Framework for Audio Forgery Detection and Localization : Abstract: Recently, partial audio forgery has emerged as a new form of audio manipulation. Attackers selectively modify partial but semantically critical frames while preserving the overall perceptual...
- FIELDS: Face reconstruction with accurate Inference of Expression using Learning with Direct Supervision : Abstract: Facial expressions convey the bulk of emotional information in human communication, yet existing 3D face reconstruction methods often miss subtle affective details due to reliance on 2D supe...
- Shift-Equivariant Complex-Valued Convolutional Neural Networks : Abstract: Convolutional neural networks have shown remarkable performance in recent years on various computer vision problems. However, the traditional convolutional neural network architecture lacks ...
- AVFakeBench: A Comprehensive Audio-Video Forgery Detection Benchmark for AV-LMMs : Abstract: The threat of Audio-Video (AV) forgery is rapidly evolving beyond human-centric deepfakes to include more diverse manipulations across complex natural scenes. However, existing benchmarks ar...
- LaGen: Towards Autoregressive LiDAR Scene Generation : Abstract: Generative world models for autonomous driving (AD) have become a trending topic. Unlike the widely studied image modality, in this work we explore generative world models for LiDAR data. Ex...
- Unlocking Zero-shot Potential of Semi-dense Image Matching via Gaussian Splatting : Abstract: Learning-based image matching critically depends on large-scale, diverse, and geometrically accurate training data. 3D Gaussian Splatting (3DGS) enables photorealistic novel-view synthesis a...
- Co-Training Vision Language Models for Remote Sensing Multi-task Learning : Abstract: With Transformers achieving outstanding performance on individual remote sensing (RS) tasks, we are now approaching the realization of a unified model that excels across multiple tasks throu...
- PathMamba: A Hybrid Mamba-Transformer for Topologically Coherent Road Segmentation in Satellite Imagery : Abstract: Achieving both high accuracy and topological continuity in road segmentation from satellite imagery is a critical goal for applications ranging from urban planning to disaster response. Stat...
- CaliTex: Geometry-Calibrated Attention for View-Coherent 3D Texture Generation : Abstract: Despite major advances brought by diffusion-based models, current 3D texture generation systems remain hindered by cross-view inconsistency -- textures that appear convincing from one viewpo...
- HTTM: Head-wise Temporal Token Merging for Faster VGGT : Abstract: The Visual Geometry Grounded Transformer (VGGT) marks a significant leap forward in 3D scene reconstruction, as it is the first model that directly infers all key 3D attributes (camera poses...
- A Connection Between Score Matching and Local Intrinsic Dimension : Abstract: The local intrinsic dimension (LID) of data is a fundamental quantity in signal processing and learning theory, but quantifying the LID of high-dimensional, complex data has been a historica...
- LTD: Low Temperature Distillation for Gradient Masking-free Adversarial Training : Abstract: Adversarial training is a widely adopted strategy to bolster the robustness of neural network models against adversarial attacks. This paper revisits the fundamental assumptions underlying i...
- Factor-Assisted Federated Learning for Personalized Optimization with Heterogeneous Data : Abstract: Federated learning is an emerging distributed machine learning framework aiming at protecting data privacy. Data heterogeneity is one of the core challenges in federated learning, which coul...
- A Catalyst Framework for the Quantum Linear System Problem via the Proximal Point Algorithm : Abstract: Solving systems of linear equations is a fundamental problem, but it can be computationally intensive for classical algorithms in high dimensions. Existing quantum algorithms can achieve exp...
- Superstate Quantum Mechanics : Abstract: We introduce Superstate Quantum Mechanics (SQM), a theory that considers states in Hilbert space subject to multiple quadratic constraints, with "energy" also expressed as a quadratic func...
- A Common Pipeline for Harmonizing Electronic Health Record Data for Translational Research : Abstract: Despite the growing availability of Electronic Health Record (EHR) data, researchers often face substantial barriers in effectively using these data for translational research due to their c...
- Video Object Recognition in Mobile Edge Networks: Local Tracking or Edge Detection? : Abstract: Fast and accurate video object recognition, which relies on frame-by-frame video analytics, remains a challenge for resource-constrained devices such as traffic cameras. Recent advances in m...
- Text-Guided Semantic Image Encoder : Abstract: Image encoders, a fundamental component of vision-language models (VLMs), are typically pretrained independently before being aligned with a language model. This standard paradigm results in...
- One Patch is All You Need: Joint Surface Material Reconstruction and Classification from Minimal Visual Cues : Abstract: Understanding material surfaces from sparse visual cues is critical for applications in robotics, simulation, and material perception. However, most existing methods rely on dense or full-sc...
- LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling : Abstract: Large multimodal models (LMMs) have shown great potential for video reasoning with textual Chain-of-Thought. However, they remain vulnerable to hallucinations, especially when processing lon...
- Intriguing Properties of Dynamic Sampling Networks : Abstract: Dynamic sampling mechanisms in deep learning architectures have demonstrated utility across many computer vision models, though the theoretical analysis of these structures has not yet been ...
- Layer-Aware Video Composition via Split-then-Merge : Abstract: We present Split-then-Merge (StM), a novel framework designed to enhance control in generative video composition and address its data scarcity problem. Unlike conventional methods relying on...
- Estimating Fog Parameters from a Sequence of Stereo Images : Abstract: We propose a method which, given a sequence of stereo foggy images, estimates the parameters of a fog model and updates them dynamically. In contrast with previous approaches, which estimate...
- V$^{2}$-SAM: Marrying SAM2 with Multi-Prompt Experts for Cross-View Object Correspondence : Abstract: Cross-view object correspondence, exemplified by the representative task of ego-exo object correspondence, aims to establish consistent associations of the same object across different viewp...
- GaINeR: Geometry-Aware Implicit Network Representation : Abstract: Implicit Neural Representations (INRs) have become an essential tool for modeling continuous 2D images, enabling high-fidelity reconstruction, super-resolution, and compression. Popular arch...
- A deep learning model to reduce agent dose for contrast-enhanced MRI of the cerebellopontine angle cistern : Abstract: Objectives: To evaluate a deep learning (DL) model for reducing the agent dose of contrast-enhanced T1-weighted MRI (T1ce) of the cerebellopontine angle (CPA) cistern. Materials and methods:...
- Smooth regularization for efficient video recognition : Abstract: We propose a smooth regularization technique that instills a strong temporal inductive bias in video recognition models, particularly benefiting lightweight architectures. Our method encoura...
- UruDendro4: A Benchmark Dataset for Automatic Tree-Ring Detection in Cross-Section Images of Pinus taeda L : Abstract: Tree-ring growth represents the annual wood increment for a tree, and quantifying it allows researchers to assess which silvicultural practices are best suited for each species. Manual measu...
- Beyond Realism: Learning the Art of Expressive Composition with StickerNet : Abstract: As a widely used operation in image editing workflows, image composition has traditionally been studied with a focus on achieving visual realism and semantic plausibility. However, in practi...
- TrafficLens: Multi-Camera Traffic Video Analysis Using LLMs : Abstract: Traffic cameras are essential in urban areas, playing a crucial role in intelligent transportation systems. Multiple cameras at intersections enhance law enforcement capabilities, traffic ma...
- Privacy-Preserving Federated Vision Transformer Learning Leveraging Lightweight Homomorphic Encryption in Medical AI : Abstract: Collaborative machine learning across healthcare institutions promises improved diagnostic accuracy by leveraging diverse datasets, yet privacy regulations such as HIPAA prohibit direct pati...
- Inversion-Free Style Transfer with Dual Rectified Flows : Abstract: Style transfer, a pivotal task in image processing, synthesizes visually compelling images by seamlessly blending realistic content with artistic styles, enabling applications in photo editi...
- RefOnce: Distilling References into a Prototype Memory for Referring Camouflaged Object Detection : Abstract: Referring Camouflaged Object Detection (Ref-COD) segments specified camouflaged objects in a scene by leveraging a small set of referring images. Though effective, current systems adopt a du...
- From Inpainting to Layer Decomposition: Repurposing Generative Inpainting Models for Image Layer Decomposition : Abstract: Images can be viewed as layered compositions, foreground objects over background, with potential occlusions. This layered representation enables independent editing of elements, offering gre...
- MetaRank: Task-Aware Metric Selection for Model Transferability Estimation : Abstract: Selecting an appropriate pre-trained source model is a critical, yet computationally expensive, task in transfer learning. Model Transferability Estimation (MTE) methods address this by prov...
- CameraMaster: Unified Camera Semantic-Parameter Control for Photography Retouching : Abstract: Text-guided diffusion models have greatly advanced image editing and generation. However, achieving physically consistent image retouching with precise parameter control (e.g., exposure, whi...
- CaptionQA: Is Your Caption as Useful as the Image Itself? : Abstract: Image captions serve as efficient surrogates for visual content in multimodal systems such as retrieval, recommendation, and multi-step agentic inference pipelines. Yet current evaluation pr...
- FlowerDance: MeanFlow for Efficient and Refined 3D Dance Generation : Abstract: Music-to-dance generation aims to translate auditory signals into expressive human motion, with broad applications in virtual reality, choreography, and digital entertainment. Despite promis...
- LungNoduleAgent: A Collaborative Multi-Agent System for Precision Diagnosis of Lung Nodules : Abstract: Diagnosing lung cancer typically involves physicians identifying lung nodules in Computed tomography (CT) scans and generating diagnostic reports based on their morphological features and me...
- Efficient Diffusion Planning with Temporal Diffusion : Abstract: Diffusion planning is a promising method for learning high-performance policies from offline data. To avoid the impact of discrepancies between planning and reality on performance, previous ...
- A Unified Understanding of Offline Data Selection and Online Self-refining Generation for Post-training LLMs : Abstract: Offline data selection and online self-refining generation, which enhance the data quality, are crucial steps in adapting large language models (LLMs) to specific downstream tasks. We tackle...
- G-Net: A Provably Easy Construction of High-Accuracy Random Binary Neural Networks : Abstract: We propose a novel randomized algorithm for constructing binary neural networks with tunable accuracy. This approach is motivated by hyperdimensional computing (HDC), which is a brain-inspir...
- Deceptron: Learned Local Inverses for Fast and Stable Physics Inversion : Abstract: Inverse problems in the physical sciences are often ill-conditioned in input space, making progress step-size sensitive. We propose the Deceptron, a lightweight bidirectional module that lea...
- Generative Early Stage Ranking : Abstract: Large-scale recommendations commonly adopt a multi-stage cascading ranking system paradigm to balance effectiveness and efficiency. Early Stage Ranking (ESR) systems utilize the "user-item d...
- BRIDGE: Building Representations In Domain Guided Program Verification : Abstract: Large language models (LLMs) have achieved impressive results in code generation, yet struggle with program verification, especially in interactive proof frameworks such as Lean4. A central ...
- Interpretable Fair Clustering : Abstract: Fair clustering has gained increasing attention in recent years, especially in applications involving socially sensitive attributes. However, existing fair clustering methods often lack inte...
- Trustless Federated Learning at Edge-Scale: A Compositional Architecture for Decentralized, Verifiable, and Incentive-Aligned Coordination : Abstract: Artificial intelligence is retracing the Internet's path from centralized provision to distributed creation. Initially, resource-intensive computation concentrates within institutions capabl...
- How to Correctly Report LLM-as-a-Judge Evaluations : Abstract: Large language models (LLMs) are increasingly used as evaluators in lieu of humans. While scalable, their judgments are noisy due to imperfect specificity and sensitivity of LLMs, leading to...
- I-GLIDE: Input Groups for Latent Health Indicators in Degradation Estimation : Abstract: Accurate remaining useful life (RUL) prediction hinges on the quality of health indicators (HIs), yet existing methods often fail to disentangle complex degradation mechanisms in multi-senso...
- Robust Gene Prioritization via Fast-mRMR Feature Selection in high-dimensional omics data : Abstract: Gene prioritization (identifying genes potentially associated with a biological process) is increasingly tackled with Artificial Intelligence. However, existing methods struggle with the hig...
- A Physics-Informed U-net-LSTM Network for Data-Driven Seismic Response Modeling of Structures : Abstract: Accurate and efficient seismic response prediction is essential for the design of resilient structures. While the Finite Element Method (FEM) remains the standard for nonlinear seismic analy...
- Sawtooth Sampling for Time Series Denoising Diffusion Implicit Models : Abstract: Denoising Diffusion Probabilistic Models (DDPMs) can generate synthetic timeseries data to help improve the performance of a classifier, but their sampling process is computationally expensi...
- TSGM: Regular and Irregular Time-series Generation using Score-based Generative Models : Abstract: Score-based generative models (SGMs) have demonstrated unparalleled sampling quality and diversity in numerous fields, such as image generation, voice synthesis, and tabular data synthesis, ...
- Masks Can Be Distracting: On Context Comprehension in Diffusion Language Models : Abstract: Masked Diffusion Language Models (MDLMs) have recently emerged as a promising alternative to Autoregressive Language Models (ARLMs), leveraging a denoising objective that, in principle, shou...
- Best Practices for Machine Learning Experimentation in Scientific Applications : Abstract: Machine learning (ML) is increasingly adopted in scientific research, yet the quality and reliability of results often depend on how experiments are designed and documented. Poor baselines, ...
- BanglaMM-Disaster: A Multimodal Transformer-Based Deep Learning Framework for Multiclass Disaster Classification in Bangla : Abstract: Natural disasters remain a major challenge for Bangladesh, so real-time monitoring and quick response systems are essential. In this study, we present BanglaMM-Disaster, an end-to-end deep l...
- Controlling changes to attention logits : Abstract: Stability of neural network weights is critical when training transformer models. The query and key weights are particularly problematic, as they tend to grow large without any intervention....
- BanglaASTE: A Novel Framework for Aspect-Sentiment-Opinion Extraction in Bangla E-commerce Reviews Using Ensemble Deep Learning : Abstract: Aspect-Based Sentiment Analysis (ABSA) has emerged as a critical tool for extracting fine-grained sentiment insights from user-generated content, particularly in e-commerce and social media ...
- SUPN: Shallow Universal Polynomial Networks : Abstract: Deep neural networks (DNNs) and Kolmogorov-Arnold networks (KANs) are popular methods for function approximation due to their flexibility and expressivity. However, they typically require a ...
- Ensemble Performance Through the Lens of Linear Independence of Classifier Votes in Data Streams : Abstract: Ensemble learning improves classification performance by combining multiple base classifiers. While increasing the number of classifiers generally enhances accuracy, excessively large ensemb...
- Mean-Field Limits for Two-Layer Neural Networks Trained with Consensus-Based Optimization : Abstract: We study two-layer neural networks and train these with a particle-based method called consensus-based optimization (CBO). We compare the performance of CBO against Adam on two test cases an...
- Lost in Time? A Meta-Learning Framework for Time-Shift-Tolerant Physiological Signal Transformation : Abstract: Translating non-invasive signals such as photoplethysmography (PPG) and ballistocardiography (BCG) into clinically meaningful signals like arterial blood pressure (ABP) is vital for continuo...
- IntAttention: A Fully Integer Attention Pipeline for Efficient Edge Inference : Abstract: Deploying Transformer models on edge devices is limited by latency and energy budgets. While INT8 quantization effectively accelerates the primary matrix multiplications, it exposes the soft...
- Context-Specific Causal Graph Discovery with Unobserved Contexts: Non-Stationarity, Regimes and Spatio-Temporal Patterns : Abstract: Real-world data, for example in climate applications, often consists of spatially gridded time series data or data with comparable structure. While the underlying system is often believed to...
- Computing Strategic Responses to Non-Linear Classifiers : Abstract: We consider the problem of strategic classification, where the act of deploying a classifier leads to strategic behaviour that induces a distribution shift on subsequent observations. Curren...
- Machine Learning Approaches to Clinical Risk Prediction: Multi-Scale Temporal Alignment in Electronic Health Records : Abstract: This study proposes a risk prediction method based on a Multi-Scale Temporal Alignment Network (MSTAN) to address the challenges of temporal irregularity, sampling interval differences, and ...
- A decoupled alignment kernel for peptide membrane permeability predictions : Abstract: Cyclic peptides are promising modalities for targeting intracellular sites; however, cell-membrane permeability remains a key bottleneck, exacerbated by limited public data and the need for ...
- Learning When to Stop: Adaptive Latent Reasoning via Reinforcement Learning : Abstract: Latent reasoning represents a new development in Transformer language models that has shown potential in compressing reasoning lengths compared to chain-of-thought reasoning. By directly pas...
- An AI-Enabled Hybrid Cyber-Physical Framework for Adaptive Control in Smart Grids : Abstract: Smart grids are a fusion of classical power infrastructure and advanced communication networks and smart control, to create a cyber-physical environment that is more efficient and flexible t...
- Visualizing LLM Latent Space Geometry Through Dimensionality Reduction : Abstract: Large language models (LLMs) achieve state-of-the-art results across many natural language tasks, but their internal mechanisms remain difficult to interpret. In this work, we extract, proce...
- Aligning LLMs Toward Multi-Turn Conversational Outcomes Using Iterative PPO : Abstract: Optimizing large language models (LLMs) for multi-turn conversational outcomes remains a significant challenge, especially in goal-oriented settings like AI marketing or sales agents who fac...
- EvilGenie: A Reward Hacking Benchmark : Abstract: We introduce EvilGenie, a benchmark for reward hacking in programming settings. We source problems from LiveCodeBench and create an environment in which agents can easily reward hack, such a...
- DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving : Abstract: Large language model (LLM) inference often suffers from high decoding latency and limited scalability across heterogeneous edge-cloud environments. Existing speculative decoding (SD) techniq...
- Harmonic Token Projection (HTP): A Vocabulary-Free, Training-Free, Deterministic, and Reversible Embedding Methodology : Abstract: This paper introduces the Harmonic Token Projection (HTP), a reversible and deterministic framework for generating text embeddings without training, vocabularies, or stochastic parameters. U...
- Cryptocurrency Portfolio Management with Reinforcement Learning: Soft Actor-Critic and Deep Deterministic Policy Gradient Algorithms : Abstract: This paper proposes a reinforcement learning-based framework for cryptocurrency portfolio management using the Soft Actor-Critic (SAC) and Deep Deterministic Policy Gradient (DDPG) algorit...
- Dual-Domain Deep Learning Method to Accelerate Local Basis Functions Computation for Reservoir Simulation in High-Contrast Porous Media : Abstract: In energy science, Darcy flow in heterogeneous porous media is a central problem in reservoir simulation. However, the pronounced multiscale characteristics of such media pose significant c...
- The Human Brain as a Combinatorial Complex : Abstract: We propose a framework for constructing combinatorial complexes (CCs) from fMRI time series data that captures both pairwise and higher-order neural interactions through information-theoreti...
- A Set of Rules for Model Validation : Abstract: The validation of a data-driven model is the process of assessing the model's ability to generalize to new, unseen data in the population of interest. This paper proposes a set of general ru...
- $\Delta$-NeRF: Incremental Refinement of Neural Radiance Fields through Residual Control and Knowledge Transfer : Abstract: Neural Radiance Fields (NeRFs) have demonstrated remarkable capabilities in 3D reconstruction and novel view synthesis. However, most existing NeRF frameworks require complete retraining whe...
- Accelerating Sparse Convolutions in Voxel-Based Point Cloud Networks : Abstract: Sparse Convolution (SpC) powers 3D point cloud networks widely used in autonomous driving and AR/VR. SpC builds a kernel map that stores mappings between input voxel coordinates, output coor...
- When Features Beat Noise: A Feature Selection Technique Through Noise-Based Hypothesis Testing : Abstract: Feature selection has remained a daunting challenge in machine learning and artificial intelligence, where increasingly complex, high-dimensional datasets demand principled strategies for is...
- A review on data fusion in multimodal learning analytics and educational data mining : Abstract: New educational models such as smart learning environments use digital and context-aware devices to facilitate the learning process. In this new educational scenario, a huge quantity ...
- Deep Learning as a Convex Paradigm of Computation: Minimizing Circuit Size with ResNets : Abstract: This paper argues that DNNs implement a computational Occam's razor -- finding the 'simplest' algorithm that fits the data -- and that this could explain their incredible and wide-ranging su...
- Readout-Side Bypass for Residual Hybrid Quantum-Classical Models : Abstract: Quantum machine learning (QML) promises compact and expressive representations, but suffers from the measurement bottleneck - a narrow quantum-to-classical readout that limits performance an...
- Fusion of classical and quantum kernels enables accurate and robust two-sample tests : Abstract: Two-sample tests have been extensively employed in various scientific fields and machine learning such as evaluation on the effectiveness of drugs and A/B testing on different marketing stra...
- Geometric Calibration and Neutral Zones for Uncertainty-Aware Multi-Class Classification : Abstract: Modern artificial intelligence systems make critical decisions yet often fail silently when uncertain. We develop a geometric framework for post-hoc calibration of neural network probability...
- Crowdsourcing the Frontier: Advancing Hybrid Physics-ML Climate Simulation via $50,000 Kaggle Competition : Abstract: Subgrid machine-learning (ML) parameterizations have the potential to introduce a new generation of climate models that incorporate the effects of higher-resolution physics without incurring...
- RosettaSpeech: Zero-Shot Speech-to-Speech Translation from Monolingual Data : Abstract: The scarcity of parallel speech corpora critically hampers speech-to-speech translation (S2ST), often forcing reliance on complex, multi-stage pipelines. This paper introduces RosettaSpeech,...
- Independent policy gradient-based reinforcement learning for economic and reliable energy management of multi-microgrid systems : Abstract: Efficiency and reliability are both crucial for energy management, especially in multi-microgrid systems (MMSs) integrating intermittent and distributed renewable energy sources. This study ...
- Wavefront-Constrained Passive Obscured Object Detection : Abstract: Accurately localizing and segmenting obscured objects from faint light patterns beyond the field of view is highly challenging due to multiple scattering and medium-induced perturbations. Mo...
- ASR Error Correction in Low-Resource Burmese with Alignment-Enhanced Transformers using Phonetic Features : Abstract: This paper investigates sequence-to-sequence Transformer models for automatic speech recognition (ASR) error correction in low-resource Burmese, focusing on different feature integration str...
- MortgageLLM: Domain-Adaptive Pretraining with Residual Instruction Transfer, Alignment Tuning, and Task-Specific Routing : Abstract: Large Language Models (LLMs) demonstrate exceptional capabilities across general domains, yet their application to specialized sectors such as mortgage finance requires domain-specific knowl...
- Nonconvex Penalized LAD Estimation in Partial Linear Models with DNNs: Asymptotic Analysis and Proximal Algorithms : Abstract: This paper investigates the partial linear model by Least Absolute Deviation (LAD) regression. We parameterize the nonparametric term using Deep Neural Networks (DNNs) and formulate a penali...
- Lattice-to-total thermal conductivity ratio: a phonon-glass electron-crystal descriptor for data-driven thermoelectric design : Abstract: Thermoelectrics (TEs) are promising candidates for energy harvesting with performance quantified by figure of merit, $ZT$. To accelerate the discovery of high-$ZT$ materials, efforts have fo...
- From Diffusion to One-Step Generation: A Comparative Study of Flow-Based Models with Application to Image Inpainting : Abstract: We present a comprehensive comparative study of three generative modeling paradigms: Denoising Diffusion Probabilistic Models (DDPM), Conditional Flow Matching (CFM), and MeanFlow. While DDP...
- Maxitive Donsker-Varadhan Formulation for Possibilistic Variational Inference : Abstract: Variational inference (VI) is a cornerstone of modern Bayesian learning, enabling approximate inference in complex models that would otherwise be intractable. However, its formulation depend...
- RISC-V Based TinyML Accelerator for Depthwise Separable Convolutions in Edge AI : Abstract: The increasing demand for on-device intelligence in Edge AI and TinyML applications requires the efficient execution of modern Convolutional Neural Networks (CNNs). While lightweight archite...
- The Spheres Dataset: Multitrack Orchestral Recordings for Music Source Separation and Information Retrieval : Abstract: This paper introduces The Spheres dataset, multitrack orchestral recordings designed to advance machine learning research in music source separation and related MIR tasks within the classica...
- Estimation in high-dimensional linear regression: Post-Double-Autometrics as an alternative to Post-Double-Lasso : Abstract: Post-Double-Lasso is becoming the most popular method for estimating linear regression models with many covariates when the purpose is to obtain an accurate estimate of a parameter of intere...
- On the Periodic Orbits of the Dual Logarithmic Derivative Operator : Abstract: We study the periodic behaviour of the dual logarithmic derivative operator $\mathcal{A}[f]=\mathrm{d}\ln f/\mathrm{d}\ln x$ in a complex analytic setting. We show that $\mathcal{A}$ admits ...
- Phase-Aware Code-Aided EM Algorithm for Blind Channel Estimation in PSK-Modulated OFDM : Abstract: This paper presents a fully blind phase-aware expectation-maximization (EM) algorithm for OFDM systems with the phase-shift keying (PSK) modulation. We address the well-known local maximum p...
- Learning Multi-Order Block Structure in Higher-Order Networks : Abstract: Higher-order networks, naturally described as hypergraphs, are essential for modeling real-world systems involving interactions among three or more entities. Stochastic block models offer a ...
- Differentiable Physics-Neural Models enable Learning of Non-Markovian Closures for Accelerated Coarse-Grained Physics Simulations : Abstract: Numerical simulations provide key insights into many physical, real-world problems. However, while these simulations are solved on a full 3D domain, most analyses only require a reduced set ...
- Odin: Oriented Dual-module Integration for Text-rich Network Representation Learning : Abstract: Text-attributed graphs require models to effectively combine strong textual understanding with structurally informed reasoning. Existing approaches either rely on GNNs--limited by over-smoot...
- A Systematic Study of Model Merging Techniques in Large Language Models : Abstract: Model merging combines multiple fine-tuned checkpoints into a single model without additional training, offering an attractive approach to reusing models and efficiently improving performanc...
- Phase Transition for Stochastic Block Model with more than $\sqrt{n}$ Communities (II) : Abstract: A fundamental theoretical question in network analysis is to determine under which conditions community recovery is possible in polynomial time in the Stochastic Block Model (SBM). When the ...
- MMA: A Momentum Mamba Architecture for Human Activity Recognition with Inertial Sensors : Abstract: Human activity recognition (HAR) from inertial sensors is essential for ubiquitous computing, mobile health, and ambient intelligence. Conventional deep models such as Convolutional Neural N...
- TAB-DRW: A DFT-based Robust Watermark for Generative Tabular Data : Abstract: The rise of generative AI has enabled the production of high-fidelity synthetic tabular data across fields such as healthcare, finance, and public policy, raising growing concerns about data...
- Beyond Accuracy: An Empirical Study of Uncertainty Estimation in Imputation : Abstract: Handling missing data is a central challenge in data-driven analysis. Modern imputation methods not only aim for accurate reconstruction but also differ in how they represent and quantify un...
- On Evolution-Based Models for Experimentation Under Interference : Abstract: Causal effect estimation in networked systems is central to data-driven decision making. In such settings, interventions on one unit can spill over to others, and in complex physical or soci...
- TraceGen: World Modeling in 3D Trace Space Enables Learning from Cross-Embodiment Videos : Abstract: Learning new robot tasks on new platforms and in new scenes from only a handful of demonstrations remains challenging. While videos of other embodiments - humans and different robots - are a...
- Single- vs. Dual-Policy Reinforcement Learning for Dynamic Bike Rebalancing : Abstract: Bike-sharing systems (BSS) provide a sustainable urban mobility solution, but ensuring their reliability requires effective rebalancing strategies to address stochastic demand and prevent st...
- Federated Learning: A Stochastic Approximation Approach : Abstract: This paper considers federated learning (FL) in a stochastic approximation (SA) framework. Here, each client $i$ trains a local model using its dataset $\mathcal{D}^{(i)}$ and periodical...
- CTSyn: A Foundation Model for Cross Tabular Data Generation : Abstract: Generative Foundation Models (GFMs) have achieved remarkable success in producing high-quality synthetic data for images and text. However, their application to tabular data presents signifi...
- Federated Large Language Models: Current Progress and Future Directions : Abstract: Large language models are rapidly gaining popularity and have been widely adopted in real-world applications. While the quality of training data is essential, privacy concerns arise during d...
- On the Effectiveness of Adversarial Training on Malware Classifiers : Abstract: Adversarial Training (AT) is a key defense against Machine Learning evasion attacks, but its effectiveness for real-world malware detection remains poorly understood. This uncertainty stems ...
- G$^2$VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning : Abstract: Vision-Language Models (VLMs) still lack robustness in spatial intelligence, demonstrating poor performance on spatial understanding and reasoning tasks. We attribute this gap to the absence...
- ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration : Abstract: Large language models are powerful generalists, yet solving deep and complex problems such as those of the Humanity's Last Exam (HLE) remains both conceptually challenging and computationall...
- Revisiting Generalization Across Difficulty Levels: It's Not So Easy : Abstract: We investigate how well large language models (LLMs) generalize across different task difficulties, a key question for effective data curation and evaluation. Existing research is mixed rega...
- Earth Observation Satellite Scheduling with Graph Neural Networks and Monte Carlo Tree Search : Abstract: Earth Observation Satellite Planning (EOSP) is a difficult optimization problem with considerable practical interest. A set of requested observations must be scheduled on an agile Earth obse...
- Safe and Economical UAV Trajectory Planning in Low-Altitude Airspace: A Hybrid DRL-LLM Approach with Compliance Awareness : Abstract: The rapid growth of the low-altitude economy has driven the widespread adoption of unmanned aerial vehicles (UAVs). This growing deployment presents new challenges for UAV trajectory plannin...
- Simulated Self-Assessment in Large Language Models: A Psychometric Approach to AI Self-Efficacy : Abstract: Self-assessment is a key aspect of reliable intelligence, yet evaluations of large language models (LLMs) focus mainly on task accuracy. We adapted the 10-item General Self-Efficacy Scale (G...
- Failure Modes in LLM Systems: A System-Level Taxonomy for Reliable AI Applications : Abstract: Large language models (LLMs) are being rapidly integrated into decision-support tools, automation workflows, and AI-enabled software systems. However, their behavior in production environmen...
- Universe of Thoughts: Enabling Creative Reasoning with Large Language Models : Abstract: Reasoning based on Large Language Models (LLMs) has garnered increasing attention due to outstanding performance of these models in mathematical and complex logical tasks. Beginning with the...
- FRAGMENTA: End-to-end Fragmentation-based Generative Model with Agentic Tuning for Drug Lead Optimization : Abstract: Molecule generation using generative AI is vital for drug discovery, yet class-specific datasets often contain fewer than 100 training examples. While fragment-based models handle limited da...
- ProtoPFormer: Concentrating on Prototypical Parts in Vision Transformers for Interpretable Image Recognition : Abstract: Prototypical part network (ProtoPNet) has drawn wide attention and boosted many follow-up studies due to its self-explanatory property for explainable artificial intelligence (XAI). However,...
- Dual-Balancing for Multi-Task Learning : Abstract: Multi-task learning aims to learn multiple related tasks simultaneously and has achieved great success in various fields. However, the disparity in loss and gradient scales among tasks often...
- Natural Strategic Ability in Stochastic Multi-Agent Systems : Abstract: Strategies synthesized using formal methods can be complex and often require infinite memory, which does not correspond to the expected behavior when trying to model Multi-Agent Systems (MAS...
- Data Valuation by Fusing Global and Local Statistical Information : Abstract: Data valuation has garnered increasing attention in recent years, given the critical role of high-quality data in various applications. Among diverse data valuation approaches, Shapley value...
- Safety Control of Service Robots with LLMs and Embodied Knowledge Graphs : Abstract: Safety limitations in service robotics across various industries have raised significant concerns about the need for robust mechanisms ensuring that robots adhere to safe practices, thereby ...
- Data-Driven Lipschitz Continuity: A Cost-Effective Approach to Improve Adversarial Robustness : Abstract: As deep neural networks (DNNs) are increasingly deployed in sensitive applications, ensuring their security and robustness has become critical. A major threat to DNNs arises from adversarial...
- SOAP: Enhancing Spatio-Temporal Relation and Motion Information Capturing for Few-Shot Action Recognition : Abstract: High frame-rate (HFR) videos of action recognition improve fine-grained expression while reducing the spatio-temporal relation and motion information density. Thus, large amounts of video sa...
- CoxKAN: Kolmogorov-Arnold Networks for Interpretable, High-Performance Survival Analysis : Abstract: Motivation: Survival analysis is a branch of statistics that is crucial in medicine for modeling the time to critical events such as death or relapse, in order to improve treatment strategie...
- Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models : Abstract: Recent studies have indicated that effectively utilizing inference-time compute is crucial for attaining better performance from large language models (LLMs). In this work, we propose a nove...
- Human Experts' Evaluation of Generative AI for Contextualizing STEAM Education in the Global South : Abstract: This study investigates how human experts evaluate the capacity of Generative AI (GenAI) to contextualize STEAM education in the Global South, with a focus on Ghana. Using a convergent mixed...
- R3A: Reliable RTL Repair Framework with Multi-Agent Fault Localization and Stochastic Tree-of-Thoughts Patch Generation : Abstract: Repairing RTL bugs is crucial for hardware design and verification. Traditional automatic program repair (APR) methods define dedicated search spaces to locate and fix bugs with program synt...
- LLMs for Automated Unit Test Generation and Assessment in Java: The AgoneTest Framework : Abstract: Unit testing is an essential but resource-intensive step in software development, ensuring individual code units function correctly. This paper introduces AgoneTest, an automated evaluation ...
- Pretraining Transformer-Based Models on Diffusion-Generated Synthetic Graphs for Alzheimer's Disease Prediction : Abstract: Early and accurate detection of Alzheimer's disease (AD) is crucial for enabling timely intervention and improving outcomes. However, developing reliable machine learning (ML) models for AD ...
- CHiQPM: Calibrated Hierarchical Interpretable Image Classification : Abstract: Globally interpretable models are a promising approach for trustworthy AI in safety-critical domains. Alongside global explanations, detailed local explanations are a crucial complement to e...
- Effects of Initialization Biases on Deep Neural Network Training Dynamics : Abstract: Untrained large neural networks, just after random initialization, tend to favour a small subset of classes, assigning high predicted probabilities to these few classes and approximately zer...
- Autoregressive Surrogate Modeling of the Solar Wind with Spherical Fourier Neural Operator : Abstract: The solar wind, a continuous outflow of charged particles from the Sun's corona, shapes the heliosphere and impacts space systems near Earth. Accurate prediction of features such as high-spe...
- Representation Integrity in Temporal Graph Learning Methods : Abstract: Real-world systems ranging from airline routes to cryptocurrency transfers are naturally modelled as dynamic graphs whose topology changes over time. Conventional benchmarks judge dynamic-gr...
- Probabilistic Hash Embeddings for Online Learning of Categorical Features : Abstract: We study streaming data with categorical features where the vocabulary of categorical feature values is changing and can even grow unboundedly over time. Feature hashing is commonly used as ...
- Operationalizing Quantized Disentanglement : Abstract: Recent theoretical work established the unsupervised identifiability of quantized factors under any diffeomorphism. The theory assumes that quantization thresholds correspond to axis-aligned...
- Semantic Superiority vs. Forensic Efficiency: A Comparative Analysis of Deep Learning and Psycholinguistics for Business Email Compromise Detection : Abstract: Business Email Compromise (BEC) is a sophisticated social engineering threat that manipulates organizational hierarchies and exploits psychological vulnerabilities, leading to significant fi...
- Dataset Poisoning Attacks on Behavioral Cloning Policies : Abstract: Behavior Cloning (BC) is a popular framework for training sequential decision policies from expert demonstrations via supervised learning. As these policies are increasingly being deployed i...
- Estimating Ising Models in Total Variation Distance : Abstract: We consider the problem of estimating Ising models over $n$ variables in Total Variation (TV) distance, given $l$ independent samples from the model. While the statistical complexity of the ...
- ChatGpt Content detection: A new approach using xlm-roberta alignment : Abstract: The challenge of separating AI-generated text from human-authored content is becoming more urgent as generative AI technologies like ChatGPT become more widely available. In this work, we ad...
- Staggered Environment Resets Improve Massively Parallel On-Policy Reinforcement Learning : Abstract: Massively parallel GPU simulation environments have accelerated reinforcement learning (RL) research by enabling fast data collection for on-policy RL algorithms like Proximal Policy Optimiz...
- Gated KalmaNet: A Fading Memory Layer Through Test-Time Ridge Regression : Abstract: As efficient alternatives to softmax Attention, linear state-space models (SSMs) achieve constant memory and linear compute, but maintain only a lossy, fading summary of the past, often lead...
- A Probabilistic Framework for Temporal Distribution Generalization in Industry-Scale Recommender Systems : Abstract: Temporal distribution shift (TDS) erodes the long-term accuracy of recommender systems, yet industrial practice still relies on periodic incremental training, which struggles to capture both...
- Prediction of Herd Life in Dairy Cows Using Multi-Head Attention Transformers : Abstract: Dairy farmers should decide to keep or cull a cow based on an objective assessment of her likely performance in the herd. For this purpose, farmers need to identify more resilient cows, whic...
- RAVQ-HoloNet: Rate-Adaptive Vector-Quantized Hologram Compression : Abstract: Holography offers significant potential for AR/VR applications, yet its adoption is limited by the high demands of data compression. Existing deep learning approaches generally lack rate ada...
- CNN-LSTM Hybrid Architecture for Over-the-Air Automatic Modulation Classification Using SDR : Abstract: Automatic Modulation Classification (AMC) is a core technology for future wireless communication systems, enabling the identification of modulation schemes without prior knowledge. This capa...
- Tool-RoCo: An Agent-as-Tool Self-organization Large Language Model Benchmark in Multi-robot Cooperation : Abstract: This study proposes Tool-RoCo, a novel benchmark for evaluating large language models (LLMs) in long-term multi-agent cooperation based on RoCo, a multi-robot cooperative benchmark. Recent r...
- Mechanistic Interpretability for Transformer-based Time Series Classification : Abstract: Transformer-based models have become state-of-the-art tools in various machine learning tasks, including time series classification, yet their complexity makes understanding their internal d...
- Voice, Bias, and Coreference: An Interpretability Study of Gender in Speech Translation : Abstract: Unlike text, speech conveys information about the speaker, such as gender, through acoustic cues like pitch. This gives rise to modality-specific bias concerns. For example, in speech transl...
- Predictive Safety Shield for Dyna-Q Reinforcement Learning : Abstract: Obtaining safety guarantees for reinforcement learning is a major challenge to achieve applicability for real-world tasks. Safety shields extend standard reinforcement learning and achieve h...
- VacuumVLA: Boosting VLA Capabilities via a Unified Suction and Gripping Tool for Complex Robotic Manipulation : Abstract: Vision Language Action models have significantly advanced general purpose robotic manipulation by harnessing large scale pretrained vision and language representations. Among existing approa...
- BAMAS: Structuring Budget-Aware Multi-Agent Systems : Abstract: Large language model (LLM)-based multi-agent systems have emerged as a powerful paradigm for enabling autonomous agents to solve complex tasks. As these systems scale in complexity, cost bec...
- Multimodal Robust Prompt Distillation for 3D Point Cloud Models : Abstract: Adversarial attacks pose a significant threat to learning-based 3D point cloud models, critically undermining their reliability in security-sensitive applications. Existing defense methods o...
- HarmonicAttack: An Adaptive Cross-Domain Audio Watermark Removal : Abstract: The availability of high-quality, AI-generated audio raises security challenges such as misinformation campaigns and voice-cloning fraud. A key defense against the misuse of AI-generated aud...
- Model-Based Policy Adaptation for Closed-Loop End-to-End Autonomous Driving : Abstract: End-to-end (E2E) autonomous driving models have demonstrated strong performance in open-loop evaluations but often suffer from cascading errors and poor generalization in closed-loop setting...
- Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining : Abstract: Incorporating metadata in Large Language Models (LLMs) pretraining has recently emerged as a promising approach to accelerate training. However, prior work highlighted only one useful signal-...
- On the Origin of Algorithmic Progress in AI : Abstract: Algorithms have been estimated to increase AI training FLOP efficiency by a factor of 22,000 between 2012 and 2023 [Ho et al., 2024]. Running small-scale ablation experiments on key innovati...
- Scale-Agnostic Kolmogorov-Arnold Geometry in Neural Networks : Abstract: Recent work by Freedman and Mulligan demonstrated that shallow multilayer perceptrons spontaneously develop Kolmogorov-Arnold geometric (KAG) structure during training on synthetic three-dim...
- Qwen3-VL Technical Report : Abstract: We introduce Qwen3-VL, the most capable vision-language model in the Qwen series to date, achieving superior performance across a broad range of multimodal benchmarks. It natively supports i...
- Mechanisms of Non-Monotonic Scaling in Vision Transformers : Abstract: Deeper Vision Transformers often perform worse than shallower ones, which challenges common scaling assumptions. Through a systematic empirical analysis of ViT-S, ViT-B, and ViT-L on ImageNe...
- Continual Error Correction on Low-Resource Devices : Abstract: The proliferation of AI models in everyday devices has highlighted a critical challenge: prediction errors that degrade user experience. While existing solutions focus on error detection, th...
- Attention-Guided Patch-Wise Sparse Adversarial Attacks on Vision-Language-Action Models : Abstract: In recent years, Vision-Language-Action (VLA) models in embodied intelligence have developed rapidly. However, existing adversarial attack methods require costly end-to-end training and ofte...
- Escaping the Verifier: Learning to Reason via Demonstrations : Abstract: Training Large Language Models (LLMs) to reason often relies on Reinforcement Learning (RL) with task-specific verifiers. However, many real-world reasoning-intensive tasks lack verifiers, d...
- Through the telecom lens: Are all training samples important? : Abstract: The rise of AI in telecommunications, from optimizing Radio Access Networks to managing user experience, has sharply increased data volumes and training demands. Telecom data is often noisy,...
- Matrix: Peer-to-Peer Multi-Agent Synthetic Data Generation Framework : Abstract: Synthetic data has become increasingly important for training large language models, especially when real data is scarce, expensive, or privacy-sensitive. Many such generation tasks require ...
- Learning from Risk: LLM-Guided Generation of Safety-Critical Scenarios with Prior Knowledge : Abstract: Autonomous driving faces critical challenges in rare long-tail events and complex multi-agent interactions, which are scarce in real-world data yet essential for robust safety validation. Th...
- Spatio-Temporal Trajectory Foundation Model - Recent Advances and Future Directions : Abstract: Foundation models (FMs) have emerged as a powerful paradigm, enabling a diverse range of data analytics and knowledge discovery tasks across scientific fields. Inspired by the success of FMs...
- Data-Driven Methods and AI in Engineering Design: A Systematic Literature Review Focusing on Challenges and Opportunities : Abstract: The increasing availability of data and advancements in computational intelligence have accelerated the adoption of data-driven methods (DDMs) in product development. However, their integrat...
- InvisibleBench: A Deployment Gate for Caregiving Relationship AI : Abstract: InvisibleBench is a deployment gate for caregiving-relationship AI, evaluating 3-20+ turn interactions across five dimensions: Safety, Compliance, Trauma-Informed Design, Belonging/Cultural ...
- Large Language Models' Complicit Responses to Illicit Instructions across Socio-Legal Contexts : Abstract: Large language models (LLMs) are now deployed at unprecedented scale, assisting millions of users in daily tasks. However, the risk of these models assisting unlawful activities remains unde...
- CANVAS: A Benchmark for Vision-Language Models on Tool-Based User Interface Design : Abstract: User interface (UI) design is an iterative process in which designers progressively refine their work with design software such as Figma or Sketch. Recent advances in vision language models ...
- Adversarial Multi-Task Learning for Liver Tumor Segmentation, Dynamic Enhancement Regression, and Classification : Abstract: Liver tumor segmentation, dynamic enhancement regression, and classification are critical for clinical assessment and diagnosis. However, no prior work has attempted to achieve these tasks s...
- Revisiting KRISP: A Lightweight Reproduction and Analysis of Knowledge-Enhanced Vision-Language Models : Abstract: Facebook AI Research introduced KRISP [4], which integrates structured external knowledge into pipelines for vision-language reasoning. Despite its effectiveness, the original model has been...
- Physics Steering: Causal Control of Cross-Domain Concepts in a Physics Foundation Model : Abstract: Recent advances in mechanistic interpretability have revealed that large language models (LLMs) develop internal representations corresponding not only to concrete entities but also distinct...
- Memories Retrieved from Many Paths: A Multi-Prefix Framework for Robust Detection of Training Data Leakage in Large Language Models : Abstract: Large language models, trained on massive corpora, are prone to verbatim memorization of training data, creating significant privacy and copyright risks. While previous works have proposed v...
- Conformal Safety Monitoring for Flight Testing: A Case Study in Data-Driven Safety Learning : Abstract: We develop a data-driven approach for runtime safety monitoring in flight testing, where pilots perform maneuvers on aircraft with uncertain parameters. Because safety violations can arise u...
- SPHINX: A Synthetic Environment for Visual Perception and Reasoning : Abstract: We present Sphinx, a synthetic environment for visual perception and reasoning that targets core cognitive primitives. Sphinx procedurally generates puzzles using motifs, tiles, charts, icon...
- Training-Free Diffusion Priors for Text-to-Image Generation via Optimization-based Visual Inversion : Abstract: Diffusion models have established the state-of-the-art in text-to-image generation, but their performance often relies on a diffusion prior network to translate text embeddings into the visu...
- RefTr: Recurrent Refinement of Confluent Trajectories for 3D Vascular Tree Centerline Graphs : Abstract: Tubular trees, such as blood vessels and lung airways, are essential for material transport within the human body. Accurately detecting their centerlines with correct tree topology is critic...
- Structured Prompting Enables More Robust, Holistic Evaluation of Language Models : Abstract: As language models (LMs) are increasingly adopted across domains, high-quality benchmarking frameworks that accurately estimate performance are essential for guiding deployment decisions. Wh...
- Primal: A Unified Deterministic Framework for Quasi-Orthogonal Hashing and Manifold Learning : Abstract: We present Primal, a deterministic feature mapping framework that harnesses the number-theoretic independence of prime square roots to construct robust, tunable vector representations. Diver...
- A Review of Pseudospectral Optimal Control: From Theory to Flight : Abstract: The home space for optimal control is a Sobolev space. The home space for pseudospectral theory is also a Sobolev space. It thus seems natural to combine pseudospectral theory with optimal c...
- Pre-train to Gain: Robust Learning Without Clean Labels : Abstract: Training deep networks with noisy labels leads to poor generalization and degraded accuracy due to overfitting to label noise. Existing approaches for learning with noisy labels often rely o...
- NOIR 2.0: Neural Signal Operated Intelligent Robots for Everyday Activities : Abstract: The Neural Signal Operated Intelligent Robots (NOIR) system is a versatile brain-robot interface that allows humans to control robots for daily tasks using their brain signals. This interface ut...
- Length-MAX Tokenizer for Language Models : Abstract: We introduce a new tokenizer for language models that minimizes the average tokens per character, thereby reducing the number of tokens needed to represent text during training and to genera...
- MODEST: Multi-Optics Depth-of-Field Stereo Dataset : Abstract: Reliable depth estimation under real optical conditions remains a core challenge for camera vision in systems such as autonomous robotics and augmented reality. Despite recent progress in de...
- Unsupervised Memorability Modeling from Tip-of-the-Tongue Retrieval Queries : Abstract: Visual content memorability has intrigued the scientific community for decades, with applications ranging widely, from understanding nuanced aspects of human memory to enhancing content desi...
- Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory : Abstract: Statefulness is essential for large language model (LLM) agents to perform long-term planning and problem-solving. This makes memory a critical component, yet its management and evolution re...
- Computing Evolutionarily Stable Strategies in Multiplayer Games : Abstract: We present an algorithm for computing all evolutionarily stable strategies in nondegenerate normal-form games with three or more players.
- Selecting Belief-State Approximations in Simulators with Latent States : Abstract: State resetting is a fundamental but often overlooked capability of simulators. It supports sample-based planning by allowing resets to previously encountered simulation states, and enables ...
- Test-Time Alignment of Text-to-Image Diffusion Models via Null-Text Embedding Optimisation : Abstract: Test-time alignment (TTA) aims to adapt models to specific rewards during inference. However, existing methods tend to either under-optimise or over-optimise (reward hack) the target reward ...
- A Taxonomy of Pix Fraud in Brazil: Attack Methodologies, AI-Driven Amplification, and Defensive Strategies : Abstract: This work presents a review of attack methodologies targeting Pix, the instant payment system launched by the Central Bank of Brazil in 2020. The study aims to identify and classify the main...
- Dynamic Test-Time Compute Scaling in Control Policy: Difficulty-Aware Stochastic Interpolant Policy : Abstract: Diffusion- and flow-based policies deliver state-of-the-art performance on long-horizon robotic manipulation and imitation learning tasks. However, these controllers employ a fixed inference...
- Evolved SampleWeights for Bias Mitigation: Effectiveness Depends on Optimization Objectives : Abstract: Machine learning models trained on real-world data may inadvertently make biased predictions that negatively impact marginalized communities. Reweighting is a method that can mitigate such b...
- Exploring Time-Step Size in Reinforcement Learning for Sepsis Treatment : Abstract: Existing studies on reinforcement learning (RL) for sepsis management have mostly followed an established problem setup, in which patient data are aggregated into 4-hour time steps. Although...
- Open Vocabulary Compositional Explanations for Neuron Alignment : Abstract: Neurons are the fundamental building blocks of deep neural networks, and their interconnections allow AI to achieve unprecedented results. Motivated by the goal of understanding how neurons ...
- Resilient Charging Infrastructure via Decentralized Coordination of Electric Vehicles at Scale : Abstract: The rapid adoption of electric vehicles (EVs) introduces major challenges for decentralized charging control. Existing decentralized approaches efficiently coordinate a large number of EVs t...
- SpaceX: Exploring metrics with the SPACE model for developer productivity : Abstract: This empirical investigation elucidates the limitations of deterministic, unidimensional productivity heuristics by operationalizing the SPACE framework through extensive repository mining. ...
- BUSTR: Breast Ultrasound Text Reporting with a Descriptor-Aware Vision-Language Model : Abstract: Automated radiology report generation (RRG) for breast ultrasound (BUS) is limited by the lack of paired image-report datasets and the risk of hallucinations from large language models. We p...
- Towards Audio Token Compression in Large Audio Language Models : Abstract: Large Audio Language Models (LALMs) demonstrate impressive performance across diverse tasks, ranging from speech recognition to general audio understanding. However, their scalability is lim...
- AI4X Roadmap: Artificial Intelligence for the advancement of scientific pursuit and its future directions : Abstract: Artificial intelligence and machine learning are reshaping how we approach scientific discovery, not by replacing established methods but by extending what researchers can probe, predict, an...
- Even with AI, Bijection Discovery is Still Hard: The Opportunities and Challenges of OpenEvolve for Novel Bijection Construction : Abstract: Evolutionary program synthesis systems such as AlphaEvolve, OpenEvolve, and ShinkaEvolve offer a new approach to AI-assisted mathematical discovery. These systems utilize teams of large lang...
- Subgoal Graph-Augmented Planning for LLM-Guided Open-World Reinforcement Learning : Abstract: Large language models (LLMs) offer strong high-level planning capabilities for reinforcement learning (RL) by decomposing tasks into subgoals. However, their practical utility is limited by ...
- GuardTrace-VL: Detecting Unsafe Multimodel Reasoning via Iterative Safety Supervision : Abstract: Multimodal large reasoning models (MLRMs) are increasingly deployed for vision-language tasks that produce explicit intermediate rationales. However, reasoning traces can contain unsafe cont...
- FANoise: Singular Value-Adaptive Noise Modulation for Robust Multimodal Representation Learning : Abstract: Representation learning is fundamental to modern machine learning, powering applications such as text retrieval and multimodal understanding. However, learning robust and generalizable repre...
- Knowledge Completes the Vision: A Multimodal Entity-aware Retrieval-Augmented Generation Framework for News Image Captioning : Abstract: News image captioning aims to produce journalistically informative descriptions by combining visual content with contextual cues from associated articles. Despite recent advances, existing m...
- Probabilistic Wildfire Spread Prediction Using an Autoregressive Conditional Generative Adversarial Network : Abstract: Climate change has intensified the frequency and severity of wildfires, making rapid and accurate prediction of fire spread essential for effective mitigation and response. Physics-based sim...
- Structure-Aware Prototype Guided Trusted Multi-View Classification : Abstract: Trustworthy multi-view classification (TMVC) addresses the challenge of achieving reliable decision-making in complex scenarios where multi-source information is heterogeneous, inconsistent,...
- Semantic Anchors in In-Context Learning: Why Small LLMs Cannot Flip Their Labels : Abstract: Can in-context learning (ICL) override pre-trained label semantics, or does it merely refine an existing semantic backbone? We address this question by treating LLMs as prompt-induced classi...
- FedAPA: Federated Learning with Adaptive Prototype Aggregation Toward Heterogeneous Wi-Fi CSI-based Crowd Counting : Abstract: Wi-Fi channel state information (CSI)-based sensing provides a non-invasive, device-free approach for tasks such as human activity recognition and crowd counting, but large-scale deployment ...
- Breaking the Safety-Capability Tradeoff: Reinforcement Learning with Verifiable Rewards Maintains Safety Guardrails in LLMs : Abstract: Fine-tuning large language models (LLMs) for downstream tasks typically exhibits a fundamental safety-capability tradeoff, where improving task performance degrades safety alignment even on b...
- Context-Aware Pragmatic Metacognitive Prompting for Sarcasm Detection : Abstract: Detecting sarcasm remains a challenging task in the areas of Natural Language Processing (NLP) despite recent advances in neural network approaches. Currently, Pre-trained Language Models (P...
- Aligning LLMs with Biomedical Knowledge using Balanced Fine-Tuning : Abstract: Effective post-training is essential to align Large Language Models (LLMs) with specialized biomedical knowledge to accelerate life science research. However, current approaches face signifi...
- Data-Driven Assessment of Concrete Slab Integrity via Impact-Echo Signals and Neural Networks : Abstract: Subsurface defects such as delamination, voids, and honeycombing critically affect the durability of concrete bridge decks but are difficult to detect reliably using visual inspection or man...
- Enhancing Burmese News Classification with Kolmogorov-Arnold Network Head Fine-tuning : Abstract: In low-resource languages like Burmese, classification tasks often fine-tune only the final classification layer, keeping pre-trained encoder weights frozen. While Multi-Layer Perceptrons (M...
- MLPMoE: Zero-Shot Architectural Metamorphosis of Dense LLM MLPs into Static Mixture-of-Experts : Abstract: Large Language Models (LLMs) are predominantly deployed as dense transformers, where every parameter in every feed-forward block is activated for every token. While architecturally simple, t...
- MNM : Multi-level Neuroimaging Meta-analysis with Hyperbolic Brain-Text Representations : Abstract: Various neuroimaging studies suffer from the small sample size problem, which often limits their reliability. Meta-analysis addresses this challenge by aggregating findings from different studies ...
- Pygmalion Effect in Vision: Image-to-Clay Translation for Reflective Geometry Reconstruction : Abstract: Understanding reflection remains a long-standing challenge in 3D reconstruction due to the entanglement of appearance and geometry under view-dependent reflections. In this work, we present ...
- From Bits to Rounds: Parallel Decoding with Exploration for Diffusion Language Models : Abstract: Diffusion Language Models (DLMs) have recently emerged as a strong alternative to autoregressive language models (LMs). DLMs offer comparable accuracy with faster inference speed via paralle...
- Dynamic Stratified Contrastive Learning with Upstream Augmentation for MILP Branching : Abstract: Mixed Integer Linear Programming (MILP) is a fundamental class of NP-hard problems that has garnered significant attention from both academia and industry. The Branch-and-Bound (B&B) method...
- Deformation-aware Temporal Generation for Early Prediction of Alzheimers Disease : Abstract: Alzheimer's disease (AD), a degenerative brain condition, can benefit from early prediction to slow its progression. As the disease progresses, patients typically undergo brain atrophy. Curr...
- Learning Cell-Aware Hierarchical Multi-Modal Representations for Robust Molecular Modeling : Abstract: Understanding how chemical perturbations propagate through biological systems is essential for robust molecular property prediction. While most existing methods focus on chemical structures ...
- Beyond Patch Aggregation: 3-Pass Pyramid Indexing for Vision-Enhanced Document Retrieval : Abstract: Document centric RAG pipelines usually begin with OCR, followed by brittle heuristics for chunking, table parsing, and layout reconstruction. These text first workflows are costly to maintai...
- Which Layer Causes Distribution Deviation? Entropy-Guided Adaptive Pruning for Diffusion and Flow Models : Abstract: Large-scale vision generative models, including diffusion and flow models, have demonstrated remarkable performance in visual generation tasks. However, transferring these pre-trained models...
- SocialNav: Training Human-Inspired Foundation Model for Socially-Aware Embodied Navigation : Abstract: Embodied navigation that adheres to social norms remains an open research challenge. Our SocialNav is a foundational model for socially-aware navigation with a hierarchical "brain-a...
- Efficient Training for Human Video Generation with Entropy-Guided Prioritized Progressive Learning : Abstract: Human video generation has advanced rapidly with the development of diffusion models, but the high computational cost and substantial memory consumption associated with training these models...
- Maglev-Pentabot: Magnetic Levitation System for Non-Contact Manipulation using Deep Reinforcement Learning : Abstract: Non-contact manipulation has emerged as a transformative approach across various industrial fields. However, current flexible 2D and 3D non-contact manipulation techniques are often limited ...
- LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs : Abstract: Visual encoding followed by token condensing has become the standard architectural paradigm in multi-modal large language models (MLLMs). Many recent MLLMs increasingly favor global native- ...
- CAHS-Attack: CLIP-Aware Heuristic Search Attack Method for Stable Diffusion : Abstract: Diffusion models exhibit notable fragility when faced with adversarial prompts, and strengthening attack capabilities is crucial for uncovering such vulnerabilities and building more robust ...
- Privacy in Federated Learning with Spiking Neural Networks : Abstract: Spiking neural networks (SNNs) have emerged as prominent candidates for embedded and edge AI. Their inherent low power consumption makes them far more efficient than conventional ANNs in sce...
- Progress by Pieces: Test-Time Scaling for Autoregressive Image Generation : Abstract: Recent visual autoregressive (AR) models have shown promising capabilities in text-to-image generation, operating in a manner similar to large language models. While test-time computation sc...
- When Robots Obey the Patch: Universal Transferable Patch Attacks on Vision-Language-Action Models : Abstract: Vision-Language-Action (VLA) models are vulnerable to adversarial attacks, yet universal and transferable attacks remain underexplored, as most existing patches overfit to a single model and...
- BotaCLIP: Contrastive Learning for Botany-Aware Representation of Earth Observation Data : Abstract: Foundation models have demonstrated a remarkable ability to learn rich, transferable representations across diverse modalities such as images, text, and audio. In modern machine learning pip...
- Self-Guided Defense: Adaptive Safety Alignment for Reasoning Models via Synthesized Guidelines : Abstract: Reasoning models have demonstrated remarkable capabilities in complex reasoning tasks. However, ensuring their safety against adversarial jailbreak prompts remains a critical challenge. Due ...
- Improvement of Collision Avoidance in Cut-In Maneuvers Using Time-to-Collision Metrics : Abstract: This paper proposes a new strategy for a collision avoidance system leveraging Time-to-Collision (TTC) metrics for handling cut-in scenarios, which are particularly challenging for autonomous ...
- TALES: A Taxonomy and Analysis of Cultural Representations in LLM-generated Stories : Abstract: Millions of users across the globe turn to AI chatbots for their creative needs, inviting widespread interest in understanding how such chatbots represent diverse cultures. At the same time,...
- SONAR: Spectral-Contrastive Audio Residuals for Generalizable Deepfake Detection : Abstract: Deepfake (DF) audio detectors still struggle to generalize to out of distribution inputs. A central reason is spectral bias, the tendency of neural networks to learn low-frequency structure ...
- The More, the Merrier: Contrastive Fusion for Higher-Order Multimodal Alignment : Abstract: Learning joint representations across multiple modalities remains a central challenge in multimodal machine learning. Prevailing approaches predominantly operate in pairwise settings, aligni...
- Hybrid SIFT-SNN for Efficient Anomaly Detection of Traffic Flow-Control Infrastructure : Abstract: This paper presents the SIFT-SNN framework, a low-latency neuromorphic signal-processing pipeline for real-time detection of structural anomalies in transport infrastructure. The proposed ap...
- SurgMLLMBench: A Multimodal Large Language Model Benchmark Dataset for Surgical Scene Understanding : Abstract: Recent advances in multimodal large language models (LLMs) have highlighted their potential for medical and surgical applications. However, existing surgical datasets predominantly adopt a V...
- Generating Separated Singing Vocals Using a Diffusion Model Conditioned on Music Mixtures : Abstract: Separating the individual elements in a musical mixture is an essential process for music analysis and practice. While this is generally addressed using neural networks optimized to mask or ...
- Hybrid-AIRL: Enhancing Inverse Reinforcement Learning with Supervised Expert Guidance : Abstract: Adversarial Inverse Reinforcement Learning (AIRL) has shown promise in addressing the sparse reward problem in reinforcement learning (RL) by inferring dense reward functions from expert dem...
- The Directed Prediction Change - Efficient and Trustworthy Fidelity Assessment for Local Feature Attribution Methods : Abstract: The utility of an explanation method critically depends on its fidelity to the underlying machine learning model. Especially in high-stakes medical settings, clinicians and regulators requir...
- Anomaly Detection with Adaptive and Aggressive Rejection for Contaminated Training Data : Abstract: Handling contaminated data poses a critical challenge in anomaly detection, as traditional models assume training on purely normal data. Conventional methods mitigate contamination by relyin...
- FITRep: Attention-Guided Item Representation via MLLMs : Abstract: Online platforms usually suffer from user experience degradation due to near-duplicate items with similar visuals and text. While Multimodal Large Language Models (MLLMs) enable multimodal e...
- RIA: A Ranking-Infused Approach for Optimized listwise CTR Prediction : Abstract: Reranking improves recommendation quality by modeling item interactions. However, existing methods often decouple ranking and reranking, leading to weak listwise evaluation models that suffe...
- Monet: Reasoning in Latent Visual Space Beyond Images and Language : Abstract: "Thinking with images" has emerged as an effective paradigm for advancing visual reasoning, extending beyond text-only chains of thought by injecting visual evidence into intermediate reason...
- Do Reasoning Vision-Language Models Inversely Scale in Test-Time Compute? A Distractor-centric Empirical Analysis : Abstract: How does irrelevant information (i.e., distractors) affect test-time scaling in vision-language models (VLMs)? Prior studies on language models have reported an inverse scaling effect, where...
- Training Introspective Behavior: Fine-Tuning Induces Reliable Internal State Detection in a 7B Model : Abstract: Lindsey (2025) investigates introspective awareness in language models through four experiments, finding that models can sometimes detect and identify injected activation patterns -- but unr...
- Subjective Depth and Timescale Transformers: Learning Where and When to Compute : Abstract: The rigid, uniform allocation of computation in standard Transformer (TF) architectures can limit their efficiency and scalability, particularly for large-scale models and long sequences. Ad...
- Automated Dynamic AI Inference Scaling on HPC-Infrastructure: Integrating Kubernetes, Slurm and vLLM : Abstract: Due to rising demands for Artificial Intelligence (AI) inference, especially in higher education, novel solutions utilising existing infrastructure are emerging. The utilisation of High-Perfo...
- SAM Guided Semantic and Motion Changed Region Mining for Remote Sensing Change Captioning : Abstract: Remote sensing change captioning is an emerging and popular research task that aims to describe, in natural language, the content of interest that has changed between two remote sensing imag...
- From Observation to Action: Latent Action-based Primitive Segmentation for VLA Pre-training in Industrial Settings : Abstract: We present a novel unsupervised framework to unlock vast unlabeled human demonstration data from continuous industrial video streams for Vision-Language-Action (VLA) model pre-training. Our ...
- EvRainDrop: HyperGraph-guided Completion for Effective Frame and Event Stream Aggregation : Abstract: Event cameras produce asynchronous event streams that are spatially sparse yet temporally dense. Mainstream event representation learning algorithms typically use event frames, voxels, or te...
- Constructing and Benchmarking: a Labeled Email Dataset for Text-Based Phishing and Spam Detection Framework : Abstract: Phishing and spam emails remain a major cybersecurity threat, with attackers increasingly leveraging Large Language Models (LLMs) to craft highly deceptive content. This study presents a com...
- Hierarchical Ranking Neural Network for Long Document Readability Assessment : Abstract: Readability assessment aims to evaluate the reading difficulty of a text. In recent years, while deep learning technology has been gradually applied to readability assessment, most approache...
- Going with the Speed of Sound: Pushing Neural Surrogates into Highly-turbulent Transonic Regimes : Abstract: The widespread use of neural surrogates in automotive aerodynamics, enabled by datasets such as DrivAerML and DrivAerNet++, has primarily focused on bluff-body flows with large wakes. Extend...
- Frequency-Aware Token Reduction for Efficient Vision Transformer : Abstract: Vision Transformers have demonstrated exceptional performance across various computer vision tasks, yet their quadratic computational complexity concerning token length remains a significant...
- Merge and Bound: Direct Manipulations on Weights for Class Incremental Learning : Abstract: We present a novel training approach, named Merge-and-Bound (M&B) for Class Incremental Learning (CIL), which directly manipulates model weights in the parameter space for optimization. Our ...
- Minimizing Hyperbolic Embedding Distortion with LLM-Guided Hierarchy Restructuring : Abstract: Hyperbolic geometry is an effective geometry for embedding hierarchical data structures. Hyperbolic learning has therefore become increasingly prominent in machine learning applications wher...
- AssurAI: Experience with Constructing Korean Socio-cultural Datasets to Discover Potential Risks of Generative AI : Abstract: The rapid evolution of generative AI necessitates robust safety evaluations. However, current safety datasets are predominantly English-centric, failing to capture specific risks in non-Engl...
- $A^2Flow:$ Automating Agentic Workflow Generation via Self-Adaptive Abstraction Operators : Abstract: Large language models (LLMs) have shown strong potential in automating the design of agentic workflows. However, existing methods still rely heavily on manually predefined operators, limitin...
- Reasoning With a Star: A Heliophysics Dataset and Benchmark for Agentic Scientific Reasoning : Abstract: Scientific reasoning through Large Language Models in heliophysics involves more than just recalling facts: it requires incorporating physical assumptions, maintaining consistent units, and ...
- A Brief History of Digital Twin Technology : Abstract: Emerging from NASA's spacecraft simulations in the 1960s, digital twin technology has advanced through industrial adoption to spark a healthcare transformation. A digital twin is a dynamic, ...
- Paraconsistent-Lib: an intuitive PAL2v algorithm Python Library : Abstract: This paper introduces Paraconsistent-Lib, an open-source, easy-to-use Python library for building PAL2v algorithms in reasoning and decision-making systems. Paraconsistent-Lib is designed as...
- Cross Domain Evaluation of Multimodal Chain-of-Thought Reasoning of different datasets into the Amazon CoT Framework : Abstract: While recent work has extended CoT to multimodal settings, achieving state-of-the-art results on science question answering benchmarks like ScienceQA, the generalizability of these approache...
- Learning Multi-Access Point Coordination in Agentic AI Wi-Fi with Large Language Models : Abstract: Multi-access point coordination (MAPC) is a key technology for enhancing throughput in next-generation Wi-Fi within dense overlapping basic service sets. However, existing MAPC protocols rel...
- OpenApps: Simulating Environment Variations to Measure UI-Agent Reliability : Abstract: Reliability is key to realizing the promise of autonomous UI-Agents, multimodal agents that directly interact with apps in the same manner as humans, as users must be able to trust an agent ...
- Representation Interventions Enable Lifelong Unstructured Knowledge Control : Abstract: Large language models (LLMs) often produce incorrect or outdated content. Updating their knowledge efficiently and accurately without costly retraining is a major challenge. This problem is ...
- Guaranteed Optimal Compositional Explanations for Neurons : Abstract: While neurons are the basic units of deep neural networks, it is still unclear what they learn and if their knowledge is aligned with that of humans. Compositional explanations aim to answer...
- ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction : Abstract: Embodied cognition argues that intelligence arises from sensorimotor interaction rather than passive observation. It raises an intriguing question: do modern vision-language models (VLMs), t...
- Improving Procedural Skill Explanations via Constrained Generation: A Symbolic-LLM Hybrid Architecture : Abstract: In procedural skill learning, instructional explanations must convey not just steps, but the causal, goal-directed, and compositional logic behind them. Large language models (LLMs) often pr...
- ICPO: Intrinsic Confidence-Driven Group Relative Preference Optimization for Efficient Reinforcement Learning : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) demonstrates significant potential in enhancing the reasoning capabilities of Large Language Models (LLMs). However, existing RLVR metho...
- Towards Trustworthy Legal AI through LLM Agents and Formal Reasoning : Abstract: The rationality of law manifests in two forms: substantive rationality, which concerns the fairness or moral desirability of outcomes, and formal rationality, which requires legal decisions ...
- OVOD-Agent: A Markov-Bandit Framework for Proactive Visual Reasoning and Self-Evolving Detection : Abstract: Open-Vocabulary Object Detection (OVOD) aims to enable detectors to generalize across categories by leveraging semantic information. Although existing methods are pretrained on large vision-...
- Causality Without Causal Models : Abstract: Perhaps the most prominent current definition of (actual) causality is due to Halpern and Pearl. It is defined using causal models (also known as structural equations models). We abstract ...
- Prune4Web: DOM Tree Pruning Programming for Web Agent : Abstract: Web automation employs intelligent agents to execute high-level tasks by mimicking human interactions with web interfaces. Despite the capabilities of recent Large Language Model (LLM)-based...
- New Hybrid Heuristics for Pseudo-Boolean Propagation : Abstract: In pseudo-boolean solving the currently most successful unit propagation strategy is a hybrid mode combining the watched literal scheme with the counting method. This short paper introduces ...
- Conversational no-code and multi-agentic disease module identification and drug repurposing prediction with ChatDRex : Abstract: Repurposing approved drugs offers a time-efficient and cost-effective alternative to traditional drug development. However, in silico prediction of repurposing candidates is challenging and ...
- EWE: An Agentic Framework for Extreme Weather Analysis : Abstract: Extreme weather events pose escalating risks to global society, underscoring the urgent need to unravel their underlying physical mechanisms. Yet the prevailing expert-driven, labor-intensiv...
- MADRA: Multi-Agent Debate for Risk-Aware Embodied Planning : Abstract: Ensuring the safety of embodied AI agents during task planning is critical for real-world deployment, especially in household environments where dangerous instructions pose significant risks...
- SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition : Abstract: Spatial cognition is fundamental to real-world multimodal intelligence, allowing models to effectively interact with the physical environment. While multimodal large language models (MLLMs) ...
- Pessimistic Verification for Open Ended Math Questions : Abstract: The key limitation of the verification performance lies in the ability of error detection. With this intuition we designed several variants of pessimistic verification, which are simple work...
- Self-Transparency Failures in Expert-Persona LLMs: A Large-Scale Behavioral Audit : Abstract: If a language model cannot reliably disclose its AI identity in expert contexts, users cannot trust its competence boundaries. This study examines self-transparency in models assigned profes...
- From Prediction to Foresight: The Role of AI in Designing Responsible Futures : Abstract: In an era marked by rapid technological advancements and complex global challenges, responsible foresight has emerged as an essential framework for policymakers aiming to navigate future unc...
- On the Limits of Innate Planning in Large Language Models : Abstract: Large language models (LLMs) achieve impressive results on many benchmarks, yet their capacity for planning and stateful reasoning remains unclear. We study these abilities directly, without...
- Bridging the Unavoidable A Priori: A Framework for Comparative Causal Modeling : Abstract: AI/ML models have rapidly gained prominence as innovations for solving previously unsolved problems and their unintended consequences from amplifying human biases. Advocates for responsible ...
- Agentic Learner with Grow-and-Refine Multimodal Semantic Memory : Abstract: MLLMs exhibit strong reasoning on isolated queries, yet they operate de novo -- solving each problem independently and often repeating the same mistakes. Existing memory-augmented agents mai...
- When LLMs Can't Help: Real-World Evaluation of LLMs in Nutrition : Abstract: The increasing trust in large language models (LLMs), especially in the form of chatbots, is often undermined by the lack of their extrinsic evaluation. This holds particularly true in nutri...
- Domain-Grounded Evaluation of LLMs in International Student Knowledge : Abstract: Large language models (LLMs) are increasingly used to answer high-stakes study-abroad questions about admissions, visas, scholarships, and eligibility. Yet it remains unclear how reliably th...
- CodeVaani: A Multilingual, Voice-Based Code Learning Assistant : Abstract: Programming education often assumes English proficiency and text-based interaction, creating barriers for students from multilingual regions such as India. We present CodeVaani, a multilingu...
- Context-Aware Visual Prompting: Automating Geospatial Web Dashboards with Large Language Models and Agent Self-Validation for Decision Support : Abstract: The development of web-based geospatial dashboards for risk analysis and decision support is often challenged by the difficulty in visualization of big, multi-dimensional environmental data,...
- Intelligent Agents with Emotional Intelligence: Current Trends, Challenges, and Future Prospects : Abstract: The development of agents with emotional intelligence is becoming increasingly vital due to their significant role in human-computer interaction and the growing integration of computer syste...
- Transforming Higher Education with AI-Powered Video Lectures : Abstract: The integration of artificial intelligence (AI) into video lecture production has the potential to transform higher education by streamlining content creation and enhancing accessibility. Th...
- MTTR-A: Measuring Cognitive Recovery Latency in Multi-Agent Systems : Abstract: Ensuring cognitive stability in autonomous multi-agent systems (MAS) is a central challenge for large-scale, distributed AI. While existing observability tools monitor system outputs, they c...
- Structured Definitions and Segmentations for Legal Reasoning in LLMs: A Study on Indian Legal Data : Abstract: Large Language Models (LLMs), trained on extensive datasets from the web, exhibit remarkable general reasoning skills. Despite this, they often struggle in specialized areas like law, mainly...
- MindSET: Advancing Mental Health Benchmarking through Large-Scale Social Media Data : Abstract: Social media data has become a vital resource for studying mental health, offering real-time insights into thoughts, emotions, and behaviors that traditional methods often miss. Progress in ...
- Cognitive bias in LLM reasoning compromises interpretation of clinical oncology notes : Abstract: Despite high performance on clinical benchmarks, large language models may reach correct conclusions through faulty reasoning, a failure mode with safety implications for oncology decision s...
- Hybrid coupling with operator inference and the overlapping Schwarz alternating method : Abstract: This paper presents a novel hybrid approach for coupling subdomain-local non-intrusive Operator Inference (OpInf) reduced order models (ROMs) with each other and with subdomain-local high-fi...
- Morality in AI. A plea to embed morality in LLM architectures and frameworks : Abstract: Large language models (LLMs) increasingly mediate human decision-making and behaviour. Ensuring LLM processing of moral meaning therefore has become a critical challenge. Current approaches ...
- Prototype-Guided Non-Exemplar Continual Learning for Cross-subject EEG Decoding : Abstract: Due to the significant variability in electroencephalogram (EEG) signals across individuals, knowledge acquired from previous subjects is often overwritten as new subjects are introduced in ...
- Musical Score Understanding Benchmark: Evaluating Large Language Models' Comprehension of Complete Musical Scores : Abstract: Understanding complete musical scores requires reasoning over symbolic structures such as pitch, rhythm, harmony, and form. Despite the rapid progress of Large Language Models (LLMs) and Vis...
- On the Role of Hidden States of Modern Hopfield Network in Transformer : Abstract: Associative memory models based on Hopfield networks and self-attention based on key-value mechanisms have been popular approaches in the study of memory mechanisms in deep learning. It has ...
- In Defense of the Turing Test and its Legacy : Abstract: Considering that Turing's original test was co-opted by Weizenbaum and that six of the most common criticisms of the Turing test are unfair to both Turing's argument and the historical devel...
- Post-Pruning Accuracy Recovery via Data-Free Knowledge Distillation : Abstract: Model pruning is a widely adopted technique to reduce the computational complexity and memory footprint of Deep Neural Networks (DNNs). However, global unstructured pruning often leads to si...
- PropensityBench: Evaluating Latent Safety Risks in Large Language Models via an Agentic Approach : Abstract: Recent advances in Large Language Models (LLMs) have sparked concerns over their potential to acquire and misuse dangerous or high-risk capabilities, posing frontier risks. Current safety ev...
- Solving Diffusion Inverse Problems with Restart Posterior Sampling : Abstract: Inverse problems are fundamental to science and engineering, where the goal is to infer an underlying signal or state from incomplete or noisy measurements. Recent approaches employ diffusio...
- DUALGUAGE: Automated Joint Security-Functionality Benchmarking for Secure Code Generation : Abstract: Large language models (LLMs) and autonomous coding agents are increasingly used to generate software across a wide range of domains. Yet a core requirement remains unmet: ensuring that gener...
- Are Neuro-Inspired Multi-Modal Vision-Language Models Resilient to Membership Inference Privacy Leakage? : Abstract: In the age of agentic AI, the growing deployment of multi-modal models (MMs) has introduced new attack vectors that can leak sensitive training data in MMs, causing privacy leakage. This pap...
- Active Slice Discovery in Large Language Models : Abstract: Large Language Models (LLMs) often exhibit systematic errors on specific subsets of data, known as error slices. For instance, a slice can correspond to a certain demographic, where a model ...
- Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation : Abstract: World models serve as core simulators for fields such as agentic AI, embodied AI, and gaming, capable of generating long, physically realistic, and interactive high-quality videos. Moreover,...
- ST-PPO: Stabilized Off-Policy Proximal Policy Optimization for Multi-Turn Agents Training : Abstract: PPO has been widely adopted for training large language models (LLMs) at the token level in multi-turn dialogue and reasoning tasks. However, its performance is often unstable and prone to c...
- DeeAD: Dynamic Early Exit of Vision-Language Action for Efficient Autonomous Driving : Abstract: Vision-Language Action (VLA) models unify perception, reasoning, and trajectory generation for autonomous driving, but suffer from significant inference latency due to deep transformer stack...
- Foundry: Distilling 3D Foundation Models for the Edge : Abstract: Foundation models pre-trained with self-supervised learning (SSL) on large-scale datasets have become powerful general-purpose feature extractors. However, their immense size and computation...
- DinoLizer: Learning from the Best for Generative Inpainting Localization : Abstract: We introduce DinoLizer, a DINOv2-based model for localizing manipulated regions in generative inpainting. Our method builds on a DINOv2 model pretrained to detect synthetic images on the B-F...
- Gradient Descent Algorithm Survey : Abstract: Focusing on the practical configuration needs of optimization algorithms in deep learning, this article concentrates on five major algorithms: SGD, Mini-batch SGD, Momentum, Adam, and Lion. ...
Research Sources: 384 | Generated: 11/27/2025
