AI RESEARCH PAPERS & ACADEMIC SOURCES
- l0-Regularized Sparse Coding-based Interpretable Network for Multi-Modal Image Fusion : Abstract: Multi-modal image fusion (MMIF) enhances the information content of the fused image by combining the unique as well as common features obtained from different modality sensor images, improvi...
- Joint2Human: High-quality 3D Human Generation via Compact Spherical Embedding of 3D Joints : Abstract: 3D human generation is increasingly significant in various applications. However, the direct use of 2D generative methods in 3D generation often results in losing local details, while method...
- Effective Online Exam Proctoring by Combining Lightweight Face Detection and Deep Recognition : Abstract: Online exams, conducted via video conferencing platforms such as Zoom, have become popular in educational institutions since COVID-19. While convenient, ensuring the integrity and security o...
- Dual Cluster Contrastive learning for Object Re-Identification : Abstract: Recently, cluster contrastive learning has been proven effective for object ReID by computing the contrastive loss between the individual features and the cluster memory. However, existing m...
- Design of a six wheel suspension and a three-axis linear actuation mechanism for a laser weeding robot : Abstract: Mobile robots are increasingly utilized in agriculture to automate labor-intensive tasks such as weeding, sowing, harvesting and soil analysis. Recently, agricultural robots have been develo...
- StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space : Abstract: We introduce StereoSpace, a diffusion-based framework for monocular-to-stereo synthesis that models geometry purely through viewpoint conditioning, without explicit depth or warping. A canon...
- WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World : Abstract: Generative world models are reshaping embodied AI, enabling agents to synthesize realistic 4D driving environments that look convincing but often fail physically or behaviorally. Despite rap...
- Empowering Dynamic Urban Navigation with Stereo and Mid-Level Vision : Abstract: The success of foundation models in language and vision motivated research in fully end-to-end robot navigation foundation models (NFMs). NFMs directly map monocular visual input to control ...
- Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization : Abstract: Visual concept personalization aims to transfer only specific image attributes, such as identity, expression, lighting, and style, into unseen contexts. However, existing methods rely on hol...
- Group Diffusion: Enhancing Image Generation by Unlocking Cross-Sample Collaboration : Abstract: In this work, we explore an untapped signal in diffusion model inference. While all previous methods generate images independently at inference, we instead ask if samples can be generated co...
- E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training : Abstract: Self-supervised pre-training has revolutionized foundation models for languages, individual 2D images and videos, but remains largely unexplored for learning 3D-aware representations from mu...
- ClusIR: Towards Cluster-Guided All-in-One Image Restoration : Abstract: All-in-One Image Restoration (AiOIR) aims to recover high-quality images from diverse degradations within a unified framework. However, existing methods often fail to explicitly model degrad...
- Towards Efficient and Effective Multi-Camera Encoding for End-to-End Driving : Abstract: We present Flex, an efficient and effective scene encoder that addresses the computational bottleneck of processing high-volume multi-camera data in end-to-end autonomous driving. Flex emplo...
- MeViS: A Multi-Modal Dataset for Referring Motion Expression Video Segmentation : Abstract: This paper proposes a large-scale multi-modal dataset for referring motion expression video segmentation, focusing on segmenting and tracking target objects in videos based on language descr...
- VL-JEPA: Joint Embedding Predictive Architecture for Vision-language : Abstract: We introduce VL-JEPA, a vision-language model built on a Joint Embedding Predictive Architecture (JEPA). Instead of autoregressively generating tokens as in classical VLMs, VL-JEPA predicts ...
- GaussianHeadTalk: Wobble-Free 3D Talking Heads with Audio Driven Gaussian Splatting : Abstract: Speech-driven talking heads have recently emerged and enable interactive avatars. However, real-world applications are limited, as current methods achieve high visual fidelity but slow or fa...
- FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos : Abstract: Motion understanding is fundamental to physical reasoning, enabling models to infer dynamics and predict future states. However, state-of-the-art models still struggle on recent motion bench...
- DuetSVG: Unified Multimodal SVG Generation with Internal Visual Guidance : Abstract: Recent vision-language model (VLM)-based approaches have achieved impressive results on SVG generation. However, because they generate only text and lack visual signals during decoding, they...
- PubTables-v2: A new large-scale dataset for full-page and multi-page table extraction : Abstract: Table extraction (TE) is a key challenge in visual document understanding. Traditional approaches detect tables first, then recognize their structure. Recently, interest has surged in develo...
- MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos : Abstract: Motion capture now underpins content creation far beyond digital humans, yet most existing pipelines remain species- or template-specific. We formalize this gap as Category-Agnostic Motion C...
- From Macro to Micro: Benchmarking Microscopic Spatial Intelligence on Molecules via Vision-Language Models : Abstract: This paper introduces the concept of Microscopic Spatial Intelligence (MiSI), the capability to perceive and reason about the spatial relationships of invisible microscopic entities, which i...
- SWiT-4D: Sliding-Window Transformer for Lossless and Parameter-Free Temporal 4D Generation : Abstract: Despite significant progress in 4D content generation, the conversion of monocular videos into high-quality animated 3D assets with explicit 4D meshes remains considerably challenging. The s...
- PoseGAM: Robust Unseen Object Pose Estimation via Geometry-Aware Multi-View Reasoning : Abstract: 6D object pose estimation, which predicts the transformation of an object relative to the camera, remains challenging for unseen objects. Existing approaches typically rely on explicitly con...
- Self-Ensemble Post Learning for Noisy Domain Generalization : Abstract: While computer vision and machine learning have made great progress, their robustness is still challenged by two key issues: data distribution shift and label noise. When domain generalizati...
- Graph Laplacian Transformer with Progressive Sampling for Prostate Cancer Grading : Abstract: Prostate cancer grading from whole-slide images (WSIs) remains a challenging task due to the large-scale nature of WSIs, the presence of heterogeneous tissue structures, and difficulty of se...
- Blood Pressure Prediction for Coronary Artery Disease Diagnosis using Coronary Computed Tomography Angiography : Abstract: Computational fluid dynamics (CFD) based simulation of coronary blood flow provides valuable hemodynamic markers, such as pressure gradients, for diagnosing coronary artery disease (CAD). Ho...
- LDP: Parameter-Efficient Fine-Tuning of Multimodal LLM for Medical Report Generation : Abstract: Colonoscopic polyp diagnosis is pivotal for early colorectal cancer detection, yet traditional automated reporting suffers from inconsistencies and hallucinations due to the scarcity of high...
- IRG-MotionLLM: Interleaving Motion Generation, Assessment and Refinement for Text-to-Motion Generation : Abstract: Recent advances in motion-aware large language models have shown remarkable promise for unifying motion understanding and generation tasks. However, these models typically treat understandin...
- Video Depth Propagation : Abstract: Depth estimation in videos is essential for visual perception in real-world applications. However, existing methods either rely on simple frame-by-frame monocular models, leading to temporal...
- SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving : Abstract: End-to-end autonomous driving methods built on vision language models (VLMs) have undergone rapid development driven by their universal visual understanding and strong reasoning capabilities...
- CheXmask-U: Quantifying uncertainty in landmark-based anatomical segmentation for X-ray images : Abstract: Uncertainty estimation is essential for the safe clinical deployment of medical image segmentation systems, enabling the identification of unreliable predictions and supporting human oversig...
- Geo6DPose: Fast Zero-Shot 6D Object Pose Estimation via Geometry-Filtered Feature Matching : Abstract: Recent progress in zero-shot 6D object pose estimation has been driven largely by large-scale models and cloud-based inference. However, these approaches often introduce high latency, elevat...
- XDen-1K: A Density Field Dataset of Real-World Objects : Abstract: A deep understanding of the physical world is a central goal for embodied AI and realistic simulation. While current models excel at capturing an object's surface geometry and appearance, th...
- NaviHydra: Controllable Navigation-guided End-to-end Autonomous Driving with Hydra-distillation : Abstract: The complexity of autonomous driving scenarios requires robust models that can interpret high-level navigation commands and generate safe trajectories. While traditional rule-based systems c...
- TriDF: Evaluating Perception, Detection, and Hallucination for Interpretable DeepFake Detection : Abstract: Advances in generative modeling have made it increasingly easy to fabricate realistic portrayals of individuals, creating serious risks for security, communication, and public trust. Detecti...
- K-Track: Kalman-Enhanced Tracking for Accelerating Deep Point Trackers on Edge Devices : Abstract: Point tracking in video sequences is a foundational capability for real-world computer vision applications, including robotics, autonomous systems, augmented reality, and video analysis. Whi...
- DOCR-Inspector: Fine-Grained and Automated Evaluation of Document Parsing with VLM : Abstract: Document parsing aims to transform unstructured PDF images into semi-structured data, facilitating the digitization and utilization of information in diverse domains. While vision language m...
- Lang2Motion: Bridging Language and Motion through Joint Embedding Spaces : Abstract: We present Lang2Motion, a framework for language-guided point trajectory generation by aligning motion manifolds with joint embedding spaces. Unlike prior work focusing on human motion or vi...
- Robust Multi-Disease Retinal Classification via Xception-Based Transfer Learning and W-Net Vessel Segmentation : Abstract: In recent years, the incidence of vision-threatening eye diseases has risen dramatically, necessitating scalable and accurate screening solutions. This paper presents a comprehensive study o...
- Track and Caption Any Motion: Query-Free Motion Discovery and Description in Videos : Abstract: We propose Track and Caption Any Motion (TCAM), a motion-centric framework for automatic video understanding that discovers and describes motion patterns without user queries. Understanding ...
- Salient Object Detection in Complex Weather Conditions via Noise Indicators : Abstract: Salient object detection (SOD), a foundational task in computer vision, has advanced from single-modal to multi-modal paradigms to enhance generalization. However, most existing SOD methods ...
- Unleashing Degradation-Carrying Features in Symmetric U-Net: Simpler and Stronger Baselines for All-in-One Image Restoration : Abstract: All-in-one image restoration aims to handle diverse degradations (e.g., noise, blur, adverse weather) within a unified framework, yet existing methods increasingly rely on complex architectu...
- Audio-sync Video Instance Editing with Granularity-Aware Mask Refiner : Abstract: Recent advancements in video generation highlight that realistic audio-visual synchronization is crucial for engaging content creation. However, existing video editing methods largely overlo...
- Data-Efficient American Sign Language Recognition via Few-Shot Prototypical Networks : Abstract: Isolated Sign Language Recognition (ISLR) is critical for bridging the communication gap between the Deaf and Hard-of-Hearing (DHH) community and the hearing world. However, robust ISLR is f...
- Grounding Everything in Tokens for Multimodal Large Language Models : Abstract: Multimodal large language models (MLLMs) have made significant advancements in vision understanding and reasoning. However, the autoregressive Transformer architecture used by MLLMs requires...
- Blink: Dynamic Visual Token Resolution for Enhanced Multimodal Understanding : Abstract: Multimodal large language models (MLLMs) have achieved remarkable progress on various vision-language tasks, yet their visual perception remains limited. Humans, in comparison, perceive comp...
- Take a Peek: Efficient Encoder Adaptation for Few-Shot Semantic Segmentation via LoRA : Abstract: Few-shot semantic segmentation (FSS) aims to segment novel classes in query images using only a small annotated support set. While prior research has mainly focused on improving decoders, th...
- 3D Blood Pulsation Maps : Abstract: We present Pulse3DFace, the first dataset of its kind for estimating 3D blood pulsation maps. These maps can be used to develop models of dynamic facial blood pulsation, enabling the creatio...
- Robust Shape from Focus via Multiscale Directional Dilated Laplacian and Recurrent Network : Abstract: Shape-from-Focus (SFF) is a passive depth estimation technique that infers scene depth by analyzing focus variations in a focal stack. Most recent deep learning-based SFF methods typically o...
- Error-Propagation-Free Learned Video Compression With Dual-Domain Progressive Temporal Alignment : Abstract: Existing frameworks for learned video compression suffer from a dilemma between inaccurate temporal alignment and error propagation for motion estimation and compensation (ME/MC). The separa...
- Neural Collapse in Test-Time Adaptation : Abstract: Test-Time Adaptation (TTA) enhances model robustness to out-of-distribution (OOD) data by updating the model online during inference, yet existing methods lack theoretical insights into the ...
- TransLocNet: Cross-Modal Attention for Aerial-Ground Vehicle Localization with Contrastive Learning : Abstract: Aerial-ground localization is difficult due to large viewpoint and modality gaps between ground-level LiDAR and overhead imagery. We propose TransLocNet, a cross-modal attention framework th...
- MultiHateLoc: Towards Temporal Localisation of Multimodal Hate Content in Online Videos : Abstract: The rapid growth of video content on platforms such as TikTok and YouTube has intensified the spread of multimodal hate speech, where harmful cues emerge subtly and asynchronously across vis...
- Adaptive Dual-Weighted Gravitational Point Cloud Denoising Method : Abstract: High-quality point cloud data is a critical foundation for tasks such as autonomous driving and 3D reconstruction. However, LiDAR-based point cloud acquisition is often affected by various d...
- Self-Supervised Contrastive Embedding Adaptation for Endoscopic Image Matching : Abstract: Accurate spatial understanding is essential for image-guided surgery, augmented reality integration and context awareness. In minimally invasive procedures, where visual input is the sole in...
- RaLiFlow: Scene Flow Estimation with 4D Radar and LiDAR Point Clouds : Abstract: Recent multimodal fusion methods, integrating images with LiDAR point clouds, have shown promise in scene flow estimation. However, the fusion of 4D millimeter wave radar and LiDAR remains u...
- Breaking the Vicious Cycle: Coherent 3D Gaussian Splatting from Sparse and Motion-Blurred Views : Abstract: 3D Gaussian Splatting (3DGS) has emerged as a state-of-the-art method for novel view synthesis. However, its performance heavily relies on dense, high-quality input imagery, an assumption th...
- Point to Span: Zero-Shot Moment Retrieval for Navigating Unseen Hour-Long Videos : Abstract: Zero-shot Long Video Moment Retrieval (ZLVMR) is the task of identifying temporal segments in hour-long videos using a natural language query without task-specific training. The core technic...
- Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task : Abstract: Video Question Answering (VideoQA) task serves as a critical playground for evaluating whether foundation models can effectively perceive, understand, and reason about dynamic real-world sce...
- mmCounter: Static People Counting in Dense Indoor Scenarios Using mmWave Radar : Abstract: mmWave radars struggle to detect or count individuals in dense, static (non-moving) groups due to limitations in spatial resolution and reliance on movement for detection. We present mmCount...
- Hybrid Transformer-Mamba Architecture for Weakly Supervised Volumetric Medical Segmentation : Abstract: Weakly supervised semantic segmentation offers a label-efficient solution to train segmentation models for volumetric medical imaging. However, existing approaches often rely on 2D encoders ...
- Topology-Agnostic Animal Motion Generation from Text Prompt : Abstract: Motion generation is fundamental to computer animation and widely used across entertainment, robotics, and virtual environments. While recent methods achieve impressive results, most rely on...
- CoSPlan: Corrective Sequential Planning via Scene Graph Incremental Updates : Abstract: Large-scale Vision-Language Models (VLMs) exhibit impressive complex reasoning capabilities but remain largely unexplored in visual sequential planning, i.e., executing multi-step actions to...
- Zero-shot Adaptation of Stable Diffusion via Plug-in Hierarchical Degradation Representation for Real-World Super-Resolution : Abstract: Real-World Image Super-Resolution (Real-ISR) aims to recover high-quality images from low-quality inputs degraded by unknown and complex real-world factors. Real-world scenarios involve dive...
- A Conditional Generative Framework for Synthetic Data Augmentation in Segmenting Thin and Elongated Structures in Biological Images : Abstract: Thin and elongated filamentous structures, such as microtubules and actin filaments, often play important roles in biological systems. Segmenting these filaments in biological images is a fu...
- Simple Yet Effective Selective Imputation for Incomplete Multi-view Clustering : Abstract: Incomplete multi-view data, where different views suffer from missing and unbalanced observations, pose significant challenges for clustering. Existing imputation-based methods attempt to es...
- StainNet: A Special Staining Self-Supervised Vision Transformer for Computational Pathology : Abstract: Foundation models trained with self-supervised learning (SSL) on large-scale histological images have significantly accelerated the development of computational pathology. These models can s...
- EchoingPixels: Cross-Modal Adaptive Token Reduction for Efficient Audio-Visual LLMs : Abstract: Audio-Visual Large Language Models (AV-LLMs) face prohibitive computational overhead from massive audio and video tokens. Token reduction, while extensively explored for video-only LLMs, is ...
- Point2Pose: A Generative Framework for 3D Human Pose Estimation with Multi-View Point Cloud Dataset : Abstract: We propose a novel generative approach for 3D human pose estimation. 3D human pose estimation poses several key challenges due to the complex geometry of the human body, self-occluding joint...
- ConStruct: Structural Distillation of Foundation Models for Prototype-Based Weakly Supervised Histopathology Segmentation : Abstract: Weakly supervised semantic segmentation (WSSS) in histopathology relies heavily on classification backbones, yet these models often localize only the most discriminative regions and struggle...
- DualProtoSeg: Simple and Efficient Design with Text- and Image-Guided Prototype Learning for Weakly Supervised Histopathology Image Segmentation : Abstract: Weakly supervised semantic segmentation (WSSS) in histopathology seeks to reduce annotation cost by learning from image-level labels, yet it remains limited by inter-class homogeneity, intra...
- Efficient-VLN: A Training-Efficient Vision-Language Navigation Model : Abstract: Multimodal large language models (MLLMs) have shown promising potential in Vision-Language Navigation (VLN). However, their practical development is severely hindered by the substantial trai...
- Physically Aware 360° View Generation from a Single Image using Disentangled Scene Embeddings : Abstract: We introduce Disentangled360, an innovative 3D-aware technology that integrates the advantages of direction disentangled volume rendering with single-image 360° unique view synthesis for app...
- ShotDirector: Directorially Controllable Multi-Shot Video Generation with Cinematographic Transitions : Abstract: Shot transitions play a pivotal role in multi-shot video generation, as they determine the overall narrative expression and the directorial design of visual storytelling. However, recent pro...
- Sample-wise Adaptive Weighting for Transfer Consistency in Adversarial Distillation : Abstract: Adversarial distillation in the standard min-max adversarial training framework aims to transfer adversarial robustness from a large, robust teacher network to a compact student. However, ex...
- Long-LRM++: Preserving Fine Details in Feed-Forward Wide-Coverage Reconstruction : Abstract: Recent advances in generalizable Gaussian splatting (GS) have enabled feed-forward reconstruction of scenes from tens of input views. Long-LRM notably scales this paradigm to 32 input images...
- VLM-NCD: Novel Class Discovery with Vision-Based Large Language Models : Abstract: Novel Class Discovery aims to utilise prior knowledge of known classes to classify and discover unknown classes from unlabelled data. Existing NCD methods for images primarily rely on visual...
- GDKVM: Echocardiography Video Segmentation via Spatiotemporal Key-Value Memory with Gated Delta Rule : Abstract: Accurate segmentation of cardiac chambers in echocardiography sequences is crucial for the quantitative analysis of cardiac function, aiding in clinical diagnosis and treatment. The imaging ...
- THE-Pose: Topological Prior with Hybrid Graph Fusion for Estimating Category-Level 6D Object Pose : Abstract: Category-level object pose estimation requires both global context and local structure to ensure robustness against intra-class variations. However, 3D graph convolution (3D-GC) methods only...
- Multi-dimensional Preference Alignment by Conditioning Reward Itself : Abstract: Reinforcement Learning from Human Feedback has emerged as a standard for aligning diffusion models. However, we identify a fundamental limitation in the standard DPO formulation because it r...
- Emerging Standards for Machine-to-Machine Video Coding : Abstract: Machines are increasingly becoming the primary consumers of visual data, yet most deployments of machine-to-machine systems still rely on remote inference where pixel-based video is streamed...
- Latent Chain-of-Thought World Modeling for End-to-End Driving : Abstract: Recent Vision-Language-Action (VLA) models for autonomous driving explore inference-time reasoning as a way to improve driving performance and safety in challenging scenarios. Most prior wor...
- Feature Coding for Scalable Machine Vision : Abstract: Deep neural networks (DNNs) drive modern machine vision but are challenging to deploy on edge devices due to high compute demands. Traditional approaches, running the full model on-device or ...
- Topological Conditioning for Mammography Models via a Stable Wavelet-Persistence Vectorization : Abstract: Breast cancer is the most commonly diagnosed cancer in women and a leading cause of cancer death worldwide. Screening mammography reduces mortality, yet interpretation still suffers from sub...
- Hierarchical Instance Tracking to Balance Privacy Preservation with Accessible Information : Abstract: We propose a novel task, hierarchical instance tracking, which entails tracking all instances of predefined categories of objects and parts, while maintaining their hierarchical relationship...
- TraceFlow: Dynamic 3D Reconstruction of Specular Scenes Driven by Ray Tracing : Abstract: We present TraceFlow, a novel framework for high-fidelity rendering of dynamic specular scenes by addressing two key challenges: precise reflection direction estimation and physically accura...
- Neuromorphic Eye Tracking for Low-Latency Pupil Detection : Abstract: Eye tracking for wearable systems demands low latency and milliwatt-level power, but conventional frame-based pipelines struggle with motion blur, high compute cost, and limited temporal res...
- The Spatial Semantics of Iconic Gesture : Abstract: The current multimodal turn in linguistic theory leaves a crucial question unanswered: what is the meaning of iconic gestures, and how does it compose with speech meaning? We argue for a sep...
- CompanionCast: A Multi-Agent Conversational AI Framework with Spatial Audio for Social Co-Viewing Experiences : Abstract: Social presence is central to the enjoyment of watching content together, yet modern media consumption is increasingly solitary. We investigate whether multi-agent conversational AI systems ...
- BRACE: A Benchmark for Robust Audio Caption Quality Evaluation : Abstract: Automatic audio captioning is essential for audio understanding, enabling applications such as accessibility and content indexing. However, evaluating the quality of audio captions remains a...
- Watermarks for Language Models via Probabilistic Automata : Abstract: A recent watermarking scheme for language models achieves distortion-free embedding and robustness to edit-distance attacks. However, it suffers from limited generation diversity and high de...
- Diffusion Is Your Friend in Show, Suggest and Tell : Abstract: Diffusion Denoising models demonstrated impressive results across generative Computer Vision tasks, but they still fail to outperform standard autoregressive solutions in the discrete domain...
- Planning, Living and Judging: A Multi-agent LLM-based Framework for Cyclical Urban Planning : Abstract: Urban regeneration presents significant challenges within the context of urbanization, requiring adaptive approaches to tackle evolving needs. Leveraging advancements in large language model...
- Computational emotion analysis with multimodal LLMs: Current evidence on an emerging methodological opportunity : Abstract: Emotions are central to politics and analyzing their role in political communication has a long tradition. As research increasingly leverages audio-visual materials to analyze the display of...
- Quantifying Emotional Tone in Tolkien's The Hobbit: Dialogue Sentiment Analysis with RegEx, NRC-VAD, and Python : Abstract: This study analyzes the emotional tone of dialogue in J. R. R. Tolkien's The Hobbit (1937) using computational text analysis. Dialogue was extracted with regular expressions, then preprocess...
- TRIDENT: A Redundant Architecture for Caribbean-Accented Emergency Speech Triage : Abstract: Emergency speech recognition systems exhibit systematic performance degradation on non-standard English varieties, creating a critical gap in services for Caribbean populations. We present T...
- From Data Scarcity to Data Care: Reimagining Language Technologies for Serbian and other Low-Resource Languages : Abstract: Large language models are commonly trained on dominant languages like English, and their representation of low resource languages typically reflects cultural and linguistic biases present in...
- AgriGPT-Omni: A Unified Speech-Vision-Text Framework for Multilingual Agricultural Intelligence : Abstract: Despite rapid advances in multimodal large language models, agricultural applications remain constrained by the lack of multilingual speech data, unified multimodal architectures, and compre...
- RoleRMBench & RoleRM: Towards Reward Modeling for Profile-Based Role Play in Dialogue Systems : Abstract: Reward modeling has become a cornerstone of aligning large language models (LLMs) with human preferences. Yet, when extended to subjective and open-ended domains such as role play, existing ...
- XDoGE: Multilingual Data Reweighting to Enhance Language Inclusivity in LLMs : Abstract: Current large language models (LLMs) are trained on massive amounts of text data, primarily from a few dominant languages. Studies suggest that this over-reliance on high-resource languages,...
- Grammaticality Judgments in Humans and Language Models: Revisiting Generative Grammar with LLMs : Abstract: What counts as evidence for syntactic structure? In traditional generative grammar, systematic contrasts in grammaticality such as subject-auxiliary inversion and the licensing of parasitic ...
- Decoding Student Minds: Leveraging Conversational Agents for Psychological and Learning Analysis : Abstract: This paper presents a psychologically-aware conversational agent designed to enhance both learning performance and emotional well-being in educational settings. The system combines Large Lan...
- Enhancing Next-Generation Language Models with Knowledge Graphs: Extending Claude, Mistral IA, and GPT-4 via KG-BERT : Abstract: Large language models (LLMs) like Claude, Mistral IA, and GPT-4 excel in NLP but lack structured knowledge, leading to factual inconsistencies. We address this by integrating Knowledge Graph...
- Semantic Reconstruction of Adversarial Plagiarism: A Context-Aware Framework for Detecting and Restoring "Tortured Phrases" in Scientific Literature : Abstract: The integrity and reliability of scientific literature is facing a serious threat by adversarial text generation techniques, specifically from the use of automated paraphrasing tools to mask...
- T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground : Abstract: We introduce T-pro 2.0, an open-weight Russian LLM for hybrid reasoning and efficient inference. The model supports direct answering and reasoning-trace generation, using a Cyrillic-dense to...
- Generate-Then-Validate: A Novel Question Generation Approach Using Small Language Models : Abstract: We explore the use of small language models (SLMs) for automatic question generation as a complement to the prevalent use of their large counterparts in learning analytics research. We prese...
- Adapting to Change: A Comparison of Continual and Transfer Learning for Modeling Building Thermal Dynamics under Concept Drifts : Abstract: Transfer Learning (TL) is currently the most effective approach for modeling building thermal dynamics when only limited data are available. TL uses a pretrained model that is fine-tuned to ...
- When Worse is Better: Navigating the compression-generation tradeoff in visual tokenization : Abstract: Current image generation methods are based on a two-stage training approach. In stage 1, an auto-encoder is trained to compress an image into a latent space; in stage 2, a generative model i...
- Extrapolating Jet Radiation with Autoregressive Transformers : Abstract: Generative networks are an exciting tool for fast LHC event generation, usually with a fixed number of particles. Autoregressive transformers allow us to generate events containing variable numbers of particles, very ...
- Deep Operator BSDE: a Numerical Scheme to Approximate Solution Operators : Abstract: Motivated by dynamic risk measures and conditional $g$-expectations, in this work we propose a numerical method to approximate the solution operator given by a Backward Stochastic Differenti...
- IRG: Modular Synthetic Relational Database Generation with Complex Relational Schemas : Abstract: Relational databases (RDBs) are widely used by corporations and governments to store multiple related tables. Their relational schemas pose unique challenges to synthetic data generation for...
- Understanding Outer Optimizers in Local SGD: Learning Rates, Momentum, and Acceleration : Abstract: Modern machine learning often requires training with large batch size, distributed data, and massively parallel compute hardware (like mobile and other edge devices or distributed data cente...
- Enhanced Spatial Clustering of Single-Molecule Localizations with Graph Neural Networks : Abstract: Single-molecule localization microscopy generates point clouds corresponding to fluorophore localizations. Spatial cluster identification and analysis of these point clouds are crucial for e...
- Deferred Poisoning: Making the Model More Vulnerable via Hessian Singularization : Abstract: Recent studies have shown that deep learning models are very vulnerable to poisoning attacks. Many defense methods have been proposed to address this issue. However, traditional poisoning at...
- Noisy Spiking Actor Network for Exploration : Abstract: As a general method for exploration in deep reinforcement learning (RL), NoisyNet can produce problem-specific exploration strategies. Spiking neural networks (SNNs), due to their binary fir...
- Curriculum-Based Reinforcement Learning for Autonomous UAV Navigation in Unknown Curved Tubular Conduit : Abstract: Autonomous drone navigation in confined tubular environments remains a major challenge due to the constraining geometry of the conduits, the proximity of the walls, and the perceptual limita...
- Noisy Quantum Learning Theory : Abstract: We develop a framework for learning from noisy quantum experiments, focusing on fault-tolerant devices accessing uncharacterized systems through noisy couplings. Our starting point is the co...
- Hermitian Yang-Mills connections on general vector bundles: geometry and physical Yukawa couplings : Abstract: We compute solutions to the Hermitian Yang-Mills equations on holomorphic vector bundles $V$ via an alternating optimisation procedure founded on geometric machine learning. The proposed met...
- Distributionally Robust Regret Optimal Control Under Moment-Based Ambiguity Sets : Abstract: In this paper, we consider a class of finite-horizon, linear-quadratic stochastic control problems, where the probability distribution governing the noise process is unknown but assumed to b...
- Iterative Compositional Data Generation for Robot Control : Abstract: Collecting robotic manipulation data is expensive, making it impractical to acquire demonstrations for the combinatorially large space of tasks that arise in multi-object, multi-robot, and m...
- A Differentiable Digital Twin of Distributed Link Scheduling for Contention-Aware Networking : Abstract: Many routing and flow optimization problems in wired networks can be solved efficiently using minimum cost flow formulations. However, this approach does not extend to wireless multi-hop net...
- Physics-informed Polynomial Chaos Expansion with Enhanced Constrained Optimization Solver and D-optimal Sampling : Abstract: Physics-informed polynomial chaos expansions (PC$^2$) provide an efficient physically constrained surrogate modeling framework by embedding governing equations and other physical constraints...
- An Elementary Proof of the Near Optimality of LogSumExp Smoothing : Abstract: We consider the design of smoothings of the (coordinate-wise) max function in $\mathbb{R}^d$ in the infinity norm. The LogSumExp function $f(x)=\ln(\sum^d_i\exp(x_i))$ provides a classical s...
- Deep sets and event-level maximum-likelihood estimation for fast pile-up jet rejection in ATLAS : Abstract: Multiple proton-proton collisions (pile-up) occur at every bunch crossing at the LHC, with the mean number of interactions expected to reach 80 during Run 3 and up to 200 at the High-Luminos...
- Quantum Approaches to Urban Logistics: From Core QAOA to Clustered Scalability : Abstract: The Traveling Salesman Problem (TSP) is a fundamental challenge in combinatorial optimization, widely applied in logistics and transportation. As the size of TSP instances grows, traditional...
- Script Gap: Evaluating LLM Triage on Indian Languages in Native vs Roman Scripts in a Real World Setting : Abstract: Large Language Models (LLMs) are increasingly deployed in high-stakes clinical applications in India. In many such settings, speakers of Indian languages frequently communicate using romaniz...
- OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification : Abstract: Large language models (LLMs) have achieved significant progress in solving complex reasoning tasks by Reinforcement Learning with Verifiable Rewards (RLVR). This advancement is also insepara...
- PMB-NN: Physiology-Centred Hybrid AI for Personalized Hemodynamic Monitoring from Photoplethysmography : Abstract: Continuous monitoring of blood pressure (BP) and hemodynamic parameters such as peripheral resistance (R) and arterial compliance (C) is critical for early vascular dysfunction detection. W...
- Sharp Monocular View Synthesis in Less Than a Second : Abstract: We present SHARP, an approach to photorealistic view synthesis from a single image. Given a single photograph, SHARP regresses the parameters of a 3D Gaussian representation of the depicted ...
- Optimal transport unlocks end-to-end learning for single-molecule localization : Abstract: Single-molecule localization microscopy (SMLM) allows reconstructing biology-relevant structures beyond the diffraction limit by detecting and localizing individual fluorophores -- fluoresce...
- Virtual camera detection: Catching video injection attacks in remote biometric systems : Abstract: Face anti-spoofing (FAS) is a vital component of remote biometric authentication systems based on facial recognition, increasingly used across web-based applications. Among emerging threats,...
- Adaptive Intrusion Detection System Leveraging Dynamic Neural Models with Adversarial Learning for 5G/6G Networks : Abstract: Intrusion Detection Systems (IDS) are critical components in safeguarding 5G/6G networks from both internal and external cyber threats. While traditional IDS approaches rely heavily on signa...
- Authority Backdoor: A Certifiable Backdoor Mechanism for Authoring DNNs : Abstract: Deep Neural Networks (DNNs), as valuable intellectual property, face unauthorized use. Existing protections, such as digital watermarking, are largely passive; they provide only post-hoc own...
- Topology-Guided Quantum GANs for Constrained Graph Generation : Abstract: Quantum computing (QC) promises theoretical advantages, benefiting computational problems that would not be efficiently classically simulatable. However, much of this theoretical speedup dep...
- Flexible Deep Neural Networks for Partially Linear Survival Data : Abstract: We propose a flexible deep neural network (DNN) framework for modeling survival data within a partially linear regression structure. The approach preserves interpretability through a paramet...
- Causal Reasoning Favors Encoders: On The Limits of Decoder-Only Models : Abstract: In-context learning (ICL) underpins recent advances in large language models (LLMs), although its role and performance in causal reasoning remain unclear. Causal reasoning demands multihop ...
- Hyperspectral Image Data Reduction for Endmember Extraction : Abstract: Endmember extraction from hyperspectral images aims to identify the spectral signatures of materials present in a scene. Recent studies have shown that self-dictionary methods can achieve hi...
- From Lab to Reality: A Practical Evaluation of Deep Learning Models and LLMs for Vulnerability Detection : Abstract: Vulnerability detection methods based on deep learning (DL) have shown strong performance on benchmark datasets, yet their real-world effectiveness remains underexplored. Recent work suggest...
- Supervised Learning of Random Neural Architectures Structured by Latent Random Fields on Compact Boundaryless Multiply-Connected Manifolds : Abstract: This paper introduces a new probabilistic framework for supervised learning in neural systems. It is designed to model complex, uncertain systems whose random outputs are strongly non-Gaussi...
- Diffusion differentiable resampling : Abstract: This paper is concerned with differentiable resampling in the context of sequential Monte Carlo (e.g., particle filtering). We propose a new informative resampling method that is instantly p...
- RoboNeuron: A Modular Framework Linking Foundation Models and ROS for Embodied AI : Abstract: Current embodied AI systems face severe engineering impediments, primarily characterized by poor cross-scenario adaptability, rigid inter-module coupling, and fragmented inference accelerati...
- Residual subspace evolution strategies for nonlinear inverse problems : Abstract: Nonlinear inverse problems often feature noisy, non-differentiable, or expensive residual evaluations that make Jacobian-based solvers unreliable. Popular derivative-free optimizers such as ...
- Tracking large chemical reaction networks and rare events by neural networks : Abstract: Chemical reaction networks are widely used to model stochastic dynamics in chemical kinetics, systems biology and epidemiology. Solving the chemical master equation that governs these system...
- Error Analysis of Generalized Langevin Equations with Approximated Memory Kernels : Abstract: We analyze prediction error in stochastic dynamical systems with memory, focusing on generalized Langevin equations (GLEs) formulated as stochastic Volterra equations. We establish that, und...
- Solving Semi-Supervised Few-Shot Learning from an Auto-Annotation Perspective : Abstract: Semi-supervised few-shot learning (SSFSL) formulates real-world applications like "auto-annotation", as it aims to learn a model over a few labeled and abundant unlabeled examples to annot...
- Design Space Exploration of DMA based Finer-Grain Compute Communication Overlap : Abstract: As both ML training and inference are increasingly distributed, parallelization techniques that shard (divide) an ML model across the GPUs of a distributed system are often deployed. With such tec...
- Galaxy Phase-Space and Field-Level Cosmology: The Strength of Semi-Analytic Models : Abstract: Semi-analytic models are a widely used approach to simulate galaxy properties within a cosmological framework, relying on simplified yet physically motivated prescriptions. They have also pr...
- On Learning-Curve Monotonicity for Maximum Likelihood Estimators : Abstract: The property of learning-curve monotonicity, highlighted in a recent series of work by Loog, Mey and Viering, describes algorithms which only improve in average performance given more data, ...
- AutoMedic: An Automated Evaluation Framework for Clinical Conversational Agents with Medical Dataset Grounding : Abstract: Evaluating large language models (LLMs) has recently emerged as a critical issue for safe and trustworthy application of LLMs in the medical domain. Although a variety of static medical ques...
- The Interplay of Statistics and Noisy Optimization: Learning Linear Predictors with Random Data Weights : Abstract: We analyze gradient descent with randomly weighted data points in a linear regression model, under a generic weighting distribution. This includes various forms of stochastic gradient descen...
- Semantic-Aware Confidence Calibration for Automated Audio Captioning : Abstract: Automated audio captioning models frequently produce overconfident predictions regardless of semantic accuracy, limiting their reliability in deployment. This deficiency stems from two facto...
- Inference for Batched Adaptive Experiments : Abstract: The advantages of adaptive experiments have led to their rapid adoption in economics and other fields, as well as among practitioners. However, adaptive experiments pose challenges for causal i...
- STARS: Semantic Tokens with Augmented Representations for Recommendation at Scale : Abstract: Real-world ecommerce recommender systems must deliver relevant items under strict tens-of-milliseconds latency constraints despite challenges such as cold-start products, rapidly shifting us...
- A Model-Guided Neural Network Method for the Inverse Scattering Problem : Abstract: Inverse medium scattering is an ill-posed, nonlinear wave-based imaging problem arising in medical imaging, remote sensing, and non-destructive testing. Machine learning (ML) methods offer i...
- Push Smarter, Not Harder: Hierarchical RL-Diffusion Policy for Efficient Nonprehensile Manipulation : Abstract: Nonprehensile manipulation, such as pushing objects across cluttered environments, presents a challenging control problem due to complex contact dynamics and long-horizon planning requiremen...
- Independent Density Estimation : Abstract: Large-scale Vision-Language models have achieved remarkable results in various domains, such as image captioning and conditioned image generation. Nevertheless, these models still encounte...
- LxCIM: a new rank-based binary classifier performance metric invariant to local exchange of classes : Abstract: Binary classification is one of the oldest, most prevalent, and studied problems in machine learning. However, the metrics used to evaluate model performance have received comparatively litt...
- Enhancing Fake-News Detection with Node-Level Topological Features : Abstract: In recent years, the proliferation of misinformation and fake news has posed serious threats to individuals and society, spurring intense research into automated detection methods. Previous ...
- TDC-Cache: A Trustworthy Decentralized Cooperative Caching Framework for Web3.0 : Abstract: The rapid growth of Web3.0 is transforming the Internet from a centralized structure to a decentralized one, which empowers users with unprecedented self-sovereignty over their own data. However, ...
- QSTAformer: A Quantum-Enhanced Transformer for Robust Short-Term Voltage Stability Assessment against Adversarial Attacks : Abstract: Short-term voltage stability assessment (STVSA) is critical for secure power system operation. While classical machine learning-based methods have demonstrated strong performance, they still...
- Bidirectional Normalizing Flow: From Data to Noise and Back : Abstract: Normalizing Flows (NFs) have been established as a principled framework for generative modeling. Standard NFs consist of a forward process and a reverse process: the forward process maps dat...
- Asynchronous Reasoning: Training-Free Interactive Thinking LLMs : Abstract: Many state-of-the-art LLMs are trained to think before giving their answer. Reasoning can greatly improve language model capabilities and safety, but it also makes them less interactive: giv...
- Digital Twin Supervised Reinforcement Learning Framework for Autonomous Underwater Navigation : Abstract: Autonomous navigation in underwater environments remains a major challenge due to the absence of GPS, degraded visibility, and the presence of submerged obstacles. This article investigates ...
- Physics-Informed Learning of Flow Distribution and Receiver Heat Losses in Parabolic Trough Solar Fields : Abstract: Parabolic trough Concentrating Solar Power (CSP) plants operate large hydraulic networks of collector loops that must deliver a uniform outlet temperature despite spatially heterogeneous opt...
- Classifier Reconstruction Through Counterfactual-Aware Wasserstein Prototypes : Abstract: Counterfactual explanations provide actionable insights by identifying minimal input changes required to achieve a desired model prediction. Beyond their interpretability benefits, counterfa...
- Guided Transfer Learning for Discrete Diffusion Models : Abstract: Discrete diffusion models achieve strong performance across language and other discrete domains, providing a powerful alternative to autoregressive models. However, their strong performance ...
- Scaling Behavior of Discrete Diffusion Language Models : Abstract: Modern LLM pre-training consumes vast amounts of compute and training data, making the scaling behavior, or scaling laws, of different models a key distinguishing factor. Discrete diffusion ...
- Bayesian Symbolic Regression via Posterior Sampling : Abstract: Symbolic regression is a powerful tool for discovering governing equations directly from data, but its sensitivity to noise hinders its broader application. This paper introduces a Sequentia...
- Learning Controllable and Diverse Player Behaviors in Multi-Agent Environments : Abstract: This paper introduces a reinforcement learning framework that enables controllable and diverse player behaviors without relying on human gameplay data. Existing approaches often require larg...
- Interpretable and Steerable Concept Bottleneck Sparse Autoencoders : Abstract: Sparse autoencoders (SAEs) promise a unified approach for mechanistic interpretability, concept discovery, and model steering in LLMs and LVLMs. However, realizing this potential requires th...
- Template-Free Retrosynthesis with Graph-Prior Augmented Transformers : Abstract: Retrosynthesis reaction prediction seeks to infer plausible reactant molecules for a given product and is a central problem in computer-aided organic synthesis. Despite recent progress, many...
- Generalized Spherical Neural Operators: Green's Function Formulation : Abstract: Neural operators offer powerful approaches for solving parametric partial differential equations, but extending them to spherical domains remains challenging due to the need to preserve intr...
- Beyond the Black Box: Identifiable Interpretation and Control in Generative Models via Causal Minimality : Abstract: Deep generative models, while revolutionizing fields like image and text generation, largely operate as opaque black boxes, hindering human understanding, control, and alignment. While metho...
- HybridVFL: Disentangled Feature Learning for Edge-Enabled Vertical Federated Multimodal Classification : Abstract: Vertical Federated Learning (VFL) offers a privacy-preserving paradigm for Edge AI scenarios like mobile health diagnostics, where sensitive multimodal data reside on distributed, resource-c...
- Learning by Analogy: A Causal Framework for Composition Generalization : Abstract: Compositional generalization -- the ability to understand and generate novel combinations of learned concepts -- enables models to extend their capabilities beyond limited experiences. While...
- DCFO Additional Material : Abstract: Outlier detection identifies data points that significantly deviate from the majority of the data distribution. Explaining outliers is crucial for understanding the underlying factors that c...
- Token Sample Complexity of Attention : Abstract: As context windows in large language models continue to expand, it is essential to characterize how attention behaves at extreme sequence lengths. We introduce token-sample complexity: the r...
- Supporting Migration Policies with Forecasts: Illegal Border Crossings in Europe through a Mixed Approach : Abstract: This paper presents a mixed-methodology to forecast illegal border crossings in Europe across five key migratory routes, with a one-year time horizon. The methodology integrates machine lear...
- Uncertainty-Preserving QBNNs: Multi-Level Quantization of SVI-Based Bayesian Neural Networks for Image Classification : Abstract: Bayesian Neural Networks (BNNs) provide principled uncertainty quantification but suffer from substantial computational and memory overhead compared to deterministic networks. While quantiza...
- Multi-Objective Reward and Preference Optimization: Theory and Algorithms : Abstract: This thesis develops theoretical frameworks and algorithms that advance constrained reinforcement learning (RL) across control, preference learning, and alignment of large language models. T...
- THeGAU: Type-Aware Heterogeneous Graph Autoencoder and Augmentation : Abstract: Heterogeneous Graph Neural Networks (HGNNs) are effective for modeling Heterogeneous Information Networks (HINs), which encode complex multi-typed entities and relations. However, HGNNs ofte...
- Is the Information Bottleneck Robust Enough? Towards Label-Noise Resistant Information Bottleneck Learning : Abstract: The Information Bottleneck (IB) principle facilitates effective representation learning by preserving label-relevant information while compressing irrelevant information. However, its strong...
- Unlocking the Address Book: Dissecting the Sparse Semantic Structure of LLM Key-Value Caches via Sparse Autoencoders : Abstract: The Key-Value (KV) cache is the primary memory bottleneck in long-context Large Language Models, yet it is typically treated as an opaque numerical tensor. In this work, we propose S...
- Mode-Seeking for Inverse Problems with Diffusion Models : Abstract: A pre-trained unconditional diffusion model, combined with posterior sampling or maximum a posteriori (MAP) estimation techniques, can solve arbitrary inverse problems without task-specific ...
- Disentangled and Distilled Encoder for Out-of-Distribution Reasoning with Rademacher Guarantees : Abstract: Recently, the disentangled latent space of a variational autoencoder (VAE) has been used to reason about multi-label out-of-distribution (OOD) test samples that are derived from different di...
- Hybrid Physics-ML Model for Forward Osmosis Flux with Complete Uncertainty Quantification : Abstract: Forward Osmosis (FO) is a promising low-energy membrane separation technology, but challenges in accurately modelling its water flux (Jw) persist due to complex internal mass transfer phenom...
- Metacognitive Sensitivity for Test-Time Dynamic Model Selection : Abstract: A key aspect of human cognition is metacognition - the ability to assess one's own knowledge and judgment reliability. While deep learning models can express confidence in their predictions,...
- The Operator Origins of Neural Scaling Laws: A Generalized Spectral Transport Dynamics of Deep Learning : Abstract: Modern deep networks operate in a rough, finite-regularity regime where Jacobian-induced operators exhibit heavy-tailed spectra and strong basis drift. In this work, we derive a unified oper...
- Fitting magnetization data using continued fraction of straight lines : Abstract: Magnetization of a ferromagnetic substance in response to an externally applied magnetic field increases with the strength of the field. This is because at the microscopic level, magnetic mo...
- Better Prevent than Tackle: Valuing Defense in Soccer Based on Graph Neural Networks : Abstract: Evaluating defensive performance in soccer remains challenging, as effective defending is often expressed not through visible on-ball actions such as interceptions and tackles, but through p...
- An Interpretable AI Tool for SAVR vs TAVR in Low to Intermediate Risk Patients with Severe Aortic Stenosis : Abstract: Background. Treatment selection for low to intermediate risk patients with severe aortic stenosis between surgical (SAVR) and transcatheter (TAVR) aortic valve replacement remains variable i...
- A Kernel-based Resource-efficient Neural Surrogate for Multi-fidelity Prediction of Aerodynamic Field : Abstract: Surrogate models provide fast alternatives to costly aerodynamic simulations and are extremely useful in design and optimization applications. This study proposes the use of a recent kernel-...
- R^2-HGP: A Double-Regularized Gaussian Process for Heterogeneous Transfer Learning : Abstract: Multi-output Gaussian process (MGP) models have attracted significant attention for their flexibility and uncertainty-quantification capabilities, and have been widely adopted in multi-sourc...
- Exact Recovery of Non-Random Missing Multidimensional Time Series via Temporal Isometric Delay-Embedding Transform : Abstract: Non-random missing data is a ubiquitous yet undertreated flaw in multidimensional time series, fundamentally threatening the reliability of data-driven analysis and decision-making. Pure low...
- MiniF2F-Dafny: LLM-Guided Mathematical Theorem Proving via Auto-Active Verification : Abstract: We present miniF2F-Dafny, the first translation of the mathematical reasoning benchmark miniF2F to an automated theorem prover: Dafny. Previously, the benchmark existed only in interactive t...
- Assessing Neuromorphic Computing for Fingertip Force Decoding from Electromyography : Abstract: High-density surface electromyography (HD-sEMG) provides a noninvasive neural interface for assistive and rehabilitation control, but mapping neural activity to user motor intent remains cha...
- CIEGAD: Cluster-Conditioned Interpolative and Extrapolative Framework for Geometry-Aware and Domain-Aligned Data Augmentation : Abstract: In practical deep learning deployment, the scarcity of data and the imbalance of label distributions often lead to semantically uncovered regions within the real-world data distribution, hin...
- Rethinking Causal Discovery Through the Lens of Exchangeability : Abstract: Causal discovery methods have traditionally been developed under two distinct regimes: independent and identically distributed (i.i.d.) and time-series data, each governed by separate modelli...
- Murmur2Vec: A Hashing Based Solution For Embedding Generation Of COVID-19 Spike Sequences : Abstract: Early detection and characterization of coronavirus disease (COVID-19), caused by SARS-CoV-2, remain critical for effective clinical response and public-health planning. The global availabil...
- Sequence-to-Image Transformation for Sequence Classification Using Rips Complex Construction and Chaos Game Representation : Abstract: Traditional feature engineering approaches for molecular sequence classification suffer from sparsity issues and computational complexity, while deep learning models often underperform on ta...
- Partitioning the Sample Space for a More Precise Shannon Entropy Estimation : Abstract: Reliable data-driven estimation of Shannon entropy from small data sets, where the number of examples is potentially smaller than the number of possible outcomes, is a critical matter in sev...
- Text2Graph: Combining Lightweight LLMs and GNNs for Efficient Text Classification in Label-Scarce Scenarios : Abstract: Large Language Models (LLMs) have become effective zero-shot classifiers, but their high computational requirements and environmental costs limit their practicality for large-scale annotatio...
- Mitigating Exposure Bias in Risk-Aware Time Series Forecasting with Soft Tokens : Abstract: Autoregressive forecasting is central to predictive control in diabetes and hemodynamic management, where different operating zones carry different clinical risks. Standard models trained wi...
- Local LLM Ensembles for Zero-shot Portuguese Named Entity Recognition : Abstract: Large Language Models (LLMs) excel in many Natural Language Processing (NLP) tasks through in-context learning but often under-perform in Named Entity Recognition (NER), especially for lower...
- SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation : Abstract: In the unsupervised pre-training for reinforcement learning, the agent aims to learn a prior policy for downstream tasks without relying on task-specific reward functions. We focus on state ...
- Robust Gradient Descent via Heavy-Ball Momentum with Predictive Extrapolation : Abstract: Accelerated gradient methods like Nesterov's Accelerated Gradient (NAG) achieve faster convergence on well-conditioned problems but often diverge on ill-conditioned or non-convex landscapes ...
- Latent Action World Models for Control with Unlabeled Trajectories : Abstract: Inspired by how humans combine direct interaction with action-free experience (e.g., videos), we study world models that learn from heterogeneous data. Standard world models typically rely o...
- BAMBO: Construct Ability and Efficiency LLM Pareto Set via Bayesian Adaptive Multi-objective Block-wise Optimization : Abstract: Constructing a Pareto set is pivotal for navigating the capability-efficiency trade-offs in Large Language Models (LLMs); however, existing merging techniques remain inadequate for this task...
- HGC-Herd: Efficient Heterogeneous Graph Condensation via Representative Node Herding : Abstract: Heterogeneous graph neural networks (HGNNs) have demonstrated strong capability in modeling complex semantics across multi-type nodes and relations. However, their scalability to large-scale...
- Faster Results from a Smarter Schedule: Reframing Collegiate Cross Country through Analysis of the National Running Club Database : Abstract: Collegiate cross country teams often build their season schedules on intuition rather than evidence, partly because large-scale performance datasets are not publicly accessible. To address t...
- Risk-Bounded Multi-Agent Visual Navigation via Iterative Risk Allocation : Abstract: Safe navigation is essential for autonomous systems operating in hazardous environments, especially when multiple agents must coordinate using only high-dimensional visual observations. Whil...
- MaskedManipulator: Versatile Whole-Body Manipulation : Abstract: We tackle the challenges of synthesizing versatile, physically simulated human motions for full-body object manipulation. Unlike prior methods that are focused on detailed motion tracking, t...
- PlanetServe: A Decentralized, Scalable, and Privacy-Preserving Overlay for Democratizing Large Language Model Serving : Abstract: While significant progress has been made in research and development on open-source and cost-efficient large-language models (LLMs), serving scalability remains a critical challenge, particu...
- ShapeWords: Guiding Text-to-Image Synthesis with 3D Shape-Aware Prompts : Abstract: We introduce ShapeWords, an approach for synthesizing images based on 3D shape guidance and text prompts. ShapeWords incorporates target 3D shape information within specialized tokens embedd...
- Brain-like emergent properties in deep networks: impact of network architecture, datasets and training : Abstract: Despite the rapid pace at which deep networks are improving on standardized vision benchmarks, they are still outperformed by humans on real-world vision tasks. One solution to this problem ...
- BioMedGPT-Mol: Multi-task Learning for Molecular Understanding and Generation : Abstract: Molecules play a crucial role in biomedical research and discovery, particularly in the field of small molecule drug development. Given the rapid advancements in large language models, espec...
- Machine Learning for Quantifier Selection in cvc5 : Abstract: In this work we considerably improve the state-of-the-art SMT solving on first-order quantified problems by efficient machine learning guidance of quantifier selection. Quantifiers represent...
- Multi-Robot Path Planning Combining Heuristics and Multi-Agent Reinforcement Learning : Abstract: Multi-robot path finding in dynamic environments is a highly challenging classic problem. In the movement process, robots need to avoid collisions with other moving robots while minimizing t...
- SceneMaker: Open-set 3D Scene Generation with Decoupled De-occlusion and Pose Estimation Model : Abstract: We propose a decoupled 3D scene generation framework called SceneMaker in this work. Due to the lack of sufficient open-set de-occlusion and pose estimation priors, existing methods struggle...
- Hierarchical Dataset Selection for High-Quality Data Sharing : Abstract: The success of modern machine learning hinges on access to high-quality training data. In many real-world scenarios, such as acquiring data from public repositories or sharing across institu...
- Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation : Abstract: Reinforcement learning (RL), earlier proven to be effective in large language and multi-modal models, has been successfully extended to enhance 2D image generation recently. However, applyin...
- ImplicitRDP: An End-to-End Visual-Force Diffusion Policy with Structural Slow-Fast Learning : Abstract: Human-level contact-rich manipulation relies on the distinct roles of two key modalities: vision provides spatially rich but temporally slow global context, while force sensing captures rapi...
- AlcheMinT: Fine-grained Temporal Control for Multi-Reference Consistent Video Generation : Abstract: Recent advances in subject-driven video generation with large diffusion models have enabled personalized content synthesis conditioned on user-provided subjects. However, existing methods la...
- Mull-Tokens: Modality-Agnostic Latent Thinking : Abstract: Reasoning goes beyond language; the real world requires reasoning about space, time, affordances, and much more that words alone cannot convey. Existing multimodal models exploring the poten...
- OmniView: An All-Seeing Diffusion Model for 3D and 4D View Synthesis : Abstract: Prior approaches injecting camera control into diffusion models have focused on specific subsets of 4D consistency tasks: novel view synthesis, text-to-video with camera control, image-to-vi...
- Stronger Normalization-Free Transformers : Abstract: Although normalization layers have long been viewed as indispensable components of deep learning architectures, the recent introduction of Dynamic Tanh (DyT) has demonstrated that alternativ...
- Empirical evaluation of the Frank-Wolfe methods for constructing white-box adversarial attacks : Abstract: The construction of adversarial attacks for neural networks appears to be a crucial challenge for their deployment in various services. To estimate the adversarial robustness of a neural net...
- Any4D: Unified Feed-Forward Metric 4D Reconstruction : Abstract: We present Any4D, a scalable multi-view transformer for metric-scale, dense feed-forward 4D reconstruction. Any4D directly generates per-pixel motion and geometry predictions for N frames, i...
- BabyVLM-V2: Toward Developmentally Grounded Pretraining and Benchmarking of Vision Foundation Models : Abstract: Early children's developmental trajectories set up a natural goal for sample-efficient pretraining of vision foundation models. We introduce BabyVLM-V2, a developmentally grounded framework ...
- Decoupled Q-Chunking : Abstract: Temporal-difference (TD) methods learn state and action values efficiently by bootstrapping from their own future value predictions, but such a self-bootstrapping mechanism is prone to boots...
- SparseSwaps: Tractable LLM Pruning Mask Refinement at Scale : Abstract: The resource requirements of Neural Networks can be significantly reduced through pruning -- the removal of seemingly less important parameters. However, with the rise of Large Language Mode...
- UrbanAI 2025 Challenge: Linear vs Transformer Models for Long-Horizon Exogenous Temperature Forecasting : Abstract: We study long-horizon exogenous-only temperature forecasting - a challenging univariate setting where only the past values of the indoor temperature are used for prediction - using linear an...
- MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence : Abstract: Spatial understanding over continuous visual input is crucial for MLLMs to evolve into general-purpose assistants in physical environments. Yet there is still no comprehensive benchmark that...
- Generative Modeling from Black-box Corruptions via Self-Consistent Stochastic Interpolants : Abstract: Transport-based methods have emerged as a leading paradigm for building generative models from large, clean datasets. However, in many scientific and engineering domains, clean data are ofte...
- Extrapolation of Periodic Functions Using Binary Encoding of Continuous Numerical Values : Abstract: We report the discovery that binary encoding allows neural networks to extrapolate periodic functions beyond their training bounds. We introduce Normalized Base-2 Encoding (NB2E) as a method...
- What matters for Representation Alignment: Global Information or Spatial Structure? : Abstract: Representation alignment (REPA) guides generative training by distilling representations from a strong, pretrained vision encoder to intermediate diffusion features. We investigate a fundame...
- LabelFusion: Learning to Fuse LLMs and Transformer Classifiers for Robust Text Classification : Abstract: LabelFusion is a fusion ensemble for text classification that learns to combine a traditional transformer-based classifier (e.g., RoBERTa) with one or more Large Language Models (LLMs such a...
- The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality : Abstract: We introduce The FACTS Leaderboard, an online leaderboard suite and associated set of benchmarks that comprehensively evaluates the ability of language models to generate factually accurate ...
- Natural Language Interface for Firewall Configuration : Abstract: This paper presents the design and prototype implementation of a natural language interface for configuring enterprise firewalls. The framework allows administrators to express access contro...
- Developing and Evaluating a Large Language Model-Based Automated Feedback System Grounded in Evidence-Centered Design for Supporting Physics Problem Solving : Abstract: Generative AI offers new opportunities for individualized and adaptive learning, particularly through large language model (LLM)-based feedback systems. While LLMs can produce effective feed...
- Grow Up and Merge: Scaling Strategies for Efficient Language Adaptation : Abstract: Achieving high-performing language models which include medium- and lower-resource languages remains a challenge. Massively multilingual models still underperform compared to language-specif...
- Metaphor-based Jailbreaking Attacks on Text-to-Image Models : Abstract: Text-to-image (T2I) models commonly incorporate defense mechanisms to prevent the generation of sensitive images. Unfortunately, recent jailbreaking attacks have shown that adversarial promp...
- Designing AI-Resilient Assessments Using Interconnected Problems: A Theoretically Grounded and Empirically Validated Framework : Abstract: The rapid adoption of generative AI has undermined traditional modular assessments in computing education, creating a disconnect between academic evaluation and industry practice. This paper...
- Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving : Abstract: Large language models (LLMs) have achieved significant progress in solving complex reasoning tasks by Reinforcement Learning with Verifiable Rewards (RLVR). This advancement is also insepara...
- LGAN: An Efficient High-Order Graph Neural Network via the Line Graph Aggregation : Abstract: Graph Neural Networks (GNNs) have emerged as a dominant paradigm for graph classification. Specifically, most existing GNNs mainly rely on the message passing strategy between neighbor nodes...
- Textual Data Bias Detection and Mitigation - An Extensible Pipeline with Experimental Evaluation : Abstract: Textual data used to train large language models (LLMs) exhibits multifaceted bias manifestations encompassing harmful language and skewed demographic distributions. Regulations such as the ...
- PACIFIC: a framework for generating benchmarks to check Precise Automatically Checked Instruction Following In Code : Abstract: Large Language Model (LLM)-based code assistants have emerged as a powerful application of generative AI, demonstrating impressive capabilities in code generation and comprehension. A key re...
- How to Brake? Ethical Emergency Braking with Deep Reinforcement Learning : Abstract: Connected and automated vehicles (CAVs) have the potential to enhance driving safety, for example by enabling safe vehicle following and more efficient traffic scheduling. For such future de...
- Rethinking Popularity Bias in Collaborative Filtering via Analytical Vector Decomposition : Abstract: Popularity bias fundamentally undermines the personalization capabilities of collaborative filtering (CF) models, causing them to disproportionately recommend popular items while neglecting ...
- Evaluating Gemini Robotics Policies in a Veo World Simulator : Abstract: Generative world models hold significant potential for simulating interactions with visuomotor policies in varied environments. Frontier video models can enable generation of realistic obser...
- Beyond Pixels: A Training-Free, Text-to-Text Framework for Remote Sensing Image Retrieval : Abstract: Semantic retrieval of remote sensing (RS) images is a critical task fundamentally challenged by the "semantic gap", the discrepancy between a model's low-level visual features and ...
- LLM-Auction: Generative Auction towards LLM-Native Advertising : Abstract: The rapid advancement of large language models (LLMs) necessitates novel monetization strategies, among which LLM-native advertising has emerged as a promising paradigm by naturally integrat...
- Adaptive Replay Buffer for Offline-to-Online Reinforcement Learning : Abstract: Offline-to-Online Reinforcement Learning (O2O RL) faces a critical dilemma in balancing the use of a fixed offline dataset with newly collected online experiences. Standard methods, often re...
- UACER: An Uncertainty-Aware Critic Ensemble Framework for Robust Adversarial Reinforcement Learning : Abstract: Robust adversarial reinforcement learning has emerged as an effective paradigm for training agents to handle uncertain disturbance in real environments, with critical applications in sequent...
- T-SKM-Net: Trainable Neural Network Framework for Linear Constraint Satisfaction via Sampling Kaczmarz-Motzkin Method : Abstract: Neural network constraint satisfaction is crucial for safety-critical applications such as power system optimization, robotic path planning, and autonomous driving. However, existing constra...
- Maximum Risk Minimization with Random Forests : Abstract: We consider a regression setting where observations are collected in different environments modeled by different data distributions. The field of out-of-distribution (OOD) generalization aim...
- Clustered Federated Learning with Hierarchical Knowledge Distillation : Abstract: Clustered Federated Learning (CFL) has emerged as a powerful approach for addressing data heterogeneity and ensuring privacy in large distributed IoT environments. By clustering clients and ...
- An M-Health Algorithmic Approach to Identify and Assess Physiotherapy Exercises in Real Time : Abstract: This work presents an efficient algorithmic framework for real-time identification, classification, and evaluation of human physiotherapy exercises using mobile devices. The proposed method ...
- Cooperative Retrieval-Augmented Generation for Question Answering: Mutual Information Exchange and Ranking by Contrasting Layers : Abstract: Since large language models (LLMs) have a tendency to generate factually inaccurate output, retrieval-augmented generation (RAG) has gained significant attention as a key means to mitigate t...
- Beyond Endpoints: Path-Centric Reasoning for Vectorized Off-Road Network Extraction : Abstract: Deep learning has advanced vectorized road extraction in urban settings, yet off-road environments remain underexplored and challenging. A significant domain gap causes advanced models to fa...
- How to Trick Your AI TA: A Systematic Study of Academic Jailbreaking in LLM Code Evaluation : Abstract: The use of Large Language Models (LLMs) as automatic judges for code evaluation is becoming increasingly prevalent in academic environments. But their reliability can be compromised by stude...
- Sliding Window Attention Adaptation : Abstract: The self-attention mechanism in Transformer-based Large Language Models (LLMs) scales quadratically with input length, making long-context inference expensive. Sliding window attention (SWA)...
- The Eminence in Shadow: Exploiting Feature Boundary Ambiguity for Robust Backdoor Attacks : Abstract: Deep neural networks (DNNs) underpin critical applications yet remain vulnerable to backdoor attacks, typically reliant on heuristic brute-force methods. Despite significant empirical advanc...
- Confucius Code Agent: An Open-sourced AI Software Engineer at Industrial Scale : Abstract: Real-world AI software engineering demands coding agents that can reason over massive repositories, maintain durable memory across and within long sessions, and robustly coordinate complex t...
- Cross-modal Retrieval Models for Stripped Binary Analysis : Abstract: LLM-agent based binary code analysis has demonstrated significant potential across a wide range of software security scenarios, including vulnerability detection, malware analysis, etc. In a...
- The Best of the Two Worlds: Harmonizing Semantic and Hash IDs for Sequential Recommendation : Abstract: Conventional Sequential Recommender Systems (SRS) typically assign unique Hash IDs (HID) to construct item embeddings. These HID embeddings effectively learn collaborative information from h...
- Towards Fine-Grained Recognition with Large Visual Language Models: Benchmark and Optimization Strategies : Abstract: Large Vision Language Models (LVLMs) have made remarkable progress, enabling sophisticated vision-language interaction and dialogue applications. However, existing benchmarks primarily focus...
- Neural personal sound zones with flexible bright zone control : Abstract: Personal sound zone (PSZ) reproduction system, which attempts to create distinct virtual acoustic scenes for different listeners at their respective positions within the same spatial area us...
- D2M: A Decentralized, Privacy-Preserving, Incentive-Compatible Data Marketplace for Collaborative Learning : Abstract: The rising demand for collaborative machine learning and data analytics calls for secure and decentralized data sharing frameworks that balance privacy, trust, and incentives. Existing appro...
- GPG: Generalized Policy Gradient Theorem for Transformer-based Policies : Abstract: We present the Generalized Policy Gradient (GPG) Theorem, specifically designed for Transformer-based policies. Notably, we demonstrate that both standard Policy Gradient Theorem and GRPO em...
- Visual Funnel: Resolving Contextual Blindness in Multimodal Large Language Models : Abstract: Multimodal Large Language Models (MLLMs) demonstrate impressive reasoning capabilities, but often fail to perceive fine-grained visual details, limiting their applicability in precision-dema...
- Dynamics of Agentic Loops in Large Language Models: A Geometric Theory of Trajectories : Abstract: Agentic systems built on large language models operate through recursive feedback loops, where each output becomes the next input. Yet the geometric behavior of these agentic loops (whether ...
- A Privacy-Preserving Cloud Architecture for Distributed Machine Learning at Scale : Abstract: Distributed machine learning systems require strong privacy guarantees, verifiable compliance, and scalable deployment across heterogeneous and multi-cloud environments. This work introduc...
- Multilingual VLM Training: Adapting an English-Trained VLM to French : Abstract: Artificial intelligence has made great progress in recent years, particularly in the development of Vision-Language Models (VLMs) that understand both visual and textual data. However, thes...
- Translating Informal Proofs into Formal Proofs Using a Chain of States : Abstract: We address the problem of translating informal mathematical proofs expressed in natural language into formal proofs in Lean4 under a constrained computational budget. Our approach is grounde...
- High-Dimensional Data Processing: Benchmarking Machine Learning and Deep Learning Architectures in Local and Distributed Environments : Abstract: This document reports the sequence of practices and methodologies implemented during the Big Data course. It details the workflow beginning with the processing of the Epsilon dataset through...
- FLARE: A Wireless Side-Channel Fingerprinting Attack on Federated Learning : Abstract: Federated Learning (FL) enables collaborative model training across distributed devices while safeguarding data and user privacy. However, FL remains susceptible to privacy threats that can ...
- MotionEdit: Benchmarking and Learning Motion-Centric Image Editing : Abstract: We introduce MotionEdit, a novel dataset for motion-centric image editing - the task of modifying subject actions and interactions while preserving identity, structure, and physical plausibili...
- Graph Neural Network Based Adaptive Threat Detection for Cloud Identity and Access Management Logs : Abstract: The rapid expansion of cloud infrastructures and distributed identity systems has significantly increased the complexity and attack surface of modern enterprises. Traditional rule based or s...
- Computing Evolutionarily Stable Strategies in Imperfect-Information Games : Abstract: We present an algorithm for computing evolutionarily stable strategies (ESSs) in symmetric perfect-recall extensive-form games of imperfect information. Our main algorithm is for two-player ...
- Hybrid Learning and Optimization-Based Dynamic Scheduling for DL Workloads on Heterogeneous GPU Clusters : Abstract: Modern cloud platforms increasingly host large-scale deep learning (DL) workloads, demanding high-throughput, low-latency GPU scheduling. However, the growing heterogeneity of GPU clusters a...
- RobustSora: De-Watermarked Benchmark for Robust AI-Generated Video Detection : Abstract: The proliferation of AI-generated video technologies poses challenges to information integrity. While recent benchmarks advance AIGC video detection, they overlook a critical factor: many st...
- InFerActive: Towards Scalable Human Evaluation of Large Language Models through Interactive Inference : Abstract: Human evaluation remains the gold standard for evaluating outputs of Large Language Models (LLMs). The current evaluation paradigm reviews numerous individual responses, leading to significa...
- Adaptive Information Routing for Multimodal Time Series Forecasting : Abstract: Time series forecasting is a critical task for artificial intelligence with numerous real-world applications. Traditional approaches primarily rely on historical time series data to predict ...
- Federated Domain Generalization with Latent Space Inversion : Abstract: Federated domain generalization (FedDG) addresses distribution shifts among clients in a federated learning framework. FedDG methods aggregate the parameters of locally trained client models...
- Offscript: Automated Auditing of Instruction Adherence in LLMs : Abstract: Large Language Models (LLMs) and generative search systems are increasingly used for information seeking by diverse populations with varying preferences for knowledge sourcing and presentati...
- Enhancing Large Language Models for End-to-End Circuit Analysis Problem Solving : Abstract: Large language models (LLMs) have shown strong performance in data-rich domains such as programming, but their reliability in engineering tasks remains limited. Circuit analysis -- requiring...
- Unforgotten Safety: Preserving Safety Alignment of Large Language Models with Continual Learning : Abstract: The safety alignment of large language models (LLMs) is becoming increasingly important with their democratization. In this paper, we study the safety degradation that comes with adapting LL...
- PARAN: Persona-Augmented Review ANswering system on Food Delivery Review Dataset : Abstract: Personalized review response generation presents a significant challenge in domains where user information is limited, such as food delivery platforms. While large language models (LLMs) off...
- Universal Hirschberg for Width Bounded Dynamic Programs : Abstract: Hirschberg's algorithm (1975) reduces the space complexity for the longest common subsequence problem from $O(N^2)$ to $O(N)$ via recursive midpoint bisection on a grid dynamic program (DP)....
- Workflow is All You Need: Escaping the "Statistical Smoothing Trap" via High-Entropy Information Foraging and Adversarial Pacing : Abstract: Central to long-form text generation in vertical domains is the "impossible trinity" confronting current large language models (LLMs): the simultaneous achievement of low hallucination, deep...
- VocSim: A Training-free Benchmark for Zero-shot Content Identity in Single-source Audio : Abstract: General-purpose audio representations aim to map acoustically variable instances of the same event to nearby points, resolving content identity in a zero-shot setting. Unlike supervised clas...
- CHyLL: Learning Continuous Neural Representations of Hybrid Systems : Abstract: Learning the flows of hybrid systems that have both continuous and discrete time dynamics is challenging. The existing method learns the dynamics in each discrete mode, which suffers from th...
- MedXAI: A Retrieval-Augmented and Self-Verifying Framework for Knowledge-Guided Medical Image Analysis : Abstract: Accurate and interpretable image-based diagnosis remains a fundamental challenge in medical AI, particularly under domain shifts and rare-class conditions. Deep learning models often str...
- Defining the Scope of Learning Analytics: An Axiomatic Approach for Analytic Practice and Measurable Learning Phenomena : Abstract: Learning Analytics (LA) has rapidly expanded through practical and technological innovation, yet its foundational identity has remained theoretically under-specified. This paper addresses th...
- What Kind of Reasoning (if any) is an LLM actually doing? On the Stochastic Nature and Abductive Appearance of Large Language Models : Abstract: This article looks at how reasoning works in current Large Language Models (LLMs) that function using the token-completion method. It examines their stochastic nature and their similarity to...
- Classifying Metamorphic versus Single-Fold Proteins with Statistical Learning and AlphaFold2 : Abstract: The remarkable success of AlphaFold2 in providing accurate atomic-level prediction of protein structures from their amino acid sequence has transformed approaches to the protein folding prob...
- DB2-TransF: All You Need Is Learnable Daubechies Wavelets for Time Series Forecasting : Abstract: Time series forecasting requires models that can efficiently capture complex temporal dependencies, especially in large-scale and high-dimensional settings. While Transformer-based architect...
- Detailed balance in large language model-driven agents : Abstract: Large language model (LLM)-driven agents are emerging as a powerful new paradigm for solving complex problems. Despite the empirical success of these practices, a theoretical framework to un...
- MetaVoxel: Joint Diffusion Modeling of Imaging and Clinical Metadata : Abstract: Modern deep learning methods have achieved impressive results across tasks from disease classification, estimating continuous biomarkers, to generating realistic medical images. Most of thes...
- Intelligently Weighting Multiple Reference Models for Direct Preference Optimization of LLMs : Abstract: Fine-tuning is integral for aligning large language models (LLMs) with human preferences. Multiple-Reference Preference Optimization (MRPO) builds on Direct Preference Optimization (DPO) by ...
- Cluster-Dags as Powerful Background Knowledge For Causal Discovery : Abstract: Finding cause-effect relationships is of key importance in science. Causal discovery aims to recover a graph from data that succinctly describes these cause-effect relationships. However, cu...
- ABBSPO: Adaptive Bounding Box Scaling and Symmetric Prior based Orientation Prediction for Detecting Aerial Image Objects : Abstract: Weakly supervised oriented object detection (WS-OOD) has gained attention as a cost-effective alternative to fully supervised methods, providing both efficiency and high accuracy. Among weak...
- ZK-APEX: Zero-Knowledge Approximate Personalized Unlearning with Executable Proofs : Abstract: Machine unlearning aims to remove the influence of specific data points from a trained model to satisfy privacy, copyright, and safety requirements. In real deployments, providers distribute...
- ELANA: A Simple Energy and Latency Analyzer for LLMs : Abstract: The latency and power consumption of large language models (LLMs) are major constraints when serving them across a wide spectrum of hardware platforms, from mobile edge devices to cloud GPU ...
- Norm-Governed Multi-Agent Decision-Making in Simulator-Coupled Environments: The Reinsurance Constrained Multi-Agent Simulation Process (R-CMASP) : Abstract: Reinsurance decision-making exhibits the core structural properties that motivate multi-agent models: distributed and asymmetric information, partial observability, heterogeneous epistemic r...
- IoTEdu: Access Control, Detection, and Automatic Incident Response in Academic IoT Networks : Abstract: The growing presence of IoT devices in academic environments has increased operational complexity and exposed security weaknesses, especially in academic institutions without unified policie...
- On Decision-Making Agents and Higher-Order Causal Processes : Abstract: We establish a precise correspondence between decision-making agents in partially observable Markov decision processes (POMDPs) and one-input process functions, the classical limit of higher...
- Multi-Granular Node Pruning for Circuit Discovery : Abstract: Circuit discovery aims to identify minimal subnetworks that are responsible for specific behaviors in large language models (LLMs). Existing approaches primarily rely on iterative edge pruni...
- LLMs Can Assist with Proposal Selection at Large User Facilities : Abstract: We explore how large language models (LLMs) can enhance the proposal selection process at large user facilities, offering a scalable, consistent, and cost-effective alternative to traditiona...
- V-OCBF: Learning Safety Filters from Offline Data via Value-Guided Offline Control Barrier Functions : Abstract: Ensuring safety in autonomous systems requires controllers that satisfy hard, state-wise constraints without relying on online interaction. While existing Safe Offline RL methods typically e...
- Agile Deliberation: Concept Deliberation for Subjective Visual Classification : Abstract: From content moderation to content curation, applications requiring vision classifiers for visual concepts are rapidly expanding. Existing human-in-the-loop approaches typically assume users...
- HAROOD: A Benchmark for Out-of-distribution Generalization in Sensor-based Human Activity Recognition : Abstract: Sensor-based human activity recognition (HAR) mines activity patterns from the time-series sensory data. In realistic scenarios, variations across individuals, devices, environments, and tim...
- Replace, Don't Expand: Mitigating Context Dilution in Multi-Hop RAG via Fixed-Budget Evidence Assembly : Abstract: Retrieval-Augmented Generation (RAG) systems often fail on multi-hop queries when the initial retrieval misses a bridge fact. Prior corrective approaches, such as Self-RAG, CRAG, and Adaptiv...
- COMPARE: Clinical Optimization with Modular Planning and Assessment via RAG-Enhanced AI-OCT: Superior Decision Support for Percutaneous Coronary Intervention Compared to ChatGPT-5 and Junior Operators : Abstract: Background: While intravascular imaging, particularly optical coherence tomography (OCT), improves percutaneous coronary intervention (PCI) outcomes, its interpretation is operator-dependent...
- Remember Me, Refine Me: A Dynamic Procedural Memory Framework for Experience-Driven Agent Evolution : Abstract: Procedural memory enables large language model (LLM) agents to internalize "how-to" knowledge, theoretically reducing redundant trial-and-error. However, existing frameworks predominantly su...
- Enhancing Radiology Report Generation and Visual Grounding using Reinforcement Learning : Abstract: Recent advances in vision-language models (VLMs) have improved Chest X-ray (CXR) interpretation in multiple aspects. However, many medical VLMs rely solely on supervised fine-tuning (SFT), w...
- Challenges of Evaluating LLM Safety for User Welfare : Abstract: Safety evaluations of large language models (LLMs) typically focus on universal risks like dangerous capabilities or undesirable propensities. However, millions use LLMs for personal advice ...
- AEBNAS: Strengthening Exit Branches in Early-Exit Networks through Hardware-Aware Neural Architecture Search : Abstract: Early-exit networks are effective solutions for reducing the overall energy consumption and latency of deep learning models by adjusting computation based on the complexity of input data. By...
- On the Dynamics of Multi-Agent LLM Communities Driven by Value Diversity : Abstract: As Large Language Models (LLM) based multi-agent systems become increasingly prevalent, the collective behaviors, e.g., collective intelligence, of such artificial communities have drawn gro...
- CAPTAIN: Semantic Feature Injection for Memorization Mitigation in Text-to-Image Diffusion Models : Abstract: Diffusion models can unintentionally reproduce training examples, raising privacy and copyright concerns as these systems are increasingly deployed at scale. Existing inference-time mitigati...
- Refinement Contrastive Learning of Cell-Gene Associations for Unsupervised Cell Type Identification : Abstract: Unsupervised cell type identification is crucial for uncovering and characterizing heterogeneous populations in single cell omics studies. Although a range of clustering methods have been de...
- Phythesis: Physics-Guided Evolutionary Scene Synthesis for Energy-Efficient Data Center Design via LLMs : Abstract: Data center (DC) infrastructure serves as the backbone to support the escalating demand for computing capacity. Traditional design methodologies that blend human expertise with specialized s...
- NormCode: A Semi-Formal Language for Context-Isolated AI Planning : Abstract: Multistep workflows that chain large language model (LLM) calls suffer from context pollution: as information accumulates across steps, models hallucinate, confuse intermediate outputs, and ...
- Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning : Abstract: Large language model (LLM) agents exhibit strong mathematical problem-solving abilities and can even solve International Mathematical Olympiad (IMO) level problems with the assistance of for...
- Zero-shot 3D Map Generation with LLM Agents: A Dual-Agent Architecture for Procedural Content Generation : Abstract: Procedural Content Generation (PCG) offers scalable methods for algorithmically creating complex, customizable worlds. However, controlling these pipelines requires the precise configuration...
- When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection : Abstract: The landscape of scientific peer review is rapidly evolving with the integration of Large Language Models (LLMs). This shift is driven by two parallel trends: the widespread individual adopt...
- Targeted Data Protection for Diffusion Model by Matching Training Trajectory : Abstract: Recent advancements in diffusion models have made fine-tuning text-to-image models for personalization increasingly accessible, but have also raised significant concerns regarding unauthoriz...
- Representation of the structure of graphs by sequences of instructions : Abstract: The representation of graphs is commonly based on the adjacency matrix concept. This formulation is the foundation of most algebraic and computational approaches to graph processing. The adv...
- Boosting RL-Based Visual Reasoning with Selective Adversarial Entropy Intervention : Abstract: Recently, reinforcement learning (RL) has become a common choice in enhancing the reasoning capabilities of vision-language models (VLMs). Considering existing RL-based finetuning methods, ...
- AgentProg: Empowering Long-Horizon GUI Agents with Program-Guided Context Management : Abstract: The rapid development of mobile GUI agents has stimulated growing research interest in long-horizon task automation. However, building agents for these tasks faces a critical bottleneck: the...
- LLM-Empowered Representation Learning for Emerging Item Recommendation : Abstract: In this work, we tackle the challenge of recommending emerging items, whose interactions gradually accumulate over time. Existing methods often overlook this dynamic process, typically assum...
- REMISVFU: Vertical Federated Unlearning via Representation Misdirection for Intermediate Output Feature : Abstract: Data-protection regulations such as the GDPR grant every participant in a federated system a right to be forgotten. Federated unlearning has therefore emerged as a research frontier, aiming ...
- On the Collapse of Generative Paths: A Criterion and Correction for Diffusion Steering : Abstract: Inference-time steering enables pretrained diffusion/flow models to be adapted to new tasks without retraining. A widely used approach is the ratio-of-densities method, which defines a time-...
- User-Feedback-Driven Continual Adaptation for Vision-and-Language Navigation : Abstract: Vision-and-Language Navigation (VLN) requires agents to navigate complex environments by following natural-language instructions. General Scene Adaptation for VLN (GSA-VLN) shifts the focus ...
- EpiPlanAgent: Agentic Automated Epidemic Response Planning : Abstract: Epidemic response planning is essential yet traditionally reliant on labor-intensive manual methods. This study aimed to design and evaluate EpiPlanAgent, an agent-based system using large l...
- InfoCom: Kilobyte-Scale Communication-Efficient Collaborative Perception with Information Bottleneck : Abstract: Precise environmental perception is critical for the reliability of autonomous driving systems. While collaborative perception mitigates the limitations of single-agent perception through in...
- Trustworthy Orchestration Artificial Intelligence by the Ten Criteria with Control-Plane Governance : Abstract: As Artificial Intelligence (AI) systems increasingly assume consequential decision-making roles, a widening gap has emerged between technical capabilities and institutional accountability. E...
- Investigating The Functional Roles of Attention Heads in Vision Language Models: Evidence for Reasoning Modules : Abstract: Despite excelling on multimodal benchmarks, vision-language models (VLMs) largely remain a black box. In this paper, we propose a novel interpretability framework to systematically analyze t...
- Neuronal Attention Circuit (NAC) for Representation Learning : Abstract: Attention improves representation learning over RNNs, but its discrete nature limits continuous-time (CT) modeling. We introduce Neuronal Attention Circuit (NAC), a novel, biologically plaus...
- Reverse Thinking Enhances Missing Information Detection in Large Language Models : Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in various reasoning tasks, yet they often struggle with problems involving missing information, exhibiting issues such...
- ID-PaS : Identity-Aware Predict-and-Search for General Mixed-Integer Linear Programs : Abstract: Mixed-Integer Linear Programs (MIPs) are powerful and flexible tools for modeling a wide range of real-world combinatorial optimization problems. Predict-and-Search methods operate by using ...
- An exploration for higher efficiency in multi objective optimisation with reinforcement learning : Abstract: Efficiency in optimisation and search processes persists to be one of the challenges, which affects the performance and use of optimisation algorithms. Utilising a pool of operators instead ...
- CP-Env: Evaluating Large Language Models on Clinical Pathways in a Controllable Hospital Environment : Abstract: Medical care follows complex clinical pathways that extend beyond isolated physician-patient encounters, emphasizing decision-making and transitions between different stages. Current benchma...
- The 2025 Foundation Model Transparency Index : Abstract: Foundation model developers are among the world's most important companies. As these companies become increasingly consequential, how do their transparency practices evolve? The 2025 Foundat...
- AgriRegion: Region-Aware Retrieval for High-Fidelity Agricultural Advice : Abstract: Large Language Models (LLMs) have demonstrated significant potential in democratizing access to information. However, in the domain of agriculture, general-purpose models frequently suffer f...
- Modeling Narrative Archetypes in Conspiratorial Narratives: Insights from Singapore-Based Telegram Groups : Abstract: Conspiratorial discourse is increasingly embedded within digital communication ecosystems, yet its structure and spread remain difficult to study. This work analyzes conspiratorial narrative...
- Robust AI Security and Alignment: A Sisyphean Endeavor? : Abstract: This manuscript establishes information-theoretic limitations for robustness of AI security and alignment by extending Gödel's incompleteness theorem to AI. Knowing these limitations and pre...
- Interpretable Embeddings with Sparse Autoencoders: A Data Analysis Toolkit : Abstract: Analyzing large-scale text corpora is a core challenge in machine learning, crucial for tasks like identifying undesirable model behaviors or biases in training data. Current methods often r...
- Linear socio-demographic representations emerge in Large Language Models from indirect cues : Abstract: We investigate how LLMs encode sociodemographic attributes of human conversational partners inferred from indirect cues such as names and occupations. We show that LLMs develop linear repres...
- Mind the Gap! Pathways Towards Unifying AI Safety and Ethics Research : Abstract: While much research in artificial intelligence (AI) has focused on scaling capabilities, the accelerating pace of development makes countervailing work on producing harmless, "aligned" syste...
- Parallel Decoder Transformer: Model-Internal Parallel Decoding with Speculative Invariance via Note Conditioning : Abstract: Autoregressive decoding in Large Language Models (LLMs) is inherently sequential, creating a latency bottleneck that scales linearly with output length. While "Decomposition-and-Fill" meth...
- SimWorld-Robotics: Synthesizing Photorealistic and Dynamic Urban Environments for Multimodal Robot Navigation and Collaboration : Abstract: Recent advances in foundation models have shown promising results in developing generalist robotics that can perform diverse tasks in open-ended scenarios given multimodal inputs. However, c...
- DynaMate: An Autonomous Agent for Protein-Ligand Molecular Dynamics Simulations : Abstract: Force field-based molecular dynamics (MD) simulations are indispensable for probing the structure, dynamics, and functions of biomolecular systems, including proteins and protein-ligand comp...
- Exploring LLMs for Scientific Information Extraction Using The SciEx Framework : Abstract: Large language models (LLMs) are increasingly touted as powerful tools for automating scientific information extraction. However, existing methods and tools often struggle with the realities...
- Fuzzy Hierarchical Multiplex : Abstract: A new fuzzy optimization framework that extends FCM causality is proposed. This model utilizes the dynamics to map data into metrics and create a framework that examines logical implication ...
- Echo-CoPilot: A Multi-View, Multi-Task Agent for Echocardiography Interpretation and Reporting : Abstract: Echocardiography is central to contemporary cardiovascular care, but full-study interpretation remains a cognitively demanding, multi-view task that is still performed manually. While recent...
- Exploring Health Misinformation Detection with Multi-Agent Debate : Abstract: Fact-checking health-related claims has become increasingly critical as misinformation proliferates online. Effective verification requires both the retrieval of high-quality evidence and ri...
- Suzume-chan: Your Personal Navigator as an Embodied Information Hub : Abstract: Access to expert knowledge often requires real-time human communication. Digital tools improve access to information but rarely create the sense of connection needed for deep understanding. ...
- ExaCraft: Dynamic Learning Context Adaptation for Personalized Educational Examples : Abstract: Learning is most effective when it's connected to relevant, relatable examples that resonate with learners on a personal level. However, existing educational AI tools don't focus on generati...
Research Sources: 360 | Generated: 12/12/2025
