AI RESEARCH PAPERS & ACADEMIC SOURCES
- Test-time Correction: An Online 3D Detection System via Visual Prompting : Abstract: This paper introduces Test-time Correction (TTC), an online 3D detection system designed to rectify test-time errors using various auxiliary feedback, aiming to enhance the safety of deploye...
- GT23D-Bench: A Comprehensive General Text-to-3D Generation Benchmark : Abstract: Text-to-3D (T23D) generation has emerged as a crucial visual generation task, aiming at synthesizing 3D content from textual descriptions. Studies of this task are currently shifting from pe...
- 3D and 4D World Modeling: A Survey : Abstract: World modeling has become a cornerstone in AI research, enabling agents to understand, represent, and predict the dynamic environments they inhabit. While prior work largely emphasizes gener...
- InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue : Abstract: We introduce InteractiveOmni, a unified and open-source omni-modal large language model for audio-visual multi-turn interaction, ranging from 4B to 8B parameters, designed to lead the field ...
- Beyond the Ground Truth: Enhanced Supervision for Image Restoration : Abstract: Deep learning-based image restoration has achieved significant success. However, when addressing real-world degradations, model performance is limited by the quality of ground-truth images i...
- MUT3R: Motion-aware Updating Transformer for Dynamic 3D Reconstruction : Abstract: Recent stateful recurrent neural networks have achieved remarkable progress on static 3D reconstruction but remain vulnerable to motion-induced artifacts, where non-rigid regions corrupt att...
- TempR1: Improving Temporal Understanding of MLLMs via Temporal-Aware Multi-Task Reinforcement Learning : Abstract: Enhancing the temporal understanding of Multimodal Large Language Models (MLLMs) is essential for advancing long-form video analysis, enabling tasks such as temporal localization, action det...
- Training for Identity, Inference for Controllability: A Unified Approach to Tuning-Free Face Personalization : Abstract: Tuning-free face personalization methods have developed along two distinct paradigms: text embedding approaches that map facial features into the text embedding space, and adapter-based meth...
- BlurDM: A Blur Diffusion Model for Image Deblurring : Abstract: Diffusion models show promise for dynamic scene deblurring; however, existing studies often fail to leverage the intrinsic nature of the blurring process within diffusion models, limiting th...
- DirectDrag: High-Fidelity, Mask-Free, Prompt-Free Drag-based Image Editing via Readout-Guided Feature Alignment : Abstract: Drag-based image editing using generative models provides intuitive control over image structures. However, existing methods rely heavily on manually provided masks and textual prompts to pr...
- DIQ-H: Evaluating Hallucination Persistence in VLMs Under Temporal Visual Degradation : Abstract: Vision-Language Models (VLMs) deployed in safety-critical applications such as autonomous driving must handle continuous visual streams under imperfect conditions. However, existing benchmar...
- Highly Efficient Test-Time Scaling for T2I Diffusion Models with Text Embedding Perturbation : Abstract: Test-time scaling (TTS) aims to achieve better results by increasing random sampling and evaluating samples based on rules and metrics. However, in text-to-image (T2I) diffusion models, most ...
- On the Temporality for Sketch Representation Learning : Abstract: Sketches are simple human hand-drawn abstractions of complex scenes and real-world objects. Although the field of sketch representation learning has advanced significantly, there is still a ...
- Emergent Outlier View Rejection in Visual Geometry Grounded Transformers : Abstract: Reliable 3D reconstruction from in-the-wild image collections is often hindered by "noisy" images: irrelevant inputs with little or no view overlap with others. While traditional Structure-fr...
- Learning Group Actions In Disentangled Latent Image Representations : Abstract: Modeling group actions on latent representations enables controllable transformations of high-dimensional image data. Prior works applying group-theoretic priors or modeling transformations ...
- Ultra-lightweight Neural Video Representation Compression : Abstract: Recent works have demonstrated the viability of utilizing over-fitted implicit neural representations (INRs) as alternatives to autoencoder-based models for neural video compression. Among t...
- C3G: Learning Compact 3D Representations with 2K Gaussians : Abstract: Reconstructing and understanding 3D scenes from unposed sparse views in a feed-forward manner remains a challenging task in 3D computer vision. Recent approaches use per-pixel 3D Gaussian...
- RELIC: Interactive Video World Model with Long-Horizon Memory : Abstract: A truly interactive world model requires three key ingredients: real-time long-horizon streaming, consistent spatial memory, and precise user control. However, most existing approaches addre...
- SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL : Abstract: Vision Language Models (VLMs) demonstrate strong qualitative visual understanding, but struggle with metrically precise spatial reasoning required for embodied applications. The agentic para...
- PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design : Abstract: Graphic design forms the cornerstone of modern visual communication, serving as a vital medium for promoting cultural and commercial events. Recent advances have explored automating this pro...
- SimFlow: Simplified and End-to-End Training of Latent Normalizing Flows : Abstract: Normalizing Flows (NFs) learn invertible mappings between the data and a Gaussian distribution. Prior works usually suffer from two limitations. First, they add random noise to training samp...
- Unique Lives, Shared World: Learning from Single-Life Videos : Abstract: We introduce the "single-life" learning paradigm, where we train a distinct vision model exclusively on egocentric videos captured by one individual. We leverage the multiple viewpoints natu...
- LATTICE: Democratize High-Fidelity 3D Generation at Scale : Abstract: We present LATTICE, a new framework for high-fidelity 3D asset generation that bridges the quality and scalability gap between 3D and 2D generative models. While 2D image synthesis benefits ...
- PanFoMa: A Lightweight Foundation Model and Benchmark for Pan-Cancer : Abstract: Single-cell RNA sequencing (scRNA-seq) is essential for decoding tumor heterogeneity. However, pan-cancer research still faces two key challenges: learning discriminative and efficient singl...
- Multi-Agent Reinforcement Learning and Real-Time Decision-Making in Robotic Soccer for Virtual Environments : Abstract: The deployment of multi-agent systems in dynamic, adversarial environments like robotic soccer necessitates real-time decision-making, sophisticated cooperation, and scalable algorithms to a...
- Kaleidoscopic Scintillation Event Imaging : Abstract: Scintillators are transparent materials that interact with high-energy particles and emit visible light as a result. They are used in state-of-the-art methods of measuring high-energy partic...
- What Is The Best 3D Scene Representation for Robotics? From Geometric to Foundation Models : Abstract: In this paper, we provide a comprehensive overview of existing scene representation methods for robotics, covering traditional representations such as point clouds, voxels, signed distance f...
- MSG-Loc: Multi-Label Likelihood-based Semantic Graph Matching for Object-Level Global Localization : Abstract: Robots are often required to localize in environments with unknown object classes and semantic ambiguity. However, when performing global localization using semantic objects, high semantic a...
- RoboScape-R: Unified Reward-Observation World Models for Generalizable Robotics Training via RL : Abstract: Achieving generalizable embodied policies remains a key challenge. Traditional policy learning paradigms, including both Imitation Learning (IL) and Reinforcement Learning (RL), struggle to ...
- Artificial Microsaccade Compensation: Stable Vision for an Ornithopter : Abstract: Animals with foveated vision, including humans, experience microsaccades: small, rapid eye movements of which they are unaware. Inspired by this phenomenon, we develop a method for "Artific...
- Radiance Meshes for Volumetric Reconstruction : Abstract: We introduce radiance meshes, a technique for representing radiance fields with constant density tetrahedral cells produced with a Delaunay tetrahedralization. Unlike a Voronoi diagram, a De...
- PipeFusion: Patch-level Pipeline Parallelism for Diffusion Transformers Inference : Abstract: This paper presents PipeFusion, an innovative parallel methodology to tackle the high latency issues associated with generating high-resolution images using diffusion transformers (DiTs) mod...
- Margin-aware Preference Optimization for Aligning Diffusion Models without Reference : Abstract: Modern preference alignment methods, such as DPO, rely on divergence regularization to a reference model for training stability, but this creates a fundamental problem we call "reference mism...
- NVRC: Neural Video Representation Compression : Abstract: Recent advances in implicit neural representation (INR)-based video coding have demonstrated its potential to compete with both conventional and other learning-based approaches. With INR met...
- Motion4D: Learning 3D-Consistent Motion and Semantics for 4D Scene Understanding : Abstract: Recent advancements in foundation models for 2D vision have substantially improved the analysis of dynamic scenes from monocular videos. However, despite their strong generalization capabili...
- LAMP: Language-Assisted Motion Planning for Controllable Video Generation : Abstract: Video generation has achieved remarkable progress in visual fidelity and controllability, enabling conditioning on text, layout, or motion. Among these, motion control - specifying object dy...
- ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation : Abstract: We propose ReCamDriving, a purely vision-based, camera-controlled novel-trajectory video generation framework. While repair-based methods fail to restore complex artifacts and LiDAR-based ap...
- FeatureLens: A Highly Generalizable and Interpretable Framework for Detecting Adversarial Examples Based on Image Features : Abstract: Despite the remarkable performance of deep neural networks (DNNs) in image classification, their vulnerability to adversarial attacks remains a critical challenge. Most existing detection m...
- MKSNet: Advanced Small Object Detection in Remote Sensing Imagery with Multi-Kernel and Dual Attention Mechanisms : Abstract: Deep convolutional neural networks (DCNNs) have substantially advanced object detection capabilities, particularly in remote sensing imagery. However, challenges persist, especially in detec...
- Multi-Scale Visual Prompting for Lightweight Small-Image Classification : Abstract: Visual prompting has recently emerged as an efficient strategy to adapt vision models using lightweight, learnable parameters injected into the input space. However, prior work mainly target...
- ToG-Bench: Task-Oriented Spatio-Temporal Grounding in Egocentric Videos : Abstract: A core capability towards general embodied intelligence lies in localizing task-relevant objects from an egocentric perspective, formulated as Spatio-Temporal Video Grounding (STVG). Despite...
- Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical Reasoning : Abstract: In this study, we present Colon-X, an open initiative aimed at advancing multimodal intelligence in colonoscopy. We begin by constructing ColonVQA, the most comprehensive multimodal dataset ...
- ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers : Abstract: Diffusion transformers have demonstrated strong capabilities in generating high-quality images. However, as model size increases, the growing memory footprint and inference latency pose sign...
- GaussianBlender: Instant Stylization of 3D Gaussians with Disentangled Latent Spaces : Abstract: 3D stylization is central to game development, virtual reality, and digital arts, where the demand for diverse assets calls for scalable methods that support fast, high-fidelity manipulation...
- Active Visual Perception: Opportunities and Challenges : Abstract: Active visual perception refers to the ability of a system to dynamically engage with its environment through sensing and action, allowing it to modify its behavior in response to specific g...
- Structured Uncertainty Similarity Score (SUSS): Learning a Probabilistic, Interpretable, Perceptual Metric Between Images : Abstract: Perceptual similarity scores that align with human vision are critical for both training and evaluating computer vision models. Deep perceptual losses, such as LPIPS, achieve good alignment ...
- DINO-RotateMatch: A Rotation-Aware Deep Framework for Robust Image Matching in Large-Scale 3D Reconstruction : Abstract: This paper presents DINO-RotateMatch, a deep-learning framework designed to address the challenges of image matching in large-scale 3D reconstruction from unstructured Internet images. The ...
- PosA-VLA: Enhancing Action Generation via Pose-Conditioned Anchor Attention : Abstract: Vision-Language-Action (VLA) models have demonstrated remarkable performance on embodied tasks and shown promising potential for real-world applications. However, current VLAs still stru...
- Out-of-the-box: Black-box Causal Attacks on Object Detectors : Abstract: Adversarial perturbations are a useful way to expose vulnerabilities in object detectors. Existing perturbation methods are frequently white-box and architecture specific. More importantly, ...
- Dual-level Modality Debiasing Learning for Unsupervised Visible-Infrared Person Re-Identification : Abstract: The two-stage learning pipeline has achieved promising results in unsupervised visible-infrared person re-identification (USL-VI-ReID). It first performs single-modality learning and then operat...
- Fully Unsupervised Self-debiasing of Text-to-Image Diffusion Models : Abstract: Text-to-image (T2I) diffusion models have achieved widespread success due to their ability to generate high-resolution, photorealistic images. These models are trained on large-scale dataset...
- Research on Brain Tumor Classification Method Based on Improved ResNet34 Network : Abstract: Previously, image interpretation in radiology relied heavily on manual methods. However, manual classification of brain tumor medical images is time-consuming and labor-intensive. Even with ...
- LSRS: Latent Scale Rejection Sampling for Visual Autoregressive Modeling : Abstract: The Visual Autoregressive (VAR) modeling approach for image generation proposes autoregressive processing across hierarchical scales, decoding multiple tokens per scale in parallel. This method ...
- A Robust Camera-based Method for Breath Rate Measurement : Abstract: The proliferation of cheap and accessible cameras makes it possible to measure a subject's breath rate from video footage alone. Recent works on this topic have proposed a variety of approaches ...
- Lean Unet: A Compact Model for Image Segmentation : Abstract: Unet and its variations have been standard in semantic image segmentation, especially for computer-assisted radiology. Current Unet architectures iteratively downsample spatial resolution wh...
- Heatmap Pooling Network for Action Recognition from RGB Videos : Abstract: Human action recognition (HAR) in videos has garnered widespread attention due to the rich information in RGB videos. Nevertheless, existing methods for extracting deep features from RGB vid...
- CoDA: From Text-to-Image Diffusion Models to Training-Free Dataset Distillation : Abstract: Prevailing Dataset Distillation (DD) methods leveraging generative models confront two fundamental limitations. First, despite pioneering the use of diffusion models in DD and delivering imp...
- PULSE: A Unified Multi-Task Architecture for Cardiac Segmentation, Diagnosis, and Few-Shot Cross-Modality Clinical Adaptation : Abstract: Cardiac image analysis remains fragmented across tasks: anatomical segmentation, disease classification, and grounded clinical report generation are typically handled by separate networks tr...
- Traffic Image Restoration under Adverse Weather via Frequency-Aware Mamba : Abstract: Traffic image restoration under adverse weather conditions remains a critical challenge for intelligent transportation systems. Existing methods primarily focus on spatial-domain modeling bu...
- Prostate biopsy whole slide image dataset from an underrepresented Middle Eastern population : Abstract: Artificial intelligence (AI) is increasingly used in digital pathology. Publicly available histopathology datasets remain scarce, and those that do exist predominantly represent Western popu...
- Diminishing Returns in Self-Supervised Learning : Abstract: While transformer-based architectures have taken computer vision and NLP by storm, they often require a vast number of parameters and a large amount of training data to attain strong performance. In this work...
- An Automated Framework for Large-Scale Graph-Based Cerebrovascular Analysis : Abstract: We present CaravelMetrics, a computational framework for automated cerebrovascular analysis that models vessel morphology through skeletonization-derived graph representations. The framework...
- Dual Cross-Attention Siamese Transformer for Rectal Tumor Regrowth Assessment in Watch-and-Wait Endoscopy : Abstract: Increasing evidence supports watch-and-wait (WW) surveillance for patients with rectal cancer who show clinical complete response (cCR) at restaging following total neoadjuvant treatment (TN...
- Zero-Shot Video Translation and Editing with Frame Spatial-Temporal Correspondence : Abstract: The remarkable success in text-to-image diffusion models has motivated extensive investigation of their potential for video applications. Zero-shot techniques aim to adapt image diffusion mo...
- UniMo: Unifying 2D Video and 3D Human Motion with an Autoregressive Framework : Abstract: We propose UniMo, an innovative autoregressive model for joint modeling of 2D human videos and 3D human motions within a unified framework, enabling simultaneous generation and understanding...
- DM3D: Deformable Mamba via Offset-Guided Gaussian Sequencing for Point Cloud Understanding : Abstract: State Space Models (SSMs) demonstrate significant potential for long-sequence modeling, but their reliance on input order conflicts with the irregular nature of point clouds. Existing approa...
- Generalization Evaluation of Deep Stereo Matching Methods for UAV-Based Forestry Applications : Abstract: Autonomous UAV forestry operations require robust depth estimation methods with strong cross-domain generalization. However, existing evaluations focus on urban and indoor scenarios, leaving...
- Label-Efficient Hyperspectral Image Classification via Spectral FiLM Modulation of Low-Level Pretrained Diffusion Features : Abstract: Hyperspectral imaging (HSI) enables detailed land cover classification, yet low spatial resolution and sparse annotations pose significant challenges. We present a label-efficient framework ...
- Multi-Aspect Knowledge-Enhanced Medical Vision-Language Pretraining with Multi-Agent Data Generation : Abstract: Vision-language pretraining (VLP) has emerged as a powerful paradigm in medical image analysis, enabling representation learning from large-scale image-text pairs without relying on expensiv...
- LM-CartSeg: Automated Segmentation of Lateral and Medial Cartilage and Subchondral Bone for Radiomics Analysis : Abstract: Background and Objective: Radiomics of knee MRI requires robust, anatomically meaningful regions of interest (ROIs) that jointly capture cartilage and subchondral bone. Most existing work re...
- GeoVideo: Introducing Geometric Regularization into Video Generation Model : Abstract: Recent advances in video generation have enabled the synthesis of high-quality and visually realistic clips using diffusion transformer models. However, most existing approaches operate pure...
- Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles : Abstract: Interpreting natural-language commands to localize target objects is critical for autonomous driving (AD). Existing visual grounding (VG) methods for autonomous vehicles (AVs) typically stru...
- Difference Decomposition Networks for Infrared Small Target Detection : Abstract: Infrared small target detection (ISTD) faces two major challenges: a lack of discernible target texture and severe background clutter, which results in the background obscuring the target. T...
- Procedural Mistake Detection via Action Effect Modeling : Abstract: Mistake detection in procedural tasks is essential for building intelligent systems that support learning and task execution. Existing approaches primarily analyze how an action is performed...
- Towards Object-centric Understanding for Instructional Videos : Abstract: Understanding procedural activities is crucial for developing future assistive AI that can reason about complex real-world tasks. Existing action-centric methods struggle with the flexibilit...
- EEA: Exploration-Exploitation Agent for Long Video Understanding : Abstract: Long-form video understanding requires efficient navigation of extensive visual data to pinpoint sparse yet critical information. Current approaches to long-form video understanding either su...
- Exploiting Domain Properties in Language-Driven Domain Generalization for Semantic Segmentation : Abstract: Recent domain generalized semantic segmentation (DGSS) studies have achieved notable improvements by distilling semantic knowledge from Vision-Language Models (VLMs). However, they overlook ...
- AfroBeats Dance Movement Analysis Using Computer Vision: A Proof-of-Concept Framework Combining YOLO and Segment Anything Model : Abstract: This paper presents a preliminary investigation into automated dance movement analysis using contemporary computer vision techniques. We propose a proof-of-concept framework that integrates ...
- CSMapping: Scalable Crowdsourced Semantic Mapping and Topology Inference for Autonomous Driving : Abstract: Crowdsourcing enables scalable autonomous driving map construction, but low-cost sensor noise hinders quality from improving with data volume. We propose CSMapping, a system that produces ac...
- FloodDiffusion: Tailored Diffusion Forcing for Streaming Motion Generation : Abstract: We present FloodDiffusion, a new framework for text-driven, streaming human motion generation. Given time-varying text prompts, FloodDiffusion generates text-aligned, seamless motion sequenc...
- OpenTrack3D: Towards Accurate and Generalizable Open-Vocabulary 3D Instance Segmentation : Abstract: Generalizing open-vocabulary 3D instance segmentation (OV-3DIS) to diverse, unstructured, and mesh-free environments is crucial for robotics and AR/VR, yet remains a significant challenge. W...
- Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation : Abstract: Achieving precise alignment between user intent and generated visuals remains a central challenge in text-to-visual generation, as a single attempt often fails to produce the desired output....
- CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image Generation : Abstract: Cooking is a sequential and visually grounded activity, where each step such as chopping, mixing, or frying carries both procedural logic and visual semantics. While recent diffusion models ...
- V-ITI: Mitigating Hallucinations in Multimodal Large Language Models via Visual Inference-Time Intervention : Abstract: Multimodal Large Language Models (MLLMs) excel in numerous vision-language tasks yet suffer from hallucinations (content inconsistent with the input visuals) that undermine reliability...
- Dynamic Content Moderation in Livestreams: Combining Supervised Classification with MLLM-Boosted Similarity Matching : Abstract: Content moderation remains a critical yet challenging task for large-scale user-generated video platforms, especially in livestreaming environments where moderation must be timely, multimoda...
- GAOT: Generating Articulated Objects Through Text-Guided Diffusion Models : Abstract: Articulated object generation has seen increasing advancements, yet existing models often lack the ability to be conditioned on text prompts. To address the significant gap between textual d...
- Global-Local Aware Scene Text Editing : Abstract: Scene Text Editing (STE) involves replacing text in a scene image with new target text while preserving both the original text style and background texture. Existing methods suffer from two ...
- UniComp: Rethinking Video Compression Through Informational Uniqueness : Abstract: Distinct from attention-based compression methods, this paper presents an information-uniqueness-driven video compression framework, termed UniComp, which aims to maximize the information fi...
- Cross-Stain Contrastive Learning for Paired Immunohistochemistry and Histopathology Slide Representation Learning : Abstract: Universal, transferable whole-slide image (WSI) representations are central to computational pathology. Incorporating multiple markers (e.g., immunohistochemistry, IHC) alongside H&E enriche...
- Dynamic Optical Test for Bot Identification (DOT-BI): A simple check to identify bots in surveys and online processes : Abstract: We propose the Dynamic Optical Test for Bot Identification (DOT-BI): a quick and easy method that uses human perception of motion to differentiate between human respondents and automated sys...
- Beyond Boundary Frames: Audio-Visual Semantic Guidance for Context-Aware Video Interpolation : Abstract: Handling fast, complex, and highly non-linear motion patterns has long posed challenges for video frame interpolation. Although recent diffusion-based approaches improve upon traditional opt...
- Harnessing Hypergraphs in Geometric Deep Learning for 3D RNA Inverse Folding : Abstract: The RNA inverse folding problem, a key challenge in RNA design, involves identifying nucleotide sequences that can fold into desired secondary structures, which are critical for ensuring mol...
- CloseUpAvatar: High-Fidelity Animatable Full-Body Avatars with Mixture of Multi-Scale Textures : Abstract: We present CloseUpAvatar, a novel approach for articulated human avatar representation that handles more general camera motions while preserving rendering quality for close-up views. Clo...
- HBFormer: A Hybrid-Bridge Transformer for Microtumor and Miniature Organ Segmentation : Abstract: Medical image segmentation is a cornerstone of modern clinical diagnostics. While Vision Transformers that leverage shifted window-based self-attention have established new benchmarks in thi...
- Memory-Guided Point Cloud Completion for Dental Reconstruction : Abstract: Partial dental point clouds often suffer from large missing regions caused by occlusion and limited scanning views, which bias encoder-only global features and force decoders to hallucinate ...
- SETS: Leveraging Self-Verification and Self-Correction for Improved Test-Time Scaling : Abstract: Recent advancements in Large Language Models (LLMs) have created new opportunities to enhance performance on complex reasoning tasks by leveraging test-time computation. However, existing sc...
- Hierarchical Process Reward Models are Symbolic Vision Learners : Abstract: Symbolic computer vision represents diagrams through explicit logical rules and structured representations, enabling interpretable understanding in machine vision. This requires fundamentall...
- Does Head Pose Correction Improve Biometric Facial Recognition? : Abstract: Biometric facial recognition models often show significant decreases in accuracy when processing real-world images, which are typically characterized by poor quality, non-frontal subject poses, and...
- Object Counting with GPT-4o and GPT-5: A Comparative Study : Abstract: Zero-shot object counting attempts to estimate the number of object instances belonging to novel categories that the vision model performing the counting has never encountered during trainin...
- LLM-Guided Material Inference for 3D Point Clouds : Abstract: Most existing 3D shape datasets and models focus solely on geometry, overlooking the material properties that determine how objects appear. We introduce a two-stage large language model (LLM...
- 2-Shots in the Dark: Low-Light Denoising with Minimal Data Acquisition : Abstract: Raw images taken in low-light conditions are very noisy due to low photon count and sensor noise. Learning-based denoisers have the potential to reconstruct high-quality images. For training...
- PixPerfect: Seamless Latent Diffusion Local Editing with Discriminative Pixel-Space Refinement : Abstract: Latent Diffusion Models (LDMs) have markedly advanced the quality of image inpainting and local editing. However, the inherent latent compression often introduces pixel-level inconsistencies...
- SpatialReasoner: Active Perception for Large-Scale 3D Scene Understanding : Abstract: Spatial reasoning in large-scale 3D environments remains challenging for current vision-language models, which are typically constrained to room-scale scenarios. We introduce H$^2$U3D (Holis...
- HalluGen: Synthesizing Realistic and Controllable Hallucinations for Evaluating Image Restoration : Abstract: Generative models are prone to hallucinations: plausible but incorrect structures absent in the ground truth. This issue is problematic in image restoration for safety-critical domains such ...
- Hierarchical Attention for Sparse Volumetric Anomaly Detection in Subclinical Keratoconus : Abstract: The detection of weak, spatially distributed anomalies in volumetric medical imaging remains a major challenge. The subtle, non-adjacent nature of early disease signals is often lost due to ...
- SeeU: Seeing the Unseen World via 4D Dynamics-aware Generation : Abstract: Images and videos are discrete 2D projections of the 4D world (3D space + time). Most visual understanding, prediction, and generation operate directly on 2D observations, leading to subopti...
- A Hybrid Deep Learning Framework with Explainable AI for Lung Cancer Classification with DenseNet169 and SVM : Abstract: Lung cancer is a very deadly disease worldwide, and its early diagnosis is crucial for increasing patient survival rates. Computed tomography (CT) scans are widely used for lung cancer diagn...
- FireSentry: A Multi-Modal Spatio-temporal Benchmark Dataset for Fine-Grained Wildfire Spread Forecasting : Abstract: Fine-grained wildfire spread prediction is crucial for enhancing emergency response efficacy and decision-making precision. However, existing research predominantly focuses on coarse spatiot...
- ShelfGaussian: Shelf-Supervised Open-Vocabulary Gaussian-based 3D Scene Understanding : Abstract: We introduce ShelfGaussian, an open-vocabulary multi-modal Gaussian-based 3D scene understanding framework supervised by off-the-shelf vision foundation models (VFMs). Gaussian-based methods...
- MOS: Mitigating Optical-SAR Modality Gap for Cross-Modal Ship Re-Identification : Abstract: Cross-modal ship re-identification (ReID) between optical and synthetic aperture radar (SAR) imagery has recently emerged as a critical yet underexplored task in maritime intelligence and su...
- ViDiC: Video Difference Captioning : Abstract: Understanding visual differences between dynamic scenes requires the comparative perception of compositional, spatial, and temporal changes--a capability that remains underexplored in existi...
- YOLOA: Real-Time Affordance Detection via LLM Adapter : Abstract: Affordance detection aims to jointly address the fundamental "what-where-how" challenge in embodied AI by understanding "what" an object is, "where" the object is located, and "how" it can b...
- AR-Med: Automated Relevance Enhancement in Medical Search via LLM-Driven Information Augmentation : Abstract: Accurate and reliable search on online healthcare platforms is critical for user safety and service efficacy. Traditional methods, however, often fail to comprehend complex and nuanced user ...
- Enhancing Instruction-Following Capabilities in Seq2Seq Models: DoLA Adaptations for T5 : Abstract: Contrastive decoding is a lightweight and effective inference-time method that improves the quality of text generation in Large Language Models. However, algorithms such as DoLa (Decoding by...
- Improving Alignment Between Human and Machine Codes: An Empirical Assessment of Prompt Engineering for Construct Identification in Psychology : Abstract: Due to their architecture and vast pre-training data, large language models (LLMs) demonstrate strong text classification performance. However, LLM output - here, the category assigned to a ...
- Training and Evaluation of Guideline-Based Medical Reasoning in LLMs : Abstract: Machine learning for early prediction in medicine has recently shown breakthrough performance; however, the focus on improving prediction accuracy has led to a neglect of faithful explanatio...
- Reconstructing KV Caches with Cross-layer Fusion For Enhanced Transformers : Abstract: Transformer decoders have achieved strong results across tasks, but the memory required for the KV cache becomes prohibitive at long sequence lengths. Although Cross-layer KV Cache sharing (...
- BERnaT: Basque Encoders for Representing Natural Textual Diversity : Abstract: Language models depend on massive text corpora that are often filtered for quality, a process that can unintentionally exclude non-standard linguistic varieties, reduce model robustness and ...
- Is Lying Only Sinful in Islam? Exploring Religious Bias in Multilingual Large Language Models Across Major Religions : Abstract: While recent developments in large language models have improved bias detection and classification, sensitive subjects like religion still present challenges because even minor errors can re...
- Adapting Large Language Models to Low-Resource Tibetan: A Two-Stage Continual and Supervised Fine-Tuning Study : Abstract: Adapting large language models (LLMs) to low-resource languages remains a major challenge due to data scarcity and cross-lingual drift. This work presents a two-stage adaptation of Qwen2.5-3...
- Teaching Old Tokenizers New Words: Efficient Tokenizer Adaptation for Pre-trained Models : Abstract: Tokenizer adaptation plays an important role in transferring pre-trained language models to new domains or languages. In this work, we address two complementary aspects of this process: voca...
- AugServe: Adaptive Request Scheduling for Augmented Large Language Model Inference Serving : Abstract: As augmented large language models (LLMs) with external tools become increasingly popular in web applications, improving augmented LLM inference serving efficiency and optimizing service-lev...
- Jina-VLM: Small Multilingual Vision Language Model : Abstract: We present Jina-VLM, a 2.4B parameter vision-language model that achieves state-of-the-art multilingual visual question answering among open 2B-scale VLMs. The model couples a SigLIP2 vision...
- SkillFactory: Self-Distillation For Learning Cognitive Behaviors : Abstract: Reasoning models leveraging long chains of thought employ various cognitive skills, such as verification of their answers, backtracking, retrying by an alternate method, and more. Previous w...
- Culture Affordance Atlas: Reconciling Object Diversity Through Functional Mapping : Abstract: Culture shapes the objects people use and for what purposes, yet mainstream Vision-Language (VL) datasets frequently exhibit cultural biases, disproportionately favoring higher-income, Weste...
- Is Vibe Coding Safe? Benchmarking Vulnerability of Agent-Generated Code in Real-World Tasks : Abstract: Vibe coding is a new programming paradigm in which human engineers instruct large language model (LLM) agents to complete complex coding tasks with little supervision. Although it is increas...
- Epistemic Substitution: How Grokipedia's AI-Generated Encyclopedia Restructures Authority : Abstract: A quarter century ago, Wikipedia's decentralized, crowdsourced, and consensus-driven model replaced the centralized, expert-driven, and authority-based standard for encyclopedic knowledge cu...
- LLM-Generated Ads: From Personalization Parity to Persuasion Superiority : Abstract: As large language models (LLMs) become increasingly capable of generating persuasive content, understanding their effectiveness across different advertising strategies becomes critical. This...
- Text-Printed Image: Bridging the Image-Text Modality Gap for Text-centric Training of Large Vision-Language Models : Abstract: Recent large vision-language models (LVLMs) have been applied to diverse VQA tasks. However, achieving practical performance typically requires task-specific fine-tuning with large numbers o...
- Tuning for TraceTarnish: Techniques, Trends, and Testing Tangible Traits : Abstract: In this study, we more rigorously evaluated our attack script TraceTarnish, which leverages adversarial stylometry principles to anonymize the authorship of text-based messages. T...
- NAS-LoRA: Empowering Parameter-Efficient Fine-Tuning for Visual Foundation Models with Searchable Adaptation : Abstract: The Segment Anything Model (SAM) has emerged as a powerful visual foundation model for image segmentation. However, adapting SAM to specific downstream tasks, such as medical and agricultura...
- M3DR: Towards Universal Multilingual Multimodal Document Retrieval : Abstract: Multimodal document retrieval systems have shown strong progress in aligning visual and textual content for semantic search. However, most existing approaches remain heavily English-centric,...
- CartoMapQA: A Fundamental Benchmark Dataset Evaluating Vision-Language Models on Cartographic Map Understanding : Abstract: The rise of Visual-Language Models (LVLMs) has unlocked new possibilities for seamlessly integrating visual and textual information. However, their ability to interpret cartographic maps rem...
- Thinking with Programming Vision: Towards a Unified View for Thinking with Images : Abstract: Multimodal large language models (MLLMs) that think with images can interactively use tools to reason about visual inputs, but current approaches often rely on a narrow set of tools with lim...
- Stable Signer: Hierarchical Sign Language Generative Model : Abstract: Sign Language Production (SLP) is the process of converting the complex input text into a real video. Most previous works focused on the Text2Gloss, Gloss2Pose, Pose2Vid stages, and some con...
- A Group Fairness Lens for Large Language Models : Abstract: The need to assess LLMs for bias and fairness is critical, with current evaluations often being narrow, missing a broad categorical view. In this paper, we propose evaluating the bias and fa...
- IW-Bench: Evaluating Large Multimodal Models for Converting Image-to-Web : Abstract: Recent advancements in large multimodal models have led to significant strides in image comprehension capabilities. Despite these advancements, there is a lack of a robust benchmark spec...
- DGFusion: Depth-Guided Sensor Fusion for Robust Semantic Perception : Abstract: Robust semantic perception for autonomous vehicles relies on effectively combining multiple sensors with complementary strengths and weaknesses. State-of-the-art sensor fusion approaches to ...
- Alleviating Choice Supportive Bias in LLM with Reasoning Dependency Generation : Abstract: Recent studies have demonstrated that some Large Language Models exhibit choice-supportive bias (CSB) when performing evaluations, systematically favoring their chosen options and potentiall...
- InvertiTune: High-Quality Data Synthesis for Cost-Effective Single-Shot Text-to-Knowledge Graph Generation : Abstract: Large Language Models (LLMs) have revolutionized the ability to understand and generate text, enabling significant progress in automatic knowledge graph construction from text (Text2KG). Man...
- Identifying attributions of causality in political text : Abstract: Explanations are a fundamental element of how people make sense of the political world. Citizens routinely ask and answer questions about why events happen, who is responsible, and what coul...
- Modeling Topics and Sociolinguistic Variation in Code-Switched Discourse: Insights from Spanish-English and Spanish-Guaraní : Abstract: This study presents an LLM-assisted annotation pipeline for the sociolinguistic and topical analysis of bilingual discourse in two typologically distinct contexts: Spanish-English and Spanis...
- PERCS: Persona-Guided Controllable Biomedical Summarization Dataset : Abstract: Automatic medical text simplification plays a key role in improving health literacy by making complex biomedical research accessible to diverse readers. However, most existing resources assu...
- Idea-Gated Transformers: Enforcing Semantic Coherence via Differentiable Vocabulary Pruning : Abstract: Autoregressive Language Models (LLMs) trained on Next-Token Prediction (NTP) often suffer from "Topic Drift," where the generation wanders away from the initial prompt due to a reliance on ...
- From Hypothesis to Premises: LLM-based Backward Logical Reasoning with Selective Symbolic Translation : Abstract: Logical reasoning is a core challenge in natural language understanding and a fundamental capability of artificial intelligence, underpinning scientific discovery, mathematical theorem provi...
- Nexus: Higher-Order Attention Mechanisms in Transformers : Abstract: Transformers have achieved significant success across various domains, relying on self-attention to capture dependencies. However, the standard first-order attention mechanism is often limit...
- Characterizing Language Use in a Collaborative Situated Game : Abstract: Cooperative video games, where multiple participants must coordinate by communicating and reasoning under uncertainty in complex environments, yield a rich source of language data. We collec...
- Dual LoRA: Enhancing LoRA with Magnitude and Direction Updates : Abstract: Low-rank adaptation (LoRA) is one of the most popular methods among parameter-efficient fine-tuning (PEFT) methods to adapt pre-trained large language models (LLMs) to specific downstream ta...
- PretrainZero: Reinforcement Active Pretraining : Abstract: Mimicking human behavior to actively learn from general experience and achieve artificial general intelligence has always been a human dream. Recent reinforcement learning (RL) based larg...
- Understanding LLM Reasoning for Abstractive Summarization : Abstract: While the reasoning capabilities of Large Language Models (LLMs) excel in analytical tasks such as mathematics and code generation, their utility for abstractive summarization remains widely...
- Fine-grained Narrative Classification in Biased News Articles : Abstract: Narratives are the cognitive and emotional scaffolds of propaganda. They organize isolated persuasive techniques into coherent stories that justify actions, attribute blame, and evoke identi...
- AlignCheck: a Semantic Open-Domain Metric for Factual Consistency Assessment : Abstract: Large Language Models have significantly advanced natural language processing tasks, but remain prone to generating incorrect or misleading but plausible arguments. This issue, known as hall...
- Generative AI Practices, Literacy, and Divides: An Empirical Analysis in the Italian Context : Abstract: The rise of Artificial Intelligence (AI) language technologies, particularly generative AI (GenAI) chatbots accessible via conversational interfaces, is transforming digital interactions. Wh...
- Evaluating Hydro-Science and Engineering Knowledge of Large Language Models : Abstract: Hydro-Science and Engineering (Hydro-SE) is a critical and irreplaceable domain that secures human water supply, generates clean hydropower energy, and mitigates flood and drought disasters....
- Different types of syntactic agreement recruit the same units within large language models : Abstract: Large language models (LLMs) can reliably distinguish grammatical from ungrammatical sentences, but how grammatical knowledge is represented within the models remains an open question. We in...
- AITutor-EvalKit: Exploring the Capabilities of AI Tutors : Abstract: We present AITutor-EvalKit, an application that uses language technology to evaluate the pedagogical quality of AI tutors, provides software for demonstration and evaluation, as well as mode...
- DZ-TDPO: Non-Destructive Temporal Alignment for Mutable State Tracking in Long-Context Dialogue : Abstract: Long-context dialogue systems suffer from State Inertia, where static constraints prevent models from resolving conflicts between evolving user intents and established historical context. To...
- TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design : Abstract: Generalizing deep reinforcement learning agents to unseen environments remains a significant challenge. One promising solution is Unsupervised Environment Design (UED), a co-evolutionary fra...
- Run-Time Monitoring of ERTMS/ETCS Control Flow by Process Mining : Abstract: Ensuring the resilience of computer-based railways is increasingly crucial to account for uncertainties and changes due to the growing complexity and criticality of those systems. Although t...
- All that structure matches does not glitter : Abstract: Generative models for materials, especially inorganic crystals, hold potential to transform the theoretical prediction of novel compounds and structures. Advancement in this field depends on...
- Variational Inference of Parameters in Opinion Dynamics Models : Abstract: Despite the frequent use of agent-based models (ABMs) for studying social phenomena, parameter estimation remains a challenge, often relying on costly simulation-based heuristics. This work ...
- Are you a robot? Detecting Autonomous Vehicles from Behavior Analysis : Abstract: The tremendous hype around autonomous driving is eagerly calling for emerging and novel technologies to support advanced mobility use cases. As car manufacturers keep developing SAE level 3+ ...
- SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving : Abstract: As Large Language Models (LLMs) gain traction, their reliance on power-hungry GPUs places ever-increasing energy demands, raising environmental and monetary concerns. Inference dominates LLM...
- From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing : Abstract: Remote sensing has evolved from simple image acquisition to complex systems capable of integrating and processing visual and textual data. This review examines the development and applicatio...
- Interactive and Hybrid Imitation Learning: Provably Beating Behavior Cloning : Abstract: Imitation learning (IL) is a paradigm for learning sequential decision making policies from experts, leveraging offline demonstrations, interactive annotations, or both. Recent advances show...
- Autonomous Planning In-space Assembly Reinforcement-learning free-flYer (APIARY) International Space Station Astrobee Testing : Abstract: The US Naval Research Laboratory's (NRL's) Autonomous Planning In-space Assembly Reinforcement-learning free-flYer (APIARY) experiment pioneers the use of reinforcement learning (RL) for con...
- Crossing the Sim2Real Gap Between Simulation and Ground Testing to Space Deployment of Autonomous Free-flyer Control : Abstract: Reinforcement learning (RL) offers transformative potential for robotic control in space. We present the first on-orbit demonstration of RL-based autonomous control of a free-flying robot, t...
- Cross-embodied Co-design for Dexterous Hands : Abstract: Dexterous manipulation is limited by both control and design, without consensus as to what makes manipulators best for performing dexterous tasks. This raises a fundamental challenge: how sh...
- Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective : Abstract: Reinforcement Learning (RL) has proven highly effective for autoregressive language models, but adapting these methods to diffusion large language models (dLLMs) presents fundamental challen...
- In-Context Representation Hijacking : Abstract: We introduce Doublespeak, a simple in-context representation hijacking attack against large language models (LLMs). The attack works by systematically replacing a harmful key...
- AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition : Abstract: Vision-Language Models (VLMs) have achieved remarkable success in visual question answering tasks, but their reliance on large numbers of visual tokens introduces significant computational o...
- HieroGlyphTranslator: Automatic Recognition and Translation of Egyptian Hieroglyphs to English : Abstract: Egyptian hieroglyphs, the ancient Egyptian writing system, are composed entirely of drawings. Translating these glyphs into English poses various challenges, including the fact that a single...
- Comparison of neural network training strategies for the simulation of dynamical systems : Abstract: Neural networks have become a widely adopted tool for modeling nonlinear dynamical systems from data. However, the choice of training strategy remains a key design decision, particularly for...
- OmniDexVLG: Learning Dexterous Grasp Generation from Vision Language Model-Guided Grasp Semantics, Taxonomy and Functional Affordance : Abstract: Dexterous grasp generation aims to produce grasp poses that align with task requirements and human interpretable grasp semantics. However, achieving semantically controllable dexterous grasp...
- Digital Twin-based Control Co-Design of Full Vehicle Active Suspensions via Deep Reinforcement Learning : Abstract: Active suspension systems are critical for enhancing vehicle comfort, safety, and stability, yet their performance is often limited by fixed hardware designs and control strategies that cann...
- Autonomous Reinforcement Learning Robot Control with Intel's Loihi 2 Neuromorphic Hardware : Abstract: We present an end-to-end pipeline for deploying reinforcement learning (RL) trained Artificial Neural Networks (ANNs) on neuromorphic hardware by converting them into spiking Sigma-Delta Neu...
- A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models : Abstract: In large-scale AI training, Sparse Mixture-of-Experts (s-MoE) layers enable scaling by activating only a small subset of experts per token. An operational challenge in this design is load ba...
- Tada-DIP: Input-adaptive Deep Image Prior for One-shot 3D Image Reconstruction : Abstract: Deep Image Prior (DIP) has recently emerged as a promising one-shot neural-network based image reconstruction method. However, DIP has seen limited application to 3D image reconstruction pro...
- Refining Machine Learning Potentials through Thermodynamic Theory of Phase Transitions : Abstract: Foundational Machine Learning Potentials can resolve the accuracy and transferability limitations of classical force fields. They enable microscopic insights into material behavior through M...
- Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding : Abstract: The application of Large Multimodal Models (LMMs) to long-form video understanding is constrained by limited context lengths and the computationally prohibitive cost of processing dense vide...
- PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation : Abstract: Attention mechanisms are the core of foundation models, but their quadratic complexity remains a critical bottleneck for scaling. This challenge has driven the development of efficient atten...
- Fast & Efficient Normalizing Flows and Applications of Image Generative Models : Abstract: This thesis presents novel contributions in two primary areas: advancing the efficiency of generative models, particularly normalizing flows, and applying generative models to solve real-wor...
- Closing the problem of which causal structures of up to six total nodes have a classical-quantum gap : Abstract: The discovery of Bell that there exist quantum correlations that cannot be reproduced classically is one of the most important in the foundations of quantum mechanics, as well as having prac...
- Marginalize, Rather than Impute: Probabilistic Wind Power Forecasting with Incomplete Data : Abstract: Machine learning methods are widely and successfully used for probabilistic wind power forecasting, yet the pervasive issue of missing values (e.g., due to sensor faults or communication out...
- Why Rectified Power Unit Networks Fail and How to Improve It: An Effective Field Theory Perspective : Abstract: The Rectified Power Unit (RePU) activation function, a differentiable generalization of the Rectified Linear Unit (ReLU), has shown promise in constructing neural networks due to its smoothn...
- Concentration of Cumulative Reward in Markov Decision Processes : Abstract: In this paper, we investigate the concentration properties of cumulative reward in Markov Decision Processes (MDPs), focusing on both asymptotic and non-asymptotic settings. We introduce a u...
- Test-Time Training Scaling Laws for Chemical Exploration in Drug Design : Abstract: Chemical Language Models (CLMs) leveraging reinforcement learning (RL) have shown promise in de novo molecular design, yet often suffer from mode collapse, limiting their exploration capabil...
- Convergence of a class of gradient-free optimisation schemes when the objective function is noisy, irregular, or both : Abstract: We investigate the convergence properties of a class of iterative algorithms designed to minimize a potentially non-smooth and noisy objective function, which may be algebraically intractabl...
- Iterative Tilting for Diffusion Fine-Tuning : Abstract: We introduce iterative tilting, a gradient-free method for fine-tuning diffusion models toward reward-tilted distributions. The method decomposes a large reward tilt $\exp(\lambda r)$ into $N$ sequ...
- How to DP-fy Your Data: A Practical Guide to Generating Synthetic Data With Differential Privacy : Abstract: High-quality data is needed to unlock the full potential of AI for end users. However, finding new sources of such data is getting harder: most publicly available human-generated data will so...
- Novelty detection on path space : Abstract: We frame novelty detection on path space as a hypothesis testing problem with signature-based test statistics. Using transportation-cost inequalities of Gasteratos and Jacquier (2023), we ob...
- Learning Network Sheaves for AI-native Semantic Communication : Abstract: Recent advances in AI call for a paradigm shift from bit-centric communication to goal- and semantics-oriented architectures, paving the way for AI-native 6G networks. In this context, we ad...
- PyroFocus: A Deep Learning Approach to Real-Time Wildfire Detection in Multispectral Remote Sensing Imagery : Abstract: Rapid and accurate wildfire detection is crucial for emergency response and environmental management. In airborne and spaceborne missions, real-time algorithms must distinguish between no fi...
- Associating Healthcare Teamwork with Patient Outcomes for Predictive Analysis : Abstract: Cancer treatment outcomes are influenced not only by clinical and demographic factors but also by the collaboration of healthcare teams. However, prior work has largely overlooked the potent...
- Randomized Masked Finetuning: An Efficient Way to Mitigate Memorization of PIIs in LLMs : Abstract: The current literature on memorization in Natural Language Models, especially Large Language Models (LLMs), poses severe security and privacy risks, as models tend to memorize personally ide...
- Unlocking hidden biomolecular conformational landscapes in diffusion models at inference time : Abstract: The function of biomolecules such as proteins depends on their ability to interconvert between a wide range of structures or "conformations." Researchers have endeavored for decades to devel...
- NavMapFusion: Diffusion-based Fusion of Navigation Maps for Online Vectorized HD Map Construction : Abstract: Accurate environmental representations are essential for autonomous driving, providing the foundation for safe and efficient navigation. Traditionally, high-definition (HD) maps are providin...
- When does Gaussian equivalence fail and how to fix it: Non-universal behavior of random features with quadratic scaling : Abstract: A major effort in modern high-dimensional statistics has been devoted to the analysis of linear predictors trained on nonlinear feature embeddings via empirical risk minimization (ERM). Gaus...
- Step-by-step Layered Design Generation : Abstract: Design generation, in its essence, is a step-by-step process where designers progressively refine and enhance their work through careful modifications. Despite this fundamental characteristi...
- ProtoEFNet: Dynamic Prototype Learning for Inherently Interpretable Ejection Fraction Estimation in Echocardiography : Abstract: Ejection fraction (EF) is a crucial metric for assessing cardiac function and diagnosing conditions such as heart failure. Traditionally, EF estimation requires manual tracing and domain exp...
- Comparative algorithm performance evaluation and prediction for the maximum clique problem using instance space analysis : Abstract: The maximum clique problem, a well-known graph-based combinatorial optimization problem, has been addressed through various algorithmic approaches, though systematic analyses of the problem ...
- KeyPointDiffuser: Unsupervised 3D Keypoint Learning via Latent Diffusion Models : Abstract: Understanding and representing the structure of 3D objects in an unsupervised manner remains a core challenge in computer vision and graphics. Most existing unsupervised keypoint methods are...
- GalaxyDiT: Efficient Video Generation with Guidance Alignment and Adaptive Proxy in Diffusion Transformers : Abstract: Diffusion models have revolutionized video generation, becoming essential tools in creative content generation and physical simulation. Transformer-based architectures (DiTs) and classifier-...
- A Convolutional Framework for Mapping Imagined Auditory MEG into Listened Brain Responses : Abstract: Decoding imagined speech engages complex neural processes that are difficult to interpret due to uncertainty in timing and the limited availability of imagined-response datasets. In this stu...
- Learning From Limited Data and Feedback for Cell Culture Process Monitoring: A Comparative Study : Abstract: In cell culture bioprocessing, real-time batch process monitoring (BPM) refers to the continuous tracking and analysis of key process variables such as viable cell density, nutrient levels, ...
- A Hybrid Deep Learning and Anomaly Detection Framework for Real-Time Malicious URL Classification : Abstract: Malicious URLs remain a primary vector for phishing, malware, and cyberthreats. This study proposes a hybrid deep learning framework combining HashingVectorizer n-gram analysis, SMO...
- Fairness-Aware Fine-Tuning of Vision-Language Models for Medical Glaucoma Diagnosis : Abstract: Vision-language models achieve expert-level performance on medical imaging tasks but exhibit significant diagnostic accuracy disparities across demographic groups. We introduce fairness-awar...
- A Preliminary Study on the Promises and Challenges of Native Top-$k$ Sparse Attention : Abstract: Large Language Models (LLMs) are increasingly prevalent in the field of long-context modeling; however, their inference computational costs have become a critical bottleneck hindering the ad...
- Cross-Space Synergy: A Unified Framework for Multimodal Emotion Recognition in Conversation : Abstract: Multimodal Emotion Recognition in Conversation (MERC) aims to predict speakers' emotions by integrating textual, acoustic, and visual cues. Existing approaches either struggle to capture com...
- Machine Learning to Predict Slot Usage in TSCH Wireless Sensor Networks : Abstract: Wireless sensor networks (WSNs) are employed across a wide range of industrial applications where ultra-low power consumption is a critical prerequisite. At the same time, these systems must...
- EnCompass: Enhancing Agent Programming with Search Over Program Execution Paths : Abstract: We introduce a new approach to agent programming, the development of LLM-based agents. Current approaches to agent programming often entangle two aspects of agent design: the core workflow l...
- SELF: A Robust Singular Value and Eigenvalue Approach for LLM Fingerprinting : Abstract: The protection of Intellectual Property (IP) in Large Language Models (LLMs) represents a critical challenge in contemporary AI research. While fingerprinting techniques have emerged as a fu...
- AaPE: Aliasing-aware Patch Embedding for Self-Supervised Audio Representation Learning : Abstract: Transformer-based audio SSL (self-supervised learning) models often treat spectrograms as images, applying convolutional patchification with heavy temporal downsampling. This lowers the effe...
- Optical Context Compression Is Just (Bad) Autoencoding : Abstract: DeepSeek-OCR demonstrates that rendered text can be reconstructed with high fidelity from a small number of vision tokens. This finding has sparked excitement about vision-based context comp...
- Consistent Projection of Langevin Dynamics: Preserving Thermodynamics and Kinetics in Coarse-Grained Models : Abstract: Coarse graining (CG) is an important task for efficient modeling and simulation of complex multi-scale systems, such as the conformational dynamics of biomolecules. This work presents a proj...
- Over-the-Air Federated Learning: Rethinking Edge AI Through Signal Processing : Abstract: Over-the-Air Federated Learning (AirFL) is an emerging paradigm that tightly integrates wireless signal processing and distributed machine learning to enable scalable AI at the network edge....
- Colored Markov Random Fields for Probabilistic Topological Modeling : Abstract: Probabilistic Graphical Models (PGMs) encode conditional dependencies among random variables using a graph (nodes for variables, links for dependencies) and factorize the joint distribution ...
- Quantum-Classical Physics-Informed Neural Networks for Solving Reservoir Seepage Equations : Abstract: Solving partial differential equations (PDEs) for reservoir seepage is critical for optimizing oil and gas field development and predicting production performance. Traditional numerical meth...
- Density-Informed VAE (DiVAE): Reliable Log-Prior Probability via Density Alignment Regularization : Abstract: We introduce Density-Informed VAE (DiVAE), a lightweight, data-driven regularizer that aligns the VAE log-prior probability $\log p_Z(z)$ with a log-density estimated from data. Standard VAE...
- Technical Report on Text Dataset Distillation : Abstract: In the vision domain, dataset distillation arises as a technique to condense a large dataset into a smaller synthetic one that exhibits a similar result in the training process. While image ...
- Guided Flow Policy: Learning from High-Value Actions in Offline Reinforcement Learning : Abstract: Offline reinforcement learning often relies on behavior regularization that enforces policies to remain close to the dataset distribution. However, such approaches fail to distinguish betwee...
- Training-Free Policy Violation Detection via Activation-Space Whitening in LLMs : Abstract: Aligning proprietary large language models (LLMs) with internal organizational policies has become an urgent priority as organizations increasingly deploy LLMs in sensitive domains such as l...
- Physics-Embedded Gaussian Process for Traffic State Estimation : Abstract: Traffic state estimation (TSE) becomes challenging when probe-vehicle penetration is low and observations are spatially sparse. Purely data-driven methods lack physical explanations and have p...
- Diagonalizing the Softmax: Hadamard Initialization for Tractable Cross-Entropy Dynamics : Abstract: Cross-entropy (CE) training loss dominates deep learning practice, yet existing theory often relies on simplifications, either replacing it with squared loss or restricting to convex models,...
- Efficient Public Verification of Private ML via Regularization : Abstract: Training with differential privacy (DP) provides a guarantee to members in a dataset that they cannot be identified by users of the released model. However, those data providers, and, in gen...
- Domain Feature Collapse: Implications for Out-of-Distribution Detection and Solutions : Abstract: Why do state-of-the-art OOD detection methods exhibit catastrophic failure when models are trained on single-domain datasets? We provide the first theoretical explanation for this phenomenon...
- MarkTune: Improving the Quality-Detectability Trade-off in Open-Weight LLM Watermarking : Abstract: Watermarking aims to embed hidden signals in generated text that can be reliably detected when given access to a secret key. Open-weight language models pose acute challenges for such waterm...
- Convergence for Discrete Parameter Updates : Abstract: Modern deep learning models require immense computational resources, motivating research into low-precision training. Quantised training addresses this by representing training components in...
- Eval Factsheets: A Structured Framework for Documenting AI Evaluations : Abstract: The rapid proliferation of benchmarks has created significant challenges in reproducibility, transparency, and informed decision-making. However, unlike datasets and models -- which benefit ...
- Fare Comparison App of Uber, Ola and Rapido : Abstract: In today's world, it is important to have good ride-hailing services like Ola, Uber, and Rapido, as they are essential for daily transportation. Users often face difficulties ...
- Learning Steerable Clarification Policies with Collaborative Self-play : Abstract: To handle underspecified or ambiguous queries, AI assistants need a policy for managing their uncertainty to determine (a) when to guess the user intent and answer directly, (b) when to enum...
- Entropy-Based Measurement of Value Drift and Alignment Work in Large Language Models : Abstract: Large language model safety is usually assessed with static benchmarks, but key failures are dynamic: value drift under distribution shift, jailbreak attacks, and slow degradation of alignme...
- Exploring Syntropic Frameworks in AI Alignment: A Philosophical Investigation : Abstract: I argue that AI alignment should be reconceived as architecting syntropic, reasons-responsive agents through process-based, multi-agent, developmental mechanisms rather than encoding fixed h...
- A note on the impossibility of conditional PAC-efficient reasoning in large language models : Abstract: We prove an impossibility result for conditional Probably Approximately Correct (PAC)-efficient reasoning in large language models. While recent work has established marginal PAC efficiency ...
- Watermarks for Embeddings-as-a-Service Large Language Models : Abstract: Large Language Models (LLMs) have demonstrated exceptional capabilities in natural language understanding and generation. Based on these LLMs, businesses have started to provide Embeddings-a...
- Calibrating Geophysical Predictions under Constrained Probabilistic Distributions : Abstract: Machine learning (ML) has shown significant promise in studying complex geophysical dynamical systems, including turbulence and climate processes. Such systems often display sensitive depend...
- Password-Activated Shutdown Protocols for Misaligned Frontier Agents : Abstract: Frontier AI developers may fail to align or control highly-capable AI agents. In many cases, it could be useful to have emergency shutdown mechanisms which effectively prevent misaligned age...
- Performance Analysis of Quantum Support Vector Classifiers and Quantum Neural Networks : Abstract: This study explores the performance of Quantum Support Vector Classifiers (QSVCs) and Quantum Neural Networks (QNNs) in comparison to classical models for machine learning tasks. By evaluati...
- Many-to-One Adversarial Consensus: Exposing Multi-Agent Collusion Risks in AI-Based Healthcare : Abstract: The integration of large language models (LLMs) into healthcare IoT systems promises faster decisions and improved medical support. LLMs are also deployed as multi-agent teams to assist AI d...
- An AI Implementation Science Study to Improve Trustworthy Data in a Large Healthcare System : Abstract: The rapid growth of Artificial Intelligence (AI) in healthcare has sparked interest in Trustworthy AI and AI Implementation Science, both of which are essential for accelerating clinical ado...
- QGShap: Quantum Acceleration for Faithful GNN Explanations : Abstract: Graph Neural Networks (GNNs) have become indispensable in critical domains such as drug discovery, social network analysis, and recommendation systems, yet their black-box nature hinders dep...
- A Discrete Neural Operator with Adaptive Sampling for Surrogate Modeling of Parametric Transient Darcy Flows in Porous Media : Abstract: This study proposes a new discrete neural operator for surrogate modeling of transient Darcy flow fields in heterogeneous porous media with random parameters. The new method integrates tempo...
- Drainage: A Unifying Framework for Addressing Class Uncertainty : Abstract: Modern deep learning faces significant challenges with noisy labels, class ambiguity, as well as the need to robustly reject out-of-distribution or corrupted samples. In this work, we propos...
- In Situ Quantum Analog Pulse Characterization via Structured Signal Processing : Abstract: Analog quantum simulators can directly emulate time-dependent Hamiltonian dynamics, enabling the exploration of diverse physical phenomena such as phase transitions, quench dynamics, and non...
- GRAND: Guidance, Rebalancing, and Assignment for Networked Dispatch in Multi-Agent Path Finding : Abstract: Large robot fleets are now common in warehouses and other logistics settings, where small control gains translate into large operational impacts. In this article, we address task scheduling ...
- Enhancing Job Matching: Occupation, Skill and Qualification Linking with the ESCO and EQF taxonomies : Abstract: This study investigates the potential of language models to improve the classification of labor market information by linking job vacancy texts to two major European frameworks: the European...
- Ultra-Strong Gradient Diffusion MRI with Self-Supervised Learning for Prostate Cancer Characterization : Abstract: Diffusion MRI (dMRI) enables non-invasive assessment of prostate microstructure but conventional metrics such as the Apparent Diffusion Coefficient in multiparametric MRI lack specificity to...
- Uncertainty Quantification for Large Language Model Reward Learning under Heterogeneous Human Feedback : Abstract: We study estimation and statistical inference for reward models used in aligning large language models (LLMs). A key component of LLM alignment is reinforcement learning from human feedback ...
- Flux4D: Flow-based Unsupervised 4D Reconstruction : Abstract: Reconstructing large-scale dynamic scenes from visual observations is a fundamental challenge in computer vision, with critical implications for robotics and autonomous systems. While recent...
- ATHENA: Agentic Team for Hierarchical Evolutionary Numerical Algorithms : Abstract: Bridging the gap between theoretical conceptualization and computational implementation is a major bottleneck in Scientific Computing (SciC) and Scientific Machine Learning (SciML). We intro...
- Modal Logical Neural Networks : Abstract: We propose Modal Logical Neural Networks (MLNNs), a neurosymbolic framework that integrates deep learning with the formal semantics of modal logic, enabling reasoning about necessity and pos...
- Physics-Driven Learning Framework for Tomographic Tactile Sensing : Abstract: Electrical impedance tomography (EIT) provides an attractive solution for large-area tactile sensing due to its minimal wiring and shape flexibility, but its nonlinear inverse problem often ...
- Adaptive sampling using variational autoencoder and reinforcement learning : Abstract: Compressed sensing enables sparse sampling but relies on generic bases and random measurements, limiting efficiency and reconstruction quality. Optimal sensor placement uses historical data t...
- Parameter-Efficient Augment Plugin for Class-Incremental Learning : Abstract: Existing class-incremental learning (CIL) approaches based on replay or knowledge distillation are often constrained by forgetting or the stability-plasticity dilemma. Some expansion-based a...
- Towards Irreversible Machine Unlearning for Diffusion Models : Abstract: Diffusion models are renowned for their state-of-the-art performance in generating synthetic images. However, concerns related to safety, privacy, and copyright highlight the need for machin...
- When, How Long and How Much? Interpretable Neural Networks for Time Series Regression by Learning to Mask and Aggregate : Abstract: Time series extrinsic regression (TSER) refers to the task of predicting a continuous target variable from an input time series. It appears in many domains, including healthcare, finance, en...
- Optimal Transportation and Alignment Between Gaussian Measures : Abstract: Optimal transport (OT) and Gromov-Wasserstein (GW) alignment provide interpretable geometric frameworks for comparing, transforming, and aggregating heterogeneous datasets -- tasks ubiquitou...
- Federated Learning and Trajectory Compression for Enhanced AIS Coverage : Abstract: This paper presents the VesselEdge system, which leverages federated learning and bandwidth-constrained trajectory compression to enhance maritime situational awareness by extending AIS cove...
- Observation-driven correction of numerical weather prediction for marine winds : Abstract: Accurate marine wind forecasts are essential for safe navigation, ship routing, and energy operations, yet they remain challenging because observations over the ocean are sparse, heterogeneo...
- CoGraM: Context-sensitive granular optimization method with rollback for robust model fusion : Abstract: Merging neural networks without retraining is central to federated and distributed learning. Common methods such as weight averaging or Fisher merging often lose accuracy and are unstable ac...
- The promising potential of vision language models for the generation of textual weather forecasts : Abstract: Despite the promising capability of multimodal foundation models, their application to the generation of meteorological products and services remains nascent. To accelerate aspiration and ad...
- Conditional updates of neural network weights for increased out of training performance : Abstract: This study proposes a method to enhance neural network performance when training data and application data are not very similar, e.g., out of distribution problems, as well as pattern and re...
- Cyclical Temporal Encoding and Hybrid Deep Ensembles for Multistep Energy Forecasting : Abstract: Accurate electricity consumption forecasting is essential for demand management and smart grid operations. This paper introduces a unified deep learning framework that integrates cyclical te...
- Dynamically Scaled Activation Steering : Abstract: Activation steering has emerged as a powerful method for guiding the behavior of generative models towards desired outcomes such as toxicity mitigation. However, most existing methods apply ...
- Feature-aware Modulation for Learning from Temporal Tabular Data : Abstract: While tabular machine learning has achieved remarkable success, temporal distribution shifts pose significant challenges in real-world deployment, as the relationships between features and l...
- Quantum Topological Graph Neural Networks for Detecting Complex Fraud Patterns : Abstract: We propose a novel QTGNN framework for detecting fraudulent transactions in large-scale financial networks. By integrating quantum embedding, variational graph convolutions, and topological ...
- Unlocking the Invisible Urban Traffic Dynamics under Extreme Weather: A New Physics-Constrained Hamiltonian Learning Algorithm : Abstract: Urban transportation systems face increasing resilience challenges from extreme weather events, but current assessment methods rely on surface-level recovery indicators that miss hidden stru...
- Universally Converging Representations of Matter Across Scientific Foundation Models : Abstract: Machine learning models of vastly different modalities and architectures are being trained to predict the behavior of molecules, materials, and proteins. However, it remains unclear whether ...
- Origin-Conditional Trajectory Encoding: Measuring Urban Configurational Asymmetries through Neural Decomposition : Abstract: Urban analytics increasingly relies on AI-driven trajectory analysis, yet current approaches suffer from methodological fragmentation: trajectory learning captures movement patterns but igno...
- Deep Unfolding: Recent Developments, Theory, and Design Guidelines : Abstract: Optimization methods play a central role in signal processing, serving as the mathematical foundation for inference, estimation, and control. While classical iterative optimization algorithm...
- Forensic Activity Classification Using Digital Traces from iPhones: A Machine Learning-based Approach : Abstract: Smartphones and smartwatches are ever-present in daily life, and provide a rich source of information on their users' behaviour. In particular, digital traces derived from the phone's embedd...
- Adaptive Identification and Modeling of Clinical Pathways with Process Mining : Abstract: Clinical pathways are specialized healthcare plans that model patient treatment procedures. They are developed to provide criteria-based progression and standardize patient treatment, thereb...
- EfficientECG: Cross-Attention with Feature Fusion for Efficient Electrocardiogram Classification : Abstract: The electrocardiogram is a useful diagnostic signal that can detect cardiac abnormalities by measuring the electrical activity generated by the heart. Due to its rapid, non-invasive, and richly ...
- Deep Reinforcement Learning for Dynamic Algorithm Configuration: A Case Study on Optimizing OneMax with the (1+($\lambda$,$\lambda$))-GA : Abstract: Dynamic Algorithm Configuration (DAC) studies the efficient identification of control policies for parameterized optimization algorithms. Numerous studies have leveraged the robustness of de...
- Log Probability Tracking of LLM APIs : Abstract: When using an LLM through an API provider, users expect the served model to remain consistent over time, a property crucial for the reliability of downstream applications and the reproducibi...
- Transmit Weights, Not Features: Orthogonal-Basis Aided Wireless Point-Cloud Transmission : Abstract: The widespread adoption of depth sensors has substantially lowered the barrier to point-cloud acquisition. This letter proposes a semantic wireless transmission framework for three dimension...
- DVPO: Distributional Value Modeling-based Policy Optimization for LLM Post-Training : Abstract: Reinforcement learning (RL) has shown strong performance in LLM post-training, but real-world deployment often involves noisy or incomplete supervision. In such settings, complex and unrelia...
- Scalable Decision Focused Learning via Online Trainable Surrogates : Abstract: Decision support systems often rely on solving complex optimization problems that may require estimating uncertain parameters beforehand. Recent studies have shown how using traditionally t...
- Hyperdimensional Computing for Sustainable Manufacturing: An Initial Assessment : Abstract: Smart manufacturing can significantly improve efficiency and reduce energy consumption, yet the energy demands of AI models may offset these gains. This study utilizes in-situ sensing-based ...
- Automatic Attack Discovery for Few-Shot Class-Incremental Learning via Large Language Models : Abstract: Few-shot class incremental learning (FSCIL) is a more realistic and challenging paradigm in continual learning to incrementally learn unseen classes and overcome catastrophic forgetting on b...
- Probabilistic Foundations of Fuzzy Simplicial Sets for Nonlinear Dimensionality Reduction : Abstract: Fuzzy simplicial sets have become an object of interest in dimensionality reduction and manifold learning, most prominently through their role in UMAP. However, their definition through tool...
- Contrastive Deep Learning for Variant Detection in Wastewater Genomic Sequencing : Abstract: Wastewater-based genomic surveillance has emerged as a powerful tool for population-level viral monitoring, offering comprehensive insights into circulating viral variants across entire comm...
- Plantain: Plan-Answer Interleaved Reasoning : Abstract: Reasoning models often spend a significant amount of time thinking before they generate a visible response. In the meantime, they do not give the user any hints as to whether their reasoning...
- Neighborhood density estimation using space-partitioning based hashing schemes : Abstract: This work introduces FiRE/FiRE.1, a novel sketching-based algorithm for anomaly detection to quickly identify rare cell sub-populations in large-scale single-cell RNA sequencing data. This m...
- Scaling Internal-State Policy-Gradient Methods for POMDPs : Abstract: Policy-gradient methods have received increased attention recently as a mechanism for learning to act in partially observable environments. They have shown promise for problems admitting mem...
- A Multi-Agent, Policy-Gradient approach to Network Routing : Abstract: Network routing is a distributed decision problem which naturally admits numerical performance measures, such as the average time for a packet to travel from source to destination. OLPOMDP, ...
- Perch 2.0 transfers 'whale' to underwater tasks : Abstract: Perch 2.0 is a supervised bioacoustics foundation model pretrained on 14,597 species, including birds, mammals, amphibians, and insects, and has state-of-the-art performance on multiple benc...
- SPARK: Stepwise Process-Aware Rewards for Reference-Free Reinforcement Learning : Abstract: Process reward models (PRMs) that provide dense, step-level feedback have shown promise for reinforcement learning, yet their adoption remains limited by the need for expensive step-level an...
- Too Late to Recall: Explaining the Two-Hop Problem in Multimodal Knowledge Retrieval : Abstract: Training vision language models (VLMs) aims to align visual representations from a vision encoder with the textual representations of a pretrained large language model (LLM). However, many V...
- BlendedNet++: A Large-Scale Blended Wing Body Aerodynamics Dataset and Benchmark : Abstract: Despite progress in machine learning-based aerodynamic surrogates, the scarcity of large, field-resolved datasets limits progress on accurate pointwise prediction and reproducible inverse de...
- Multi-Frequency Federated Learning for Human Activity Recognition Using Head-Worn Sensors : Abstract: Human Activity Recognition (HAR) benefits various application domains, including health and elderly care. Traditional HAR involves constructing pipelines reliant on centralized user data, wh...
- ASPEN: An Adaptive Spectral Physics-Enabled Network for Ginzburg-Landau Dynamics : Abstract: Physics-Informed Neural Networks (PINNs) have emerged as a powerful, mesh-free paradigm for solving partial differential equations (PDEs). However, they notoriously struggle with stiff, mult...
- Adaptive Regime-Switching Forecasts with Distribution-Free Uncertainty: Deep Switching State-Space Models Meet Conformal Prediction : Abstract: Regime transitions routinely break stationarity in time series, making calibrated uncertainty as important as point accuracy. We study distribution-free uncertainty for regime-switching fore...
- HydroDCM: Hydrological Domain-Conditioned Modulation for Cross-Reservoir Inflow Prediction : Abstract: Deep learning models have shown promise in reservoir inflow prediction, yet their performance often deteriorates when applied to different reservoirs due to distributional differences, refer...
- Robust Tabular Foundation Models : Abstract: The development of tabular foundation models (TFMs) has accelerated in recent years, showing strong potential to outperform traditional ML methods for structured data. A key finding is that ...
- Retrofitting Earth System Models with Cadence-Limited Neural Operator Updates : Abstract: Coarse resolution, imperfect parameterizations, and uncertain initial states and forcings limit Earth-system model (ESM) predictions. Traditional bias correction via data assimilation improv...
- Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs : Abstract: Memory and computation remain core bottlenecks in long-horizon LLM inference due to the quadratic cost of self-attention and the ever-growing key-value (KV) cache. Existing strategies for me...
- Single-Round Scalable Analytic Federated Learning : Abstract: Federated Learning (FL) is plagued by two key challenges: high communication overhead and performance collapse on heterogeneous (non-IID) data. Analytic FL (AFL) provides a single-round, dat...
- Breaking Determinism: Stochastic Modeling for Reliable Off-Policy Evaluation in Ad Auctions : Abstract: Online A/B testing, the gold standard for evaluating new advertising policies, consumes substantial engineering resources and risks significant revenue loss from deploying underperforming va...
- A2G-QFL: Adaptive Aggregation with Two Gains in Quantum Federated learning : Abstract: Federated learning (FL) deployed over quantum enabled and heterogeneous classical networks faces significant performance degradation due to uneven client quality, stochastic teleportation fi...
- MAGE-ID: A Multimodal Generative Framework for Intrusion Detection Systems : Abstract: Modern Intrusion Detection Systems (IDS) face severe challenges due to heterogeneous network traffic, evolving cyber threats, and pronounced data imbalance between benign and attack flows. W...
- UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs : Abstract: Deploying large language models (LLMs) on mobile platforms faces significant challenges due to the limited memory and shared computational resources of the device. Resource availability...
- Tuning-Free Structured Sparse Recovery of Multiple Measurement Vectors using Implicit Regularization : Abstract: Recovering jointly sparse signals in the multiple measurement vectors (MMV) setting is a fundamental problem in machine learning, but traditional methods like multiple measurement vectors or...
- VS-Graph: Scalable and Efficient Graph Classification Using Hyperdimensional Computing : Abstract: Graph classification is a fundamental task in domains ranging from molecular property prediction to materials design. While graph neural networks (GNNs) achieve strong performance by learnin...
- Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value : Abstract: Beneficial societal outcomes cannot be guaranteed by aligning individual AI systems with the intentions of their operators or users. Even an AI system that is perfectly aligned to the intent...
- Better World Models Can Lead to Better Post-Training Performance : Abstract: In this work we study how explicit world-modeling objectives affect the internal representations and downstream capability of Transformers across different training stages. We use a controll...
- GaussDetect-LiNGAM: Causal Direction Identification without Gaussianity test : Abstract: We propose GaussDetect-LiNGAM, a novel approach for bivariate causal discovery that eliminates the need for explicit Gaussianity tests by leveraging a fundamental equivalence between noise G...
- Grokked Models are Better Unlearners : Abstract: Grokking, delayed generalization that emerges well after a model has fit the training data, has been linked to robustness and representation quality. We ask whether this training regime also h...
- Multi-Modal Opinion Integration for Financial Sentiment Analysis using Cross-Modal Attention : Abstract: In recent years, financial sentiment analysis of public opinion has become increasingly important for market forecasting and risk assessment. However, existing methods often struggle to effe...
- Bayesian Event-Based Model for Disease Subtype and Stage Inference : Abstract: Chronic diseases often progress differently across patients. Rather than randomly varying, there are typically a small number of subtypes for how a disease progresses across patients. To cap...
- SweetDeep: A Wearable AI Solution for Real-Time Non-Invasive Diabetes Screening : Abstract: The global rise in type 2 diabetes underscores the need for scalable and cost-effective screening methods. Current diagnosis requires biochemical assays, which are invasive and costly. Advan...
- Joint Progression Modeling (JPM): A Probabilistic Framework for Mixed-Pathology Progression : Abstract: Event-based models (EBMs) infer disease progression from cross-sectional data, and standard EBMs assume a single underlying disease per individual. In contrast, mixed pathologies are common ...
- Physics-Informed Machine Learning for Steel Development: A Computational Framework and CCT Diagram Modelling : Abstract: Machine learning (ML) has emerged as a powerful tool for accelerating the computational design and production of materials. In materials science, ML has primarily supported large-scale disco...
- Mitigating hallucinations and omissions in LLMs for invertible problems: An application to hardware logic design automation : Abstract: We show for invertible problems that transform data from a source domain (for example, Logic Condition Tables (LCTs)) to a destination domain (for example, Hardware Description Language (HDL...
- Energy-Efficient Federated Learning via Adaptive Encoder Freezing for MRI-to-CT Conversion: A Green AI-Guided Research : Abstract: Federated Learning (FL) holds the potential to advance equality in health by enabling diverse institutions to collaboratively train deep learning (DL) models, even with limited data. However...
- Physics-informed self-supervised learning for predictive modeling of coronary artery digital twins : Abstract: Cardiovascular disease is the leading global cause of mortality, with coronary artery disease (CAD) as its most prevalent form, necessitating early risk prediction. While 3D coronary artery ...
- Delta Sampling: Data-Free Knowledge Transfer Across Diffusion Models : Abstract: Diffusion models like Stable Diffusion (SD) drive a vibrant open-source ecosystem including fully fine-tuned checkpoints and parameter-efficient adapters such as LoRA, LyCORIS, and ControlNe...
- Dynamical Properties of Tokens in Self-Attention and Effects of Positional Encoding : Abstract: This paper investigates the dynamical properties of tokens in pre-trained Transformer models and explores their application to improving Transformers. To this end, we analyze the dynamical s...
- Safe and Sustainable Electric Bus Charging Scheduling with Constrained Hierarchical DRL : Abstract: The integration of Electric Buses (EBs) with renewable energy sources such as photovoltaic (PV) panels is a promising approach to promote sustainable and low-carbon public transportation. Ho...
- A Large Scale Heterogeneous Treatment Effect Estimation Framework and Its Applications of Users' Journey at Snap : Abstract: Heterogeneous Treatment Effect (HTE) and Conditional Average Treatment Effect (CATE) models relax the assumption that treatment effects are the same for every user. We present a large scale ...
- Globally optimized SVD compression of LLMs via Fermi-function-based rank selection and gauge fixing : Abstract: Large Language Models (LLMs) are very demanding in terms of their computational resources. Low-rank decompositions of LLM weights, e.g. via Singular Value Decomposition (SVD), are a promising...
- Optimizing Life Sciences Agents in Real-Time using Reinforcement Learning : Abstract: Generative AI agents in life sciences face a critical challenge: determining the optimal approach for diverse queries ranging from simple factoid questions to complex mechanistic reasoning. ...
- Hierarchical clustering of complex energy systems using pretopology : Abstract: This article attempts to answer the following question: How to model and classify energy consumption profiles over a large distributed territory to optimize the management of buildings' c...
- Mixed Data Clustering Survey and Challenges : Abstract: The advent of the big data paradigm has transformed how industries manage and analyze information, ushering in an era of unprecedented data volume, velocity, and variety. Within this landsca...
- PretopoMD: Pretopology-based Mixed Data Hierarchical Clustering : Abstract: This article presents a novel pretopology-based algorithm designed to address the challenges of clustering mixed data without the need for dimensionality reduction. Leveraging Disjunctive No...
- Model-Agnostic Fairness Regularization for GNNs with Incomplete Sensitive Information : Abstract: Graph Neural Networks (GNNs) have demonstrated exceptional efficacy in relational learning tasks, including node classification and link prediction. However, their application raises signifi...
- Risk-Entropic Flow Matching : Abstract: Tilted (entropic) risk, obtained by applying a log-exponential transform to a base loss, is a well-established tool in statistics and machine learning for emphasizing rare or high loss event...
- ALARM: Automated MLLM-Based Anomaly Detection in Complex-EnviRonment Monitoring with Uncertainty Quantification : Abstract: The advance of Large Language Models (LLMs) has greatly stimulated research interest in developing multi-modal LLM (MLLM)-based visual anomaly detection (VAD) algorithms that can be deployed...
- Dynamic Correction of Erroneous State Estimates via Diffusion Bayesian Exploration : Abstract: In emergency response and other high-stakes societal applications, early-stage state estimates critically shape downstream outcomes. Yet these initial state estimates, often based on limited...
- Detecting AI Hallucinations in Finance: An Information-Theoretic Method Cuts Hallucination Rate by 92% : Abstract: Large language models (LLMs) produce fluent but unsupported answers (hallucinations), limiting safe deployment in high-stakes domains. We propose ECLIPSE, a framework that treats hallucinat...
- E-valuator: Reliable Agent Verifiers with Sequential Hypothesis Testing : Abstract: Agentic AI systems execute a sequence of actions, such as reasoning steps or tool calls, in response to a user prompt. To evaluate the success of their trajectories, researchers have develop...
- Beyond Additivity: Sparse Isotonic Shapley Regression toward Nonlinear Explainability : Abstract: Shapley values, a gold standard for feature attribution in Explainable AI, face two primary challenges. First, the canonical Shapley framework assumes that the worth function is additive, ye...
- Temporal Graph Neural Networks for Early Anomaly Detection and Performance Prediction via PV System Monitoring Data : Abstract: The rapid growth of solar photovoltaic (PV) systems necessitates advanced methods for performance monitoring and anomaly detection to ensure optimal operation. In this study, we propose a no...
- Real-Time Structural Health Monitoring with Bayesian Neural Networks: Distinguishing Aleatoric and Epistemic Uncertainty for Digital Twin Frameworks : Abstract: Reliable real-time analysis of sensor data is essential for structural health monitoring (SHM) of high-value assets, yet a major challenge is to obtain spatially resolved full-field aleatori...
- Mitigating Intra- and Inter-modal Forgetting in Continual Learning of Unified Multimodal Models : Abstract: Unified Multimodal Generative Models (UMGMs) unify visual understanding and image generation within a single autoregressive framework. However, their ability to continually learn new tasks i...
- Atomic Diffusion Models for Small Molecule Structure Elucidation from NMR Spectra : Abstract: Nuclear Magnetic Resonance (NMR) spectroscopy is a cornerstone technique for determining the structures of small molecules and is especially critical in the discovery of novel natural produc...
Research Sources: 336 | Generated: 12/4/2025
