AI RESEARCH PAPERS & ACADEMIC SOURCES
- P2U-SLAM: A Monocular Wide-FoV SLAM System Based on Point Uncertainty and Pose Uncertainty : Abstract: This paper presents P2U-SLAM, a visual Simultaneous Localization And Mapping (SLAM) system with a wide Field of View (FoV) camera, which utilizes pose uncertainty and point uncertainty. Whil...
- Energy Propagation in Scattering Convolution Networks Can Be Arbitrarily Slow : Abstract: We analyze energy decay for deep convolutional neural networks employed as feature extractors, including Mallat's wavelet scattering transform. For time-frequency scattering transforms based...
- Enhancing Blind Video Quality Assessment with Rich Quality-aware Features : Abstract: Blind video quality assessment (BVQA) is a highly challenging task due to the intrinsic complexity of video content and visual distortions, especially given the high popularity of social med...
- COMPASS: High-Efficiency Deep Image Compression with Arbitrary-scale Spatial Scalability : Abstract: Recently, neural network (NN)-based image compression has been actively studied and has shown impressive performance in comparison to traditional methods. However, most of the works ha...
- RingMo-Agent: A Unified Remote Sensing Foundation Model for Multi-Platform and Multi-Modal Reasoning : Abstract: Remote sensing (RS) images from multiple modalities and platforms exhibit diverse details due to differences in sensor characteristics and imaging perspectives. Existing vision-language rese...
- Point Cloud to Mesh Reconstruction: Methods, Trade-offs, and Implementation Guide : Abstract: Reconstructing meshes from point clouds is a fundamental task in computer vision with applications spanning robotics, autonomous systems, and medical imaging. Selecting an appropriate learni...
- Bridging Geometry and Appearance: Topological Features for Robust Self-Supervised Segmentation : Abstract: Self-supervised semantic segmentation methods often fail when faced with appearance ambiguities. We argue that this is due to an over-reliance on unstable, appearance-based features such as ...
- SJTU: Spatial judgments in multimodal models towards unified segmentation through coordinate detection : Abstract: Despite significant advances in vision-language understanding, implementing image segmentation within multimodal architectures remains a fundamental challenge in modern artificial intelligen...
- AdaVLN: Towards Visual Language Navigation in Continuous Indoor Environments with Moving Humans : Abstract: Visual Language Navigation is a task that challenges robots to navigate in realistic environments based on natural language instructions. While previous research has largely focused on stati...
- MotionCharacter: Fine-Grained Motion Controllable Human Video Generation : Abstract: Recent advancements in personalized Text-to-Video (T2V) generation have made significant strides in synthesizing character-specific content. However, these methods face a critical limitation...
- RAD: A Dataset and Benchmark for Real-Life Anomaly Detection with Robotic Observations : Abstract: Anomaly detection is a core capability for robotic perception and industrial inspection, yet most existing benchmarks are collected under controlled conditions with fixed viewpoints and stab...
- Towards Vision-Language Geo-Foundation Model: A Survey : Abstract: Vision-Language Foundation Models (VLFMs) have made remarkable progress on various multimodal tasks, such as image captioning, image-text retrieval, visual question answering, and visual gro...
- Training-Free Video Editing via Optical Flow-Enhanced Score Distillation : Abstract: The rapid advancement in visual generation, particularly the emergence of pre-trained text-to-image and text-to-video models, has catalyzed growing interest in training-free video editing re...
- Neural Surface Reconstruction from Sparse Views Using Epipolar Geometry : Abstract: Reconstructing accurate surfaces from sparse multi-view images remains challenging due to severe geometric ambiguity and occlusions. Existing generalizable neural surface reconstruction meth...
- PrevMatch: Revisiting and Maximizing Temporal Knowledge in Semi-Supervised Semantic Segmentation : Abstract: In semi-supervised semantic segmentation, the Mean Teacher- and co-training-based approaches are employed to mitigate confirmation bias and coupling problems. However, despite their high per...
- RaffeSDG: Random Frequency Filtering enabled Single-source Domain Generalization for Medical Image Segmentation : Abstract: Deep learning models often encounter challenges in making accurate inferences when there are domain shifts between the source and target data. This issue is particularly pronounced in clinic...
- Attire-Based Anomaly Detection in Restricted Areas Using YOLOv8 for Enhanced CCTV Security : Abstract: This research introduces an innovative security enhancement approach, employing advanced image analysis and soft computing. The focus is on an intelligent surveillance system that detects un...
- Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning for Video Question Answering : Abstract: While significant advancements have been made in video question answering (VideoQA), the potential benefits of enhancing model generalization through tailored difficulty scheduling have been...
- Dancing Points: Synthesizing Ballroom Dancing with Three-Point Inputs : Abstract: Ballroom dancing is a structured yet expressive motion category. Its highly diverse movement and complex interactions between leader and follower dancers make the understanding and synthesis...
- SketchRodGS: Sketch-based Extraction of Slender Geometries for Animating Gaussian Splatting Scenes : Abstract: Physics simulation of slender elastic objects often requires discretization as a polyline. However, constructing a polyline from Gaussian splatting is challenging as Gaussian splatting lacks...
- DisCo-FLoc: Using Dual-Level Visual-Geometric Contrasts to Disambiguate Depth-Aware Visual Floorplan Localization : Abstract: Since floorplan data is readily available, long-term persistent, and robust to changes in visual appearance, visual Floorplan Localization (FLoc) has garnered significant attention. Existing...
- AlignDrive: Aligned Lateral-Longitudinal Planning for End-to-End Autonomous Driving : Abstract: End-to-end autonomous driving has rapidly progressed, enabling joint perception and planning in complex environments. In the planning stage, state-of-the-art (SOTA) end-to-end autonomous dri...
- OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs : Abstract: The rapid integration of Multimodal Large Language Models (MLLMs) into critical applications is increasingly hindered by persistent safety vulnerabilities. However, existing red-teaming benc...
- Sim2Real SAR Image Restoration: Metadata-Driven Models for Joint Despeckling and Sidelobes Reduction : Abstract: Synthetic aperture radar (SAR) provides valuable information about the Earth's surface under all weather and illumination conditions. However, the inherent phenomenon of speckle and the pres...
- Image Synthesis Using Spintronic Deep Convolutional Generative Adversarial Network : Abstract: The computational requirements of generative adversarial networks (GANs) exceed the limit of conventional Von Neumann architectures, necessitating energy efficient alternatives such as neuro...
- An Energy-Efficient Smart Bus Transport Management System with Blind-Spot Collision Detection Ability : Abstract: Public bus transport systems in developing countries often suffer from a lack of real-time location updates for users, making commuting inconvenient and unreliable for passengers. Furthe...
- DST-Calib: A Dual-Path, Self-Supervised, Target-Free LiDAR-Camera Extrinsic Calibration Network : Abstract: LiDAR-camera extrinsic calibration is essential for multi-modal data fusion in robotic perception systems. However, existing approaches typically rely on handcrafted calibration targets (e.g...
- YODA: Yet Another One-step Diffusion-based Video Compressor : Abstract: While one-step diffusion models have recently excelled in perceptual image compression, their application to video remains limited. Prior efforts typically rely on pretrained 2D autoencoders...
- Uncertainty-Calibrated Explainable AI for Fetal Ultrasound Plane Classification : Abstract: Fetal ultrasound standard-plane classification underpins reliable prenatal biometry and anomaly screening, yet real-world deployment is limited by domain shift, image noise, and poor calibra...
- Simulations of MRI Guided and Powered Ferric Applicators for Tetherless Delivery of Therapeutic Interventions : Abstract: Magnetic Resonance Imaging (MRI) is a well-established modality for pre-operative planning and is also explored for intra-operative guidance of procedures such as intravascular interventions...
- MetaFormer-driven Encoding Network for Robust Medical Semantic Segmentation : Abstract: Semantic segmentation is crucial for medical image analysis, enabling precise disease diagnosis and treatment planning. However, many advanced models employ complex architectures, limiting t...
- ExposeAnyone: Personalized Audio-to-Expression Diffusion Models Are Robust Zero-Shot Face Forgery Detectors : Abstract: Detecting unknown deepfake manipulations remains one of the most challenging problems in face forgery detection. Current state-of-the-art approaches fail to generalize to unseen manipulation...
- VINO: A Unified Visual Generator with Interleaved OmniModal Context : Abstract: We present VINO, a unified visual generator that performs image and video generation and editing within a single framework. Instead of relying on task-specific models or independent modules ...
- Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes : Abstract: We introduce Talk2Move, a reinforcement learning (RL) based diffusion framework for text-instructed spatial transformation of objects within scenes. Spatially manipulating objects in a scene...
- Joint Semantic and Rendering Enhancements in 3D Gaussian Modeling with Anisotropic Local Encoding : Abstract: Recent works propose extending 3DGS with semantic feature vectors for simultaneous semantic segmentation and image rendering. However, these methods often treat the semantic and rendering br...
- BEDS: Bayesian Emergent Dissipative Structures : Abstract: We present BEDS (Bayesian Emergent Dissipative Structures), a theoretical framework that unifies concepts from non-equilibrium thermodynamics, Bayesian inference, information geometry, and m...
- Fusion2Print: Deep Flash-Non-Flash Fusion for Contactless Fingerprint Matching : Abstract: Contactless fingerprint recognition offers a hygienic and convenient alternative to contact-based systems, enabling rapid acquisition without latent prints, pressure artifacts, or hygiene ri...
- Prithvi-Complimentary Adaptive Fusion Encoder (CAFE): unlocking full-potential for flood inundation mapping : Abstract: Geo-Foundation Models (GFMs) have proven effective in diverse downstream applications, including semantic segmentation, classification, and regression tasks. However, in case of flood mappi...
- 360DVO: Deep Visual Odometry for Monocular 360-Degree Camera : Abstract: Monocular omnidirectional visual odometry (OVO) systems leverage 360-degree cameras to overcome field-of-view limitations of perspective VO systems. However, existing methods, reliant on han...
- SortWaste: A Densely Annotated Dataset for Object Detection in Industrial Waste Sorting : Abstract: The increasing production of waste, driven by population growth, has created challenges in managing and recycling materials effectively. Manual waste sorting is a common practice; however, i...
- Rank-based Geographical Regularization: Revisiting Contrastive Self-Supervised Learning for Multispectral Remote Sensing Imagery : Abstract: Self-supervised learning (SSL) has become a powerful paradigm for learning from large, unlabeled datasets, particularly in computer vision (CV). However, applying SSL to multispectral remote...
- InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams : Abstract: The grand vision of enabling persistent, large-scale 3D visual geometry understanding is shackled by the irreconcilable demands of scalability and long-term stability. While offline models l...
- DiffProxy: Multi-View Human Mesh Recovery via Diffusion-Generated Dense Proxies : Abstract: Human mesh recovery from multi-view images faces a fundamental challenge: real-world datasets contain imperfect ground-truth annotations that bias the models' training, while synthetic data ...
- SLGNet: Synergizing Structural Priors and Language-Guided Modulation for Multimodal Object Detection : Abstract: Multimodal object detection leveraging RGB and Infrared (IR) images is pivotal for robust perception in all-weather scenarios. While recent adapter-based approaches efficiently transfer RGB-...
- FMVP: Masked Flow Matching for Adversarial Video Purification : Abstract: Video recognition models remain vulnerable to adversarial attacks, while existing diffusion-based purification methods suffer from inefficient sampling and curved trajectories. Directly regr...
- Prior-Guided DETR for Ultrasound Nodule Detection : Abstract: Accurate detection of ultrasound nodules is essential for the early diagnosis and treatment of thyroid and breast cancers. However, this task remains challenging due to irregular nodule shap...
- Unraveling MMDiT Blocks: Training-free Analysis and Enhancement of Text-conditioned Diffusion : Abstract: Recent breakthroughs of transformer-based diffusion models, particularly with Multimodal Diffusion Transformers (MMDiT) driven models like FLUX and Qwen Image, have facilitated thrilling exp...
- Parameter-Efficient Domain Adaption for CSI Crowd-Counting via Self-Supervised Learning with Adapter Modules : Abstract: Device-free crowd-counting using WiFi Channel State Information (CSI) is a key enabling technology for a new generation of privacy-preserving Internet of Things (IoT) applications. However, ...
- Why Commodity WiFi Sensors Fail at Multi-Person Gait Identification: A Systematic Analysis Using ESP32 : Abstract: WiFi Channel State Information (CSI) has shown promise for single-person gait identification, with numerous studies reporting high accuracy. However, multi-person identification remains larg...
- Efficient Unrolled Networks for Large-Scale 3D Inverse Problems : Abstract: Deep learning-based methods have revolutionized the field of imaging inverse problems, yielding state-of-the-art performance across various imaging domains. The best performing networks inco...
- Beyond Segmentation: An Oil Spill Change Detection Framework Using Synthetic SAR Imagery : Abstract: Marine oil spills are urgent environmental hazards that demand rapid and reliable detection to minimise ecological and economic damage. While Synthetic Aperture Radar (SAR) imagery has becom...
- MagicFight: Personalized Martial Arts Combat Video Generation : Abstract: Amid the surge in generic text-to-video generation, the field of personalized human video generation has witnessed notable advancements, primarily concentrated on single-person scenarios. Ho...
- HeadLighter: Disentangling Illumination in Generative 3D Gaussian Heads via Lightstage Captures : Abstract: Recent 3D-aware head generative models based on 3D Gaussian Splatting achieve real-time, photorealistic and view-consistent head synthesis. However, a fundamental limitation persists: the de...
- 360-GeoGS: Geometrically Consistent Feed-Forward 3D Gaussian Splatting Reconstruction for 360 Images : Abstract: 3D scene reconstruction is fundamental for spatial intelligence applications such as AR, robotics, and digital twins. Traditional multi-view stereo struggles with sparse viewpoints or low-te...
- InpaintHuman: Reconstructing Occluded Humans with Multi-Scale UV Mapping and Identity-Preserving Diffusion Inpainting : Abstract: Reconstructing complete and animatable 3D human avatars from monocular videos remains challenging, particularly under severe occlusions. While 3D Gaussian Splatting has enabled photorealisti...
- MCD-Net: A Lightweight Deep Learning Baseline for Optical-Only Moraine Segmentation : Abstract: Glacial segmentation is essential for reconstructing past glacier dynamics and evaluating climate-driven landscape change. However, weak optical contrast and the limited availability of high...
- PhysSFI-Net: Physics-informed Geometric Learning of Skeletal and Facial Interactions for Orthognathic Surgical Outcome Prediction : Abstract: Orthognathic surgery repositions jaw bones to restore occlusion and enhance facial aesthetics. Accurate simulation of postoperative facial morphology is essential for preoperative planning. ...
- AlignVTOFF: Texture-Spatial Feature Alignment for High-Fidelity Virtual Try-Off : Abstract: Virtual Try-Off (VTOFF) is a challenging multimodal image generation task that aims to synthesize high-fidelity flat-lay garments under complex geometric deformation and rich high-frequency ...
- Leveraging 2D-VLM for Label-Free 3D Segmentation in Large-Scale Outdoor Scene Understanding : Abstract: This paper presents a novel 3D semantic segmentation method for large-scale point cloud data that does not require annotated 3D training data or paired RGB images. The proposed approach proj...
- Adapting Depth Anything to Adverse Imaging Conditions with Events : Abstract: Robust depth estimation under dynamic and adverse lighting conditions is essential for robotic systems. Currently, depth foundation models, such as Depth Anything, achieve great success in i...
- Towards Any-Quality Image Segmentation via Generative and Adaptive Latent Space Enhancement : Abstract: Segment Anything Models (SAMs), known for their exceptional zero-shot segmentation performance, have garnered significant attention in the research community. Nevertheless, their performance...
- Nighttime Hazy Image Enhancement via Progressively and Mutually Reinforcing Night-Haze Priors : Abstract: Enhancing the visibility of nighttime hazy images is challenging due to the complex degradation distributions. Existing methods mainly address a single type of degradation (e.g., haze or low...
- API: Empowering Generalizable Real-World Image Dehazing via Adaptive Patch Importance Learning : Abstract: Real-world image dehazing is a fundamental yet challenging task in low-level vision. Existing learning-based methods often suffer from significant performance degradation when applied to com...
- Thinking with Blueprints: Assisting Vision-Language Models in Spatial Reasoning via Structured Object Representation : Abstract: Spatial reasoning -- the ability to perceive and reason about relationships in space -- advances vision-language models (VLMs) from visual perception toward spatial semantic understanding. E...
- AFTER: Mitigating the Object Hallucination of LVLM via Adaptive Factual-Guided Activation Editing : Abstract: Large Vision-Language Models (LVLMs) have achieved substantial progress in cross-modal tasks. However, due to language bias, LVLMs are susceptible to object hallucination, which can be prima...
- MotionAdapter: Video Motion Transfer via Content-Aware Attention Customization : Abstract: Recent advances in diffusion-based text-to-video models, particularly those built on the diffusion transformer architecture, have achieved remarkable progress in generating high-quality and ...
- Face Normal Estimation from Rags to Riches : Abstract: Although recent approaches to face normal estimation have achieved promising results, their effectiveness heavily depends on large-scale paired data for training. This paper concentrates on ...
- MacVQA: Adaptive Memory Allocation and Global Noise Filtering for Continual Visual Question Answering : Abstract: Visual Question Answering (VQA) requires models to reason over multimodal information, combining visual and textual data. With the development of continual learning, significant progress has...
- AR-MOT: Autoregressive Multi-object Tracking : Abstract: As multi-object tracking (MOT) tasks continue to evolve toward more general and multi-modal scenarios, the rigid and task-specific architectures of existing MOT methods increasingly hinder t...
- TalkPhoto: A Versatile Training-Free Conversational Assistant for Intelligent Image Editing : Abstract: Thanks to the powerful language comprehension capabilities of Large Language Models (LLMs), existing instruction-based image editing methods have introduced Multimodal Large Language Models ...
- Learning Action Hierarchies via Hybrid Geometric Diffusion : Abstract: Temporal action segmentation is a critical task in video understanding, where the goal is to assign action labels to each frame in a video. While recent advances leverage iterative refinemen...
- Agentic AI in Remote Sensing: Foundations, Taxonomy, and Emerging Systems : Abstract: The paradigm of Earth Observation analysis is shifting from static deep learning models to autonomous agentic AI. Although recent vision foundation models and multimodal large language model...
- Entity-Guided Multi-Task Learning for Infrared and Visible Image Fusion : Abstract: Existing text-driven infrared and visible image fusion approaches often rely on textual information at the sentence level, which can lead to semantic noise from redundant text and fail to fu...
- RRNet: Configurable Real-Time Video Enhancement with Arbitrary Local Lighting Variations : Abstract: With the growing demand for real-time video enhancement in live applications, existing methods often struggle to balance speed and effective exposure control, particularly under uneven light...
- GCR: Geometry-Consistent Routing for Task-Agnostic Continual Anomaly Detection : Abstract: Feature-based anomaly detection is widely adopted in industrial inspection due to the strong representational power of large pre-trained vision encoders. While most existing methods focus on...
- ESGaussianFace: Emotional and Stylized Audio-Driven Facial Animation via 3D Gaussian Splatting : Abstract: Most current audio-driven facial animation research primarily focuses on generating videos with neutral emotions. While some studies have addressed the generation of facial videos driven by ...
- Robust Egocentric Visual Attention Prediction Through Language-guided Scene Context-aware Learning : Abstract: As the demand for analyzing egocentric videos grows, egocentric visual attention prediction, anticipating where a camera wearer will attend, has garnered increasing attention. However, it re...
- Causality-Aware Temporal Projection for Video Understanding in Video-LLMs : Abstract: Recent Video Large Language Models (Video-LLMs) have shown strong multimodal reasoning capabilities, yet remain challenged by video understanding tasks that require consistent temporal order...
- DDNet: A Dual-Stream Graph Learning and Disentanglement Framework for Temporal Forgery Localization : Abstract: The rapid evolution of AIGC technology makes it possible to mislead viewers by tampering with only small segments of a video, rendering video-level detection inaccurate and unpersuasive. Consequently, ...
- CTIS-QA: Clinical Template-Informed Slide-level Question Answering for Pathology : Abstract: In this paper, we introduce a clinical diagnosis template-based pipeline to systematically collect and structure pathological information. In collaboration with pathologists and guided by th...
- MANGO: Natural Multi-speaker 3D Talking Head Generation via 2D-Lifted Enhancement : Abstract: Current audio-driven 3D head generation methods mainly focus on single-speaker scenarios, lacking natural, bidirectional listen-and-speak interaction. Achieving seamless conversational behav...
- Point-SRA: Self-Representation Alignment for 3D Representation Learning : Abstract: Masked autoencoders (MAE) have become a dominant paradigm in 3D representation learning, setting new performance benchmarks across various downstream tasks. Existing methods with fixed mask ...
- FFP-300K: Scaling First-Frame Propagation for Generalizable Video Editing : Abstract: First-Frame Propagation (FFP) offers a promising paradigm for controllable video editing, but existing methods are hampered by a reliance on cumbersome run-time guidance. We identify the roo...
- Real-Time Lane Detection via Efficient Feature Alignment and Covariance Optimization for Low-Power Embedded Systems : Abstract: Real-time lane detection in embedded systems encounters significant challenges due to subtle and sparse visual signals in RGB images, often constrained by limited computational resources and...
- Learnability-Driven Submodular Optimization for Active Roadside 3D Detection : Abstract: Roadside perception datasets are typically constructed via cooperative labeling between synchronized vehicle and roadside frame pairs. However, real deployment often requires annotation of r...
- Mitigating Longitudinal Performance Degradation in Child Face Recognition Using Synthetic Data : Abstract: Longitudinal face recognition in children remains challenging due to rapid and nonlinear facial growth, which causes template drift and increasing verification errors over time. This work in...
- Evaluating Deep Learning-Based Face Recognition for Infants and Toddlers: Impact of Age Across Developmental Stages : Abstract: Face recognition for infants and toddlers presents unique challenges due to rapid facial morphology changes, high inter-class similarity, and limited dataset availability. This study evaluat...
- Trustworthy Data-Driven Wildfire Risk Prediction and Understanding in Western Canada : Abstract: In recent decades, the intensification of wildfire activity in western Canada has resulted in substantial socio-economic and environmental losses. Accurate wildfire risk prediction is hinder...
- LabelAny3D: Label Any Object 3D in the Wild : Abstract: Detecting objects in 3D space from monocular input is crucial for applications ranging from robotics to scene understanding. Despite advanced performance in the indoor and autonomous driving...
- Animated 3DGS Avatars in Diverse Scenes with Consistent Lighting and Shadows : Abstract: We present a method for consistent lighting and shadows when animated 3D Gaussian Splatting (3DGS) avatars interact with 3DGS scenes or with dynamic objects inserted into otherwise static sc...
- An Empirical Study of Monocular Human Body Measurement Under Weak Calibration : Abstract: Estimating human body measurements from monocular RGB imagery remains challenging due to scale ambiguity, viewpoint sensitivity, and the absence of explicit depth information. This work pres...
- CAP-IQA: Context-Aware Prompt-Guided CT Image Quality Assessment : Abstract: Prompt-based methods, which encode medical priors through descriptive text, have been only minimally explored for CT Image Quality Assessment (IQA). While such prompts can embed prior knowle...
- Guiding Token-Sparse Diffusion Models : Abstract: Diffusion models deliver high quality in image synthesis but remain expensive during training and inference. Recent works have leveraged the inherent redundancy in visual content to make tra...
- Beyond Patches: Global-aware Autoregressive Model for Multimodal Few-Shot Font Generation : Abstract: Manual font design is an intricate process that transforms a stylistic visual concept into a coherent glyph set. This challenge persists in automated Few-shot Font Generation (FFG), where mo...
- FAR-AMTN: Attention Multi-Task Network for Face Attribute Recognition : Abstract: To enhance the generalization performance of Multi-Task Networks (MTN) in Face Attribute Recognition (FAR), it is crucial to share relevant information across multiple related prediction tas...
- Improving Flexible Image Tokenizers for Autoregressive Image Generation : Abstract: Flexible image tokenizers aim to represent an image using an ordered 1D variable-length token sequence. This flexible tokenization is typically achieved through nested dropout, where a porti...
- BARE: Towards Bias-Aware and Reasoning-Enhanced One-Tower Visual Grounding : Abstract: Visual Grounding (VG), which aims to locate a specific region referred to by expressions, is a fundamental yet challenging task in the multimodal understanding fields. While recent grounding...
- DiffKD-DCIS: Predicting Upgrade of Ductal Carcinoma In Situ with Diffusion Augmentation and Knowledge Distillation : Abstract: Accurately predicting the upgrade of ductal carcinoma in situ (DCIS) to invasive ductal carcinoma (IDC) is crucial for surgical planning. However, traditional deep learning methods face chal...
- Higher-Order Domain Generalization in Magnetic Resonance-Based Assessment of Alzheimer's Disease : Abstract: Despite progress in deep learning for Alzheimer's disease (AD) diagnostics, models trained on structural magnetic resonance imaging (sMRI) often do not perform well when applied to new cohor...
- Unified Generation and Self-Verification for Vision-Language Models via Advantage Decoupled Preference Optimization : Abstract: Parallel test-time scaling typically trains separate generation and verification models, incurring high training and inference costs. We propose Advantage Decoupled Preference Optimization (...
- Robust Ship Detection and Tracking Using Modified ViBe and Backwash Cancellation Algorithm : Abstract: In this paper, we propose a robust real-time method for detecting and tracking ships in coastal video sequences. Since coastal scenarios are unpredictable and scenes have dynamic...
- Domain Adaptation of Carotid Ultrasound Images using Generative Adversarial Network : Abstract: Deep learning has been extensively used in medical imaging applications, assuming that the test and training datasets belong to the same probability distribution. However, a common challenge...
- Language as Prior, Vision as Calibration: Metric Scale Recovery for Monocular Depth Estimation : Abstract: Relative-depth foundation models transfer well, yet monocular metric depth remains ill-posed due to unidentifiable global scale and heightened domain-shift sensitivity. Under a frozen-backbo...
- PartImageNet++ Dataset: Enhancing Visual Models with High-Quality Part Annotations : Abstract: To address the scarcity of high-quality part annotations in existing datasets, we introduce PartImageNet++ (PIN++), a dataset that provides detailed part annotations for all categories in Im...
- In defense of the two-stage framework for open-set domain adaptive semantic segmentation : Abstract: Open-Set Domain Adaptation for Semantic Segmentation (OSDA-SS) presents a significant challenge, as it requires both domain adaptation for known classes and the distinction of unknowns. Exis...
- EdgeNeRF: Edge-Guided Regularization for Neural Radiance Fields from Sparse Views : Abstract: Neural Radiance Fields (NeRF) achieve remarkable performance in dense multi-view scenarios, but their reconstruction quality degrades significantly under sparse inputs due to geometric artif...
- DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer : Abstract: Video Face Swapping (VFS) requires seamlessly injecting a source identity into a target video while meticulously preserving the original pose, expression, lighting, background, and dynamic i...
- AirSpatialBot: A Spatially-Aware Aerial Agent for Fine-Grained Vehicle Attribute Recognization and Retrieval : Abstract: Despite notable advancements in remote sensing vision-language models (VLMs), existing models often struggle with spatial understanding, limiting their effectiveness in real-world applicatio...
- Mask-Guided Multi-Task Network for Face Attribute Recognition : Abstract: Face Attribute Recognition (FAR) plays a crucial role in applications such as person re-identification, face retrieval, and face editing. Conventional multi-task attribute recognition method...
- Evaluation of Convolutional Neural Network For Image Classification with Agricultural and Urban Datasets : Abstract: This paper presents the development and evaluation of a custom Convolutional Neural Network (CustomCNN) created to study how architectural design choices affect multi-domain image classifica...
- Unsupervised SE(3) Disentanglement for in situ Macromolecular Morphology Identification from Cryo-Electron Tomography : Abstract: Cryo-electron tomography (cryo-ET) provides direct 3D visualization of macromolecules inside the cell, enabling analysis of their in situ morphology. This morphology can be regarded as an SE...
- Garment Inertial Denoiser (GID): Endowing Accurate Motion Capture via Loose IMU Denoiser : Abstract: Wearable inertial motion capture (MoCap) provides a portable, occlusion-free, and privacy-preserving alternative to camera-based systems, but its accuracy depends on tightly attached sensors...
- Advanced Machine Learning Approaches for Enhancing Person Re-Identification Performance : Abstract: Person re-identification (ReID) plays a critical role in intelligent surveillance systems by linking identities across multiple cameras in complex environments. However, ReID faces significa...
- Achieving Fine-grained Cross-modal Understanding through Brain-inspired Hierarchical Representation Learning : Abstract: Understanding neural responses to visual stimuli remains challenging due to the inherent complexity of brain representations and the modality gap between neural data and visual inputs. Exist...
- VReID-XFD: Video-based Person Re-identification at Extreme Far Distance Challenge Results : Abstract: Person re-identification (ReID) across aerial and ground views at extreme far distances introduces a distinct operating regime where severe resolution degradation, extreme viewpoint changes,...
- S2M-Net: Spectral-Spatial Mixing for Medical Image Segmentation with Morphology-Aware Adaptive Loss : Abstract: Medical image segmentation requires balancing local precision for boundary-critical clinical applications, global context for anatomical coherence, and computational efficiency for deploymen...
- RFAssigner: A Generic Label Assignment Strategy for Dense Object Detection : Abstract: Label assignment is a critical component in training dense object detectors. State-of-the-art methods typically assign each training sample a positive and a negative weight, optimizing the a...
- HyDRA: Hybrid Denoising Regularization for Measurement-Only DEQ Training : Abstract: Solving image reconstruction problems of the form \(\mathbf{A} \mathbf{x} = \mathbf{y}\) remains challenging due to ill-posedness and the lack of large-scale supervised datasets. Deep Equili...
- UniSH: Unifying Scene and Human Reconstruction in a Feed-Forward Pass : Abstract: We present UniSH, a unified, feed-forward framework for joint metric-scale 3D scene and human reconstruction. A key challenge in this domain is the scarcity of large-scale, annotated real-wo...
- Real-Time LiDAR Point Cloud Densification for Low-Latency Spatial Data Transmission : Abstract: To realize a low-latency spatial transmission system for immersive telepresence, there are two major problems: capturing dynamic 3D scenes densely and processing them in real time. LiDAR sensor...
- XStreamVGGT: Extremely Memory-Efficient Streaming Vision Geometry Grounded Transformer with KV Cache Compression : Abstract: Learning-based 3D visual geometry models have benefited substantially from large-scale transformers. Among these, StreamVGGT leverages frame-wise causal attention for strong streaming recons...
- MS-ISSM: Objective Quality Assessment of Point Clouds Using Multi-scale Implicit Structural Similarity : Abstract: The unstructured and irregular nature of point clouds poses a significant challenge for objective quality assessment (PCQA), particularly in establishing accurate perceptual feature correspo...
- Crowded Video Individual Counting Informed by Social Grouping and Spatial-Temporal Displacement Priors : Abstract: Video Individual Counting (VIC) is a recently introduced task aiming to estimate pedestrian flux from a video. It extends Video Crowd Counting (VCC) beyond the per-frame pedestrian count. In...
- GenCAMO: Scene-Graph Contextual Decoupling for Environment-aware and Mask-free Camouflage Image-Dense Annotation Generation : Abstract: Conceal dense prediction (CDP), especially RGB-D camouflage object detection and open-vocabulary camouflage object segmentation, plays a crucial role in advancing the understanding and reaso...
- CardioMOD-Net: A Modal Decomposition-Neural Network Framework for Diagnosis and Prognosis of HFpEF from Echocardiography Cine Loops : Abstract: Introduction: Heart failure with preserved ejection fraction (HFpEF) arises from diverse comorbidities and progresses through prolonged subclinical stages, making early diagnosis and prognos...
- Cross-Layer Attentive Feature Upsampling for Low-latency Semantic Segmentation : Abstract: Semantic segmentation is a fundamental problem in computer vision and it requires high-resolution feature maps for dense prediction. Current coordinate-guided low-resolution feature interpol...
- Histogram Assisted Quality Aware Generative Model for Resolution Invariant NIR Image Colorization : Abstract: We present HAQAGen, a unified generative model for resolution-invariant NIR-to-RGB colorization that balances chromatic realism with structural fidelity. The proposed model introduces (i) a ...
- A UAV-Based Multispectral and RGB Dataset for Multi-Stage Paddy Crop Monitoring in Indian Agricultural Fields : Abstract: We present a large-scale unmanned aerial vehicle (UAV)-based RGB and multispectral image dataset collected over paddy fields in the Vijayawada region, Andhra Pradesh, India, covering nursery...
- Efficient Hyperspectral Image Reconstruction Using Lightweight Separate Spectral Transformers : Abstract: Hyperspectral imaging (HSI) is essential across various disciplines for its capacity to capture rich spectral information. However, efficiently reconstructing hyperspectral images from compr...
- Deepfake Detection with Multi-Artifact Subspace Fine-Tuning and Selective Layer Masking : Abstract: Deepfake detection still faces significant challenges in cross-dataset and real-world complex scenarios. The root cause lies in the high diversity of artifact distributions introduced by dif...
- Mono3DV: Monocular 3D Object Detection with 3D-Aware Bipartite Matching and Variational Query DeNoising : Abstract: While DETR-like architectures have demonstrated significant potential for monocular 3D object detection, they are often hindered by a critical limitation: the exclusion of 3D attributes from...
- Lightweight Channel Attention for Efficient CNNs : Abstract: Attention mechanisms have become integral to modern convolutional neural networks (CNNs), delivering notable performance improvements with minimal computational overhead. However, the effici...
- DVGBench: Implicit-to-Explicit Visual Grounding Benchmark in UAV Imagery with Large Vision-Language Models : Abstract: Remote sensing (RS) large vision-language models (LVLMs) have shown strong promise across visual grounding (VG) tasks. However, existing RS VG datasets predominantly rely on explicit referri...
- UnrealPose: Leveraging Game Engine Kinematics for Large-Scale Synthetic Human Pose Data : Abstract: Diverse, accurately labeled 3D human pose data is expensive and studio-bound, while in-the-wild datasets lack known ground truth. We introduce UnrealPose-Gen, an Unreal Engine 5 pipeline bui...
- Few-Shot Video Object Segmentation in X-Ray Angiography Using Local Matching and Spatio-Temporal Consistency Loss : Abstract: We introduce a novel FSVOS model that employs a local matching strategy to restrict the search space to the most relevant neighboring pixels. Rather than relying on inefficient standard im2c...
- A Deep Learning Approach for Automated Skin Lesion Diagnosis with Explainable AI : Abstract: Skin cancer is one of the most common and dangerous types of cancer in the world and requires timely and precise diagnosis. In this paper, a deep-learning architecture of the multi-cla...
- PhyEduVideo: A Benchmark for Evaluating Text-to-Video Models for Physics Education : Abstract: Generative AI models, particularly Text-to-Video (T2V) systems, offer a promising avenue for transforming science education by automating the creation of engaging and intuitive visual explan...
- Learning to Segment Liquids in Real-world Images : Abstract: Different types of liquids such as water, wine and medicine appear in all aspects of daily life. However, limited attention has been given to the task, hindering the ability of robots to avo...
- ShadowGS: Shadow-Aware 3D Gaussian Splatting for Satellite Imagery : Abstract: 3D Gaussian Splatting (3DGS) has emerged as a novel paradigm for 3D reconstruction from satellite imagery. However, in multi-temporal satellite images, prevalent shadows exhibit significant ...
- Four-Stage Alzheimer's Disease Classification from MRI Using Topological Feature Extraction, Feature Selection, and Ensemble Learning : Abstract: Accurate and efficient classification of Alzheimer's disease (AD) severity from brain magnetic resonance imaging (MRI) remains a critical challenge, particularly when limited data and model ...
- Comparative Evaluation of CNN Architectures for Neural Style Transfer in Indonesian Batik Motif Generation: A Comprehensive Study : Abstract: Neural Style Transfer (NST) provides a computational framework for the digital preservation and generative exploration of Indonesian batik motifs; however, existing approaches remain largely...
- VideoCuRL: Video Curriculum Reinforcement Learning with Orthogonal Difficulty Decomposition : Abstract: Reinforcement Learning (RL) is crucial for empowering VideoLLMs with complex spatiotemporal reasoning. However, current RL paradigms predominantly rely on random data shuffling or naive curr...
- VL-OrdinalFormer: Vision Language Guided Ordinal Transformers for Interpretable Knee Osteoarthritis Grading : Abstract: Knee osteoarthritis (KOA) is a leading cause of disability worldwide, and accurate severity assessment using the Kellgren Lawrence (KL) grading system is critical for clinical decision makin...
- Motion-Compensated Latent Semantic Canvases for Visual Situational Awareness on Edge : Abstract: We propose Motion-Compensated Latent Semantic Canvases (MCLSC) for visual situational awareness on resource-constrained edge devices. The core idea is to maintain persistent semantic metadat...
- Unified Review and Benchmark of Deep Segmentation Architectures for Cardiac Ultrasound on CAMUS : Abstract: While several review papers summarize cardiac imaging and DL advances, few works connect this overview to a unified and reproducible experimental benchmark. In this study, we combine a focused rev...
- Can Generative Models Actually Forge Realistic Identity Documents? : Abstract: Generative image models have recently shown significant progress in image realism, leading to public concerns about their potential misuse for document forgery. This paper explores whether c...
- From Bench to Bedside: A Review of Clinical Trials in Drug Discovery and Development : Abstract: Clinical trials are an indispensable part of the drug development process, bridging the gap between basic research and clinical application. During the development of new drugs, clinical tri...
- RankMamba: Benchmarking Mamba's Document Ranking Performance in the Era of Transformers : Abstract: Transformer structure has achieved great success in multiple applied machine learning communities, such as natural language processing (NLP), computer vision (CV) and information retrieval (...
- RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs in Medicine : Abstract: Answering complex real-world questions in the medical domain often requires accurate retrieval from medical Textual Knowledge Graphs (medical TKGs), as the relational path information from T...
- Context-aware Decoding Reduces Hallucination in Query-focused Summarization : Abstract: Query-focused summarization (QFS) aims to provide a summary of a single document/multi documents that can satisfy the information needs of a given query. It is useful for various real-world ...
- LACONIC: Dense-Level Effectiveness for Scalable Sparse Retrieval via a Two-Phase Training Curriculum : Abstract: While dense retrieval models have become the standard for state-of-the-art information retrieval, their deployment is often constrained by high memory requirements and reliance on GPU accele...
- The Gray Area: Characterizing Moderator Disagreement on Reddit : Abstract: Volunteer moderators play a crucial role in sustaining online dialogue, but they often disagree about what should or should not be allowed. In this paper, we study the complexity of content ...
- SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving : Abstract: We present SWE-Lego, a supervised fine-tuning (SFT) recipe designed to achieve state-of-the-art performance in software engineering (SWE) issue resolving. In contrast to prevalent methods tha...
- SAFE-QAQ: End-to-End Slow-Thinking Audio-Text Fraud Detection via Reinforcement Learning : Abstract: Existing fraud detection methods predominantly rely on transcribed text, suffering from ASR errors and missing crucial acoustic cues like vocal tone and environmental context. This limits th...
- Entity-Aware and Secure Query Optimization in Database Using Named Entity Recognition : Abstract: Cloud storage has become the backbone of modern data infrastructure, yet privacy and efficient data retrieval remain significant challenges. Traditional privacy-preserving approaches primari...
- 600K-KS-OCR: A Large-Scale Synthetic Dataset for Optical Character Recognition in Kashmiri Script : Abstract: This technical report presents the 600K-KS-OCR Dataset, a large-scale synthetic corpus comprising approximately 602,000 word-level segmented images designed for training and evaluating optic...
- Robust Persona-Aware Toxicity Detection with Prompt Optimization and Learned Ensembling : Abstract: Toxicity detection is inherently subjective, shaped by the diverse perspectives and social priors of different demographic groups. While "pluralistic" modeling as used in economics and the...
- Estimating Text Temperature : Abstract: Autoregressive language models typically use a temperature parameter at inference to shape the probability distribution and control the randomness of the text generated. After the text was gen...
- Classifying several dialectal Nawatl varieties : Abstract: Mexico is a country with a large number of indigenous languages, among which the most widely spoken is Nawatl, with more than two million people currently speaking it (mainly in North and Ce...
- Power-of-Two Quantization-Aware-Training (PoT-QAT) in Large Language Models (LLMs) : Abstract: In Large Language Models (LLMs), the number of parameters has grown exponentially in the past few years, e.g., from 1.5 billion parameters in GPT-2 to 175 billion in GPT-3 to possibly more t...
- CD4LM: Consistency Distillation and aDaptive Decoding for Diffusion Language Models : Abstract: Autoregressive large language models achieve strong results on many benchmarks, but decoding remains fundamentally latency-limited by sequential dependence on previously generated tokens. Di...
- From XAI to Stories: A Factorial Study of LLM-Generated Explanation Quality : Abstract: Explainable AI (XAI) methods like SHAP and LIME produce numerical feature attributions that remain inaccessible to non expert users. Prior work has shown that Large Language Models (LLMs) ca...
- ARCADE: A City-Scale Corpus for Fine-Grained Arabic Dialect Tagging : Abstract: The Arabic language is characterized by a rich tapestry of regional dialects that differ substantially in phonetics and lexicon, reflecting the geographic and cultural diversity of its speak...
- Toward Global Large Language Models in Medicine : Abstract: Despite continuous advances in medical technology, the global distribution of health care resources remains uneven. The development of large language models (LLMs) has transformed the landsc...
- Confidence Estimation for LLMs in Multi-turn Interactions : Abstract: While confidence estimation is a promising direction for mitigating hallucinations in Large Language Models (LLMs), current research dominantly focuses on single-turn settings. The dynamics ...
- Towards Multi-Level Transcript Segmentation: LoRA Fine-Tuning for Table-of-Contents Generation : Abstract: Segmenting speech transcripts into thematic sections benefits both downstream processing and users who depend on written text for accessibility. We introduce a novel approach to hierarchical...
- Hidden State Poisoning Attacks against Mamba-based Language Models : Abstract: State space models (SSMs) like Mamba offer efficient alternatives to Transformer-based language models, with linear time complexity. Yet, their adversarial robustness remains critically unex...
- CSF: Contrastive Semantic Features for Direct Multilingual Sign Language Generation : Abstract: Sign language translation systems typically require English as an intermediary language, creating barriers for non-English speakers in the global deaf community. We present Canonical Semanti...
- Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents : Abstract: Large language model (LLM) agents face fundamental limitations in long-horizon reasoning due to finite context windows, making effective memory management critical. Existing methods typicall...
- DermoGPT: Open Weights and Open Data for Morphology-Grounded Dermatological Reasoning MLLMs : Abstract: Multimodal Large Language Models (MLLMs) show promise for medical applications, yet progress in dermatology lags due to limited training data, narrow task coverage, and lack of clinically-gr...
- Judging with Personality and Confidence: A Study on Personality-Conditioned LLM Relevance Assessment : Abstract: Recent studies have shown that prompting can enable large language models (LLMs) to simulate specific personality traits and produce behaviors that align with those traits. However, there is...
- Towards Automated Lexicography: Generating and Evaluating Definitions for Learner's Dictionaries : Abstract: We study dictionary definition generation (DDG), i.e., the generation of non-contextualized definitions for given headwords. Dictionary definitions are an essential resource for learning wor...
- CSCBench: A PVC Diagnostic Benchmark for Commodity Supply Chain Reasoning : Abstract: Large Language Models (LLMs) have achieved remarkable success in general benchmarks, yet their competence in commodity supply chains (CSCs) -- a domain governed by institutional rule systems...
- BanglaIPA: Towards Robust Text-to-IPA Transcription with Contextual Rewriting in Bengali : Abstract: Despite its widespread use, Bengali lacks a robust automated International Phonetic Alphabet (IPA) transcription system that effectively supports both standard language and regional dialecta...
- Can LLMs Track Their Output Length? A Dynamic Feedback Mechanism for Precise Length Regulation : Abstract: Precisely controlling the length of generated text is a common requirement in real-world applications. However, despite significant advancements in following human instructions, Large Langua...
- A Training-Free Large Reasoning Model-based Knowledge Tracing Framework for Unified Prediction and Prescription : Abstract: Knowledge Tracing (KT) aims to estimate a learner's evolving mastery based on interaction histories. Recent studies have explored Large Language Models (LLMs) for KT via autoregressive natur...
- How Does Prefix Matter in Reasoning Model Tuning? : Abstract: Recent alignment studies commonly remove introductory boilerplate phrases from supervised fine-tuning (SFT) datasets. This work challenges that assumption. We hypothesize that safety- and re...
- Steerability of Instrumental-Convergence Tendencies in LLMs : Abstract: We examine two properties of AI systems: capability (what a system can do) and steerability (how reliably one can shift behavior toward intended outcomes). In our experiments, higher capabil...
- HalluZig: Hallucination Detection using Zigzag Persistence : Abstract: The factual reliability of Large Language Models (LLMs) remains a critical barrier to their adoption in high-stakes domains due to their propensity to hallucinate. Current detection methods ...
- EmoHarbor: Evaluating Personalized Emotional Support by Simulating the User's Internal World : Abstract: Current evaluation paradigms for emotional support conversations tend to reward generic empathetic responses, yet they fail to assess whether the support is genuinely personalized to users' ...
- From Failure to Mastery: Generating Hard Samples for Tool-use Agents : Abstract: The advancement of LLM agents with tool-use capabilities requires diverse and complex training corpora. Existing data generation methods, which predominantly follow a paradigm of random samp...
- Can Legislation Be Made Machine-Readable in PROLEG? : Abstract: The anticipated positive social impact of regulatory processes requires both the accuracy and efficiency of their application. Modern artificial intelligence technologies, including natural ...
- Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR : Abstract: The INTERSPEECH 2025 Challenge on Multilingual Conversational Speech Language Models (MLC-SLM) promotes multilingual conversational ASR with large language models (LLMs). Our previous SHNU-m...
- From Emotion Classification to Emotional Reasoning: Enhancing Emotional Intelligence in Large Language Models : Abstract: This work investigates whether synthetic emotional chain-of-thought data can improve the emotional reasoning abilities of smaller open large language models (LLMs). We design a multi-agent g...
- EternalMath: A Living Benchmark of Frontier Mathematics that Evolves with Human Discovery : Abstract: Current evaluations of mathematical reasoning in large language models (LLMs) are dominated by static benchmarks, either derived from competition-style problems or curated through costly exp...
- FC-CONAN: An Exhaustively Paired Dataset for Robust Evaluation of Retrieval Systems : Abstract: Hate speech (HS) is a critical issue in online discourse, and one promising strategy to counter it is through the use of counter-narratives (CNs). Datasets linking HS with CNs are essential ...
- Reasoning Over Recall: Evaluating the Efficacy of Generalist Architectures vs. Specialized Fine-Tunes in RAG-Based Mental Health Dialogue Systems : Abstract: The deployment of Large Language Models (LLMs) in mental health counseling faces the dual challenges of hallucinations and lack of empathy. While the former may be mitigated by RAG (retrieva...
- Racka: Efficient Hungarian LLM Adaptation on Academic Infrastructure : Abstract: We present Racka, a lightweight, continually pretrained large language model designed to bridge the resource gap between Hungarian and high-resource languages such as English and German. Rac...
- Almost Clinical: Linguistic properties of synthetic electronic health records : Abstract: This study evaluates the linguistic and clinical suitability of synthetic electronic health records (EHRs) in the field of mental health. First, we describe the rationale and the methodology...
- DHI: Leveraging Diverse Hallucination Induction for Enhanced Contrastive Factuality Control in Large Language Models : Abstract: Large language models (LLMs) frequently produce inaccurate or fabricated information, known as "hallucinations," which compromises their reliability. Existing approaches often train an "Evil...
- SongSage: A Large Musical Language Model with Lyric Generative Pre-training : Abstract: Large language models have achieved significant success in various domains, yet their understanding of lyric-centric knowledge has not been fully explored. In this work, we first introduce P...
- KOS-TL (Knowledge Operation System Type Logic) : Abstract: This paper introduces KOS-TL (Knowledge Operation System Type Logic), a novel constructive framework designed to provide a rigorous logical foundation for autonomous and executable knowledge...
- RoboPhD: Self-Improving Text-to-SQL Through Autonomous Agent Evolution : Abstract: We present RoboPhD, a system where AI agents autonomously conduct research to improve Text-to-SQL performance. RoboPhD implements a closed-loop evolution cycle with two coordinated component...
- Listen, Attend, Understand: a Regularization Technique for Stable E2E Speech Translation Training on High Variance labels : Abstract: End-to-End Speech Translation often shows slower convergence and worse performance when target transcriptions exhibit high variance and semantic ambiguity. We propose Listen, Attend, Underst...
- EmoLoom-2B: Fast Base-Model Screening for Emotion Classification and VAD with Lexicon-Weak Supervision and KV-Off Evaluation : Abstract: We introduce EmoLoom-2B, a lightweight and reproducible pipeline that turns small language models under 2B parameters into fast screening candidates for joint emotion classification and Vale...
- Unsupervised Text Style Transfer for Controllable Intensity : Abstract: Unsupervised Text Style Transfer (UTST) aims to build a system to transfer the stylistic properties of a given text without parallel text pairs. Compared with text transfer between style pol...
- KV-Embedding: Training-free Text Embedding via Internal KV Re-routing in Decoder-only LLMs : Abstract: While LLMs are powerful embedding backbones, their application in training-free settings faces two structural challenges: causal attention restricts early tokens from accessing subsequent co...
- HyperJoin: LLM-augmented Hypergraph Link Prediction for Joinable Table Discovery : Abstract: As a pivotal task in data lake management, joinable table discovery has attracted widespread interest. While existing language model-based methods achieve remarkable performance by combining...
- Rate-Distortion Analysis of Compressed Query Delegation with Low-Rank Riemannian Updates : Abstract: Bounded-context agents fail when intermediate reasoning exceeds an effective working-memory budget. We study compressed query delegation (CQD): (i) compress a high-dimensional latent reasoni...
- Causal Multi-fidelity Surrogate Forward and Inverse Models for ICF Implosions : Abstract: Continued progress in inertial confinement fusion (ICF) requires solving inverse problems relating experimental observations to simulation input parameters, followed by design optimization. ...
- Bayesian uncertainty-aware deep learning with noisy labels: Tackling annotation ambiguity in EEG seizure detection : Abstract: Deep learning is advancing EEG processing for automated epileptic seizure detection and onset zone localization, yet its performance relies heavily on high-quality annotated training data. H...
- Consistency for Large Neural Networks: Regression and Classification : Abstract: Although overparameterized models have achieved remarkable practical success, their theoretical properties, particularly their generalization behavior, remain incompletely understood. The we...
- Design and Scheduling of an AI-based Queueing System : Abstract: To leverage prediction models to make optimal scheduling decisions in service systems, we must understand how predictive errors impact congestion due to externalities on the delay of other j...
- On the social bias of speech self-supervised models : Abstract: Self-supervised learning (SSL) speech models have achieved remarkable performance in various tasks, yet the biased outcomes, especially affecting marginalized groups, raise significant conce...
- Matrix Manifold Neural Networks++ : Abstract: Deep neural networks (DNNs) on Riemannian manifolds have garnered increasing interest in various applied areas. For instance, DNNs on spherical and hyperbolic manifolds have been designed to...
- Development of a high-resolution indoor radon map using a new machine learning-based probabilistic model and German radon survey data : Abstract: Accurate knowledge of indoor radon concentration is crucial for assessing radon-related health effects or identifying radon-prone areas. Indoor radon concentration at the national scale is u...
- ETDock: A Novel Equivariant Transformer for Protein-Ligand Docking : Abstract: Predicting the docking between proteins and ligands is a crucial and challenging task for drug discovery. However, traditional docking methods mainly rely on scoring functions, and deep lear...
- Effects of algorithmic flagging on fairness: quasi-experimental evidence from Wikipedia : Abstract: Online community moderators often rely on social signals such as whether or not a user has an account or a profile page as clues that users may cause problems. Reliance on these clues can le...
- Dynamic Graph Neural Networks for Physiological Based Pharmacokinetic Modeling: A Novel Data Driven Approach to Drug Concentration Prediction : Abstract: Physiologically Based Pharmacokinetic (PBPK) modeling is a key tool in drug development for predicting drug concentration dynamics across organs. Traditional PBPK approaches rely on ordinary...
- ManiBox: Enhancing Embodied Spatial Generalization via Scalable Simulation Data Generations : Abstract: Embodied agents require robust spatial intelligence to execute precise real-world manipulations. However, this remains a significant challenge, as current methods often struggle to accuratel...
- Echo State Networks for Spatio-Temporal Area-Level Data : Abstract: Spatio-temporal area-level datasets play a critical role in official statistics, providing valuable insights for policy-making and regional planning. Accurate modeling and forecasting of the...
- Stochastic Online Optimization for Cyber-Physical and Robotic Systems : Abstract: We propose a novel gradient-based online optimization framework for solving stochastic programming problems that frequently arise in the context of cyber-physical and robotic systems. Our pr...
- Sample Path Regularity of Gaussian Processes from the Covariance Kernel : Abstract: Gaussian processes (GPs) are the most common formalism for defining probability distributions over spaces of functions. While applications of GPs are myriad, a comprehensive understanding of...
- Meta-Learning Guided Pruning for Few-Shot Plant Pathology on Edge Devices : Abstract: Farmers in remote areas need quick and reliable methods for identifying plant diseases, yet they often lack access to laboratories or high-performance computing resources. Deep learning mode...
- Hunting for "Oddballs" with Machine Learning: Detecting Anomalous Exoplanets Using a Deep-Learned Low-Dimensional Representation of Transit Spectra with Autoencoders : Abstract: This study explores the application of autoencoder-based machine learning techniques for anomaly detection to identify exoplanet atmospheres with unconventional chemical signatures using a l...
- Environment-Adaptive Covariate Selection: Learning When to Use Spurious Correlations for Out-of-Distribution Prediction : Abstract: Out-of-distribution (OOD) prediction is often approached by restricting models to causal or invariant covariates, avoiding non-causal spurious associations that may be unstable across enviro...
- Predicting Early and Complete Drug Release from Long-Acting Injectables Using Explainable Machine Learning : Abstract: Polymer-based long-acting injectables (LAIs) have transformed the treatment of chronic diseases by enabling controlled drug delivery, thus reducing dosing frequency and extending therapeutic...
- Improved Accuracy for Private Continual Cardinality Estimation in Fully Dynamic Streams via Matrix Factorization : Abstract: We study differentially-private statistics in the fully dynamic continual observation model, where many updates can arrive at each time step and updates to a stream can involve both insertio...
- VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation : Abstract: Visual generation is dominated by three paradigms: AutoRegressive (AR), diffusion, and Visual AutoRegressive (VAR) models. Unlike AR and diffusion, VARs operate on heterogeneous input struct...
- From Mice to Trains: Amortized Bayesian Inference on Graph Data : Abstract: Graphs arise across diverse domains, from biology and chemistry to social and information networks, as well as in transportation and logistics. Inference on graph-structured data requires me...
- Mind the Gap: Continuous Magnification Sampling for Pathology Foundation Models : Abstract: In histopathology, pathologists examine both tissue architecture at low magnification and fine-grained morphology at high magnification. Yet, the performance of pathology foundation models a...
- QuIC: A Quantum-Inspired Interaction Classifier for Revitalizing Shallow CNNs in Fine-Grained Recognition : Abstract: Deploying deep learning models for Fine-Grained Visual Classification (FGVC) on resource-constrained edge devices remains a significant challenge. While deep architectures achieve high accur...
- Feature-based Inversion of 2.5D Controlled Source Electromagnetic Data using Generative Priors : Abstract: In this study, we investigate feature-based 2.5D controlled source marine electromagnetic (mCSEM) data inversion using generative priors. Two-and-half dimensional modeling using finite diffe...
- Car Drag Coefficient Prediction from 3D Point Clouds Using a Slice-Based Surrogate Model : Abstract: The automotive industry's pursuit of enhanced fuel economy and performance necessitates efficient aerodynamic design. However, traditional evaluation methods such as computational fluid dyna...
- MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics : Abstract: Molecular dynamics (MD) simulations are essential for understanding atomic-scale behaviors in materials science, yet writing LAMMPS scripts remains a highly specialized and time-consuming task...
- A Multilayered Approach to Classifying Customer Responsiveness and Credit Risk : Abstract: This study evaluates the performance of various classifiers in three distinct models: response, risk, and response-risk, concerning credit card mail campaigns and default prediction. In the ...
- Forget Less by Learning Together through Concept Consolidation : Abstract: Custom Diffusion Models (CDMs) have gained significant attention due to their remarkable ability to personalize generative processes. However, existing CDMs suffer from catastrophic forgetti...
- Efficient temporal prediction of compressible flows in irregular domains using Fourier neural operators : Abstract: This paper investigates the temporal evolution of high-speed compressible fluids in irregular flow fields using the Fourier Neural Operator (FNO). We reconstruct the irregular flow field poi...
- Forget Less by Learning from Parents Through Hierarchical Relationships : Abstract: Custom Diffusion Models (CDMs) offer impressive capabilities for personalization in generative modeling, yet they remain vulnerable to catastrophic forgetting when learning new concepts sequ...
- SafeLoad: Efficient Admission Control Framework for Identifying Memory-Overloading Queries in Cloud Data Warehouses : Abstract: Memory overload is a common form of resource exhaustion in cloud data warehouses. When database queries fail due to memory overload, it not only wastes critical resources such as CPU time bu...
- Random-Matrix-Induced Simplicity Bias in Over-parameterized Variational Quantum Circuits : Abstract: Over-parameterization is commonly used to increase the expressivity of variational quantum circuits (VQCs), yet deeper and more highly parameterized circuits often exhibit poor trainability ...
- Aspect Extraction from E-Commerce Product and Service Reviews : Abstract: Aspect Extraction (AE) is a key task in Aspect-Based Sentiment Analysis (ABSA), yet it remains difficult to apply in low-resource and code-switched contexts like Taglish, a mix of Tagalog an...
- SRAS: A Lightweight Reinforcement Learning-based Document Selector for Edge-Native RAG Pipelines : Abstract: Retrieval-Augmented Generation (RAG) systems often rely on fixed top-k document selection mechanisms that ignore downstream generation quality and impose computational overheads. We propose ...
- Machine learning modularity : Abstract: Based on a transformer-based sequence-to-sequence architecture combined with a dynamic batching algorithm, this work introduces a machine learning framework for automatically simplifying com...
- Sparse Convex Biclustering : Abstract: Biclustering is an essential unsupervised machine learning technique for simultaneously clustering rows and columns of a data matrix, with widespread applications in genomics, transcriptomic...
- Latent Space Element Method : Abstract: How can we build surrogate solvers that train on small domains but scale to larger ones without intrusive access to PDE operators? Inspired by the Data-Driven Finite Element Method (DD-FEM) ...
- Reinforcement Learning for Option Hedging: Static Implied-Volatility Fit versus Shortfall-Aware Performance : Abstract: We extend the Q-learner in Black-Scholes (QLBS) framework by incorporating risk aversion and trading costs, and propose a novel Replication Learning of Option Pricing (RLOP) approach. Both m...
- Hidden costs for inference with deep network on embedded system devices : Abstract: This study evaluates the inference performance of various deep learning models under an embedded system environment. In previous works, Multiply-Accumulate operation is typically used to mea...
- Simplex Deep Linear Discriminant Analysis : Abstract: We revisit Deep Linear Discriminant Analysis (Deep LDA) from a likelihood-based perspective. While classical LDA is a simple Gaussian model with linear decision boundaries, attaching an LDA ...
- Deep Linear Discriminant Analysis Revisited : Abstract: We show that for unconstrained Deep Linear Discriminant Analysis (LDA) classifiers, maximum-likelihood training admits pathological solutions in which class means drift together, covariances...
- Variance-Reduced Diffusion Sampling via Conditional Score Expectation Identity : Abstract: We introduce and prove a Conditional Score Expectation (CSE) identity: an exact relation for the marginal score of affine diffusion processes that links scores across time via a con...
- Identifying recurrent flows in high-dimensional dissipative chaos from low-dimensional embeddings : Abstract: Unstable periodic orbits (UPOs) are the non-chaotic, dynamical building blocks of spatio-temporal chaos, motivating a first-principles based theory for turbulence ever since the discovery of...
- Learning Relationship between Quantum Walks and Underdamped Langevin Dynamics : Abstract: Fast computational algorithms are in constant demand, and their development has been driven by advances such as quantum speedup and classical acceleration. This paper intends to study search...
- A Novel Deep Learning Method for Segmenting the Left Ventricle in Cardiac Cine MRI : Abstract: This research aims to develop a novel deep learning network, GBU-Net, utilizing a group-batch-normalized U-Net framework, specifically designed for the precise semantic segmentation of the l...
- Four Quadrants of Difficulty: A Simple Categorisation and its Limits : Abstract: Curriculum Learning (CL) aims to improve the outcome of model training by estimating the difficulty of samples and scheduling them accordingly. In NLP, difficulty is commonly approximated us...
- Modeling Information Blackouts in Missing Not-At-Random Time Series Data : Abstract: Large-scale traffic forecasting relies on fixed sensor networks that often exhibit blackouts: contiguous intervals of missing measurements caused by detector or communication failures. These...
- Segmentation and Processing of German Court Decisions from Open Legal Data : Abstract: The availability of structured legal data is important for advancing Natural Language Processing (NLP) techniques for the German legal system. One of the most widely used datasets, Open Lega...
- iFlip: Iterative Feedback-driven Counterfactual Example Refinement : Abstract: Counterfactual examples are minimal edits to an input that alter a model's prediction. They are widely employed in explainable AI to probe model behavior and in natural language processing (...
- Fast Gibbs Sampling on Bayesian Hidden Markov Model with Missing Observations : Abstract: The Hidden Markov Model (HMM) is a widely-used statistical model for handling sequential data. However, the presence of missing observations in real-world datasets often complicates the appl...
- Efficient Cover Construction for Ball Mapper via Accelerated Range Queries : Abstract: Ball Mapper is a widely used tool in topological data analysis for summarizing the structure of high-dimensional data through metric-based coverings and graph representations. A central com...
- LANCET: Neural Intervention via Structural Entropy for Mitigating Faithfulness Hallucinations in LLMs : Abstract: Large Language Models have revolutionized information processing, yet their reliability is severely compromised by faithfulness hallucinations. While current approaches attempt to mitigate t...
- Bayesian Negative Binomial Regression of Afrobeats Chart Persistence : Abstract: Afrobeats songs compete for attention on streaming platforms, where chart visibility can influence both revenue and cultural impact. This paper examines whether collaborations help songs rem...
- SGD with Dependent Data: Optimal Estimation, Regret, and Inference : Abstract: This work investigates the performance of the final iterate produced by stochastic gradient descent (SGD) under temporally dependent data. We consider two complementary sources of dependence...
- Investigating the Multilingual Calibration Effects of Language Model Instruction-Tuning : Abstract: Ensuring that deep learning models are well-calibrated in terms of their predictive uncertainty is essential in maintaining their trustworthiness and reliability, yet despite increasing adva...
- A New Framework for Explainable Rare Cell Identification in Single-Cell Transcriptomics Data : Abstract: The detection of rare cell types in single-cell transcriptomics data is crucial for elucidating disease pathogenesis and tissue development dynamics. However, a critical gap that persists in...
- FLOP-Efficient Training: Early Stopping Based on Test-Time Compute Awareness : Abstract: Scaling training compute, measured in FLOPs, has long been shown to improve the accuracy of large language models, yet training remains resource-intensive. Prior work shows that increasing t...
- AppellateGen: A Benchmark for Appellate Legal Judgment Generation : Abstract: Legal judgment generation is a critical task in legal intelligence. However, existing research in legal judgment generation has predominantly focused on first-instance trials, relying on sta...
- Concave Certificates: Geometric Framework for Distributionally Robust Risk and Complexity Analysis : Abstract: Distributionally Robust (DR) optimization aims to certify worst-case risk within a Wasserstein uncertainty set. Current certifications typically rely either on global Lipschitz bounds, which...
- Making MoE based LLM inference resilient with Tarragon : Abstract: Mixture-of-Experts (MoE) models are increasingly used to serve LLMs at scale, but failures become common as deployment scale grows. Existing systems exhibit poor failure resilience: even a s...
- Stochastic Control Methods for Optimization : Abstract: In this work, we investigate a stochastic control framework for global optimization over both finite-dimensional Euclidean spaces and the Wasserstein space of probability measures. In the Eu...
- Evidence Slopes and Effective Dimension in Singular Linear Models : Abstract: Bayesian model selection commonly relies on Laplace approximation or the Bayesian Information Criterion (BIC), which assume that the effective model dimension equals the number of parameters...
- NeuroSSM: Multiscale Differential State-Space Modeling for Context-Aware fMRI Analysis : Abstract: Accurate fMRI analysis requires sensitivity to temporal structure across multiple scales, as BOLD signals encode cognitive processes that emerge from fast transient dynamics to slower, large...
- Promptable Foundation Models for SAR Remote Sensing: Adapting the Segment Anything Model for Snow Avalanche Segmentation : Abstract: Remote sensing solutions for avalanche segmentation and mapping are key to supporting risk forecasting and mitigation in mountain regions. Synthetic Aperture Radar (SAR) imagery from Sentine...
- Gradient-Free Approaches is a Key to an Efficient Interaction with Markovian Stochasticity : Abstract: This paper deals with stochastic optimization problems involving Markovian noise with a zero-order oracle. We present and analyze a novel derivative-free method for solving such problems in ...
- Conformal Blindness: A Note on $A$-Cryptic change-points : Abstract: Conformal Test Martingales (CTMs) are a standard method within the Conformal Prediction framework for testing the crucial assumption of data exchangeability by monitoring deviations from uni...
- Neural Networks on Symmetric Spaces of Noncompact Type : Abstract: Recent works have demonstrated promising performances of neural networks on hyperbolic spaces and symmetric positive definite (SPD) manifolds. These spaces belong to a family of Riemannian m...
- NarrativeTrack: Evaluating Video Language Models Beyond the Frame : Abstract: Multimodal large language models (MLLMs) have achieved impressive progress in vision-language reasoning, yet their ability to understand temporally unfolding narratives in videos remains und...
- Fibonacci-Driven Recursive Ensembles: Algorithms, Convergence, and Learning Dynamics : Abstract: This paper develops the algorithmic and dynamical foundations of recursive ensemble learning driven by Fibonacci-type update flows. In contrast with classical boosting Freund and Schapire (...
- Byzantine-Robust Federated Learning Framework with Post-Quantum Secure Aggregation for Real-Time Threat Intelligence Sharing in Critical IoT Infrastructure : Abstract: The proliferation of Internet of Things devices in critical infrastructure has created unprecedented cybersecurity challenges, necessitating collaborative threat detection mechanisms that pr...
- Evaluating transfer learning strategies for improving dairy cattle body weight prediction in small farms using depth-image and point-cloud data : Abstract: Computer vision provides automated, non-invasive, and scalable tools for monitoring dairy cattle, thereby supporting management, health assessment, and phenotypic data collection. Although t...
- Dynamic Accuracy Estimation in a Wi-Fi-based Positioning System : Abstract: The paper presents a concept of a dynamic accuracy estimation method, in which the localization errors are derived based on the measurement results used by the positioning algorithm. The con...
- Deep Clustering with Associative Memories : Abstract: Deep clustering - joint representation learning and latent space clustering - is a well studied problem especially in computer vision and text processing under the deep learning framework. W...
- Clean-GS: Semantic Mask-Guided Pruning for 3D Gaussian Splatting : Abstract: 3D Gaussian Splatting produces high-quality scene reconstructions but generates hundreds of thousands of spurious Gaussians (floaters) scattered throughout the environment. These artifacts o...
- Security Hardening Using FABRIC: Implementing a Unified Compliance Aggregator for Linux Servers : Abstract: This paper presents a unified framework for evaluating Linux security hardening on the FABRIC testbed through aggregation of heterogeneous security auditing tools. We deploy three Ubuntu 22....
- Deep Deterministic Nonlinear ICA via Total Correlation Minimization with Matrix-Based Entropy Functional : Abstract: Blind source separation, particularly through independent component analysis (ICA), is widely utilized across various signal processing domains for disentangling underlying components from o...
- Noise-Aware and Dynamically Adaptive Federated Defense Framework for SAR Image Target Recognition : Abstract: As a critical application of computational intelligence in remote sensing, deep learning-based synthetic aperture radar (SAR) image target recognition facilitates intelligent perception but ...
- Investigation into U.S. Citizen and Non-Citizen Worker Health Insurance and Employment : Abstract: Socioeconomic integration is a critical dimension of social equity, yet persistent disparities remain in access to health insurance, education, and employment across different demographic gr...
- Deep Learning Framework for RNA Inverse Folding with Geometric Structure Potentials : Abstract: RNA's diverse biological functions stem from its structural versatility, yet accurately predicting and designing RNA sequences given a 3D conformation (inverse folding) remains a challenge. ...
- Towards eco friendly cybersecurity: machine learning based anomaly detection with carbon and energy metrics : Abstract: The rising energy footprint of artificial intelligence has become a measurable component of US data center emissions, yet cybersecurity research seldom considers its environmental cost. This...
- Deep versus Broad Technology Search and the Timing of Innovation Impact : Abstract: This study offers a new perspective on the depth-versus-breadth debate in innovation strategy, by modeling inventive search within dynamic collective knowledge systems, and underscoring the ...
- Physically-Constrained Autoencoder-Assisted Bayesian Optimization for Refinement of High-Dimensional Defect-Sensitive Single Crystalline Structure : Abstract: Physical properties and functionalities of materials are dictated by global crystal structures as well as local defects. To establish a structure-property relationship, not only the crystall...
- Autonomous battery research: Principles of heuristic operando experimentation : Abstract: Unravelling the complex processes governing battery degradation is critical to the energy transition, yet the efficacy of operando characterisation is severely constrained by a lack of Relia...
- Energy-Efficient Eimeria Parasite Detection Using a Two-Stage Spiking Neural Network Architecture : Abstract: Coccidiosis, a disease caused by the Eimeria parasite, represents a major threat to the poultry and rabbit industries, demanding rapid and accurate diagnostic tools. While deep learning mode...
- ChronoPlastic Spiking Neural Networks : Abstract: Spiking neural networks (SNNs) offer a biologically grounded and energy-efficient alternative to conventional neural architectures; however, they struggle with long-range temporal dependenci...
- Heterogeneous Low-Bandwidth Pre-Training of LLMs : Abstract: Pre-training large language models (LLMs) increasingly requires distributed compute, yet bandwidth constraints make it difficult to scale beyond well-provisioned datacenters, especially when ...
- Game of Coding: Coding Theory in the Presence of Rational Adversaries, Motivated by Decentralized Machine Learning : Abstract: Coding theory plays a crucial role in enabling reliable communication, storage, and computation. Classical approaches assume a worst-case adversarial model and ensure error correction and da...
- Temporal Kolmogorov-Arnold Networks (T-KAN) for High-Frequency Limit Order Book Forecasting: Efficiency, Interpretability, and Alpha Decay : Abstract: High-Frequency trading (HFT) environments are characterised by large volumes of limit order book (LOB) data, which is notoriously noisy and non-linear. Alpha decay represents a significant c...
- Differential Privacy for Transformer Embeddings of Text with Nonparametric Variational Information Bottleneck : Abstract: We propose a privacy-preserving method for sharing text data by sharing noisy versions of their transformer embeddings. It has been shown that hidden representations learned by deep models c...
- POSEIDON: Physics-Optimized Seismic Energy Inference and Detection Operating Network : Abstract: Earthquake prediction and seismic hazard assessment remain fundamental challenges in geophysics, with existing machine learning approaches often operating as black boxes that ignore establis...
- Neuro-Channel Networks: A Multiplication-Free Architecture by Biological Signal Transmission : Abstract: The rapid proliferation of Deep Learning is increasingly constrained by its heavy reliance on high-performance hardware, particularly Graphics Processing Units (GPUs). These specialized acce...
- ELLA: Efficient Lifelong Learning for Adapters in Large Language Models : Abstract: Large Language Models (LLMs) suffer severe catastrophic forgetting when adapted sequentially to new tasks in a continual learning (CL) setting. Existing approaches are fundamentally limited:...
- Quantized SO(3)-Equivariant Graph Neural Networks for Efficient Molecular Property Prediction : Abstract: Deploying 3D graph neural networks (GNNs) that are equivariant to 3D rotations (the group SO(3)) on edge devices is challenging due to their high computational cost. This paper addresses the...
- CORE: Code-based Inverse Self-Training Framework with Graph Expansion for Virtual Agents : Abstract: The development of Multimodal Virtual Agents has made significant progress through the integration of Multimodal Large Language Models. However, mainstream training paradigms face key challe...
- ACDZero: Graph-Embedding-Based Tree Search for Mastering Automated Cyber Defense : Abstract: Automated cyber defense (ACD) seeks to protect computer networks with minimal or no human intervention, reacting to intrusions by taking corrective actions such as isolating hosts, resetting...
- Learning with Monotone Adversarial Corruptions : Abstract: We study the extent to which standard machine learning algorithms rely on exchangeability and independence of data by introducing a monotone adversarial corruption model. In this model, an a...
- Edge-aware GAT-based protein binding site prediction : Abstract: Accurate identification of protein binding sites is crucial for understanding biomolecular interaction mechanisms and for the rational design of drug targets. Traditional predictive methods ...
- Prototype-Based Learning for Healthcare: A Demonstration of Interpretable AI : Abstract: Despite recent advances in machine learning and explainable AI, a gap remains in personalized preventive healthcare: predictions, interventions, and recommendations should be both understand...
- Horizon Activation Mapping for Neural Networks in Time Series Forecasting : Abstract: Neural networks for time series forecasting have relied on error metrics and architecture-specific interpretability approaches for model selection that don't apply across models of different...
- A Differentiable Adversarial Framework for Task-Aware Data Subsampling : Abstract: The proliferation of large-scale datasets poses a major computational challenge to model training. The traditional data subsampling method works as a static, task independent preprocessing s...
- Explore the Ideology of Deep Learning in ENSO Forecasts : Abstract: The El Niño-Southern Oscillation (ENSO) exerts profound influence on global climate variability, yet its prediction remains a grand challenge. Recent advances in deep learning have signif...
- Multivariate Time-series Anomaly Detection via Dynamic Model Pool & Ensembling : Abstract: Multivariate time-series (MTS) anomaly detection is critical in domains such as service monitoring, IoT, and network security. While multi-model methods based on selection or ensembling outperf...
- GDRO: Group-level Reward Post-training Suitable for Diffusion Models : Abstract: Recent advancements adopt online reinforcement learning (RL) from LLMs to text-to-image rectified flow diffusion models for reward alignment. The use of group-level rewards successfully alig...
- Prior Diffusiveness and Regret in the Linear-Gaussian Bandit : Abstract: We prove that Thompson sampling exhibits $\tilde{O}(σd \sqrt{T} + d r \sqrt{\mathrm{Tr}(Σ_0)})$ Bayesian regret in the linear-Gaussian bandit with a $\mathcal{N}(μ_0, Σ_0)$ prior distributio...
- SerpentFlow: Generative Unpaired Domain Alignment via Shared-Structure Decomposition : Abstract: Domain alignment refers broadly to learning correspondences between data distributions from distinct domains. In this work, we focus on a setting where domains share underlying structural pa...
- SynRXN: An Open Benchmark and Curated Dataset for Computational Reaction Modeling : Abstract: We present SynRXN, a unified benchmarking framework and open-data resource for computer-aided synthesis planning (CASP). SynRXN decomposes end-to-end synthesis planning into five task famili...
- Distorted Distributional Policy Evaluation for Offline Reinforcement Learning : Abstract: While Distributional Reinforcement Learning (DRL) methods have demonstrated strong performance in online settings, their success in offline scenarios remains limited. We hypothesize that a key...
- TT-FSI: Scalable Faithful Shapley Interactions via Tensor-Train : Abstract: The Faithful Shapley Interaction (FSI) index uniquely satisfies the faithfulness axiom among Shapley interaction indices, but computing FSI requires $O(d^\ell \cdot 2^d)$ time and existing i...
- FedBiCross: A Bi-Level Optimization Framework to Tackle Non-IID Challenges in Data-Free One-Shot Federated Learning on Medical Data : Abstract: Data-free knowledge distillation-based one-shot federated learning (OSFL) trains a model in a single communication round without sharing raw data, making OSFL attractive for privacy-sensitiv...
- High-Order Epistasis Detection Using Factorization Machine with Quadratic Optimization Annealing and MDR-Based Evaluation : Abstract: Detecting high-order epistasis is a fundamental challenge in genetic association studies due to the combinatorial explosion of candidate locus combinations. Although multifactor dimensionali...
- Tackling Resource-Constrained and Data-Heterogeneity in Federated Learning with Double-Weight Sparse Pack : Abstract: Federated learning has drawn widespread interest from researchers, yet the data heterogeneity across edge clients remains a key challenge, often degrading model performance. Existing methods...
- FAROS: Robust Federated Learning with Adaptive Scaling against Backdoor Attacks : Abstract: Federated Learning (FL) enables multiple clients to collaboratively train a shared model without exposing local data. However, backdoor attacks pose a significant threat to FL. These attacks...
- RealPDEBench: A Benchmark for Complex Physical Systems with Real-World Data : Abstract: Predicting the evolution of complex physical systems remains a central problem in science and engineering. Despite rapid progress in scientific Machine Learning (ML) models, a critical bottl...
- Distributed Federated Learning by Alternating Periods of Training : Abstract: Federated learning is a privacy-focused approach towards machine learning where models are trained on client devices with locally available data and aggregated at a central server. However, ...
- UnPII: Unlearning Personally Identifiable Information with Quantifiable Exposure Risk : Abstract: The ever-increasing adoption of Large Language Models in critical sectors like finance, healthcare, and government raises privacy concerns regarding the handling of sensitive Personally Iden...
- Context-Free Recognition with Transformers : Abstract: Transformers excel on tasks that process well-formed inputs according to some grammar, such as natural language and code. However, it remains unclear how they can process grammatical syntax....
- Entropy-Aligned Decoding of LMs for Better Writing and Reasoning : Abstract: Language models (LMs) are trained on billions of tokens in an attempt to recover the true language distribution. Still, vanilla random sampling from LMs yields low quality generations. Decod...
- Enhanced Multi-model Online Conformal Prediction : Abstract: Conformal prediction is a framework for uncertainty quantification that constructs prediction sets for previously unseen data, guaranteeing coverage of the true label with a specified probab...
- DiMEx: Breaking the Cold Start Barrier in Data-Free Model Extraction via Latent Diffusion Priors : Abstract: Model stealing attacks pose an existential threat to Machine Learning as a Service (MLaaS), allowing adversaries to replicate proprietary models for a fraction of their training cost. While ...
- HeurekaBench: A Benchmarking Framework for AI Co-scientist : Abstract: LLM-based reasoning models have enabled the development of agentic systems that act as co-scientists, assisting in multi-step scientific analysis. However, evaluating these systems is challe...
- Who is the Winning Algorithm? Rank Aggregation for Comparative Studies : Abstract: Consider a collection of m competing machine learning algorithms. Given their performance on a benchmark of datasets, we would like to identify the best performing algorithm. Specifically, w...
- Communication-Efficient Federated AUC Maximization with Cyclic Client Participation : Abstract: Federated AUC maximization is a powerful approach for learning from imbalanced data in federated learning (FL). However, existing methods typically assume full client availability, which is ...
- Real Time NILM Based Power Monitoring of Identical Induction Motors Representing Cutting Machines in Textile Industry : Abstract: The textile industry in Bangladesh is one of the most energy-intensive sectors, yet its monitoring practices remain largely outdated, resulting in inefficient power usage and high operationa...
- Advanced Global Wildfire Activity Modeling with Hierarchical Graph ODE : Abstract: Wildfires, as an integral component of the Earth system, are governed by a complex interplay of atmospheric, oceanic, and terrestrial processes spanning a vast range of spatiotemporal scales...
- Accelerating Decentralized Optimization via Overlapping Local Steps : Abstract: Decentralized optimization has emerged as a critical paradigm for distributed learning, enabling scalable training while preserving data privacy through peer-to-peer collaboration. However, ...
- SGD-Based Knowledge Distillation with Bayesian Teachers: Theory and Guidelines : Abstract: Knowledge Distillation (KD) is a central paradigm for transferring knowledge from a large teacher network to a typically smaller student model, often by leveraging soft probabilistic outputs...
- Multi-Subspace Multi-Modal Modeling for Diffusion Models: Estimation, Convergence and Mixture of Experts : Abstract: Recently, diffusion models have achieved great performance with a small dataset of size $n$ and a fast optimization process. However, the estimation error of diffusion models suffers from ...
- Leveraging Flatness to Improve Information-Theoretic Generalization Bounds for SGD : Abstract: Information-theoretic (IT) generalization bounds have been used to study the generalization of learning algorithms. These bounds are intrinsically data- and algorithm-dependent so that one c...
- Unveiling the Heart-Brain Connection: An Analysis of ECG in Cognitive Performance : Abstract: Understanding the interaction of neural and cardiac systems during cognitive activity is critical to advancing physiological computing. Although EEG has been the gold standard for assessing ...
- A Depth Hierarchy for Computing the Maximum in ReLU Networks via Extremal Graph Theory : Abstract: We consider the problem of exact computation of the maximum function over $d$ real inputs using ReLU neural networks. We prove a depth hierarchy, wherein width $Ω\big(d^{1+\frac{1}{2^{k-2}-1...
- Causal discovery for linear causal model with correlated noise: an Adversarial Learning Approach : Abstract: Causal discovery from data with unmeasured confounding factors is a challenging problem. This paper proposes an approach based on the f-GAN framework, learning the binary causal structure in...
- Towards LLM-enabled autonomous combustion research: A literature-aware agent for self-corrective modeling workflows : Abstract: The rapid evolution of large language models (LLMs) is transforming artificial intelligence into autonomous research partners, yet a critical gap persists in complex scientific domains such ...
- Spectral-Window Hybrid (SWH) : Abstract: Scaling sequence modeling to extreme contexts requires balancing computational efficiency with representational expressivity. While Transformers provide precise retrieval via the attention m...
- Towards a Principled Muon under $\mu\mathsf{P}$: Ensuring Spectral Conditions throughout Training : Abstract: The $μ$-parameterization ($μ$P) provides a principled foundation for large language model (LLM) training by prescribing width-independent learning dynamics, which in turn enables predictable...
- Sobolev Approximation of Deep ReLU Network in Log-weighted Barron Space : Abstract: Universal approximation theorems show that neural networks can approximate any continuous function; however, the number of parameters may grow exponentially with the ambient dimension, so th...
- The Alchemy of Thought: Understanding In-Context Learning Through Supervised Classification : Abstract: In-context learning (ICL) has become a prominent paradigm to rapidly customize LLMs to new tasks without fine-tuning. However, despite the empirical evidence of its usefulness, we still do n...
- Accelerated Full Waveform Inversion by Deep Compressed Learning : Abstract: We propose and test a method to reduce the dimensionality of Full Waveform Inversion (FWI) inputs as a computational cost mitigation approach. Given modern seismic acquisition systems, the dat...
- The Dependency Divide: An Interpretable Machine Learning Framework for Profiling Student Digital Satisfaction in the Bangladesh Context : Abstract: Background: While digital access has expanded rapidly in resource-constrained contexts, satisfaction with digital learning platforms varies significantly among students with seemingly equal ...
- Adaptive Conformal Prediction via Bayesian Uncertainty Weighting for Hierarchical Healthcare Data : Abstract: Clinical decision-making demands uncertainty quantification that provides both distribution-free coverage guarantees and risk-adaptive precision, requirements that existing methods fail to j...
- Sparse Bayesian Message Passing under Structural Uncertainty : Abstract: Semi-supervised learning on real-world graphs is frequently challenged by heterophily, where the observed graph is unreliable or label-disassortative. Many existing graph neural networks eit...
- Evo-TFS: Evolutionary Time-Frequency Domain-Based Synthetic Minority Oversampling Approach to Imbalanced Time Series Classification : Abstract: Time series classification is a fundamental machine learning task with broad real-world applications. Although many deep learning methods have proven effective in learning time-series data f...
- Self-Training the Neurochaos Learning Algorithm : Abstract: In numerous practical applications, acquiring substantial quantities of labelled data is challenging and expensive, but unlabelled data is readily accessible. Conventional supervised learnin...
- Community-Based Early-Stage Chronic Kidney Disease Screening using Explainable Machine Learning for Low-Resource Settings : Abstract: Early detection of chronic kidney disease (CKD) is essential for preventing progression to end-stage renal disease. However, existing screening tools - primarily developed using populations ...
- Central Dogma Transformer: Towards Mechanism-Oriented AI for Cellular Understanding : Abstract: Understanding cellular mechanisms requires integrating information across DNA, RNA, and protein - the three molecular systems linked by the Central Dogma of molecular biology. While domain-s...
- Discount Model Search for Quality Diversity Optimization in High-Dimensional Measure Spaces : Abstract: Quality diversity (QD) optimization searches for a collection of solutions that optimize an objective while attaining diverse outputs of a user-specified, vector-valued measure function. Con...
- Revisiting Weighted Strategy for Non-stationary Parametric Bandits and MDPs : Abstract: Non-stationary parametric bandits have attracted much attention recently. There are three principled ways to deal with non-stationarity, including sliding-window, weighted, and restart strat...
- Tiny Machine Learning for Real-Time Aquaculture Monitoring: A Case Study in Morocco : Abstract: Aquaculture, the farming of aquatic organisms, is a rapidly growing industry facing challenges such as water quality fluctuations, disease outbreaks, and inefficient feed management. Traditi...
- Coarse-Grained Kullback--Leibler Control of Diffusion-Based Generative AI : Abstract: Diffusion models and score-based generative models provide a powerful framework for synthesizing high-quality images from noise. However, there is still no satisfactory theory that describes...
- Wireless Dataset Similarity: Measuring Distances in Supervised and Unsupervised Machine Learning : Abstract: This paper introduces a task- and model-aware framework for measuring similarity between wireless datasets, enabling applications such as dataset selection/augmentation, simulation-to-real (...
- Expanding the Chaos: Neural Operator for Stochastic (Partial) Differential Equations : Abstract: Stochastic differential equations (SDEs) and stochastic partial differential equations (SPDEs) are fundamental tools for modeling stochastic dynamics across the natural sciences and modern m...
- Contractive Diffusion Policies: Robust Action Diffusion via Contractive Score-Based Sampling with Differential Equations : Abstract: Diffusion policies have emerged as powerful generative models for offline policy learning, whose sampling process can be rigorously characterized by a score function guiding a Stochastic Dif...
- Zero-shot Forecasting by Simulation Alone : Abstract: Zero-shot time-series forecasting holds great promise, but is still in its infancy, hindered by limited and biased data corpora, leakage-prone evaluation, and privacy and licensing constrain...
- Explainability-Guided Defense: Attribution-Aware Model Refinement Against Adversarial Data Attacks : Abstract: The growing reliance on deep learning models in safety-critical domains such as healthcare and autonomous navigation underscores the need for defenses that are both robust to adversarial per...
- Reliability Under Randomness: An Empirical Analysis of Sparse and Dense Language Models Across Decoding Temperatures : Abstract: The increasing prevalence of sparse Mixture-of-Experts (MoE) architectures in large language models raises important questions regarding their reliability under stochastic decoding. While co...
- Enhanced Data-Driven Product Development via Gradient Based Optimization and Conformalized Monte Carlo Dropout Uncertainty Estimation : Abstract: Data-Driven Product Development (DDPD) leverages data to learn the relationship between product design specifications and resulting properties. To discover improved designs, we train a neura...
- Latent-Constrained Conditional VAEs for Augmenting Large-Scale Climate Ensembles : Abstract: Large climate-model ensembles are computationally expensive; yet many downstream analyses would benefit from additional, statistically consistent realizations of spatiotemporal climate varia...
- Dichotomous Diffusion Policy Optimization : Abstract: Diffusion-based policies have gained growing popularity in solving a wide range of decision-making tasks due to their superior expressiveness and controllable generation during inference. Ho...
- When to Ponder: Adaptive Compute Allocation for Code Generation via Test-Time Training : Abstract: Large language models apply uniform computation to all inputs, regardless of difficulty. We propose PonderTTT, a gating strategy using the TTT layer's self-supervised reconstruction loss to ...
- Hierarchical topological clustering : Abstract: Topological methods have the potential of exploring data clouds without making assumptions on their structure. Here we propose a hierarchical topological clustering algorithm that can be...
- FANoS: Friction-Adaptive Nosé--Hoover Symplectic Momentum for Stiff Objectives : Abstract: We study a physics-inspired optimizer, FANoS (Friction-Adaptive Nosé--Hoover Symplectic momentum), which combines (i) a momentum update written as a discretized second-order dynamical...
- Outlier Detection Using Vector Cosine Similarity by Adding a Dimension : Abstract: We propose a new outlier detection method for multi-dimensional data. The method detects outliers based on vector cosine similarity, using a new dataset constructed by adding a dimension wit...
- Quantum Machine Learning Approaches for Coordinated Stealth Attack Detection in Distributed Generation Systems : Abstract: Coordinated stealth attacks are a serious cybersecurity threat to distributed generation systems because they modify control and measurement signals while remaining close to normal behavior,...
- Distribution Matching for Graph Quantification Under Structural Covariate Shift : Abstract: Graphs are commonly used in machine learning to model relationships between instances. Consider the task of predicting the political preferences of users in a social network; to solve this t...
- Selective Imperfection as a Generative Framework for Analysis, Creativity and Discovery : Abstract: We introduce materiomusic as a generative framework linking the hierarchical structures of matter with the compositional logic of music. Across proteins, spider webs and flame dynamics, vibr...
- Universal Battery Degradation Forecasting Driven by Foundation Model Across Diverse Chemistries and Conditions : Abstract: Accurate forecasting of battery capacity fade is essential for the safety, reliability, and long-term efficiency of energy storage systems. However, the strong heterogeneity across cell chem...
- Harvesting AlphaEarth: Benchmarking the Geospatial Foundation Model for Agricultural Downstream Tasks : Abstract: Geospatial foundation models (GFMs) have emerged as a promising approach to overcoming the limitations in existing featurization methods. More recently, Google DeepMind has introduced AlphaE...
- EdgeJury: Cross-Reviewed Small-Model Ensembles for Truthful Question Answering on Serverless Edge Inference : Abstract: Hallucinations hinder reliable question answering, especially in resource-constrained deployments where frontier-scale models or retrieval pipelines may be impractical. We present EdgeJury, ...
- You Only Need Your Transformer 25% of the Time: Meaning-First Execution for Eliminating Unnecessary Inference : Abstract: Modern AI inference systems treat transformer execution as mandatory, conflating model capability with execution necessity. We reframe inference as a control-plane decision problem: determin...
- SLO-Conditioned Action Routing for Retrieval-Augmented Generation: Objective Ablation and Failure Modes : Abstract: Retrieval-augmented generation (RAG) introduces a practical control problem: retrieval depth and generation behavior must be chosen per query to satisfy service-level objectives (SLOs) such ...
- ShrimpXNet: A Transfer Learning Framework for Shrimp Disease Classification with Augmented Regularization, Adversarial Training, and Explainable AI : Abstract: Shrimp is one of the most widely consumed aquatic species globally, valued for both its nutritional content and economic importance. Shrimp farming represents a significant source of income ...
- Horizon Reduction as Information Loss in Offline Reinforcement Learning : Abstract: Horizon reduction is a common design strategy in offline reinforcement learning (RL), used to mitigate long-horizon credit assignment, improve stability, and enable scalable learning through...
- Dynamical Mechanisms for Coordinating Long-term Working Memory Based on the Precision of Spike-timing in Cortical Neurons : Abstract: In the last century, most sensorimotor studies of cortical neurons relied on average firing rates. Rate coding is efficient for fast sensorimotor processing that occurs within a few seconds....
- InfoDecom: Decomposing Information for Defending Against Privacy Leakage in Split Inference : Abstract: Split inference (SI) enables users to access deep learning (DL) services without directly transmitting raw data. However, recent studies reveal that data reconstruction attacks (DRAs) can re...
- Pedagogical Reflections on the Holistic Cognitive Development (HCD) Framework and AI-Augmented Learning in Creative Computing : Abstract: This paper presents an expanded account of the Holistic Cognitive Development (HCD) framework for reflective and creative learning in computing education. The HCD framework integrates design...
- Affordance-Guided Coarse-to-Fine Exploration for Base Placement in Open-Vocabulary Mobile Manipulation : Abstract: In open-vocabulary mobile manipulation (OVMM), task success often hinges on the selection of an appropriate base placement for the robot. Existing approaches typically navigate to proximity-...
- Opportunities and Challenges of Large Language Models for Low-Resource Languages in Humanities Research : Abstract: Low-resource languages serve as invaluable repositories of human history, embodying cultural evolution and intellectual diversity. Despite their significance, these languages face critical c...
- Posets and Bounded Probabilities for Discovering Order-inducing Features in Event Knowledge Graphs : Abstract: Event knowledge graphs (EKG) extend the classical notion of a trace to capture multiple, interacting views of a process execution. In this paper, we tackle the open problem of automating EKG...
- Convergence of a L2 regularized Policy Gradient Algorithm for the Multi Armed Bandit : Abstract: Although Multi Armed Bandit (MAB) on one hand and the policy gradient approach on the other hand are among the most used frameworks of Reinforcement Learning, the theoretical properties of t...
- Beyond Expectations: Learning with Stochastic Dominance Made Practical : Abstract: Stochastic dominance serves as a general framework for modeling a broad spectrum of decision preferences under uncertainty, with risk aversion as one notable example, as it naturally capture...
- HCVP: Leveraging Hierarchical Contrastive Visual Prompt for Domain Generalization : Abstract: Domain Generalization (DG) endeavors to create machine learning models that excel in unseen scenarios by learning invariant features. In DG, the prevalent practice of constraining models to ...
- GRACE: Discriminator-Guided Chain-of-Thought Reasoning : Abstract: In the context of multi-step reasoning, e.g., with chain-of-thought, language models (LMs) can easily assign a high likelihood to incorrect steps. As a result, decoding strategies that optim...
- Mem-Rec: Memory Efficient Recommendation System using Alternative Representation : Abstract: Deep learning-based recommendation systems (e.g., DLRMs) are widely used AI models to provide high-quality personalized recommendations. Training data used for modern recommendation systems ...
- On Pitfalls of RemOve-And-Retrain: Data Processing Inequality Perspective : Abstract: Approaches for appraising feature importance approximations, alternatively referred to as attribution methods, have been established across an extensive array of contexts. The development of...
- UCO: A Multi-Turn Interactive Reinforcement Learning Method for Adaptive Teaching with Large Language Models : Abstract: Large language models (LLMs) are shifting from answer providers to intelligent tutors in educational settings, yet current supervised fine-tuning methods only learn surface teaching patterns...
- Uncertainty Quantification of Surrogate Models using Conformal Prediction : Abstract: Data-driven surrogate models offer quick approximations to complex numerical and experimental systems but typically lack uncertainty quantification, limiting their reliability in safety-crit...
- Geometry-induced Regularization in Deep ReLU Neural Networks : Abstract: Neural networks with a large number of parameters often do not overfit, owing to implicit regularization that favors 'good' networks. Other related and puzzling phenomena include prop...
- On the Representation of Pairwise Causal Background Knowledge and Its Applications in Causal Inference : Abstract: Pairwise causal background knowledge about the existence or absence of causal edges and paths is frequently encountered in observational studies. Such constraints allow the shared directed a...
- DARC: Drum accompaniment generation with fine-grained rhythm control : Abstract: In music creation, rapid prototyping is essential for exploring and refining ideas, yet existing generative tools often fall short when users require both structural control and stylistic fl...
- DatBench: Discriminative, Faithful, and Efficient VLM Evaluations : Abstract: Empirical evaluation serves as the primary compass guiding research progress in foundation models. Despite a large body of work focused on training frontier vision-language models (VLMs), ap...
- Placement Semantics for Distributed Deep Learning: A Systematic Framework for Analyzing Parallelism Strategies : Abstract: Training large language models requires distributing computation across many accelerators, yet practitioners select parallelism strategies (data, tensor, pipeline, ZeRO) through trial and er...
- pdfQA: Diverse, Challenging, and Realistic Question Answering over PDFs : Abstract: PDFs are the second-most used document type on the internet (after HTML). Yet, existing QA datasets commonly start from text sources or only address specific domains. In this paper, we prese...
- TopoLoRA-SAM: Topology-Aware Parameter-Efficient Adaptation of Foundation Segmenters for Thin-Structure and Cross-Domain Binary Semantic Segmentation : Abstract: Foundation segmentation models such as the Segment Anything Model (SAM) exhibit strong zero-shot generalization through large-scale pretraining, but adapting them to domain-specific semantic...
- A Comparative Study of Custom CNNs, Pre-trained Models, and Transfer Learning Across Multiple Visual Datasets : Abstract: Convolutional Neural Networks (CNNs) are a standard approach for visual recognition due to their capacity to learn hierarchical representations from raw pixels. In practice, practitioners of...
- VIBE: Visual Instruction Based Editor : Abstract: Instruction-based image editing is among the fastest developing areas in generative AI. Over the past year, the field has reached a new level, with dozens of open-source models released alon...
- LLM-Empowered Functional Safety and Security by Design in Automotive Systems : Abstract: This paper presents an LLM-empowered workflow to support Software Defined Vehicle (SDV) software development, covering the aspects of security-aware system topology design, as well as event-dri...
- Seeing the Unseen: Zooming in the Dark with Event Cameras : Abstract: This paper addresses low-light video super-resolution (LVSR), aiming to restore high-resolution videos from low-light, low-resolution (LR) inputs. Existing LVSR methods often struggle to rec...
- NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation : Abstract: We present NextFlow, a unified decoder-only autoregressive transformer trained on 6 trillion interleaved text-image discrete tokens. By leveraging a unified vision representation within a un...
- Code for Machines, Not Just Humans: Quantifying AI-Friendliness with Code Health Metrics : Abstract: We are entering a hybrid era in which human developers and AI coding agents work in the same codebases. While industry practice has long optimized code for human comprehension, it is increas...
- FormationEval, an open multiple-choice benchmark for petroleum geoscience : Abstract: This paper presents FormationEval, an open multiple-choice question benchmark for evaluating language models on petroleum geoscience and subsurface disciplines. The dataset contains 505 ques...
- Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting : Abstract: Supervised Fine-Tuning (SFT) is the standard paradigm for domain adaptation, yet it frequently incurs the cost of catastrophic forgetting. In sharp contrast, on-policy Reinforcement Learning...
- AI-enhanced tuning of quantum dot Hamiltonians toward Majorana modes : Abstract: We propose a neural network-based model capable of learning the broad landscape of working regimes in quantum dot simulators, and using this knowledge to autotune these devices - based on tr...
- BiPrompt: Bilateral Prompt Optimization for Visual and Textual Debiasing in Vision-Language Models : Abstract: Vision language foundation models such as CLIP exhibit impressive zero-shot generalization yet remain vulnerable to spurious correlations across visual and textual modalities. Existing debia...
- Routing by Analogy: kNN-Augmented Expert Assignment for Mixture-of-Experts : Abstract: Mixture-of-Experts (MoE) architectures scale large language models efficiently by employing a parametric "router" to dispatch tokens to a sparse subset of experts. Typically, this router is ...
- Remote Sensing Change Detection via Weak Temporal Supervision : Abstract: Semantic change detection in remote sensing aims to identify land cover changes between bi-temporal image pairs. Progress in this area has been limited by the scarcity of annotated datasets,...
- SingingBot: An Avatar-Driven System for Robotic Face Singing Performance : Abstract: Equipping robotic faces with singing capabilities is crucial for empathetic Human-Robot Interaction. However, existing robotic face driving research primarily focuses on conversations or mim...
- DeCode: Decoupling Content and Delivery for Medical QA : Abstract: Large language models (LLMs) exhibit strong medical knowledge and can generate factually accurate responses. However, existing models often fail to account for individual patient contexts, p...
- Inferring Network Evolutionary History via Structure-State Coupled Learning : Abstract: Inferring a network's evolutionary history from a single final snapshot with limited temporal annotations is fundamental yet challenging. Existing approaches predominantly rely on topology a...
- LION-DG: Layer-Informed Initialization with Deep Gradient Protocols for Accelerated Neural Network Training : Abstract: Weight initialization remains decisive for neural network optimization, yet existing methods are largely layer-agnostic. We study initialization for deeply-supervised architectures with auxi...
- Vision-Based Early Fault Diagnosis and Self-Recovery for Strawberry Harvesting Robots : Abstract: Strawberry harvesting robots faced persistent challenges such as low integration of visual perception, fruit-gripper misalignment, empty grasping, and strawberry slippage from the gripper du...
- The Homogeneity Trap: Spectral Collapse in Doubly-Stochastic Deep Networks : Abstract: Doubly-stochastic matrices (DSM) are increasingly utilized in structure-preserving deep architectures -- such as Optimal Transport layers and Sinkhorn-based attention -- to enforce numerical...
- Deferred Commitment Decoding for Diffusion Language Models with Confidence-Aware Sliding Windows : Abstract: Diffusion language models (DLMs) have recently emerged as a strong alternative to autoregressive models by enabling parallel text generation. To improve inference efficiency and KV-cache com...
- Cost-Efficient Cross-Lingual Retrieval-Augmented Generation for Low-Resource Languages: A Case Study in Bengali Agricultural Advisory : Abstract: Access to reliable agricultural advisory remains limited in many developing regions due to a persistent language barrier: authoritative agricultural manuals are predominantly written in Engl...
- Perish or Flourish? A Holistic Evaluation of Large Language Models for Code Generation in Functional Programming : Abstract: Functional programming provides strong foundations for developing reliable and secure software systems, yet its adoption remains limited due to the steep learning curve. Recent advanc...
- Agentic Retoucher for Text-To-Image Generation : Abstract: Text-to-image (T2I) diffusion models such as SDXL and FLUX have achieved impressive photorealism, yet small-scale distortions remain pervasive in limbs, face, text and so on. Existing refine...
- The New Compiler Stack: A Survey on the Synergy of LLMs and Compilers : Abstract: This survey has provided a systematic overview of the emerging field of LLM-enabled compilation by addressing several key research questions. We first answered how LLMs are being integrated ...
- Output Embedding Centering for Stable LLM Pretraining : Abstract: Pretraining of large language models is not only expensive but also prone to certain training instabilities. A specific instability that often occurs for large learning rates at the end of t...
- Not All Needles Are Found: How Fact Distribution and Don't Make It Up Prompts Shape Literal Extraction, Logical Inference, and Hallucination Risks in Long-Context LLMs : Abstract: Large language models (LLMs) increasingly support very long input contexts. Yet it remains unclear how reliably they extract and infer information at scale. Performance varies with context l...
- Enhancing Object Detection with Privileged Information: A Model-Agnostic Teacher-Student Approach : Abstract: This paper investigates the integration of the Learning Using Privileged Information (LUPI) paradigm in object detection to exploit fine-grained, descriptive information available during tra...
- Surprisal and Metaphor Novelty: Moderate Correlations and Divergent Scaling Effects : Abstract: Novel metaphor comprehension involves complex semantic processes and linguistic creativity, making it an interesting task for studying language models (LMs). This study investigates whether ...
- A neural network for modeling human concept formation, understanding and communication : Abstract: A remarkable capability of the human brain is to form more abstract conceptual representations from sensorimotor experiences and flexibly apply them independent of direct sensory inputs. How...
- Exploring Approaches for Detecting Memorization of Recommender System Data in Large Language Models : Abstract: Large Language Models (LLMs) are increasingly applied in recommendation scenarios due to their strong natural language understanding and generation capabilities. However, they are trained on...
- Exploring Diversity, Novelty, and Popularity Bias in ChatGPT's Recommendations : Abstract: ChatGPT has emerged as a versatile tool, demonstrating capabilities across diverse domains. Given these successes, the Recommender Systems (RSs) community has begun investigating its applica...
- VIT-Ped: Visionary Intention Transformer for Pedestrian Behavior Analysis : Abstract: Pedestrian Intention prediction is one of the key technologies in the transition from level 3 to level 4 autonomous driving. To understand pedestrian crossing behaviour, several elements and...
- Refinement Provenance Inference: Detecting LLM-Refined Training Prompts from Model Behavior : Abstract: Instruction tuning increasingly relies on LLM-based prompt refinement, where prompts in the training corpus are selectively rewritten by an external refiner to improve clarity and instructio...
- The Invisible Hand of AI Libraries Shaping Open Source Projects and Communities : Abstract: In the early 1980s, Open Source Software emerged as a revolutionary concept amidst the dominance of proprietary software. What began as a revolutionary idea has now become the cornerstone of...
- Visualizing the Structure of Lenia Parameter Space : Abstract: Continuous cellular automata are rocketing in popularity, yet developing a theoretical understanding of their behaviour remains a challenge. In the case of Lenia, a few fundamental open prob...
- DéjàQ: Open-Ended Evolution of Diverse, Learnable and Verifiable Problems : Abstract: Recent advances in reasoning models have yielded impressive results in mathematics and coding. However, most approaches rely on static datasets, which have been suggested to encourage memori...
- MCGI: Manifold-Consistent Graph Indexing for Billion-Scale Disk-Resident Vector Search : Abstract: Graph-based Approximate Nearest Neighbor (ANN) search often suffers from performance degradation in high-dimensional spaces due to the "Euclidean-Geodesic mismatch," where greedy routing d...
- Theoretical Convergence of SMOTE-Generated Samples : Abstract: Imbalanced data affects a wide range of machine learning applications, from healthcare to network security. As SMOTE is one of the most popular approaches to addressing this issue, it is imp...
- A Defect is Being Born: How Close Are We? A Time Sensitive Forecasting Approach : Abstract: Background. Defect prediction has been a highly active topic among researchers in the Empirical Software Engineering field. Previous literature has successfully achieved the most accurate pr...
- Nodule-DETR: A Novel DETR Architecture with Frequency-Channel Attention for Ultrasound Thyroid Nodule Detection : Abstract: Thyroid cancer is the most common endocrine malignancy, and its incidence is rising globally. While ultrasound is the preferred imaging modality for detecting thyroid nodules, its diagnostic...
- Evaluating Feature Dependent Noise in Preference-based Reinforcement Learning : Abstract: Learning from Preferences in Reinforcement Learning (PbRL) has gained attention recently, as it serves as a natural fit for complicated tasks where the reward function is not easily availabl...
- Tackling the Inherent Difficulty of Noise Filtering in RAG : Abstract: Retrieval-Augmented Generation (RAG) has become a widely adopted approach to enhance Large Language Models (LLMs) by incorporating external knowledge and reducing hallucinations. However, no...
- Safety at One Shot: Patching Fine-Tuned LLMs with A Single Instance : Abstract: Fine-tuning safety-aligned large language models (LLMs) can substantially compromise their safety. Previous approaches require many safety samples or calibration sets, which not only incur s...
- CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving : Abstract: Despite significant progress, multimodal large language models continue to struggle with visual mathematical problem solving. Some recent works recognize that visual perception is a bottlene...
- MORE: Multi-Objective Adversarial Attacks on Speech Recognition : Abstract: The emergence of large-scale automatic speech recognition (ASR) models such as Whisper has greatly expanded their adoption across diverse real-world applications. Ensuring robustness against...
- The Machine Learning Canvas: Empirical Findings on Why Strategy Matters More Than AI Code Generation : Abstract: Despite the growing popularity of AI coding assistants, over 80% of machine learning (ML) projects fail to deliver real business value. This study creates and tests a Machine Learning Canvas...
- RSwinV2-MD: An Enhanced Residual SwinV2 Transformer for Monkeypox Detection from Skin Images : Abstract: In this paper, a deep learning approach for Mpox diagnosis named Customized Residual SwinTransformerV2 (RSwinV2) has been proposed, trying to enhance the capability of lesion classification ...
- Yukthi Opus: A Multi-Chain Hybrid Metaheuristic for Large-Scale NP-Hard Optimization : Abstract: We present Yukthi Opus (YO), a multi-chain hybrid metaheuristic designed for NP-hard optimization under explicit evaluation budget constraints. YO integrates three complementary mechanisms i...
- ARIES: A Scalable Multi-Agent Orchestration Framework for Real-Time Epidemiological Surveillance and Outbreak Monitoring : Abstract: Global health surveillance is currently facing a challenge of Knowledge Gaps. While general-purpose AI has proliferated, it remains fundamentally unsuited for the high-stakes epidemiological...
- Emergent Introspective Awareness in Large Language Models : Abstract: We investigate whether large language models can introspect on their internal states. It is difficult to answer this question through conversation alone, as genuine introspection cannot be d...
- Adaptive Hybrid Optimizer based Framework for Lumpy Skin Disease Identification : Abstract: Lumpy Skin Disease (LSD) is a contagious viral infection that significantly deteriorates livestock health, thereby posing a serious threat to the global economy and food security. Owing to i...
- Moments Matter: Stabilizing Policy Optimization using Return Distributions : Abstract: Deep Reinforcement Learning (RL) agents often learn policies that achieve the same episodic return yet behave very differently, due to a combination of environmental (random transitions, ini...
- Sparse Threats, Focused Defense: Criticality-Aware Robust Reinforcement Learning for Safe Autonomous Driving : Abstract: Reinforcement learning (RL) has shown considerable potential in autonomous driving (AD), yet its vulnerability to perturbations remains a critical barrier to real-world deployment. As a prim...
- VerLM: Explaining Face Verification Using Natural Language : Abstract: Face verification systems have seen substantial advancements; however, they often lack transparency in their decision-making processes. In this paper, we introduce an innovative Vision-Langu...
- HyperCLOVA X 8B Omni : Abstract: In this report, we present HyperCLOVA X 8B Omni, the first any-to-any omnimodal model in the HyperCLOVA X family that supports text, audio, and vision as both inputs and outputs. By consolid...
- Subimage Overlap Prediction: Task-Aligned Self-Supervised Pretraining For Semantic Segmentation In Remote Sensing Imagery : Abstract: Self-supervised learning (SSL) methods have become a dominant paradigm for creating general purpose models whose capabilities can be transferred to downstream supervised learning tasks. Howe...
- LIA: Supervised Fine-Tuning of Large Language Models for Automatic Issue Assignment : Abstract: Issue assignment is a critical process in software maintenance, where new issue reports are validated and assigned to suitable developers. However, manual issue assignment is often inconsist...
- MergeRec: Model Merging for Data-Isolated Cross-Domain Sequential Recommendation : Abstract: Modern recommender systems trained on domain-specific data often struggle to generalize across multiple domains. Cross-domain sequential recommendation has emerged as a promising research di...
- Query-Document Dense Vectors for LLM Relevance Judgment Bias Analysis : Abstract: Large Language Models (LLMs) have been used as relevance assessors for Information Retrieval (IR) evaluation collection creation due to reduced cost and increased scalability as compared to ...
- Crafting Adversarial Inputs for Large Vision-Language Models Using Black-Box Optimization : Abstract: Recent advancements in Large Vision-Language Models (LVLMs) have shown groundbreaking capabilities across diverse multimodal tasks. However, these models remain vulnerable to adversarial jai...
- Multi-granularity Interactive Attention Framework for Residual Hierarchical Pronunciation Assessment : Abstract: Automatic pronunciation assessment plays a crucial role in computer-assisted pronunciation training systems. Due to the ability to perform multiple pronunciation tasks simultaneously, multi-...
- K-EXAONE Technical Report : Abstract: This technical report presents K-EXAONE, a large-scale multilingual language model developed by LG AI Research. K-EXAONE is built on a Mixture-of-Experts architecture with 236B total paramet...
- RelayGR: Scaling Long-Sequence Generative Recommendation via Cross-Stage Relay-Race Inference : Abstract: Real-time recommender systems execute multi-stage cascades (retrieval, pre-processing, fine-grained ranking) under strict tail-latency SLOs, leaving only tens of milliseconds for ranking. Ge...
- Explicit World Models for Reliable Human-Robot Collaboration : Abstract: This paper addresses the topic of robustness under sensing noise, ambiguous instructions, and human-robot interaction. We take a radically different tack to the issue of reliable embodied AI...
- Beyond Homophily: Community Search on Heterophilic Graphs : Abstract: Community search aims to identify a refined set of nodes that are most relevant to a given query, supporting tasks ranging from fraud detection to recommendation. Unlike homophilic graphs, m...
- Digital Twin-Driven Communication-Efficient Federated Anomaly Detection for Industrial IoT : Abstract: Anomaly detection is increasingly becoming crucial for maintaining the safety, reliability, and efficiency of industrial systems. Recently, with the advent of digital twins and data-driven d...
- FALCON: Few-Shot Adversarial Learning for Cross-Domain Medical Image Segmentation : Abstract: Precise delineation of anatomical and pathological structures within 3D medical volumes is crucial for accurate diagnosis, effective surgical planning, and longitudinal disease monitoring. D...
- Lying with Truths: Open-Channel Multi-Agent Collusion for Belief Manipulation via Generative Montage : Abstract: As large language models (LLMs) transition to autonomous agents synthesizing real-time information, their reasoning capabilities introduce an unexpected attack surface. This paper introduces...
- Exposing Hidden Interfaces: LLM-Guided Type Inference for Reverse Engineering macOS Private Frameworks : Abstract: Private macOS frameworks underpin critical services and daemons but remain undocumented and distributed only as stripped binaries, complicating security analysis. We present MOTIF, an agenti...
- EHRSummarizer: A Privacy-Aware, FHIR-Native Architecture for Structured Clinical Summarization of Electronic Health Records : Abstract: Clinicians routinely navigate fragmented electronic health record (EHR) interfaces to assemble a coherent picture of a patient's problems, medications, recent encounters, and longitudinal tr...
- Adversarial Instance Generation and Robust Training for Neural Combinatorial Optimization with Multiple Objectives : Abstract: Deep reinforcement learning (DRL) has shown great promise in addressing multi-objective combinatorial optimization problems (MOCOPs). Nevertheless, the robustness of these learning-based sol...
- Length-Aware Adversarial Training for Variable-Length Trajectories: Digital Twins for Mall Shopper Paths : Abstract: We study generative modeling of variable-length trajectories -- sequences of visited locations/items with associated timestamps -- for downstream simulation and counterfactual analysi...
- UniCrop: A Universal, Multi-Source Data Engineering Pipeline for Scalable Crop Yield Prediction : Abstract: Accurate crop yield prediction relies on diverse data streams, including satellite, meteorological, soil, and topographic information. However, despite rapid advances in machine learning, ex...
- Learning Resilient Elections with Adversarial GNNs : Abstract: In the face of adverse motives, it is indispensable to achieve a consensus. Elections have been the canonical way by which modern democracy has operated since the 17th century. Nowadays, the...
- JMedEthicBench: A Multi-Turn Conversational Benchmark for Evaluating Medical Safety in Japanese Large Language Models : Abstract: As Large Language Models (LLMs) are increasingly deployed in healthcare field, it becomes essential to carefully evaluate their medical safety before clinical use. However, existing safety b...
- REE-TTT: Highly Adaptive Radar Echo Extrapolation Based on Test-Time Training : Abstract: Precipitation nowcasting is critically important for meteorological forecasting. Deep learning-based Radar Echo Extrapolation (REE) has become a predominant nowcasting approach, yet it suffe...
- From Theory of Mind to Theory of Environment: Counterfactual Simulation of Latent Environmental Dynamics : Abstract: The vertebrate motor system employs dimensionality-reducing strategies to limit the complexity of movement coordination, for efficient motor control. But when environments are dense with hid...
- CONSENT: A Negotiation Framework for Leveraging User Flexibility in Vehicle-to-Building Charging under Uncertainty : Abstract: The growth of Electric Vehicles (EVs) creates a conflict in vehicle-to-building (V2B) settings between building operators, who face high energy costs from uncoordinated charging, and drivers...
- The Two-Stage Decision-Sampling Hypothesis: Understanding the Emergence of Self-Reflection in RL-Trained LLMs : Abstract: Self-reflection capabilities emerge in Large Language Models after RL post-training, with multi-turn RL achieving substantial gains over SFT counterparts. Yet the mechanism of how a unified ...
- HanoiWorld: A Joint Embedding Predictive Architecture Based World Model for Autonomous Vehicle Controller : Abstract: Current attempts of Reinforcement Learning for Autonomous Controller are data-demanding while the results are under-performed, unstable, and unable to grasp and anchor on the concept of safe...
- OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment : Abstract: Evaluating novelty is critical yet challenging in peer review, as reviewers must assess submissions against a vast, rapidly evolving literature. This report presents OpenNovelty, an LLM-powe...
- MM-Sonate: Multimodal Controllable Audio-Video Generation with Zero-Shot Voice Cloning : Abstract: Joint audio-video generation aims to synthesize synchronized multisensory content, yet current unified models struggle with fine-grained acoustic control, particularly for identity-preservin...
- Utilizing Earth Foundation Models to Enhance the Simulation Performance of Hydrological Models with AlphaEarth Embeddings : Abstract: Predicting river flow in places without streamflow records is challenging because basins respond differently to climate, terrain, vegetation, and soils. Traditional basin attributes describe...
- MOSS Transcribe Diarize: Accurate Transcription with Speaker Diarization : Abstract: Speaker-Attributed, Time-Stamped Transcription (SATS) aims to transcribe what is said and to precisely determine the timing of each speaker, which is particularly valuable for meeting transc...
- EscherVerse: An Open World Benchmark and Dataset for Teleo-Spatial Intelligence with Physical-Dynamic and Intent-Driven Understanding : Abstract: The ability to reason about spatial dynamics is a cornerstone of intelligence, yet current research overlooks the human intent behind spatial changes. To address these limitations, we introd...
- Bridging the Data Gap: Creating a Hindi Text Summarization Dataset from the English XSUM : Abstract: Current advancements in Natural Language Processing (NLP) have largely favored resource-rich languages, leaving a significant gap in high-quality datasets for low-resource languages like Hin...
- DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving : Abstract: Video generation models, as one form of world models, have emerged as one of the most exciting frontiers in AI, promising agents the ability to imagine the future by modeling the temporal ev...
- FastV-RAG: Towards Fast and Fine-Grained Video QA with Retrieval-Augmented Generation : Abstract: Vision-Language Models (VLMs) excel at visual reasoning but still struggle with integrating external knowledge. Retrieval-Augmented Generation (RAG) is a promising solution, but current meth...
- The Optimal Sample Complexity of Linear Contracts : Abstract: In this paper, we settle the problem of learning optimal linear contracts from data in the offline setting, where agent types are drawn from an unknown distribution and the principal's goal ...
- Distortion Instead of Hallucination: The Effect of Reasoning Under Strict Constraints : Abstract: With the widespread adoption of large language models (LLMs), hallucinations, which are non-factual fabrications in model outputs, have become serious concerns. Reasoning capabilities have r...
- DeepInv: A Novel Self-supervised Learning Approach for Fast and Accurate Diffusion Inversion : Abstract: Diffusion inversion is a task of recovering the noise of an image in a diffusion model, which is vital for controllable diffusion image editing. At present, diffusion inversion still remains...
- Accelerating Storage-Based Training for Graph Neural Networks : Abstract: Graph neural networks (GNNs) have achieved breakthroughs in various real-world downstream tasks due to their powerful expressiveness. As the scale of real-world graphs has been continuously ...
- Rethinking Multimodal Few-Shot 3D Point Cloud Segmentation: From Fused Refinement to Decoupled Arbitration : Abstract: In this paper, we revisit multimodal few-shot 3D point cloud semantic segmentation (FS-PCS), identifying a conflict in "Fuse-then-Refine" paradigms: the "Plasticity-Stability Dilemma." In ad...
- Bayesian Subspace Gradient Estimation for Zeroth-Order Optimization of Large Language Models : Abstract: Fine-tuning large language models (LLMs) with zeroth-order (ZO) optimization reduces memory by approximating gradients through function evaluations, but existing methods rely on one-step gra...
- Online Estimation and Manipulation of Articulated Objects : Abstract: From refrigerators to kitchen drawers, humans interact with articulated objects effortlessly every day while completing household chores. For automating these tasks, service robots must be c...
- Reliable Grid Forecasting: State Space Models for Safety-Critical Energy Systems : Abstract: Accurate grid load forecasting is safety-critical: under-predictions risk supply shortfalls, while symmetric error metrics mask this operational asymmetry. We introduce a grid-specific evalu...
- SwinIFS: Landmark Guided Swin Transformer For Identity Preserving Face Super Resolution : Abstract: Face super-resolution aims to recover high-quality facial images from severely degraded low-resolution inputs, but remains challenging due to the loss of fine structural details and identity...
- A Graph-based Framework for Online Time Series Anomaly Detection Using Model Ensemble : Abstract: With the increasing volume of streaming data in industrial systems, online anomaly detection has become a critical task. The diverse and rapidly evolving data patterns pose significant chall...
- Scale-Adaptive Power Flow Analysis with Local Topology Slicing and Multi-Task Graph Learning : Abstract: Developing deep learning models with strong adaptability to topological variations is of great practical significance for power flow analysis. To enhance model performance under variable sys...
- ParkGaussian: Surround-view 3D Gaussian Splatting for Autonomous Parking : Abstract: Parking is a critical task for autonomous driving systems (ADS), with unique challenges in crowded parking slots and GPS-denied environments. However, existing works focus on 2D parking slot...
- Data Complexity-aware Deep Model Performance Forecasting : Abstract: Deep learning models are widely used across computer vision and other domains. When working on the model induction, selecting the right architecture for a given dataset often relies on repet...
- UltraEval-Audio: A Unified Framework for Comprehensive Evaluation of Audio Foundation Models : Abstract: The development of audio foundation models has accelerated rapidly since the emergence of GPT-4o. However, the lack of comprehensive evaluation has become a critical bottleneck for further p...
- Slot-ID: Identity-Preserving Video Generation from Reference Videos via Slot-Based Temporal Identity Encoding : Abstract: Producing prompt-faithful videos that preserve a user-specified identity remains challenging: models need to extrapolate facial dynamics from sparse reference while balancing the tension bet...
- From Classification to Generation: An Open-Ended Paradigm for Adverse Drug Reaction Prediction Based on Graph-Motif Feature Fusion : Abstract: Computational biology offers immense potential for reducing the high costs and protracted cycles of new drug development through adverse drug reaction (ADR) prediction. However, current meth...
- LinMU: Multimodal Understanding Made Linear : Abstract: Modern Vision-Language Models (VLMs) achieve impressive performance but are limited by the quadratic complexity of self-attention, which prevents their deployment on edge devices and makes t...
- Adaptive Hierarchical Evaluation of LLMs and SAST tools for CWE Prediction in Python : Abstract: Large Language Models have become integral to software development, yet they frequently generate vulnerable code. Existing code vulnerability detection benchmarks employ binary classificatio...
- Quantifying Local Strain Field and Deformation in Active Contraction of Bladder Using a Pretrained Transformer Model: A Speckle-Free Approach : Abstract: Accurate quantification of local strain fields during bladder contraction is essential for understanding the biomechanics of bladder micturition, in both health and disease. Conventional dig...
- T3C: Test-Time Tensor Compression with Consistency Guarantees : Abstract: We present T3C, a train-once, test-time budget-conditioned compression framework that exposes rank and precision as a controllable deployment knob. T3C combines elastic tensor factorization ...
- Warp-Cortex: An Asynchronous, Memory-Efficient Architecture for Million-Agent Cognitive Scaling on Consumer Hardware : Abstract: Current multi-agent Large Language Model (LLM) frameworks suffer from linear memory scaling, rendering "System 2" parallel reasoning impractical on consumer hardware. We present Warp Cortex,...
- ARGUS: Adaptive Rotation-Invariant Geometric Unsupervised System : Abstract: Detecting distributional drift in high-dimensional data streams presents fundamental challenges: global comparison methods scale poorly, projection-based approaches lose geometric structure,...
- Aggressive Compression Enables LLM Weight Theft : Abstract: As frontier AIs become more powerful and costly to develop, adversaries have increasing incentives to steal model weights by mounting exfiltration attacks. In this work, we consider exfiltra...
- Diffusion Timbre Transfer Via Mutual Information Guided Inpainting : Abstract: We study timbre transfer as an inference-time editing problem for music audio. Starting from a strong pre-trained latent diffusion model, we introduce a lightweight procedure that requires n...
- PyBatchRender: A Python Library for Batched 3D Rendering at Up to One Million FPS : Abstract: Reinforcement learning from pixels is often bottlenecked by the performance and complexity of 3D rendered environments. Researchers face a trade-off between high-speed, low-level engines and...
- AI-Powered Deepfake Detection Using CNN and Vision Transformer Architectures : Abstract: The increasing use of artificial intelligence generated deepfakes creates major challenges in maintaining digital authenticity. Four AI-based models, consisting of three CNNs and one Vision ...
- Does Memory Need Graphs? A Unified Framework and Empirical Analysis for Long-Term Dialog Memory : Abstract: Graph structures are increasingly used in dialog memory systems, but empirical findings on their effectiveness remain inconsistent, making it unclear which design choices truly matter. We pr...
- LLM Collusion : Abstract: We study how delegating pricing to large language models (LLMs) can facilitate collusion in a duopoly when both sellers rely on the same pre-trained model. The LLM is characterized by (i) a ...
- From Policy to Logic for Efficient and Interpretable Coverage Assessment : Abstract: Large Language Models (LLMs) have demonstrated strong capabilities in interpreting lengthy, complex legal and policy language. However, their reliability can be undermined by hallucinations ...
- MambaFormer: Token-Level Guided Routing Mixture-of-Experts for Accurate and Efficient Clinical Assistance : Abstract: The deployment of large language models (LLMs) in real-world clinical applications is constrained by the fundamental trade-off between computational cost and the efficiency of linear-time mo...
- Seamlessly Natural: Image Stitching with Natural Appearance Preservation : Abstract: This paper introduces SENA (SEamlessly NAtural), a geometry-driven image stitching approach that prioritizes structural fidelity in challenging real-world scenes characterized by parallax an...
- Benchmarking the Computational and Representational Efficiency of State Space Models against Transformers on Long-Context Dyadic Sessions : Abstract: State Space Models (SSMs) have emerged as a promising alternative to Transformers for long-context sequence modeling, offering linear $O(N)$ computational complexity compared to the Transfor...
- Stylometry Analysis of Human and Machine Text for Academic Integrity : Abstract: This work addresses critical challenges to academic integrity, including plagiarism, fabrication, and verification of authorship of educational content, by proposing a Natural Language Proce...
- Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment : Abstract: Slot Attention (SA) with pretrained diffusion models has recently shown promise for object-centric learning (OCL), but suffers from slot entanglement and weak alignment between object slots ...
- Correctness isn't Efficiency: Runtime Memory Divergence in LLM-Generated Code : Abstract: Large language models (LLMs) can generate programs that pass unit tests, but passing tests does not guarantee reliable runtime behavior. We find that different correct solutions to the same ...
- MentalGame: Predicting Personality-Job Fitness for Software Developers Using Multi-Genre Games and Machine Learning Approaches : Abstract: Personality assessment in career guidance and personnel selection traditionally relies on self-report questionnaires, which are susceptible to response bias, fatigue, and intentional distort...
- RefSR-Adv: Adversarial Attack on Reference-based Image Super-Resolution Models : Abstract: Single Image Super-Resolution (SISR) aims to recover high-resolution images from low-resolution inputs. Unlike SISR, Reference-based Super-Resolution (RefSR) leverages an additional high-res...
- Bridging the Semantic Gap for Categorical Data Clustering via Large Language Models : Abstract: Categorical data are prevalent in domains such as healthcare, marketing, and bioinformatics, where clustering serves as a fundamental tool for pattern discovery. A core challenge in categori...
- AI-Powered Hybrid Intrusion Detection Framework for Cloud Security Using Novel Metaheuristic Optimization : Abstract: Cybersecurity poses considerable problems to Cloud Computing (CC), especially regarding Intrusion Detection Systems (IDSs), facing difficulties with skewed datasets and suboptimal classifica...
- Generating Diverse TSP Tours via a Combination of Graph Pointer Network and Dispersion : Abstract: We address the Diverse Traveling Salesman Problem (D-TSP), a bi-criteria optimization challenge that seeks a set of $k$ distinct TSP tours. The objective requires every selected tour to have...
- RovoDev Code Reviewer: A Large-Scale Online Evaluation of LLM-based Code Review Automation at Atlassian : Abstract: Large Language Models (LLMs)-powered code review automation has the potential to transform code review workflows. Despite the advances of LLM-powered code review comment generation approache...
- Wittgenstein's Family Resemblance Clustering Algorithm : Abstract: This paper, introducing a novel method in philomatics, draws on Wittgenstein's concept of family resemblance from analytic philosophy to develop a clustering algorithm for machine learning. ...
- Learning from Historical Activations in Graph Neural Networks : Abstract: Graph Neural Networks (GNNs) have demonstrated remarkable success in various domains such as social networks, molecular chemistry, and more. A crucial component of GNNs is the pooling proced...
- ScienceDB AI: An LLM-Driven Agentic Recommender System for Large-Scale Scientific Data Sharing Services : Abstract: The rapid growth of AI for Science (AI4S) has underscored the significance of scientific datasets, leading to the establishment of numerous national scientific data centers and sharing platf...
- Evolving CNN Architectures: From Custom Designs to Deep Residual Models for Diverse Image Classification and Detection Tasks : Abstract: This paper presents a comparative study of a custom convolutional neural network (CNN) architecture against widely used pretrained and transfer learning CNN models across five real-world ima...
- SoulSeek: Exploring the Use of Social Cues in LLM-based Information Seeking : Abstract: Social cues, which convey others' presence, behaviors, or identities, play a crucial role in human information seeking by helping individuals judge relevance and trustworthiness. However, ex...
- ks-lit-3m: A 3.1 million word kashmiri text dataset for large language model pretraining : Abstract: Large Language Models (LLMs) demonstrate remarkable fluency across high-resource languages yet consistently fail to generate coherent text in Kashmiri, a language spoken by approximately sev...
- Harm in AI-Driven Societies: An Audit of Toxicity Adoption on Chirper.ai : Abstract: Large Language Models (LLMs) are increasingly embedded in autonomous agents that participate in online social ecosystems, where interactions are sequential, cumulative, and only partially co...
- Luminark: Training-free, Probabilistically-Certified Watermarking for General Vision Generative Models : Abstract: In this paper, we introduce Luminark, a training-free and probabilistically-certified watermarking method for general vision generative models. Our approach is built upon a novel wate...
- Scalable Data-Driven Reachability Analysis and Control via Koopman Operators with Conformal Coverage Guarantees : Abstract: We propose a scalable reachability-based framework for probabilistic, data-driven safety verification of unknown nonlinear dynamics. We use Koopman theory with a neural network (NN) lifting ...
- Flow Equivariant World Models: Memory for Partially Observed Dynamic Environments : Abstract: Embodied systems experience the world as 'a symphony of flows': a combination of many continuous streams of sensory input coupled to self-motion, interwoven with the dynamics of external obj...
- Gendered Pathways in AI Companionship: Cross-Community Behavior and Toxicity Patterns on Reddit : Abstract: AI-companionship platforms are rapidly reshaping how people form emotional, romantic, and parasocial bonds with non-human agents, raising new questions about how these relationships intersec...
- SPoRC-VIST: A Benchmark for Evaluating Generative Natural Narrative in Vision-Language Models : Abstract: Vision-Language Models (VLMs) have achieved remarkable success in descriptive tasks such as image captioning and visual question answering (VQA). However, their ability to generate engaging,...
- A UCB Bandit Algorithm for General ML-Based Estimators : Abstract: We present ML-UCB, a generalized upper confidence bound algorithm that integrates arbitrary machine learning models into multi-armed bandit frameworks. A fundamental challenge in deploying s...
- Enhancing Histopathological Image Classification via Integrated HOG and Deep Features with Robust Noise Performance : Abstract: The era of digital pathology has advanced histopathological examinations, making automated image analysis essential in clinical practice. This study evaluates the classification performance ...
- EgoGrasp: World-Space Hand-Object Interaction Estimation from Egocentric Videos : Abstract: We propose EgoGrasp, the first method to reconstruct world-space hand-object interactions (W-HOI) from egocentric monocular videos with dynamic cameras in the wild. Accurate W-HOI reconstruc...
- Multi-Dimensional Prompt Chaining to Improve Open-Domain Dialogue Generation : Abstract: Small language models (SLMs) offer significant deployment advantages but often struggle to match the dialogue quality of larger models in open-domain settings. In this paper, we propose a mu...
- Beyond Demand Estimation: Consumer Surplus Evaluation via Cumulative Propensity Weights : Abstract: This paper develops a practical framework for using observational data to audit the consumer surplus effects of AI-driven decisions, specifically in targeted pricing and algorithmic lending....
- A Platform for Interactive AI Character Experiences : Abstract: From movie characters to modern science fiction - bringing characters into interactive, story-driven conversations has captured imaginations across generations. Achieving this vision is high...
- Enhanced Leukemic Cell Classification Using Attention-Based CNN and Data Augmentation : Abstract: We present a reproducible deep learning pipeline for leukemic cell classification, focusing on system architecture, experimental robustness, and software design choices for medical image ana...
- ITSELF: Attention Guided Fine-Grained Alignment for Vision-Language Retrieval : Abstract: Vision Language Models (VLMs) have rapidly advanced and show strong promise for text-based person search (TBPS), a task that requires capturing fine-grained relationships between images and ...
- Decoupling Amplitude and Phase Attention in Frequency Domain for RGB-Event based Visual Object Tracking : Abstract: Existing RGB-Event visual object tracking approaches primarily rely on conventional feature-level fusion, failing to fully exploit the unique advantages of event cameras. In particular, the ...
- Improving Variational Autoencoder using Random Fourier Transformation: An Aviation Safety Anomaly Detection Case-Study : Abstract: In this study, we focus on the training process and inference improvements of deep neural networks (DNNs), specifically Autoencoders (AEs) and Variational Autoencoders (VAEs), using Random F...
- Geometric and Dynamic Scaling in Deep Transformers : Abstract: Despite their empirical success, pushing Transformer architectures to extreme depth often leads to a paradoxical failure: representations become increasingly redundant, lose rank, and ultima...
- Intention Collapse: Intention-Level Metrics for Reasoning in Language Models : Abstract: Every act of language generation compresses a rich internal state into a single token sequence. We call this process intention collapse: a many-to-one projection from a high dimensional inte...
- Data-Driven Assessment of Concrete Mixture Compositions on Chloride Transport via Standalone Machine Learning Algorithms : Abstract: This paper employs a data-driven approach to determine the impact of concrete mixture compositions on the temporal evolution of chloride in concrete structures. This is critical for assessin...
- An Explainable Agentic AI Framework for Uncertainty-Aware and Abstention-Enabled Acute Ischemic Stroke Imaging Decisions : Abstract: Artificial intelligence models have shown strong potential in acute ischemic stroke imaging, particularly for lesion detection and segmentation using computed tomography and magnetic resonan...
- Scale-aware Adaptive Supervised Network with Limited Medical Annotations : Abstract: Medical image segmentation faces critical challenges in semi-supervised learning scenarios due to severe annotation scarcity requiring expert radiological knowledge, significant inter-annota...
- VEAT Quantifies Implicit Associations in Text-to-Video Generator Sora and Reveals Challenges in Bias Mitigation : Abstract: Text-to-Video (T2V) generators such as Sora raise concerns about whether generated content reflects societal bias. We extend embedding-association tests from words and images to video by int...
- WildIng: A Wildlife Image Invariant Representation Model for Geographical Domain Shift : Abstract: Wildlife monitoring is crucial for studying biodiversity loss and climate change. Camera trap images provide a non-intrusive method for analyzing animal populations and identifying ecologica...
- Value Vision-Language-Action Planning & Search : Abstract: Vision-Language-Action (VLA) models have emerged as powerful generalist policies for robotic manipulation, yet they remain fundamentally limited by their reliance on behavior cloning, leadin...
- Adapting Feature Attenuation to NLP : Abstract: Transformer classifiers such as BERT deliver impressive closed-set accuracy, yet they remain brittle when confronted with inputs from unseen categories--a common scenario for deployed NLP sy...
- Comparative Analysis of Formula and Structure Prediction from Tandem Mass Spectra : Abstract: Liquid chromatography mass spectrometry (LC-MS)-based metabolomics and exposomics aim to measure detectable small molecules in biological samples. The results facilitate hypothesis-generatin...
- Emoji-Based Jailbreaking of Large Language Models : Abstract: Large Language Models (LLMs) are integral to modern AI applications, but their safety alignment mechanisms can be bypassed through adversarial prompt engineering. This study investigates emo...
- Improving Code-Switching Speech Recognition with TTS Data Augmentation : Abstract: Automatic speech recognition (ASR) for conversational code-switching speech remains challenging due to the scarcity of realistic, high-quality labeled speech data. This paper explores multil...
- LOFA: Online Influence Maximization under Full-Bandit Feedback using Lazy Forward Selection : Abstract: We study the problem of influence maximization (IM) in an online setting, where the goal is to select a subset of nodes, called the seed set, at each time step ...
- AlignUSER: Human-Aligned LLM Agents via World Models for Recommender System Evaluation : Abstract: Evaluating recommender systems remains challenging due to the gap between offline metrics and real user behavior, as well as the scarcity of interaction data. Recent work explores large lang...
- Analyzing the Shopping Journey: Computing Shelf Browsing Visits in a Physical Retail Store : Abstract: Motivated by recent challenges in the deployment of robots into customer-facing roles within retail, this work introduces a study of customer activity in physical stores as a step toward aut...
- Measuring Social Media Polarization Using Large Language Models and Heuristic Rules : Abstract: Understanding affective polarization in online discourse is crucial for evaluating the societal impact of social media interactions. This study presents a novel framework that leverages larg...
- MACA: A Framework for Distilling Trustworthy LLMs into Efficient Retrievers : Abstract: Modern enterprise retrieval systems must handle short, underspecified queries such as "foreign transaction fee refund" and "recent check status". In these cases, semantic nuance and meta...
- Application of deep learning techniques in non-contrast computed tomography pulmonary angiogram for pulmonary embolism diagnosis : Abstract: Pulmonary embolism is a life-threatening disease; early detection and treatment can significantly reduce mortality. In recent years, many studies have been using deep learning in the diagnos...
- Complexity-based code embeddings : Abstract: This paper presents a generic method for transforming the source code of various algorithms to numerical embeddings, by dynamically analysing the behaviour of computer programs against diffe...
- Practical Geometric and Quantum Kernel Methods for Predicting Skeletal Muscle Outcomes in chronic obstructive pulmonary disease : Abstract: Skeletal muscle dysfunction is a clinically relevant extra-pulmonary manifestation of chronic obstructive pulmonary disease (COPD) and is closely linked to systemic and airway inflammation. ...
- MODE: Efficient Time Series Prediction with Mamba Enhanced by Low-Rank Neural ODEs : Abstract: Time series prediction plays a pivotal role across diverse domains such as finance, healthcare, energy systems, and environmental modeling. However, existing approaches often struggle to bal...
- Attention Needs to Focus: A Unified Perspective on Attention Allocation : Abstract: The Transformer architecture, a cornerstone of modern Large Language Models (LLMs), has achieved extraordinary success in sequence modeling, primarily due to its attention mechanism. However...
- The Discovery Gap: How Product Hunt Startups Vanish in LLM Organic Discovery Queries : Abstract: When someone asks ChatGPT to recommend a project management tool, which products show up in the response? And more importantly for startup founders: will their newly launched product ever ap...
- Device-Native Autonomous Agents for Privacy-Preserving Negotiations : Abstract: Automated negotiations in insurance and business-to-business (B2B) commerce encounter substantial challenges. Current systems force a trade-off between convenience and privacy by routing sen...
- Conformal Prediction Under Distribution Shift: A COVID-19 Natural Experiment : Abstract: Conformal prediction guarantees degrade under distribution shift. We study this using COVID-19 as a natural experiment across 8 supply chain tasks. Despite identical severe feature turnover ...
- Placenta Accreta Spectrum Detection using Multimodal Deep Learning : Abstract: Placenta Accreta Spectrum (PAS) is a life-threatening obstetric complication involving abnormal placental invasion into the uterine wall. Early and accurate prenatal diagnosis is essential t...
- Evaluating Contextual Intelligence in Recyclability: A Comprehensive Study of Image-Based Reasoning Systems : Abstract: While the importance of efficient recycling is widely acknowledged, accurately determining the recyclability of items and their proper disposal remains a complex task for the general public....
- CornViT: A Multi-Stage Convolutional Vision Transformer Framework for Hierarchical Corn Kernel Analysis : Abstract: Accurate grading of corn kernels is critical for seed certification, directional seeding, and breeding, yet it is still predominantly performed by manual inspection. This work introduces Cor...
- Enhancing Retrieval-Augmented Generation with Topic-Enriched Embeddings: A Hybrid Approach Integrating Traditional NLP Techniques : Abstract: Retrieval-augmented generation (RAG) systems rely on accurate document retrieval to ground large language models (LLMs) in external knowledge, yet retrieval quality often degrades in corpora...
- LearnAD: Learning Interpretable Rules for Brain Networks in Alzheimer's Disease Classification : Abstract: We introduce LearnAD, a neuro-symbolic method for predicting Alzheimer's disease from brain magnetic resonance imaging data, learning fully interpretable rules. LearnAD applies statistical m...
- LLMize: A Framework for Large Language Model-Based Numerical Optimization : Abstract: Large language models (LLMs) have recently shown strong reasoning capabilities beyond traditional language tasks, motivating their use for numerical optimization. This paper presents LLMize,...
- SmartFlow Reinforcement Learning and Agentic AI for Bike-Sharing Optimisation : Abstract: SmartFlow is a multi-layered framework that integrates Reinforcement Learning and Agentic AI to address the dynamic rebalancing problem in urban bike-sharing services. Its architecture separ...
- The Silicon Psyche: Anthropomorphic Vulnerabilities in Large Language Models : Abstract: Large Language Models (LLMs) are rapidly transitioning from conversational assistants to autonomous agents embedded in critical organizational functions, including Security Operations Center...
- A-PINN: Auxiliary Physics-informed Neural Networks for Structural Vibration Analysis in Continuous Euler-Bernoulli Beam : Abstract: Recent advancements in physics-informed neural networks (PINNs) and their variants have garnered substantial focus from researchers due to their effectiveness in solving both forward and inv...
- Path Integral Solution for Dissipative Generative Dynamics : Abstract: Can purely mechanical systems generate intelligent language? We prove that dissipative quantum dynamics with analytically tractable non-local context aggregation produce coherent text genera...
- FedSCAM (Federated Sharpness-Aware Minimization with Clustered Aggregation and Modulation): Scam-resistant SAM for Robust Federated Optimization in Heterogeneous Environments : Abstract: Federated Learning (FL) enables collaborative model training across decentralized edge devices while preserving data privacy. However, statistical heterogeneity among clients, often manifest...
- Value-guided action planning with JEPA world models : Abstract: Building deep learning models that can reason about their environment requires capturing its underlying dynamics. Joint-Embedded Predictive Architectures (JEPA) provide a promising framework...
- A Global Atlas of Digital Dermatology to Map Innovation and Disparities : Abstract: The adoption of artificial intelligence in dermatology promises democratized access to healthcare, but model reliability depends on the quality and comprehensiveness of the data fueling thes...
- Pediatric Pneumonia Detection from Chest X-Rays: A Comparative Study of Transfer Learning and Custom CNNs : Abstract: Pneumonia is a leading cause of mortality in children under five, with over 700,000 deaths annually. Accurate diagnosis from chest X-rays is limited by radiologist availability and variabili...
- Intrinsic-Metric Physics-Informed Neural Networks (IM-PINN) for Reaction-Diffusion Dynamics on Complex Riemannian Manifolds : Abstract: Simulating nonlinear reaction-diffusion dynamics on complex, non-Euclidean manifolds remains a fundamental challenge in computational morphogenesis, constrained by high-fidelity mesh generat...
- A Knowledge Graph and Deep Learning-Based Semantic Recommendation Database System for Advertisement Retrieval and Personalization : Abstract: In modern digital marketing, the growing complexity of advertisement data demands intelligent systems capable of understanding semantic relationships among products, audiences, and advertisi...
- Speak the Art: A Direct Speech to Image Generation Framework : Abstract: Direct speech-to-image generation has recently shown promising results. However, compared to text-to-image generation, there is still a large gap to enclose. Current approaches use two stage...
- Free Energy-Based Modeling of Emotional Dynamics in Video Advertisements : Abstract: Emotional responses during advertising video viewing are recognized as essential for understanding media effects because they have influenced attention, memory, and purchase intention. To es...
- Can Large Language Models Improve Venture Capital Exit Timing After IPO? : Abstract: Exit timing after an IPO is one of the most consequential decisions for venture capital (VC) investors, yet existing research focuses mainly on describing when VCs exit rather than evaluatin...
- A Modular Reference Architecture for MCP-Servers Enabling Agentic BIM Interaction : Abstract: Agentic workflows driven by large language models (LLMs) are increasingly applied to Building Information Modelling (BIM), enabling natural-language retrieval, modification and generation of...
- The Qualitative Laboratory: Theory Prototyping and Hypothesis Generation with Large Language Models : Abstract: A central challenge in social science is to generate rich qualitative hypotheses about how diverse social groups might interpret new information. This article introduces and illustrates a no...
- Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling : Abstract: This work introduces Falcon-H1R, a 7B-parameter reasoning-optimized model that establishes the feasibility of achieving competitive reasoning performance with small language models (SLMs). F...
- Project Ariadne: A Structural Causal Framework for Auditing Faithfulness in LLM Agents : Abstract: As Large Language Model (LLM) agents are increasingly tasked with high-stakes autonomous decision-making, the transparency of their reasoning processes has become a critical safety concern. ...
- Streaming Hallucination Detection in Long Chain-of-Thought Reasoning : Abstract: Long chain-of-thought (CoT) reasoning improves the performance of large language models, yet hallucinations in such settings often emerge subtly and propagate across reasoning steps. We sugg...
- EverMemOS: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning : Abstract: Large Language Models (LLMs) are increasingly deployed as long-term interactive agents, yet their limited context windows make it difficult to sustain coherent behavior over extended interac...
- FormuLLA: A Large Language Model Approach to Generating Novel 3D Printable Formulations : Abstract: Pharmaceutical three-dimensional (3D) printing is an advanced fabrication technology with the potential to enable truly personalised dosage forms. Recent studies have integrated artificial i...
- Higher-Order Action Regularization in Deep Reinforcement Learning: From Continuous Control to Building Energy Management : Abstract: Deep reinforcement learning agents often exhibit erratic, high-frequency control behaviors that hinder real-world deployment due to excessive energy consumption and mechanical wear. We syste...
- Simulated Reasoning is Reasoning : Abstract: Reasoning has long been understood as a pathway between stages of understanding. Proper reasoning leads to understanding of a given subject. This reasoning was conceptualized as a process of...
- XAI-MeD: Explainable Knowledge Guided Neuro-Symbolic Framework for Domain Generalization and Rare Class Detection in Medical Imaging : Abstract: Explainability, domain generalization, and rare class reliability are critical challenges in medical AI where deep models often fail under real world distribution shifts and exhibit bias again...
- MindChat: A Privacy-preserving Large Language Model for Mental Health Support : Abstract: Large language models (LLMs) have shown promise for mental health support, yet training such models is constrained by the scarcity and sensitivity of real counseling dialogues. In this artic...
- ChaosBench-Logic: A Benchmark for Logical and Symbolic Reasoning on Chaotic Dynamical Systems : Abstract: Large language models (LLMs) excel at natural language tasks but remain brittle in domains requiring precise logical and symbolic reasoning. Chaotic dynamical systems provide an especially d...
- CNC-TP: Classifier Nominal Concept Based on Top-Pertinent Attributes : Abstract: Knowledge Discovery in Databases (KDD) aims to exploit the vast amounts of data generated daily across various domains of computer applications. Its objective is to extract hidden and meanin...
- OpenSocInt: A Multi-modal Training Environment for Human-Aware Social Navigation : Abstract: In this paper, we introduce OpenSocInt, an open-source software package providing a simulator for multi-modal social interactions and a modular architecture to train social agents. We descri...
- MMP-A*: Multimodal Perception Enhanced Incremental Heuristic Search on Path Planning : Abstract: Autonomous path planning requires a synergy between global reasoning and geometric precision, especially in complex or cluttered environments. While classical A* is valued for its optimality...
- Theory Trace Card: Theory-Driven Socio-Cognitive Evaluation of LLMs : Abstract: Socio-cognitive benchmarks for large language models (LLMs) often fail to predict real-world behavior, even when models achieve high benchmark scores. Prior work has attributed this evaluati...
- Toward Auditable Neuro-Symbolic Reasoning in Pathology: SQL as an Explicit Trace of Evidence : Abstract: Automated pathology image analysis is central to clinical diagnosis, but clinicians still ask which slide features drive a model's decision and why. Vision-language models can produce natura...
- Jenius Agent: Towards Experience-Driven Accuracy Optimization in Real-World Scenarios : Abstract: As agent systems powered by large language models (LLMs) advance, improving the task performance of an autonomous agent, especially in context understanding, tool usage, and response generat...
- Clinical Knowledge Graph Construction and Evaluation with Multi-LLMs via Retrieval-Augmented Generation : Abstract: Large language models (LLMs) offer new opportunities for constructing knowledge graphs (KGs) from unstructured clinical narratives. However, existing approaches often rely on structured inpu...
- COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs : Abstract: As large language models are deployed in high-stakes enterprise applications, from healthcare to finance, ensuring adherence to organization-specific policies has become essential. Yet exist...
- Admissibility Alignment : Abstract: This paper introduces Admissibility Alignment: a reframing of AI alignment as a property of admissible action and decision selection over distributions of outcomes under uncertainty, evaluat...
- PsychEval: A Multi-Session and Multi-Therapy Benchmark for High-Realism and Comprehensive AI Psychological Counselor : Abstract: To develop a reliable AI for psychological assessment, we introduce PsychEval, a multi-session, multi-therapy, and highly realistic benchmark designed to address three key challenge...
- Can Large Language Models Solve Engineering Equations? A Systematic Comparison of Direct Prediction and Solver-Assisted Approaches : Abstract: Transcendental equations requiring iterative numerical solution pervade engineering practice, from fluid mechanics friction factor calculations to orbital position determination. We systemat...
- A New Benchmark for the Appropriate Evaluation of RTL Code Optimization : Abstract: The rapid progress of artificial intelligence increasingly relies on efficient integrated circuit (IC) design. Recent studies have explored the use of large language models (LLMs) for genera...
- AI Agent Systems: Architectures, Applications, and Evaluation : Abstract: AI agents -- systems that combine foundation models with reasoning, planning, memory, and tool use -- are rapidly becoming a practical interface between natural-language intent and real-worl...
- Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications : Abstract: We introduce Yuan3.0 Flash, an open-source Mixture-of-Experts (MoE) MultiModal Large Language Model featuring 3.7B activated parameters and 40B total parameters, specifically designed to enh...
- Structured Decomposition for LLM Reasoning: Cross-Domain Validation and Semantic Web Integration : Abstract: Rule-based reasoning over natural language input arises in domains where decisions must be auditable and justifiable: clinical protocols specify eligibility criteria in prose, evidence rules...
- CaveAgent: Transforming LLMs into Stateful Runtime Operators : Abstract: LLM-based agents are increasingly capable of complex task execution, yet current agentic systems remain constrained by text-centric paradigms. Traditional approaches rely on procedural JSON-...
- Logics-STEM: Empowering LLM Reasoning via Failure-Driven Post-Training and Document Knowledge Enhancement : Abstract: We present Logics-STEM, a state-of-the-art reasoning model fine-tuned on Logics-STEM-SFT-Dataset, a high-quality and diverse dataset at 10M scale that represents one of the largest-scale ope...
- Improving Behavioral Alignment in LLM Social Simulations via Context Formation and Navigation : Abstract: Large language models (LLMs) are increasingly used to simulate human behavior in experimental settings, but they systematically diverge from human decisions in complex decision-making enviro...
- Aletheia: Quantifying Cognitive Conviction in Reasoning Models via Regularized Inverse Confusion Matrix : Abstract: In the progressive journey toward Artificial General Intelligence (AGI), current evaluation paradigms face an epistemological crisis. Static benchmarks measure knowledge breadth but fail to ...
- Bayesian Orchestration of Multi-LLM Agents for Cost-Aware Sequential Decision-Making : Abstract: Large language models (LLMs) are increasingly deployed as autonomous decision agents in settings with asymmetric error costs: hiring (missed talent vs wasted interviews), medical triage (mis...
- Reading Between the Lines: Deconfounding Causal Estimates using Text Embeddings and Deep Learning : Abstract: Estimating causal treatment effects in observational settings is frequently compromised by selection bias arising from unobserved confounders. While traditional econometric methods struggle ...
- A construction of an optimal base for conditional attribute and attributional condition implications in triadic contexts : Abstract: This article studies implications in triadic contexts. Specifically, we focus on those introduced by Ganter and Obiedkov, namely conditional attribute and attributional condition implication...
- Empowering Small Language Models with Factual Hallucination-Aware Reasoning for Financial Classification : Abstract: Small language models (SLMs) are increasingly used for financial classification due to their fast inference and local deployability. However, compared with large language models, SLMs are mo...
- KGCE: Knowledge-Augmented Dual-Graph Evaluator for Cross-Platform Educational Agent Benchmarking with Multimodal Language Models : Abstract: With the rapid adoption of multimodal large language models (MLMs) in autonomous agents, cross-platform task execution capabilities in educational settings have garnered significant attentio...
- A unified multimodal understanding and generation model for cross-disciplinary scientific research : Abstract: Scientific discovery increasingly relies on integrating heterogeneous, high-dimensional data across disciplines nowadays. While AI models have achieved notable success across various scienti...
- Beyond Gemini-3-Pro: Revisiting LLM Routing and Aggregation at Scale : Abstract: Large Language Models (LLMs) have rapidly advanced, with Gemini-3-Pro setting a new performance milestone. In this work, we explore collective intelligence as an alternative to monolithic sc...
- Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models : Abstract: Digital twins, as precise digital representations of physical systems, have evolved from passive simulation tools into intelligent and autonomous entities through the integration of artifici...
- Accelerating Monte-Carlo Tree Search with Optimized Posterior Policies : Abstract: We introduce a recursive AlphaZero-style Monte-Carlo tree search algorithm, "RMCTS". The advantage of RMCTS over AlphaZero's MCTS-UCB is speed. In RMCTS, the search tree is explored in a br...
- Reinforcement Learning Enhanced Multi-hop Reasoning for Temporal Knowledge Question Answering : Abstract: Temporal knowledge graph question answering (TKGQA) involves multi-hop reasoning over temporally constrained entity relationships in the knowledge graph to answer a given question. However, ...
- ElecTwit: A Framework for Studying Persuasion in Multi-Agent Social Systems : Abstract: This paper introduces ElecTwit, a simulation framework designed to study persuasion within multi-agent systems, specifically emulating the interactions on social media platforms during a pol...
- Context Collapse: In-Context Learning and Model Collapse : Abstract: This thesis investigates two key phenomena in large language models (LLMs): in-context learning (ICL) and model collapse. We study ICL in a linear transformer with tied weights trained on li...
- Counterfactual Self-Questioning for Stable Policy Optimization in Language Models : Abstract: Recent work on language model self-improvement shows that models can refine their own reasoning through reflection, verification, debate, or self-generated rewards. However, most existing ap...
- Universal Conditional Logic: A Formal Language for Prompt Engineering : Abstract: We present Universal Conditional Logic (UCL), a mathematical framework for prompt optimization that transforms prompt engineering from heuristic practice into systematic optimization. Throug...
- Cultural Encoding in Large Language Models: The Existence Gap in AI-Mediated Brand Discovery : Abstract: As artificial intelligence systems increasingly mediate consumer information discovery, brands face algorithmic invisibility. This study investigates Cultural Encoding in Large Language ...
- Comment on: Your Brain on ChatGPT: Accumulation of Cognitive Debt When Using an AI Assistant for Essay Writing Tasks : Abstract: Recently published work titled Your Brain on ChatGPT: Accumulation of Cognitive Debt When Using an AI Assistant for Essay Writing Task by Kosmyna et al. (2025) has sparked a vivid debate on ...
- Temporal Attack Pattern Detection in Multi-Agent AI Workflows: An Open Framework for Training Trace-Based Security Models : Abstract: We present an openly documented methodology for fine-tuning language models to detect temporal attack patterns in multi-agent AI workflows using OpenTelemetry trace analysis. We curate a dat...
- Enhancing Temporal Awareness in LLMs for Temporal Point Processes : Abstract: Temporal point processes (TPPs) are crucial for analyzing events over time and are widely used in fields such as finance, healthcare, and social systems. These processes are particularly val...
- OmniNeuro: A Multimodal HCI Framework for Explainable BCI Feedback via Generative AI and Sonification : Abstract: While Deep Learning has improved Brain-Computer Interface (BCI) decoding accuracy, clinical adoption is hindered by the "Black Box" nature of these algorithms, leading to user frustration an...
- Can We Trust AI Explanations? Evidence of Systematic Underreporting in Chain-of-Thought Reasoning : Abstract: When AI systems explain their reasoning step-by-step, practitioners often assume these explanations reveal what actually influenced the AI's answer. We tested this assumption by embedding hi...
- Decomposing LLM Self-Correction: The Accuracy-Correction Paradox and Error Depth Hypothesis : Abstract: Large Language Models (LLMs) are widely believed to possess self-correction capabilities, yet recent studies suggest that intrinsic self-correction--where models correct their own outputs wi...
- Energy-Aware Routing to Large Reasoning Models : Abstract: Large reasoning models (LRMs) have heterogeneous inference energy costs based on which model is used and how much it reasons. To reduce energy, it is important to choose the right LRM and op...
- CogCanvas: Compression-Resistant Cognitive Artifacts for Long LLM Conversations : Abstract: Large language models face a fundamental tension between context window limits and information fidelity in long conversations. Existing approaches--truncation and summarization--either disca...
- Agentic AI for Autonomous, Explainable, and Real-Time Credit Risk Decision-Making : Abstract: Significant digitalization of financial services in a short period of time has led to an urgent demand to have autonomous, transparent and real-time credit risk decision making systems. The ...
- MathLedger: A Verifiable Learning Substrate with Ledger-Attested Feedback : Abstract: Contemporary AI systems achieve extraordinary performance yet remain opaque and non-verifiable, creating a crisis of trust for safety-critical deployment. We introduce MathLedger, a substrat...
- Semantic Alignment of Multilingual Knowledge Graphs via Contextualized Vector Projections : Abstract: The paper presents our work on cross-lingual ontology alignment system which uses embedding based cosine similarity matching. The ontology entities are made contextually richer by creating d...
Research Sources: 642 | Generated: 1/6/2026
