AI RESEARCH PAPERS & ACADEMIC SOURCES
- Steering Vision-Language Pre-trained Models for Incremental Face Presentation Attack Detection : Abstract: Face Presentation Attack Detection (PAD) demands incremental learning (IL) to combat evolving spoofing tactics and domains. Privacy regulations, however, forbid retaining past data, necessit...
- RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic : Abstract: Embodied agents powered by vision-language models (VLMs) are increasingly capable of executing complex real-world tasks, yet they remain vulnerable to hazardous instructions that may trigger...
- Schrödinger's Navigator: Imagining an Ensemble of Futures for Zero-Shot Object Navigation : Abstract: Zero-shot object navigation (ZSON) requires a robot to locate a target object in a previously unseen environment without relying on pre-built maps or task-specific training. However, existin...
- Equivariant Multiscale Learned Invertible Reconstruction for Cone Beam CT: From Simulated to Real Data : Abstract: Cone Beam CT (CBCT) is an important imaging modality nowadays; however, the lower image quality of CBCT compared to more conventional Computed Tomography (CT) remains a limiting factor in CBCT ap...
- TexAvatars: Hybrid Texel-3D Representations for Stable Rigging of Photorealistic Gaussian Head Avatars : Abstract: Constructing drivable and photorealistic 3D head avatars has become a central task in AR/XR, enabling immersive and expressive user experiences. With the emergence of high-fidelity and effic...
- Language-Guided Grasp Detection with Coarse-to-Fine Learning for Robotic Manipulation : Abstract: Grasping is one of the most fundamental and challenging capabilities in robotic manipulation, especially in unstructured, cluttered, and semantically diverse environments. Recent research has...
- Flow Gym : Abstract: Flow Gym is a toolkit for research and deployment of flow-field quantification methods inspired by OpenAI Gym and Stable-Baselines3. It uses SynthPix as synthetic image generation engine and...
- HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming : Abstract: High-resolution video generation, while crucial for digital media and film, is computationally bottlenecked by the quadratic complexity of diffusion models, making practical inference infeas...
- Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models : Abstract: We expose a significant popularity bias in state-of-the-art vision-language models (VLMs), which achieve up to 34% higher accuracy on famous buildings compared to ordinary ones, indicating a...
- Streaming Video Instruction Tuning : Abstract: We present Streamo, a real-time streaming video LLM that serves as a general-purpose interactive assistant. Unlike existing online video models that focus narrowly on question answering or c...
- Fast SAM2 with Text-Driven Token Pruning : Abstract: Segment Anything Model 2 (SAM2), a vision foundation model, has significantly advanced prompt-driven video object segmentation, yet its practical deployment remains limited by the high c...
- TICON: A Slide-Level Tile Contextualizer for Histopathology Representation Learning : Abstract: The interpretation of small tiles in large whole slide images (WSI) often needs a larger image context. We introduce TICON, a transformer-based tile representation contextualizer that produc...
- AndroidLens: Long-latency Evaluation with Nested Sub-targets for Android GUI Agents : Abstract: Graphical user interface (GUI) agents can substantially improve productivity by automating frequently executed long-latency tasks on mobile devices. However, existing evaluation benchmarks a...
- Post-Processing Mask-Based Table Segmentation for Structural Coordinate Extraction : Abstract: Structured data extraction from tables plays a crucial role in document image analysis for scanned documents and digital archives. Although many methods have been proposed to detect table st...
- Surgical Scene Segmentation using a Spike-Driven Video Transformer with Real-Time Potential : Abstract: Modern surgical systems increasingly rely on intelligent scene understanding to provide timely situational awareness for enhanced intra-operative safety. Within this pipeline, surgical scene...
- GriDiT: Factorized Grid-Based Diffusion for Efficient Long Image Sequence Generation : Abstract: Modern deep learning methods typically treat image sequences as large tensors of sequentially stacked frames. However, is this straightforward representation ideal given the current state-of...
- ACD: Direct Conditional Control for Video Diffusion Models via Attention Supervision : Abstract: Controllability is a fundamental requirement in video synthesis, where accurate alignment with conditioning signals is essential. Existing classifier-free guidance methods typically achieve ...
- AnyAD: Unified Any-Modality Anomaly Detection in Incomplete Multi-Sequence MRI : Abstract: Reliable anomaly detection in brain MRI remains challenging due to the scarcity of annotated abnormal cases and the frequent absence of key imaging modalities in real clinical workflows. Exi...
- DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation : Abstract: The "one-shot" technique represents a distinct and sophisticated aesthetic in filmmaking. However, its practical realization is often hindered by prohibitive costs and complex real-world con...
- SegMo: Segment-aligned Text to 3D Human Motion Generation : Abstract: Generating 3D human motions from textual descriptions is an important research problem with broad applications in video games, virtual reality, and augmented reality. Recent methods align th...
- Leveraging Lightweight Entity Extraction for Scalable Event-Based Image Retrieval : Abstract: Retrieving images from natural language descriptions is a core task at the intersection of computer vision and natural language processing, with wide-ranging applications in search engines, ...
- Latent Implicit Visual Reasoning : Abstract: While Large Multimodal Models (LMMs) have made significant progress, they remain largely text-centric, relying on language as their core reasoning modality. As a result, they are limited in ...
- Human Motion Estimation with Everyday Wearables : Abstract: While on-body device-based human motion estimation is crucial for applications such as XR interaction, existing methods often suffer from poor wearability, expensive hardware, and cumbersome...
- VisRes Bench: On Evaluating the Visual Reasoning Capabilities of VLMs : Abstract: Vision-Language Models (VLMs) have achieved remarkable progress across tasks such as visual question answering and image captioning. Yet, the extent to which these models perform visual reas...
- UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement : Abstract: In this report, we introduce UltraShape 1.0, a scalable 3D diffusion framework for high-fidelity 3D geometry generation. The proposed approach adopts a two-stage generation pipeline: a coars...
- Towards Arbitrary Motion Completing via Hierarchical Continuous Representation : Abstract: Physical motions are inherently continuous, and higher camera frame rates typically contribute to improved smoothness and temporal coherence. For the first time, we explore continuous repres...
- A Turn Toward Better Alignment: Few-Shot Generative Adaptation with Equivariant Feature Rotation : Abstract: Few-shot image generation aims to effectively adapt a source generative model to a target domain using very few training images. Most existing approaches introduce consistency constraints-ty...
- ORCA: Object Recognition and Comprehension for Archiving Marine Species : Abstract: Marine visual understanding is essential for monitoring and protecting marine ecosystems, enabling automatic and scalable biological surveys. However, progress is hindered by limited trainin...
- TGC-Net: A Structure-Aware and Semantically-Aligned Framework for Text-Guided Medical Image Segmentation : Abstract: Text-guided medical segmentation enhances segmentation accuracy by utilizing clinical reports as auxiliary information. However, existing methods typically rely on unaligned image and text e...
- MarineEval: Assessing the Marine Intelligence of Vision-Language Models : Abstract: We have witnessed promising progress led by large language models (LLMs) and further vision language models (VLMs) in handling various queries as a general-purpose assistant. VLMs, as a brid...
- FreeInpaint: Tuning-free Prompt Alignment and Visual Rationality Enhancement in Image Inpainting : Abstract: Text-guided image inpainting endeavors to generate new content within specified regions of images using textual prompts from users. The primary challenge is to accurately align the inpainted...
- UniRec-0.1B: Unified Text and Formula Recognition with 0.1B Parameters : Abstract: Text and formulas constitute the core informational components of many documents. Accurately and efficiently recognizing both is crucial for developing robust and generalizable document pars...
- T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation : Abstract: Text-to-Audio-Video (T2AV) generation aims to synthesize temporally coherent video and semantically synchronized audio from natural language, yet its evaluation remains fragmented, often rel...
- UniPR-3D: Towards Universal Visual Place Recognition with Visual Geometry Grounded Transformer : Abstract: Visual Place Recognition (VPR) has been traditionally formulated as a single-image retrieval task. Using multiple views offers clear advantages, yet this setting remains relatively underexpl...
- Multimodal Skeleton-Based Action Representation Learning via Decomposition and Composition : Abstract: Multimodal human action understanding is a significant problem in computer vision, with the central challenge being the effective utilization of the complementarity among diverse modalities ...
- Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control : Abstract: In computational pathology, understanding and generation have evolved along disparate paths: advanced understanding models already exhibit diagnostic-level competence, whereas generative mod...
- Optical Flow-Guided 6DoF Object Pose Tracking with an Event Camera : Abstract: Object pose tracking is one of the pivotal technologies in multimedia, attracting ever-growing attention in recent years. Existing methods employing traditional cameras encounter numerous ch...
- Matrix Completion Via Reweighted Logarithmic Norm Minimization : Abstract: Low-rank matrix completion (LRMC) has demonstrated remarkable success in a wide range of applications. To address the NP-hard nature of the rank minimization problem, the nuclear norm is com...
- A Large-Depth-Range Layer-Based Hologram Dataset for Machine Learning-Based 3D Computer-Generated Holography : Abstract: Machine learning-based computer-generated holography (ML-CGH) has advanced rapidly in recent years, yet progress is constrained by the limited availability of high-quality, large-scale holog...
- Next-Scale Prediction: A Self-Supervised Approach for Real-World Image Denoising : Abstract: Self-supervised real-world image denoising remains a fundamental challenge, arising from the antagonistic trade-off between decorrelating spatially structured noise and preserving high-frequ...
- Multi-Attribute guided Thermal Face Image Translation based on Latent Diffusion Model : Abstract: Modern surveillance systems increasingly rely on multi-wavelength sensors and deep neural networks to recognize faces in infrared images captured at night. However, most facial recognition m...
- Efficient and Robust Video Defense Framework against 3D-field Personalized Talking Face : Abstract: State-of-the-art 3D-field video-referenced Talking Face Generation (TFG) methods synthesize high-fidelity personalized talking-face videos in real time by modeling 3D geometry and appearance...
- FluencyVE: Marrying Temporal-Aware Mamba with Bypass Attention for Video Editing : Abstract: Large-scale text-to-image diffusion models have achieved unprecedented success in image generation and editing. However, extending this success to video editing remains challenging. Recent v...
- Granular-ball Guided Masking: Structure-aware Data Augmentation : Abstract: Deep learning models have achieved remarkable success in computer vision, but they still rely heavily on large-scale labeled data and tend to overfit when data are limited or distributions s...
- Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations : Abstract: Recent advances in pretraining general foundation models have significantly improved performance across diverse downstream tasks. While autoregressive (AR) generative models like GPT have re...
- MVInverse: Feed-forward Multi-view Inverse Rendering in Seconds : Abstract: Multi-view inverse rendering aims to recover geometry, materials, and illumination consistently across multiple viewpoints. When applied to multi-view images, existing single-view approaches...
- PUFM++: Point Cloud Upsampling via Enhanced Flow Matching : Abstract: Recent advances in generative modeling have demonstrated strong promise for high-quality point cloud upsampling. In this work, we present PUFM++, an enhanced flow-matching framework for reco...
- X-ray Insights Unleashed: Pioneering the Enhancement of Multi-Label Long-Tail Data : Abstract: Long-tailed pulmonary anomalies in chest radiography present formidable diagnostic challenges. Despite the recent strides in diffusion-based methods for enhancing the representation of taile...
- XGrid-Mapping: Explicit Implicit Hybrid Grid Submaps for Efficient Incremental Neural LiDAR Mapping : Abstract: Large-scale incremental mapping is fundamental to the development of robust and reliable autonomous systems, as it underpins incremental environmental understanding with sequential inputs fo...
- SPOT!: Map-Guided LLM Agent for Unsupervised Multi-CCTV Dynamic Object Tracking : Abstract: CCTV-based vehicle tracking systems face structural limitations in continuously connecting the trajectories of the same vehicle across multiple camera environments. In particular, blind spot...
- Beyond Artifacts: Real-Centric Envelope Modeling for Reliable AI-Generated Image Detection : Abstract: The rapid progress of generative models has intensified the need for reliable and robust detection under real-world conditions. However, existing detectors often overfit to generator-specifi...
- Reasoning-Driven Amodal Completion: Collaborative Agents and Perceptual Evaluation : Abstract: Amodal completion, the task of inferring invisible object parts, faces significant challenges in maintaining semantic consistency and structural integrity. Prior progressive approaches are i...
- Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting : Abstract: Recent advancements in computer vision have successfully extended Open-vocabulary segmentation (OVS) to the 3D domain by leveraging 3D Gaussian Splatting (3D-GS). Despite this progress, effi...
- Self-supervised Multiplex Consensus Mamba for General Image Fusion : Abstract: Image fusion integrates complementary information from different modalities to generate high-quality fused images, thereby enhancing downstream tasks such as object detection and semantic se...
- PanoGrounder: Bridging 2D and 3D with Panoramic Scene Representations for VLM-based 3D Visual Grounding : Abstract: 3D Visual Grounding (3DVG) is a critical bridge from vision-language perception to robotics, requiring both language understanding and 3D scene reasoning. Traditional supervised models lever...
- Benchmarking and Enhancing VLM for Compressed Image Understanding : Abstract: With the rapid development of Vision-Language Models (VLMs) and the growing demand for their applications, efficient compression of the image inputs has become increasingly important. Existi...
- DGSAN: Dual-Graph Spatiotemporal Attention Network for Pulmonary Nodule Malignancy Prediction : Abstract: Lung cancer continues to be the leading cause of cancer-related deaths globally. Early detection and diagnosis of pulmonary nodules are essential for improving patient survival rates. Althou...
- Beyond Weight Adaptation: Feature-Space Domain Injection for Cross-Modal Ship Re-Identification : Abstract: Cross-Modality Ship Re-Identification (CMS Re-ID) is critical for achieving all-day and all-weather maritime target tracking, yet it is fundamentally challenged by significant modality discr...
- NeRV360: Neural Representation for 360-Degree Videos with a Viewport Decoder : Abstract: Implicit neural representations for videos (NeRV) have shown strong potential for video compression. However, applying NeRV to high-resolution 360-degree videos causes high memory usage and ...
- Lightweight framework for underground pipeline recognition and spatial localization based on multi-view 2D GPR images : Abstract: To address the issues of weak correlation between multi-view features, low recognition accuracy of small-scale targets, and insufficient robustness in complex scenarios in underground pipeli...
- ALIVE: An Avatar-Lecture Interactive Video Engine with Content-Aware Retrieval for Real-Time Interaction : Abstract: Traditional lecture videos offer flexibility but lack mechanisms for real-time clarification, forcing learners to search externally when confusion arises. Recent advances in large language m...
- Input-Adaptive Visual Preprocessing for Efficient Fast Vision-Language Model Inference : Abstract: Vision-Language Models (VLMs) have demonstrated strong performance on multimodal reasoning tasks, but their deployment remains challenging due to high inference latency and computational cos...
- Learning to Sense for Driving: Joint Optics-Sensor-Model Co-Design for Semantic Segmentation : Abstract: Traditional autonomous driving pipelines decouple camera design from downstream perception, relying on fixed optics and handcrafted ISPs that prioritize human viewable imagery rather than ma...
- OccuFly: A 3D Vision Benchmark for Semantic Scene Completion from the Aerial Perspective : Abstract: Semantic Scene Completion (SSC) is crucial for 3D perception in mobile robotics, as it enables holistic scene understanding by jointly estimating dense volumetric occupancy and per-voxel sem...
- VL4Gaze: Unleashing Vision-Language Models for Gaze Following : Abstract: Human gaze provides essential cues for interpreting attention, intention, and social interaction in visual scenes, yet gaze understanding remains largely unexplored in current vision-languag...
- Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback : Abstract: Aligning the behavior of Large language models (LLMs) with human intentions and values remains a critical challenge. Reinforcement learning from human feedback (RLHF) aligns LLMs by training...
- Improving Neural Question Generation using World Knowledge : Abstract: In this paper, we propose a method for incorporating world knowledge (linked entities and fine-grained entity types) into a neural question generation model. This world knowledge helps to en...
- ReaSeq: Unleashing World Knowledge via Reasoning for Sequential Modeling : Abstract: Industrial recommender systems face two fundamental limitations under the log-driven paradigm: (1) knowledge poverty in ID-based item representations that causes brittle interest modeling un...
- Beyond Context: Large Language Models Failure to Grasp Users Intent : Abstract: Current Large Language Models (LLMs) safety approaches focus on explicitly harmful content while overlooking a critical vulnerability: the inability to understand context and recognize user ...
- Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning : Abstract: Spatial reasoning in 3D scenes requires precise geometric calculations that challenge vision-language models. Visual programming addresses this by decomposing problems into steps calling spe...
- Decoding Predictive Inference in Visual Language Processing via Spatiotemporal Neural Coherence : Abstract: Human language processing relies on the brain's capacity for predictive inference. We present a machine learning framework for decoding neural (EEG) responses to dynamic visual language stim...
- Automated Red-Teaming Framework for Large Language Model Security Assessment: A Comprehensive Attack Generation and Detection System : Abstract: As large language models (LLMs) are increasingly deployed in high-stakes domains, ensuring their security and alignment has become a critical challenge. Existing red-teaming practices depend...
- MegaRAG: Multimodal Knowledge Graph-Based Retrieval Augmented Generation : Abstract: Retrieval-augmented generation (RAG) enables large language models (LLMs) to dynamically access external information, which is powerful for answering questions over previously unseen documen...
- C2LLM Technical Report: A New Frontier in Code Retrieval via Adaptive Cross-Attention Pooling : Abstract: We present C2LLM - Contrastive Code Large Language Models, a family of code embedding models in both 0.5B and 7B sizes. Building upon Qwen-2.5-Coder backbones, C2LLM adopts a Pooling by Mult...
- Your Reasoning Benchmark May Not Test Reasoning: Revealing Perception Bottleneck in Abstract Reasoning Benchmarks : Abstract: Reasoning benchmarks such as the Abstraction and Reasoning Corpus (ARC) and ARC-AGI are widely used to assess progress in artificial intelligence and are often interpreted as probes of core,...
- SMART SLM: Structured Memory and Reasoning Transformer, A Small Language Model for Accurate Document Assistance : Abstract: The user of Engineering Manuals (EM) finds it difficult to read EMs because they are long, have a dense format which includes written documents, step by step procedures, and standard parame...
- SpidR-Adapt: A Universal Speech Representation Model for Few-Shot Adaptation : Abstract: Human infants, with only a few hundred hours of speech exposure, acquire basic units of new languages, highlighting a striking efficiency gap compared to the data-hungry self-supervised spee...
- ClarifyMT-Bench: Benchmarking and Improving Multi-Turn Clarification for Conversational Large Language Models : Abstract: Large language models (LLMs) are increasingly deployed as conversational assistants in open-domain, multi-turn settings, where users often provide incomplete or ambiguous information. Howeve...
- Rethinking Supervised Fine-Tuning: Emphasizing Key Answer Tokens for Improved LLM Accuracy : Abstract: With the rapid advancement of Large Language Models (LLMs), the Chain-of-Thought (CoT) component has become significant for complex reasoning tasks. However, in conventional Supervised Fine-...
- Distilling the Essence: Efficient Reasoning Distillation via Sequence Truncation : Abstract: Distilling the reasoning capabilities from a large language model (LLM) to a smaller student model often involves training on substantial amounts of reasoning data. However, distillation ove...
- Reflection Pretraining Enables Token-Level Self-Correction in Biological Sequence Models : Abstract: Chain-of-Thought (CoT) prompting has significantly advanced task-solving capabilities in natural language processing with large language models. Unlike standard prompting, CoT encourages the...
- Neural Probe-Based Hallucination Detection for Large Language Models : Abstract: Large language models (LLMs) excel at text generation and knowledge question-answering tasks, but they are prone to generating hallucinated content, severely limiting their application in hig...
- Foundation Model-based Evaluation of Neuropsychiatric Disorders: A Lifespan-Inclusive, Multi-Modal, and Multi-Lingual Study : Abstract: Neuropsychiatric disorders, such as Alzheimer's disease (AD), depression, and autism spectrum disorder (ASD), are characterized by linguistic and acoustic abnormalities, offering potential b...
- Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation : Abstract: Reasoning distillation has attracted increasing attention. It typically leverages a large teacher model to generate reasoning paths, which are then used to fine-tune a student model so that ...
- How important is Recall for Measuring Retrieval Quality? : Abstract: In realistic retrieval settings with large and evolving knowledge bases, the total number of documents relevant to a query is typically unknown, and recall cannot be computed. In this paper,...
- MediEval: A Unified Medical Benchmark for Patient-Contextual and Knowledge-Grounded Reasoning in LLMs : Abstract: Large Language Models (LLMs) are increasingly applied to medicine, yet their adoption is limited by concerns over reliability and safety. Existing evaluations either test factual medical kno...
- EssayCBM: Rubric-Aligned Concept Bottleneck Models for Transparent Essay Grading : Abstract: Understanding how automated grading systems evaluate essays remains a significant challenge for educators and students, especially when large language models function as black boxes. We intr...
- Semantic Deception: When Reasoning Models Can't Compute an Addition : Abstract: Large language models (LLMs) are increasingly used in situations where human values are at stake, such as decision-making tasks that involve reasoning when performed by humans. We investigat...
- Measuring Mechanistic Independence: Can Bias Be Removed Without Erasing Demographics? : Abstract: We investigate how independent demographic bias mechanisms are from general demographic recognition in language models. Using a multi-task evaluation setup where demographics are associated ...
- Investigating Model Editing for Unlearning in Large Language Models : Abstract: Machine unlearning aims to remove unwanted information from a model, but many methods are inefficient for LLMs with large numbers of parameters or fail to fully remove the intended informati...
- Large Language Models Approach Expert Pedagogical Quality in Math Tutoring but Differ in Instructional and Linguistic Profiles : Abstract: Recent work has explored the use of large language models for generating tutoring responses in mathematics, yet it remains unclear how closely their instructional behavior aligns with expert...
- Adversarial Training for Failure-Sensitive User Simulation in Mental Health Dialogue Optimization : Abstract: Realistic user simulation is crucial for training and evaluating task-oriented dialogue (TOD) systems, yet creating simulators that accurately replicate human behavior remains challenging. A...
- SA-DiffuSeq: Addressing Computational and Scalability Challenges in Long-Document Generation with Sparse Attention : Abstract: Diffusion based approaches to long form text generation suffer from prohibitive computational cost and memory overhead as sequence length increases. We introduce SA-DiffuSeq, a diffusion fra...
- MatchMiner-AI: An Open-Source Solution for Cancer Clinical Trial Matching : Abstract: Clinical trials drive improvements in cancer treatments and outcomes. However, most adults with cancer do not participate in trials, and trials often fail to enroll enough patients to answer...
- Agnostic Process Tomography : Abstract: Characterizing a quantum system by learning its state or evolution is a fundamental problem in quantum physics and learning theory with a myriad of applications. Recently, as a new approach ...
- Eliciting Risk Aversion with Inverse Reinforcement Learning via Interactive Questioning : Abstract: We investigate a framework for robo-advisors to estimate non-expert clients' risk aversion using adaptive binary-choice questionnaires. We model risk aversion using cost functions and spectr...
- Deep Kronecker Network : Abstract: We propose Deep Kronecker Network (DKN), a novel framework designed for analyzing medical imaging data, such as MRI, fMRI, CT, etc. Medical imaging data is different from general images in a...
- Predicting Metabolic Dysfunction-Associated Steatotic Liver Disease using Machine Learning Methods : Abstract: Background: Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD) affects ~33% of U.S. adults and is the most common chronic liver disease. Although often asymptomatic, progressio...
- Optimal Control with Natural Images: Efficient Reinforcement Learning using Overcomplete Sparse Codes : Abstract: Optimal control and sequential decision making are widely used in many complex tasks. Optimal control over a sequence of natural images is a first step towards understanding the role of visi...
- DATTA: Domain Diversity Aware Test-Time Adaptation for Dynamic Domain Shift Data Streams : Abstract: Test-Time Adaptation (TTA) addresses domain shifts between training and testing. However, existing methods assume a homogeneous target domain (e.g., single domain) at any given time. They fa...
- Explicit Group Sparse Projection with Applications to Deep Learning and NMF : Abstract: We design a new sparse projection method for a set of vectors that guarantees a desired average sparsity level measured leveraging the popular Hoyer measure (an affine function of the ratio ...
- Optimizing Decoding Paths in Masked Diffusion Models by Quantifying Uncertainty : Abstract: Masked Diffusion Models (MDMs) offer flexible, non-autoregressive generation, but this freedom introduces a challenge: final output quality is highly sensitive to the decoding order. We are ...
- Autonomous Uncertainty Quantification for Computational Point-of-care Sensors : Abstract: Computational point-of-care (POC) sensors enable rapid, low-cost, and accessible diagnostics in emergency, remote and resource-limited areas that lack access to centralized medical facilitie...
- Parallel Token Prediction for Language Models : Abstract: We propose Parallel Token Prediction (PTP), a universal framework for parallel sequence generation in language models. PTP jointly predicts multiple dependent tokens in a single transformer ...
- Variationally correct operator learning: Reduced basis neural operator with a posteriori error estimation : Abstract: Minimizing PDE-residual losses is a common strategy to promote physical consistency in neural operators. However, standard formulations often lack variational correctness, meaning that small...
- LookPlanGraph: Embodied Instruction Following Method with VLM Graph Augmentation : Abstract: Methods that use Large Language Models (LLM) as planners for embodied instruction following tasks have become widespread. To successfully complete tasks, the LLM must be grounded in the envi...
- Assessing the Software Security Comprehension of Large Language Models : Abstract: Large language models (LLMs) are increasingly used in software development, but their level of software security expertise remains unclear. This work systematically evaluates the security co...
- Causal-driven attribution (CDA): Estimating channel influence without user-level data : Abstract: Attribution modelling lies at the heart of marketing effectiveness, yet most existing approaches depend on user-level path data, which are increasingly inaccessible due to privacy regulation...
- A Community-Enhanced Graph Representation Model for Link Prediction : Abstract: Although Graph Neural Networks (GNNs) have become the dominant approach for graph representation learning, their performance on link prediction tasks does not always surpass that of traditio...
- ElfCore: A 28nm Neural Processor Enabling Dynamic Structured Sparse Training and Online Self-Supervised Learning with Activity-Dependent Weight Update : Abstract: In this paper, we present ElfCore, a 28nm digital spiking neural network processor tailored for event-driven sensory signal processing. ElfCore is the first to efficiently integrate: (1) a l...
- AutoBaxBuilder: Bootstrapping Code Security Benchmarking : Abstract: As LLMs see wide adoption in software engineering, the reliable assessment of the correctness and security of LLM-generated code is crucial. Notably, prior work has demonstrated that securit...
- Semi-Supervised Learning for Large Language Models Safety and Content Moderation : Abstract: Safety for Large Language Models (LLMs) has been an ongoing research focus since their emergence and is even more relevant nowadays with the increasing capacity of those models. Currently, t...
- Semantic Refinement with LLMs for Graph Representations : Abstract: Graph-structured data exhibit substantial heterogeneity in where their predictive signals originate: in some domains, node-level semantics dominate, while in others, structural patterns play...
- Hierarchical Modeling Approach to Fast and Accurate Table Recognition : Abstract: The extraction and use of diverse knowledge from numerous documents is a pressing challenge in intelligent information retrieval. Documents contain elements that require different recognitio...
- Dyna-Style Reinforcement Learning Modeling and Control of Non-linear Dynamics : Abstract: Controlling systems with complex, nonlinear dynamics poses a significant challenge, particularly in achieving efficient and robust control. In this paper, we propose a Dyna-Style Reinforceme...
- LLM Personas as a Substitute for Field Experiments in Method Benchmarking : Abstract: Field experiments (A/B tests) are often the most credible benchmark for methods in societal systems, but their cost and latency create a major bottleneck for iterative method development. LL...
- Blurb-Refined Inference from Crowdsourced Book Reviews using Hierarchical Genre Mining with Dual-Path Graph Convolutions : Abstract: Accurate book genre classification is fundamental to digital library organization, content discovery, and personalized recommendation. Existing approaches typically model genre prediction as...
- DexAvatar: 3D Sign Language Reconstruction with Hand and Body Pose Priors : Abstract: The trend in sign language generation is centered around data-driven generative methods that require vast amounts of precise 2D and 3D human pose data to achieve an acceptable generation qua...
- zkFL-Health: Blockchain-Enabled Zero-Knowledge Federated Learning for Medical AI Privacy : Abstract: Healthcare AI needs large, diverse datasets, yet strict privacy and governance constraints prevent raw data sharing across institutions. Federated learning (FL) mitigates this by training wh...
- Agentic Multi-Persona Framework for Evidence-Aware Fake News Detection : Abstract: The rapid proliferation of online misinformation poses significant risks to public trust, policy, and safety, necessitating reliable automated fake news detection. Existing methods often str...
- Critical Points of Degenerate Metrics on Algebraic Varieties: A Tale of Overparametrization : Abstract: We study the critical points over an algebraic variety of an optimization problem defined by a quadratic objective that is degenerate. This scenario arises in machine learning when the datas...
- Towards Better Search with Domain-Aware Text Embeddings for C2C Marketplaces : Abstract: Consumer-to-consumer (C2C) marketplaces pose distinct retrieval challenges: short, ambiguous queries; noisy, user-generated listings; and strict production constraints. This paper reports ou...
- Enhancing diffusion models with Gaussianization preprocessing : Abstract: Diffusion models are a class of generative models that have demonstrated remarkable success in tasks such as image generation. However, one of the bottlenecks of these models is slow samplin...
- Learning from Neighbors with PHIBP: Predicting Infectious Disease Dynamics in Data-Sparse Environments : Abstract: Modeling sparse count data, which arise across numerous scientific fields, presents significant statistical challenges. This chapter addresses these challenges in the context of infectious d...
- Automatic Replication of LLM Mistakes in Medical Conversations : Abstract: Large language models (LLMs) are increasingly evaluated in clinical settings using multi-dimensional rubrics which quantify reasoning quality, safety, and patient-centeredness. Yet, replicat...
- GenTSE: Enhancing Target Speaker Extraction via a Coarse-to-Fine Generative Language Model : Abstract: Language Model (LM)-based generative modeling has emerged as a promising direction for TSE, offering potential for improved generalization and high-fidelity speech. We present GenTSE, a two-...
- Deadline-Aware Online Scheduling for LLM Fine-Tuning with Spot Market Predictions : Abstract: As foundation models grow in size, fine-tuning them becomes increasingly expensive. While GPU spot instances offer a low-cost alternative to on-demand resources, their volatile prices and av...
- MultiMind at SemEval-2025 Task 7: Crosslingual Fact-Checked Claim Retrieval via Multi-Source Alignment : Abstract: This paper presents our system for SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval. In an era where misinformation spreads rapidly, effective fact-checking is...
- AirGS: Real-Time 4D Gaussian Streaming for Free-Viewpoint Video Experiences : Abstract: Free-viewpoint video (FVV) enables immersive viewing experiences by allowing users to view scenes from arbitrary perspectives. As a prominent reconstruction technique for FVV generation, 4D ...
- Clever Hans in Chemistry: Chemist Style Signals Confound Activity Prediction on Public Benchmarks : Abstract: Can machine learning models identify which chemist made a molecule from structure alone? If so, models trained on literature data may exploit chemist intent rather than learning causal struc...
- Architectural Trade-offs in Small Language Models Under Compute Constraints : Abstract: We present a systematic empirical study of small language models under strict compute constraints, analyzing how architectural choices and training budget interact to determine performance. ...
- Better Call Graphs: A New Dataset of Function Call Graphs for Malware Classification : Abstract: Function call graphs (FCGs) have emerged as a powerful abstraction for malware detection, capturing the behavioral structure of applications beyond surface-level signatures. Their utility in...
- NVIDIA Nemotron 3: Efficient and Open Intelligence : Abstract: We introduce the Nemotron 3 family of models - Nano, Super, and Ultra. These models deliver strong agentic, reasoning, and conversational capabilities. The Nemotron 3 family uses a Mixture-o...
- Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning : Abstract: We present Nemotron 3 Nano 30B-A3B, a Mixture-of-Experts hybrid Mamba-Transformer language model. Nemotron 3 Nano was pretrained on 25 trillion text tokens, including more than 3 trillion ne...
- CHAMMI-75: pre-training multi-channel models with heterogeneous microscopy images : Abstract: Quantifying cell morphology using images and machine learning has proven to be a powerful tool to study the response of cells to treatments. However, models used to quantify cellular morphol...
- Context-Sensitive Abstractions for Reinforcement Learning with Parameterized Actions : Abstract: Real-world sequential decision-making often involves parameterized action spaces that require both decisions regarding discrete actions and decisions about continuous action parameters gove...
- Weighted MCC: A Robust Measure of Multiclass Classifier Performance for Observations with Individual Weights : Abstract: Several performance measures are used to evaluate binary and multiclass classification tasks. But individual observations may often have distinct weights, and none of these measures are se...
- NULLBUS: Multimodal Mixed-Supervision for Breast Ultrasound Segmentation via Nullable Global-Local Prompts : Abstract: Breast ultrasound (BUS) segmentation provides lesion boundaries essential for computer-aided diagnosis and treatment planning. While promptable methods can improve segmentation performance a...
- TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior : Abstract: Tokenizers provide the fundamental basis through which text is represented and processed by language models (LMs). Despite the importance of tokenization, its role in LM performance and beha...
- A Physics Informed Neural Network For Deriving MHD State Vectors From Global Active Regions Observations : Abstract: Solar active regions (ARs) do not appear randomly but cluster along longitudinally warped toroidal bands ('toroids') that encode information about magnetic structures in the tachocline, wher...
- TrashDet: Iterative Neural Architecture Search for Efficient Waste Detection : Abstract: This paper addresses trash detection on the TACO dataset under strict TinyML constraints using an iterative hardware-aware neural architecture search framework targeting edge and IoT devices...
- AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent : Abstract: Large Reasoning Models (LRMs) like o3 and DeepSeek-R1 have achieved remarkable progress in natural language reasoning with long chain-of-thought. However, they remain computationally ineffic...
- AI-Driven Green Cognitive Radio Networks for Sustainable 6G Communication : Abstract: 6G wireless aims at Tb/s peak data rates, sub-millisecond latency, and massive Internet of Things/vehicle connectivity, which requires sustainable access to audio over the...
- Real-World Adversarial Attacks on RF-Based Drone Detectors : Abstract: Radio frequency (RF) based systems are increasingly used to detect drones by analyzing their RF signal patterns, converting them into spectrogram images which are processed by object detecti...
- Mechanism-Based Intelligence (MBI): Differentiable Incentives for Rational Coordination and Guaranteed Alignment in Multi-Agent Systems : Abstract: Autonomous multi-agent systems are fundamentally fragile: they struggle to solve the Hayekian Information problem (eliciting dispersed private knowledge) and the Hurwiczian Incentive problem...
- Diffusion Models in Simulation-Based Inference: A Tutorial Review : Abstract: Diffusion models have recently emerged as powerful learners for simulation-based inference (SBI), enabling fast and accurate estimation of latent parameters from simulated and real data. The...
- Fast and Exact Least Absolute Deviations Line Fitting via Piecewise Affine Lower-Bounding : Abstract: Least-absolute-deviations (LAD) line fitting is robust to outliers but computationally more involved than least squares regression. Although the literature includes linear and near-linear ti...
- Graph Neural Networks for Source Detection: A Review and Benchmark Study : Abstract: The source detection problem arises when an epidemic process unfolds over a contact network, and the objective is to identify its point of origin, i.e., the source node. Research on this pro...
- Uncovering Competency Gaps in Large Language Models and Their Benchmarks : Abstract: The evaluation of large language models (LLMs) relies heavily on standardized benchmarks. These benchmarks provide useful aggregated metrics for a given capability, but those aggregated metr...
- Uncovering Patterns of Brain Activity from EEG Data Consistently Associated with Cybersickness Using Neural Network Interpretability Maps : Abstract: Cybersickness poses a serious challenge for users of virtual reality (VR) technology. Consequently, there has been significant effort to track its occurrence during VR use with brain activit...
- Measuring all the noises of LLM Evals : Abstract: Separating signal from noise is central to experimental science. Applying well-established statistical methods effectively to LLM evals requires consideration of their unique noise characteri...
- Does the Data Processing Inequality Reflect Practice? On the Utility of Low-Level Tasks : Abstract: The data processing inequality is an information-theoretic principle stating that the information content of a signal cannot be increased by processing the observations. In particular, it su...
- Learning to Solve PDEs on Neural Shape Representations : Abstract: Solving partial differential equations (PDEs) on shapes underpins many shape analysis and engineering tasks; yet, prevailing PDE solvers operate on polygonal/triangle meshes while modern 3D ...
- Transcriptome-Conditioned Personalized De Novo Drug Generation for AML Using Metaheuristic Assembly and Target-Driven Filtering : Abstract: Acute Myeloid Leukemia (AML) remains a clinical challenge due to its extreme molecular heterogeneity and high relapse rates. While precision medicine has introduced mutation-specific therapi...
- Model Merging via Multi-Teacher Knowledge Distillation : Abstract: Model merging has emerged as a lightweight alternative to joint multi-task learning (MTL), yet the generalization properties of merged models remain largely unexplored. Establishing such the...
- Improving the Convergence Rate of Ray Search Optimization for Query-Efficient Hard-Label Attacks : Abstract: In hard-label black-box adversarial attacks, where only the top-1 predicted label is accessible, the prohibitive query complexity poses a major obstacle to practical deployment. In this pape...
- MiST: Understanding the Role of Mid-Stage Scientific Training in Developing Chemical Reasoning Models : Abstract: Large Language Models can develop reasoning capabilities through online fine-tuning with rule-based rewards. However, recent studies reveal a critical constraint: reinforcement learning succ...
- Analytic and Variational Stability of Deep Learning Systems : Abstract: We propose a unified analytic and variational framework for studying stability in deep learning systems viewed as coupled representation-parameter dynamics. The central object is the Learnin...
- A Unified Framework for EEG Seizure Detection Using Universum-Integrated Generalized Eigenvalues Proximal Support Vector Machine : Abstract: The paper presents novel Universum-enhanced classifiers: the Universum Generalized Eigenvalue Proximal Support Vector Machine (U-GEPSVM) and the Improved U-GEPSVM (IU-GEPSVM) for EEG signal ...
- BALLAST: Bandit-Assisted Learning for Latency-Aware Stable Timeouts in Raft : Abstract: Randomized election timeouts are a simple and effective liveness heuristic for Raft, but they become brittle under long-tail latency, jitter, and partition recovery, where repeated split vot...
- MODE: Multi-Objective Adaptive Coreset Selection : Abstract: We present Mode (Multi-Objective adaptive Data Efficiency), a framework that dynamically combines coreset selection strategies based on their evolving contribution to model performance. Unlik...
- STLDM: Spatio-Temporal Latent Diffusion Model for Precipitation Nowcasting : Abstract: Precipitation nowcasting is a critical spatio-temporal prediction task for society to prevent severe damage owing to extreme weather events. Despite the advances in this field, the complex a...
- A Mechanistic Analysis of Transformers for Dynamical Systems : Abstract: Transformers are increasingly adopted for modeling and forecasting time-series, yet their internal mechanisms remain poorly understood from a dynamical systems perspective. In contrast to cl...
- Shared Representation Learning for High-Dimensional Multi-Task Forecasting under Resource Contention in Cloud-Native Backends : Abstract: This study proposes a unified forecasting framework for high-dimensional multi-task time series to meet the prediction demands of cloud native backend systems operating under highly dynamic ...
- Understanding Scaling Laws in Deep Neural Networks via Feature Learning Dynamics : Abstract: The empirical success of deep learning is often attributed to scaling laws that predict consistent gains as model, data, and compute grow; however, large models can exhibit training instabil...
- LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics : Abstract: The rapid proliferation of Large Language Models (LLMs) and diverse specialized benchmarks necessitates a shift from fragmented, task-specific metrics to a holistic, competitive ranking syst...
- CoSeNet: A Novel Approach for Optimal Segmentation of Correlation Matrices : Abstract: In this paper, we propose a novel approach for the optimal identification of correlated segments in noisy correlation matrices. The proposed model is known as CoSeNet (Correlation Segmentat...
- Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions : Abstract: Bayesian Reinforcement Learning (BRL) provides a framework for generalisation of Reinforcement Learning (RL) problems from its use of Bayesian task parameters in the transition and reward mo...
- Generalization of Diffusion Models Arises with a Balanced Representation Space : Abstract: Diffusion models excel at generating high-quality, diverse samples, yet they risk memorizing training data when overfit to the training objective. We analyze the distinctions between memoriz...
- Can Agentic AI Match the Performance of Human Data Scientists? : Abstract: Data science plays a critical role in transforming complex data into actionable insights across numerous domains. Recent developments in large language models (LLMs) have significantly autom...
- ReACT-Drug: Reaction-Template Guided Reinforcement Learning for de novo Drug Design : Abstract: De novo drug design is a crucial component of modern drug development, yet navigating the vast chemical space to find synthetically accessible, high-affinity candidates remains a significant...
- Solving Functional PDEs with Gaussian Processes and Applications to Functional Renormalization Group Equations : Abstract: We present an operator learning framework for solving non-perturbative functional renormalization group equations, which are integro-differential equations defined on functionals. Our propos...
- A Multi-fidelity Double-Delta Wing Dataset and Empirical Scaling Laws for GNN-based Aerodynamic Field Surrogate : Abstract: Data-driven surrogate models are increasingly adopted to accelerate vehicle design. However, open-source multi-fidelity datasets and empirical guidelines linking dataset size to model perfor...
- Guardrailed Elasticity Pricing: A Churn-Aware Forecasting Playbook for Subscription Strategy : Abstract: This paper presents a marketing analytics framework that operationalizes subscription pricing as a dynamic, guardrailed decision system, uniting multivariate demand forecasting, segment-leve...
- RevFFN: Memory-Efficient Full-Parameter Fine-Tuning of Mixture-of-Experts LLMs with Reversible Blocks : Abstract: Full-parameter fine-tuning is a key technique for adapting large language models (LLMs) to downstream tasks, but it incurs substantial memory overhead due to the need to cache extensive inte...
- Towards a General Framework for Predicting and Explaining the Hardness of Graph-based Combinatorial Optimization Problems using Machine Learning and Association Rule Mining : Abstract: This study introduces GCO-HPIF, a general machine-learning-based framework to predict and explain the computational hardness of combinatorial optimization problems that can be represented on...
- DiEC: Diffusion Embedded Clustering : Abstract: Deep clustering hinges on learning representations that are inherently clusterable. However, using a single encoder to produce a fixed embedding ignores the representation trajectory formed ...
- Time-Efficient Evaluation and Enhancement of Adversarial Robustness in Deep Neural Networks : Abstract: With deep neural networks (DNNs) increasingly embedded in modern society, ensuring their safety has become a critical and urgent issue. In response, substantial efforts have been dedicated t...
- From GNNs to Symbolic Surrogates via Kolmogorov-Arnold Networks for Delay Prediction : Abstract: Accurate prediction of flow delay is essential for optimizing and managing modern communication networks. We investigate three levels of modeling for this task. First, we implement a heterog...
- Robustness Certificates for Neural Networks against Adversarial Attacks : Abstract: The increasing use of machine learning in safety-critical domains amplifies the risk of adversarial threats, especially data poisoning attacks that corrupt training data to degrade performan...
- Memory-Efficient Acceleration of Block Low-Rank Foundation Models on Resource Constrained GPUs : Abstract: Recent advances in transformer-based foundation models have made them the default choice for many tasks, but their rapidly growing size makes fitting a full model on a single GPU increasingl...
- Defending against adversarial attacks using mixture of experts : Abstract: Machine learning is a powerful tool enabling full automation of a huge number of tasks without explicit programming. Despite recent progress of machine learning in different domains, these m...
- FedMPDD: Communication-Efficient Federated Learning with Privacy Preservation Attributes via Projected Directional Derivative : Abstract: This paper introduces FedMPDD (Federated Learning via Multi-Projected Directional Derivatives), a novel algorithm that simultaneously op...
- GraphFire-X: Physics-Informed Graph Attention Networks and Structural Gradient Boosting for Building-Scale Wildfire Preparedness at the Wildland-Urban Interface : Abstract: As wildfires increasingly evolve into urban conflagrations, traditional risk models that treat structures as isolated assets fail to capture the non-linear contagion dynamics characteristic ...
- Symbolic regression for defect interactions in 2D materials : Abstract: Machine learning models have become firmly established across all scientific fields. Extracting features from data and making inferences based on them with neural network models often yields...
- Improving Matrix Exponential for Generative AI Flows: A Taylor-Based Approach Beyond Paterson-Stockmeyer : Abstract: The matrix exponential is a fundamental operator in scientific computing and system simulation, with applications ranging from control theory and quantum mechanics to modern generative machi...
- Subgroup Discovery with the Cox Model : Abstract: We study the problem of subgroup discovery for survival analysis, where the goal is to find an interpretable subset of the data on which a Cox model is highly accurate. Our work is the first...
- TS-Arena Technical Report -- A Pre-registered Live Forecasting Platform : Abstract: While Time Series Foundation Models (TSFMs) offer transformative capabilities for forecasting, they simultaneously risk triggering a fundamental evaluation crisis. This crisis is driven by i...
- Generalization of RLVR Using Causal Reasoning as a Testbed : Abstract: Reinforcement learning with verifiable rewards (RLVR) has emerged as a promising paradigm for post-training large language models (LLMs) on complex reasoning tasks. Yet, the conditions under...
- Bridging Efficiency and Safety: Formal Verification of Neural Networks with Early Exits : Abstract: Ensuring the safety and efficiency of AI systems is a central goal of modern research. Formal verification provides guarantees of neural network robustness, while early exits improve inferen...
- Stabilizing Multimodal Autoencoders: A Theoretical and Empirical Analysis of Fusion Strategies : Abstract: In recent years, the development of multimodal autoencoders has gained significant attention due to their potential to handle multimodal complex data types and improve model performance. Und...
- FEM-Bench: A Structured Scientific Reasoning Benchmark for Evaluating Code-Generating LLMs : Abstract: As LLMs advance their reasoning capabilities about the physical world, the absence of rigorous benchmarks for evaluating their ability to generate scientifically valid physical models has be...
- PHOTON: Hierarchical Autoregressive Modeling for Lightspeed and Memory-Efficient Language Generation : Abstract: Transformers operate as horizontal token-by-token scanners; at each generation step, the model attends to an ever-growing sequence of token-level states. This access pattern increases prefil...
- Revisiting the Learning Objectives of Vision-Language Reward Models : Abstract: Learning generalizable reward functions is a core challenge in embodied intelligence. Recent work leverages contrastive vision language models (VLMs) to obtain dense, domain-agnostic rewards...
- HyDRA: Hierarchical and Dynamic Rank Adaptation for Mobile Vision Language Model : Abstract: Vision Language Models (VLMs) have undergone significant advancements, particularly with the emergence of mobile-oriented VLMs, which offer a wide range of application scenarios. However, th...
- Disentangling Fact from Sentiment: A Dynamic Conflict-Consensus Framework for Multimodal Fake News Detection : Abstract: Prevalent multimodal fake news detection relies on consistency-based fusion, yet this paradigm fundamentally misinterprets critical cross-modal discrepancies as noise, leading to over-smooth...
- Improving Cardiac Risk Prediction Using Data Generation Techniques : Abstract: Cardiac rehabilitation constitutes a structured clinical process involving multiple interdependent phases, individualized medical decisions, and the coordinated participation of diverse heal...
- Forward Only Learning for Orthogonal Neural Networks of any Depth : Abstract: Backpropagation is still the de facto algorithm used today to train neural networks. With the exponential growth of recent architectures, the computational cost of this algorithm also ...
- Dominating vs. Dominated: Generative Collapse in Diffusion Models : Abstract: Text-to-image diffusion models have drawn significant attention for their ability to generate diverse and high-fidelity images. However, when generating from multi-concept prompts, one conce...
- Managing the Stochastic: Foundations of Learning in Neuro-Symbolic Systems for Software Engineering : Abstract: Current approaches to AI coding agents appear to blur the lines between the Large Language Model (LLM) and the agent itself, asking the LLM to make decisions best left to deterministic proce...
- MaskOpt: A Large-Scale Mask Optimization Dataset to Advance AI in Integrated Circuit Manufacturing : Abstract: As integrated circuit (IC) dimensions shrink below the lithographic wavelength, optical lithography faces growing challenges from diffraction and process variability. Model-based optical pro...
- Q-RUN: Quantum-Inspired Data Re-uploading Networks : Abstract: Data re-uploading quantum circuits (DRQC) are a key approach to implementing quantum neural networks and have been shown to outperform classical neural networks in fitting high-frequency fun...
- Forecasting N-Body Dynamics: A Comparative Study of Neural Ordinary Differential Equations and Universal Differential Equations : Abstract: The n-body problem, fundamental to astrophysics, simulates the motion of n bodies acting under the effect of their own mutual gravitational interactions. Traditional machine learning models ...
- Data-Free Pruning of Self-Attention Layers in LLMs : Abstract: Many self-attention sublayers in large language models (LLMs) can be removed with little to no loss. We attribute this to the Attention Suppression Hypothesis: during pre-training, some deep...
- SHRP: Specialized Head Routing and Pruning for Efficient Encoder Compression : Abstract: Transformer encoders are widely deployed in large-scale web services for natural language understanding tasks such as text classification, semantic retrieval, and content ranking. However, t...
- Real Time Detection and Quantitative Analysis of Spurious Forgetting in Continual Learning : Abstract: Catastrophic forgetting remains a fundamental challenge in continual learning for large language models. Recent work revealed that performance degradation may stem from spurious forgetting c...
- Enhancing Lung Cancer Treatment Outcome Prediction through Semantic Feature Engineering Using Large Language Models : Abstract: Accurate prediction of treatment outcomes in lung cancer remains challenging due to the sparsity, heterogeneity, and contextual overload of real-world electronic health data. Traditional mod...
- Zero-Training Temporal Drift Detection for Transformer Sentiment Models: A Comprehensive Analysis on Authentic Social Media Streams : Abstract: We present a comprehensive zero-training temporal drift analysis of transformer-based sentiment models validated on authentic social media data from major real-world events. Through systemat...
- Learning Evolving Latent Strategies for Multi-Agent Language Systems without Model Fine-Tuning : Abstract: This study proposes a multi-agent language framework that enables continual strategy evolution without fine-tuning the language model's parameters. The core idea is to liberate the latent ve...
- Parameter-Efficient Neural CDEs via Implicit Function Jacobians : Abstract: Neural Controlled Differential Equations (Neural CDEs, NCDEs) are a unique branch of methods, specifically tailored for analysing temporal sequences. However, they come with drawbacks, the m...
Research Sources: 210 | Generated: 1/2/2026
