AI RESEARCH PAPERS & ACADEMIC SOURCES
- Extended OpenTT Games Dataset: A table tennis dataset for fine-grained shot type and point outcome : Abstract: Automatically detecting and classifying strokes in table tennis video can streamline training workflows, enrich broadcast overlays, and enable fine-grained performance analytics. For this to...
- Name That Part: 3D Part Segmentation and Naming : Abstract: We address semantic 3D part segmentation: decomposing objects into parts with meaningful names. While datasets exist with part annotations, their definitions are inconsistent across datasets...
- SynDroneVision: A Synthetic Dataset for Image-Based Drone Detection : Abstract: Developing robust drone detection systems is often constrained by the limited availability of large-scale annotated training data and the high costs associated with real-world data collectio...
- Explainable Binary Classification of Separable Shape Ensembles : Abstract: Scientists, engineers, biologists, and technology specialists universally leverage image segmentation to extract shape ensembles containing many thousands of curves representing patterns in ...
- Controllable Generation with Text-to-Image Diffusion Models: A Survey : Abstract: In the rapidly advancing realm of visual generation, diffusion models have revolutionized the landscape, marking a significant shift in capabilities with their impressive text-guided generat...
- Generate, Transfer, Adapt: Learning Functional Dexterous Grasping from a Single Human Demonstration : Abstract: Functional grasping with dexterous robotic hands is a key capability for enabling tool use and complex manipulation, yet progress has been constrained by two persistent bottlenecks: the scar...
- Learning Latent Action World Models In The Wild : Abstract: Agents capable of reasoning and planning in the real world require the ability of predicting the consequences of their actions. While world models possess this capability, they most often re...
- GenAI-DrawIO-Creator: A Framework for Automated Diagram Generation : Abstract: Diagrams are crucial for communicating complex information, yet creating and modifying them remains a labor-intensive task. We present GenAI-DrawIO-Creator, a novel framework that leverages ...
- Scalable neural pushbroom architectures for real-time denoising of hyperspectral images onboard satellites : Abstract: The next generation of Earth observation satellites will seek to deploy intelligent models directly onboard the payload in order to minimize the latency incurred by the transmission and proc...
- Decentralized Privacy-Preserving Federal Learning of Computer Vision Models on Edge Devices : Abstract: Collaborative training of a machine learning model comes with a risk of sharing sensitive or private data. Federated learning offers a way of collectively training a single global model with...
- In-SRAM Radiant Foam Rendering on a Graph Processor : Abstract: Many emerging many-core accelerators replace a single large device memory with hundreds to thousands of lightweight cores, each owning only a small local SRAM and exchanging data via explici...
- End-to-end differentiable design of geometric waveguide displays : Abstract: Geometric waveguides are a promising architecture for optical see-through augmented reality displays, but their performance is severely bottlenecked by the difficulty of jointly optimizing n...
- UNIC: Learning Unified Multimodal Extrinsic Contact Estimation : Abstract: Contact-rich manipulation requires reliable estimation of extrinsic contacts, the interactions between a grasped object and its environment which provide essential contextual information for ...
- Mesh4D: 4D Mesh Reconstruction and Tracking from Monocular Video : Abstract: We propose Mesh4D, a feed-forward model for monocular 4D mesh reconstruction. Given a monocular video of a dynamic object, our model reconstructs the object's complete 3D shape and motion, r...
- QNeRF: Neural Radiance Fields on a Simulated Gate-Based Quantum Computer : Abstract: Recently, Quantum Visual Fields (QVFs) have shown promising improvements in model compactness and convergence speed for learning the provided 2D or 3D signals. Meanwhile, novel-view synthesi...
- RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes : Abstract: Nighttime color constancy remains a challenging problem in computational photography due to low-light noise and complex illumination conditions. We present RL-AWB, a novel framework combinin...
- Pixel-Perfect Visual Geometry Estimation : Abstract: Recovering clean and accurate geometry from images is essential for robotics and augmented reality. However, existing geometry foundation models still suffer severely from flying pixels and ...
- GREx: Generalized Referring Expression Segmentation, Comprehension, and Generation : Abstract: Referring Expression Segmentation (RES) and Comprehension (REC) respectively segment and detect the object described by an expression, while Referring Expression Generation (REG) generates a...
- RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation : Abstract: The diversity, quantity, and quality of manipulation data are critical for training effective robot policies. However, due to hardware and physical setup constraints, collecting large-scale ...
- Plenoptic Video Generation : Abstract: Camera-controlled generative video re-rendering methods, such as ReCamMaster, have achieved remarkable progress. However, despite their success in single-view setting, these works often stru...
- ObjectForesight: Predicting Future 3D Object Trajectories from Human Videos : Abstract: Humans can effortlessly anticipate how objects might move or change through interaction, imagining a cup being lifted, a knife slicing, or a lid being closed. We aim to endow computational s...
- FlowLet: Conditional 3D Brain MRI Synthesis using Wavelet Flow Matching : Abstract: Brain Magnetic Resonance Imaging (MRI) plays a central role in studying neurological development, aging, and diseases. One key application is Brain Age Prediction (BAP), which estimates an i...
- MoE3D: A Mixture-of-Experts Module for 3D Reconstruction : Abstract: MoE3D is a mixture-of-experts module designed to sharpen depth boundaries and mitigate flying-point artifacts (highlighted in red) of existing feed-forward 3D reconstruction models (left sid...
- VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice : Abstract: Chain-of-thought (CoT) reasoning has emerged as a powerful tool for multimodal large language models on video understanding tasks. However, its necessity and advantages over direct answering...
- CoV: Chain-of-View Prompting for Spatial Reasoning : Abstract: Embodied question answering (EQA) in 3D environments often requires collecting context that is distributed across multiple viewpoints and partially occluded. However, most recent vision-lan...
- Vision-Language Introspection: Mitigating Overconfident Hallucinations in MLLMs via Interpretable Bi-Causal Steering : Abstract: Object hallucination critically undermines the reliability of Multimodal Large Language Models, often stemming from a fundamental failure in cognitive introspection, where models blindly tru...
- Multi-Scale Local Speculative Decoding for Image Generation : Abstract: Autoregressive (AR) models have achieved remarkable success in image synthesis, yet their sequential nature imposes significant latency constraints. Speculative Decoding offers a promising a...
- VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control : Abstract: Video world models aim to simulate dynamic, real-world environments, yet existing methods struggle to provide unified and precise control over camera and multi-object motion, as videos inher...
- VERSE: Visual Embedding Reduction and Space Exploration. Clustering-Guided Insights for Training Data Enhancement in Visually-Rich Document Understanding : Abstract: This work introduces VERSE, a methodology for analyzing and improving Vision-Language Models applied to Visually-rich Document Understanding by exploring their visual embedding space. VERSE ...
- Re-Align: Structured Reasoning-guided Alignment for In-Context Image Generation and Editing : Abstract: In-context image generation and editing (ICGE) enables users to specify visual concepts through interleaved image-text prompts, demanding precise understanding and faithful execution of user...
- From Rays to Projections: Better Inputs for Feed-Forward View Synthesis : Abstract: Feed-forward view synthesis models predict a novel view in a single pass with minimal 3D inductive bias. Existing works encode cameras as Plücker ray maps, which tie predictions to the arbit...
- UniLiPs: Unified LiDAR Pseudo-Labeling with Geometry-Grounded Dynamic Scene Decomposition : Abstract: Unlabeled LiDAR logs, in autonomous driving applications, are inherently a gold mine of dense 3D geometry hiding in plain sight - yet they are almost useless without human labels, highlighti...
- Driving on Registers : Abstract: We present DrivoR, a simple and efficient transformer-based architecture for end-to-end autonomous driving. Our approach builds on pretrained Vision Transformers (ViTs) and introduces camera...
- Patch-based Representation and Learning for Efficient Deformation Modeling : Abstract: In this paper, we present a patch-based representation of surfaces, PolyFit, which is obtained by fitting jet functions locally on surface patches. Such a representation can be learned effic...
- Higher-Order Adversarial Patches for Real-Time Object Detectors : Abstract: Higher-order adversarial attacks can directly be considered the result of a cat-and-mouse game: an elaborate action involving constant pursuit, near captures, and repeated escapes. This id...
- OceanSplat: Object-aware Gaussian Splatting with Trinocular View Consistency for Underwater Scene Reconstruction : Abstract: We introduce OceanSplat, a novel 3D Gaussian Splatting-based approach for accurately representing 3D geometry in underwater scenes. To overcome multi-view inconsistencies caused by underwate...
- SparseLaneSTP: Leveraging Spatio-Temporal Priors with Sparse Transformers for 3D Lane Detection : Abstract: 3D lane detection has emerged as a critical challenge in autonomous driving, encompassing identification and localization of lane markings and the 3D road surface. Conventional 3D methods de...
- TEA: Temporal Adaptive Satellite Image Semantic Segmentation : Abstract: Crop mapping based on satellite images time-series (SITS) holds substantial economic value in agricultural production settings, in which parcel segmentation is an essential step. Existing ap...
- Prototypicality Bias Reveals Blindspots in Multimodal Evaluation Metrics : Abstract: Automatic metrics are now central to evaluating text-to-image models, often substituting for human judgment in benchmarking and large-scale filtering. However, it remains unclear whether the...
- DivAS: Interactive 3D Segmentation of NeRFs via Depth-Weighted Voxel Aggregation : Abstract: Existing methods for segmenting Neural Radiance Fields (NeRFs) are often optimization-based, requiring slow per-scene training that sacrifices the zero-shot capabilities of 2D foundation mod...
- Character Detection using YOLO for Writer Identification in multiple Medieval books : Abstract: Paleography is the study of ancient and historical handwriting, its key objectives include the dating of manuscripts and understanding the evolution of writing. Estimating when a document wa...
- SOVABench: A Vehicle Surveillance Action Retrieval Benchmark for Multimodal Large Language Models : Abstract: Automatic identification of events and recurrent behavior analysis are critical for video surveillance. However, most existing content-based video retrieval benchmarks focus on scene-level s...
- Integrated Framework for Selecting and Enhancing Ancient Marathi Inscription Images from Stone, Metal Plate, and Paper Documents : Abstract: Ancient script images often suffer from severe background noise, low contrast, and degradation caused by aging and environmental effects. In many cases, the foreground text and background ex...
- Detector-Augmented SAMURAI for Long-Duration Drone Tracking : Abstract: Robust long-term tracking of drones is a critical requirement for modern surveillance systems, given their increasing threat potential. While detector-based approaches typically achieve stron...
- PyramidalWan: On Making Pretrained Video Model Pyramidal for Efficient Inference : Abstract: Recently proposed pyramidal models decompose the conventional forward and backward diffusion processes into multiple stages operating at varying resolutions. These models handle inputs with ...
- SRU-Pix2Pix: A Fusion-Driven Generator Network for Medical Image Translation with Few-Shot Learning : Abstract: Magnetic Resonance Imaging (MRI) provides detailed tissue information, but its clinical application is limited by long acquisition time, high cost, and restricted resolution. Image translati...
- Defocus Aberration Theory Confirms Gaussian Model in Most Imaging Devices : Abstract: Over the past three decades, defocus has consistently provided groundbreaking depth information in scene images. However, accurately estimating depth from 2D images continues to be a persist...
- GeM-VG: Towards Generalized Multi-image Visual Grounding with Multimodal Large Language Models : Abstract: Multimodal Large Language Models (MLLMs) have demonstrated impressive progress in single-image grounding and general multi-image understanding. Recently, some methods begin to address multi-...
- Segmentation-Driven Monocular Shape from Polarization based on Physical Model : Abstract: Monocular shape-from-polarization (SfP) leverages the intrinsic relationship between light polarization properties and surface geometry to recover surface normals from single-view polarized ...
- ProFuse: Efficient Cross-View Context Fusion for Open-Vocabulary 3D Gaussian Splatting : Abstract: We present ProFuse, an efficient context-aware framework for open-vocabulary 3D scene understanding with 3D Gaussian Splatting (3DGS). The pipeline enhances cross-view consistency and intra-...
- Skeletonization-Based Adversarial Perturbations on Large Vision Language Model's Mathematical Text Recognition : Abstract: This work explores the visual capabilities and limitations of foundation models by introducing a novel adversarial attack method utilizing skeletonization to reduce the search space effectiv...
- AIVD: Adaptive Edge-Cloud Collaboration for Accurate and Efficient Industrial Visual Detection : Abstract: Multimodal large language models (MLLMs) demonstrate exceptional capabilities in semantic understanding and visual reasoning, yet they still face challenges in precise object localization an...
- Training a Custom CNN on Five Heterogeneous Image Datasets : Abstract: Deep learning has transformed visual data analysis, with Convolutional Neural Networks (CNNs) becoming highly effective in learning meaningful feature representations directly from images. U...
- On the Holistic Approach for Detecting Human Image Forgery : Abstract: The rapid advancement of AI-generated content (AIGC) has escalated the threat of deepfakes, from facial manipulations to the synthesis of entire photorealistic human bodies. However, existin...
- Forge-and-Quench: Enhancing Image Generation for Higher Fidelity in Unified Multimodal Models : Abstract: Integrating image generation and understanding into a single framework has become a pivotal goal in the multimodal domain. However, how understanding can effectively assist generation has no...
- WebCryptoAgent: Agentic Crypto Trading with Web Informatics : Abstract: Cryptocurrency trading increasingly depends on timely integration of heterogeneous web information and market microstructure signals to support short-horizon decision making under extreme vo...
- HATIR: Heat-Aware Diffusion for Turbulent Infrared Video Super-Resolution : Abstract: Infrared video has been of great interest in visual tasks under challenging environments, but often suffers from severe atmospheric turbulence and compression degradation. Existing video sup...
- DB-MSMUNet: Dual Branch Multi-scale Mamba UNet for Pancreatic CT Scans Segmentation : Abstract: Accurate segmentation of the pancreas and its lesions in CT scans is crucial for the precise diagnosis and treatment of pancreatic cancer. However, it remains a highly challenging task due t...
- HyperAlign: Hyperbolic Entailment Cones for Adaptive Text-to-Image Alignment Assessment : Abstract: With the rapid development of text-to-image generation technology, accurately assessing the alignment between generated images and text prompts has become a critical challenge. Existing meth...
- HUR-MACL: High-Uncertainty Region-Guided Multi-Architecture Collaborative Learning for Head and Neck Multi-Organ Segmentation : Abstract: Accurate segmentation of organs at risk in the head and neck is essential for radiation therapy, yet deep learning models often fail on small, complexly shaped organs. While hybrid architect...
- Detection of Deployment Operational Deviations for Safety and Security of AI-Enabled Human-Centric Cyber Physical Systems : Abstract: In recent years, Human-centric cyber-physical systems have increasingly involved artificial intelligence to enable knowledge extraction from sensor-collected data. Examples include medical m...
- MiLDEdit: Reasoning-Based Multi-Layer Design Document Editing : Abstract: Real-world design documents (e.g., posters) are inherently multi-layered, combining decoration, text, and images. Editing them from natural-language instructions requires fine-grained, layer...
- 3D Conditional Image Synthesis of Left Atrial LGE MRI from Composite Semantic Masks : Abstract: Segmentation of the left atrial (LA) wall and endocardium from late gadolinium-enhanced (LGE) MRI is essential for quantifying atrial fibrosis in patients with atrial fibrillation. The devel...
- All Changes May Have Invariant Principles: Improving Ever-Shifting Harmful Meme Detection via Design Concept Reproduction : Abstract: Harmful memes are ever-shifting in the Internet communities, which are difficult to analyze due to their type-shifting and temporal-evolving nature. Although these memes are shifting, we fin...
- FaceRefiner: High-Fidelity Facial Texture Refinement with Differentiable Rendering-based Style Transfer : Abstract: Recent facial texture generation methods prefer to use deep networks to synthesize image content and then fill in the UV map, thus generating a compelling full texture from a single image. N...
- TokenSeg: Efficient 3D Medical Image Segmentation via Hierarchical Visual Token Compression : Abstract: Three-dimensional medical image segmentation is a fundamental yet computationally demanding task due to the cubic growth of voxel processing and the redundant computation on homogeneous regi...
- UniDrive-WM: Unified Understanding, Planning and Generation World Model For Autonomous Driving : Abstract: World models have become central to autonomous driving, where accurate scene understanding and future prediction are crucial for safe control. Recent work has explored using vision-language ...
- CRUNet-MR-Univ: A Foundation Model for Diverse Cardiac MRI Reconstruction : Abstract: In recent years, deep learning has attracted increasing attention in the field of Cardiac MRI (CMR) reconstruction due to its superior performance over traditional methods, particularly in h...
- From Preoperative CT to Postmastoidectomy Mesh Construction: Mastoidectomy Shape Prediction for Cochlear Implant Surgery : Abstract: Cochlear Implant (CI) surgery treats severe hearing loss by inserting an electrode array into the cochlea to stimulate the auditory nerve. An important step in this procedure is mastoidectom...
- 3D-Agent: Tri-Modal Multi-Agent Collaboration for Scalable 3D Object Annotation : Abstract: Driven by applications in autonomous driving, robotics, and augmented reality, 3D object annotation presents challenges beyond 2D annotation, including spatial complexity, occlusion, and viewpoint...
- Performance Analysis of Image Classification on Bangladeshi Datasets : Abstract: Convolutional Neural Networks (CNNs) have demonstrated remarkable success in image classification tasks; however, the choice between designing a custom CNN from scratch and employing establi...
- Few-Shot LoRA Adaptation of a Flow-Matching Foundation Model for Cross-Spectral Object Detection : Abstract: Foundation models for vision are predominantly trained on RGB data, while many safety-critical applications rely on non-visible modalities such as infrared (IR) and synthetic aperture radar ...
- Combining facial videos and biosignals for stress estimation during driving : Abstract: Reliable stress recognition from facial videos is challenging due to stress's subjective nature and voluntary facial control. While most methods rely on Facial Action Units, the role of dise...
- PackCache: A Training-Free Acceleration Method for Unified Autoregressive Video Generation via Compact KV-Cache : Abstract: A unified autoregressive model is a Transformer-based framework that addresses diverse multimodal tasks (e.g., text, image, video) as a single sequence modeling problem under a shared token ...
- SCAR-GS: Spatial Context Attention for Residuals in Progressive Gaussian Splatting : Abstract: Recent advances in 3D Gaussian Splatting have allowed for real-time, high-fidelity novel view synthesis. Nonetheless, these models have significant storage requirements for large and medium-...
- ReHyAt: Recurrent Hybrid Attention for Video Diffusion Transformers : Abstract: Recent advances in video diffusion models have shifted towards transformer-based architectures, achieving state-of-the-art video generation but at the cost of quadratic attention complexity,...
- Unified Text-Image Generation with Weakness-Targeted Post-Training : Abstract: Unified multimodal generation architectures that jointly produce text and images have recently emerged as a promising direction for text-to-image (T2I) synthesis. However, many existing syst...
- Embedding Textual Information in Images Using Quinary Pixel Combinations : Abstract: This paper presents a novel technique for embedding textual data into images using quinary combinations of pixel intensities in RGB space. Existing methods predominantly rely on least and mo...
- Beyond Binary Preference: Aligning Diffusion Models to Fine-grained Criteria by Decoupling Attributes : Abstract: Post-training alignment of diffusion models relies on simplified signals, such as scalar rewards or binary preferences. This limits alignment with complex human expertise, which is hierarchi...
- ChakmaNMT: Machine Translation for a Low-Resource and Endangered Language via Transliteration : Abstract: We present the first systematic study of machine translation for Chakma, an endangered and extremely low-resource Indo-Aryan language, with the goal of supporting language access and preserv...
- Pelican Soup Framework: A Theoretical Framework for Language Model Capabilities : Abstract: In this work, we propose a simple theoretical framework, Pelican Soup, aiming to better understand how pretraining allows LLMs to (1) generalize to unseen instructions and (2) perform in-con...
- Mechanisms of Prompt-Induced Hallucination in Vision-Language Models : Abstract: Large vision-language models (VLMs) are highly capable, yet often hallucinate by favoring textual prompts over visual evidence. We study this failure mode in a controlled object-counting set...
- Observations and Remedies for Large Language Model Bias in Self-Consuming Performative Loop : Abstract: The rapid advancement of large language models (LLMs) has led to growing interest in using synthetic data to train future models. However, this creates a self-consuming retraining loop, wher...
- A Lightweight and Explainable Vision-Language Framework for Crop Disease Visual Question Answering : Abstract: Visual question answering for crop disease analysis requires accurate visual understanding and reliable language generation. This work presents a lightweight vision-language framework for cr...
- Semantically Orthogonal Framework for Citation Classification: Disentangling Intent and Content : Abstract: Understanding the role of citations is essential for research assessment and citation-aware digital libraries. However, existing citation classification frameworks often conflate citation in...
- Multi-Disciplinary Dataset Discovery from Citation-Verified Literature Contexts : Abstract: Identifying suitable datasets for a research question remains challenging because existing dataset search engines rely heavily on metadata quality and keyword overlap, which often fail to ca...
- Reinforced Efficient Reasoning via Semantically Diverse Exploration : Abstract: Reinforcement learning with verifiable rewards (RLVR) has proven effective in enhancing the reasoning of large language models (LLMs). Monte Carlo Tree Search (MCTS)-based extensions improve...
- Publishing FAIR and Machine-actionable Reviews in Materials Science: The Case for Symbolic Knowledge in Neuro-symbolic Artificial Intelligence : Abstract: Scientific reviews are central to knowledge integration in materials science, yet their key insights remain locked in narrative text and static PDF tables, limiting reuse by humans and machi...
- ConMax: Confidence-Maximizing Compression for Efficient Chain-of-Thought Reasoning : Abstract: Recent breakthroughs in Large Reasoning Models (LRMs) have demonstrated that extensive Chain-of-Thought (CoT) generation is critical for enabling intricate cognitive behaviors, such as self-...
- DR-LoRA: Dynamic Rank LoRA for Mixture-of-Experts Adaptation : Abstract: Mixture-of-Experts (MoE) has become a prominent paradigm for scaling Large Language Models (LLMs). Parameter-efficient fine-tuning (PEFT), such as LoRA, is widely adopted to adapt pretrained...
- Defense Against Indirect Prompt Injection via Tool Result Parsing : Abstract: As LLM agents transition from digital assistants to physical controllers in autonomous systems and robotics, they face an escalating threat from indirect prompt injection. By embedding adver...
- CounterVid: Counterfactual Video Generation for Mitigating Action and Temporal Hallucinations in Video-Language Models : Abstract: Video-language models (VLMs) achieve strong multimodal understanding but remain prone to hallucinations, especially when reasoning about actions and temporal order. Existing mitigation strat...
- AT²PO: Agentic Turn-based Policy Optimization via Tree Search : Abstract: LLM agents have emerged as powerful systems for tackling multi-turn tasks by interleaving internal reasoning and external tool interactions. Agentic Reinforcement Learning has recently drawn...
- Miner: Mining Intrinsic Mastery for Data-Efficient RL in Large Reasoning Models : Abstract: Current critic-free RL methods for large reasoning models suffer from severe inefficiency when training on positive homogeneous prompts (where all rollouts are correct), resulting in waste o...
- Memory Matters More: Event-Centric Memory as a Logic Map for Agent Searching and Reasoning : Abstract: Large language models (LLMs) are increasingly deployed as intelligent agents that reason, plan, and interact with their environments. To effectively scale to long-horizon scenarios, a key ca...
- A Method for Constructing a Digital Transformation Driving Mechanism Based on Semantic Understanding of Large Models : Abstract: In the process of digital transformation, enterprises are faced with problems such as insufficient semantic understanding of unstructured data and lack of intelligent decision-making basis i...
- Agri-R1: Empowering Generalizable Agricultural Reasoning in Vision-Language Models with Reinforcement Learning : Abstract: Agricultural disease diagnosis challenges VLMs, as conventional fine-tuning requires extensive labels, lacks interpretability, and generalizes poorly. While reasoning improves model robustne...
- BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents : Abstract: Large language model (LLM) agents execute tasks through multi-step workflows that combine planning, memory, and tool use. While this design enables autonomy, it also expands the attack surfa...
- Advancing Language Models for Code-related Tasks : Abstract: Recent advances in language models (LMs) have driven significant progress in various software engineering tasks. However, existing LMs still struggle with complex programming scenarios due t...
- CircuitLM: A Multi-Agent LLM-Aided Design Framework for Generating Circuit Schematics from Natural Language Prompts : Abstract: Generating accurate circuit schematics from high-level natural language descriptions remains a persistent challenge in electronics design, as large language models (LLMs) frequently hallucin...
- Vision-Language Agents for Interactive Forest Change Analysis : Abstract: Modern forest monitoring workflows increasingly benefit from the growing availability of high-resolution satellite imagery and advances in deep learning. Two persistent challenges in this co...
- Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization : Abstract: Large Vision-Language Models (LVLMs) have exhibited strong reasoning capabilities through chain-of-thought mechanisms that generate step-by-step rationales. However, such slow-thinking appro...
- The Language of Bargaining: Linguistic Effects in LLM Negotiations : Abstract: Negotiation is a core component of social intelligence, requiring agents to balance strategic reasoning, cooperation, and social norms. Recent work shows that LLMs can engage in multi-turn n...
- Generalization to Political Beliefs from Fine-Tuning on Sports Team Preferences : Abstract: Fine-tuned LLMs often exhibit unexpected behavior as a result of generalizing beyond the data they're shown. We present results in which an LLM fine-tuned to prefer either coastal sports tea...
- Shadow Unlearning: A Neuro-Semantic Approach to Fidelity-Preserving Faceless Forgetting in LLMs : Abstract: Machine unlearning aims to selectively remove the influence of specific training samples to satisfy privacy regulations such as the GDPR's 'Right to be Forgotten'. However, many existing met...
- Sphinx: Benchmarking and Modeling for LLM-Driven Pull Request Review : Abstract: Pull request (PR) review is essential for ensuring software quality, yet automating this task remains challenging due to noisy supervision, limited contextual understanding, and inadequate e...
- Generative Teaching via Code : Abstract: The scalability of high-quality online education is hindered by the high costs and slow cycles of labor-intensive manual content creation. Despite advancements in video generation, current a...
- LELA: an LLM-based Entity Linking Approach with Zero-Shot Domain Adaptation : Abstract: Entity linking (mapping ambiguous mentions in text to entities in a knowledge base) is a foundational step in tasks such as knowledge graph construction, question-answering, and information ...
- Inside Out: Evolving User-Centric Core Memory Trees for Long-Term Personalized Dialogue Systems : Abstract: Existing long-term personalized dialogue systems struggle to reconcile unbounded interaction streams with finite context constraints, often succumbing to memory noise accumulation, reasoning...
- Reverse-engineering NLI: A study of the meta-inferential properties of Natural Language Inference : Abstract: Natural Language Inference (NLI) has been an important task for evaluating language models for Natural Language Understanding, but the logical properties of the task are poorly understood an...
- DocDancer: Towards Agentic Document-Grounded Information Seeking : Abstract: Document Question Answering (DocQA) focuses on answering questions grounded in given documents, yet existing DocQA agents lack effective tool utilization and largely rely on closed-source mo...
- Agent-as-a-Judge : Abstract: LLM-as-a-Judge has revolutionized AI evaluation by leveraging large language models for scalable assessments. However, as evaluands become increasingly complex, specialized, and multi-step, ...
- How Human is AI? Examining the Impact of Emotional Prompts on Artificial and Human and Responsiveness : Abstract: This research examines how the emotional tone of human-AI interactions shapes ChatGPT and human behavior. In a between-subject experiment, we asked participants to express a specific emotion...
- SemPA: Improving Sentence Embeddings of Large Language Models through Semantic Preference Alignment : Abstract: Traditional sentence embedding methods employ token-level contrastive learning on non-generative pre-trained models. Recently, there have emerged embedding methods based on generative large ...
- ArcAligner: Adaptive Recursive Aligner for Compressed Context Embeddings in RAG : Abstract: Retrieval-Augmented Generation (RAG) helps LLMs stay accurate, but feeding long documents into a prompt makes the model slow and expensive. This has motivated context compression, ranging fr...
- Hán Dān Xué Bù (Mimicry) or Qīng Chū Yú Lán (Mastery)? A Cognitive Perspective on Reasoning Distillation in Large Language Models : Abstract: Recent Large Reasoning Models trained via reinforcement learning exhibit a "natural" alignment with human cognitive costs. However, we show that the prevailing paradigm of reasoning distilla...
- Can Large Language Models Resolve Semantic Discrepancy in Self-Destructive Subcultures? Evidence from Jirai Kei : Abstract: Self-destructive behaviors are linked to complex psychological states and can be challenging to diagnose. These behaviors may be even harder to identify within subcultural groups due to thei...
- Learning from Mistakes: Negative Reasoning Samples Enhance Out-of-Domain Generalization : Abstract: Supervised fine-tuning (SFT) on chain-of-thought (CoT) trajectory demonstrations is a common approach for enabling reasoning in large language models. Standard practices typically only ret...
- Text as a Universal Interface for Transferable Personalization : Abstract: We study the problem of personalization in large language models (LLMs). Prior work predominantly represents user preferences as implicit, model-specific vectors or parameters, yielding opaq...
- A Unified Spoken Language Model with Injected Emotional-Attribution Thinking for Human-like Interaction : Abstract: This paper presents a unified spoken language model for emotional intelligence, enhanced by a novel data construction strategy termed Injected Emotional-Attribution Thinking (IEAT). IEAT inc...
- GenProve: Learning to Generate Text with Fine-Grained Provenance : Abstract: Large language models (LLM) often hallucinate, and while adding citations is a common solution, it is frequently insufficient for accountability as users struggle to verify how a cited sourc...
- Can AI-Generated Persuasion Be Detected? Persuaficial Benchmark and AI vs. Human Linguistic Differences : Abstract: Large Language Models (LLMs) can generate highly persuasive text, raising concerns about their misuse for propaganda, manipulation, and other harmful purposes. This leads us to our central q...
- Faithful Summarisation under Disagreement via Belief-Level Aggregation : Abstract: Opinion and multi-document summarisation often involve genuinely conflicting viewpoints, yet many existing approaches, particularly LLM-based systems, implicitly smooth disagreement and over...
- Mind2Report: A Cognitive Deep Research Agent for Expert-Level Commercial Report Synthesis : Abstract: Synthesizing informative commercial reports from massive and noisy web sources is critical for high-stakes business decisions. Although current deep research agents achieve notable progress,...
- EvolSQL: Structure-Aware Evolution for Scalable Text-to-SQL Data Synthesis : Abstract: Training effective Text-to-SQL models remains challenging due to the scarcity of high-quality, diverse, and structurally complex datasets. Existing methods either rely on limited human-annot...
- A Navigational Approach for Comprehensive RAG via Traversal over Proposition Graphs : Abstract: Standard RAG pipelines based on chunking excel at simple factual retrieval but fail on complex multi-hop queries due to a lack of structural connectivity. Conversely, initial strategies that...
- MisSpans: Fine-Grained False Span Identification in Cross-Domain Fake News : Abstract: Online misinformation is increasingly pervasive, yet most existing benchmarks and methods evaluate veracity at the level of whole claims or paragraphs using coarse binary labels, obscuring h...
- RAAR: Retrieval Augmented Agentic Reasoning for Cross-Domain Misinformation Detection : Abstract: Cross-domain misinformation detection is challenging, as misinformation arises across domains with substantial differences in knowledge and discourse. Existing methods often rely on single-p...
- When AI Settles Down: Late-Stage Stability as a Signature of AI-Generated Text Detection : Abstract: Zero-shot detection methods for AI-generated text typically aggregate token-level statistics across entire sequences, overlooking the temporal dynamics inherent to autoregressive generation....
- Belief in Authority: Impact of Authority in Multi-Agent Evaluation Framework : Abstract: Multi-agent systems utilizing large language models often assign authoritative roles to improve performance, yet the impact of authority bias on agent interactions remains underexplored. We ...
- NC2C: Automated Convexification of Generic Non-Convex Optimization Problems : Abstract: Non-convex optimization problems are pervasive across mathematical programming, engineering design, and scientific computing, often posing intractable challenges for traditional solvers due ...
- LANGSAE EDITING: Improving Multilingual Information Retrieval via Post-hoc Language Identity Removal : Abstract: Dense retrieval in multilingual settings often searches over mixed-language collections, yet multilingual embeddings encode language identity alongside semantics. This language signal can in...
- Revisiting Judge Decoding from First Principles via Training-Free Distributional Divergence : Abstract: Judge Decoding accelerates LLM inference by relaxing the strict verification of Speculative Decoding, yet it typically relies on expensive and noisy supervision. In this work, we revisit thi...
- PILOT-Bench: A Benchmark for Legal Reasoning in the Patent Domain with IRAC-Aligned Classification Tasks : Abstract: The Patent Trial and Appeal Board (PTAB) of the USPTO adjudicates thousands of ex parte appeals each year, requiring the integration of technical understanding and legal reasoning. While lar...
- Tool-MAD: A Multi-Agent Debate Framework for Fact Verification with Diverse Tool Augmentation and Adaptive Retrieval : Abstract: Large Language Models (LLMs) suffer from hallucinations and factual inaccuracies, especially in complex reasoning and fact verification tasks. Multi-Agent Debate (MAD) systems aim to improve...
- RiskAtlas: Exposing Domain-Specific Risks in LLMs through Knowledge-Graph-Guided Harmful Prompt Generation : Abstract: Large language models (LLMs) are increasingly applied in specialized domains such as finance and healthcare, where they introduce unique safety risks. Domain-specific datasets of harmful pro...
- AM$^3$Safety: Towards Data Efficient Alignment of Multi-modal Multi-turn Safety for MLLMs : Abstract: Multi-modal Large Language Models (MLLMs) are increasingly deployed in interactive applications. However, their safety vulnerabilities become pronounced in multi-turn multi-modal scenarios, ...
- Automatic Classifiers Underdetect Emotions Expressed by Men : Abstract: The widespread adoption of automatic sentiment and emotion classifiers makes it important to ensure that these tools perform reliably across different populations. Yet their reliability is t...
- Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking : Abstract: In this report, we introduce the Qwen3-VL-Embedding and Qwen3-VL-Reranker model series, the latest extensions of the Qwen family built on the Qwen3-VL foundation model. Together, they provid...
- Fame Fades, Nature Remains: Disentangling the Character Identity of Role-Playing Agents : Abstract: Despite the rapid proliferation of Role-Playing Agents (RPAs) based on Large Language Models (LLMs), the structural dimensions defining a character's identity remain weakly formalized, often...
- DSC2025 -- ViHallu Challenge: Detecting Hallucination in Vietnamese LLMs : Abstract: The reliability of large language models (LLMs) in production environments remains significantly constrained by their propensity to generate hallucinations -- fluent, plausible-sounding outp...
- PRISM: A Unified Framework for Post-Training LLMs Without Verifiable Rewards : Abstract: Current techniques for post-training Large Language Models (LLMs) rely either on costly human supervision or on external verifiers to boost performance on tasks such as mathematical reasonin...
- Thunder-KoNUBench: A Corpus-Aligned Benchmark for Korean Negation Understanding : Abstract: Although negation is known to challenge large language models (LLMs), benchmarks for evaluating negation understanding, especially in Korean, are scarce. We conduct a corpus-based analysis o...
- See, Explain, and Intervene: A Few-Shot Multimodal Agent Framework for Hateful Meme Moderation : Abstract: In this work, we examine hateful memes from three complementary angles - how to detect them, how to explain their content and how to intervene before they are posted - by applying a rang...
- ToolGate: Contract-Grounded and Verified Tool Execution for LLMs : Abstract: Large Language Models (LLMs) augmented with external tools have demonstrated remarkable capabilities in complex reasoning tasks. However, existing frameworks rely heavily on natural language...
- CRANE: Causal Relevance Analysis of Language-Specific Neurons in Multilingual Large Language Models : Abstract: Multilingual large language models (LLMs) achieve strong performance across languages, yet how language capabilities are organized at the neuron level remains poorly understood. Prior work h...
- SpeechMedAssist: Efficiently and Effectively Adapting Speech Language Models for Medical Consultation : Abstract: Medical consultations are intrinsically speech-centric. However, most prior works focus on long-text-based interactions, which are cumbersome and patient-unfriendly. Recent advances in speec...
- MAGA-Bench: Machine-Augment-Generated Text via Alignment Detection Benchmark : Abstract: Large Language Model (LLM) alignment is constantly evolving. Machine-Generated Text (MGT) is becoming increasingly difficult to distinguish from Human-Written Text (HWT). This has exacerba...
- From National Curricula to Cultural Awareness: Constructing Open-Ended Culture-Specific Question Answering Dataset : Abstract: Large language models (LLMs) achieve strong performance on many tasks, but their progress remains uneven across languages and cultures, often reflecting values latent in English-centric trai...
- Character-R1: Enhancing Role-Aware Reasoning in Role-Playing Agents via RLVR : Abstract: Current role-playing agents (RPAs) are typically constructed by imitating surface-level behaviors, but this approach lacks internal cognitive consistency, often causing out-of-character erro...
- When More Words Say Less: Decoupling Length and Specificity in Image Description Evaluation : Abstract: Vision-language models (VLMs) are increasingly used to make visual content accessible via text-based descriptions. In current systems, however, description specificity is often conflated wit...
- THaLLE-ThaiLLM: Domain-Specialized Small LLMs for Finance and Thai -- Technical Report : Abstract: Large Language Models (LLMs) have demonstrated significant potential across various domains, particularly in banking and finance, where they can automate complex tasks and enhance decision-m...
- Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization : Abstract: Text-to-Visualization (Text2Vis) systems translate natural language queries over tabular data into concise answers and executable visualizations. While closed-source LLMs generate functional...
- FeedEval: Pedagogically Aligned Evaluation of LLM-Generated Essay Feedback : Abstract: Going beyond the prediction of numerical scores, recent research in automated essay scoring has increasingly emphasized the generation of high-quality feedback that provides justification an...
- Identifying Good and Bad Neurons for Task-Level Controllable LLMs : Abstract: Large Language Models have demonstrated remarkable capabilities on multiple-choice question answering benchmarks, but the complex mechanisms underlying their large-scale neurons remain opaqu...
- BanglaLorica: Design and Evaluation of a Robust Watermarking Algorithm for Large Language Models in Bangla Text Generation : Abstract: As large language models (LLMs) are increasingly deployed for text generation, watermarking has become essential for authorship attribution, intellectual property protection, and misuse dete...
- GRACE: Reinforcement Learning for Grounded Response and Abstention under Contextual Evidence : Abstract: Retrieval-Augmented Generation (RAG) integrates external knowledge to enhance Large Language Models (LLMs), yet systems remain susceptible to two critical flaws: providing correct answers wi...
- LinguaGame: A Linguistically Grounded Game-Theoretic Paradigm for Multi-Agent Dialogue Generation : Abstract: Large Language Models (LLMs) have enabled Multi-Agent Systems (MASs) where agents interact through natural language to solve complex tasks or simulate multi-party dialogues. Recent work on L...
- WESR: Scaling and Evaluating Word-level Event-Speech Recognition : Abstract: Speech conveys not only linguistic information but also rich non-verbal vocal events such as laughing and crying. While semantic transcription is well-studied, the precise localization of no...
- Beyond Static Summarization: Proactive Memory Extraction for LLM Agents : Abstract: Memory management is vital for LLM agents to handle long-term interaction and personalization. Most research focuses on how to organize and use memory summary, but often overlooks the initia...
- Users Mispredict Their Own Preferences for AI Writing Assistance : Abstract: Proactive AI writing assistants need to predict when users want drafting help, yet we lack empirical understanding of what drives preferences. Through a factorial vignette study with 50 part...
- Merging Triggers, Breaking Backdoors: Defensive Poisoning for Instruction-Tuned Language Models : Abstract: Large Language Models (LLMs) have greatly advanced Natural Language Processing (NLP), particularly through instruction tuning, which enables broad task generalization without additional fine...
- Learning to Simulate Human Dialogue : Abstract: To predict what someone will say is to model how they think. We study this through next-turn dialogue prediction: given a conversation, predict the next utterance produced by a person. We co...
- Accommodation and Epistemic Vigilance: A Pragmatic Account of Why LLMs Fail to Challenge Harmful Beliefs : Abstract: Large language models (LLMs) frequently fail to challenge users' harmful beliefs in domains ranging from medical advice to social reasoning. We argue that these failures can be understood an...
- Gavel: Agent Meets Checklist for Evaluating LLMs on Long-Context Legal Summarization : Abstract: Large language models (LLMs) now support contexts of up to 1M tokens, but their effectiveness on complex long-context tasks remains unclear. In this paper, we study multi-document legal case...
- Interpreting Transformers Through Attention Head Intervention : Abstract: Neural networks are growing more capable on their own, but we do not understand their neural mechanisms. Understanding these mechanisms' decision-making processes, or mechanistic interpretab...
- ARREST: Adversarial Resilient Regulation Enhancing Safety and Truth in Large Language Models : Abstract: Human cognition, driven by complex neurochemical processes, oscillates between imagination and reality and learns to self-correct whenever such subtle drifts lead to hallucinations or unsafe...
- MiJaBench: Revealing Minority Biases in Large Language Models via Hate Speech Jailbreaking : Abstract: Current safety evaluations of large language models (LLMs) create a dangerous illusion of universality, aggregating "Identity Hate" into scalar scores that mask systemic vulnerabilities agai...
- Dialect Matters: Cross-Lingual ASR Transfer for Low-Resource Indic Language Varieties : Abstract: We conduct an empirical study of cross-lingual transfer using spontaneous, noisy, and code-mixed speech across a wide range of Indic dialects and language varieties. Our results indicate tha...
- RIGOURATE: Quantifying Scientific Exaggeration with Evidence-Aligned Claim Evaluation : Abstract: Scientific rigour tends to be sidelined in favour of bold statements, leading authors to overstate claims beyond what their results support. We present RIGOURATE, a two-stage multimodal fram...
- AnimatedLLM: Explaining LLMs with Interactive Visualizations : Abstract: Large language models (LLMs) are becoming central to natural language processing education, yet materials showing their mechanics are sparse. We present AnimatedLLM, an interactive web appli...
- TrueBrief: Faithful Summarization through Small Language Models : Abstract: Large language models (LLMs) have exhibited remarkable proficiency in generating high-quality text; however, their propensity for producing hallucinations poses a significant challenge for t...
- Qwerty AI: Explainable Automated Age Rating and Content Safety Assessment for Russian-Language Screenplays : Abstract: We present Qwerty AI, an end-to-end system for automated age-rating and content-safety assessment of Russian-language screenplays according to Federal Law No. 436-FZ. The system processes fu...
- Complexity Agnostic Recursive Decomposition of Thoughts : Abstract: Large language models often fail on multi-step reasoning due to fixed reasoning strategies that ignore problem-specific difficulty. We introduce CARD (Complexity Agnostic Recursive Decomposi...
- Leveraging Language Models and RAG for Efficient Knowledge Discovery in Clinical Environments : Abstract: Large language models (LLMs) are increasingly recognized as valuable tools across the medical environment, supporting clinical, research, and administrative workflows. However, strict privac...
- LLMs for Explainable Business Decision-Making: A Reinforcement Learning Fine-Tuning Approach : Abstract: Artificial Intelligence (AI) models increasingly drive high-stakes consumer interactions, yet their decision logic often remains opaque. Prevailing explainable AI techniques rely on post hoc...
- Ideology as a Problem: Lightweight Logit Steering for Annotator-Specific Alignment in Social Media Analysis : Abstract: LLMs internally organize political ideology along low-dimensional structures that are partially, but not fully aligned with human ideological space. This misalignment is systematic, model sp...
- Enhancing Admission Inquiry Responses with Fine-Tuned Models and Retrieval-Augmented Generation : Abstract: University admissions offices face the significant challenge of managing high volumes of inquiries efficiently while maintaining response quality, which critically impacts prospective studen...
- STDD:Spatio-Temporal Dynamics-Driven Token Refinement in Diffusion Language Models : Abstract: Unlike autoregressive language models, diffusion language models (DLMs) generate text by iteratively denoising all token positions in parallel. At each timestep, the remasking strategy of a ...
- Collective Narrative Grounding: Community-Coordinated Data Contributions to Improve Local AI Systems : Abstract: Large language model (LLM) question-answering systems often fail on community-specific queries, creating "knowledge blind spots" that marginalize local voices and reinforce epistemic injusti...
- Attribute-Aware Controlled Product Generation with LLMs for E-commerce : Abstract: Product information extraction is crucial for e-commerce services, but obtaining high-quality labeled datasets remains challenging. We present a systematic approach for generating synthetic ...
- Automatic Construction of Chinese Verb Collostruction Database : Abstract: This paper proposes a fully unsupervised approach to the construction of verb collostruction database for Chinese language, aimed at complementing LLMs by providing explicit and interpretabl...
- RAGVUE: A Diagnostic View for Explainable and Automated Evaluation of Retrieval-Augmented Generation : Abstract: Evaluating Retrieval-Augmented Generation (RAG) systems remains a challenging task: existing metrics often collapse heterogeneous behaviors into single scores and provide little insight into...
- MedPI: Evaluating AI Systems in Medical Patient-facing Interactions : Abstract: We present MedPI, a high-dimensional benchmark for evaluating large language models (LLMs) in patient-clinician conversations. Unlike single-turn question-answer (QA) benchmarks, MedPI evalu...
- Variational decision diagrams for quantum-inspired machine learning applications : Abstract: Decision diagrams (DDs) have emerged as an efficient tool for simulating quantum circuits due to their capacity to exploit data redundancies in quantum states and quantum operations, enablin...
- A Match Made in Heaven? AI-driven Matching of Vulnerabilities and Security Unit Tests : Abstract: Software vulnerabilities are often detected via taint analysis, penetration testing, or fuzzing. They are also found via unit tests that exercise security-sensitive behavior with specific in...
- Surface solar radiation: AI satellite retrieval can outperform Heliosat and generalizes well to other climate zones : Abstract: Accurate estimates of surface solar irradiance (SSI) are essential for solar resource assessments and solar energy forecasts in grid integration and building control applications. SSI estima...
- Fourier Neural Operators for Learning Dynamics in Quantum Spin Systems : Abstract: Fourier Neural Operators (FNOs) excel on tasks using functional data, such as those originating from partial differential equations. Such characteristics render them an effective approach fo...
- Extreme Solar Flare Prediction Using Residual Networks with HMI Magnetograms and Intensitygrams : Abstract: Solar flares, especially C, M, and X class, pose significant risks to satellite operations, communication systems, and power grids. We present a novel approach for predicting extreme solar f...
- Realised Volatility Forecasting: Machine Learning via Financial Word Embedding : Abstract: We examine whether news can improve realised volatility forecasting using a modern yet operationally simple NLP framework. News text is transformed into embedding-based representations, and ...
- OpenEM: Large-scale multi-structural 3D datasets for electromagnetic methods : Abstract: Electromagnetic methods have become one of the most widely used techniques in geological exploration. With the remarkable success of deep learning, applying such techniques to EM methods has...
- Low-rank variational dropout: Rank selection and uncertainty in adapters : Abstract: Low-rank adaptation methods enable efficient task-specific updates in large neural networks, but provide no principled mechanism for uncertainty estimation or capacity control. We introduce ...
- Graph-Dictionary Signal Model for Sparse Representations of Multivariate Data : Abstract: Representing and exploiting multivariate signals requires capturing relations between variables, which we can represent by graphs. Graph dictionaries make it possible to describe complex relational inf...
- Human-in-the-Loop Feature Selection Using Interpretable Kolmogorov-Arnold Network-based Double Deep Q-Network : Abstract: Feature selection is critical for improving the performance and interpretability of machine learning models, particularly in high-dimensional spaces where complex feature interactions can re...
- $\pi_0$: A Vision-Language-Action Flow Model for General Robot Control : Abstract: Robot learning holds tremendous promise to unlock the full potential of flexible, general, and dexterous robot systems, as well as to address some of the deepest questions in artificial inte...
- What Should Embeddings Embed? Autoregressive Models Represent Latent Generating Distributions : Abstract: Autoregressive language models have demonstrated a remarkable ability to extract latent structure from text. The embeddings from large language models have been shown to capture aspects of t...
- GRAPHGINI: Fostering Individual and Group Fairness in Graph Neural Networks : Abstract: Graph Neural Networks (GNNs) have demonstrated impressive performance across various tasks, leading to their increased adoption in high-stakes decision-making systems. However, concerns have...
- Convergence of Sign-based Random Reshuffling Algorithms for Nonconvex Optimization : Abstract: signSGD is popular in nonconvex optimization due to its communication efficiency. Yet, existing analyses typically assume data are sampled with replacement in each iteration, contradicting a...
- GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization : Abstract: As language models become increasingly capable, users expect them to provide not only accurate responses but also behaviors aligned with diverse human preferences across a variety of scenari...
- Measuring and Fostering Peace through Machine Learning and Artificial Intelligence : Abstract: We used machine learning and artificial intelligence: 1) to measure levels of peace in countries from news and social media and 2) to develop on-line tools that promote peace by helping user...
- Stochastic Deep Learning: A Probabilistic Framework for Modeling Uncertainty in Structured Temporal Data : Abstract: I propose a novel framework that integrates stochastic differential equations (SDEs) with deep generative models to improve uncertainty quantification in machine learning applications involv...
- CAOS: Conformal Aggregation of One-Shot Predictors : Abstract: One-shot prediction enables rapid adaptation of pretrained foundation models to new tasks using only one labeled example, but lacks principled uncertainty quantification. While conformal pre...
- Stock Market Price Prediction using Neural Prophet with Deep Neural Network : Abstract: Stock market price prediction is a significant interdisciplinary research domain that lies at the intersection of finance, statistics, and economics. Accurately predicting sto...
- Cutting AI Research Costs: How Task-Aware Compression Makes Large Language Model Agents Affordable : Abstract: When researchers deploy large language models for autonomous tasks like reviewing literature or generating hypotheses, the computational bills add up quickly. A single research session using...
- RelayLLM: Efficient Reasoning via Collaborative Decoding : Abstract: Complex reasoning with Large Language Models (LLMs) is often hindered by high computational costs and latency, while resource-efficient Small Language Models (SLMs) typically lack the necessa...
- Learning Mixture Models via Efficient High-dimensional Sparse Fourier Transforms : Abstract: In this work, we give a ${\rm poly}(d,k)$ time and sample algorithm for efficiently learning the parameters of a mixture of $k$ spherical distributions in $d$ dimensions. Unlike all previous...
- ROOFS: RObust biOmarker Feature Selection : Abstract: Feature selection (FS) is essential for biomarker discovery and in the analysis of biomedical datasets. However, challenges such as high-dimensional feature space, low sample size, multicoll...
- Atlas 2 -- Foundation models for clinical deployment : Abstract: Pathology foundation models substantially advanced the possibilities in computational pathology -- yet tradeoffs in terms of performance, robustness, and computational requirements remained,...
- Neural Algorithmic Reasoning for Approximate $k$-Coloring with Recursive Warm Starts : Abstract: Node coloring is the task of assigning colors to the nodes of a graph such that no two adjacent nodes have the same color, while using as few colors as possible. It is the most widely studie...
- Token-Level LLM Collaboration via FusionRoute : Abstract: Large language models (LLMs) exhibit strengths across diverse domains. However, achieving strong performance across these domains with a single general-purpose model typically requires scali...
- Code-Mix Sentiment Analysis on Hinglish Tweets : Abstract: The effectiveness of brand monitoring in India is increasingly challenged by the rise of Hinglish--a hybrid of Hindi and English--used widely in user-generated content on platforms like Twit...
- Quantitative mapping from conventional MRI using self-supervised physics-guided deep learning: applications to a large-scale, clinically heterogeneous dataset : Abstract: Magnetic resonance imaging (MRI) is a cornerstone of clinical neuroimaging, yet conventional MRIs provide qualitative information heavily dependent on scanner hardware and acquisition settin...
- Compositional Steering of Large Language Models with Steering Tokens : Abstract: Deploying LLMs in real-world applications requires controllable output that satisfies multiple desiderata at the same time. While existing work extensively addresses LLM steering for a singl...
- From Understanding to Engagement: Personalized pharmacy Video Clips via Vision Language Models (VLMs) : Abstract: Vision Language Models (VLMs) are poised to revolutionize the digital transformation of the pharmaceutical industry by enabling intelligent, scalable, and automated multi-modality content proc...
- Challenges and Research Directions for Large Language Model Inference Hardware : Abstract: Large Language Model (LLM) inference is hard. The autoregressive Decode phase of the underlying Transformer model makes LLM inference fundamentally different from training. Exacerbated by re...
- Exponential capacity scaling of classical GANs compared to hybrid latent style-based quantum GANs : Abstract: Quantum generative modeling is a very active area of research in looking for practical advantage in data analysis. Quantum generative adversarial networks (QGANs) are leading candidates for ...
- Leveraging Prediction Entropy for Automatic Prompt Weighting in Zero-Shot Audio-Language Classification : Abstract: Audio-language models have recently demonstrated strong zero-shot capabilities by leveraging natural-language supervision to classify audio events without labeled training data. Yet, their p...
- Rotation-Robust Regression with Convolutional Model Trees : Abstract: We study rotation-robust learning for image inputs using Convolutional Model Trees (CMTs) [1], whose split and leaf coefficients can be structured on the image grid and transformed geometric...
- V-FAT: Benchmarking Visual Fidelity Against Text-bias : Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated impressive performance on standard visual reasoning benchmarks. However, there is growing concern that these...
- Scaling Vision Language Models for Pharmaceutical Long Form Video Reasoning on Industrial GenAI Platform : Abstract: Vision Language Models (VLMs) have shown strong performance on multimodal reasoning tasks, yet most evaluations focus on short videos and assume unconstrained computational resources. In ind...
- CuMA: Aligning LLMs with Sparse Cultural Values via Demographic-Aware Mixture of Adapters : Abstract: As Large Language Models (LLMs) serve a global audience, alignment must transition from enforcing universal consensus to respecting cultural pluralism. We demonstrate that dense models, when...
- Higher-Order Knowledge Representations for Agentic Scientific Reasoning : Abstract: Scientific inquiry requires systems-level reasoning that integrates heterogeneous experimental data, cross-domain knowledge, and mechanistic evidence into coherent explanations. While Large ...
- Gradient-based Optimisation of Modulation Effects : Abstract: Modulation effects such as phasers, flangers and chorus effects are heavily used in conjunction with the electric guitar. Machine learning based emulation of analog modulation units has been...
- Token Maturation: Autoregressive Language Generation via Continuous Token Dynamics : Abstract: Autoregressive language models are conventionally defined over discrete token sequences, committing to a specific token at every generation step. This early discretization forces uncertainty...
- Illumination Angular Spectrum Encoding for Controlling the Functionality of Diffractive Networks : Abstract: Diffractive neural networks have recently emerged as a promising framework for all-optical computing. However, these networks are typically trained for a single task, limiting their potentia...
- Comparison of Maximum Likelihood Classification Before and After Applying Weierstrass Transform : Abstract: The aim of this paper is to use Maximum Likelihood (ML) Classification on multispectral data by means of qualitative and quantitative approaches. Maximum Likelihood is a supervised classific...
- MPM-LLM4DSE: Reaching the Pareto Frontier in HLS with Multimodal Learning and LLM-Driven Exploration : Abstract: High-Level Synthesis (HLS) design space exploration (DSE) seeks Pareto-optimal designs within expansive pragma configuration spaces. To accelerate HLS DSE, graph neural networks (GNNs) are c...
- Measurement-Consistent Langevin Corrector: A Remedy for Latent Diffusion Inverse Solvers : Abstract: With recent advances in generative models, diffusion models have emerged as powerful priors for solving inverse problems in each domain. Since Latent Diffusion Models (LDMs) provide generic ...
- Differential syntactic and semantic encoding in LLMs : Abstract: We study how syntactic and semantic information is encoded in inner layer representations of Large Language Models (LLMs), focusing on the very large DeepSeek-V3. We find that, by averaging ...
- The Role of Quantum in Hybrid Quantum-Classical Neural Networks: A Realistic Assessment : Abstract: Quantum machine learning has emerged as a promising application domain for near-term quantum hardware, particularly through hybrid quantum-classical models that leverage both classical and q...
- Prior-Informed Zeroth-Order Optimization with Adaptive Direction Alignment for Memory-Efficient LLM Fine-Tuning : Abstract: Fine-tuning large language models (LLMs) has achieved remarkable success across various NLP tasks, but the substantial memory overhead during backpropagation remains a critical bottleneck, e...
- TourPlanner: A Competitive Consensus Framework with Constraint-Gated Reinforcement Learning for Travel Planning : Abstract: Travel planning is a sophisticated decision-making process that requires synthesizing multifaceted information to construct itineraries. However, existing travel planning approaches face sev...
- Tape: A Cellular Automata Benchmark for Evaluating Rule-Shift Generalization in Reinforcement Learning : Abstract: We present Tape, a controlled reinforcement-learning benchmark designed to isolate out-of-distribution (OOD) failure under latent rule shifts. Tape is derived from one-dimensional cellular au...
- Mechanism Design for Federated Learning with Non-Monotonic Network Effects : Abstract: Mechanism design is pivotal to federated learning (FL) for maximizing social welfare by coordinating self-interested clients. Existing mechanisms, however, often overlook the network effects...
- Succeeding at Scale: Automated Multi-Retriever Fusion and Query-Side Adaptation for Multi-Tenant Search : Abstract: Large-scale multi-tenant retrieval systems amass vast user query logs yet critically lack the curated relevance labels required for effective domain adaptation. This "dark data" problem is e...
- DP-MGTD: Privacy-Preserving Machine-Generated Text Detection via Adaptive Differentially Private Entity Sanitization : Abstract: The deployment of Machine-Generated Text (MGT) detection systems necessitates processing sensitive user data, creating a fundamental conflict between authorship verification and privacy pres...
- Crystal Generation using the Fully Differentiable Pipeline and Latent Space Optimization : Abstract: We present a materials generation framework that couples a symmetry-conditioned variational autoencoder (CVAE) with a differentiable SO(3) power spectrum objective to steer candidates toward...
- On the Limitations of Rank-One Model Editing in Answering Multi-hop Questions : Abstract: Recent advances in Knowledge Editing (KE), particularly Rank-One Model Editing (ROME), show superior efficiency over fine-tuning and in-context learning for updating single-hop facts in tran...
- Sci-Reasoning: A Dataset Decoding AI Innovation Patterns : Abstract: While AI innovation accelerates rapidly, the intellectual process behind breakthroughs -- how researchers identify gaps, synthesize prior work, and generate insights -- remains poorly unders...
- Neurosymbolic Retrievers for Retrieval-augmented Generation : Abstract: Retrieval Augmented Generation (RAG) has made significant strides in overcoming key limitations of large language models, such as hallucination, lack of contextual grounding, and issues with...
- Paradoxical noise preference in RNNs : Abstract: In recurrent neural networks (RNNs) used to model biological neural networks, noise is typically introduced during training to emulate biological variability and regularize learning. The exp...
- Integrating Distribution Matching into Semi-Supervised Contrastive Learning for Labeled and Unlabeled Data : Abstract: The advancement of deep learning has greatly improved supervised image classification. However, labeling data is costly, prompting research into unsupervised learning methods such as contras...
- Bridging Distance and Spectral Positional Encodings via Anchor-Based Diffusion Geometry Approximation : Abstract: Molecular graph learning benefits from positional signals that capture both local neighborhoods and global topology. Two widely used families are spectral encodings derived from Laplacian or...
- Multiagent Reinforcement Learning with Neighbor Action Estimation : Abstract: Multiagent reinforcement learning, as a prominent intelligent paradigm, enables collaborative decision-making within complex systems. However, existing approaches often rely on explicit acti...
- Towards Spatio-Temporal Extrapolation of Phase-Field Simulations with Convolution-Only Neural Networks : Abstract: Phase-field simulations of liquid metal dealloying (LMD) can capture complex microstructural evolutions but can be prohibitively expensive for large domains and long time horizons. In this p...
- The Minary Primitive of Computational Autopoiesis : Abstract: We introduce Minary, a computational framework designed as a candidate for the first formally provable autopoietic primitive. Minary represents interacting probabilistic events as multi-dime...
- Prediction of Cellular Malignancy Using Electrical Impedance Signatures and Supervised Machine Learning : Abstract: Bioelectrical properties of cells such as relative permittivity, conductivity, and characteristic time constants vary significantly between healthy and malignant cells across different frequ...
- Convergence Rates for Learning Pseudo-Differential Operators : Abstract: This paper establishes convergence rates for learning elliptic pseudo-differential operators, a fundamental operator class in partial differential equations and mathematical physics. In a wa...
- SampoNLP: A Self-Referential Toolkit for Morphological Analysis of Subword Tokenizers : Abstract: The quality of subword tokenization is critical for Large Language Models, yet evaluating tokenizers for morphologically rich Uralic languages is hampered by the lack of clean morpheme lexic...
- Concept Tokens: Learning Behavioral Embeddings Through Concept Definitions : Abstract: We propose Concept Tokens, a lightweight method that adds a new special token to a pretrained LLM and learns only its embedding from multiple natural language definitions of a target concept...
- Re-Rankers as Relevance Judges : Abstract: Using large language models (LLMs) to predict relevance judgments has shown promising results. Most studies treat this task as a distinct research line, e.g., focusing on prompt design for p...
- SpectraFormer: an Attention-Based Raman Unmixing Tool for Accessing the Graphene Buffer-Layer Signature on SiC : Abstract: Raman spectroscopy is a key tool for graphene characterization, yet its application to graphene grown on silicon carbide (SiC) is strongly limited by the intense and variable second-order Ra...
- Large Language Models for Detecting Cyberattacks on Smart Grid Protective Relays : Abstract: This paper presents a large language model (LLM)-based framework for detecting cyberattacks on transformer current differential relays (TCDRs), which, if undetected, may trigger false trippi...
- Learning Multinomial Logits in $O(n \log n)$ time : Abstract: A Multinomial Logit (MNL) model is composed of a finite universe of items $[n]=\{1,\dots,n\}$, each assigned a positive weight. A query specifies an admissible subset -- called a slate -- and...
- Transformer-based Multi-agent Reinforcement Learning for Separation Assurance in Structured and Unstructured Airspaces : Abstract: Conventional optimization-based metering depends on strict adherence to precomputed schedules, which limits the flexibility required for the stochastic operations of Advanced Air Mobility (A...
- Disco-RAG: Discourse-Aware Retrieval-Augmented Generation : Abstract: Retrieval-Augmented Generation (RAG) has emerged as an important means of enhancing the performance of large language models (LLMs) in knowledge-intensive tasks. However, most existing RAG s...
- Comparative Analysis of Custom CNN Architectures versus Pre-trained Models and Transfer Learning: A Study on Five Bangladesh Datasets : Abstract: This study presents a comprehensive comparative analysis of custom-built Convolutional Neural Networks (CNNs) against popular pre-trained architectures (ResNet-18 and VGG-16) using both feat...
- Correct and Weight: A Simple Yet Effective Loss for Implicit Feedback Recommendation : Abstract: Learning from implicit feedback has become the standard paradigm for modern recommender systems. However, this setting is fraught with the persistent challenge of false negatives, where unob...
- Human-in-the-Loop Testing of AI Agents for Air Traffic Control with a Regulated Assessment Framework : Abstract: We present a rigorous, human-in-the-loop evaluation framework for assessing the performance of AI agents on the task of Air Traffic Control, grounded in a regulator-certified simulator-based...
- A Future Capabilities Agent for Tactical Air Traffic Control : Abstract: Escalating air traffic demand is driving the adoption of automation to support air traffic controllers, but existing approaches face a trade-off between safety assurance and interpretability...
- From Domains to Instances: Dual-Granularity Data Synthesis for LLM Unlearning : Abstract: Although machine unlearning is essential for removing private, harmful, or copyrighted content from LLMs, current benchmarks often fail to faithfully represent the true "forgetting scope" le...
- Systems Explaining Systems: A Framework for Intelligence and Consciousness : Abstract: This paper proposes a conceptual framework in which intelligence and consciousness emerge from relational structure rather than from prediction or domain-specific mechanisms. Intelligence is...
- State Backdoor: Towards Stealthy Real-world Poisoning Attack on Vision-Language-Action Model in State Space : Abstract: Vision-Language-Action (VLA) models are widely deployed in safety-critical embodied AI applications such as robotics. However, their complex multimodal interactions also expose new security ...
- Towards a Mechanistic Understanding of Propositional Logical Reasoning in Large Language Models : Abstract: Understanding how Large Language Models (LLMs) perform logical reasoning internally remains a fundamental challenge. While prior mechanistic studies focus on identifying task-specific circuit...
- Scaling Trends for Multi-Hop Contextual Reasoning in Mid-Scale Language Models : Abstract: We present a controlled study of multi-hop contextual reasoning in large language models, providing a clean demonstration of the task-method dissociation: rule-based pattern matching achieve...
- SAGE-32B: Agentic Reasoning via Iterative Distillation : Abstract: We demonstrate SAGE-32B, a 32 billion parameter language model that focuses on agentic reasoning and long range planning tasks. Unlike chat models that aim for general conversation fluency, ...
- Automated Reproducibility Has a Problem Statement Problem : Abstract: Background. Reproducibility is essential to the scientific method, but reproduction is often a laborious task. Recent works have attempted to automate this process and relieve researchers of...
- Beyond Interaction Effects: Two Logics for Studying Population Inequalities : Abstract: When sociologists and other social scientists ask whether the return to college differs by race and gender, they face a choice between two fundamentally different modes of inquiry. Traditiona...
- FronTalk: Benchmarking Front-End Development as Conversational Code Generation with Multi-Modal Feedback : Abstract: We present FronTalk, a benchmark for front-end code generation that pioneers the study of a unique interaction dynamic: conversational code generation with multi-modal feedback. In front-end...
- TeleTables: A Benchmark for Large Language Models in Telecom Table Interpretation : Abstract: Large Language Models (LLMs) are increasingly explored in the telecom industry to support engineering tasks, accelerate troubleshooting, and assist in interpreting complex technical documents. How...
- Optimal Lower Bounds for Online Multicalibration : Abstract: We prove tight lower bounds for online multicalibration, establishing an information-theoretic separation from marginal calibration. In the general setting where group functions can depend...
- Robust Reasoning as a Symmetry-Protected Topological Phase : Abstract: Large language models suffer from "hallucinations": logical inconsistencies induced by semantic noise. We propose that current architectures operate in a "Metric Phase," where causal order is...
- EARL: Energy-Aware Optimization of Liquid State Machines for Pervasive AI : Abstract: Pervasive AI increasingly depends on on-device learning systems that deliver low-latency and energy-efficient computation under strict resource constraints. Liquid State Machines (LSMs) offe...
- An interpretable data-driven approach to optimizing clinical fall risk assessment : Abstract: In this study, we aim to better align fall risk prediction from the Johns Hopkins Fall Risk Assessment Tool (JHFRAT) with additional clinically meaningful measures via a data-driven modellin...
- FaST: Efficient and Effective Long-Horizon Forecasting for Large-Scale Spatial-Temporal Graphs via Mixture-of-Experts : Abstract: Spatial-Temporal Graph (STG) forecasting on large-scale networks has garnered significant attention. However, existing models predominantly focus on short-horizon predictions and suffer from...
- Safe Continual Reinforcement Learning Methods for Nonstationary Environments. Towards a Survey of the State of the Art : Abstract: This work provides a state-of-the-art survey of continual safe online reinforcement learning (COSRL) methods. We discuss theoretical aspects, challenges, and open questions in building conti...
- Sequential Subspace Noise Injection Prevents Accuracy Collapse in Certified Unlearning : Abstract: Certified unlearning based on differential privacy offers strong guarantees but remains largely impractical: the noisy fine-tuning approaches proposed so far achieve these guarantees but sev...
- Exploring Student Expectations and Confidence in Learning Analytics : Abstract: Learning Analytics (LA) is nowadays ubiquitous in many educational systems, providing the ability to collect and analyze student data in order to understand and optimize learning and the env...
- Milestones over Outcome: Unlocking Geometric Reasoning with Sub-Goal Verifiable Reward : Abstract: Multimodal Large Language Models (MLLMs) struggle with complex geometric reasoning, largely because "black box" outcome-based supervision fails to distinguish between lucky guesses and rigor...
- DeepWeightFlow: Re-Basined Flow Matching for Generating Neural Network Weights : Abstract: Building efficient and effective generative models for neural network weights has been a research focus of significant interest that faces challenges posed by the high-dimensional weight spa...
- A Data-Driven Predictive Framework for Inventory Optimization Using Context-Augmented Machine Learning Models : Abstract: Demand forecasting in supply chain management (SCM) is critical for optimizing inventory, reducing waste, and improving customer satisfaction. Conventional approaches frequently neglect exte...
- Approximate equivariance via projection-based regularisation : Abstract: Equivariance is a powerful inductive bias in neural networks, improving generalisation and physical consistency. Recently, however, non-equivariant models have regained attention, due to the...
- HMVI: Unifying Heterogeneous Attributes with Natural Neighbors for Missing Value Inference : Abstract: Missing value imputation is a fundamental challenge in machine intelligence, heavily dependent on data completeness. Current imputation methods often handle numerical and categorical attribu...
- On the Hidden Objective Biases of Group-based Reinforcement Learning : Abstract: Group-based reinforcement learning methods, like Group Relative Policy Optimization (GRPO), are widely used nowadays to post-train large language models. Despite their empirical success, the...
- On the Definition and Detection of Cherry-Picking in Counterfactual Explanations : Abstract: Counterfactual explanations are widely used to communicate how inputs must change for a model to alter its prediction. For a single instance, many valid counterfactuals can exist, which leav...
- Precision over Diversity: High-Precision Reward Generalizes to Robust Instruction Following : Abstract: A central belief in scaling reinforcement learning with verifiable rewards for instruction following (IF) tasks is that a diverse mixture of verifiable hard and unverifiable soft constraint...
- Cardinality augmented loss functions : Abstract: Class imbalance is a common and pernicious issue for the training of neural networks. Often, an imbalanced majority class can dominate training to skew classifier performance towards the maj...
- Distributed Online Convex Optimization with Efficient Communication: Improved Algorithm and Lower bounds : Abstract: We investigate distributed online convex optimization with compressed communication, where $n$ learners connected by a network collaboratively minimize a sequence of global loss functions us...
- Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers : Abstract: Applying weight decay (WD) to matrix layers is standard practice in large-language-model pretraining. Prior work suggests that stochastic gradient noise induces a Brownian-like expansion of ...
- FibreCastML: An Open Web Platform for Predicting Electrospun Nanofibre Diameter Distributions : Abstract: Electrospinning is a scalable technique for producing fibrous scaffolds with tunable micro- and nanoscale architectures for applications in tissue engineering, drug delivery, and wound care....
- Rethinking GNNs and Missing Features: Challenges, Evaluation and a Robust Solution : Abstract: Handling missing node features is a key challenge for deploying Graph Neural Networks (GNNs) in real-world domains such as healthcare and sensor networks. Existing studies mostly address rel...
- Parallelizing Node-Level Explainability in Graph Neural Networks : Abstract: Graph Neural Networks (GNNs) have demonstrated remarkable performance in a wide range of tasks, such as node classification, link prediction, and graph classification, by exploiting the stru...
- Neural-Symbolic Integration with Evolvable Policies : Abstract: Neural-Symbolic (NeSy) Artificial Intelligence has emerged as a promising approach for combining the learning capabilities of neural networks with the interpretable reasoning of symbolic sys...
- AgentOCR: Reimagining Agent History via Optical Self-Compression : Abstract: Recent advances in large language models (LLMs) enable agentic systems trained with reinforcement learning (RL) over multi-turn interaction trajectories, but practical deployment is bottlene...
- Smart IoT-Based Wearable Device for Detection and Monitoring of Common Cow Diseases Using a Novel Machine Learning Technique : Abstract: Manual observation and monitoring of individual cows for disease detection present significant challenges in large-scale farming operations, as the process is labor-intensive, time-consuming...
- Intraday spatiotemporal PV power prediction at national scale using satellite-based solar forecast models : Abstract: We present a novel framework for spatiotemporal photovoltaic (PV) power forecasting and use it to evaluate the reliability, sharpness, and overall performance of seven intraday PV power nowc...
- Fast Mining and Dynamic Time-to-Event Prediction over Multi-sensor Data Streams : Abstract: Given real-time sensor data streams obtained from machines, how can we continuously predict when a machine failure will occur? This work aims to continuously forecast the timing of future ev...
- Excess Description Length of Learning Generalizable Predictors : Abstract: Understanding whether fine-tuning elicits latent capabilities or teaches new ones is a fundamental question for language model evaluation and safety. We develop a formal information-theoreti...
- GPU-Accelerated INT8 Quantization for KV Cache Compression in Large Language Models : Abstract: The key-value (KV) cache in large language models presents a significant memory bottleneck during inference, growing linearly with sequence length and often exceeding the memory footprint of...
- MQ-GNN: A Multi-Queue Pipelined Architecture for Scalable and Efficient GNN Training : Abstract: Graph Neural Networks (GNNs) are powerful tools for learning graph-structured data, but their scalability is hindered by inefficient mini-batch generation, data transfer bottlenecks, and cos...
- A zone-based training approach for last-mile routing using Graph Neural Networks and Pointer Networks : Abstract: Rapid e-commerce growth has pushed last-mile delivery networks to their limits, where small routing gains translate into lower costs, faster service, and fewer emissions. Classical heuristic...
- Do LLMs Benefit from User and Item Embeddings in Recommendation Tasks? : Abstract: Large Language Models (LLMs) have emerged as promising recommendation systems, offering novel ways to model user preferences through generative approaches. However, many existing methods oft...
- Nightmare Dreamer: Dreaming About Unsafe States And Planning Ahead : Abstract: Reinforcement Learning (RL) has shown remarkable success in real-world applications, particularly in robotics control. However, RL adoption remains limited due to insufficient safety guarant...
- Estimating Causal Effects in Gaussian Linear SCMs with Finite Data : Abstract: Estimating causal effects from observational data remains a fundamental challenge in causal inference, especially in the presence of latent confounders. This paper focuses on estimating caus...
- Learning Dynamics in RL Post-Training for Language Models : Abstract: Reinforcement learning (RL) post-training is a critical stage in modern language model development, playing a key role in improving alignment and reasoning ability. However, several phenomen...
- DeepHalo: A Neural Choice Model with Controllable Context Effects : Abstract: Modeling human decision-making is central to applications such as recommendation, preference learning, and human-AI alignment. While many classic models assume context-independent choice beh...
- Density Matrix RNN (DM-RNN): A Quantum Information Theoretic Framework for Modeling Musical Context and Polyphony : Abstract: Classical Recurrent Neural Networks (RNNs) summarize musical context into a deterministic hidden state vector, imposing an information bottleneck that fails to capture the inherent ambiguity...
- FedKDX: Federated Learning with Negative Knowledge Distillation for Enhanced Healthcare AI Systems : Abstract: This paper introduces FedKDX, a federated learning framework that addresses limitations in healthcare AI through Negative Knowledge Distillation (NKD). Unlike existing approaches that focus ...
- Spatial-Temporal Feedback Diffusion Guidance for Controlled Traffic Imputation : Abstract: Imputing missing values in spatial-temporal traffic data is essential for intelligent transportation systems. Among advanced imputation methods, score-based diffusion models have demonstrate...
- A Vision for Multisensory Intelligence: Sensing, Synergy, and Science : Abstract: Our experience of the world is multisensory, spanning a synthesis of language, sight, sound, touch, taste, and smell. Yet, artificial intelligence has primarily advanced in digital modalitie...
- Improving Semi-Supervised Contrastive Learning via Entropy-Weighted Confidence Integration of Anchor-Positive Pairs : Abstract: Conventional semi-supervised contrastive learning methods assign pseudo-labels only to samples whose highest predicted class probability exceeds a predefined threshold, and then perform supe...
- GEnSHIN: Graphical Enhanced Spatio-temporal Hierarchical Inference Network for Traffic Flow Prediction : Abstract: With the acceleration of urbanization, intelligent transportation systems have an increasing demand for accurate traffic flow prediction. This paper proposes a novel Graph Enhanced Spatio-te...
- Timeliness-Oriented Scheduling and Resource Allocation in Multi-Region Collaborative Perception : Abstract: Collaborative perception (CP) is a critical technology in applications like autonomous driving and smart cities. It involves the sharing and fusion of information among sensors to overcome t...
- Not All Steps are Informative: On the Linearity of LLMs' RLVR Training : Abstract: Reinforcement learning with verifiable rewards (RLVR) has become a central component of large language model (LLM) post-training. Unlike supervised fine-tuning (SFT), RLVR lets an LLM genera...
- TSSR: Two-Stage Swap-Reward-Driven Reinforcement Learning for Character-Level SMILES Generation : Abstract: The design of reliable, valid, and diverse molecules is fundamental to modern drug discovery, as improved molecular generation supports efficient exploration of the chemical space for potent...
- Surface-based Molecular Design with Multi-modal Flow Matching : Abstract: Therapeutic peptides show promise in targeting previously undruggable binding sites, with recent advancements in deep generative models enabling full-atom peptide co-design for specific prot...
- IGenBench: Benchmarking the Reliability of Text-to-Infographic Generation : Abstract: Infographics are composite visual artifacts that combine data visualizations with textual and illustrative elements to communicate information. While recent text-to-image (T2I) models can ge...
- Hybrid Federated Learning for Noise-Robust Training : Abstract: Federated learning (FL) and federated distillation (FD) are distributed learning paradigms that train UE models with enhanced privacy, each offering different trade-offs between noise robust...
- When Models Manipulate Manifolds: The Geometry of a Counting Task : Abstract: Language models can perceive visual properties of text despite receiving only sequences of tokens. We mechanistically investigate how Claude 3.5 Haiku accomplishes one such task: linebreaking...
- Meta-probabilistic Modeling : Abstract: While probabilistic graphical models can discover latent structure in data, their effectiveness hinges on choosing well-specified models. Identifying such models is challenging in practice, ...
- Using Large Language Models to Detect Socially Shared Regulation of Collaborative Learning : Abstract: The field of learning analytics has made notable strides in automating the detection of complex learning processes in multimodal data. However, most advancements have focused on individualiz...
- Explainable Admission-Level Predictive Modeling for Prolonged Hospital Stay in Elderly Populations: Challenges in Low- and Middle-Income Countries : Abstract: Prolonged length of stay (pLoS) is a significant factor associated with the risk of adverse in-hospital events. We develop and explain a predictive model for pLoS using admission-level patie...
- When Predictions Shape Reality: A Socio-Technical Synthesis of Performative Predictions in Machine Learning : Abstract: Machine learning models are increasingly used in high-stakes domains where their predictions can actively shape the environments in which they operate, a phenomenon known as performative pre...
- Improving and Accelerating Offline RL in Large Discrete Action Spaces with Structured Policy Initialization : Abstract: Reinforcement learning in discrete combinatorial action spaces requires searching over exponentially many joint actions to simultaneously select multiple sub-actions that form coherent combi...
- Distribution-Guided and Constrained Quantum Machine Unlearning : Abstract: Machine unlearning aims to remove the influence of specific training data from a learned model without full retraining. While recent work has begun to explore unlearning in quantum machine l...
- Rate or Fate? RLV$^\varepsilon$R: Reinforcement Learning with Verifiable Noisy Rewards : Abstract: Reinforcement learning with verifiable rewards (RLVR) is a simple but powerful paradigm for training LLMs: sample a completion, verify it, and update. In practice, however, the verifier is a...
- Enhanced-FQL($\lambda$), an Efficient and Interpretable RL with novel Fuzzy Eligibility Traces and Segmented Experience Replay : Abstract: This paper introduces a fuzzy reinforcement learning framework, Enhanced-FQL($\lambda$), that integrates novel Fuzzified Eligibility Traces (FET) and Segmented Experience Replay (SER) into fuzzy Q...
- Aligned explanations in neural networks : Abstract: Feature attribution is the dominant paradigm for explaining deep neural networks. However, most existing methods only loosely reflect the model's prediction-making process, thereby merely wh...
- Machine Learning Model for Sparse PCM Completion : Abstract: In this paper, we propose a machine learning model for sparse pairwise comparison matrices (PCMs), combining classical PCM approaches with graph-based learning techniques. Numerical results ...
- Survival Dynamics of Neural and Programmatic Policies in Evolutionary Reinforcement Learning : Abstract: In evolutionary reinforcement learning tasks (ERL), agent policies are often encoded as small artificial neural networks (NERL). Such representations lack explicit modular structure, limitin...
- Phasor Agents: Oscillatory Graphs with Three-Factor Plasticity and Sleep-Staged Learning : Abstract: Phasor Agents are dynamical systems whose internal state is a Phasor Graph: a weighted graph of coupled Stuart-Landau oscillators. A Stuart-Landau oscillator is a minimal stable "rhythm gene...
- Causally-Aware Information Bottleneck for Domain Adaptation : Abstract: We tackle a common domain adaptation setting in causal systems. In this setting, the target variable is observed in the source domain but is entirely missing in the target domain. We aim to ...
- Quantifying the Effect of Test Set Contamination on Generative Evaluations : Abstract: As frontier AI systems are pretrained on web-scale data, test set contamination has become a critical concern for accurately assessing their capabilities. While research has thoroughly inves...
- Transformer-Based Multi-Modal Temporal Embeddings for Explainable Metabolic Phenotyping in Type 1 Diabetes : Abstract: Type 1 diabetes (T1D) is a highly metabolically heterogeneous disease that cannot be adequately characterized by conventional biomarkers such as glycated hemoglobin (HbA1c). This study propo...
- ArtCognition: A Multimodal AI Framework for Affective State Sensing from Visual and Kinematic Drawing Cues : Abstract: The objective assessment of human affective and psychological states presents a significant challenge, particularly through non-verbal channels. This paper introduces digital drawing as a ri...
- Online Action-Stacking Improves Reinforcement Learning Performance for Air Traffic Control : Abstract: We introduce online action-stacking, an inference-time wrapper for reinforcement learning policies that produces realistic air traffic control commands while allowing training on a much smal...
- Enhancing Robustness of Asynchronous EEG-Based Movement Prediction using Classifier Ensembles : Abstract: Objective: Stroke is one of the leading causes of disability. One promising approach is to extend the rehabilitation with self-initiated robot-assisted movement therapy. To enable this, it...
- Mitigating Position-Shift Failures in Text-Based Modular Arithmetic via Position Curriculum and Template Diversity : Abstract: Building on insights from the grokking literature, we study character-level Transformers trained to compute modular addition from text, and focus on robustness under input-format variation r...
- LEGATO: Good Identity Unlearning Is Continuous : Abstract: Machine unlearning has come to play a crucial role in enabling generative models trained on large datasets to remove sensitive, private, or copyright-protected data. However, existing machine unle...
- Generation of synthetic delay time series for air transport applications : Abstract: The generation of synthetic data is receiving increasing attention from the scientific community, thanks to its ability to solve problems like data scarcity and privacy, and is starting to f...
- Unlocking the Pre-Trained Model as a Dual-Alignment Calibrator for Post-Trained LLMs : Abstract: Post-training improves large language models (LLMs) but often worsens confidence calibration, leading to systematic overconfidence. Recent unsupervised post-hoc methods for post-trained LMs ...
- Predictable Gradient Manifolds in Deep Learning: Temporal Path-Length and Intrinsic Rank as a Complexity Regime : Abstract: Deep learning optimization exhibits structure that is not captured by worst-case gradient bounds. Empirically, gradients along training trajectories are often temporally predictable and evol...
- Making Tunable Parameters State-Dependent in Weather and Climate Models with Reinforcement Learning : Abstract: Weather and climate models rely on parametrisations to represent unresolved sub-grid processes. Traditional schemes rely on fixed coefficients that are weakly constrained and tuned offline, ...
- MemKD: Memory-Discrepancy Knowledge Distillation for Efficient Time Series Classification : Abstract: Deep learning models, particularly recurrent neural networks and their variants, such as long short-term memory, have significantly advanced time series data analysis. These models capture c...
- Learning to Reason: Temporal Saliency Distillation for Interpretable Knowledge Transfer : Abstract: Knowledge distillation has proven effective for model compression by transferring knowledge from a larger network called the teacher to a smaller network called the student. Current knowledg...
- Safety-Utility Conflicts Are Not Global: Surgical Alignment via Head-Level Diagnosis : Abstract: Safety alignment in Large Language Models (LLMs) inherently presents a multi-objective optimization conflict, often accompanied by an unintended degradation of general capabilities. Existing...
- Green MLOps: Closed-Loop, Energy-Aware Inference with NVIDIA Triton, FastAPI, and Bio-Inspired Thresholding : Abstract: Energy efficiency is a first-order concern in AI deployment, as long-running inference can exceed training in cumulative carbon impact. We propose a bio-inspired framework that maps protein-...
- The Forgotten Shield: Safety Grafting in Parameter-Space for Medical MLLMs : Abstract: Medical Multimodal Large Language Models (Medical MLLMs) have achieved remarkable progress in specialized medical tasks; however, research into their safety has lagged, posing potential risk...
Research Sources: 348 | Generated: 1/9/2026
