AI Research News Feeds for December 31st, 2025

AI RESEARCH PAPERS & ACADEMIC SOURCES

Never-Ending Behavior-Cloning Agent for Robotic Manipulation : Abstract: Relying on multi-modal observations, embodied robots (e.g., humanoid robots) could perform multiple robotic manipulation tasks in unstructured real-world environments. However, most language...
Multi-agent Self-triage System with Medical Flowcharts : Abstract: Online health resources and large language models (LLMs) are increasingly used as a first point of contact for medical decision-making, yet their reliability in healthcare remains limited by...
Learnable WSN Deployment of Evidential Collaborative Sensing Model : Abstract: In wireless sensor networks (WSNs), coverage and deployment are two most crucial issues when conducting detection tasks. However, the detection information collected from sensors is oftentim...
TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage : Abstract: With recent advancements in natural language processing, Large Language Models (LLMs) have emerged as powerful tools for various real-world applications. Despite their prowess, the intrinsic...
Toward Trustworthy Agentic AI: A Multimodal Framework for Preventing Prompt Injection Attacks : Abstract: Powerful autonomous systems, which reason, plan, and converse using and between numerous tools and agents, are made possible by Large Language Models (LLMs), Vision-Language Models (VLMs), a...
Act2Goal: From World Model To General Goal-conditioned Policy : Abstract: Specifying robotic manipulation tasks in a manner that is both expressive and precise remains a central challenge. While visual goals provide a compact and unambiguous task specification, ex...
Theory of Mind for Explainable Human-Robot Interaction : Abstract: Within the context of human-robot interaction (HRI), Theory of Mind (ToM) is intended to serve as a user-friendly backend to the interface of robotic systems, enabling robots to infer and re...
Agentic AI for Autonomous Defense in Software Supply Chain Security: Beyond Provenance to Vulnerability Mitigation : Abstract: The software supply chain attacks are becoming more and more focused on trusted development and delivery procedures, so the conventional post-build integrity mechanisms cannot be used anymor...
Mobile-Efficient Speech Emotion Recognition Using DistilHuBERT: A Cross-Corpus Validation Study : Abstract: Speech Emotion Recognition (SER) has significant potential for mobile applications, yet deployment remains constrained by the computational demands of state-of-the-art transformer architectu...
PINNs for Electromagnetic Wave Propagation : Abstract: Physics-Informed Neural Networks (PINNs) are a methodology that aims to solve physical systems by directly embedding PDE constraints into the neural network training process. In electromagne...
Securing the AI Supply Chain: What Can We Learn From Developer-Reported Security Issues and Solutions of AI Projects? : Abstract: The rapid growth of Artificial Intelligence (AI) models and applications has led to an increasingly complex security landscape. Developers of AI projects must contend not only with tradition...
AGRO-SQL: Agentic Group-Relative Optimization with High-Fidelity Data Synthesis : Abstract: The advancement of Text-to-SQL systems is currently hindered by the scarcity of high-quality training data and the limited reasoning capabilities of models in complex scenarios. In this pape...
Explainable Neural Inverse Kinematics for Obstacle-Aware Robotic Manipulation: A Comparative Analysis of IKNet Variants : Abstract: Deep neural networks have accelerated inverse-kinematics (IK) inference to the point where low cost manipulators can execute complex trajectories in real time, yet the opaque nature of these...
EquaCode: A Multi-Strategy Jailbreak Approach for Large Language Models via Equation Solving and Code Completion : Abstract: Large language models (LLMs), such as ChatGPT, have achieved remarkable success across a wide range of fields. However, their trustworthiness remains a significant concern, as they are still...
Constraint programming model and biased random-key genetic algorithm for the single-machine coupled task scheduling problem with exact delays to minimize the makespan : Abstract: We consider the strongly NP-hard single-machine coupled task scheduling problem with exact delays to minimize the makespan. In this problem, a set of jobs has to be scheduled, each composed ...
It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents : Abstract: Web-based agents powered by large language models are increasingly used for tasks such as email management or professional networking. Their reliance on dynamic web content, however, makes t...
Viability and Performance of a Private LLM Server for SMBs: A Benchmark Analysis of Qwen3-30B on Consumer-Grade Hardware : Abstract: The proliferation of Large Language Models (LLMs) has been accompanied by a reliance on cloud-based, proprietary systems, raising significant concerns regarding data privacy, operational sov...
Heterogeneity in Multi-Agent Reinforcement Learning : Abstract: Heterogeneity is a fundamental property in multi-agent reinforcement learning (MARL), which is closely related not only to the functional differences of agents, but also to policy diversity ...
DECEPTICON: How Dark Patterns Manipulate Web Agents : Abstract: Deceptive UI designs, widely instantiated across the web and commonly known as dark patterns, manipulate users into performing actions misaligned with their goals. In this paper, we show tha...
Agentic AI for Cyber Resilience: A New Security Paradigm and Its System-Theoretic Foundations : Abstract: Cybersecurity is being fundamentally reshaped by foundation-model-based artificial intelligence. Large language models now enable autonomous planning, tool orchestration, and strategic adapt...
The body is not there to compute: Comment on "Informational embodiment: Computational role of information structure in codes and robots" by Pitti et al : Abstract: Applying the lens of computation and information has been instrumental in driving the technological progress of our civilization as well as in empowering our understanding of the world aroun...
FasterPy: An LLM-based Code Execution Efficiency Optimization Framework : Abstract: Code often suffers from performance bugs. These bugs necessitate the research and practice of code optimization. Traditional rule-based methods rely on manually designing and maintaining rul...
Reach-Avoid Differential game with Reachability Analysis for UAVs: A decomposition approach : Abstract: Reach-avoid (RA) games have significant applications in security and defense, particularly for unmanned aerial vehicles (UAVs). These problems are inherently challenging due to the need to c...
Robust LLM-based Column Type Annotation via Prompt Augmentation with LoRA Tuning : Abstract: Column Type Annotation (CTA) is a fundamental step towards enabling schema alignment and semantic understanding of tabular data. Existing encoder-only language models achieve high accuracy w...
Chord Recognition with Deep Learning : Abstract: Progress in automatic chord recognition has been slow since the advent of deep learning in the field. To understand why, I conduct experiments on existing methods and test hypotheses enabled...
Hierarchical Pedagogical Oversight: A Multi-Agent Adversarial Framework for Reliable AI Tutoring : Abstract: Large Language Models (LLMs) are increasingly deployed as automated tutors to address educator shortages; however, they often fail at pedagogical reasoning, frequently validating incorrect s...
Nightjar: Dynamic Adaptive Speculative Decoding for Large Language Models Serving : Abstract: Speculative decoding (SD) accelerates LLM inference by verifying draft tokens in parallel. However, this method presents a critical trade-off: it improves throughput in low-load, memory-boun...
Emergence of Human to Robot Transfer in Vision-Language-Action Models : Abstract: Vision-language-action (VLA) models can enable broad open world generalization, but require large and diverse datasets. It is appealing to consider whether some of this data can come from hu...
A Unified AI, Embedded, Simulation, and Mechanical Design Approach to an Autonomous Delivery Robot : Abstract: This paper presents the development of a fully autonomous delivery robot integrating mechanical engineering, embedded systems, and artificial intelligence. The platform employs a heterogeneo...
Efficient Multi-Model Orchestration for Self-Hosted Large Language Models : Abstract: Self-hosting large language models (LLMs) is increasingly appealing for organizations seeking privacy, cost control, and customization. Yet deploying and maintaining in-house models poses ch...
Space AI: Leveraging Artificial Intelligence for Space to Improve Life on Earth : Abstract: Artificial Intelligence (AI) is transforming domains from healthcare and agriculture to finance and industry. As progress on Earth meets growing constraints, the next frontier is outer space...
AI-Generated Code Is Not Reproducible (Yet): An Empirical Study of Dependency Gaps in LLM-Based Coding Agents : Abstract: The rise of Large Language Models (LLMs) as coding agents promises to accelerate software development, but their impact on generated code reproducibility remains largely unexplored. This pap...
Cost-Aware Text-to-SQL: An Empirical Study of Cloud Compute Costs for LLM-Generated Queries : Abstract: Text-to-SQL systems powered by Large Language Models (LLMs) achieve high accuracy on standard benchmarks, yet existing efficiency metrics such as the Valid Efficiency Score (VES) measure exe...
Beyond Single Bugs: Benchmarking Large Language Models for Multi-Vulnerability Detection : Abstract: Large Language Models (LLMs) have demonstrated significant potential in automated software security, particularly in vulnerability detection. However, existing benchmarks primarily focus on ...
ReVEAL: GNN-Guided Reverse Engineering for Formal Verification of Optimized Multipliers : Abstract: We present ReVEAL, a graph-learning-based method for reverse engineering of multiplier architectures to improve algebraic circuit verification techniques. Our framework leverages structural ...
Agentic Software Issue Resolution with Large Language Models: A Survey : Abstract: Software issue resolution aims to address real-world issues in software repositories (e.g., bug fixing and efficiency optimization) based on natural language descriptions provided by users, ...
Scalable Cloud-Native Architectures for Intelligent PMU Data Processing : Abstract: Phasor Measurement Units (PMUs) generate high-frequency, time-synchronized data essential for real-time power grid monitoring, yet the growing scale of PMU deployments creates significant ch...
Literature Mining System for Nutraceutical Biosynthesis: From AI Framework to Biological Insight : Abstract: The extraction of structured knowledge from scientific literature remains a major bottleneck in nutraceutical research, particularly when identifying microbial strains involved in compound b...
Interpretable Link Prediction in AI-Driven Cancer Research: Uncovering Co-Authorship Patterns : Abstract: Artificial intelligence (AI) is transforming cancer diagnosis and treatment. The intricate nature of this disease necessitates the collaboration of diverse stakeholders with varied expertise...
iOS as Acceleration : Abstract: Practical utilization of large-scale machine learning requires a powerful compute setup, a necessity which poses a significant barrier to engagement with such artificial intelligence in more...
BitFlipScope: Scalable Fault Localization and Recovery for Bit-Flip Corruptions in LLMs : Abstract: Large Language Models (LLMs) deployed in practical and safety-critical settings are increasingly susceptible to bit-flip faults caused by hardware degradation, cosmic radiation, or deliberat...
Solving Multi-Agent Multi-Goal Path Finding Problems in Polynomial Time : Abstract: In this paper, we plan missions for a fleet of agents in undirected graphs, such as grids, with multiple goals. In contrast to regular multi-agent path-finding, the solver finds and updates ...
Practical challenges of control monitoring in frontier AI deployments : Abstract: Automated control monitors could play an important role in overseeing highly capable AI agents that we do not fully trust. Prior work has explored control monitoring in simplified settings, ...
Adaptive GPU Resource Allocation for Multi-Agent Collaborative Reasoning in Serverless Environments : Abstract: Multi-agent systems powered by large language models have emerged as a promising paradigm for solving complex reasoning tasks through collaborative intelligence. However, efficiently deployi...
Rethinking Leveraging Pre-Trained Multi-Layer Representations for Speaker Verification : Abstract: Recent speaker verification studies have achieved notable success by leveraging layer-wise output from pre-trained Transformer models. However, few have explored the advancements in aggregat...
Pre-review to Peer review: Pitfalls of Automating Reviews using Large Language Models : Abstract: Large Language Models are versatile general-task solvers, and their capabilities can truly assist people with scholarly peer review as \textit{pre-review} agents, if not as fully autonomous ...
HLS4PC: A Parametrizable Framework For Accelerating Point-Based 3D Point Cloud Models on FPGA : Abstract: Point-based 3D point cloud models employ computation and memory intensive mapping functions alongside NN layers for classification/segmentation, and are executed on server-grade GPUs. The sp...
SoDA: An Efficient Interaction Paradigm for the Agentic Web : Abstract: As the internet evolves from the mobile App-dominated Attention Economy to the Intent-Interconnection of the Agentic Web era, existing interaction modes fail to address the escalating challe...
GPU-Virt-Bench: A Comprehensive Benchmarking Framework for Software-Based GPU Virtualization Systems : Abstract: The proliferation of GPU-accelerated workloads, particularly in artificial intelligence and large language model (LLM) inference, has created unprecedented demand for efficient GPU resource ...
Physics-Informed Neural Networks for Device and Circuit Modeling: A Case Study of NeuroSPICE : Abstract: We present NeuroSPICE, a physics-informed neural network (PINN) framework for device and circuit simulation. Unlike conventional SPICE, which relies on time-discretized numerical solvers, Ne...
Divergent-Convergent Thinking in Large Language Models for Creative Problem Generation : Abstract: Large language models (LLMs) have significant potential for generating educational questions and problems, enabling educators to create large-scale learning materials. However, LLMs are fund...
Why AI Safety Requires Uncertainty, Incomplete Preferences, and Non-Archimedean Utilities : Abstract: How can we ensure that AI systems are aligned with human values and remain safe? We can study this problem through the frameworks of the AI assistance and the AI shutdown games. The AI assis...
The Gaining Paths to Investment Success: Information-Driven LLM Graph Reasoning for Venture Capital Prediction : Abstract: Most venture capital (VC) investments fail, while a few deliver outsized returns. Accurately predicting startup success requires synthesizing complex relational evidence, including company d...
The World Is Bigger! A Computationally-Embedded Perspective on the Big World Hypothesis : Abstract: Continual learning is often motivated by the idea, known as the big world hypothesis, that "the world is bigger" than the agent. Recent problem formulations capture this idea by explicitly c...
MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning : Abstract: Traditional workflow-based agents exhibit limited intelligence when addressing real-world problems requiring tool invocation. Tool-integrated reasoning (TIR) agents capable of autonomous rea...
On Conformant Planning and Model-Checking of $\exists^*\forall^*$ Hyperproperties : Abstract: We study the connection of two problems within the planning and verification community: Conformant planning and model-checking of hyperproperties. Conformant planning is the task of finding ...
TCEval: Using Thermal Comfort to Assess Cognitive and Perceptual Abilities of AI : Abstract: A critical gap exists in LLM task-specific benchmarks. Thermal comfort, a sophisticated interplay of environmental factors and personal perceptions involving sensory integration and adaptive...
From Model Choice to Model Belief: Establishing a New Measure for LLM-Based Research : Abstract: Large language models (LLMs) are increasingly used to simulate human behavior, but common practices to use LLM-generated data are inefficient. Treating an LLM's output ("model choice") as a ...
Why We Need a New Framework for Emotional Intelligence in AI : Abstract: In this paper, we develop the position that current frameworks for evaluating emotional intelligence (EI) in artificial intelligence (AI) systems need refinement because they do not adequate...
Problems With Large Language Models for Learner Modelling: Why LLMs Alone Fall Short for Responsible Tutoring in K--12 Education : Abstract: The rapid rise of large language model (LLM)-based tutors in K--12 education has fostered a misconception that generative models can replace traditional learner modelling for adaptive instru...
SAMP-HDRL: Segmented Allocation with Momentum-Adjusted Utility for Multi-agent Portfolio Management via Hierarchical Deep Reinforcement Learning : Abstract: Portfolio optimization in non-stationary markets is challenging due to regime shifts, dynamic correlations, and the limited interpretability of deep reinforcement learning (DRL) policies. We...
TravelBench: A Real-World Benchmark for Multi-Turn and Tool-Augmented Travel Planning : Abstract: Large language model (LLM) agents have demonstrated strong capabilities in planning and tool use. Travel planning provides a natural and high-impact testbed for these capabilities, as it req...
DICE: Discrete Interpretable Comparative Evaluation with Probabilistic Scoring for Retrieval-Augmented Generation : Abstract: As Retrieval-Augmented Generation (RAG) systems evolve toward more sophisticated architectures, ensuring their trustworthiness through explainable and robust evaluation becomes critical. Exi...
The Wisdom of Deliberating AI Crowds: Does Deliberation Improve LLM-Based Forecasting? : Abstract: Structured deliberation has been found to improve the performance of human forecasters. This study investigates whether a similar intervention, i.e. allowing LLMs to review each other's fore...
LLM Agents as VC investors: Predicting Startup Success via RolePlay-Based Collective Simulation : Abstract: Due to the high value and high failure rate of startups, predicting their success has become a critical challenge across interdisciplinary research. Existing approaches typically model succe...
Tyee: A Unified, Modular, and Fully-Integrated Configurable Toolkit for Intelligent Physiological Health Care : Abstract: Deep learning has shown great promise in physiological signal analysis, yet its progress is hindered by heterogeneous data formats, inconsistent preprocessing strategies, fragmented model pi...
SANet: A Semantic-aware Agentic AI Networking Framework for Cross-layer Optimization in 6G : Abstract: Agentic AI networking (AgentNet) is a novel AI-native networking paradigm in which a large number of specialized AI agents collaborate to perform autonomous decision-making, dynamic environm...
Lessons from Neuroscience for AI: How integrating Actions, Compositional Structure and Episodic Memory could enable Safe, Interpretable and Human-Like AI : Abstract: The phenomenal advances in large language models (LLMs) and other foundation models over the past few years have been based on optimizing large-scale transformer models on the surprisingly s...
Multi-AI Agent Framework Reveals the "Oxide Gatekeeper" in Aluminum Nanoparticle Oxidation : Abstract: Aluminum nanoparticles (ANPs) are among the most energy-dense solid fuels, yet the atomic mechanisms governing their transition from passivated particles to explosive reactants remain elusiv...
DarkPatterns-LLM: A Multi-Layer Benchmark for Detecting Manipulative and Harmful AI Behavior : Abstract: The proliferation of Large Language Models (LLMs) has intensified concerns about manipulative or deceptive behaviors that can undermine user autonomy, trust, and well-being. Existing safety ...
Lightweight Inference-Time Personalization for Frozen Knowledge Graph Embeddings : Abstract: Foundation models for knowledge graphs (KGs) achieve strong cohort-level performance in link prediction, yet fail to capture individual user preferences; a key disconnect between general rel...
HalluMat: Detecting Hallucinations in LLM-Generated Materials Science Content Through Multi-Stage Verification : Abstract: Artificial Intelligence (AI), particularly Large Language Models (LLMs), is transforming scientific discovery, enabling rapid knowledge generation and hypothesis formulation. However, a crit...
Subgoaling Relaxation-based Heuristics for Numeric Planning with Infinite Actions : Abstract: Numeric planning with control parameters extends the standard numeric planning model by introducing action parameters as free numeric variables that must be instantiated during planning. Thi...
With Great Capabilities Come Great Responsibilities: Introducing the Agentic Risk & Capability Framework for Governing Agentic AI Systems : Abstract: Agentic AI systems present both significant opportunities and novel risks due to their capacity for autonomous action, encompassing tasks such as code execution, internet interaction, and fi...
Toward Equitable Recovery: A Fairness-Aware AI Framework for Prioritizing Post-Flood Aid in Bangladesh : Abstract: Post-disaster aid allocation in developing nations often suffers from systematic biases that disadvantage vulnerable regions, perpetuating historical inequities. This paper presents a fairne...
GamiBench: Evaluating Spatial Reasoning and 2D-to-3D Planning Capabilities of MLLMs with Origami Folding Tasks : Abstract: Multimodal large language models (MLLMs) are proficient in perception and instruction-following, but they still struggle with spatial reasoning: the ability to mentally track and manipulate ...
Emergent Persuasion: Will LLMs Persuade Without Being Prompted? : Abstract: With the wide-scale adoption of conversational AI systems, AI are now able to exert unprecedented influence on human opinion and beliefs. Recent work has shown that many Large Language Model...
Bidirectional RAG: Safe Self-Improving Retrieval-Augmented Generation Through Multi-Stage Validation : Abstract: Retrieval-Augmented Generation RAG systems enhance large language models by grounding responses in external knowledge bases, but conventional RAG architectures operate with static corpora th...

Research Sources: 78 | Generated: 12/31/2025