AI Research News Feeds for September 30th, 2025

AI RESEARCH PAPERS & ACADEMIC SOURCES

Responsible AI Technical Report
Embedding Domain Knowledge for Large Language Models via Reinforcement Learning from Augmented Generation
Beyond Sharp Minima: Robust LLM Unlearning via Feedback-Guided Multi-Point Optimization
EmbeddingGemma: Powerful and Lightweight Text Representations
i-LAVA: Insights on Low Latency Voice-2-Voice Architecture for Agents
SiNGER: A Clearer Voice Distills Vision Transformers Further
Can Less Precise Be More Reliable? A Systematic Evaluation of Quantization's Impact on CLIP Beyond Accuracy
Towards Foundation Models for Zero-Shot Time Series Anomaly Detection: Leveraging Synthetic Data and Relative Context Discrepancy
Evaluating the Effectiveness of Transformer Layers in Wav2Vec 2.0, XLS-R, and Whisper for Speaker Identification Tasks
TReF-6: Inferring Task-Relevant Frames from a Single Demonstration for One-Shot Skill Generalization
Multimodal Iterative RAG for Knowledge-Intensive Visual Question Answering
Can General-Purpose Omnimodels Compete with Specialists? A Case Study in Medical Image Segmentation
GradES: Significantly Faster Training in Transformers with Gradient-Based Early Stopping
Do LLMs Adhere to Label Definitions? Examining Their Receptivity to External Label Definitions
Co-Evolving Complexity: An Adversarial Framework for Automatic MARL Curricula
Diffusion Generative Models Meet Compressed Sensing, with Applications to Imaging and Finance
The Physical Basis of Prediction: World Model Formation in Neural Organoids via an LLM-Generated Curriculum
BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models
COMPACT: Common-token Optimized Model Pruning Across Channels and Tokens
DEPFusion: Dual-Domain Enhancement and Priority-Guided Mamba Fusion for UAV Multispectral Object Detection
A Systematic Survey on Large Language Models for Evolutionary Optimization: From Modeling to Solving
TalkPlayData 2: An Agentic Synthetic Data Pipeline for Multimodal Conversational Music Recommendation
FuseCodec: Semantic-Contextual Fusion and Supervision for Neural Codecs
MindVL: Towards Efficient and Effective Training of Multimodal Large Language Models on Ascend NPUs
Learning to Route: Per-Sample Adaptive Routing for Multimodal Multitask Prediction
Evaluating undergraduate mathematics examinations in the era of generative AI: a curriculum-level case study
TreeIRL: Safe Urban Driving with Tree Search and Inverse Reinforcement Learning
WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance
Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning
Accurate and Efficient Low-Rank Model Merging in Core Space
StefaLand: An Efficient Geoscience Foundation Model That Improves Dynamic Land-Surface Predictions
Joint Memory Frequency and Computing Frequency Scaling for Energy-efficient DNN Inference
Reinforced Generation of Combinatorial Structures: Applications to Complexity Theory
Self-Evolving LLMs via Continual Instruction Tuning
APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation
SPiDR: A Simple Approach for Zero-Shot Safety in Sim-to-Real Transfer
Diversity Boosts AI-Generated Text Detection
Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs
PRIME: Large Language Model Personalization with Cognitive Dual-Memory and Personalized Thought Process
CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering
Entropy-Memorization Law: Evaluating Memorization Difficulty of Data in LLMs
Mitigating Watermark Forgery in Generative Models via Randomized Key Selection
Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition
BenchRL-QAS: Benchmarking reinforcement learning algorithms for quantum architecture search
Vidar: Embodied Video Diffusion Model for Generalist Manipulation
Making Language Model a Hierarchical Classifier
Learning to summarize user information for personalized reinforcement learning from human feedback
GRID: Scalable Task-Agnostic Prompt-Based Continual Learning for Language Models
Omni-Thinker: Scaling Multi-Task RL in LLMs with Hybrid Reward and Task Scheduling
The Ever-Evolving Science Exam
When Engineering Outruns Intelligence: Rethinking Instruction-Guided Navigation
Can Language Models Discover Scaling Laws?
Communicating Plans, Not Percepts: Scalable Multi-Agent Coordination with Embodied World Models
The Geometry of Cortical Computation: Manifold Disentanglement and Predictive Dynamics in VCNet
Sculptor: Empowering LLMs with Cognitive Agency via Active Context Management
AttriLens-Mol: Attribute Guided Reinforcement Learning for Molecular Property Prediction with Large Language Models
PakBBQ: A Culturally Adapted Bias Benchmark for QA
BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation
ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signals
CORE-RAG: Lossless Compression for Retrieval-Augmented LLMs via Reinforcement Learning
Automatic Question & Answer Generation Using Generative Large Language Model (LLM)
End-to-End On-Device Quantization-Aware Training for LLMs at Inference Cost
Improving LLM Reasoning through Interpretable Role-Playing Steering
InverseScope: Scalable Activation Inversion for Interpreting Large Language Models
TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization
Towards Robust Real-World Multivariate Time Series Forecasting: A Unified Framework for Dependency, Asynchrony, and Missingness
When Is Diversity Rewarded in Cooperative Multi-Agent Learning?
Revisiting Visual Understanding in Multimodal Reasoning through a Lens of Image Perturbation
Beyond Jailbreaking: Auditing Contextual Privacy in LLM Agents
Discrete Audio Tokens: More Than a Survey!
Enhancing Delta Compression in LLMs via SVD-based Quantization Error Minimization
Meta Pruning via Graph Metanetworks: A Universal Meta Learning Framework for Network Pruning
StorySage: Conversational Autobiography Writing Powered by a Multi-Agent Framework
Decoupled Classifier-Free Guidance for Counterfactual Diffusion Models
Long-Context Generalization with Sparse Attention
Do We Need Large VLMs for Spotting Soccer Actions?
Adaptive Sample Scheduling for Direct Preference Optimization
From Drawings to Decisions: A Hybrid Vision-Language Framework for Parsing 2D Engineering Drawings into Structured Manufacturing Knowledge
GRAF: Multi-turn Jailbreaking via Global Refinement and Active Fabrication
Improving Black-Box Generative Attacks via Generator Semantic Consistency
OmniGen2: Exploration to Advanced Multimodal Generation
Reasoning Isn't Enough: Examining Truth-Bias and Sycophancy in LLMs
R1-Ranker: Teaching LLM Rankers to Reason
Enhancing Live Broadcast Engagement: A Multi-modal Approach to Short Video Recommendations Using MMGCN and User Preferences
Semantic-guided Diverse Decoding for Large Language Model
Data Uniformity Improves Training Efficiency and More, with a Convergence Framework Beyond the NTK Regime
Theoretical Modeling of LLM Self-Improvement Training Dynamics Through Solver-Verifier Gap
Learning to Segment for Vehicle Routing Problems
Empirical Analysis Of Heuristic and Approximation Algorithms for the Mutual-Visibility Problem
Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer
Communication-Efficient Desire Alignment for Embodied Agent-Human Adaptation
CryoCCD: Conditional Cycle-consistent Diffusion with Biophysical Modeling for Cryo-EM Synthesis
ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration
Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs
ProxyThinker: Test-Time Guidance through Small Visual Reasoners
TRAPDOC: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents
WorldGym: World Model as An Environment for Policy Evaluation
GRAM: Spatial general-purpose audio representation models for real-world applications
Angles Don't Lie: Unlocking Training-Efficient RL Through the Model's Own Signals
Interaction Field Matching: Overcoming Limitations of Electrostatic Models
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
Bridging the Performance Gap Between Target-Free and Target-Based Reinforcement Learning
Towards Better Generalization via Distributional Input Projection Network
Reshaping Reasoning in LLMs: A Theoretical Analysis of RL Training Dynamics through Pattern Selection
TreeRPO: Tree Relative Policy Optimization
ReactDance: Hierarchical Representation for High-Fidelity and Coherent Long-Form Reactive Dance Generation
OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit
SALM: A Multi-Agent Framework for Language Model-Driven Social Network Simulation
How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference
Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
Visual Planning: Let's Think Only with Images
Stepwise Guided Policy Optimization: Coloring your Incorrect Reasoning in GRPO
Continuous Optimization for Feature Selection with Permutation-Invariant Embedding and Policy-Guided Search
Fine-grained Contrastive Learning for ECG-Report Alignment with Waveform Enhancement
AdaBoN: Adaptive Best-of-N Alignment
VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption
MobileIPL: Enhancing Mobile Agents Thinking Process via Iterative Preference Learning
Is Active Persona Inference Necessary for Aligning Small Models to Personal Preferences?
Mechanistic Fine-tuning for In-context Learning
Language Models Optimized to Fool Detectors Still Have a Distinct Style (And How to Change It)
PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration
ReflAct: World-Grounded Decision Making in LLM Agents via Goal-State Reflection
Scaling Diffusion Transformers Efficiently via $\mu$P
Better Safe Than Sorry? Overreaction Problem of Vision Language Models in Visual Emergency Recognition
Advancing Marine Research: UWSAM Framework and UIIS10K Dataset for Precise Underwater Instance Segmentation
Leveraging Online Data to Enhance Medical Knowledge in a Small Persian Language Model
Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models
Scalable Graph Generative Modeling via Substructure Sequences
AdaSTaR: Adaptive Data Sampling for Training Self-Taught Reasoners
BitHydra: Towards Bit-flip Inference Cost Attack against Large Language Models
Runtime Adaptive Pruning for LLM Inference
InfoDet: A Dataset for Infographic Element Detection
On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning
Reward Model Overoptimisation in Iterated RLHF
Improved Sample Complexity For Diffusion Model Training Without Empirical Risk Minimizer Access
ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation
LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning
EnvSDD: Benchmarking Environmental Sound Deepfake Detection
A Necessary Step toward Faithfulness: Measuring and Improving Consistency in Free-Text Explanations
ePC: Overcoming Exponential Signal Decay in Deep Predictive Coding Networks
Variational Deep Learning via Implicit Regularization
GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scenes
PDFBench: A Benchmark for De novo Protein Design from Function
Any-to-Bokeh: Arbitrary-Subject Video Refocusing with Video Diffusion Model
Cross-modal RAG: Sub-dimensional Text-to-Image Retrieval-Augmented Generation
Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
Advanced Architectures Integrated with Agentic AI for Next-Generation Wireless Networks
A Statistical Learning Perspective on Semi-dual Adversarial Neural Optimal Transport Solvers
LEAD: Large Foundation Model for EEG-Based Alzheimer's Disease Detection
3D Foundation Model for Generalizable Disease Detection in Head Computed Tomography
Towards Large-Scale In-Context Reinforcement Learning by Meta-Training in Randomized Worlds
UltraIF: Advancing Instruction Following from the Wild
Confidence Improves Self-Consistency in LLMs
OrderFusion: Encoding Orderbook for End-to-End Probabilistic Intraday Electricity Price Forecasting
Comprehensive Review of Neural Differential Equations for Time Series Analysis
Collaborative Deterministic-Probabilistic Forecasting for Diverse Spatiotemporal Systems
PAFT: Prompt-Agnostic Fine-Tuning
B-cos LM: Efficiently Transforming Pre-trained Language Models for Improved Explainability
Mitigating Barren Plateaus in Quantum Neural Networks via an AI-Driven Submartingale-Based Framework
MemeIntel: Explainable Detection of Propagandistic and Hateful Memes
Mixing Any Cocktail with Limited Ingredients: On the Structure of Payoff Sets in Multi-Objective POMDPs and its Impact on Randomised Strategies
SRA-CL: Semantic Retrieval Augmented Contrastive Learning for Sequential Recommendation
Delta-Triplane Transformers as Occupancy World Models
UniF$^2$ace: A Unified Fine-grained Face Understanding and Generation Model
How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation
Implicit Bias-Like Patterns in Reasoning Models
RISE: Robust Imitation through Stochastic Encoding
What Makes a Reward Model a Good Teacher? An Optimization Perspective
On The Sample Complexity Bounds In Bilevel Reinforcement Learning
Reasoning to Learn from Latent Thoughts
Machine Learning-Driven Materials Discovery: Unlocking Next-Generation Functional Materials - A review
AdaRank: Adaptive Rank Pruning for Enhanced Model Merging
SUV: Scalable Large Language Model Copyright Compliance with Regularized Selective Unlearning
XL-Suite: Cross-Lingual Synthetic Training and Evaluation Data for Open-Ended Generation
Beyond Synthetic Replays: Turning Diffusion Features into Few-Shot Class-Incremental Learning Knowledge
SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching
SCRAMBLe: Enhancing Multimodal LLM Compositionality with Synthetic Preference Data
From Specificity to Generality: Revisiting Generalizable Artifacts in Detecting Face Deepfakes
Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning?
Min-Max Optimisation for Nonconvex-Nonconcave Functions Using a Random Zeroth-Order Extragradient Algorithm
Efficient Reasoning Models: A Survey
IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property
Dynamic Early Exit in Reasoning Models
Evolution Meets Diffusion: Efficient Neural Architecture Generation
Cooking Up Creativity: Enhancing LLM Creativity through Structured Recombination
New News: System-2 Fine-tuning for Robust Integration of New Knowledge
Beyond Losses Reweighting: Empowering Multi-Task Learning via the Generalization Perspective
TRIPS: Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection
Continual Dialogue State Tracking via Example-Guided Question Answering
A Double Machine Learning Approach to Combining Experimental and Observational Data
Symbolic Imitation Learning: From Black-Box to Explainable Driving Policies
Occasionally Secure: A Comparative Analysis of Code Generation Assistants
BlockFUL: Enabling Unlearning in Blockchained Federated Learning
Federated Learning Resilient to Byzantine Attacks and Data Heterogeneity
FusionDTI: Fine-grained Binding Discovery with Token-level Fusion for Drug-Target Interaction
Multi-Head RAG: Solving Multi-Aspect Problems with LLMs
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
A Comprehensive Graph Pooling Benchmark: Effectiveness, Robustness and Generalizability
Position: Towards Bidirectional Human-AI Alignment
Understanding Transformer Architecture through Continuous Dynamics: A Partial Differential Equation Perspective
Robot Navigation with Entity-Based Collision Avoidance using Deep Reinforcement Learning
LLM-3D Print: Large Language Models To Monitor and Control 3D Printing
A GREAT Architecture for Edge-Based Graph Problems Like TSP
Parse Trees Guided LLM Prompt Compression
Distributed AI Platform for the 6G RAN
Disentangling Regional Primitives for Image Generation
Extracting Moore Machines from Transformers using Queries and Counterexamples
Deeper Insights into Deep Graph Convolutional Networks: Stability and Generalization
NextLocLLM: Location Semantics Modeling and Coordinate-Based Next Location Prediction with LLMs
Gradient-Free Training of Quantized Neural Networks
DM-Codec: Distilling Multimodal Representations for Speech Tokenization
Self-Normalized Resets for Plasticity in Continual Learning
PACER: Physics Informed Uncertainty Aware Climate Emulator
When Speculation Spills Secrets: Side Channels via Speculative Decoding In LLMs
UniTraj: Learning a Universal Trajectory Foundation Model from Billion-Scale Worldwide Traces
UniHR: Hierarchical Representation Learning for Unified Knowledge Graph Link Prediction
Integrating Various Software Artifacts for Better LLM-based Bug Localization and Program Repair
Adapting Chat Language Models Using Only Target Unlabeled Language Data
Order Matters! An Empirical Study on Large Language Models' Input Order Bias in Software Fault Localization
Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval
A Partition Cover Approach to Tokenization
CGI: Identifying Conditional Generative Models with Example Images
Mind the Value-Action Gap: Do LLMs Act in Alignment with Their Values?
FuzzyLight: A Robust Two-Stage Fuzzy Approach for Traffic Signal Control Works in Real Cities
Principal Components for Neural Network Initialization
Beyond checkmate: exploring the creative chokepoints in AI text
Vintix: Action Model via In-Context Reinforcement Learning
Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning
Do Larger Language Models Generalize Better? A Scaling Law for Implicit Reasoning at Pretraining Time
Signal in the Noise: Polysemantic Interference Transfers and Predicts Cross-Model Influence
FRABench and UFEval: Unified Fine-grained Evaluation with Task and Aspect Generalization
SPhyR: Spatial-Physical Reasoning Benchmark on Material Distribution
No Black Boxes: Interpretable and Interactable Predictive Healthcare with Knowledge-Enhanced Agentic Causal Discovery
Fuzzy Information Evolution with Three-Way Decision in Social Network Group Decision-Making
DEL-ToM: Inference-Time Scaling for Theory-of-Mind Reasoning via Dynamic Epistemic Logic
TabularGSM: Understanding the Limitations of LLMs in Tabular Math Reasoning
HS-STaR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation
MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents
Truly Assessing Fluid Intelligence of Large Language Models through Dynamic Reasoning Evaluation
Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games
VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs
One Patient, Many Contexts: Scaling Medical AI with Contextual Intelligence
Efficient LLM Collaboration via Planning
Tiered Agentic Oversight: A Hierarchical Multi-Agent System for Healthcare Safety
The 4th Dimension for Scaling Model Size
Breaking Rank Bottlenecks in Knowledge Graph Embeddings
Bridging Ethical Principles and Algorithmic Methods: An Alternative Approach for Assessing Trustworthiness in AI Systems
GTA1: GUI Test-time Scaling Agent
Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning
DuetGraph: Coarse-to-Fine Knowledge Graph Reasoning with Dual-Pathway Global-Local Fusion
Hierarchical Task Environments as the Next Frontier for Embodied World Models in Robot Soccer
Neuromorphic Intelligence
Imagined Autocurricula
Memory-QA: Answering Recall Questions Based on Multimodal Memories
LogReasoner: Empowering LLMs with Expert-like Coarse-to-Fine Reasoning for Automated Log Analysis
The Thinking Spectrum: An Empirical Study of Tunable Reasoning in LLMs through Model Merging
WordAlchemy: A transformer-based Reverse Dictionary
Scaling Generalist Data-Analytic Agents
jina-reranker-v3: Last but Not Late Interaction for Document Reranking
Scaling with Collapse: Efficient and Predictable Training of LLM Families
ORPO-Distill: Mixed-Policy Preference Optimization for Cross-Architecture LLM Distillation
Towards Personalized Deep Research: Benchmarks and Evaluations
Score Distillation of Flow Matching Models
MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech
Rethinking Entropy Regularization in Large Reasoning Models
Paired by the Teacher: Turning Unpaired Data into High-Fidelity Pairs for Low-Resource Text Generation
Fast Feature Field ($\text{F}^3$): A Predictive Representation of Events
Pretraining Large Language Models with NVFP4
Chance-constrained Flow Matching for High-Fidelity Constraint-aware Generation
GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts
GLASS Flows: Transition Sampling for Alignment of Flow and Diffusion Models
XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning
EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering
GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs
NAIPv2: Debiased Pairwise Learning for Efficient Paper Quality Estimation
DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space
DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder
Incentive-Aligned Multi-Source LLM Summaries
Guided Diffusion for the Discovery of New Superconductors
InfoAgent: Advancing Autonomous Information-Seeking Agents
Query2Triple: Unified Query Encoding for Answering Diverse Complex Queries over Knowledge Graphs
Taking control: Policies to address extinction risks from advanced AI
Understanding the Effects of Miscalibrated AI Confidence on User Trust, Reliance, and Decision Efficacy
Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration
VCSearch: Bridging the Gap Between Well-Defined and Ill-Defined Problems in Mathematical Reasoning
Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation
A Voter-Based Stochastic Rejection-Method Framework for Asymptotically Safe Language Model Outputs
Autonomous Vehicle Controllers From End-to-End Differentiable Simulation
Neuro-Symbolic Entity Alignment via Variational Inference
A Neurosymbolic Fast and Slow Architecture for Graph Coloring
From An LLM Swarm To A PDDL-Empowered HIVE: Planning Self-Executed Instructions In A Multi-Modal Jungle
GUI Agents: A Survey
Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration
Broadening Ontologization Design: Embracing Data Pipeline Strategies
Enabling AI Scientists to Recognize Innovation: A Domain-Agnostic Algorithm for Assessing Novelty
Visualizing Thought: Conceptual Diagrams Enable Robust Planning in LMMs
Of-SemWat: High-payload text embedding for semantic watermarking of AI-generated images with arbitrary size
Putnam-like dataset summary: LLMs as mathematical competition contestants
Evaluating SAP Joule for Code Generation
SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching
Hierarchical Error Correction for Large Language Models: A Systematic Framework for Domain-Specific AI Quality Enhancement
Metaphor identification using large language models: A comparison of RAG, prompt engineering, and fine-tuning
Retro*: Optimizing LLMs for Reasoning-Intensive Document Retrieval
Uncertainty-Guided Expert-AI Collaboration for Efficient Soil Horizon Annotation
Vehicle Classification under Extreme Imbalance: A Comparative Study of Ensemble Learning and CNNs
Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime
OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing
Segmentor-Guided Counterfactual Fine-Tuning for Image Synthesis
When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training
Scalable GANs with Transformers
MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes
Learning Distinguishable Representations in Deep Q-Networks for Linear Transfer
MSG: Multi-Stream Generative Policies for Sample-Efficient Robotic Manipulation
SecInfer: Preventing Prompt Injection via Inference-time Scaling
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
Light-SQ: Structure-aware Shape Abstraction with Superquadrics for Generated Meshes
Generalized Correctness Models: Learning Calibrated and Model-Agnostic Correctness Predictors from Historical Patterns
CLASP: Adaptive Spectral Clustering for Unsupervised Per-Image Segmentation
AIRoA MoMa Dataset: A Large-Scale Hierarchical Dataset for Mobile Manipulation
Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct
Fast Real-Time Pipeline for Robust Arm Gesture Recognition
Large Language Models for Software Testing: A Research Roadmap
Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures
Optimizing Privacy-Preserving Primitives to Support LLM-Scale Applications
BRIDGE - Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation
UniLat3D: Geometry-Appearance Unified Latents for Single-Stage 3D Generation
CoTune: Co-evolutionary Configuration Tuning
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
T-POP: Test-Time Personalization with Online Preference Feedback
FedPOB: Sample-Efficient Federated Prompt Optimization via Bandits
Circuit-Aware Reward Training: A Mechanistic Framework for Longtail Robustness in RLHF
Discrete Variational Autoencoding via Policy Search
Q-Net: Transferable Queue Length Estimation via Kalman-based Neural Networks
A TRIANGLE Enables Multimodal Alignment Beyond Cosine Similarity
Robust Policy Expansion for Offline-to-Online RL under Diverse Data Corruption
Surjective Independence of Causal Influences for Local Bayesian Network Structures
VSSFlow: Unifying Video-conditioned Sound and Speech Generation via Joint Learning
VTPerception-R1: Enhancing Multimodal Reasoning via Explicit Visual and Textual Perceptual Grounding
Quantifying Generalisation in Imitation Learning
Sparse Autoencoders Make Audio Foundation Models more Explainable
Fidelity-Aware Data Composition for Robust Robot Generalization
Causal-Adapter: Taming Text-to-Image Diffusion for Faithful Counterfactual Generation
DSAT-HD: Dual-Stream Adaptive Transformer with Hybrid Decomposition for Multivariate Time Series Forecasting
RDD: Pareto Analysis of the Rate-Distortion-Distinguishability Trade-off
Intelligent Optimization of Wireless Access Point Deployment for Communication-Based Train Control Systems Using Deep Reinforcement Learning
CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map Models
CORE-3D: Context-aware Open-vocabulary Retrieval by Embeddings in 3D
Short window attention enables long-term memorization
Deep Reinforcement Learning in Action: Real-Time Control of Vortex-Induced Vibrations
AdaThink-Med: Medical Adaptive Thinking with Uncertainty-Guided Length Calibration
Bandits roaming Hilbert space
PoseDiff: A Unified Diffusion Model Bridging Robot Pose Estimation and Video-to-Action Control
Algorithms and data structures for automatic precision estimation of neural networks
Can you SPLICE it together? A Human Curated Benchmark for Probing Visual Reasoning in VLMs
Identity Bridge: Enabling Implicit Reasoning via Shared Latent Memory
VNODE: A Piecewise Continuous Volterra Neural Network
Community detection robustness of graph neural networks
InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation
Understanding the Dilemma of Unlearning for Large Language Models
Reference-Free Rating of LLM Responses via Latent Information
Data-Driven Discrete Geofence Design Using Binary Quadratic Programming
LAMP-PRo: Label-aware Attention for Multi-label Prediction of DNA- and RNA-binding Proteins using Protein Language Models
Cycle Diffusion Model for Counterfactual Image Generation
Adversarial Reinforcement Learning Framework for ESP Cheater Simulation
SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents
Let LLMs Speak Embedding Languages: Generative Text Embeddings via Iterative Contrastive Refinement
DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models
Q-Mirror: Unlocking the Multi-Modal Potential of Scientific Text-Only QA Pairs
Bridging the behavior-neural gap: A multimodal AI reveals the brain's geometry of emotion more accurately than human self-reports
A study of Universal ODE approaches to predicting soil organic carbon
Dual Mechanisms of Value Expression: Intrinsic vs. Prompted Values in LLMs
TraitSpaces: Towards Interpretable Visual Creativity for Human-AI Co-Creation
Towards Generalizable PDE Dynamics Forecasting via Physics-Guided Invariant Learning
Dynamic Orchestration of Multi-Agent System for Real-World Multi-Image Agricultural VQA
Beyond Repetition: Text Simplification and Curriculum Learning for Data-Constrained Pretraining
An Enhanced Pyramid Feature Network Based on Long-Range Dependencies for Multi-Organ Medical Image Segmentation
UI-UG: A Unified MLLM for UI Understanding and Generation
Uni-X: Mitigating Modality Conflict with a Two-End-Separated Architecture for Unified Multimodal Models
Watermarking Diffusion Language Models
From Satellite to Street: A Hybrid Framework Integrating Stable Diffusion and PanoGAN for Consistent Cross-View Synthesis
Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning
REALIGN: Regularized Procedure Alignment with Matching Video Embeddings via Partial Gromov-Wasserstein Optimal Transport
HarmMetric Eval: Benchmarking Metrics and Judges for LLM Harmfulness Assessment
Vid-LLM: A Compact Video-based 3D Multimodal LLM with Reconstruction-Reasoning Synergy
LLaDA-MoE: A Sparse MoE Diffusion Language Model
The 2025 OpenAI Preparedness Framework does not guarantee any AI risk mitigation practices: a proof-of-concept for affordance analyses of AI safety policies
Multilingual Text-to-SQL: Benchmarking the Limits of Language Models with Collaborative Language Agents
Hybrid Layer-Wise ANN-SNN With Surrogate Spike Encoding-Decoding Structure
ScatterAD: Temporal-Topological Scattering Mechanism for Time Series Anomaly Detection
CLQ: Cross-Layer Guided Orthogonal-based Quantization for Diffusion Transformers
A Data-Centric Perspective on the Influence of Image Data Quality in Machine Learning Models
Multi-Item-Query Attention for Stable Sequential Recommendation
Alternatives To Next Token Prediction In Text Generation - A Survey
EOE: Evolutionary Optimization of Experts for Training Language Models
An Agent-Based Framework for Automated Higher-Voice Harmony Generation
Moravec's Paradox and Restrepo's Model: Limits of AGI Automation in Growth
LaMoGen: Laban Movement-Guided Diffusion for Text-to-Motion Generation
Euclid's Gift: Enhancing Spatial Perception and Reasoning in Vision-Language Models via Geometric Surrogate Tasks
Mitigating Visual Hallucinations via Semantic Curriculum Preference Optimization in MLLMs
LLM DNA: Tracing Model Evolution via Functional Representations
Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models
Agentic Specification Generator for Move Programs
PhysiAgent: An Embodied Agent Framework in Physical World
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention
FrameMind: Frame-Interleaved Chain-of-Thought for Video Reasoning via Reinforcement Learning
From Edge to HPC: Investigating Cross-Facility Data Streaming Architectures
GPS-MTM: Capturing Pattern of Normalcy in GPS-Trajectories with self-supervised learning
End-to-end Topographic Auditory Models Replicate Signatures of Human Auditory Cortex
PartnerMAS: An LLM Hierarchical Multi-Agent Framework for Business Partner Selection on High-Dimensional Features
A Second-Order Perspective on Pruning at Initialization and Knowledge Transfer
In-Context Compositional Q-Learning for Offline Reinforcement Learning
A Small Math Model: Recasting Strategy Choice Theory in an LLM-Inspired Architecture
AQUAIR: A High-Resolution Indoor Environmental Quality Dataset for Smart Aquaculture Monitoring
Uncovering Grounding IDs: How External Cues Shape Multi-Modal Binding
PEARL: Peer-Enhanced Adaptive Radio via On-Device LLM
Large-Scale Constraint Generation - Can LLMs Parse Hundreds of Constraints?
PerfBench: Can Agents Resolve Real-World Performance Bugs?
GEAR: A General Evaluation Framework for Abductive Reasoning
Ancestry Tree Clustering for Particle Filter Diversity Maintenance
The Impossibility of Inverse Permutation Learning in Transformer Models
BOSfM: A View Planning Framework for Optimal 3D Reconstruction of Agricultural Scenes
ASTROCO: Self-Supervised Conformer-Style Transformers for Light-Curve Embeddings
EYE-DEX: Eye Disease Detection and EXplanation System
Your thoughts tell who you are: Characterize the reasoning patterns of LRMs
TENET: Leveraging Tests Beyond Validation for Code Generation
Accelerating Cerebral Diagnostics with BrainFusion: A Comprehensive MRI Tumor Framework
Memory Transfer Planning: LLM-driven Context-Aware Code Adaptation for Robot Manipulation
LatXGen: Towards Radiation-Free and Accurate Quantitative Analysis of Sagittal Spinal Alignment Via Cross-Modal Radiographic View Synthesis
Stable Forgetting: Bounded Parameter-Efficient Unlearning in LLMs
Retrieval-augmented GUI Agents with Generative Guidelines
Beyond Overall Accuracy: A Psychometric Deep Dive into the Topic-Specific Medical Capabilities of 80 Large Language Models
Talk in Pieces, See in Whole: Disentangling and Hierarchical Aggregating Representations for Language-based Object Detection
AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play
Chat to Chip: Large Language Model Based Design of Arbitrarily Shaped Metasurfaces
Can Large Language Models Express Uncertainty Like Human?
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
BALR-SAM: Boundary-Aware Low-Rank Adaptation of SAM for Resource-Efficient Medical Image Segmentation
BeyondBench: Benchmark-Free Evaluation of Reasoning in Language Models
Metamorphic Testing for Audio Content Moderation Software
Conda: Column-Normalized Adam for Training Large Language Models Faster
ViReSkill: Vision-Grounded Replanning with Skill Memory for LLM-Based Planning in Lifelong Robot Learning
Uni-NTFM: A Unified Foundation Model for EEG Signal Representation Learning
ChessArena: A Chess Testbed for Evaluating Strategic Reasoning Capabilities of Large Language Models
SafeFlowMatcher: Safe and Fast Planning using Flow Matching with Control Barrier Functions
Prompt and Parameter Co-Optimization for Large Language Models
Graph Foundation Models: Bridging Language Model Paradigms and Graph Optimization
SHAPoint: Task-Agnostic, Efficient, and Interpretable Point-Based Risk Scoring via Shapley Values
Accuracy-Robustness Trade Off via Spiking Neural Network Gradient Sparsity Trail
Knowledge-Level Consistency Reinforcement Learning: Dual-Fact Alignment for Long-Form Factuality
From Personal to Collective: On the Role of Local and Global Memory in LLM Personalization
Knowledge Homophily in Large Language Models
Sequence Pathfinder for Multi-Agent Pickup and Delivery in the Warehouse
GroupCoOp: Group-robust Fine-tuning via Group Prompt Learning
From Unstable to Playable: Stabilizing Angry Birds Levels via Object Segmentation
Enhancing LLM Steering through Sparse Autoencoder-Based Vector Refinement
FedAgentBench: Towards Automating Real-world Federated Medical Image Analysis with Server-Client LLM Agents
Tequila: Trapping-free Ternary Quantization for Large Language Models
Navigating the Labyrinth: Path-Sensitive Unit Test Generation with Large Language Models
IndexNet: Timestamp and Variable-Aware Modeling for Time Series Forecasting
A Multi-Camera Vision-Based Approach for Fine-Grained Assembly Quality Control
Space Group Conditional Flow Matching
HFuzzer: Testing Large Language Models for Package Hallucinations via Phrase-based Fuzzing
Adversarial Diffusion for Robust Reinforcement Learning
GSID: Generative Semantic Indexing for E-Commerce Product Understanding
Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation
Taught Well Learned Ill: Towards Distillation-conditional Backdoor Attack
Multi-Value-Product Retrieval-Augmented Generation for Industrial Product Attribute Value Identification
Not All Tokens are Guided Equal: Improving Guidance in Visual Autoregressive Models
Disentangling Score Content and Performance Style for Joint Piano Rendering and Transcription
PCRI: Measuring Context Robustness in Multimodal Models for Enterprise Applications
Tunable-Generalization Diffusion Powered by Self-Supervised Contextual Sub-Data for Low-Dose CT Reconstruction
Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer
Gradient Flow Convergence Guarantee for General Neural Network Architectures
Dynamic Orthogonal Continual Fine-tuning for Mitigating Catastrophic Forgettings
Preserving Cross-Modal Stability for Visual Unlearning in Multimodal Scenarios
Interpreting deep learning-based stellar mass estimation via causal analysis and mutual information decomposition
EWC-Guided Diffusion Replay for Exemplar-Free Continual Learning in Medical Imaging
Continual Learning to Generalize Forwarding Strategies for Diverse Mobile Wireless Networks
Graph Mixing Additive Networks
Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Step
HiViS: Hiding Visual Tokens from the Drafter for Speculative Decoding in Vision-Language Models
Diffusion Models are Kelly Gamblers
Easy Turn: Integrating Acoustic and Linguistic Modalities for Robust Turn-Taking in Full-Duplex Spoken Dialogue Systems
Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm
Vision-Grounded Machine Interpreting: Improving the Translation Process through Visual Cues
MAD-PINN: A Decentralized Physics-Informed Machine Learning Framework for Safe and Optimal Multi-Agent Control
Toward Preference-aligned Large Language Models via Residual-based Model Steering
The Hidden Costs of Translation Accuracy: Distillation, Quantization, and Environmental Impact
Guide: Generalized-Prior and Data Encoders for DAG Estimation
The AI Agent Code of Conduct: Automated Guardrail Policy-as-Prompt Synthesis
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use
RAVEN: Resilient Aerial Navigation via Open-Set Semantic Memory and Behavior Adaptation
Node Classification via Simplicial Interaction with Augmented Maximal Clique Selection
Benchmarking LLM-Assisted Blue Teaming via Standardized Threat Hunting
Uncovering Vulnerabilities of LLM-Assisted Cyber Threat Intelligence
Towards Efficient CoT Distillation: Self-Guided Rationale Selector for Better Performance with Fewer Rationales
ML-Asset Management: Curation, Discovery, and Utilization
Improving the Efficiency of LLM Agent Systems through Trajectory Reduction
Toward a Holistic Approach to Continual Model Merging
Timber: Training-free Instruct Model Refining with Base via Effective Rank
Multi-Level Heterogeneous Knowledge Transfer Network on Forward Scattering Center Model for Limited Samples SAR ATR
Characteristic Root Analysis and Regularization for Linear Time Series Forecasting
InteractMove: Text-Controlled Human-Object Interaction Generation in 3D Scenes with Movable Objects
GraphIFE: Rethinking Graph Imbalance Node Classification via Invariant Learning
BioVessel-Net and RetinaMix: Unsupervised Retinal Vessel Segmentation from OCTA Images
Generalizable Speech Deepfake Detection via Information Bottleneck Enhanced Adversarial Alignment
RIV: Recursive Introspection Mask Diffusion Vision Language Model
LightFair: Towards an Efficient Alternative for Fair T2I Diffusion via Debiasing Pre-trained Text Encoders
ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis
Focusing on What Matters: Object-Agent-centric Tokenization for Vision Language Action models
Aligning LLMs for Multilingual Consistency in Enterprise Applications
Pure Node Selection for Imbalanced Graph Node Classification
Calibration Meets Reality: Making Machine Learning Predictions Trustworthy
Beyond Greedy Exits: Improved Early Exit Decisions for Risk Control and Reliability
Graph Neural Networks with Diversity-aware Neighbor Selection and Dynamic Multi-scale Fusion for Multivariate Time Series Forecasting
RCI: A Score for Evaluating Global and Local Reasoning in Multimodal Benchmarks
Towards a Comprehensive Scaling Law of Mixture-of-Experts
Joint Hybrid Beamforming and Artificial Noise Design for Secure Multi-UAV ISAC Networks
Estimating Time Series Foundation Model Transferability via In-Context Learning
CrimEdit: Controllable Editing for Counterfactual Object Removal, Insertion, and Movement
Bridging Discrete and Continuous RL: Stable Deterministic Policy Gradient with Martingale Characterization
AdaPtis: Reducing Pipeline Bubbles with Adaptive Pipeline Parallelism on Heterogeneous Models
Video Panels for Long Video Understanding
AudioMoG: Guiding Audio Generation with Mixture-of-Guidance
M3DLayout: A Multi-Source Dataset of 3D Indoor Layouts and Structured Descriptions for 3D Generation
LUQ: Layerwise Ultra-Low Bit Quantization for Multimodal Large Language Models
HieraTok: Multi-Scale Visual Tokenizer Improves Image Reconstruction and Generation
Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning
LocoFormer: Generalist Locomotion via Long-context Adaptation
Poivre: Self-Refining Visual Pointing with Reinforcement Learning
PVTAdpNet: Polyp Segmentation using Pyramid vision transformer with a novel Adapter block
Understanding Textual Capability Degradation in Speech LLMs via Parameter Importance Analysis
PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation
DentVLM: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice
ABC-Eval: Benchmarking Large Language Models on Symbolic Music Understanding and Instruction Following
Dynamic-TreeRPO: Breaking the Independent Trajectory Bottleneck with Structured Sampling
Dual-Space Smoothness for Robust and Balanced LLM Unlearning
AI Education in Higher Education: A Taxonomy for Curriculum Reform and the Mission of Knowledge
MedCritical: Enhancing Medical Reasoning in Small Language Models via Self-Collaborative Correction
Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization
Graph Your Own Prompt
CCD: Mitigating Hallucinations in Radiology MLLMs via Clinical Contrastive Decoding
Train Once, Answer All: Many Pretraining Experiments for the Cost of One
Enhanced Fracture Diagnosis Based on Critical Regional and Scale Aware in YOLO
PATCH: Learnable Tile-level Hybrid Sparsity for LLMs
Hybrid Graph Embeddings and Louvain Algorithm for Unsupervised Community Detection
Retrieval-Constrained Decoding Reveals Underestimated Parametric Knowledge in Language Models
Enhancing Communication Efficiency in FL with Adaptive Gradient Quantization and Communication Frequency Optimization
NeuroBridge: Using Generative AI to Bridge Cross-neurotype Communication Differences through Neurotypical Perspective-taking
AudioRole: An Audio Dataset for Character Role-Playing in Large Language Models
S$^3$F-Net: A Multi-Modal Approach to Medical Image Classification via Spatial-Spectral Summarizer Fusion Network
Factor Decorrelation Enhanced Data Removal from Deep Predictive Models
AudioFuse: Unified Spectral-Temporal Learning via a Hybrid ViT-1D CNN Architecture for Robust Phonocardiogram Classification
Data-Efficient Training by Evolved Sampling
Generative Evolutionary Meta-Solver (GEMS): Scalable Surrogate-Free Multi-Agent Learning
Multi-Modal Manipulation via Multi-Modal Policy Consensus
Memory-Efficient Fine-Tuning via Low-Rank Activation Compression
Text-Based Approaches to Item Difficulty Modeling in Large-Scale Assessments: A Systematic Review
Revisiting Multivariate Time Series Forecasting with Missing Values
The Impact of Role Design in In-Context Learning for Large Language Models
Enhancing Polyp Segmentation via Encoder Attention and Dynamic Kernel Update
From Human Annotation to Automation: LLM-in-the-Loop Active Learning for Arabic Sentiment Analysis
Evaluating point-light biological motion in multimodal large language models
ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search
Privy: Envisioning and Mitigating Privacy Risks for Consumer-facing AI Product Concepts
Imaging-Based Mortality Prediction in Patients with Systemic Sclerosis
On the Shelf Life of Fine-Tuned LLM Judges: Future Proofing, Backward Compatibility, and Question Generalization
End-to-End Deep Learning for Predicting Metric Space-Valued Outputs
Disentanglement of Variations with Multimodal Generative Modeling
Automatic Speech Recognition for Greek Medical Dictation
Fusing Sequence Motifs and Pan-Genomic Features: Antimicrobial Resistance Prediction using an Explainable Lightweight 1D CNN-XGBoost Ensemble
Pancreas Part Segmentation under Federated Learning Paradigm
HTMA-Net: Towards Multiplication-Avoiding Neural Networks via Hadamard Transform and In-Memory Computing
Open-Vocabulary Spatio-Temporal Scene Graph for Robot Perception and Teleoperation Planning
Liaohe-CobotMagic-PnP: an Imitation Learning Dataset of Intelligent Robot for Industrial Applications
RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility
C$^2$GSPG: Confidence-calibrated Group Sequence Policy Gradient towards Self-aware Reasoning
Trust Region Reward Optimization and Proximal Inverse Reward Optimization Algorithm
Deep Learning-Based Detection of Cognitive Impairment from Passive Smartphone Sensing with Routine-Aware Augmentation and Demographic Personalization
Dense associative memory on the Bures-Wasserstein space
TRAX: TRacking Axles for Accurate Axle Count Estimation
WARBERT: A Hierarchical BERT-based Model for Web API Recommendation
PARL-MT: Learning to Call Functions in Multi-Turn Conversation with Progress Awareness
Towards Monotonic Improvement in In-Context Reinforcement Learning
One-Shot Multi-Label Causal Discovery in High-Dimensional Event Sequences
Leave No Observation Behind: Real-time Correction for VLA Action Chunks
SPEC-RL: Accelerating On-Policy Reinforcement Learning via Speculative Rollouts
Patch Rebirth: Toward Fast and Transferable Model Inversion of Vision Transformers
Self-Consistency as a Free Lunch: Reducing Hallucinations in Vision-Language Models via Self-Reflection
Online Dynamic Goal Recognition in Gym Environments
Adaptive Token-Weighted Differential Privacy for LLMs: Not All Tokens Require Equal Protection
Learning Regional Monsoon Patterns with a Multimodal Attention U-Net
Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing
Continuous-Time Reinforcement Learning for Asset-Liability Management
A2D: Any-Order, Any-Step Safety Alignment for Diffusion Language Models
A Neural ODE Approach to Aircraft Flight Dynamics Modelling
Seeing Symbols, Missing Cultures: Probing Vision-Language Models' Reasoning on Fire Imagery and Cultural Meaning
MELCOT: A Hybrid Learning Architecture with Marginal Preservation for Matrix-Valued Regression
Scaling LLM Test-Time Compute with Mobile NPU on Smartphones
Robust Fine-Tuning from Non-Robust Pretrained Models: Mitigating Suboptimal Transfer With Adversarial Scheduling
Space Robotics Bench: Robot Learning Beyond Earth
Beyond Model Ranking: Predictability-Aligned Evaluation for Time Series Forecasting
Signal Preserving Weight Initialization for Odd-Sigmoid Activations
Causally-Enhanced Reinforcement Policy Optimization
CoPatch: Zero-Shot Referring Image Segmentation by Leveraging Untapped Spatial Knowledge in CLIP
Towards Quantum-Ready Blockchain Fraud Detection via Ensemble Graph Neural Networks
Boundary on the Table: Efficient Black-Box Decision-Based Attacks for Structured Data
Adaptive Margin RLHF via Preference over Preferences
Patient-specific Biomolecular Instruction Tuning
Observation-Free Attacks on Online Learning to Rank
Scalable Wi-Fi RSS-Based Indoor Localization via Automatic Vision-Assisted Calibration
From Noise to Knowledge: A Comparative Study of Acoustic Anomaly Detection Models in Pumped-storage Hydropower Plants
Convolutional Set Transformer
Extract-0: A Specialized Language Model for Document Information Extraction
TY-RIST: Tactical YOLO Tricks for Real-time Infrared Small Target Detection
Rethinking Large Language Model Distillation: A Constrained Markov Decision Process Perspective
Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings
Large language models management of medications: three performance analyses
MonoCon: A general framework for learning ultra-compact high-fidelity representations using monotonicity constraints
Compute-Optimal Quantization-Aware Training
Unsupervised Speech Enhancement using Data-defined Priors
What Matters More For In-Context Learning under Matched Compute Budgets: Pretraining on Natural Text or Incorporating Targeted Synthetic Examples?
Tiny-QMoE
Functional Critic Modeling for Provably Convergent Off-Policy Actor-Critic
ADAM: A Diverse Archive of Mankind for Evaluating and Enhancing LLMs in Biographical Reasoning
Physically Plausible Multi-System Trajectory Generation and Symmetry Discovery
MoE-PHDS: One MoE checkpoint for flexible runtime sparsity
LLM Watermark Evasion via Bias Inversion
Tracing the Representation Geometry of Language Models from Pretraining to Post-training
DPFNAS: Differential Privacy-Enhanced Federated Neural Architecture Search for 6G Edge Intelligence
Sensor-Adaptive Flood Mapping with Pre-trained Multi-Modal Transformers across SAR and Multispectral Modalities
GeLoc3r: Enhancing Relative Camera Pose Regression with Geometric Consistency Regularization
Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents
Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data
IsingFormer: Augmenting Parallel Tempering With Learned Proposals
MMeViT: Multi-Modal ensemble ViT for Post-Stroke Rehabilitation Action Recognition
Beyond Aggregation: Guiding Clients in Heterogeneous Federated Learning
Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding
Local Success Does Not Compose: Benchmarking Large Language Models for Compositional Formal Verification
Semantic Voting: A Self-Evaluation-Free Approach for Efficient LLM Self-Improvement on Unverifiable Open-ended Tasks
From Evidence to Trajectory: Abductive Reasoning Path Synthesis for Training Retrieval-Augmented Generation Agents
Localizing Adversarial Attacks To Produces More Imperceptible Noise
TRUEBench: Can LLM Response Meet Real-world Constraints as Productivity Assistant?
IBiT: Utilizing Inductive Biases to Create a More Data Efficient Attention Mechanism
LayoutAgent: A Vision-Language Agent Guided Compositional Diffusion for Spatial Layout Planning
A Data-Driven Framework for Digital Transformation in Smart Cities: Integrating AI, Dashboards, and IoT Readiness
A Meta-Analysis of LLM Effects on Students across Qualification, Socialisation, and Subjectification
Prompt-aware classifier free guidance for diffusion models
Multi-Modal Sentiment Analysis with Dynamic Attention Fusion
Bidirectional Intention Inference Enhances LLMs' Defense Against Multi-Turn Jailbreak Attacks
Rebuild AC Power Flow Models with Graph Attention Networks
Automated Formative Feedback for Short-form Writing: An LLM-Driven Approach and Adoption Analysis
Regulating the Agency of LLM-based Agents
Consistency Models as Plug-and-Play Priors for Inverse Problems
CompareBench: A Benchmark for Visual Comparison Reasoning in Vision-Language Models
Painless Activation Steering: An Automated, Lightweight Approach for Post-Training Large Language Models
Learning What To Hear: Boosting Sound-Source Association For Robust Audiovisual Instance Segmentation
Societal Capacity Assessment Framework: Measuring Resilience to Inform Advanced AI Risk Management
Index-MSR: A high-efficiency multimodal fusion framework for speech recognition
Defending MoE LLMs against Harmful Fine-Tuning via Safety Routing Alignment
MIRAGE: Multi-hop Reasoning with Ambiguity Evaluation for Illusory Questions
Variance-Bounded Evaluation without Ground Truth: VB-Score
Self-driving cars: Are we there yet?
Persistent Autoregressive Mapping with Traffic Rules for Autonomous Driving
Red Teaming Quantum-Resistant Cryptographic Standards: A Penetration Testing Framework Integrating AI and Quantum Security
MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning
UESA-Net: U-Shaped Embedded Multidirectional Shrinkage Attention Network for Ultrasound Nodule Segmentation
In-Context Learning can Perform Continual Learning Like Humans
A theoretical guarantee for SyncRank
Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression
Generative Modeling and Decision Fusion for Unknown Event Detection and Classification Using Synchrophasor Data
VideoScore2: Think before You Score in Generative Video Evaluation
MTRec: Learning to Align with User Preferences via Mental Reward Models
MMPB: It's Time for Multi-Modal Personalization
Dynamic Buffers: Cost-Efficient Planning for Tabletop Rearrangement with Stacking
Efficient Fine-Grained GPU Performance Modeling for Distributed Deep Learning of LLM
Bridging Language Models and Formal Methods for Intent-Driven Optical Network Design
Seeing Isn't Believing: Context-Aware Adversarial Patch Synthesis via Conditional GAN
Multimodal Slice Interaction Network Enhanced by Transfer Learning for Precise Segmentation of Internal Gross Tumor Volume in Lung Cancer PET/CT Imaging
On the Self-awareness of Large Reasoning Models' Capability Boundaries
Spatial-Functional awareness Transformer-based graph archetype contrastive learning for Decoding Visual Neural Representations from EEG
From Ambiguity to Verdict: A Semiotic-Grounded Multi-Perspective Agent for LLM Logical Reasoning
TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models
Query Circuits: Explaining How Language Models Answer User Prompts
Pushing LLMs to Their Logical Reasoning Bound: The Role of Data Reasoning Intensity
PhysicsMinions: Winning Gold Medals in the Latest Physics Olympiads with a Coevolutionary Multimodal Multi-Agent System
The Emergence of Social Science of Large Language Models
RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark
Neural network embeddings recover value dimensions from psychometric survey items on par with human data
Meta-Learning Theory-Informed Inductive Biases using Deep Kernel Gaussian Processes
MASLegalBench: Benchmarking Multi-Agent Systems in Deductive Legal Reasoning
When Autonomous Vehicle Meets V2X Cooperative Perception: How Far Are We?
KIRETT - A wearable device to support rescue operations using artificial intelligence to improve first aid
Agentic Exploration of Physics Models
CLPO: Curriculum Learning meets Policy Optimization for LLM Reasoning
Scaling Synthetic Task Generation for Agents via Exploration
Cogito, Ergo Ludo: An Agent that Learns to Play by Reasoning and Planning
HeDA: An Intelligent Agent System for Heatwave Risk Discovery through Automated Knowledge Graph Construction and Multi-layer Risk Propagation Analysis
From $f(x)$ and $g(x)$ to $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones
The Era of Real-World Human Interaction: RL from User Conversations
Vision-and-Language Navigation with Analogical Textual Descriptions in LLMs
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
Visual serial processing deficits explain divergences in human and VLM reasoning
UniAPL: A Unified Adversarial Preference Learning Framework for Instruct-Following
Who's Your Judge? On the Detectability of LLM-Generated Judgments
Prosody-Adaptable Audio Codecs for Zero-Shot Voice Conversion via In-Context Learning
BenLOC: A Benchmark for Learning to Configure MIP Optimizers
YOLO-based Bearing Fault Diagnosis With Continuous Wavelet Transform
Agentic DDQN-Based Scheduling for Licensed and Unlicensed Band Allocation in Sidelink Networks
Green Learning for STAR-RIS mmWave Systems with Implicit CSI
How are Scientific Concepts Birthed? Typing Rules of Concept Formation in Theoretical Physics Reasoning
Sustainable LSTM-Based Precoding for RIS-Aided mmWave MIMO Systems with Implicit CSI
GOAT: A Large Dataset of Paired Guitar Audio Recordings and Tablatures
How good are LLMs at Retrieving Documents in a Specific Domain?
Fairness for niche users and providers: algorithmic choice and profile portability
Next Point-of-interest (POI) Recommendation Model Based on Multi-modal Spatio-temporal Context Feature Embedding
PISA: An AI Pipeline for Interpretable-by-design Survival Analysis Providing Multiple Complexity-Accuracy Trade-off Models
Learning Hyperspectral Images with Curated Text Prompts for Efficient Multimodal Alignment
Advancing Audio-Visual Navigation Through Multi-Agent Collaboration in 3D Environments
Enhancing Cluster Scheduling in HPC: A Continuous Transfer Learning for Real-Time Optimization
AccessEval: Benchmarking Disability Bias in Large Language Models
Intelligent Load Balancing in Cloud Computer Systems
GZSL-MoE: Apprentissage Généralisé Zéro-Shot basé sur le Mélange d'Experts pour la Segmentation Sémantique de Nuages de Points 3D Appliqué à un Jeu de Données d'Environnement de Collaboration Humain-Robot
AnveshanaAI: A Multimodal Platform for Adaptive AI/ML Education through Automated Question Generation and Interactive Assessment
Mix-Ecom: Towards Mixed-Type E-Commerce Dialogues with Complex Domain Rules
AgentGuard: Runtime Verification of AI Agents
Rethinking Reward Miscalibration of GRPO in Agentic RL
Quant Fever, Reasoning Blackholes, Schrodinger's Compliance, and More: Probing GPT-OSS-20B
From Neural Networks to Logical Theories: The Correspondence between Fibring Modal Logics and Fibring Neural Networks
Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models
Automatic selection of primary studies in systematic reviews with evolutionary rule-based classification
TusoAI: Agentic Optimization for Scientific Methods
LLM/Agent-as-Data-Analyst: A Survey
Future-Proofing Programmers: Optimal Knowledge Tracing for AI-Assisted Personalized Education
Do Repetitions Matter? Strengthening Reliability in LLM Evaluations
Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs
Transparent, Evaluable, and Accessible Data Agents: A Proof-of-Concept Framework
Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models
Robust Preference Optimization: Aligning Language Models with Noisy Preference Feedback
Humanline: Online Alignment as Perceptual Loss
ELHPlan: Efficient Long-Horizon Task Planning for Multi-Agent Collaboration
Learning to Ponder: Adaptive Reasoning in Latent Space
Model Merging Scaling Laws in Large Language Models
SpecExit: Accelerating Large Reasoning Model via Speculative Exit
Interactive Program Synthesis for Modeling Collaborative Physical Activities from Narrated Demonstrations
Rethinking and Benchmarking Large Language Models for Graph Reasoning
Risk-Sensitive RL for Alleviating Exploration Dilemmas in Large Language Models
PAME-AI: Patient Messaging Creation and Optimization using Agentic AI
AdvChain: Adversarial Chain-of-Thought Tuning for Robust Safety Alignment of Large Reasoning Models
G-reasoner: Foundation Models for Unified Reasoning over Graph-structured Knowledge
SCI-Verifier: Scientific Verifier with Thinking
Experience Paper: Adopting Activity Recognition in On-demand Food Delivery Business
MedMMV: A Controllable Multimodal Multi-Agent Framework for Reliable and Verifiable Clinical Reasoning
humancompatible.detect: a Python Toolkit for Detecting Bias in AI Models
Fin-Ally: Pioneering the Development of an Advanced, Commonsense-Embedded Conversational AI for Money Matters
From Static to Dynamic: Adaptive Monte Carlo Search for Mathematical Process Supervision
Plan before Solving: Problem-Aware Strategy Routing for Mathematical Reasoning with LLMs
Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention
A Systematic Review of Digital Twin-Driven Predictive Maintenance in Industrial Engineering: Taxonomy, Architectural Elements, and Future Research Directions
ContextPRM: Leveraging Contextual Coherence for multi-domain Test-Time Scaling
Overcoming Over-Fitting in Constraint Acquisition via Query-Driven Interactive Refinement
Neuroplasticity-inspired dynamic ANNs for multi-task demand forecasting
Experience-guided reflective co-evolution of prompts and heuristics for automatic algorithm design
Training Agents Inside of Scalable World Models
BPMN Assistant: An LLM-Based Approach to Business Process Modeling
LTL$_f$ Learning Meets Boolean Set Cover
"Stop replacing salt with sugar!'': Towards Intuitive Human-Agent Teaching
Successful Misunderstandings: Learning to Coordinate Without Being Understood
SysMoBench: Evaluating AI on Formally Modeling Complex Real-World Systems
MathBode: Frequency-Domain Fingerprints of LLM Mathematical Reasoning
Coordination Requires Simplification: Thermodynamic Bounds on Multi-Objective Compromise in Natural and Artificial Intelligence
AI-Enhanced Distributed Channel Access for Collision Avoidance in Future Wi-Fi 8
Limit Analysis for Symbolic Multi-step Reasoning Tasks with Information Propagation Rules Based on Transformers
Understanding and Enhancing the Planning Capability of Language Models via Multi-Token Prediction
AutoEP: LLMs-Driven Automation of Hyperparameter Evolution for Metaheuristic Algorithms
$p$-less Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding
Agentic AI Reasoning for Mobile Edge General Intelligence: Fundamentals, Approaches, and Directions
Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned
GUI-PRA: Process Reward Agent for GUI Tasks
Socio-Economic Model of AI Agents
Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning
Learning How to Use Tools, Not Just When: Pattern-Aware Tool-Integrated Reasoning
Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking
From Conversation to Query Execution: Benchmarking User and Tool Interactions for EHR Database Agents
Democratizing AI scientists using ToolUniverse
Beyond Embeddings: Interpretable Feature Extraction for Binary Code Similarity
ViTSP: A Vision Language Models Guided Framework for Large-Scale Traveling Salesman Problems
GeoBS: Information-Theoretic Quantification of Geographic Bias in AI Models
Accurate Predictions in Education with Discrete Variational Inference
Mapping Overlaps in Benchmarks through Perplexity in the Wild
Dynamic Trust Calibration Using Contextual Bandits
Model Consistency as a Cheap yet Predictive Proxy for LLM Elo Scores
DOoM: Difficult Olympiads of Math
Beyond the Strongest LLM: Multi-Turn Multi-Agent Orchestration vs. Single LLMs on Benchmarks
Formalization Driven LLM Prompt Jailbreaking via Reinforcement Learning
A Hierarchical Structure-Enhanced Personalized Recommendation Model for Traditional Chinese Medicine Formulas Based on KG Diffusion Guidance
Clean First, Align Later: Benchmarking Preference Data Cleaning for Reliable LLM Alignment
BridgeDrive: Diffusion Bridge Policy for Closed-Loop Trajectory Planning in Autonomous Driving
PSG-Agent: Personality-Aware Safety Guardrail for LLM-based Agents
Reasoning Scaffolding: Distilling the Flow of Thought from LLMs
How LLMs Learn to Reason: A Complex Network Perspective
Game-Oriented ASR Error Correction via RAG-Enhanced LLM
From Reasoning to Answer: Empirical, Attention-Based and Mechanistic Insights into Distilled DeepSeek R1 Models
SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents
Measuring Sparse Autoencoder Feature Sensitivity
MedLA: A Logic-Driven Multi-Agent Framework for Complex Medical Reasoning with Large Language Models
EAPO: Enhancing Policy Optimization with On-Demand Expert Assistance
Diagnosing Failure Root Causes in Platform-Orchestrated Agentic Systems: Dataset, Taxonomy, and Benchmark
GUI-Shepherd: Reliable Process Reward and Verification for Long-Sequence GUI Tasks
Transparent Visual Reasoning via Object-Centric Agent Collaboration
From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning
Falcon: A Cross-Modal Evaluation Dataset for Comprehensive Safety Perception
From Frustration to Fun: An Adaptive Problem-Solving Puzzle Game Powered by Genetic Algorithm
Mixture-of-Visual-Thoughts: Exploring Context-Adaptive Reasoning Mode Selection for General Visual Reasoning
Can Large Language Models Develop Gambling Addiction?
Hilbert: Recursively Building Formal Proofs with Informal Reasoning
Toward a Theory of Generalizability in LLM Mechanistic Interpretability Research
JE-IRT: A Geometric Lens on LLM Abilities through Joint Embedding Item Response Theory
Not only a helper, but also a teacher: Interactive LLM Cascade
Towards Strategic Persuasion with Language Models
AI Noether -- Bridging the Gap Between Scientific Laws Derived by AI Systems and Canonical Knowledge via Abductive Inference
Creative Adversarial Testing (CAT): A Novel Framework for Evaluating Goal-Oriented Agentic AI Systems
Deceive, Detect, and Disclose: Large Language Models Play Mini-Mafia
Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents
Risk Profiling and Modulation for LLMs
Multiplayer Nash Preference Optimization
Artificial Phantasia: Evidence for Propositional Reasoning-Based Mental Imagery in Large Language Models
AttAnchor: Guiding Cross-Modal Token Alignment in VLMs with Attention Anchors
Exploring LLM-based Frameworks for Fault Diagnosis
Transferring Vision-Language-Action Models to Industry Applications: Architectures, Performance, and Challenges
Learning Smooth State-Dependent Traversability from Dense Point Clouds
ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting
VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs
Neural-Augmented Kelvinlet for Real-Time Soft Tissue Deformation Modeling
Meta Pruning via Graph Metanetworks: A Universal Meta Learning Framework for Network Pruning
Origins of Creativity in Attention-Based Diffusion Models
RAM-W1K: A Multi-Task Wrist Dataset and Benchmark for Rheumatoid Arthritis
Warm Starts Accelerate Conditional Diffusion
Vidar: Embodied Video Diffusion Model for Generalist Manipulation
The Geometry of Cortical Computation: Manifold Disentanglement and Predictive Dynamics in VCNet
MoQE: Improve Quantization Model performance via Mixture of Quantization Experts
3D-LATTE: Latent Space 3D Editing from Textual Instructions
Graph Alignment via Dual-Pass Spectral Encoding and Latent Space Communication
WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance
Joint Memory Frequency and Computing Frequency Scaling for Energy-efficient DNN Inference
HUNT: High-Speed UAV Navigation and Tracking in Unstructured Environments via Instantaneous Relative Frames
Implicit-ARAP: Efficient Handle-Guided Neural Field Deformation via Local Patch Meshing
Differential Encoding for Improved Representation Learning over Graphs
Attentive Dilated Convolution for Automatic Sleep Staging using Force-directed Layout
Similarity-Dissimilarity Loss for Multi-label Supervised Contrastive Learning
Chronic Obstructive Pulmonary Disease Prediction Using Deep Convolutional Network
Freqformer: Frequency-Domain Transformer for 3-D Reconstruction and Quantification of Human Retinal Vasculature
Towards agile multi-robot systems in the real world: Fast onboard tracking of active blinking markers for relative localization
Generative Video Semantic Communication via Multimodal Semantic Fusion with Large Model
GCDance: Genre-Controlled Music-Driven 3D Full Body Dance Generation
Reconstruct Anything Model: a lightweight foundation model for computational imaging
AdaRank: Adaptive Rank Pruning for Enhanced Model Merging
In-2-4D: Inbetweening from Two Single-View Images to 4D Generation
Sharpness-Aware Minimization with Z-Score Gradient Filtering
RainPro-8: An Efficient Deep Learning Model to Estimate Rainfall Probabilities Over 8 Hours
Visual Planning: Let's Think Only with Images
MINGLE: Mixture of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging
Scaling Diffusion Transformers Efficiently via $\mu$P
AgentThink: A Unified Framework for Tool-Augmented Chain-of-Thought Reasoning in Vision-Language Models for Autonomous Driving
Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models
ART-DECO: Arbitrary Text Guidance for 3D Detailizer Construction
Vision Language Models are Biased
Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation
Interaction Field Matching: Overcoming Limitations of Electrostatic Models
Bridging Semantic Logic Gaps: A Cognition Inspired Multimodal Boundary Preserving Network for Image Manipulation Localization
BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation
G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration
Temporal Grounding as a Learning Signal for Referring Video Object Segmentation
Semantic Discrepancy-aware Detector for Image Forgery Identification
SpotEdit: Evaluating Visually-Guided Image Editing Methods
Human-like Content Analysis for Generative AI with Language-Grounded Sparse Encoders
SemaMIL: Semantic-Aware Multiple Instance Learning with Retrieval-Guided State Space Modeling for Whole Slide Images
Durian: Dual Reference Image-Guided Portrait Animation with Attribute Transfer
Physics-Guided Null-Space Diffusion with Sparse Masking for Corrective Sparse-View CT Reconstruction
Fracture Detection In X-rays Using Custom Convolutional Neural Network (CNN) And Transfer Learning Models
Loc$^2$: Interpretable Cross-View Localization via Depth-Lifted Local Feature Matching
MindVL: Towards Efficient and Effective Training of Multimodal Large Language Models on Ascend NPUs
Lost in Translation? Vocabulary Alignment for Source-Free Adaptation in Open-Vocabulary Semantic Segmentation
ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding
Pyramid Token Pruning for High-Resolution Large Vision-Language Models via Region, Token, and Instruction-Guided Importance
VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption
VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
Vid2World: Crafting Video Diffusion Models to Interactive World Models
Better Safe Than Sorry? Overreaction Problem of Vision Language Models in Visual Emergency Recognition
Advancing Marine Research: UWSAM Framework and UIIS10K Dataset for Precise Underwater Instance Segmentation
OViP: Online Vision-Language Preference Learning for VLM Hallucination
Pixels Versus Priors: Controlling Knowledge Priors in Vision-Language Models through Visual Counterfacts
InfoDet: A Dataset for Infographic Element Detection
T2VUnlearning: A Concept Erasing Method for Text-to-Video Diffusion Models
Boosting Open Set Recognition Performance through Modulated Representation Learning
CoT-RVS: Zero-Shot Chain-of-Thought Reasoning Segmentation for Videos
Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter
ReDDiT: Rehashing Noise for Discrete Visual Generation
GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scenes
Score Replacement with Bounded Deviation for Rare Prompt Generation
MagicTryOn: Harnessing Diffusion Transformer for Garment-Preserving Video Virtual Try-on
Any-to-Bokeh: Arbitrary-Subject Video Refocusing with Video Diffusion Model
Cross-modal RAG: Sub-dimensional Text-to-Image Retrieval-Augmented Generation
CryoCCD: Conditional Cycle-consistent Diffusion with Biophysical Modeling for Cryo-EM Synthesis
ProxyThinker: Test-Time Guidance through Small Visual Reasoners
EgoVIS@CVPR: What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning
EarthMind: Leveraging Cross-Sensor Data for Advanced Earth Observation Interpretation with a Unified Multimodal LLM
EgoVIS@CVPR: PAIR-Net: Enhancing Egocentric Speaker Detection via Pretrained Audio-Visual Fusion and Alignment Loss
METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding
Struct2D: A Perception-Guided Framework for Spatial Reasoning in Large Multimodal Models
Training-Free Diffusion Framework for Stylized Image Generation with Identity Preservation
Vision-EKIPL: External Knowledge-Infused Policy Learning for Visual Reasoning
ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
MIRAGE: Multimodal foundation model and benchmark for comprehensive retinal OCT image analysis
Revisiting Visual Understanding in Multimodal Reasoning through a Lens of Image Perturbation
DART: Differentiable Dynamic Adaptive Region Tokenizer for Vision Foundation Models
Decoupled Classifier-Free Guidance for Counterfactual Diffusion Models
FindingDory: A Benchmark to Evaluate Memory in Embodied Agents
Do We Need Large VLMs for Spotting Soccer Actions?
From Drawings to Decisions: A Hybrid Vision-Language Framework for Parsing 2D Engineering Drawings into Structured Manufacturing Knowledge
Improving Black-Box Generative Attacks via Generator Semantic Consistency
OmniGen2: Exploration to Advanced Multimodal Generation
Light of Normals: Unified Feature Representation for Universal Photometric Stereo
SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution
XTransfer: Modality-Agnostic Few-Shot Model Transfer for Human Sensing at the Edge
Controllable Reference Guided Diffusion with Local Global Fusion for Real World Remote Sensing Image Super Resolution
Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think
FA: Forced Prompt Learning of Vision-Language Models for Out-of-Distribution Detection
Counterfactual Visual Explanation via Causally-Guided Adversarial Steering
3DGAA: Realistic and Robust 3D Gaussian-based Adversarial Attack for Autonomous Driving
NoiseSDF2NoiseSDF: Learning Clean Neural Fields from Noisy Supervision
CHROMA: Consistent Harmonization of Multi-View Appearance via Bilateral Grid Prediction
Disentangling Regional Primitives for Image Generation
DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing
CART: Compositional Auto-Regressive Transformer for Image Generation
Continuous Speculative Decoding for Autoregressive Image Generation
Open-Vocabulary Online Semantic Mapping for SLAM
GenMix: Effective Data Augmentation with Generative Diffusion Model Image Editing
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models
Measurement of Medial Elbow Joint Space using Landmark Detection
Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval
PERSE: Personalized 3D Generative Avatars from A Single Portrait
Training-Free Defense Against Adversarial Attacks in Deep Learning MRI Reconstruction
MIAFEx: An Attention-based Feature Extraction Method for Medical Image Classification
CGI: Identifying Conditional Generative Models with Example Images
Med-PU: Point Cloud Upsampling for High-Fidelity 3D Medical Shape Reconstruction
DeepFRC: An End-to-End Deep Learning Model for Functional Registration and Classification
Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
3D Foundation Model for Generalizable Disease Detection in Head Computed Tomography
PoI: A Filter to Extract Pixel of Interest from Novel View Synthesis for Scene Coordinate Regression
Bidirectional Uncertainty-Aware Region Learning for Semi-Supervised Medical Image Segmentation
IM360: Large-scale Indoor Mapping with 360 Cameras
VPNeXt -- Rethinking Dense Decoding for Plain Vision Transformer
Spiking Meets Attention: Efficient Remote Sensing Image Super-Resolution with Attention Spiking Neural Networks
High-Precision Dichotomous Image Segmentation via Depth Integrity-Prior and Fine-Grained Patch Strategy
Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization
Exploring Representation Invariance in Finetuning
UniF$^2$ace: A Unified Fine-grained Face Understanding and Generation Model
Controllable Adversarial Makeup for Privacy via Text-Guided Diffusion
A Survey on Self-supervised Contrastive Learning for Multimodal Text-Image Analysis
Interpretable 3D Neural Object Volumes for Robust Conceptual Reasoning
DPFlow: Adaptive Optical Flow Estimation with a Dual-Pyramid Framework
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
Efficient Self-Supervised Adaptation for Medical Image Analysis
Audio-centric Video Understanding Benchmark without Text Shortcut
Beyond Synthetic Replays: Turning Diffusion Features into Few-Shot Class-Incremental Learning Knowledge
SCRAMBLe: Enhancing Multimodal LLM Compositionality with Synthetic Preference Data
From Specificity to Generality: Revisiting Generalizable Artifacts in Detecting Face Deepfakes
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization
HSACNet: Hierarchical Scale-Aware Consistency Regularized Semi-Supervised Change Detection
Model-based Metric 3D Shape and Motion Reconstruction of Wild Bottlenose Dolphins in Drone-Shot Videos
DreamO: A Unified Framework for Image Customization
S2S-Net: Addressing the Domain Gap of Heterogeneous Sensor Systems in LiDAR-Based Collective Perception
ReactDance: Hierarchical Representation for High-Fidelity and Coherent Long-Form Reactive Dance Generation
Dynamic Uncertainty Learning with Noisy Correspondence for Text-Based Person Search
QVGen: Pushing the Limit of Quantized Video Generative Models
ZeroScene: A Zero-Shot Framework for 3D Scene Generation from a Single Image and Controllable Texture Editing
Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Multi-Scale Global-Local Attention
Focusing on What Matters: Object-Agent-centric Tokenization for Vision Language Action models
DFG-PCN: Point Cloud Completion with Degree-Flexible Point Graph
StrucADT: Generating Structure-controlled 3D Point Clouds with Adjacency Diffusion Transformer
Diff-3DCap: Shape Captioning with Diffusion Models
GBSK: Skeleton Clustering via Granular-ball Computing and Multi-Sampling for Large-Scale Data
Accuracy-Robustness Trade Off via Spiking Neural Network Gradient Sparsity Trail
ReLumix: Extending Image Relighting to Video via Video Diffusion Models
FedAgentBench: Towards Automating Real-world Federated Medical Image Analysis with Server-Client LLM Agents
AISHELL6-whisper: A Chinese Mandarin Audio-visual Whisper Speech Dataset with Speech Recognition Baselines
Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation
Taught Well Learned Ill: Towards Distillation-conditional Backdoor Attack
Interpreting deep learning-based stellar mass estimation via causal analysis and mutual information decomposition
A University of Texas Medical Branch Case Study on Aortic Calcification Detection
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention
GPS-MTM: Capturing Pattern of Normalcy in GPS-Trajectories with self-supervised learning
End-to-end Topographic Auditory Models Replicate Signatures of Human Auditory Cortex
AQUAIR: A High-Resolution Indoor Environmental Quality Dataset for Smart Aquaculture Monitoring
Clebsch-Gordan Transformer: Fast and Global Equivariant Attention
Mash, Spread, Slice! Learning to Manipulate Object States via Visual Spatial Progress
Neural Visibility of Point Sets
Semantic Editing with Coupled Stochastic Differential Equations
Non-Invasive Detection of PROState Cancer with Novel Time-Dependent Diffusion MRI and AI-Enhanced Quantitative Radiological Interpretation: PROS-TD-AI
PROFusion: Robust and Accurate Dense Reconstruction via Camera Pose Regression and Optimization
Rethinking JEPA: Compute-Efficient Video SSL with Frozen Teachers
ReCon-GS: Continuum-Preserved Gaussian Streaming for Fast and Compact Reconstruction of Dynamic Scenes
TraitSpaces: Towards Interpretable Visual Creativity for Human-AI Co-Creation
Wavelet-Assisted Mamba for Satellite-Derived Sea Surface Temperature Super-Resolution
Hybrid Layer-Wise ANN-SNN With Surrogate Spike Encoding-Decoding Structure
A Novel Preprocessing Unit for Effective Deep Learning based Classification and Grading of Diabetic Retinopathy
SAIP: A Plug-and-Play Scale-adaptive Module in Diffusion-based Inverse Problems
Discovering "Words" in Music: Unsupervised Learning of Compositional Sparse Code for Symbolic Music
CEDex: Cross-Embodiment Dexterous Grasp Generation at Scale from Human-like Contact Representations
A TRIANGLE Enables Multimodal Alignment Beyond Cosine Similarity
VSSFlow: Unifying Video-conditioned Sound and Speech Generation via Joint Learning
Of-SemWat: High-payload text embedding for semantic watermarking of AI-generated images with arbitrary size
DRCP: Diffusion on Reinforced Cooperative Perception for Perceiving Beyond Limits
Light-SQ: Structure-aware Shape Abstraction with Superquadrics for Generated Meshes
Score-based Membership Inference on Diffusion Models
Uncertainty-Aware Deep Learning for Wildfire Danger Forecasting
AIRoA MoMa Dataset: A Large-Scale Hierarchical Dataset for Mobile Manipulation
CharGen: Fast and Fluent Portrait Modification
Unsupervised Representation Learning for 3D Mesh Parameterization with Semantic and Visibility Objectives
MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech
LayerD: Decomposing Raster Graphic Designs into Layers
Vision-and-Language Navigation with Analogical Textual Descriptions in LLMs
Learning to Infer Unseen Single-/Multi-Attribute-Object Compositions with Graph Networks
TRIPS: Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection
Self-Supervised Geometry-Guided Initialization for Robust Monocular Visual Odometry
fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence
Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis
VT-FSL: Bridging Vision and Text with LLMs for Few-Shot Learning
Fast Real-Time Pipeline for Robust Arm Gesture Recognition
A Scalable Distributed Framework for Multimodal GigaVoxel Image Registration
GEM: 3D Gaussian Splatting for Efficient and Accurate Cryo-EM Reconstruction
BRIDGE - Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation
UniLat3D: Geometry-Appearance Unified Latents for Single-Stage 3D Generation
MANI-Pure: Magnitude-Adaptive Noise Injection for Adversarial Purification
Triangle Splatting+: Differentiable Rendering with Opaque Triangles
Score Distillation of Flow Matching Models
TemMed-Bench: Evaluating Temporal Medical Image Reasoning in Vision-Language Models
Fast Feature Field ($\text{F}^3$): A Predictive Representation of Events
VideoAnchor: Reinforcing Subspace-Structured Visual Cues for Coherent Visual-Spatial Reasoning
GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts
Rolling Forcing: Autoregressive Long Video Diffusion in Real Time
Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models
YOLO26: Key Architectural Enhancements and Performance Benchmarking for Real-Time Object Detection
Personalized Vision via Visual In-Context Learning
Mitigating Hallucination in Multimodal LLMs with Layer Contrastive Decoding
GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs
DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space
DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder
PAD3R: Pose-Aware Dynamic 3D Reconstruction from Casual Videos
PixelCraft: A Multi-Agent System for High-Fidelity Visual Reasoning on Structured Images
FlashI2V: Fourier-Guided Latent Shifting Prevents Conditional Image Leakage in Image-to-Video Generation
Visual Jigsaw Post-Training Improves MLLMs
VGGT-X: When VGGT Meets Dense Novel View Synthesis
Q2E: Query-to-Event Decomposition for Zero-Shot Multilingual Text-to-Video Retrieval
YOLO-based Bearing Fault Diagnosis With Continuous Wavelet Transform
VIRTUS-FPP: Virtual Sensor Modeling for Fringe Projection Profilometry in NVIDIA Isaac Sim
ReSeFlow: Rectifying SE(3)-Equivariant Policy Learning Flows
Explainable Deep Learning for Cataract Detection in Retinal Images: A Dual-Eye and Knowledge Distillation Approach
Localizing Adversarial Attacks To Produce More Imperceptible Noise
Achieving Fair Skin Lesion Detection through Skin Tone Normalization and Channel Pruning
Responsible Diffusion: A Comprehensive Survey on Safety, Ethics, and Trust in Diffusion Models
Consistency Models as Plug-and-Play Priors for Inverse Problems
Self-driving cars: Are we there yet?
Introducing Multimodal Paradigm for Learning Sleep Staging PSG via General-Purpose Model
MonoCon: A general framework for learning ultra-compact high-fidelity representations using monotonicity constraints
LLMs Behind the Scenes: Enabling Narrative Scene Illustration
Robot Learning from Any Images
ADAM: A Diverse Archive of Mankind for Evaluating and Enhancing LLMs in Biographical Reasoning
UniPrototype: Human-Robot Skill Learning with Uniform Prototypes
Leave No Observation Behind: Real-time Correction for VLA Action Chunks
Robust Fine-Tuning from Non-Robust Pretrained Models: Mitigating Suboptimal Transfer With Adversarial Scheduling
Targeted perturbations reveal brain-like local coding axes in robustified, but not standard, ANN-based brain models
DiffTex: Differentiable Texturing for Architectural Proxy Models
Graph Your Own Prompt
CCD: Mitigating Hallucinations in Radiology MLLMs via Clinical Contrastive Decoding
S$^3$F-Net: A Multi-Modal Approach to Medical Image Classification via Spatial-Spectral Summarizer Fusion Network
Temporal Generalization: A Reality Check
RAVEN: Resilient Aerial Navigation via Open-Set Semantic Memory and Behavior Adaptation
Automated design of compound lenses with discrete-continuous optimization
StolenLoRA: Exploring LoRA Extraction Attacks via Synthetic Data
Perceive, Reflect and Understand Long Video: Progressive Multi-Granular Clue Exploration with Interactive Agents
Evaluating Temperature Scaling Calibration Effectiveness for CNNs under Varying Noise Levels in Brain Tumour Detection
Social 3D Scene Graphs: Modeling Human Actions and Relations for Interactive Service Robots
Event-based Facial Keypoint Alignment via Cross-Modal Fusion Attention and Self-Supervised Multi-Event Representation Learning
On-the-Fly Data Augmentation for Brain Tumor Segmentation
Wan-Alpha: High-Quality Text-to-Video Generation with Alpha Channel
SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation
PanoWorld-X: Generating Explorable Panoramic Worlds via Sphere-Aware Video Diffusion
LVT: Large-Scale Scene Reconstruction via Local View Transformers
CLASP: Adaptive Spectral Clustering for Unsupervised Per-Image Segmentation
GeoVLM-R1: Reinforcement Fine-Tuning for Improved Remote Sensing Reasoning
STAGE: Stable and Generalizable GRPO for Autoregressive Image Generation
TokenSwap: Backdoor Attack on the Compositional Understanding of Large Vision-Language Models
SCOPE: Semantic Conditioning for Sim2Real Category-Level Object Pose Estimation in Robotics
BFSM: 3D Bidirectional Face-Skull Morphable Model
Comprehensive Benchmarking of YOLOv11 Architectures for Scalable and Granular Peripheral Blood Cell Detection
Biomechanical-phase based Temporal Segmentation in Sports Videos: a Demonstration on Javelin-Throw
FreeRet: MLLMs as Training-Free Retrievers
Can you SPLICE it together? A Human Curated Benchmark for Probing Visual Reasoning in VLMs
RIFLE: Removal of Image Flicker-Banding via Latent Diffusion Enhancement
Learning Object-Centric Representations Based on Slots in Real World Scenarios
VNODE: A Piecewise Continuous Volterra Neural Network
Classifier-Centric Adaptive Framework for Open-Vocabulary Camouflaged Object Segmentation
Traumatic Brain Injury Segmentation using an Ensemble of Encoder-decoder Models
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
Enhancing Physical Plausibility in Video Generation by Reasoning the Implausibility
IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?
Evaluation of Polarimetric Fusion for Semantic Segmentation in Aquatic Environments
Toward a Vision-Language Foundation Model for Medical Data: Multimodal Dataset and Benchmarks for Vietnamese PET/CT Report Generation
Collaborating Vision, Depth, and Thermal Signals for Multi-Modal Tracking: Dataset and Algorithm
ExGS: Extreme 3D Gaussian Compression with Diffusion Priors
VTPerception-R1: Enhancing Multimodal Reasoning via Explicit Visual and Textual Perceptual Grounding
SkyLink: Unifying Street-Satellite Geo-Localization via UAV-Mediated 3D Scene Alignment
LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning
Vision Function Layer in Multimodal LLMs
Causal-Adapter: Taming Text-to-Image Diffusion for Faithful Counterfactual Generation
TACO-Net: Topological Signatures Triumph in 3D Object Classification
UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections
Training-Free Token Pruning via Zeroth-Order Gradient Estimation in Vision-Language Models
PHASE-Net: Physics-Grounded Harmonic Attention System for Efficient Remote Photoplethysmography Measurement
ELPG-DTFS: Prior-Guided Adaptive Time-Frequency Graph Neural Network for EEG Depression Diagnosis
Vision At Night: Exploring Biologically Inspired Preprocessing For Improved Robustness Via Color And Contrast Transformations
StreamForest: Efficient Online Video Understanding with Persistent Event Memory
Environment-Aware Satellite Image Generation with Diffusion Models
ThermalGen: Style-Disentangled Flow-Based Generative Models for RGB-to-Thermal Image Translation
Vehicle Classification under Extreme Imbalance: A Comparative Study of Ensemble Learning and CNNs
MMRQA: Signal-Enhanced Multimodal Large Language Models for MRI Quality Assessment
VAGUEGAN: Stealthy Poisoning and Backdoor Attacks on Image Generative Pipelines
DWGS: Enhancing Sparse-View Gaussian Splatting with Hybrid-Loss Depth Estimation and Bidirectional Warping
DAM: Dual Active Learning with Multimodal Foundation Model for Source-Free Domain Adaptation
Accurate Cobb Angle Estimation via SVD-Based Curve Detection and Vertebral Wedging Quantification
Attention Surgery: An Efficient Recipe to Linearize Your Video Diffusion Transformer
OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing
Learning Goal-Oriented Language-Guided Navigation with Self-Improving Demonstrations at Scale
Segmentor-Guided Counterfactual Fine-Tuning for Image Synthesis
Scalable GANs with Transformers
Combining Discrepancy-Confusion Uncertainty and Calibration Diversity for Active Fine-Grained Image Classification
Tumor Synthesis conditioned on Radiomics
Simulating Post-Neoadjuvant Chemotherapy Breast Cancer MRI via Diffusion Model with Prompt Tuning
Talk in Pieces, See in Whole: Disentangling and Hierarchical Aggregating Representations for Language-based Object Detection
An Efficient 3D Latent Diffusion Model for T1-contrast Enhanced MRI Generation
UniVid: The Open-Source Unified Video Model
BALR-SAM: Boundary-Aware Low-Rank Adaptation of SAM for Resource-Efficient Medical Image Segmentation
Forge4D: Feed-Forward 4D Human Reconstruction and Interpolation from Uncalibrated Sparse-view Videos
Scalable Audio-Visual Masked Autoencoders for Efficient Affective Video Facial Analysis
EVLF-FM: Explainable Vision Language Foundation Model for Medicine
FreeAction: Training-Free Techniques for Enhanced Fidelity of Trajectory-to-Video Generation
Latent Visual Reasoning
When MLLMs Meet Compression Distortion: A Coding Paradigm Tailored to MLLMs
S$^2$NN: Sub-bit Spiking Neural Networks
Cycle Diffusion Model for Counterfactual Image Generation
Skeleton-based Robust Registration Framework for Corrupted 3D Point Clouds
Robust Partial 3D Point Cloud Registration via Confidence Estimation under Global Context
ASIA: Adaptive 3D Segmentation using Few Image Annotations
SVGThinker: Instruction-Aligned and Reasoning-Driven Text-to-SVG Generation
FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting
OMeGa: Joint Optimization of Explicit Meshes and Gaussian Splats for Robust Scene-Level Surface Reconstruction
Towards Foundation Models for Cryo-ET Subtomogram Analysis
Similarity-Aware Selective State-Space Modeling for Semantic Correspondence
TP-MVCC: Tri-plane Multi-view Fusion Model for Silkie Chicken Counting
Hyperspherical Latents Improve Continuous-Token Autoregressive Generation
Dynamic Orchestration of Multi-Agent System for Real-World Multi-Image Agricultural VQA
NeRV-Diffusion: Diffuse Implicit Neural Representations for Video Synthesis
An Enhanced Pyramid Feature Network Based on Long-Range Dependencies for Multi-Organ Medical Image Segmentation
DRIFT: Divergent Response in Filtered Transformations for Robust Adversarial Defense
UI-UG: A Unified MLLM for UI Understanding and Generation
Uni-X: Mitigating Modality Conflict with a Two-End-Separated Architecture for Unified Multimodal Models
Real-Aware Residual Model Merging for Deepfake Detection
From Satellite to Street: A Hybrid Framework Integrating Stable Diffusion and PanoGAN for Consistent Cross-View Synthesis
DINOReg: Strong Point Cloud Registration with Vision Foundation Model
Mask Clustering-based Annotation Engine for Large-Scale Submeter Land Cover Mapping
REALIGN: Regularized Procedure Alignment with Matching Video Embeddings via Partial Gromov-Wasserstein Optimal Transport
Vid-LLM: A Compact Video-based 3D Multimodal LLM with Reconstruction-Reasoning Synergy
PCICF: A Pedestrian Crossing Identification and Classification Framework
RapidMV: Leveraging Spatio-Angular Representations for Efficient and Consistent Text-to-Multi-View Synthesis
CLQ: Cross-Layer Guided Orthogonal-based Quantization for Diffusion Transformers
A Data-Centric Perspective on the Influence of Image Data Quality in Machine Learning Models
Proxy-GS: Efficient 3D Gaussian Splatting via Proxy Mesh
Rethinking Unsupervised Cross-modal Flow Estimation: Learning from Decoupled Optimization and Consistency Constraint
UI2V-Bench: An Understanding-based Image-to-video Generation Benchmark
NeoWorld: Neural Simulation of Explorable Virtual Worlds via Progressive 3D Unfolding
Beyond Isolated Facts: Synthesizing Narrative and Grounded Supervision for VideoQA
Generalist Multi-Class Anomaly Detection via Distillation to Two Heterogeneous Student Networks
LaMoGen: Laban Movement-Guided Diffusion for Text-to-Motion Generation
Euclid's Gift: Enhancing Spatial Perception and Reasoning in Vision-Language Models via Geometric Surrogate Tasks
Performance-Efficiency Trade-off for Fashion Image Retrieval
Mitigating Visual Hallucinations via Semantic Curriculum Preference Optimization in MLLMs
Robust Multimodal Semantic Segmentation with Balanced Modality Contributions
Instruction Guided Multi Object Image Editing with Quantity and Layout Consistency
CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map Models
CORE-3D: Context-aware Open-vocabulary Retrieval by Embeddings in 3D
Diffusion Bridge or Flow Matching? A Unifying Framework and Comparative Analysis
Foggy Crowd Counting: Combining Physical Priors and KAN-Graph
NeMo: Needle in a Montage for Video-Language Understanding
Assessing Visual Privacy Risks in Multimodal AI: A Novel Taxonomy-Grounded Evaluation of Vision-Language Models
Uni4D-LLM: A Unified SpatioTemporal-Aware VLM for 4D Understanding and Generation
2nd Place Report of MOSEv2 Challenge 2025: Concept Guided Video Object Segmentation via SeC
Towards Fine-Grained Text-to-3D Quality Assessment: A Benchmark and A Two-Stage Rank-Learning Metric
CE-FAM: Concept-Based Explanation via Fusion of Activation Maps
FairViT-GAN: A Hybrid Vision Transformer with Adversarial Debiasing for Fair and Explainable Facial Beauty Prediction
Sim-DETR: Unlock DETR for Temporal Sentence Grounding
Not All Tokens are Guided Equal: Improving Guidance in Visual Autoregressive Models
PCRI: Measuring Context Robustness in Multimodal Models for Enterprise Applications
Learning Adaptive Pseudo-Label Selection for Semi-Supervised 3D Object Detection
Tunable-Generalization Diffusion Powered by Self-Supervised Contextual Sub-Data for Low-Dose CT Reconstruction
AssemblyHands-X: Modeling 3D Hand-Body Coordination for Understanding Bimanual Human Activities
LifeCLEF Plant Identification Task 2015
Preserving Cross-Modal Stability for Visual Unlearning in Multimodal Scenarios
Q-FSRU: Quantum-Augmented Frequency-Spectral For Medical Visual Question Answering
LifeCLEF Plant Identification Task 2014
EWC-Guided Diffusion Replay for Exemplar-Free Continual Learning in Medical Imaging
Adversarial Versus Federated: An Adversarial Learning based Multi-Modality Cross-Domain Federated Medical Segmentation
EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
MoReact: Generating Reactive Motion from Textual Descriptions
Revisit the Imbalance Optimization in Multi-task Learning: An Experimental Analysis
Bridging the Task Gap: Multi-Task Adversarial Transferability in CLIP and Its Derivatives
Token Painter: Training-Free Text-Guided Image Inpainting via Mask Autoregressive Models
DriveE2E: Closed-Loop Benchmark for End-to-End Autonomous Driving through Real-to-Simulation
Learning Encoding-Decoding Direction Pairs to Unveil Concepts of Influence in Deep Vision Networks
SAR-KnowLIP: Towards Multimodal Foundation Models for Remote Sensing
AutoPrune: Each Complexity Deserves a Pruning Policy
CrashSplat: 2D to 3D Vehicle Damage Segmentation in Gaussian Splatting
HunyuanImage 3.0 Technical Report
ColLab: A Collaborative Spatial Progressive Data Engine for Referring Expression Comprehension and Generation
Reinforcement Learning with Inverse Rewards for World Model Post-training
A Novel Hybrid Deep Learning and Chaotic Dynamics Approach for Thyroid Cancer Classification
VFSI: Validity First Spatial Intelligence for Constraint-Guided Traffic Diffusion
Towards Redundancy Reduction in Diffusion Models for Efficient Video Super-Resolution
RPG360: Robust 360 Depth Estimation with Perspective Foundation Models and Graph Optimization
Advancing Multi-agent Traffic Simulation via R1-Style Reinforcement Fine-Tuning
TREAT-Net: Tabular-Referenced Echocardiography Analysis for Acute Coronary Syndrome Treatment Prediction
Gaze Estimation for Human-Robot Interaction: Analysis Using the NICO Platform
SIE3D: Single-image Expressive 3D Avatar generation via Semantic Embedding and Perceptual Expression Loss
FrameMind: Frame-Interleaved Chain-of-Thought for Video Reasoning via Reinforcement Learning
Generalized Category Discovery in Hyperspectral Images via Prototype Subspace Modeling
Hazy Pedestrian Trajectory Prediction via Physical Priors and Graph-Mamba
$\mathbf{R}^3$: Reconstruction, Raw, and Rain: Deraining Directly in the Bayer Domain
Joint Superpixel and Self-Representation Learning for Scalable Hyperspectral Image Clustering
A Second-Order Perspective on Pruning at Initialization and Knowledge Transfer
Uncovering Grounding IDs: How External Cues Shape Multi-Modal Binding
Autoregressive Video Generation beyond Next Frames Prediction
Unified Multi-Modal Interactive & Reactive 3D Motion Generation via Rectified Flow
SVAC: Scaling Is All You Need For Referring Video Object Segmentation
GANji: A Framework for Introductory AI Image Generation
Generalist Scanner Meets Specialist Locator: A Synergistic Coarse-to-Fine Framework for Robust GUI Grounding
EYE-DEX: Eye Disease Detection and EXplanation System
Analysis of Bias in Deep Learning Facial Beauty Regressors
Asymmetric VAE for One-Step Video Super-Resolution Acceleration
Accelerating Cerebral Diagnostics with BrainFusion: A Comprehensive MRI Tumor Framework
LatXGen: Towards Radiation-Free and Accurate Quantitative Analysis of Sagittal Spinal Alignment Via Cross-Modal Radiographic View Synthesis
High-Order Progressive Trajectory Matching for Medical Image Dataset Distillation
Evaluating point-light biological motion in multimodal large language models
Imaging-Based Mortality Prediction in Patients with Systemic Sclerosis
Calibrated and Resource-Aware Super-Resolution for Reliable Driver Behavior Analysis
OVSeg3R: Learn Open-vocabulary Instance Segmentation from 2D via 3D Reconstruction
From Fields to Splats: A Cross-Domain Survey of Real-Time Neural Scene Representations
Pancreas Part Segmentation under Federated Learning Paradigm
Towards Interpretable Visual Decoding with Attention to Brain Representations
RobuQ: Pushing DiTs to W1.58A2 via Robust Activation Quantization
VividFace: High-Quality and Efficient One-Step Diffusion For Video Face Enhancement
Multi-Level Heterogeneous Knowledge Transfer Network on Forward Scattering Center Model for Limited Samples SAR ATR
VAMamba: An Efficient Visual Adaptive Mamba for Image Restoration
Deep Taxonomic Networks for Unsupervised Hierarchical Prototype Discovery
MAN: Latent Diffusion Enhanced Multistage Anti-Noise Network for Efficient and High-Quality Low-Dose CT Image Denoising
VMDiff: Visual Mixing Diffusion for Limitless Cross-Object Synthesis
FlowLUT: Efficient Image Enhancement via Differentiable LUTs and Iterative Flow Matching
InteractMove: Text-Controlled Human-Object Interaction Generation in 3D Scenes with Movable Objects
BioVessel-Net and RetinaMix: Unsupervised Retinal Vessel Segmentation from OCTA Images
DiffInk: Glyph- and Style-Aware Latent Diffusion Transformer for Text to Online Handwriting Generation
RIV: Recursive Introspection Mask Diffusion Vision Language Model
Efficient Domain-Adaptive Multi-Task Dense Prediction with Vision Foundation Models
MotionVerse: A Unified Multimodal Framework for Motion Comprehension, Generation and Editing
LightFair: Towards an Efficient Alternative for Fair T2I Diffusion via Debiasing Pre-trained Text Encoders
EfficientMIL: Efficient Linear-Complexity MIL Method for WSI Classification
From Static to Dynamic: a Survey of Topology-Aware Perception in Autonomous Driving
Griffin: Generative Reference and Layout Guided Image Composition
Sparse-Up: Learnable Sparse Upsampling for 3D Generation with High-Fidelity Textures
Color-Pair Guided Robust Zero-Shot 6D Pose Estimation and Tracking of Cluttered Objects on Edge Devices
ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training
HIVTP: A Training-Free Method to Improve VLMs Efficiency via Hierarchical Visual Token Pruning Using Middle-Layer-Based Importance Score
Token Merging via Spatiotemporal Information Mining for Surgical Video Understanding
RCI: A Score for Evaluating Global and Local Reasoning in Multimodal Benchmarks
MSD-KMamba: Bidirectional Spatial-Aware Multi-Modal 3D Brain Segmentation via Multi-scale Self-Distilled Fusion Strategy
QuantSparse: Comprehensively Compressing Video Diffusion Transformer with Model Quantization and Attention Sparsification
HomeSafeBench: A Benchmark for Embodied Vision-Language Models in Free-Exploration Home Safety Inspection
Confidence Aware SSD Ensemble with Weighted Boxes Fusion for Weapon Detection
INSTINCT: Instance-Level Interaction Architecture for Query-Based Collaborative Perception
CrimEdit: Controllable Editing for Counterfactual Object Removal, Insertion, and Movement
PD-Diag-Net: Clinical-Priors guided Network on Brain MRI for Auxiliary Diagnosis of Parkinson's Disease
DiffPCN: Latent Diffusion Model Based on Multi-view Depth Images for Point Cloud Completion
Video Panels for Long Video Understanding
M3DLayout: A Multi-Source Dataset of 3D Indoor Layouts and Structured Descriptions for 3D Generation
LUQ: Layerwise Ultra-Low Bit Quantization for Multimodal Large Language Models
FastViDAR: Real-Time Omnidirectional Depth Estimation via Alternative Hierarchical Attention
HieraTok: Multi-Scale Visual Tokenizer Improves Image Reconstruction and Generation
GRS-SLAM3R: Real-Time Dense SLAM with Gated Recurrent State
ResAD++: Towards Class Agnostic Anomaly Detection via Residual Feature Learning
Poivre: Self-Refining Visual Pointing with Reinforcement Learning
PVTAdpNet: Polyp Segmentation using Pyramid vision transformer with a novel Adapter block
UniAlignment: Semantic Alignment for Unified Image Generation, Understanding, Manipulation and Perception
GenView++: Unifying Adaptive View Generation and Quality-Driven Supervision for Contrastive Representation Learning
A Modality-Tailored Graph Modeling Framework for Urban Region Representation via Contrastive Learning
Texture Vector-Quantization and Reconstruction Aware Prediction for Generative Super-Resolution
GroupCoOp: Group-robust Fine-tuning via Group Prompt Learning
From Unstable to Playable: Stabilizing Angry Birds Levels via Object Segmentation
Controllable Generation of Large-Scale 3D Urban Layouts with Semantic and Structural Guidance
A Multi-Camera Vision-Based Approach for Fine-Grained Assembly Quality Control
Real-World Transferable Adversarial Attack on Face-Recognition Systems
UltraUNet: Real-Time Ultrasound Tongue Segmentation for Diverse Linguistic and Imaging Conditions
Patch Rebirth: Toward Fast and Transferable Model Inversion of Vision Transformers
Self-Consistency as a Free Lunch: Reducing Hallucinations in Vision-Language Models via Self-Reflection
TATTOO: Training-free AesTheTic-aware Outfit recOmmendation
Increasing the Diversity in RGB-to-Thermal Image Translation for Automotive Applications
LiDAR-based Human Activity Recognition through Laplacian Spectral Analysis
OracleGS: Grounding Generative Priors for Sparse-View Gaussian Splatting
Learning Regional Monsoon Patterns with a Multimodal Attention U-Net
SynDoc: A Hybrid Discriminative-Generative Framework for Enhancing Synthetic Domain-Adaptive Document Key Information Extraction
Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing
Seeing Through the Blur: Unlocking Defocus Maps for Deepfake Detection
Seeing the Unseen in Low-light Spike Streams
Balanced Diffusion-Guided Fusion for Multimodal Remote Sensing Classification
Seeing Symbols, Missing Cultures: Probing Vision-Language Models' Reasoning on Fire Imagery and Cultural Meaning
C3-OWD: A Curriculum Cross-modal Contrastive Learning Framework for Open-World Detection
Spatial-Spectral Binarized Neural Network for Panchromatic and Multi-spectral Images Fusion
Decoupling Reasoning and Perception: An LLM-LMM Framework for Faithful Visual Reasoning
DDP: Dual-Decoupled Prompting for Multi-Label Class-Incremental Learning
LRPO: Enhancing Blind Face Restoration through Online Reinforcement Learning
DentVLM: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice
Dynamic-TreeRPO: Breaking the Independent Trajectory Bottleneck with Structured Sampling
Test-time Uncertainty Estimation for Medical Image Registration via Transformation Equivariance
GRAPE: Let GPRO Supervise Query Rewriting by Ranking for Retrieval
CasPoinTr: Point Cloud Completion with Cascaded Networks and Knowledge Distillation
UniPose: Unified Cross-modality Pose Prior Propagation towards RGB-D data for Weakly Supervised 3D Human Pose Estimation
Generative Modeling of Shape-Dependent Self-Contact Human Poses
WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving
Enhanced Fracture Diagnosis Based on Critical Regional and Scale Aware in YOLO
FracDetNet: Advanced Fracture Detection via Dual-Focus Attention and Multi-scale Calibration in Medical X-ray Imaging
SPIKE-RL: Video-LLMs meet Bayesian Surprise
FM-SIREN & FM-FINER: Nyquist-Informed Frequency Multiplier for Implicit Neural Representation with Periodic Activation
FoR-SALE: Frame of Reference-guided Spatial Adjustment in LLM-based Diffusion Editing
3DPCNet: Pose Canonicalization for Robust Viewpoint-Invariant 3D Kinematic Analysis from Monocular RGB cameras
No Concept Left Behind: Test-Time Optimization for Compositional Text-to-Image Generation
Robust Multi-Modal Face Anti-Spoofing with Domain Adaptation: Tackling Missing Modalities, Noisy Pseudo-Labels, and Model Degradation
RestoRect: Degraded Image Restoration via Latent Rectified Flow & Feature Distillation
Orientation-anchored Hyper-Gaussian for 4D Reconstruction from Casual Videos
Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional
Enhancing Polyp Segmentation via Encoder Attention and Dynamic Kernel Update
Mask What Matters: Controllable Text-Guided Masking for Self-Supervised Medical Image Analysis
FMC-DETR: Frequency-Decoupled Multi-Domain Coordination for Aerial-View Object Detection
Follow-Your-Preference: Towards Preference-Aligned Image Inpainting
Streamline pathology foundation model by cross-magnification distillation
CoPatch: Zero-Shot Referring Image Segmentation by Leveraging Untapped Spatial Knowledge in CLIP
Deep Learning for Oral Health: Benchmarking ViT, DeiT, BEiT, ConvNeXt, and Swin Transformer
HTMA-Net: Towards Multiplication-Avoiding Neural Networks via Hadamard Transform and In-Memory Computing
Towards Comprehensive Interactive Change Understanding in Remote Sensing: A Large-scale Dataset and Dual-granularity Enhanced VLM
Stochastic Interpolants via Conditional Dependent Coupling
Benchmarking DINOv3 for Multi-Task Stroke Analysis on Non-Contrast CT
Earth-Agent: Unlocking the Full Landscape of Earth Observation with Agents
WeatherCycle: Unpaired Multi-Weather Restoration via Color Space Decoupled Cycle Learning
Sparse2Dense: A Keypoint-driven Generative Framework for Human Video Compression and Vertex Prediction
TRAX: TRacking Axles for Accurate Axle Count Estimation
Confidence-Calibrating Regularization for Robust Brain MRI Segmentation Under Domain Shift
Unsupervised Online 3D Instance Segmentation with Synthetic Sequences and Dynamic Loss
Pathological Truth Bias in Vision-Language Models
Scale and Rotation Estimation of Similarity-Transformed Images via Cross-Correlation Maximization Based on Auxiliary Function Method
Robust Object Detection for Autonomous Driving via Curriculum-Guided Group Relative Policy Optimization
Graph-Theoretic Consistency for Robust and Topology-Aware Semi-Supervised Histopathology Segmentation
A review of Recent Techniques for Person Re-Identification
Sequential Token Merging: Revisiting Hidden States
Deep Learning Empowered Super-Resolution: A Comprehensive Survey and Future Prospects
Learning Hyperspectral Images with Curated Text Prompts for Efficient Multimodal Alignment
Global Prompt Refinement with Non-Interfering Attention Masking for One-Shot Federated Learning
GZSL-MoE: Apprentissage Généralisé Zéro-Shot basé sur le Mélange d'Experts pour la Segmentation Sémantique de Nuages de Points 3D Appliqué à un Jeu de Données d'Environnement de Collaboration Humain-Robot
IBiT: Utilizing Inductive Biases to Create a More Data Efficient Attention Mechanism
LayoutAgent: A Vision-Language Agent Guided Compositional Diffusion for Spatial Layout Planning
CompareBench: A Benchmark for Visual Comparison Reasoning in Vision-Language Models
MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning
UESA-Net: U-Shaped Embedded Multidirectional Shrinkage Attention Network for Ultrasound Nodule Segmentation
PartCo: Part-Level Correspondence Priors Enhance Category Discovery
DEFT: Decompositional Efficient Fine-Tuning for Text-to-Image Models
VideoScore2: Think before You Score in Generative Video Evaluation
TRUST: Test-Time Refinement using Uncertainty-Guided SSM Traverses
MMPB: It's Time for Multi-Modal Personalization
Seeing Isn't Believing: Context-Aware Adversarial Patch Synthesis via Conditional GAN
Learning Temporal Saliency for Time Series Forecasting with Cross-Scale Attention
Multimodal Slice Interaction Network Enhanced by Transfer Learning for Precise Segmentation of Internal Gross Tumor Volume in Lung Cancer PET/CT Imaging
ControlEvents: Controllable Synthesis of Event Camera Data with Foundational Prior from Image Diffusion Models
Learning KAN-based Implicit Neural Representations for Deformable Image Registration
Convolutional Set Transformer
TY-RIST: Tactical YOLO Tricks for Real-time Infrared Small Target Detection
Learning Unified Representation of 3D Gaussian Splatting
Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings
FishAI 2.0: Marine Fish Image Classification with Multi-modal Few-shot Learning
Brain Tumor Classification from MRI Scans via Transfer Learning and Enhanced Feature Representation
Hemorica: A Comprehensive CT Scan Dataset for Automated Brain Hemorrhage Classification, Segmentation, and Detection
ARSS: Taming Decoder-only Autoregressive Visual Generation for View Synthesis From Single View
Disentangling Static and Dynamic Information for Reducing Static Bias in Action Recognition
Desensitizing for Improving Corruption Robustness in Point Cloud Classification through Adversarial Training
Geometry-Aware Losses for Structure-Preserving Text-to-Sign Language Generation
Planning with Unified Multimodal Models
Copyright Infringement Detection in Text-to-Image Diffusion Models via Differential Privacy
Perceptual Influence: Improving the Perceptual Loss Design for Low-Dose CT Enhancement
Sensor-Adaptive Flood Mapping with Pre-trained Multi-Modal Transformers across SAR and Multispectral Modalities
GeLoc3r: Enhancing Relative Camera Pose Regression with Geometric Consistency Regularization
MMeViT: Multi-Modal ensemble ViT for Post-Stroke Rehabilitation Action Recognition
Activation Matching for Explanation Generation
InfoDet: A Dataset for Infographic Element Detection
On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning
Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models
Reward Model Overoptimisation in Iterated RLHF
TabularGSM: Understanding the Limitations of LLMs in Tabular Math Reasoning
HS-STaR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation
Cross-modal RAG: Sub-dimensional Text-to-Image Retrieval-Augmented Generation
ProxyThinker: Test-Time Guidance through Small Visual Reasoners
Comba: Improving Bilinear RNNs with Closed-loop Control
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
InstructPro: Natural Language Guided Ligand-Binding Protein Design
One Patient, Many Contexts: Scaling Medical AI with Contextual Intelligence
Beyond Jailbreaking: Auditing Contextual Privacy in LLM Agents
Discrete Audio Tokens: More Than a Survey!
Enhancing Delta Compression in LLMs via SVD-based Quantization Error Minimization
OmniGen2: Exploration to Advanced Multimodal Generation
Overcoming Long-Context Limitations of State-Space Models via Context-Dependent Sparse Attention
One Token to Fool LLM-as-a-Judge
MetaLint: Generalizable Idiomatic Code Quality Analysis through Instruction-Following and Easy-to-Hard Generalization
Probabilistic Soundness Guarantees in LLM Reasoning Chains
A Markov Categorical Framework for Language Modeling
Can Language Models Discover Scaling Laws?
CADDesigner: Conceptual Design of CAD Models Based on General-Purpose Agent
Trainable Dynamic Mask Sparse Attention
AttriLens-Mol: Attribute Guided Reinforcement Learning for Molecular Property Prediction with Large Language Models
Attention Layers Add Into Low-Dimensional Residual Subspaces
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
Hierarchical Task Environments as the Next Frontier for Embodied World Models in Robot Soccer
FuseCodec: Semantic-Contextual Fusion and Supervision for Neural Codecs
TDRM: Smooth Reward Models with Temporal Difference for LLM RL and Inference
Patterns in the Transition From Founder-Leadership to Community Governance of Open Source
PiERN: Token-Level Routing for Integrating High-Precision Computation and Reasoning
The Thinking Spectrum: An Empirical Study of Tunable Reasoning in LLMs through Model Merging
Reasoning Isn't Enough: Examining Truth-Bias and Sycophancy in LLMs
Don't Trust Generative Agents to Mimic Communication on Social Networks Unless You Benchmarked their Empirical Realism
Semantic-guided Diverse Decoding for Large Language Model
Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer
PRIME: Large Language Model Personalization with Cognitive Dual-Memory and Personalized Thought Process
CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering
ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation
Entropy-Memorization Law: Evaluating Memorization Difficulty of Data in LLMs
Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition
Making Language Model a Hierarchical Classifier
LionGuard 2: Building Lightweight, Data-Efficient & Localised Multilingual Content Moderators
The Ever-Evolving Science Exam
CTTS: Collective Test-Time Scaling
Discerning minds or generic tutors? Evaluating instructional guidance capabilities in Socratic LLMs
Coarse-to-Fine Personalized LLM Impressions for Streamlined Radiology Reports
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
CORE-RAG: Lossless Compression for Retrieval-Augmented LLMs via Reinforcement Learning
Automatic Question & Answer Generation Using Generative Large Language Model (LLM)
When Thinking Backfires: Mechanistic Insights Into Reasoning-Induced Misalignment
Causal Attention with Lookahead Keys
ATTS: Asynchronous Test-Time Scaling via Conformal Prediction
Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning
Agentic Reinforcement Learning with Implicit Step Rewards
Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration
Position: Towards Bidirectional Human-AI Alignment
A Voter-Based Stochastic Rejection-Method Framework for Asymptotically Safe Language Model Outputs
NextLocLLM: Location Semantics Modeling and Coordinate-Based Next Location Prediction with LLMs
Similarity-Dissimilarity Loss for Multi-label Supervised Contrastive Learning
CoT-TL: Low-Resource Temporal Knowledge Representation of Planning Instructions Using Chain-of-Thought Reasoning
A Neurosymbolic Fast and Slow Architecture for Graph Coloring
Sometimes I am a Tree: Data Drives Unstable Hierarchical Generalization
Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration
Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval
Mind the Value-Action Gap: Do LLMs Act in Alignment with Their Values?
vCache: Verified Semantic Prompt Caching
SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors
Mitigating Barren Plateaus in Quantum Neural Networks via an AI-Driven Submartingale-Based Framework
Reasoning to Learn from Latent Thoughts
MaintainCoder: Maintainable Code Generation Under Dynamic Requirements
Do Larger Language Models Generalize Better? A Scaling Law for Implicit Reasoning at Pretraining Time
Visual Planning: Let's Think Only with Images
Signal in the Noise: Polysemantic Interference Transfers and Predicts Cross-Model Influence
AgentThink: A Unified Framework for Tool-Augmented Chain-of-Thought Reasoning in Vision-Language Models for Autonomous Driving
Better Safe Than Sorry? Overreaction Problem of Vision Language Models in Visual Emergency Recognition
OViP: Online Vision-Language Preference Learning for VLM Hallucination
AdaSTaR: Adaptive Data Sampling for Training Self-Taught Reasoners
DEL-ToM: Inference-Time Scaling for Theory-of-Mind Reasoning via Dynamic Epistemic Logic
LLMs Are In-Context Bandit Reinforcement Learners
AERA Chat: An Interactive Platform for Automated Explainable Student Answer Assessment
DM-Codec: Distilling Multimodal Representations for Speech Tokenization
When Speculation Spills Secrets: Side Channels via Speculative Decoding In LLMs
Adapting Chat Language Models Using Only Target Unlabeled Language Data
LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation
A Partition Cover Approach to Tokenization
A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models
ESGSenticNet: A Neurosymbolic Knowledge Base for Corporate Sustainability Analysis
Beyond checkmate: exploring the creative chokepoints in AI text
Which Words Matter Most in Zero-Shot Prompts?
UltraIF: Advancing Instruction Following from the Wild
Group-Adaptive Threshold Optimization for Robust AI-Generated Text Detection
Confidence Improves Self-Consistency in LLMs
PAFT: Prompt-Agnostic Fine-Tuning
B-cos LM: Efficiently Transforming Pre-trained Language Models for Improved Explainability
PropXplain: Can LLMs Enable Explainable Propaganda Detection?
MemeIntel: Explainable Detection of Propagandistic and Hateful Memes
Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time
How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation
Adaptive Group Policy Optimization: Towards Stable Training and Token-Efficient Reasoning
SUV: Scalable Large Language Model Copyright Compliance with Regularized Selective Unlearning
XL-Suite: Cross-Lingual Synthetic Training and Evaluation Data for Open-Ended Generation
SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching
AnesSuite: A Comprehensive Benchmark and Dataset Suite for Anesthesiology Reasoning in LLMs
A Practical Synthesis of Detecting AI-Generated Textual, Visual, and Audio Content
Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs
DataPuzzle: Breaking Free from the Hallucinated Promise of LLMs in Data Analysis
Efficient Reasoning Models: A Survey
IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property
Dynamic Early Exit in Reasoning Models
TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation
Cooking Up Creativity: Enhancing LLM Creativity through Structured Recombination
New News: System-2 Fine-tuning for Robust Integration of New Knowledge
References Indeed Matter? Reference-Free Preference Optimization for Conversational Query Reformulation
OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit
VeriFact: Enhancing Long-Form Factuality Evaluation with Refined Fact Extraction and Reference Facts
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
The Counting Power of Transformers
Critique-Guided Distillation for Efficient and Robust Language Model Reasoning
AdaBoN: Adaptive Best-of-N Alignment
MobileIPL: Enhancing Mobile Agents Thinking Process via Iterative Preference Learning
Automatically Advancing LLM Expertise in Technology Judgment
Is Active Persona Inference Necessary for Aligning Small Models to Personal Preferences?
Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
Mechanistic Fine-tuning for In-context Learning
Internal Chain-of-Thought: Empirical Evidence for Layer-wise Subtask Scheduling in LLMs
Language Models Optimized to Fool Detectors Still Have a Distinct Style (And How to Change It)
ReflAct: World-Grounded Decision Making in LLM Agents via Goal-State Reflection
Multilingual Prompting for Improving LLM Generation Diversity
Generalizable Process Reward Models via Formally Verified Training Data
Leveraging Online Data to Enhance Medical Knowledge in a Small Persian Language Model
Align-GRAG: Reasoning-Guided Dual Alignment for Graph Retrieval-Augmented Generation
ToDi: Token-wise Distillation via Fine-Grained Divergence Control
Nested Named Entity Recognition as Single-Pass Sequence Labeling
A Survey on Stereotype Detection in Natural Language Processing
BRIT: Bidirectional Retrieval over Unified Image-Text Graph
MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems
A Necessary Step toward Faithfulness: Measuring and Improving Consistency in Free-Text Explanations
From Single to Multi-Granularity: Toward Long-Term Memory Association and Selection of Conversational Agents
TrojanStego: Your Language Model Can Secretly Be A Steganographic Privacy Leaking Agent
Long Context Scaling: Divide and Conquer via Multi-Agent Question-driven Collaboration
SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences
Evaluating and Steering Modality Preferences in Multimodal Large Language Model
Pearl: A Multimodal Culturally-Aware Arabic Instruction Dataset
Semi-structured LLM Reasoners Can Be Rigorously Audited
Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs
Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation
Answer Convergence as a Signal for Early Stopping in Reasoning
Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs
Improving LLM Reasoning through Interpretable Role-Playing Steering
What Do Indonesians Really Need from Language Technology? A Nationwide Survey
AdversariaL attacK sAfety aLIgnment(ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement- Introducing Adversarial Vulnerability Quality Index (AVQI)
Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking
Curriculum-Guided Layer Scaling for Language Model Pretraining
Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index
BOW: Reinforcement Learning for Bottlenecked Next Word Prediction
Long-Context Generalization with Sparse Attention
GRAF: Multi-turn Jailbreaking via Global Refinement and Active Fabrication
Beyond Isolated Facts: Synthesizing Narrative and Grounded Supervision for VideoQA
Euclid's Gift: Enhancing Spatial Perception and Reasoning in Vision-Language Models via Geometric Surrogate Tasks
Experience-guided reflective co-evolution of prompts and heuristics for automatic algorithm design
LEAF: A Robust Expert-Based Framework for Few-Shot Continual Event Detection
NeMo: Needle in a Montage for Video-Language Understanding
OrthAlign: Orthogonal Subspace Decomposition for Non-Interfering Multi-Objective Alignment
On the Self-awareness of Large Reasoning Models' Capability Boundaries
VSSFlow: Unifying Video-conditioned Sound and Speech Generation via Joint Learning
Pushing LLMs to Their Logical Reasoning Bound: The Role of Data Reasoning Intensity
Retro*: Optimizing LLMs for Reasoning-Intensive Document Retrieval
MMRQA: Signal-Enhanced Multimodal Large Language Models for MRI Quality Assessment
Neural network embeddings recover value dimensions from psychometric survey items on par with human data
MASLegalBench: Benchmarking Multi-Agent Systems in Deductive Legal Reasoning
When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training
DiffTester: Accelerating Unit Test Generation for Diffusion LLMs via Repetitive Pattern
Learning from Convenience Samples: A Case Study on Fine-Tuning LLMs for Survey Non-response in the German Longitudinal Election Study
Scaling with Collapse: Efficient and Predictable Training of LLM Families
ORPO-Distill: Mixed-Policy Preference Optimization for Cross-Architecture LLM Distillation
From $f(x)$ and $g(x)$ to $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones
MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech
Rethinking Entropy Regularization in Large Reasoning Models
The Era of Real-World Human Interaction: RL from User Conversations
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
TemMed-Bench: Evaluating Temporal Medical Image Reasoning in Vision-Language Models
GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts
SIRI: Scaling Iterative Reinforcement Learning with Interleaved Compression
WordAlchemy: A transformer-based Reverse Dictionary
Continual Dialogue State Tracking via Example-Guided Question Answering
CGELBank Annotation Manual v1.2
Machines Do See Color: A Guideline to Classify Different Forms of Racist Discourse in Large Corpora
Enhancing Textual Personality Detection toward Social Media: Integrating Long-term and Short-term Perspectives
Multi-Head RAG: Solving Multi-Aspect Problems with LLMs
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
Sheaf Discovery with Joint Computation Graph Pruning and Flexible Granularity
CiteFusion: An Ensemble Framework for Citation Intent Classification Harnessing Dual-Model Binary Couples and SHAP Analyses
LLM-3D Print: Large Language Models To Monitor and Control 3D Printing
Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization
Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking
SPIKE-RL: Video-LLMs meet Bayesian Surprise
FoR-SALE: Frame of Reference-guided Spatial Adjustment in LLM-based Diffusion Editing
MaskSQL: Safeguarding Privacy for LLM-Based Text-to-SQL via Abstraction
Temporal Generalization: A Reality Check
Mapping Overlaps in Benchmarks through Perplexity in the Wild
Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional
Clean First, Align Later: Benchmarking Preference Data Cleaning for Reliable LLM Alignment
RIV: Recursive Introspection Mask Diffusion Vision Language Model
From Past To Path: Masked History Learning for Next-Item Prediction in Generative Recommendation
RCI: A Score for Evaluating Global and Local Reasoning in Multimodal Benchmarks
From Reasoning to Answer: Empirical, Attention-Based and Mechanistic Insights into Distilled DeepSeek R1 Models
Towards a Comprehensive Scaling Law of Mixture-of-Experts
HomeSafeBench: A Benchmark for Embodied Vision-Language Models in Free-Exploration Home Safety Inspection
SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents
Beyond Game Theory Optimal: Profit-Maximizing Poker Agents for No-Limit Holdem
Anchored Supervised Fine-Tuning
From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning
Knowledge Homophily in Large Language Models
Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR
PCRI: Measuring Context Robustness in Multimodal Models for Enterprise Applications
Dynamic Orthogonal Continual Fine-tuning for Mitigating Catastrophic Forgettings
Beyond Benchmarks: Understanding Mixture-of-Experts Models through Internal Mechanisms
Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm
Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models
Detecting and Rectifying Noisy Labels: A Similarity-based Approach
The Role of Logic and Automata in Understanding Transformers
Do Repetitions Matter? Strengthening Reliability in LLM Evaluations
Generalist Scanner Meets Specialist Locator: A Synergistic Coarse-to-Fine Framework for Robust GUI Grounding
Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
Metamorphic Testing for Audio Content Moderation Software
Learning to Ponder: Adaptive Reasoning in Latent Space
SpecExit: Accelerating Large Reasoning Model via Speculative Exit
Latent Visual Reasoning
Extracting the Structure of Press Releases for Predicting Earnings Announcement Returns
PAME-AI: Patient Messaging Creation and Optimization using Agentic AI
AdvChain: Adversarial Chain-of-Thought Tuning for Robust Safety Alignment of Large Reasoning Models
Overview of SCIDOCA 2025 Shared Task on Citation Prediction, Discovery, and Placement
SCI-Verifier: Scientific Verifier with Thinking
Bridging the behavior-neural gap: A multimodal AI reveals the brain's geometry of emotion more accurately than human self-reports
MAS$^2$: Self-Generative, Self-Configuring, Self-Rectifying Multi-Agent Systems
Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention
Beyond Repetition: Text Simplification and Curriculum Learning for Data-Constrained Pretraining
Reinforcement Mid-Training
HarmMetric Eval: Benchmarking Metrics and Judges for LLM Harmfulness Assessment
LLaDA-MoE: A Sparse MoE Diffusion Language Model
Agentar-Scale-SQL: Advancing Text-to-SQL through Orchestrated Test-Time Scaling
Multilingual Text-to-SQL: Benchmarking the Limits of Language Models with Collaborative Language Agents
CDT: A Comprehensive Capability Framework for Large Language Models Across Cognition, Domain, and Task
Alternatives To Next Token Prediction In Text Generation - A Survey
Bias Mitigation or Cultural Commonsense? Evaluating LLMs with a Japanese Dataset
A Text-To-Text Alignment Algorithm for Better Evaluation of Modern Speech Recognition Systems
Sanitize Your Responses: Mitigating Privacy Leakage in Large Language Models
GRPO-MA: Multi-Answer Generation in GRPO for Stable and Efficient Chain-of-Thought Training
Knowledge Editing with Subspace-Aware Key-Value Mappings
Building Benchmarks from the Ground Up: Community-Centered Evaluation of LLMs in Healthcare Chatbot Settings
AdaThink-Med: Medical Adaptive Thinking with Uncertainty-Guided Length Calibration
Inducing Dyslexia in Vision Language Models
HiKE: Hierarchical Evaluation Framework for Korean-English Code-Switching Speech Recognition
Hype or not? Formalizing Automatic Promotional Language Detection in Biomedical Research
InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation
Understanding the Dilemma of Unlearning for Large Language Models
Reference-Free Rating of LLM Responses via Latent Information
MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
Socratic-Zero: Bootstrapping Reasoning via Data-Free Agent Co-evolution
ProxyAttn: Guided Sparse Attention via Representative Heads
LatentEvolve: Self-Evolving Test-Time Scaling in Latent Space
SeaPO: Strategic Error Amplification for Robust Preference Optimization of Large Language Models
Evaluating Spatiotemporal Consistency in Automatically Generated Sewing Instructions
KnowGuard: Knowledge-Driven Abstention for Multi-Round Clinical Reasoning
DiaCDM: Cognitive Diagnosis in Teacher-Student Dialogues using the Initiation-Response-Evaluation Framework
SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching
Hierarchical Error Correction for Large Language Models: A Systematic Framework for Domain-Specific AI Quality Enhancement
Between Help and Harm: An Evaluation of Mental Health Crisis Handling by LLMs
Metaphor identification using large language models: A comparison of RAG, prompt engineering, and fine-tuning
Expanding Computation Spaces of LLMs at Inference Time
BOE-XSUM: Extreme Summarization in Clear Language of Spanish Legal Decrees and Notifications
How Well Do LLMs Imitate Human Writing Style?
MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes
The Dialogue That Heals: A Comprehensive Evaluation of Doctor Agents' Inquiry Capability
SemanticShield: LLM-Powered Audits Expose Shilling Attacks in Recommender Systems
Generalized Correctness Models: Learning Calibrated and Model-Agnostic Correctness Predictors from Historical Patterns
Circuit Distillation
Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct
GateMABSA: Aspect-Image Gated Fusion for Multimodal Aspect-based Sentiment Analysis
Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures
Confidence-Guided Error Correction for Disordered Speech Recognition
An empirical study on the limitation of Transformers in program trace generation
Scaling Generalist Data-Analytic Agents
jina-reranker-v3: Last but Not Late Interaction for Document Reranking
Towards Trustworthy Lexical Simplification: Exploring Safety and Efficiency with Small LLMs
Towards Personalized Deep Research: Benchmarks and Evaluations
Knowledge Extraction on Semi-Structured Content: Does It Remain Relevant for Question Answering in the Era of LLMs?
Investigating Language and Retrieval Bias in Multilingual Previously Fact-Checked Claim Detection
Paired by the Teacher: Turning Unpaired Data into High-Fidelity Pairs for Low-Resource Text Generation
Pretraining Large Language Models with NVFP4
EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering
NAIPv2: Debiased Pairwise Learning for Efficient Paper Quality Estimation
Incentive-Aligned Multi-Source LLM Summaries
Learning to Parallel: Accelerating Diffusion Large Language Models via Adaptive Parallel Decoding
InfoAgent: Advancing Autonomous Information-Seeking Agents
CAOTE: KV Cache Selection for LLMs via Attention Output Error-Based Token Eviction
Multiplicative-Additive Constrained Models: Toward Joint Visualization of Interactive and Independent Effects
Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective
DiaMoE-TTS: A Unified IPA-Based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation
VideoScore2: Think before You Score in Generative Video Evaluation
Toward a Theory of Generalizability in LLM Mechanistic Interpretability Research
Adaptive Margin RLHF via Preference over Preferences
Patient-specific Biomolecular Instruction Tuning
JE-IRT: A Geometric Lens on LLM Abilities through Joint Embedding Item Response Theory
Not only a helper, but also a teacher: Interactive LLM Cascade
Geometry-Aware Losses for Structure-Preserving Text-to-Sign Language Generation
Tracing the Representation Geometry of Language Models from Pretraining to Post-training
Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data
Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents
Causally-Enhanced Reinforcement Policy Optimization
Multiplayer Nash Preference Optimization
RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility
C$^2$GSPG: Confidence-calibrated Group Sequence Policy Gradient towards Self-aware Reasoning
SPEC-RL: Accelerating On-Policy Reinforcement Learning via Speculative Rollouts
$p$-less Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding
Learning How to Use Tools, Not Just When: Pattern-Aware Tool-Integrated Reasoning
Seeing Symbols, Missing Cultures: Probing Vision-Language Models' Reasoning on Fire Imagery and Cultural Meaning
PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation
Guard Vector: Beyond English LLM Guardrails with Task-Vector Composition and Streaming-Aware Prefix SFT
Train Once, Answer All: Many Pretraining Experiments for the Cost of One
No Loss, No Gain: Gated Refinement and Adaptive Compression for Prompt Optimization
Liaozhai through the Looking-Glass: On Paratextual Explicitation of Culture-Bound Terms in Machine Translation
Comparison of Scoring Rationales Between Large Language Models and Human Raters
Retrieval-Constrained Decoding Reveals Underestimated Parametric Knowledge in Language Models
Cognition-of-Thought Elicits Social-Aligned Reasoning in Large Language Models
Text-Based Approaches to Item Difficulty Modeling in Large-Scale Assessments: A Systematic Review
The Impact of Role Design in In-Context Learning for Large Language Models
AraS2P: Arabic Speech-to-Phonemes System
From Human Annotation to Automation: LLM-in-the-Loop Active Learning for Arabic Sentiment Analysis
On the Shelf Life of Fine-Tuned LLM Judges: Future Proofing, Backward Compatibility, and Question Generalization
Automatic Speech Recognition for Greek Medical Dictation
Towards Efficient CoT Distillation: Self-Guided Rationale Selector for Better Performance with Fewer Rationales
Jackal: A Real-World Execution-Based Benchmark Evaluating Large Language Models on Text-to-JQL Tasks
LLM Hallucination Detection: HSAD
Timber: Training-free Instruct Model Refining with Base via Effective Rank
Fast Thinking for Large Language Models
Don't Settle Too Early: Self-Reflective Remasking for Diffusion Language Models
Beyond English-Centric Training: How Reinforcement Learning Improves Cross-Lingual Reasoning in LLMs
Aligning LLMs for Multilingual Consistency in Enterprise Applications
TF-Bench: Evaluating Program Semantics Reasoning with Type Inference in System F
VIVA+: Human-Centered Situational Decision-Making
Collaboration of Fusion and Independence: Hypercomplex-driven Robust Multi-Modal Knowledge Graph Completion
Do LLMs Understand Romanian Driving Laws? A Study on Multimodal and Fine-Tuned Question Answering
Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning
Understanding Textual Capability Degradation in Speech LLMs via Parameter Importance Analysis
Knowledge-Level Consistency Reinforcement Learning: Dual-Fact Alignment for Long-Form Factuality
From Personal to Collective: On the Role of Local and Global Memory in LLM Personalization
Bridging the Knowledge-Prediction Gap in LLMs on Multiple-Choice Questions
Transformer Tafsir at QIAS 2025 Shared Task: Hybrid Retrieval-Augmented Generation for Islamic Knowledge Question Answering
Open-DeBias: Toward Mitigating Open-Set Bias in Language Models
SPELL: Self-Play Reinforcement Learning for evolving Long-Context Language Models
Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning
DocPruner: A Storage-Efficient Framework for Multi-Vector Visual Document Retrieval via Adaptive Patch-Level Embedding Pruning
Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Step
Assessing Large Language Models in Updating Their Forecasts with New Information
Easy Turn: Integrating Acoustic and Linguistic Modalities for Robust Turn-Taking in Full-Duplex Spoken Dialogue Systems
Vision-Grounded Machine Interpreting: Improving the Translation Process through Visual Cues
HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs
ByteSized32Refactored: Towards an Extensible Interactive Text Games Corpus for LLM World Modeling and Evaluation
Toward Preference-aligned Large Language Models via Residual-based Model Steering
The Hidden Costs of Translation Accuracy: Distillation, Quantization, and Environmental Impact
The AI Agent Code of Conduct: Automated Guardrail Policy-as-Prompt Synthesis
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use
Sequential Diffusion Language Models
SparseD: Sparse Attention for Diffusion Language Models
ResFormer: All-Time Reservoir Memory for Long Sequence Classification
Ensembling Multilingual Transformers for Robust Sentiment Analysis of Tweets
Large-Scale Constraint Generation - Can LLMs Parse Hundreds of Constraints?
GEAR: A General Evaluation Framework for Abductive Reasoning
BTC-SAM: Leveraging LLMs for Generation of Bias Test Cases for Sentiment Analysis Models
Pragmatic Inference for Moral Reasoning Acquisition: Generalization via Distributional Semantics
Dual-Scale World Models for LLM Agents Towards Hard-Exploration Problems
EduVidQA: Generating and Evaluating Long-form Answers to Student Questions based on Lecture Videos
Beyond Magic Words: Sharpness-Aware Prompt Evolving for Robust Large Language Models with TARE
Your thoughts tell who you are: Characterize the reasoning patterns of LRMs
Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis
Task Vectors, Learned Not Extracted: Performance Gains and Mechanistic Insight
Retrieval-augmented GUI Agents with Generative Guidelines
Beyond Overall Accuracy: A Psychometric Deep Dive into the Topic-Specific Medical Capabilities of 80 Large Language Models
PET: Preference Evolution Tracking with LLM-Generated Explainable Distribution
AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play
Can Large Language Models Express Uncertainty Like Human?
BeyondBench: Benchmark-Free Evaluation of Reasoning in Language Models
ScenarioBench: Trace-Grounded Compliance Evaluation for Text-to-SQL and RAG
MoVa: Towards Generalizable Classification of Human Morals and Values
Model Fusion with Multi-LoRA Inference for Tool-Enhanced Game Dialogue Agents
Prompt and Parameter Co-Optimization for Large Language Models
MRAG-Suite: A Diagnostic Evaluation Platform for Visual Retrieval-Augmented Generation
SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents
Let LLMs Speak Embedding Languages: Generative Text Embeddings via Iterative Contrastive Refinement
LOGOS: LLM-driven End-to-End Grounded Theory Development and Schema Induction for Qualitative Research
DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models
Q-Mirror: Unlocking the Multi-Modal Potential of Scientific Text-Only QA Pairs
Dual Mechanisms of Value Expression: Intrinsic vs. Prompted Values in LLMs
Multimodal Large Language Models Meet Multimodal Emotion Recognition and Reasoning: A Survey
Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding
AlignX: Advancing Multilingual Large Language Models with Multilingual Representation Alignment
Multi-Modal Sentiment Analysis with Dynamic Attention Fusion
Enabling Approximate Joint Sampling in Diffusion LMs
Painless Activation Steering: An Automated, Lightweight Approach for Post-Training Large Language Models
MIRAGE: Multi-hop Reasoning with Ambiguity Evaluation for Illusory Questions
ML2B: Multi-Lingual ML Benchmark For AutoML
ArFake: A Multi-Dialect Benchmark and Baselines for Arabic Spoof-Speech Detection
EditGRPO: Reinforcement Learning with Post-Rollout Edits for Clinically Accurate Chest X-Ray Report Generation
Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning
ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents
Learning to Detect Relevant Contexts and Knowledge for Response Selection in Retrieval-based Dialogue Systems
Towards Generalizable Implicit In-Context Learning with Attention Routing
The Bias is in the Details: An Assessment of Cognitive Bias in LLMs
Lexicon-Enriched Graph Modeling for Arabic Document Readability Prediction
HEART: Emotionally-driven test-time scaling of Language Models
Infusing Theory of Mind into Socially Intelligent LLM Agents
Extract-0: A Specialized Language Model for Document Information Extraction
Large language models management of medications: three performance analyses
LLMs Behind the Scenes: Enabling Narrative Scene Illustration
What Matters More For In-Context Learning under Matched Compute Budgets: Pretraining on Natural Text or Incorporating Targeted Synthetic Examples?
Emergent morpho-phonological representations in self-supervised speech models
Same Content, Different Representations: A Controlled Study for Table QA
ADAM: A Diverse Archive of Mankind for Evaluating and Enhancing LLMs in Biographical Reasoning
AI Brown and AI Koditex: LLM-Generated Corpora Comparable to Traditional Corpora of English and Czech Texts
Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents
Peacemaker or Troublemaker: How Sycophancy Shapes Multi-Agent Debate
Semantic Voting: A Self-Evaluation-Free Approach for Efficient LLM Self-Improvement on Unverifiable Open-ended Tasks
From Evidence to Trajectory: Abductive Reasoning Path Synthesis for Training Retrieval-Augmented Generation Agents
The Geometry of Creative Variability: How Credal Sets Expose Calibration Gaps in Language Models
d$^2$Cache: Accelerating Diffusion-Based LLMs via Dual Adaptive Caching
How to Make Large Language Models Generate 100% Valid Molecules?
Non-Collaborative User Simulators for Tool Agents
Tagging the Thought: Unlocking Personalization Reasoning via Reinforcement Learning
Tree Reward-Aligned Search for TReASURe in Masked Diffusion Language Models
Test-Time Policy Adaptation for Enhanced Multi-Turn Interactions with LLMs
Pretraining LLM with Latent Thoughts in Continuous Space
Diagnose, Localize, Align: A Full-Stack Framework for Reliable LLM Multi-Agent Systems under Instruction Conflicts
Estimating the strength and timing of syntactic structure building in naturalistic reading
From Harm to Help: Turning Reasoning In-Context Demos into Assets for Reasoning LMs
Global Beats, Local Tongue: Studying Code Switching in K-pop Hits on Billboard Charts
Steering Prepositional Phrases in Language Models: A Case of with-headed Adjectival and Adverbial Complements in Gemma-2
PARL-MT: Learning to Call Functions in Multi-Turn Conversation with Progress Awareness
A Structured Framework for Evaluating and Enhancing Interpretive Capabilities of Multimodal LLMs in Culturally Situated Tasks
Detecting Corpus-Level Knowledge Inconsistencies in Wikipedia with Large Language Models
Fin-ExBERT: User Intent based Text Extraction in Financial Context using Graph-Augmented BERT and trainable Plugin
A2D: Any-Order, Any-Step Safety Alignment for Diffusion Language Models
Scaling Policy Compliance Assessment in Language Models with Policy Reasoning Traces
Learning to Reason in Structured In-context Environments with Reinforcement Learning
C-Evolve: Consensus-based Evolution for Prompt Groups
Dual-Space Smoothness for Robust and Balanced LLM Unlearning
MedCritical: Enhancing Medical Reasoning in Small Language Models via Self-Collaborative Correction
Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization
CCD: Mitigating Hallucinations in Radiology MLLMs via Clinical Contrastive Decoding
Are you sure? Measuring models bias in content moderation through uncertainty
AccessEval: Benchmarking Disability Bias in Large Language Models
RAR$^2$: Retrieval-Augmented Medical Reasoning via Thought-Driven Retrieval
TRUEBench: Can LLM Response Meet Real-world Constraints as Productivity Assistant?
Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning
DuetGraph: Coarse-to-Fine Knowledge Graph Reasoning with Dual-Pathway Global-Local Fusion
MetaLint: Generalizable Idiomatic Code Quality Analysis through Instruction-Following and Easy-to-Hard Generalization
BenchRL-QAS: Benchmarking reinforcement learning algorithms for quantum architecture search
LionGuard 2: Building Lightweight, Data-Efficient & Localised Multilingual Content Moderators
Diffusion models for multivariate subsurface generation and efficient probabilistic inversion
When Engineering Outruns Intelligence: Rethinking Instruction-Guided Navigation
Trainable Dynamic Mask Sparse Attention
Communicating Plans, Not Percepts: Scalable Multi-Agent Coordination with Embodied World Models
The Geometry of Cortical Computation: Manifold Disentanglement and Predictive Dynamics in VCNet
Sculptor: Empowering LLMs with Cognitive Agency via Active Context Management
PakBBQ: A Culturally Adapted Bias Benchmark for QA
BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation
Flow Matching for Efficient and Scalable Data Assimilation
ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signals
Transduction is All You Need for Structured Data Workflows
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
SQL-of-Thought: Multi-agentic Text-to-SQL with Guided Error Correction
Constrained Decoding for Robotics Foundation Models
MAUSAM: An Observations-focused assessment of Global AI Weather Prediction Models During the South Asian Monsoon
Hierarchical Task Environments as the Next Frontier for Embodied World Models in Robot Soccer
Code2MCP: Transforming Code Repositories into MCP Services
HealthSLM-Bench: Benchmarking Small Language Models for Mobile and Wearable Healthcare Monitoring
Do Natural Language Descriptions of Model Activations Convey Privileged Information?
Imagined Autocurricula
TreeIRL: Safe Urban Driving with Tree Search and Inverse Reinforcement Learning
Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning
Multi-Scenario Highway Lane-Change Intention Prediction: A Physics-Informed AI Framework for Three-Class Classification
Diversity Boosts AI-Generated Text Detection
Diffusion-Based Impedance Learning for Contact-Rich Manipulation Tasks
Experience Deploying Containerized GenAI Services at an HPC Center
Combinatorial Creativity: A New Frontier in Generalization Abilities
XL-Suite: Cross-Lingual Synthetic Training and Evaluation Data for Open-Ended Generation
SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching
A Practical Synthesis of Detecting AI-Generated Textual, Visual, and Audio Content
Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs
Min-Max Optimisation for Nonconvex-Nonconcave Functions Using a Random Zeroth-Order Extragradient Algorithm
When Federated Learning Meets Quantum Computing: Survey and Research Opportunities
Evolution Meets Diffusion: Efficient Neural Architecture Generation
TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation
Sobolev norm inconsistency of kernel interpolation
Cooking Up Creativity: Enhancing LLM Creativity through Structured Recombination
New News: System-2 Fine-tuning for Robust Integration of New Knowledge
ReactDance: Hierarchical Representation for High-Fidelity and Coherent Long-Form Reactive Dance Generation
References Indeed Matter? Reference-Free Preference Optimization for Conversational Query Reformulation
OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit
The Counting Power of Transformers
Critique-Guided Distillation for Efficient and Robust Language Model Reasoning
Fine-grained Contrastive Learning for ECG-Report Alignment with Waveform Enhancement
AdaBoN: Adaptive Best-of-N Alignment
FRABench and UFEval: Unified Fine-grained Evaluation with Task and Aspect Generalization
Is Active Persona Inference Necessary for Aligning Small Models to Personal Preferences?
TranSUN: A Preemptive Paradigm to Eradicate Retransformation Bias Intrinsically from Regression Models in Recommender Systems
Mechanistic Fine-tuning for In-context Learning
Vid2World: Crafting Video Diffusion Models to Interactive World Models
Internal Chain-of-Thought: Empirical Evidence for Layer-wise Subtask Scheduling in LLMs
Language Models Optimized to Fool Detectors Still Have a Distinct Style (And How to Change It)
ReflAct: World-Grounded Decision Making in LLM Agents via Goal-State Reflection
Pixels Versus Priors: Controlling Knowledge Priors in Vision-Language Models through Visual Counterfacts
Flexible MOF Generation with Torsion-Aware Flow Matching
Boosting Open Set Recognition Performance through Modulated Representation Learning
PoliCon: Evaluating LLMs on Achieving Diverse Political Consensus Objectives
HS-STaR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation
Convergence of Clipped-SGD for Convex $(L_0,L_1)$-Smooth Optimization with Heavy-Tailed Noise
Unveiling Impact of Frequency Components on Membership Inference Attacks for Diffusion Models
Cross-modal RAG: Sub-dimensional Text-to-Image Retrieval-Augmented Generation
ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting
Neural-Augmented Kelvinlet for Real-Time Soft Tissue Deformation Modeling
Flexible and Efficient Drift Detection without Labels
When Is Diversity Rewarded in Cooperative Multi-Agent Learning?
Constant Bit-size Transformers Are Turing Complete
Do We Need Large VLMs for Spotting Soccer Actions?
Prover Agent: An Agent-Based Framework for Formal Mathematical Proofs
R1-Ranker: Teaching LLM Rankers to Reason
Breaking Rank Bottlenecks in Knowledge Graph Embeddings
XTransfer: Modality-Agnostic Few-Shot Model Transfer for Human Sensing at the Edge
Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer
Almost Sure Convergence for the Last Iterate of Stochastic Gradient Descent Schemes
IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing
Mitigating Watermark Forgery in Generative Models via Randomized Key Selection
CLEAR: Calibrated Learning for Epistemic and Aleatoric Risk
Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition
Is Thompson Sampling Susceptible to Algorithmic Collusion?
SEMF: Supervised Expectation-Maximization Framework for Predicting Intervals
FusionDTI: Fine-grained Binding Discovery with Token-level Fusion for Drug-Target Interaction
CommonPower: A Framework for Safe Data-Driven Smart Grid Control
fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence
Optimal thresholds and algorithms for a model of multi-modal learning in high dimensions
Sheaf Discovery with Joint Computation Graph Pruning and Flexible Granularity
Bayesian Autoregressive Online Change-Point Detection with Time-Varying Parameters
A Voter-Based Stochastic Rejection-Method Framework for Asymptotically Safe Language Model Outputs
Robot Navigation with Entity-Based Collision Avoidance using Deep Reinforcement Learning
LLM-3D Print: Large Language Models To Monitor and Control 3D Printing
Attentive Dilated Convolution for Automatic Sleep Staging using Force-directed Layout
An Empirical Study on the Computation Budget of Co-Optimization of Robot Design and Control in Simulation
On the Effect of Instability on Learning Continuous-Time Linear Control Systems
Disentangling Regional Primitives for Image Generation
LLMs Are In-Context Bandit Reinforcement Learners
A quantitative Robbins-Siegmund theorem
CoT-TL: Low-Resource Temporal Knowledge Representation of Planning Instructions Using Chain-of-Thought Reasoning
PACER: Physics Informed Uncertainty Aware Climate Emulator
When Speculation Spills Secrets: Side Channels via Speculative Decoding In LLMs
UniTraj: Learning a Universal Trajectory Foundation Model from Billion-Scale Worldwide Traces
CART: Compositional Auto-Regressive Transformer for Image Generation
Gaussian Process Priors for Boundary Value Problems of Linear Partial Differential Equations
Break the ID-Language Barrier: An Adaption Framework for LLM-based Sequential Recommendation
Invariant Measures in Time-Delay Coordinates for Unique Dynamical System Identification
A learning-based approach to stochastic optimal control under reach-avoid constraint
Order Matters! An Empirical Study on Large Language Models' Input Order Bias in Software Fault Localization
Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval
Training-Free Defense Against Adversarial Attacks in Deep Learning MRI Reconstruction
Improving the adaptive and continuous learning capabilities of artificial neural networks: Lessons from multi-neuromodulatory dynamics
Gaussian Universality for Diffusion Models
MIAFEx: An Attention-based Feature Extraction Method for Medical Image Classification
Nirvana AI Governance: How AI Policymaking Is Committing Three Old Fallacies
Non-Expansive Mappings in Two-Time-Scale Stochastic Approximation: Finite-Time Analysis
A Unified Information-Theoretic Framework for Meta-Learning Generalization
DeepFRC: An End-to-End Deep Learning Model for Functional Registration and Classification
Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
An Empirical Analysis of Machine Learning Model and Dataset Documentation, Supply Chain, and Licensing Challenges on Hugging Face
Group-Adaptive Threshold Optimization for Robust AI-Generated Text Detection
Noise Sensitivity and Learning Lower Bounds for Hierarchical Functions
OrderFusion: Encoding Orderbook for End-to-End Probabilistic Intraday Electricity Price Forecasting
Accelerated Parallel Tempering via Neural Transports
Mitigating Barren Plateaus in Quantum Neural Networks via an AI-Driven Submartingale-Based Framework
Optimal and Provable Calibration in High-Dimensional Binary Classification: Angular Calibration and Platt Scaling
Finite-Sample Analysis of Policy Evaluation for Robust Average Reward Reinforcement Learning
Conformal prediction of future insurance claims in the regression problem
UniF$^2$ace: A Unified Fine-grained Face Understanding and Generation Model
A Survey on Self-supervised Contrastive Learning for Multimodal Text-Image Analysis
Lightweight Learning for Grant-Free Activity Detection in Cell-Free Massive MIMO Networks
Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning
A categorical embedding discontinuity-capturing shallow neural network for anisotropic elliptic interface problems
Machine Learning-Driven Materials Discovery: Unlocking Next-Generation Functional Materials - A review
Grasping a Handful: Sequential Multi-Object Dexterous Grasp Generation
SUV: Scalable Large Language Model Copyright Compliance with Regularized Selective Unlearning
MoQE: Improve Quantization Model performance via Mixture of Quantization Experts
DHG-Bench: A Comprehensive Benchmark for Deep Hypergraph Learning
Contrastive Representations for Temporal Reasoning
Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration
Hard Examples Are All You Need: Maximizing GRPO Post-Training Under Annotation Budgets
Speculative Safety-Aware Decoding
Type-Compliant Adaptation Cascades: Adapting Programmatic LM Workflows to Data
What Matters in Data for DPO?
End-to-End On-Device Quantization-Aware Training for LLMs at Inference Cost
T-MLP: Tailed Multi-Layer Perceptron for Level-of-Detail Signal Representation
Metis: Training Large Language Models with Advanced Low-Bit Quantization
Differentiable Expectation-Maximisation and Applications to Gaussian Mixture Model Optimal Transport
Small Vectors, Big Effects: A Mechanistic Study of RL-Induced Reasoning via Steering Vectors
Graph Alignment via Dual-Pass Spectral Encoding and Latent Space Communication
Learning to Route: Per-Sample Adaptive Routing for Multimodal Multitask Prediction
TDRM: Smooth Reward Models with Temporal Difference for LLM RL and Inference
Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search
VQEzy: An Open-Source Dataset for Parameter Initialization in Variational Quantum Eigensolvers
Joint Memory Frequency and Computing Frequency Scaling for Energy-efficient DNN Inference
PiERN: Token-Level Routing for Integrating High-Precision Computation and Reasoning
PipelineRL: Faster On-policy Reinforcement Learning for Long Sequence Generation
Network inference via process motifs for lagged correlation in linear stochastic processes
Explicit Second-Order Min-Max Optimization: Practical Algorithms and Complexity Analysis
Policy Gradient Algorithms for Robust MDPs with Non-Rectangular Uncertainty Sets
A Double Machine Learning Approach to Combining Experimental and Observational Data
Few-shot Personalized Saliency Prediction Based on Interpersonal Gaze Patterns
Information theory for data-driven model reduction in physics and biology
A Proximal Gradient Method With Probabilistic Multi-Gossip Communications for Decentralized Composite Optimization
Machines Do See Color: A Guideline to Classify Different Forms of Racist Discourse in Large Corpora
Off-Policy Evaluation in Markov Decision Processes under Weak Distributional Overlap
Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration
Angles Don't Lie: Unlocking Training-Efficient RL Through the Model's Own Signals
Comba: Improving Bilinear RNNs with Closed-loop Control
QKV Projections Require a Fraction of Their Memory
Interaction Field Matching: Overcoming Limitations of Electrostatic Models
Curse of Slicing: Why Sliced Mutual Information is a Deceptive Measure of Statistical Dependence
Bridging the Performance Gap Between Target-Free and Target-Based Reinforcement Learning
Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order
Towards Better Generalization via Distributional Input Projection Network
Reshaping Reasoning in LLMs: A Theoretical Analysis of RL Training Dynamics through Pattern Selection
TreeRPO: Tree Relative Policy Optimization
Flow-Attentional Graph Neural Networks
Can In-Context Reinforcement Learning Recover From Reward Poisoning Attacks?
InverseScope: Scalable Activation Inversion for Interpreting Large Language Models
Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion
Towards Robust Real-World Multivariate Time Series Forecasting: A Unified Framework for Dependency, Asynchrony, and Missingness
InstructPro: Natural Language Guided Ligand-Binding Protein Design
Foundation Models for Causal Inference via Prior-Data Fitted Networks
Enhancing Delta Compression in LLMs via SVD-based Quantization Error Minimization
Meta Pruning via Graph Metanetworks: A Universal Meta Learning Framework for Network Pruning
Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs
Muon Optimizes Under Spectral Norm Constraints
Reward-Agnostic Prompt Optimization for Text-to-Image Diffusion Models
Adaptive Sample Scheduling for Direct Preference Optimization
Origins of Creativity in Attention-Based Diffusion Models
Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling?
Mitigating Semantic Collapse in Generative Personalization with Test-Time Embedding Adjustment
Data Uniformity Improves Training Efficiency and More, with a Convergence Framework Beyond the NTK Regime
Theoretical Modeling of LLM Self-Improvement Training Dynamics Through Solver-Verifier Gap
Cooperative Sheaf Neural Networks
Learning to Segment for Vehicle Routing Problems
JAX-MPM: A Learning-Augmented Differentiable Meshfree Framework for GPU-Accelerated Lagrangian Simulation and Geophysical Inverse Modeling
Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs
Discrete Diffusion Trajectory Alignment via Stepwise Decomposition
Robust Deep Network Learning of Nonlinear Regression Tasks by Parametric Leaky Exponential Linear Units (LELUs) and a Diffusion Metric
One Token to Fool LLM-as-a-Judge
Warm Starts Accelerate Conditional Diffusion
FusionFactory: Fusing LLM Capabilities with Multi-LLM Log Data
A Graph-in-Graph Learning Framework for Drug-Target Interaction Prediction
Vidar: Embodied Video Diffusion Model for Generalist Manipulation
Probabilistic Soundness Guarantees in LLM Reasoning Chains
Learning to summarize user information for personalized reinforcement learning from human feedback
GRID: Scalable Task-Agnostic Prompt-Based Continual Learning for Language Models
Omni-Thinker: Scaling Multi-Task RL in LLMs with Hybrid Reward and Task Scheduling
U-Cast: Learning Hierarchical Structures for High-Dimensional Time Series Forecasting
Enhancing Stability of Physics-Informed Neural Network Training Through Saddle-Point Reformulation
GLANCE: Graph Logic Attention Network with Cluster Enhancement for Heterophilous Graph Representation Learning
Moving Out: Physically-grounded Human-AI Collaboration
A Markov Categorical Framework for Language Modeling
Can Language Models Discover Scaling Laws?
Merging Memory and Space: A State Space Neural Operator
Signals, Concepts, and Laws: Toward Universal, Explainable Time-Series Forecasting
Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts
Runtime Adaptive Pruning for LLM Inference
On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning
Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models
What Do You Need for Diverse Trajectory Composition in Diffusion Planning?
Reward Model Overoptimisation in Iterated RLHF
Improved Sample Complexity For Diffusion Model Training Without Empirical Risk Minimizer Access
ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation
LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning
Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models
Logic Gate Neural Networks are Good for Verification
ePC: Overcoming Exponential Signal Decay in Deep Predictive Coding Networks
Variational Deep Learning via Implicit Regularization
PDFBench: A Benchmark for De novo Protein Design from Function
Stochastic Primal-Dual Double Block-Coordinate for Two-way Partial AUC Maximization
Equivariant Spherical Transformer for Efficient Molecular Modeling
Efficient AllReduce with Stragglers
Continuous Chain of Thought Enables Parallel Exploration and Reasoning
Vision Language Models are Biased
On the Emergence of Weak-to-Strong Generalization: A Bias-Variance Perspective
ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration
Weight-Space Linear Recurrent Neural Networks
Chunking the Critic: A Transformer-based Soft Actor-Critic with N-Step Returns
What Makes a Reward Model a Good Teacher? An Optimization Perspective
On The Sample Complexity Bounds In Bilevel Reinforcement Learning
Finite-Time Bounds for Two-Time-Scale Stochastic Approximation with Arbitrary Norm Contractions and Markovian Noise
Reasoning to Learn from Latent Thoughts
AdaRank: Adaptive Rank Pruning for Enhanced Model Merging
Pairwise Optimal Transports for Training All-to-All Flow-Based Condition Transfer Model
Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning?
Efficient Generative Model Training via Embedded Representation Warmup
A Model Zoo on Phase Transitions in Neural Networks
A Unified MDL-based Binning and Tensor Factorization Framework for PDF Estimation
Sharpness-Aware Minimization with Z-Score Gradient Filtering
Localized Diffusion Models
Generating Full-field Evolution of Physical Dynamics from Irregular Sparse Observations
RainPro-8: An Efficient Deep Learning Model to Estimate Rainfall Probabilities Over 8 Hours
Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps
The Final Layer Holds the Key: A Unified and Efficient GNN Calibration Framework
Visual Planning: Let's Think Only with Images
Stepwise Guided Policy Optimization: Coloring your Incorrect Reasoning in GRPO
Continuous Optimization for Feature Selection with Permutation-Invariant Embedding and Policy-Guided Search
MINGLE: Mixture of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging
AltLoRA: Towards Better Gradient Approximation in Low-Rank Adaptation with Alternating Projections
Hamiltonian Neural PDE Solvers through Functional Approximation
Causes and Consequences of Representational Similarity in Machine Learning Models
VAMO: Efficient Zeroth-Order Variance Reduction for SGD with Faster Convergence
Learning with Local Search MCMC Layers
PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration
Scaling Diffusion Transformers Efficiently via $\mu$P
Certified Neural Approximations of Nonlinear Dynamics
Towards Identifiability of Interventional Stochastic Differential Equations
Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models
Scalable Graph Generative Modeling via Substructure Sequences
AdaSTaR: Adaptive Data Sampling for Training Self-Taught Reasoners
ICYM2I: The illusion of multimodal informativeness under missingness
Generalized Tangent Kernel: A Unified Geometric Foundation for Natural Gradient and Standard Gradient
TabText: Language-Based Representations of Tabular Health Data for Predictive Modelling
Beyond Losses Reweighting: Empowering Multi-Task Learning via the Generalization Perspective
AQuaMaM: An Autoregressive, Quaternion Manifold Model for Rapidly Estimating Complex SO(3) Distributions
Symbolic Imitation Learning: From Black-Box to Explainable Driving Policies
Double Machine Learning Based Structure Identification from Temporal Data
EUGENE: Explainable Structure-aware Graph Edit Distance Estimation with Generalized Edit Costs
The Clever Hans Mirage: A Comprehensive Survey on Spurious Correlations in Machine Learning
Federated Learning Resilient to Byzantine Attacks and Data Heterogeneity
Data Imputation by Pursuing Better Classification: A Supervised Kernel-Based Method
PLEIADES: Building Temporal Kernels with Orthogonal Polynomials
A Comprehensive Graph Pooling Benchmark: Effectiveness, Robustness and Generalizability
Differential Encoding for Improved Representation Learning over Graphs
Deep Time Series Models: A Comprehensive Survey and Benchmark
Can DPO Learn Diverse Human Values? A Theoretical Scaling Law
Understanding Transformer Architecture through Continuous Dynamics: A Partial Differential Equation Perspective
Efficient Federated Learning against Byzantine Attacks and Data Heterogeneity via Aggregating Normalized Gradients
A GREAT Architecture for Edge-Based Graph Problems Like TSP
Sparse Covariance Neural Networks
DeepONet for Solving Nonlinear Partial Differential Equations with Physics-Informed Training
Extracting Moore Machines from Transformers using Queries and Counterexamples
Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs
Deeper Insights into Deep Graph Convolutional Networks: Stability and Generalization
NextLocLLM: Location Semantics Modeling and Coordinate-Based Next Location Prediction with LLMs
Gradient-Free Training of Quantized Neural Networks
Similarity-Dissimilarity Loss for Multi-label Supervised Contrastive Learning
A Predictive Approach To Enhance Time-Series Forecasting
Benchmarking Computational Methods for Emerging Drug-Drug Interaction Prediction
Self-Normalized Resets for Plasticity in Continual Learning
Haar-Laplacian for directed graphs
Sometimes I am a Tree: Data Drives Unstable Hierarchical Generalization
Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures
Euclidean Fast Attention - Machine Learning Global Atomic Representations at Linear Cost
Learning Randomized Reductions
Toward Model-centric Heterogeneous Federated Graph Learning: A Knowledge-driven Approach
Norm-Bounded Low-Rank Adaptation
Principal Components for Neural Network Initialization
Federated Sketching LoRA: A Flexible Framework for Heterogeneous Collaborative Fine-Tuning of LLMs
Vintix: Action Model via In-Context Reinforcement Learning
DAL: A Practical Prior-Free Black-Box Framework for Non-Stationary Bandits
A Statistical Learning Perspective on Semi-dual Adversarial Neural Optimal Transport Solvers
InfoBridge: Mutual Information estimation via Bridge Matching
LEAD: Large Foundation Model for EEG-Based Alzheimer's Disease Detection
Progressive Binarization with Semi-Structured Pruning for LLMs
Towards Large-Scale In-Context Reinforcement Learning by Meta-Training in Randomized Worlds
Pre-training Epidemic Time Series Forecasters with Compartmental Prototypes
vCache: Verified Semantic Prompt Caching
Functional Complexity-adaptive Temporal Tensor Decomposition
Recurrent Memory for Online Interdomain Gaussian Processes
The Accuracy Cost of Weakness: A Theoretical Analysis of Fixed-Segment Weak Labeling for Events in Time
Comprehensive Review of Neural Differential Equations for Time Series Analysis
Learning to Explain Air Traffic Situation
Collaborative Deterministic-Probabilistic Forecasting for Diverse Spatiotemporal Systems
SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors
TGT: A Temporal Gating Transformer for Smartphone App Usage Prediction
Joint Value Estimation and Bidding in Repeated First-Price Auctions
Meta-Learning to Explore via Memory Density Feedback
Neuroplasticity-inspired dynamic ANNs for multi-task demand forecasting
CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map Models
Training Agents Inside of Scalable World Models
Quantitative convergence of trained single layer neural networks to Gaussian processes
Bandits roaming Hilbert space
Prompting Robot Teams with Natural Language
Inducing Dyslexia in Vision Language Models
Algorithms and data structures for automatic precision estimation of neural networks
Hype or not? Formalizing Automatic Promotional Language Detection in Biomedical Research
InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation
Reference-Free Rating of LLM Responses via Latent Information
Stabilizing Humanoid Robot Trajectory Generation via Physics-Informed Learning and Control-Informed Steering
MAD: Manifold Attracted Diffusion
Bundle Network: a Machine Learning-Based Bundle Method
ProxyAttn: Guided Sparse Attention via Representative Heads
Spatial-Functional awareness Transformer-based graph archetype contrastive learning for Decoding Visual Neural Representations from EEG
Sparse Autoencoders Make Audio Foundation Models more Explainable
Fidelity-Aware Data Composition for Robust Robot Generalization
TACO-Net: Topological Signatures Triumph in 3D Object Classification
A Greedy PDE Router for Blending Neural Operators and Classical Methods
Efficient Sketching and Nearest Neighbor Search Algorithms for Sparse Vector Sets
Of-SemWat: High-payload text embedding for semantic watermarking of AI-generated images with arbitrary size
Pushing LLMs to Their Logical Reasoning Bound: The Role of Data Reasoning Intensity
Environment-Aware Satellite Image Generation with Diffusion Models
VAGUEGAN: Stealthy Poisoning and Backdoor Attacks on Image Generative Pipelines
Improved Stochastic Optimization of LogSumExp
Unmute the Patch Tokens: Rethinking Probing in Multi-Label Audio Classification
When Scores Learn Geometry: Rate Separations under the Manifold Hypothesis
Inductive Bias and Spectral Properties of Single-Head Attention in High Dimensions
From Code to Action: Hierarchical Learning of Diffusion-VLM Policies
A Spectral-Grassmann Wasserstein metric for operator representations of dynamical systems
Graph Theory Meets Federated Learning over Satellite Constellations: Spanning Aggregations, Network Formation, and Performance Optimization
Scalable GANs with Transformers
MSG: Multi-Stream Generative Policies for Sample-Efficient Robotic Manipulation
Embedded Deep Learning for Bio-hybrid Plant Sensors to Detect Increased Heat and Ozone Levels
LVT: Large-Scale Scene Reconstruction via Local View Transformers
CLASP: Adaptive Spectral Clustering for Unsupervised Per-Image Segmentation
VT-FSL: Bridging Vision and Text with LLMs for Few-Shot Learning
Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct
Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures
Symmetry-Aware Bayesian Optimization via Max Kernels
Cogito, Ergo Ludo: An Agent that Learns to Play by Reasoning and Planning
Optimizing Privacy-Preserving Primitives to Support LLM-Scale Applications
Scaling Generalist Data-Analytic Agents
Benchmarking ECG Foundational Models: A Reality Check Across Clinical Tasks
Curriculum Imitation Learning of Distributed Multi-Robot Policies
On Spectral Learning for Odeco Tensors: Perturbation, Initialization, and Algorithms
Score Distillation of Flow Matching Models
The Era of Real-World Human Interaction: RL from User Conversations
Paired by the Teacher: Turning Unpaired Data into High-Fidelity Pairs for Low-Resource Text Generation
Fast Feature Field ($\text{F}^3$): A Predictive Representation of Events
Pretraining Large Language Models with NVFP4
Context-Driven Performance Modeling for Causal Inference Operators on Neural Processing Units
Personalized Vision via Visual In-Context Learning
GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs
Last iterate convergence in no-regret learning: constrained min-max optimization for convex-concave landscapes
CRAUM-Net: Contextual Recursive Attention with Uncertainty Modeling for Salient Object Detection
DFG-PCN: Point Cloud Completion with Degree-Flexible Point Graph
StrucADT: Generating Structure-controlled 3D Point Clouds with Adjacency Diffusion Transformer
Do LLMs Understand Romanian Driving Laws? A Study on Multimodal and Fine-Tuned Question Answering
Diff-3DCap: Shape Captioning with Diffusion Models
LUQ: Layerwise Ultra-Low Bit Quantization for Multimodal Large Language Models
VioPTT: Violin Technique-Aware Transcription from Synthetic Data Augmentation
Knowledge-Level Consistency Reinforcement Learning: Dual-Fact Alignment for Long-Form Factuality
Sequence Pathfinder for Multi-Agent Pickup and Delivery in the Warehouse
Define latent spaces by example: optimisation over the outputs of generative models
Influence-Guided Concolic Testing of Transformer Robustness
A Multi-Camera Vision-Based Approach for Fine-Grained Assembly Quality Control
Assessing Visual Privacy Risks in Multimodal AI: A Novel Taxonomy-Grounded Evaluation of Vision-Language Models
Taught Well Learned Ill: Towards Distillation-conditional Backdoor Attack
Learning-Based Testing for Deep Learning: Enhancing Model Robustness with Adversarial Input Prioritization
Equation-Free Coarse Control of Distributed Parameter Systems via Local Neural Operators
Toward Preference-aligned Large Language Models via Residual-based Model Steering
TREAT-Net: Tabular-Referenced Echocardiography Analysis for Acute Coronary Syndrome Treatment Prediction
Sequential Diffusion Language Models
The Role of Logic and Automata in Understanding Transformers
Singleton-Optimized Conformal Prediction
GEAR: A General Evaluation Framework for Abductive Reasoning
SpeedCP: Fast Kernel-based Conditional Conformal Prediction
Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs
Ancestry Tree Clustering for Particle Filter Diversity Maintenance
ASTROCO: Self-Supervised Conformer-Style Transformers for Light-Curve Embeddings
EYE-DEX: Eye Disease Detection and EXplanation System
Your thoughts tell who you are: Characterize the reasoning patterns of LRMs
Accelerating Cerebral Diagnostics with BrainFusion: A Comprehensive MRI Tumor Framework
STRAPSim: A Portfolio Similarity Metric for ETF Alignment and Portfolio Trades
Memory Transfer Planning: LLM-driven Context-Aware Code Adaptation for Robot Manipulation
Retrieval-augmented GUI Agents with Generative Guidelines
AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play
BeyondBench: Benchmark-Free Evaluation of Reasoning in Language Models
ViReSkill: Vision-Grounded Replanning with Skill Memory for LLM-Based Planning in Lifelong Robot Learning
Uni-NTFM: A Unified Foundation Model for EEG Signal Representation Learning
Non-Invasive Detection of PROState Cancer with Novel Time-Dependent Diffusion MRI and AI-Enhanced Quantitative Radiological Interpretation: PROS-TD-AI
SpecExit: Accelerating Large Reasoning Model via Speculative Exit
Interactive Program Synthesis for Modeling Collaborative Physical Activities from Narrated Demonstrations
Extracting the Structure of Press Releases for Predicting Earnings Announcement Returns
Understanding Cognitive States from Head & Hand Motion Data
VeriLLM: A Lightweight Framework for Publicly Verifiable Decentralized Inference
Risk-Sensitive RL for Alleviating Exploration Dilemmas in Large Language Models
LAMP-PRo: Label-aware Attention for Multi-label Prediction of DNA- and RNA-binding Proteins using Protein Language Models
Graph-Based Learning of Free Surface Dynamics in Generalized Newtonian Fluids using Smoothed Particle Hydrodynamics
Skeleton-based Robust Registration Framework for Corrupted 3D Point Clouds
SCI-Verifier: Scientific Verifier with Thinking
ActiveCQ: Active Estimation of Causal Quantities
PEARL: Performance-Enhanced Aggregated Representation Learning
Inferring Cosmological Parameters with Evidential Physics-Informed Neural Networks
Hyperspherical Latents Improve Continuous-Token Autoregressive Generation
DRIFT: Divergent Response in Filtered Transformations for Robust Adversarial Defense
Prediction-Powered Communication with Distortion Guarantees
From Sound to Setting: AI-Based Equalizer Parameter Prediction for Piano Tone Replication
FuncPoison: Poisoning Function Library to Hijack Multi-agent Autonomous Driving Systems
Multi-Item-Query Attention for Stable Sequential Recommendation
Contrastive Learning for Correlating Network Incidents
Euclid's Gift: Enhancing Spatial Perception and Reasoning in Vision-Language Models via Geometric Surrogate Tasks
Sanitize Your Responses: Mitigating Privacy Leakage in Large Language Models
Overcoming Over-Fitting in Constraint Acquisition via Query-Driven Interactive Refinement
Preference-Based Dynamic Ranking Structure Recognition
EKF-Based Fusion of Wi-Fi/LiDAR/IMU for Indoor Localization and Navigation
Impact of Environmental Factors on LoRa 2.4 GHz Time of Flight Ranging Outdoors
Statistical Inference for Gradient Boosting Regression
Conditional Risk Minimization with Side Information: A Tractable, Universal Optimal Transport Framework
MathBode: Frequency-Domain Fingerprints of LLM Mathematical Reasoning
Tree Reward-Aligned Search for TReASURe in Masked Diffusion Language Models
AI-Enhanced Distributed Channel Access for Collision Avoidance in Future Wi-Fi 8
Grouped Satisficing Paths in Pure Strategy Games: a Topological Perspective
Understanding and Enhancing the Planning Capability of Language Models via Multi-Token Prediction
UltraUNet: Real-Time Ultrasound Tongue Segmentation for Diverse Linguistic and Imaging Conditions
A Generative Model for Controllable Feature Heterophily in Graphs
Explicit modelling of subject dependency in BCI decoding
Learning Regional Monsoon Patterns with a Multimodal Attention U-Net
Scaling Policy Compliance Assessment in Language Models with Policy Reasoning Traces
Multifractal features of multimodal cardiac signals: Nonlinear dynamics of exercise recovery
Space Robotics Bench: Robot Learning Beyond Earth
Targeted perturbations reveal brain-like local coding axes in robustified, but not standard, ANN-based brain models
PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation
CrediBench: Building Web-Scale Network Datasets for Information Integrity
AI-Assisted Music Production: A User Study on Text-to-Music Models
Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization
An Accelerated Newton-GMRES Method for Multilinear PageRank
Train Once, Answer All: Many Pretraining Experiments for the Cost of One
Flow Matching for Robust Simulation-Based Inference under Model Misspecification
Optimizing the Network Topology of a Linear Reservoir Computer
Comparison of Scoring Rationales Between Large Language Models and Human Raters
Democratizing AI scientists using ToolUniverse
New Insights and Algorithms for Optimal Diagonal Preconditioning
S$^3$F-Net: A Multi-Modal Approach to Medical Image Classification via Spatial-Spectral Summarizer Fusion Network
AudioFuse: Unified Spectral-Temporal Learning via a Hybrid ViT-1D CNN Architecture for Robust Phonocardiogram Classification
3DPCNet: Pose Canonicalization for Robust Viewpoint-Invariant 3D Kinematic Analysis from Monocular RGB cameras
Multi-Modal Manipulation via Multi-Modal Policy Consensus
Dynamic Trust Calibration Using Contextual Bandits
Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional
Network-Optimised Spiking Neural Network for Event-Driven Networking
On the Shelf Life of Fine-Tuned LLM Judges: Future Proofing, Backward Compatibility, and Question Generalization
End-to-End Deep Learning for Predicting Metric Space-Valued Outputs
RAVEN: Resilient Aerial Navigation via Open-Set Semantic Memory and Behavior Adaptation
Node Classification via Simplicial Interaction with Augmented Maximal Clique Selection
BridgeDrive: Diffusion Bridge Policy for Closed-Loop Trajectory Planning in Autonomous Driving
Large Language Models and Futures Price Factors in China
Spatially Parallel All-optical Neural Networks
Communication-aware Wide-Area Damping Control using Risk-Constrained Reinforcement Learning
RIV: Recursive Introspection Mask Diffusion Vision Language Model
How LLMs Learn to Reason: A Complex Network Perspective
LightFair: Towards an Efficient Alternative for Fair T2I Diffusion via Debiasing Pre-trained Text Encoders
Focusing on What Matters: Object-Agent-centric Tokenization for Vision Language Action models
Confidence Aware SSD Ensemble with Weighted Boxes Fusion for Weapon Detection
FMC-DETR: Frequency-Decoupled Multi-Domain Coordination for Aerial-View Object Detection
Risk Profiling and Modulation for LLMs
Sparse Deep Additive Model with Interactions: Enhancing Interpretability and Predictability
FedBit: Accelerating Privacy-Preserving Federated Learning via Bit-Interleaved Packing and Cross-Layer Co-Design
How to Make Large Language Models Generate 100% Valid Molecules?
Physics-Informed Inductive Biases for Voltage Prediction in Distribution Grids
GLASS Flows: Transition Sampling for Alignment of Flow and Diffusion Models
TR2-D2: Tree Search Guided Trajectory-Aware Fine-Tuning for Discrete Diffusion
XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning
SIRI: Scaling Iterative Reinforcement Learning with Interleaved Compression
Exploring Large Language Models for Translating Romanian Computational Problems into English
BacPrep: An Experimental Platform for Evaluating LLM-Based Bacalaureat Assessment
Leveraging Generative AI for Enhancing Automated Assessment in Programming Education Contests
A Culturally-Rich Romanian NLP Dataset from "Who Wants to Be a Millionaire?" Videos
VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs
MateInfoUB: A Real-World Benchmark for Testing LLMs in Competitive, Multilingual, and Multimodal Educational Tasks
GRILE: A Benchmark for Grammar Reasoning and Explanation in Romanian LLMs
Stable and Interpretable Jet Physics with IRC-Safe Equivariant Feature Extraction
A Comprehensive Analysis of Churn Prediction in Telecommunications Using Machine Learning
Forecasting West Nile virus with deep graph encoders
A Comparison of Surrogate Constitutive Models for Viscoplastic Creep Simulation of HT-9 Steel
Semantic-Aware Edge Intelligence for UAV Handover in 6G Networks
PISA: An AI Pipeline for Interpretable-by-design Survival Analysis Providing Multiple Complexity-Accuracy Trade-off Models
Profit over Proxies: A Scalable Bayesian Decision Framework for Optimizing Multi-Variant Online Experiments
Learning Hyperspectral Images with Curated Text Prompts for Efficient Multimodal Alignment
Enhancing Cluster Scheduling in HPC: A Continuous Transfer Learning for Real-Time Optimization
Metadata-Guided Adaptable Frequency Scaling across Heterogeneous Applications and Devices
GZSL-MoE: Apprentissage Généralisé Zéro-Shot basé sur le Mélange d'Experts pour la Segmentation Sémantique de Nuages de Points 3D Appliqué à un Jeu de Données d'Environnement de Collaboration Humain-Robot
IBiT: Utilizing Inductive Biases to Create a More Data Efficient Attention Mechanism
LayoutAgent: A Vision-Language Agent Guided Compositional Diffusion for Spatial Layout Planning
A Data-Driven Framework for Digital Transformation in Smart Cities: Integrating AI, Dashboards, and IoT Readiness
Consistency Models as Plug-and-Play Priors for Inverse Problems
Enabling Approximate Joint Sampling in Diffusion LMs
Painless Activation Steering: An Automated, Lightweight Approach for Post-Training Large Language Models
Generalization Analysis for Classification on Korobov Space
Variance-Bounded Evaluation without Ground Truth: VB-Score
Concept activation vectors: a unifying view and adversarial attacks
Identifying Memory Effects in Epidemics via a Fractional SEIRD Model and Physics-Informed Neural Networks
UESA-Net: U-Shaped Embedded Multidirectional Shrinkage Attention Network for Ultrasound Nodule Segmentation
A theoretical guarantee for SyncRank
Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression
What Do They Fix? LLM-Aided Categorization of Security Patches for Critical Memory Bugs
Hilbert: Recursively Building Formal Proofs with Informal Reasoning
Efficient Fine-Grained GPU Performance Modeling for Distributed Deep Learning of LLM
Text-Independent Speaker Identification Using Audio Looping With Margin Based Loss Functions
Learning Temporal Saliency for Time Series Forecasting with Cross-Scale Attention
Learning to Detect Relevant Contexts and Knowledge for Response Selection in Retrieval-based Dialogue Systems
Parameterized Hardness of Zonotope Containment and Neural Network Verification
Patient-specific Biomolecular Instruction Tuning
Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity
HEART: Emotionally-driven test-time scaling of Language Models
Mixtures Closest to a Given Measure: A Semidefinite Programming Approach
Convolutional Set Transformer
A benchmark for vericoding: formally verified program synthesis
TY-RIST: Tactical YOLO Tricks for Real-time Infrared Small Target Detection
Label-Guided Imputation via Forest-Based Proximities for Improved Time Series Classification
Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings
Localized Uncertainty Quantification in Random Forests via Proximities
What Matters More For In-Context Learning under Matched Compute Budgets: Pretraining on Natural Text or Incorporating Targeted Synthetic Examples?
Robot Learning from Any Images
ADAM: A Diverse Archive of Mankind for Evaluating and Enhancing LLMs in Biographical Reasoning
Unsupervised Conformal Inference: Bootstrapping and Alignment to Control LLM Uncertainty
Activation Matching for Explanation Generation
Deep Reinforcement Learning in Action: Real-Time Control of Vortex-Induced Vibrations
Emergent World Representations in OpenVLA
Learning to Solve Optimization Problems Constrained with Partial Differential Equations
SAIP: A Plug-and-Play Scale-adaptive Module in Diffusion-based Inverse Problems
CURA: Size Isn't All You Need - A Compact Universal Architecture for On-Device Intelligence
Evaluating classification performance across operating contexts: A comparison of decision curve analysis and cost curves
OrthAlign: Orthogonal Subspace Decomposition for Non-Interfering Multi-Objective Alignment
Learning Hamiltonian Dynamics at Scale: A Differential-Geometric Approach
Identity Bridge: Enabling Implicit Reasoning via Shared Latent Memory
HyperHELM: Hyperbolic Hierarchy Encoding for mRNA Language Modeling
T-POP: Test-Time Personalization with Online Preference Feedback
FedPOB: Sample-Efficient Federated Prompt Optimization via Bandits
Circuit-Aware Reward Training: A Mechanistic Framework for Longtail Robustness in RLHF
Discrete Variational Autoencoding via Policy Search
Q-Net: Transferable Queue Length Estimation via Kalman-based Neural Networks
Beyond Softmax: A Natural Parameterization for Categorical Random Variables
Who invented deep residual learning?
A TRIANGLE Enables Multimodal Alignment Beyond Cosine Similarity
Robust Policy Expansion for Offline-to-Online RL under Diverse Data Corruption
In-Context Learning of Temporal Point Processes with Foundation Inference Models
Neural Message-Passing on Attention Graphs for Hallucination Detection
MarS-FM: Generative Modeling of Molecular Dynamics via Markov State Models
Quantifying Generalisation in Imitation Learning
Assessing the risk of future Dunkelflaute events for Germany using generative deep learning
Fidel-TS: A High-Fidelity Benchmark for Multimodal Time Series Forecasting
DSAT-HD: Dual-Stream Adaptive Transformer with Hybrid Decomposition for Multivariate Time Series Forecasting
Physics-informed learning under mixing: How physical knowledge speeds up learning
DyMoDreamer: World Modeling with Dynamic Modulation
Putnam-like dataset summary: LLMs as mathematical competition contestants
Cell2Text: Multimodal LLM for Generating Single-Cell Descriptions from RNA-Seq Data
Beyond the Hook: Predicting Billboard Hot 100 Chart Inclusion with Machine Learning from Streaming, Audio Signals, and Perceptual Features
DRIFT-Net: A Spectral-Coupled Neural Operator for PDEs Learning
Uncertainty-Guided Expert-AI Collaboration for Efficient Soil Horizon Annotation
Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime
Adaptive Canonicalization with Application to Invariant Anisotropic Geometric Networks
Towards Understanding the Shape of Representations in Protein Language Models
When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training
Is Sequence Information All You Need for Bayesian Optimization of Antibodies?
OAT-FM: Optimal Acceleration Transport for Improved Flow Matching
Learning Distinguishable Representations in Deep Q-Networks for Linear Transfer
Intra-request branch orchestration for efficient LLM reasoning
Overlap-Adaptive Regularization for Conditional Average Treatment Effect Estimation
Double Descent as a Lens for Sample Efficiency in Autoregressive vs. Discrete Diffusion Models
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
Sampling Complexity of TD and PPO in RKHS
Score-based Membership Inference on Diffusion Models
Uncertainty-Aware Deep Learning for Wildfire Danger Forecasting
MARCOS: Deep Thinking by Markov Chain of Continuous Thoughts
Bayesian Surrogates for Risk-Aware Pre-Assessment of Aging Bridge Portfolios
A multiscale analysis of mean-field transformers in the moderate interaction regime
Efficient Hyperparameter Tuning via Trajectory Invariance Principle
Advantage Weighted Matching: Aligning RL with Pretraining in Diffusion Models
Towards a Certificate of Trust: Task-Aware OOD Detection for Scientific AI
Scaling with Collapse: Efficient and Predictable Training of LLM Families
ORPO-Distill: Mixed-Policy Preference Optimization for Cross-Architecture LLM Distillation
Towards generalizable deep ptychography neural networks
Rethinking Entropy Regularization in Large Reasoning Models
Learning in an Echo Chamber: Online Learning with Replay Adversary
BALF: Budgeted Activation-Aware Low-Rank Factorization for Fine-Tuning-Free Model Compression
High-Dimensional Analysis of Single-Layer Attention for Sparse-Token Classification
Chance-constrained Flow Matching for High-Fidelity Constraint-aware Generation
Does Weak-to-strong Generalization Happen under Spurious Correlations?
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention
Pretraining Scaling Laws for Generative Evaluations of Language Models
GPS-MTM: Capturing Pattern of Normalcy in GPS-Trajectories with self-supervised learning
Optimism as Risk-Seeking in Multi-Agent Reinforcement Learning
Collaborative Device-Cloud LLM Inference through Reinforcement Learning
On The Variability of Concept Activation Vectors
In-Context Compositional Q-Learning for Offline Reinforcement Learning
A Small Math Model: Recasting Strategy Choice Theory in an LLM-Inspired Architecture
AQUAIR: A High-Resolution Indoor Environmental Quality Dataset for Smart Aquaculture Monitoring
A Family of Kernelized Matrix Costs for Multiple-Output Mixture Neural Networks
Demographic-Agnostic Fairness without Harm
PEARL: Peer-Enhanced Adaptive Radio via On-Device LLM
Clebsch-Gordan Transformer: Fast and Global Equivariant Attention
ADAPT: Lightweight, Long-Range Machine Learning Force Fields Without Graphs
GeoFunFlow: Geometric Function Flow Matching for Inverse Operator Learning over Complex Geometries
HyMaTE: A Hybrid Mamba and Transformer Model for EHR Representation Learning
Echo Flow Networks
The Impossibility of Inverse Permutation Learning in Transformer Models
A signal separation view of classification
Evaluation of Machine and Deep Learning Techniques for Cyclone Trajectory Regression and Status Classification by Time Series Data
Stable Forgetting: Bounded Parameter-Efficient Unlearning in LLMs
Multi-Scale Geometric Autoencoder
Model Correlation Detection via Random Selection Probing
FM-FoG: A Real-Time Foundation Model-based Wearable System for Freezing-of-Gait Mitigation
Negative Pre-activations Differentiate Syntax
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
MDD-Thinker: Towards Large Reasoning Models for Major Depressive Disorder Diagnosis
Conda: Column-Normalized Adam for Training Large Language Models Faster
Semantic Editing with Coupled Stochastic Differential Equations
Proposing a Framework for Machine Learning Adoption on Legacy Systems
Accessible, Realistic, and Fair Evaluation of Positive-Unlabeled Learning Algorithms
ChessArena: A Chess Testbed for Evaluating Strategic Reasoning Capabilities of Large Language Models
Graph Foundation Models: Bridging Language Model Paradigms and Graph Optimization
Adversarial Reinforcement Learning Framework for ESP Cheater Simulation
ELASTIQ: EEG-Language Alignment with Semantic Task Instruction and Querying
Asynchronous Policy Gradient Aggregation for Efficient Distributed Reinforcement Learning
A study of Universal ODE approaches to predicting soil organic carbon
Rethinking JEPA: Compute-Efficient Video SSL with Frozen Teachers
AuON: A Linear-time Alternative to Semi-Orthogonal Momentum Updates
H+: An Efficient Similarity-Aware Aggregation for Byzantine Resilient Federated Learning
Towards Generalizable PDE Dynamics Forecasting via Physics-Guided Invariant Learning
Expanding Horizons of Level Diversity via Multi-objective Evolutionary Learning
Watermarking Diffusion Language Models
Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning
AXIS: Explainable Time Series Anomaly Detection with Large Language Models
Muon: Training and Trade-offs with Latent Attention and MoE
ScatterAD: Temporal-Topological Scattering Mechanism for Time Series Anomaly Detection
BiHDTrans: binary hyperdimensional transformer for efficient multivariate time series classification
Semantic Compression via Multimodal Representation Learning
EOE: Evolutionary Optimization of Experts for Training Language Models
Distributionally Robust Federated Learning with Outlier Resilience
Interpretable Kernel Representation Learning at Scale: A Unified Framework Utilizing Nyström Approximation
FS-KAN: Permutation Equivariant Kolmogorov-Arnold Networks via Function Sharing
One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning
Guided Uncertainty Learning Using a Post-Hoc Evidential Meta-Model
LLM DNA: Tracing Model Evolution via Functional Representations
Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models
Trading Carbon for Physics: On the Resource Efficiency of Machine Learning for Spatio-Temporal Forecasting
LEAF: A Robust Expert-Based Framework for Few-Shot Continual Event Detection
Training-Free Multimodal Guidance for Video to Audio Generation
Short window attention enables long-term memorization
EVO-LRP: Evolutionary Optimization of LRP for Interpretable Model Explanations
Sketching Low-Rank Plus Diagonal Matrices
Toward a Holistic Approach to Continual Model Merging
Avoid Catastrophic Forgetting with Rank-1 Fisher from Diffusion Models
Characteristic Root Analysis and Regularization for Linear Time Series Forecasting
GraphIFE: Rethinking Graph Imbalance Node Classification via Invariant Learning
DRIK: Distribution-Robust Inductive Kriging without Information Leakage
PreScope: Unleashing the Power of Prefetching for Resource-Constrained MoE Inference
Virtual Nodes based Heterogeneous Graph Convolutional Neural Network for Efficient Long-Range Information Aggregation
Pure Node Selection for Imbalanced Graph Node Classification
Calibration Meets Reality: Making Machine Learning Predictions Trustworthy
Beyond Greedy Exits: Improved Early Exit Decisions for Risk Control and Reliability
Why Alignment Must Precede Distillation: A Minimal Working Explanation
Multi-Scale Spatial-Temporal Hypergraph Network with Lead-Lag Structures for Stock Time Series Forecasting
Graph Neural Networks with Diversity-aware Neighbor Selection and Dynamic Multi-scale Fusion for Multivariate Time Series Forecasting
Towards a Comprehensive Scaling Law of Mixture-of-Experts
Decentralized Dynamic Cooperation of Personalized Models for Federated Continual Learning
Hedonic Neurons: A Mechanistic Mapping of Latent Coalitions in Transformer MLPs
FedDAPL: Toward Client-Private Generalization in Federated Learning
Merge Now, Regret Later: The Hidden Cost of Model Merging is Adversarial Transferability
Estimating Time Series Foundation Model Transferability via In-Context Learning
Bridging Discrete and Continuous RL: Stable Deterministic Policy Gradient with Martingale Characterization
FraudTransformer: Time-Aware GPT for Transaction Fraud Detection
A Self-Adaptive Frequency Domain Network for Continuous Intraoperative Hypotension Prediction
GBSK: Skeleton Clustering via Granular-ball Computing and Multi-Sampling for Large-Scale Data
Time-Shifted Token Scheduling for Symbolic Music Generation
An Investigation of Batch Normalization in Off-Policy Actor-Critic Algorithms
Anchored Supervised Fine-Tuning
SHAPoint: Task-Agnostic, Efficient, and Interpretable Point-Based Risk Scoring via Shapley Values
Knowledge Homophily in Large Language Models
Trained Mamba Emulates Online Gradient Descent in In-Context Linear Regression
Visual CoT Makes VLMs Smarter but More Fragile
Enhancing LLM Steering through Sparse Autoencoder-Based Vector Refinement
STAIR: Addressing Stage Misalignment through Temporal-Aligned Preference Reinforcement Learning
FedAgentBench: Towards Automating Real-world Federated Medical Image Analysis with Server-Client LLM Agents
Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR
Tequila: Trapping-free Ternary Quantization for Large Language Models
IndexNet: Timestamp and Variable-Aware Modeling for Time Series Forecasting
Test-time GNN Model Evaluation on Dynamic Graphs
Space Group Conditional Flow Matching
Electric Currents for Discrete Data Generation
Bayesian Mixture-of-Experts: Towards Making LLMs Know What They Don't Know
Adversarial Diffusion for Robust Reinforcement Learning
Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation
Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer
Gradient Flow Convergence Guarantee for General Neural Network Architectures
Dynamic Orthogonal Continual Fine-tuning for Mitigating Catastrophic Forgettings
Differentiable Sparsity via $D$-Gating: Simple and Versatile Structured Penalization
Integrated Communication and Control for Energy-Efficient UAV Swarms: A Multi-Agent Reinforcement Learning Approach
Graph Mixing Additive Networks
HiViS: Hiding Visual Tokens from the Drafter for Speculative Decoding in Vision-Language Models
Beyond Benchmarks: Understanding Mixture-of-Experts Models through Internal Mechanisms
Diffusion Models are Kelly Gamblers
Brain-language fusion enables interactive neural readout and in-silico experimentation
Efficient Identification of High Similarity Clusters in Polygon Datasets
Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm
DiBS-MTL: Transformation-Invariant Multitask Learning with Direction Oracles
Evaluating the Robustness of Chinchilla Compute-Optimal Scaling
Detecting and Rectifying Noisy Labels: A Similarity-based Approach
Curriculum-Guided Reinforcement Learning for Synthesizing Gas-Efficient Financial Derivatives Contracts
Guide: Generalized-Prior and Data Encoders for DAG Estimation
Drift-Adapter: A Practical Approach to Near Zero-Downtime Embedding Model Upgrades in Vector Databases
Memory-Efficient Fine-Tuning via Low-Rank Activation Compression
Statistical Learning Guarantees for Group-Invariant Barron Functions
Temporal Generalization: A Reality Check
Revisiting Multivariate Time Series Forecasting with Missing Values
Beyond Outliers: A Study of Optimizers Under Quantization
Disentanglement of Variations with Multimodal Generative Modeling
Fusing Sequence Motifs and Pan-Genomic Features: Antimicrobial Resistance Prediction using an Explainable Lightweight 1D CNN-XGBoost Ensemble
Improving constraint-based discovery with robust propagation and reliable LLM priors
RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility
Impute-MACFM: Imputation based on Mask-Aware Flow Matching
C$^2$GSPG: Confidence-calibrated Group Sequence Policy Gradient towards Self-aware Reasoning
Trust Region Reward Optimization and Proximal Inverse Reward Optimization Algorithm
Beyond Heuristics: Globally Optimal Configuration of Implicit Neural Representations
TimeExpert: Boosting Long Time Series Forecasting with Temporal Mix of Experts
Critique to Verify: Accurate and Honest Test-Time Scaling with RL-Trained Verifiers
CrystalGym: A New Benchmark for Materials Discovery Using Reinforcement Learning
Deep Learning-Based Detection of Cognitive Impairment from Passive Smartphone Sensing with Routine-Aware Augmentation and Demographic Personalization
ProtoTS: Learning Hierarchical Prototypes for Explainable Time Series Forecasting
Dense associative memory on the Bures-Wasserstein space
F-Adapter: Frequency-Adaptive Parameter-Efficient Fine-Tuning in Scientific Machine Learning
ZeroSiam: An Efficient Siamese for Test-Time Entropy Optimization without Collapse
CoSIFL: Collaborative Secure and Incentivized Federated Learning with Differential Privacy
Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization
Towards Monotonic Improvement in In-Context Reinforcement Learning
One-Shot Multi-Label Causal Discovery in High-Dimensional Event Sequences
WirelessMathLM: Teaching Mathematical Reasoning for LLMs in Wireless Communications with Reinforcement Learning
SPEC-RL: Accelerating On-Policy Reinforcement Learning via Speculative Rollouts
More Data or Better Algorithms: Latent Diffusion Augmentation for Deep Imbalanced Regression
Adaptive Token-Weighted Differential Privacy for LLMs: Not All Tokens Require Equal Protection
Deep Learning for Subspace Regression
NanoFlux: Adversarial Dual-LLM Evaluation and Distillation For Multi-Domain Reasoning
ABConformer: Physics-inspired Sliding Attention for Antibody-Antigen Interface Prediction
CREPE: Controlling Diffusion with Replica Exchange
Transfer Learning and Machine Learning for Training Five Year Survival Prognostic Models in Early Breast Cancer
Continuous-Time Reinforcement Learning for Asset-Liability Management
A Neural ODE Approach to Aircraft Flight Dynamics Modelling
ASTGI: Adaptive Spatio-Temporal Graph Interactions for Irregular Multivariate Time Series Forecasting
Two-Scale Latent Dynamics for Recurrent-Depth Transformers
MELCOT: A Hybrid Learning Architecture with Marginal Preservation for Matrix-Valued Regression
LLM Interpretability with Identifiable Temporal-Instantaneous Representation
Robust Fine-Tuning from Non-Robust Pretrained Models: Mitigating Suboptimal Transfer With Adversarial Scheduling
Entering the Era of Discrete Diffusion Models: A Benchmark for Schrödinger Bridges and Entropic Optimal Transport
Landing with the Score: Riemannian Optimization through Denoising
Emergence of Superposition: Unveiling the Training Dynamics of Chain of Continuous Thought
Splines-Based Feature Importance in Kolmogorov-Arnold Networks: A Framework for Supervised Tabular Data Dimensionality Reduction
Graph Your Own Prompt
Planner Aware Path Learning in Diffusion Language Models Training
Mind the Links: Cross-Layer Attention for Link Prediction in Multiplex Networks
PATCH: Learnable Tile-level Hybrid Sparsity for LLMs
URS: A Unified Neural Routing Solver for Cross-Problem Zero-Shot Generalization
LOTFormer: Doubly-Stochastic Linear Attention via Low-Rank Optimal Transport
Better Hessians Matter: Studying the Impact of Curvature Approximations in Influence Functions
Factor Decorrelation Enhanced Data Removal from Deep Predictive Models
PHASE: Physics-Integrated, Heterogeneity-Aware Surrogates for Scientific Simulations
Data-Efficient Training by Evolved Sampling
Generative Evolutionary Meta-Solver (GEMS): Scalable Surrogate-Free Multi-Agent Learning
Solve Smart, Not Often: Policy Learning for Costly MILP Re-solving
Localizing Adversarial Attacks To Produce More Imperceptible Noise
In-Context Learning can Perform Continual Learning Like Humans
Communication-Efficient and Interoperable Distributed Learning
On the Capacity of Self-Attention
Boundary on the Table: Efficient Black-Box Decision-Based Attacks for Structured Data
Adaptive Margin RLHF via Preference over Preferences
Observation-Free Attacks on Online Learning to Rank
Neighborhood Sampling Does Not Learn the Same Graph Neural Network
From Noise to Knowledge: A Comparative Study of Acoustic Anomaly Detection Models in Pumped-storage Hydropower Plants
FedCF: Fair Federated Conformal Prediction
Guided Manifold Alignment with Geometry-Regularized Twin Autoencoders
Rethinking Large Language Model Distillation: A Constrained Markov Decision Process Perspective
MonoCon: A general framework for learning ultra-compact high-fidelity representations using monotonicity constraints
Compute-Optimal Quantization-Aware Training
Understanding SOAP from the Perspective of Gradient Whitening
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights
Meta-Learning Fourier Neural Operators for Hessian Inversion and Enhanced Variational Data Assimilation
GDR-learners: Orthogonal Learning of Generative Models for Potential Outcomes
Doubly-Robust LLM-as-a-Judge: Externally Valid Estimation with Imperfect Personas
Reinforcement Learning with Discrete Diffusion Policies for Combinatorial Action Spaces
Functional Critic Modeling for Provably Convergent Off-Policy Actor-Critic
Shape-Informed Clustering of Multi-Dimensional Functional Data via Deep Functional Autoencoders
OptiMind: Teaching LLMs to Think Like Optimization Experts
MDP modeling for multi-stage stochastic programs
T-TAMER: Provably Taming Trade-offs in ML Serving
Analysis of Variational Autoencoders
Sample-efficient Multiclass Calibration under $\ell_{p}$ Error
Physically Plausible Multi-System Trajectory Generation and Symmetry Discovery
MoE-PHDS: One MoE checkpoint for flexible runtime sparsity
On the Sheafification of Higher-Order Message Passing
Tracing the Representation Geometry of Language Models from Pretraining to Post-training
Understanding Catastrophic Interference On the Identifiability of Latent Representations
DPFNAS: Differential Privacy-Enhanced Federated Neural Architecture Search for 6G Edge Intelligence
GuardNet: Graph-Attention Filtering for Jailbreak Defense in Large Language Models
IsingFormer: Augmenting Parallel Tempering With Learned Proposals
Beyond Aggregation: Guiding Clients in Heterogeneous Federated Learning
Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding
Dynamics of Learning: Generative Schedules from Latent ODEs
Beyond Model Ranking: Predictability-Aligned Evaluation for Time Series Forecasting
CLAD-Net: Continual Activity Recognition in Multi-Sensor Wearable Systems
Signal Preserving Weight Initialization for Odd-Sigmoid Activations
Unleashing Flow Policies with Distributional Critics
Demystifying Network Foundation Models
Sensitivity Analysis for Diffusion Models
Causally-Enhanced Reinforcement Policy Optimization
Towards Quantum-Ready Blockchain Fraud Detection via Ensemble Graph Neural Networks
Effective Quantization of Muon Optimizer States

Research Sources: 2736 | Generated: 9/30/2025