AI RESEARCH PAPERS & ACADEMIC SOURCES
- Responsible AI Technical Report
- Embedding Domain Knowledge for Large Language Models via Reinforcement Learning from Augmented Generation
- Beyond Sharp Minima: Robust LLM Unlearning via Feedback-Guided Multi-Point Optimization
- EmbeddingGemma: Powerful and Lightweight Text Representations
- i-LAVA: Insights on Low Latency Voice-2-Voice Architecture for Agents
- SiNGER: A Clearer Voice Distills Vision Transformers Further
- Can Less Precise Be More Reliable? A Systematic Evaluation of Quantization's Impact on CLIP Beyond Accuracy
- Towards Foundation Models for Zero-Shot Time Series Anomaly Detection: Leveraging Synthetic Data and Relative Context Discrepancy
- Evaluating the Effectiveness of Transformer Layers in Wav2Vec 2.0, XLS-R, and Whisper for Speaker Identification Tasks
- TReF-6: Inferring Task-Relevant Frames from a Single Demonstration for One-Shot Skill Generalization
- Multimodal Iterative RAG for Knowledge-Intensive Visual Question Answering
- Can General-Purpose Omnimodels Compete with Specialists? A Case Study in Medical Image Segmentation
- GradES: Significantly Faster Training in Transformers with Gradient-Based Early Stopping
- Do LLMs Adhere to Label Definitions? Examining Their Receptivity to External Label Definitions
- Co-Evolving Complexity: An Adversarial Framework for Automatic MARL Curricula
- Diffusion Generative Models Meet Compressed Sensing, with Applications to Imaging and Finance
- The Physical Basis of Prediction: World Model Formation in Neural Organoids via an LLM-Generated Curriculum
- BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models
- COMPACT: Common-token Optimized Model Pruning Across Channels and Tokens
- DEPFusion: Dual-Domain Enhancement and Priority-Guided Mamba Fusion for UAV Multispectral Object Detection
- A Systematic Survey on Large Language Models for Evolutionary Optimization: From Modeling to Solving
- TalkPlayData 2: An Agentic Synthetic Data Pipeline for Multimodal Conversational Music Recommendation
- FuseCodec: Semantic-Contextual Fusion and Supervision for Neural Codecs
- MindVL: Towards Efficient and Effective Training of Multimodal Large Language Models on Ascend NPUs
- Learning to Route: Per-Sample Adaptive Routing for Multimodal Multitask Prediction
- Evaluating undergraduate mathematics examinations in the era of generative AI: a curriculum-level case study
- TreeIRL: Safe Urban Driving with Tree Search and Inverse Reinforcement Learning
- WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance
- Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning
- Accurate and Efficient Low-Rank Model Merging in Core Space
- StefaLand: An Efficient Geoscience Foundation Model That Improves Dynamic Land-Surface Predictions
- Joint Memory Frequency and Computing Frequency Scaling for Energy-efficient DNN Inference
- Reinforced Generation of Combinatorial Structures: Applications to Complexity Theory
- Self-Evolving LLMs via Continual Instruction Tuning
- APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation
- SPiDR: A Simple Approach for Zero-Shot Safety in Sim-to-Real Transfer
- Diversity Boosts AI-Generated Text Detection
- Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs
- PRIME: Large Language Model Personalization with Cognitive Dual-Memory and Personalized Thought Process
- CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering
- Entropy-Memorization Law: Evaluating Memorization Difficulty of Data in LLMs
- Mitigating Watermark Forgery in Generative Models via Randomized Key Selection
- Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition
- BenchRL-QAS: Benchmarking reinforcement learning algorithms for quantum architecture search
- Vidar: Embodied Video Diffusion Model for Generalist Manipulation
- Making Language Model a Hierarchical Classifier
- Learning to summarize user information for personalized reinforcement learning from human feedback
- GRID: Scalable Task-Agnostic Prompt-Based Continual Learning for Language Models
- Omni-Thinker: Scaling Multi-Task RL in LLMs with Hybrid Reward and Task Scheduling
- The Ever-Evolving Science Exam
- When Engineering Outruns Intelligence: Rethinking Instruction-Guided Navigation
- Can Language Models Discover Scaling Laws?
- Communicating Plans, Not Percepts: Scalable Multi-Agent Coordination with Embodied World Models
- The Geometry of Cortical Computation: Manifold Disentanglement and Predictive Dynamics in VCNet
- Sculptor: Empowering LLMs with Cognitive Agency via Active Context Management
- AttriLens-Mol: Attribute Guided Reinforcement Learning for Molecular Property Prediction with Large Language Models
- PakBBQ: A Culturally Adapted Bias Benchmark for QA
- BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation
- ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signals
- CORE-RAG: Lossless Compression for Retrieval-Augmented LLMs via Reinforcement Learning
- Automatic Question & Answer Generation Using Generative Large Language Model (LLM)
- End-to-End On-Device Quantization-Aware Training for LLMs at Inference Cost
- Improving LLM Reasoning through Interpretable Role-Playing Steering
- InverseScope: Scalable Activation Inversion for Interpreting Large Language Models
- TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization
- Towards Robust Real-World Multivariate Time Series Forecasting: A Unified Framework for Dependency, Asynchrony, and Missingness
- When Is Diversity Rewarded in Cooperative Multi-Agent Learning?
- Revisiting Visual Understanding in Multimodal Reasoning through a Lens of Image Perturbation
- Beyond Jailbreaking: Auditing Contextual Privacy in LLM Agents
- Discrete Audio Tokens: More Than a Survey!
- Enhancing Delta Compression in LLMs via SVD-based Quantization Error Minimization
- Meta Pruning via Graph Metanetworks: A Universal Meta Learning Framework for Network Pruning
- StorySage: Conversational Autobiography Writing Powered by a Multi-Agent Framework
- Decoupled Classifier-Free Guidance for Counterfactual Diffusion Models
- Long-Context Generalization with Sparse Attention
- Do We Need Large VLMs for Spotting Soccer Actions?
- Adaptive Sample Scheduling for Direct Preference Optimization
- From Drawings to Decisions: A Hybrid Vision-Language Framework for Parsing 2D Engineering Drawings into Structured Manufacturing Knowledge
- GRAF: Multi-turn Jailbreaking via Global Refinement and Active Fabrication
- Improving Black-Box Generative Attacks via Generator Semantic Consistency
- OmniGen2: Exploration to Advanced Multimodal Generation
- Reasoning Isn't Enough: Examining Truth-Bias and Sycophancy in LLMs
- R1-Ranker: Teaching LLM Rankers to Reason
- Enhancing Live Broadcast Engagement: A Multi-modal Approach to Short Video Recommendations Using MMGCN and User Preferences
- Semantic-guided Diverse Decoding for Large Language Model
- Data Uniformity Improves Training Efficiency and More, with a Convergence Framework Beyond the NTK Regime
- Theoretical Modeling of LLM Self-Improvement Training Dynamics Through Solver-Verifier Gap
- Learning to Segment for Vehicle Routing Problems
- Empirical Analysis Of Heuristic and Approximation Algorithms for the Mutual-Visibility Problem
- Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer
- Communication-Efficient Desire Alignment for Embodied Agent-Human Adaptation
- CryoCCD: Conditional Cycle-consistent Diffusion with Biophysical Modeling for Cryo-EM Synthesis
- ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration
- Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs
- ProxyThinker: Test-Time Guidance through Small Visual Reasoners
- TRAPDOC: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents
- WorldGym: World Model as An Environment for Policy Evaluation
- GRAM: Spatial general-purpose audio representation models for real-world applications
- Angles Don't Lie: Unlocking Training-Efficient RL Through the Model's Own Signals
- Interaction Field Matching: Overcoming Limitations of Electrostatic Models
- VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
- Bridging the Performance Gap Between Target-Free and Target-Based Reinforcement Learning
- Towards Better Generalization via Distributional Input Projection Network
- Reshaping Reasoning in LLMs: A Theoretical Analysis of RL Training Dynamics through Pattern Selection
- TreeRPO: Tree Relative Policy Optimization
- ReactDance: Hierarchical Representation for High-Fidelity and Coherent Long-Form Reactive Dance Generation
- OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit
- SALM: A Multi-Agent Framework for Language Model-Driven Social Network Simulation
- How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference
- Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps
- Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
- Visual Planning: Let's Think Only with Images
- Stepwise Guided Policy Optimization: Coloring your Incorrect Reasoning in GRPO
- Continuous Optimization for Feature Selection with Permutation-Invariant Embedding and Policy-Guided Search
- Fine-grained Contrastive Learning for ECG-Report Alignment with Waveform Enhancement
- AdaBoN: Adaptive Best-of-N Alignment
- VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption
- MobileIPL: Enhancing Mobile Agents Thinking Process via Iterative Preference Learning
- Is Active Persona Inference Necessary for Aligning Small Models to Personal Preferences?
- Mechanistic Fine-tuning for In-context Learning
- Language Models Optimized to Fool Detectors Still Have a Distinct Style (And How to Change It)
- PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration
- ReflAct: World-Grounded Decision Making in LLM Agents via Goal-State Reflection
- Scaling Diffusion Transformers Efficiently via $\mu$P
- Better Safe Than Sorry? Overreaction Problem of Vision Language Models in Visual Emergency Recognition
- Advancing Marine Research: UWSAM Framework and UIIS10K Dataset for Precise Underwater Instance Segmentation
- Leveraging Online Data to Enhance Medical Knowledge in a Small Persian Language Model
- Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models
- Scalable Graph Generative Modeling via Substructure Sequences
- AdaSTaR: Adaptive Data Sampling for Training Self-Taught Reasoners
- BitHydra: Towards Bit-flip Inference Cost Attack against Large Language Models
- Runtime Adaptive Pruning for LLM Inference
- InfoDet: A Dataset for Infographic Element Detection
- On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning
- Reward Model Overoptimisation in Iterated RLHF
- Improved Sample Complexity For Diffusion Model Training Without Empirical Risk Minimizer Access
- ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation
- LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning
- EnvSDD: Benchmarking Environmental Sound Deepfake Detection
- A Necessary Step toward Faithfulness: Measuring and Improving Consistency in Free-Text Explanations
- ePC: Overcoming Exponential Signal Decay in Deep Predictive Coding Networks
- Variational Deep Learning via Implicit Regularization
- GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scenes
- PDFBench: A Benchmark for De novo Protein Design from Function
- Any-to-Bokeh: Arbitrary-Subject Video Refocusing with Video Diffusion Model
- Cross-modal RAG: Sub-dimensional Text-to-Image Retrieval-Augmented Generation
- Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
- Advanced Architectures Integrated with Agentic AI for Next-Generation Wireless Networks
- A Statistical Learning Perspective on Semi-dual Adversarial Neural Optimal Transport Solvers
- LEAD: Large Foundation Model for EEG-Based Alzheimer's Disease Detection
- 3D Foundation Model for Generalizable Disease Detection in Head Computed Tomography
- Towards Large-Scale In-Context Reinforcement Learning by Meta-Training in Randomized Worlds
- UltraIF: Advancing Instruction Following from the Wild
- Confidence Improves Self-Consistency in LLMs
- OrderFusion: Encoding Orderbook for End-to-End Probabilistic Intraday Electricity Price Forecasting
- Comprehensive Review of Neural Differential Equations for Time Series Analysis
- Collaborative Deterministic-Probabilistic Forecasting for Diverse Spatiotemporal Systems
- PAFT: Prompt-Agnostic Fine-Tuning
- B-cos LM: Efficiently Transforming Pre-trained Language Models for Improved Explainability
- Mitigating Barren Plateaus in Quantum Neural Networks via an AI-Driven Submartingale-Based Framework
- MemeIntel: Explainable Detection of Propagandistic and Hateful Memes
- Mixing Any Cocktail with Limited Ingredients: On the Structure of Payoff Sets in Multi-Objective POMDPs and its Impact on Randomised Strategies
- SRA-CL: Semantic Retrieval Augmented Contrastive Learning for Sequential Recommendation
- Delta-Triplane Transformers as Occupancy World Models
- UniF$^2$ace: A Unified Fine-grained Face Understanding and Generation Model
- How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation
- Implicit Bias-Like Patterns in Reasoning Models
- RISE: Robust Imitation through Stochastic Encoding
- What Makes a Reward Model a Good Teacher? An Optimization Perspective
- On The Sample Complexity Bounds In Bilevel Reinforcement Learning
- Reasoning to Learn from Latent Thoughts
- Machine Learning-Driven Materials Discovery: Unlocking Next-Generation Functional Materials - A review
- AdaRank: Adaptive Rank Pruning for Enhanced Model Merging
- SUV: Scalable Large Language Model Copyright Compliance with Regularized Selective Unlearning
- XL-Suite: Cross-Lingual Synthetic Training and Evaluation Data for Open-Ended Generation
- Beyond Synthetic Replays: Turning Diffusion Features into Few-Shot Class-Incremental Learning Knowledge
- SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching
- SCRAMBLe: Enhancing Multimodal LLM Compositionality with Synthetic Preference Data
- From Specificity to Generality: Revisiting Generalizable Artifacts in Detecting Face Deepfakes
- Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning?
- Min-Max Optimisation for Nonconvex-Nonconcave Functions Using a Random Zeroth-Order Extragradient Algorithm
- Efficient Reasoning Models: A Survey
- IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property
- Dynamic Early Exit in Reasoning Models
- Evolution Meets Diffusion: Efficient Neural Architecture Generation
- Cooking Up Creativity: Enhancing LLM Creativity through Structured Recombination
- $\textit{New News}$: System-2 Fine-tuning for Robust Integration of New Knowledge
- Beyond Losses Reweighting: Empowering Multi-Task Learning via the Generalization Perspective
- TRIPS: Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection
- Continual Dialogue State Tracking via Example-Guided Question Answering
- A Double Machine Learning Approach to Combining Experimental and Observational Data
- Symbolic Imitation Learning: From Black-Box to Explainable Driving Policies
- Occasionally Secure: A Comparative Analysis of Code Generation Assistants
- BlockFUL: Enabling Unlearning in Blockchained Federated Learning
- Federated Learning Resilient to Byzantine Attacks and Data Heterogeneity
- FusionDTI: Fine-grained Binding Discovery with Token-level Fusion for Drug-Target Interaction
- Multi-Head RAG: Solving Multi-Aspect Problems with LLMs
- SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
- A Comprehensive Graph Pooling Benchmark: Effectiveness, Robustness and Generalizability
- Position: Towards Bidirectional Human-AI Alignment
- Understanding Transformer Architecture through Continuous Dynamics: A Partial Differential Equation Perspective
- Robot Navigation with Entity-Based Collision Avoidance using Deep Reinforcement Learning
- LLM-3D Print: Large Language Models To Monitor and Control 3D Printing
- A GREAT Architecture for Edge-Based Graph Problems Like TSP
- Parse Trees Guided LLM Prompt Compression
- Distributed AI Platform for the 6G RAN
- Disentangling Regional Primitives for Image Generation
- Extracting Moore Machines from Transformers using Queries and Counterexamples
- Deeper Insights into Deep Graph Convolutional Networks: Stability and Generalization
- NextLocLLM: Location Semantics Modeling and Coordinate-Based Next Location Prediction with LLMs
- Gradient-Free Training of Quantized Neural Networks
- DM-Codec: Distilling Multimodal Representations for Speech Tokenization
- Self-Normalized Resets for Plasticity in Continual Learning
- PACER: Physics Informed Uncertainty Aware Climate Emulator
- When Speculation Spills Secrets: Side Channels via Speculative Decoding In LLMs
- UniTraj: Learning a Universal Trajectory Foundation Model from Billion-Scale Worldwide Traces
- UniHR: Hierarchical Representation Learning for Unified Knowledge Graph Link Prediction
- Integrating Various Software Artifacts for Better LLM-based Bug Localization and Program Repair
- Adapting Chat Language Models Using Only Target Unlabeled Language Data
- Order Matters! An Empirical Study on Large Language Models' Input Order Bias in Software Fault Localization
- Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval
- A Partition Cover Approach to Tokenization
- CGI: Identifying Conditional Generative Models with Example Images
- Mind the Value-Action Gap: Do LLMs Act in Alignment with Their Values?
- FuzzyLight: A Robust Two-Stage Fuzzy Approach for Traffic Signal Control Works in Real Cities
- Principal Components for Neural Network Initialization
- Beyond checkmate: exploring the creative chokepoints in AI text
- Vintix: Action Model via In-Context Reinforcement Learning
- Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning
- Do Larger Language Models Generalize Better? A Scaling Law for Implicit Reasoning at Pretraining Time
- Signal in the Noise: Polysemantic Interference Transfers and Predicts Cross-Model Influence
- FRABench and UFEval: Unified Fine-grained Evaluation with Task and Aspect Generalization
- SPhyR: Spatial-Physical Reasoning Benchmark on Material Distribution
- No Black Boxes: Interpretable and Interactable Predictive Healthcare with Knowledge-Enhanced Agentic Causal Discovery
- Fuzzy Information Evolution with Three-Way Decision in Social Network Group Decision-Making
- DEL-ToM: Inference-Time Scaling for Theory-of-Mind Reasoning via Dynamic Epistemic Logic
- TabularGSM: Understanding the Limitations of LLMs in Tabular Math Reasoning
- HS-STaR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation
- MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents
- Truly Assessing Fluid Intelligence of Large Language Models through Dynamic Reasoning Evaluation
- Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games
- VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs
- One Patient, Many Contexts: Scaling Medical AI with Contextual Intelligence
- Efficient LLM Collaboration via Planning
- Tiered Agentic Oversight: A Hierarchical Multi-Agent System for Healthcare Safety
- The 4th Dimension for Scaling Model Size
- Breaking Rank Bottlenecks in Knowledge Graph Embeddings
- Bridging Ethical Principles and Algorithmic Methods: An Alternative Approach for Assessing Trustworthiness in AI Systems
- GTA1: GUI Test-time Scaling Agent
- Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning
- DuetGraph: Coarse-to-Fine Knowledge Graph Reasoning with Dual-Pathway Global-Local Fusion
- Hierarchical Task Environments as the Next Frontier for Embodied World Models in Robot Soccer
- Neuromorphic Intelligence
- Imagined Autocurricula
- Memory-QA: Answering Recall Questions Based on Multimodal Memories
- LogReasoner: Empowering LLMs with Expert-like Coarse-to-Fine Reasoning for Automated Log Analysis
- The Thinking Spectrum: An Empirical Study of Tunable Reasoning in LLMs through Model Merging
- WordAlchemy: A transformer-based Reverse Dictionary
- Scaling Generalist Data-Analytic Agents
- jina-reranker-v3: Last but Not Late Interaction for Document Reranking
- Scaling with Collapse: Efficient and Predictable Training of LLM Families
- ORPO-Distill: Mixed-Policy Preference Optimization for Cross-Architecture LLM Distillation
- Towards Personalized Deep Research: Benchmarks and Evaluations
- Score Distillation of Flow Matching Models
- MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech
- Rethinking Entropy Regularization in Large Reasoning Models
- Paired by the Teacher: Turning Unpaired Data into High-Fidelity Pairs for Low-Resource Text Generation
- Fast Feature Field ($\text{F}^3$): A Predictive Representation of Events
- Pretraining Large Language Models with NVFP4
- Chance-constrained Flow Matching for High-Fidelity Constraint-aware Generation
- GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts
- GLASS Flows: Transition Sampling for Alignment of Flow and Diffusion Models
- XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning
- EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering
- GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs
- NAIPv2: Debiased Pairwise Learning for Efficient Paper Quality Estimation
- DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space
- DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder
- Incentive-Aligned Multi-Source LLM Summaries
- Guided Diffusion for the Discovery of New Superconductors
- InfoAgent: Advancing Autonomous Information-Seeking Agents
- Query2Triple: Unified Query Encoding for Answering Diverse Complex Queries over Knowledge Graphs
- Taking control: Policies to address extinction risks from advanced AI
- Understanding the Effects of Miscalibrated AI Confidence on User Trust, Reliance, and Decision Efficacy
- Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration
- VCSearch: Bridging the Gap Between Well-Defined and Ill-Defined Problems in Mathematical Reasoning
- Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation
- A Voter-Based Stochastic Rejection-Method Framework for Asymptotically Safe Language Model Outputs
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation
- Neuro-Symbolic Entity Alignment via Variational Inference
- A Neurosymbolic Fast and Slow Architecture for Graph Coloring
- From An LLM Swarm To A PDDL-Empowered HIVE: Planning Self-Executed Instructions In A Multi-Modal Jungle
- GUI Agents: A Survey
- Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration
- Broadening Ontologization Design: Embracing Data Pipeline Strategies
- Enabling AI Scientists to Recognize Innovation: A Domain-Agnostic Algorithm for Assessing Novelty
- Visualizing Thought: Conceptual Diagrams Enable Robust Planning in LMMs
- Of-SemWat: High-payload text embedding for semantic watermarking of AI-generated images with arbitrary size
- Putnam-like dataset summary: LLMs as mathematical competition contestants
- Evaluating SAP Joule for Code Generation
- SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching
- Hierarchical Error Correction for Large Language Models: A Systematic Framework for Domain-Specific AI Quality Enhancement
- Metaphor identification using large language models: A comparison of RAG, prompt engineering, and fine-tuning
- Retro*: Optimizing LLMs for Reasoning-Intensive Document Retrieval
- Uncertainty-Guided Expert-AI Collaboration for Efficient Soil Horizon Annotation
- Vehicle Classification under Extreme Imbalance: A Comparative Study of Ensemble Learning and CNNs
- Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime
- OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing
- Segmentor-Guided Counterfactual Fine-Tuning for Image Synthesis
- When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training
- Scalable GANs with Transformers
- MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes
- Learning Distinguishable Representations in Deep Q-Networks for Linear Transfer
- MSG: Multi-Stream Generative Policies for Sample-Efficient Robotic Manipulation
- SecInfer: Preventing Prompt Injection via Inference-time Scaling
- Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
- Light-SQ: Structure-aware Shape Abstraction with Superquadrics for Generated Meshes
- Generalized Correctness Models: Learning Calibrated and Model-Agnostic Correctness Predictors from Historical Patterns
- CLASP: Adaptive Spectral Clustering for Unsupervised Per-Image Segmentation
- AIRoA MoMa Dataset: A Large-Scale Hierarchical Dataset for Mobile Manipulation
- Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct
- Fast Real-Time Pipeline for Robust Arm Gesture Recognition
- Large Language Models for Software Testing: A Research Roadmap
- Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures
- Optimizing Privacy-Preserving Primitives to Support LLM-Scale Applications
- BRIDGE - Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation
- UniLat3D: Geometry-Appearance Unified Latents for Single-Stage 3D Generation
- CoTune: Co-evolutionary Configuration Tuning
- SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
- T-POP: Test-Time Personalization with Online Preference Feedback
- FedPOB: Sample-Efficient Federated Prompt Optimization via Bandits
- Circuit-Aware Reward Training: A Mechanistic Framework for Longtail Robustness in RLHF
- Discrete Variational Autoencoding via Policy Search
- Q-Net: Transferable Queue Length Estimation via Kalman-based Neural Networks
- A TRIANGLE Enables Multimodal Alignment Beyond Cosine Similarity
- Robust Policy Expansion for Offline-to-Online RL under Diverse Data Corruption
- Surjective Independence of Causal Influences for Local Bayesian Network Structures
- VSSFlow: Unifying Video-conditioned Sound and Speech Generation via Joint Learning
- VTPerception-R1: Enhancing Multimodal Reasoning via Explicit Visual and Textual Perceptual Grounding
- Quantifying Generalisation in Imitation Learning
- Sparse Autoencoders Make Audio Foundation Models more Explainable
- Fidelity-Aware Data Composition for Robust Robot Generalization
- Causal-Adapter: Taming Text-to-Image Diffusion for Faithful Counterfactual Generation
- DSAT-HD: Dual-Stream Adaptive Transformer with Hybrid Decomposition for Multivariate Time Series Forecasting
- RDD: Pareto Analysis of the Rate-Distortion-Distinguishability Trade-off
- Intelligent Optimization of Wireless Access Point Deployment for Communication-Based Train Control Systems Using Deep Reinforcement Learning
- CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map Models
- CORE-3D: Context-aware Open-vocabulary Retrieval by Embeddings in 3D
- Short window attention enables long-term memorization
- Deep Reinforcement Learning in Action: Real-Time Control of Vortex-Induced Vibrations
- AdaThink-Med: Medical Adaptive Thinking with Uncertainty-Guided Length Calibration
- Bandits roaming Hilbert space
- PoseDiff: A Unified Diffusion Model Bridging Robot Pose Estimation and Video-to-Action Control
- Algorithms and data structures for automatic precision estimation of neural networks
- Can you SPLICE it together? A Human Curated Benchmark for Probing Visual Reasoning in VLMs
- Identity Bridge: Enabling Implicit Reasoning via Shared Latent Memory
- VNODE: A Piecewise Continuous Volterra Neural Network
- Community detection robustness of graph neural networks
- InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation
- Understanding the Dilemma of Unlearning for Large Language Models
- Reference-Free Rating of LLM Responses via Latent Information
- Data-Driven Discrete Geofence Design Using Binary Quadratic Programming
- LAMP-PRo: Label-aware Attention for Multi-label Prediction of DNA- and RNA-binding Proteins using Protein Language Models
- Cycle Diffusion Model for Counterfactual Image Generation
- Adversarial Reinforcement Learning Framework for ESP Cheater Simulation
- SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents
- Let LLMs Speak Embedding Languages: Generative Text Embeddings via Iterative Contrastive Refinement
- DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models
- Q-Mirror: Unlocking the Multi-Modal Potential of Scientific Text-Only QA Pairs
- Bridging the behavior-neural gap: A multimodal AI reveals the brain's geometry of emotion more accurately than human self-reports
- A study of Universal ODE approaches to predicting soil organic carbon
- Dual Mechanisms of Value Expression: Intrinsic vs. Prompted Values in LLMs
- TraitSpaces: Towards Interpretable Visual Creativity for Human-AI Co-Creation
- Towards Generalizable PDE Dynamics Forecasting via Physics-Guided Invariant Learning
- Dynamic Orchestration of Multi-Agent System for Real-World Multi-Image Agricultural VQA
- Beyond Repetition: Text Simplification and Curriculum Learning for Data-Constrained Pretraining
- An Enhanced Pyramid Feature Network Based on Long-Range Dependencies for Multi-Organ Medical Image Segmentation
- UI-UG: A Unified MLLM for UI Understanding and Generation
- Uni-X: Mitigating Modality Conflict with a Two-End-Separated Architecture for Unified Multimodal Models
- Watermarking Diffusion Language Models
- From Satellite to Street: A Hybrid Framework Integrating Stable Diffusion and PanoGAN for Consistent Cross-View Synthesis
- Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning
- REALIGN: Regularized Procedure Alignment with Matching Video Embeddings via Partial Gromov-Wasserstein Optimal Transport
- HarmMetric Eval: Benchmarking Metrics and Judges for LLM Harmfulness Assessment
- Vid-LLM: A Compact Video-based 3D Multimodal LLM with Reconstruction-Reasoning Synergy
- LLaDA-MoE: A Sparse MoE Diffusion Language Model
- The 2025 OpenAI Preparedness Framework does not guarantee any AI risk mitigation practices: a proof-of-concept for affordance analyses of AI safety policies
- Multilingual Text-to-SQL: Benchmarking the Limits of Language Models with Collaborative Language Agents
- Hybrid Layer-Wise ANN-SNN With Surrogate Spike Encoding-Decoding Structure
- ScatterAD: Temporal-Topological Scattering Mechanism for Time Series Anomaly Detection
- CLQ: Cross-Layer Guided Orthogonal-based Quantization for Diffusion Transformers
- A Data-Centric Perspective on the Influence of Image Data Quality in Machine Learning Models
- Multi-Item-Query Attention for Stable Sequential Recommendation
- Alternatives To Next Token Prediction In Text Generation - A Survey
- EOE: Evolutionary Optimization of Experts for Training Language Models
- An Agent-Based Framework for Automated Higher-Voice Harmony Generation
- Moravec's Paradox and Restrepo's Model: Limits of AGI Automation in Growth
- LaMoGen: Laban Movement-Guided Diffusion for Text-to-Motion Generation
- Euclid's Gift: Enhancing Spatial Perception and Reasoning in Vision-Language Models via Geometric Surrogate Tasks
- Mitigating Visual Hallucinations via Semantic Curriculum Preference Optimization in MLLMs
- LLM DNA: Tracing Model Evolution via Functional Representations
- Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models
- Agentic Specification Generator for Move Programs
- PhysiAgent: An Embodied Agent Framework in Physical World
- SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention
- FrameMind: Frame-Interleaved Chain-of-Thought for Video Reasoning via Reinforcement Learning
- From Edge to HPC: Investigating Cross-Facility Data Streaming Architectures
- GPS-MTM: Capturing Pattern of Normalcy in GPS-Trajectories with self-supervised learning
- End-to-end Topographic Auditory Models Replicate Signatures of Human Auditory Cortex
- PartnerMAS: An LLM Hierarchical Multi-Agent Framework for Business Partner Selection on High-Dimensional Features
- A Second-Order Perspective on Pruning at Initialization and Knowledge Transfer
- In-Context Compositional Q-Learning for Offline Reinforcement Learning
- A Small Math Model: Recasting Strategy Choice Theory in an LLM-Inspired Architecture
- AQUAIR: A High-Resolution Indoor Environmental Quality Dataset for Smart Aquaculture Monitoring
- Uncovering Grounding IDs: How External Cues Shape Multi-Modal Binding
- PEARL: Peer-Enhanced Adaptive Radio via On-Device LLM
- Large-Scale Constraint Generation - Can LLMs Parse Hundreds of Constraints?
- PerfBench: Can Agents Resolve Real-World Performance Bugs?
- GEAR: A General Evaluation Framework for Abductive Reasoning
- Ancestry Tree Clustering for Particle Filter Diversity Maintenance
- The Impossibility of Inverse Permutation Learning in Transformer Models
- BOSfM: A View Planning Framework for Optimal 3D Reconstruction of Agricultural Scenes
- ASTROCO: Self-Supervised Conformer-Style Transformers for Light-Curve Embeddings
- EYE-DEX: Eye Disease Detection and EXplanation System
- Your thoughts tell who you are: Characterize the reasoning patterns of LRMs
- TENET: Leveraging Tests Beyond Validation for Code Generation
- Accelerating Cerebral Diagnostics with BrainFusion: A Comprehensive MRI Tumor Framework
- Memory Transfer Planning: LLM-driven Context-Aware Code Adaptation for Robot Manipulation
- LatXGen: Towards Radiation-Free and Accurate Quantitative Analysis of Sagittal Spinal Alignment Via Cross-Modal Radiographic View Synthesis
- Stable Forgetting: Bounded Parameter-Efficient Unlearning in LLMs
- Retrieval-augmented GUI Agents with Generative Guidelines
- Beyond Overall Accuracy: A Psychometric Deep Dive into the Topic-Specific Medical Capabilities of 80 Large Language Models
- Talk in Pieces, See in Whole: Disentangling and Hierarchical Aggregating Representations for Language-based Object Detection
- AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play
- Chat to Chip: Large Language Model Based Design of Arbitrarily Shaped Metasurfaces
- Can Large Language Models Express Uncertainty Like Human?
- Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
- BALR-SAM: Boundary-Aware Low-Rank Adaptation of SAM for Resource-Efficient Medical Image Segmentation
- BeyondBench: Benchmark-Free Evaluation of Reasoning in Language Models
- Metamorphic Testing for Audio Content Moderation Software
- Conda: Column-Normalized Adam for Training Large Language Models Faster
- ViReSkill: Vision-Grounded Replanning with Skill Memory for LLM-Based Planning in Lifelong Robot Learning
- Uni-NTFM: A Unified Foundation Model for EEG Signal Representation Learning
- ChessArena: A Chess Testbed for Evaluating Strategic Reasoning Capabilities of Large Language Models
- SafeFlowMatcher: Safe and Fast Planning using Flow Matching with Control Barrier Functions
- Prompt and Parameter Co-Optimization for Large Language Models
- Graph Foundation Models: Bridging Language Model Paradigms and Graph Optimization
- SHAPoint: Task-Agnostic, Efficient, and Interpretable Point-Based Risk Scoring via Shapley Values
- Accuracy-Robustness Trade Off via Spiking Neural Network Gradient Sparsity Trail
- Knowledge-Level Consistency Reinforcement Learning: Dual-Fact Alignment for Long-Form Factuality
- From Personal to Collective: On the Role of Local and Global Memory in LLM Personalization
- Knowledge Homophily in Large Language Models
- Sequence Pathfinder for Multi-Agent Pickup and Delivery in the Warehouse
- GroupCoOp: Group-robust Fine-tuning via Group Prompt Learning
- From Unstable to Playable: Stabilizing Angry Birds Levels via Object Segmentation
- Enhancing LLM Steering through Sparse Autoencoder-Based Vector Refinement
- FedAgentBench: Towards Automating Real-world Federated Medical Image Analysis with Server-Client LLM Agents
- Tequila: Trapping-free Ternary Quantization for Large Language Models
- Navigating the Labyrinth: Path-Sensitive Unit Test Generation with Large Language Models
- IndexNet: Timestamp and Variable-Aware Modeling for Time Series Forecasting
- A Multi-Camera Vision-Based Approach for Fine-Grained Assembly Quality Control
- Space Group Conditional Flow Matching
- HFuzzer: Testing Large Language Models for Package Hallucinations via Phrase-based Fuzzing
- Adversarial Diffusion for Robust Reinforcement Learning
- GSID: Generative Semantic Indexing for E-Commerce Product Understanding
- Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation
- Taught Well Learned Ill: Towards Distillation-conditional Backdoor Attack
- Multi-Value-Product Retrieval-Augmented Generation for Industrial Product Attribute Value Identification
- Not All Tokens are Guided Equal: Improving Guidance in Visual Autoregressive Models
- Disentangling Score Content and Performance Style for Joint Piano Rendering and Transcription
- PCRI: Measuring Context Robustness in Multimodal Models for Enterprise Applications
- Tunable-Generalization Diffusion Powered by Self-Supervised Contextual Sub-Data for Low-Dose CT Reconstruction
- Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer
- Gradient Flow Convergence Guarantee for General Neural Network Architectures
- Dynamic Orthogonal Continual Fine-tuning for Mitigating Catastrophic Forgettings
- Preserving Cross-Modal Stability for Visual Unlearning in Multimodal Scenarios
- Interpreting deep learning-based stellar mass estimation via causal analysis and mutual information decomposition
- EWC-Guided Diffusion Replay for Exemplar-Free Continual Learning in Medical Imaging
- Continual Learning to Generalize Forwarding Strategies for Diverse Mobile Wireless Networks
- Graph Mixing Additive Networks
- Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Step
- HiViS: Hiding Visual Tokens from the Drafter for Speculative Decoding in Vision-Language Models
- Diffusion Models are Kelly Gamblers
- Easy Turn: Integrating Acoustic and Linguistic Modalities for Robust Turn-Taking in Full-Duplex Spoken Dialogue Systems
- Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm
- Vision-Grounded Machine Interpreting: Improving the Translation Process through Visual Cues
- MAD-PINN: A Decentralized Physics-Informed Machine Learning Framework for Safe and Optimal Multi-Agent Control
- Toward Preference-aligned Large Language Models via Residual-based Model Steering
- The Hidden Costs of Translation Accuracy: Distillation, Quantization, and Environmental Impact
- Guide: Generalized-Prior and Data Encoders for DAG Estimation
- The AI Agent Code of Conduct: Automated Guardrail Policy-as-Prompt Synthesis
- MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use
- RAVEN: Resilient Aerial Navigation via Open-Set Semantic Memory and Behavior Adaptation
- Node Classification via Simplicial Interaction with Augmented Maximal Clique Selection
- Benchmarking LLM-Assisted Blue Teaming via Standardized Threat Hunting
- Uncovering Vulnerabilities of LLM-Assisted Cyber Threat Intelligence
- Towards Efficient CoT Distillation: Self-Guided Rationale Selector for Better Performance with Fewer Rationales
- ML-Asset Management: Curation, Discovery, and Utilization
- Improving the Efficiency of LLM Agent Systems through Trajectory Reduction
- Toward a Holistic Approach to Continual Model Merging
- Timber: Training-free Instruct Model Refining with Base via Effective Rank
- Multi-Level Heterogeneous Knowledge Transfer Network on Forward Scattering Center Model for Limited Samples SAR ATR
- Characteristic Root Analysis and Regularization for Linear Time Series Forecasting
- InteractMove: Text-Controlled Human-Object Interaction Generation in 3D Scenes with Movable Objects
- GraphIFE: Rethinking Graph Imbalance Node Classification via Invariant Learning
- BioVessel-Net and RetinaMix: Unsupervised Retinal Vessel Segmentation from OCTA Images
- Generalizable Speech Deepfake Detection via Information Bottleneck Enhanced Adversarial Alignment
- RIV: Recursive Introspection Mask Diffusion Vision Language Model
- LightFair: Towards an Efficient Alternative for Fair T2I Diffusion via Debiasing Pre-trained Text Encoders
- ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis
- Focusing on What Matters: Object-Agent-centric Tokenization for Vision Language Action models
- Aligning LLMs for Multilingual Consistency in Enterprise Applications
- Pure Node Selection for Imbalanced Graph Node Classification
- Calibration Meets Reality: Making Machine Learning Predictions Trustworthy
- Beyond Greedy Exits: Improved Early Exit Decisions for Risk Control and Reliability
- Graph Neural Networks with Diversity-aware Neighbor Selection and Dynamic Multi-scale Fusion for Multivariate Time Series Forecasting
- RCI: A Score for Evaluating Global and Local Reasoning in Multimodal Benchmarks
- Towards a Comprehensive Scaling Law of Mixture-of-Experts
- Joint Hybrid Beamforming and Artificial Noise Design for Secure Multi-UAV ISAC Networks
- Estimating Time Series Foundation Model Transferability via In-Context Learning
- CrimEdit: Controllable Editing for Counterfactual Object Removal, Insertion, and Movement
- Bridging Discrete and Continuous RL: Stable Deterministic Policy Gradient with Martingale Characterization
- AdaPtis: Reducing Pipeline Bubbles with Adaptive Pipeline Parallelism on Heterogeneous Models
- Video Panels for Long Video Understanding
- AudioMoG: Guiding Audio Generation with Mixture-of-Guidance
- M3DLayout: A Multi-Source Dataset of 3D Indoor Layouts and Structured Descriptions for 3D Generation
- LUQ: Layerwise Ultra-Low Bit Quantization for Multimodal Large Language Models
- HieraTok: Multi-Scale Visual Tokenizer Improves Image Reconstruction and Generation
- Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning
- LocoFormer: Generalist Locomotion via Long-context Adaptation
- Poivre: Self-Refining Visual Pointing with Reinforcement Learning
- PVTAdpNet: Polyp Segmentation using Pyramid vision transformer with a novel Adapter block
- Understanding Textual Capability Degradation in Speech LLMs via Parameter Importance Analysis
- PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation
- DentVLM: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice
- ABC-Eval: Benchmarking Large Language Models on Symbolic Music Understanding and Instruction Following
- Dynamic-TreeRPO: Breaking the Independent Trajectory Bottleneck with Structured Sampling
- Dual-Space Smoothness for Robust and Balanced LLM Unlearning
- AI Education in Higher Education: A Taxonomy for Curriculum Reform and the Mission of Knowledge
- MedCritical: Enhancing Medical Reasoning in Small Language Models via Self-Collaborative Correction
- Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization
- Graph Your Own Prompt
- CCD: Mitigating Hallucinations in Radiology MLLMs via Clinical Contrastive Decoding
- Train Once, Answer All: Many Pretraining Experiments for the Cost of One
- Enhanced Fracture Diagnosis Based on Critical Regional and Scale Aware in YOLO
- PATCH: Learnable Tile-level Hybrid Sparsity for LLMs
- Hybrid Graph Embeddings and Louvain Algorithm for Unsupervised Community Detection
- Retrieval-Constrained Decoding Reveals Underestimated Parametric Knowledge in Language Models
- Enhancing Communication Efficiency in FL with Adaptive Gradient Quantization and Communication Frequency Optimization
- NeuroBridge: Using Generative AI to Bridge Cross-neurotype Communication Differences through Neurotypical Perspective-taking
- AudioRole: An Audio Dataset for Character Role-Playing in Large Language Models
- S³F-Net: A Multi-Modal Approach to Medical Image Classification via Spatial-Spectral Summarizer Fusion Network
- Factor Decorrelation Enhanced Data Removal from Deep Predictive Models
- AudioFuse: Unified Spectral-Temporal Learning via a Hybrid ViT-1D CNN Architecture for Robust Phonocardiogram Classification
- Data-Efficient Training by Evolved Sampling
- Generative Evolutionary Meta-Solver (GEMS): Scalable Surrogate-Free Multi-Agent Learning
- Multi-Modal Manipulation via Multi-Modal Policy Consensus
- Memory-Efficient Fine-Tuning via Low-Rank Activation Compression
- Text-Based Approaches to Item Difficulty Modeling in Large-Scale Assessments: A Systematic Review
- Revisiting Multivariate Time Series Forecasting with Missing Values
- The Impact of Role Design in In-Context Learning for Large Language Models
- Enhancing Polyp Segmentation via Encoder Attention and Dynamic Kernel Update
- From Human Annotation to Automation: LLM-in-the-Loop Active Learning for Arabic Sentiment Analysis
- Evaluating point-light biological motion in multimodal large language models
- ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search
- Privy: Envisioning and Mitigating Privacy Risks for Consumer-facing AI Product Concepts
- Imaging-Based Mortality Prediction in Patients with Systemic Sclerosis
- On the Shelf Life of Fine-Tuned LLM Judges: Future Proofing, Backward Compatibility, and Question Generalization
- End-to-End Deep Learning for Predicting Metric Space-Valued Outputs
- Disentanglement of Variations with Multimodal Generative Modeling
- Automatic Speech Recognition for Greek Medical Dictation
- Fusing Sequence Motifs and Pan-Genomic Features: Antimicrobial Resistance Prediction using an Explainable Lightweight 1D CNN-XGBoost Ensemble
- Pancreas Part Segmentation under Federated Learning Paradigm
- HTMA-Net: Towards Multiplication-Avoiding Neural Networks via Hadamard Transform and In-Memory Computing
- Open-Vocabulary Spatio-Temporal Scene Graph for Robot Perception and Teleoperation Planning
- Liaohe-CobotMagic-PnP: an Imitation Learning Dataset of Intelligent Robot for Industrial Applications
- RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility
- C²GSPG: Confidence-calibrated Group Sequence Policy Gradient towards Self-aware Reasoning
- Trust Region Reward Optimization and Proximal Inverse Reward Optimization Algorithm
- Deep Learning-Based Detection of Cognitive Impairment from Passive Smartphone Sensing with Routine-Aware Augmentation and Demographic Personalization
- Dense associative memory on the Bures-Wasserstein space
- TRAX: TRacking Axles for Accurate Axle Count Estimation
- WARBERT: A Hierarchical BERT-based Model for Web API Recommendation
- PARL-MT: Learning to Call Functions in Multi-Turn Conversation with Progress Awareness
- Towards Monotonic Improvement in In-Context Reinforcement Learning
- One-Shot Multi-Label Causal Discovery in High-Dimensional Event Sequences
- Leave No Observation Behind: Real-time Correction for VLA Action Chunks
- SPEC-RL: Accelerating On-Policy Reinforcement Learning via Speculative Rollouts
- Patch Rebirth: Toward Fast and Transferable Model Inversion of Vision Transformers
- Self-Consistency as a Free Lunch: Reducing Hallucinations in Vision-Language Models via Self-Reflection
- Online Dynamic Goal Recognition in Gym Environments
- Adaptive Token-Weighted Differential Privacy for LLMs: Not All Tokens Require Equal Protection
- Learning Regional Monsoon Patterns with a Multimodal Attention U-Net
- Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing
- Continuous-Time Reinforcement Learning for Asset-Liability Management
- A2D: Any-Order, Any-Step Safety Alignment for Diffusion Language Models
- A Neural ODE Approach to Aircraft Flight Dynamics Modelling
- Seeing Symbols, Missing Cultures: Probing Vision-Language Models' Reasoning on Fire Imagery and Cultural Meaning
- MELCOT: A Hybrid Learning Architecture with Marginal Preservation for Matrix-Valued Regression
- Scaling LLM Test-Time Compute with Mobile NPU on Smartphones
- Robust Fine-Tuning from Non-Robust Pretrained Models: Mitigating Suboptimal Transfer With Adversarial Scheduling
- Space Robotics Bench: Robot Learning Beyond Earth
- Beyond Model Ranking: Predictability-Aligned Evaluation for Time Series Forecasting
- Signal Preserving Weight Initialization for Odd-Sigmoid Activations
- Causally-Enhanced Reinforcement Policy Optimization
- CoPatch: Zero-Shot Referring Image Segmentation by Leveraging Untapped Spatial Knowledge in CLIP
- Towards Quantum-Ready Blockchain Fraud Detection via Ensemble Graph Neural Networks
- Boundary on the Table: Efficient Black-Box Decision-Based Attacks for Structured Data
- Adaptive Margin RLHF via Preference over Preferences
- Patient-specific Biomolecular Instruction Tuning
- Observation-Free Attacks on Online Learning to Rank
- Scalable Wi-Fi RSS-Based Indoor Localization via Automatic Vision-Assisted Calibration
- From Noise to Knowledge: A Comparative Study of Acoustic Anomaly Detection Models in Pumped-storage Hydropower Plants
- Convolutional Set Transformer
- Extract-0: A Specialized Language Model for Document Information Extraction
- TY-RIST: Tactical YOLO Tricks for Real-time Infrared Small Target Detection
- Rethinking Large Language Model Distillation: A Constrained Markov Decision Process Perspective
- Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings
- Large language models management of medications: three performance analyses
- MonoCon: A general framework for learning ultra-compact high-fidelity representations using monotonicity constraints
- Compute-Optimal Quantization-Aware Training
- Unsupervised Speech Enhancement using Data-defined Priors
- What Matters More For In-Context Learning under Matched Compute Budgets: Pretraining on Natural Text or Incorporating Targeted Synthetic Examples?
- Tiny-QMoE
- Functional Critic Modeling for Provably Convergent Off-Policy Actor-Critic
- ADAM: A Diverse Archive of Mankind for Evaluating and Enhancing LLMs in Biographical Reasoning
- Physically Plausible Multi-System Trajectory Generation and Symmetry Discovery
- MoE-PHDS: One MoE checkpoint for flexible runtime sparsity
- LLM Watermark Evasion via Bias Inversion
- Tracing the Representation Geometry of Language Models from Pretraining to Post-training
- DPFNAS: Differential Privacy-Enhanced Federated Neural Architecture Search for 6G Edge Intelligence
- Sensor-Adaptive Flood Mapping with Pre-trained Multi-Modal Transformers across SAR and Multispectral Modalities
- GeLoc3r: Enhancing Relative Camera Pose Regression with Geometric Consistency Regularization
- Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents
- Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data
- IsingFormer: Augmenting Parallel Tempering With Learned Proposals
- MMeViT: Multi-Modal ensemble ViT for Post-Stroke Rehabilitation Action Recognition
- Beyond Aggregation: Guiding Clients in Heterogeneous Federated Learning
- Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding
- Local Success Does Not Compose: Benchmarking Large Language Models for Compositional Formal Verification
- Semantic Voting: A Self-Evaluation-Free Approach for Efficient LLM Self-Improvement on Unverifiable Open-ended Tasks
- From Evidence to Trajectory: Abductive Reasoning Path Synthesis for Training Retrieval-Augmented Generation Agents
- Localizing Adversarial Attacks To Produces More Imperceptible Noise
- TRUEBench: Can LLM Response Meet Real-world Constraints as Productivity Assistant?
- IBiT: Utilizing Inductive Biases to Create a More Data Efficient Attention Mechanism
- LayoutAgent: A Vision-Language Agent Guided Compositional Diffusion for Spatial Layout Planning
- A Data-Driven Framework for Digital Transformation in Smart Cities: Integrating AI, Dashboards, and IoT Readiness
- A Meta-Analysis of LLM Effects on Students across Qualification, Socialisation, and Subjectification
- Prompt-aware classifier free guidance for diffusion models
- Multi-Modal Sentiment Analysis with Dynamic Attention Fusion
- Bidirectional Intention Inference Enhances LLMs' Defense Against Multi-Turn Jailbreak Attacks
- Rebuild AC Power Flow Models with Graph Attention Networks
- Automated Formative Feedback for Short-form Writing: An LLM-Driven Approach and Adoption Analysis
- Regulating the Agency of LLM-based Agents
- Consistency Models as Plug-and-Play Priors for Inverse Problems
- CompareBench: A Benchmark for Visual Comparison Reasoning in Vision-Language Models
- Painless Activation Steering: An Automated, Lightweight Approach for Post-Training Large Language Models
- Learning What To Hear: Boosting Sound-Source Association For Robust Audiovisual Instance Segmentation
- Societal Capacity Assessment Framework: Measuring Resilience to Inform Advanced AI Risk Management
- Index-MSR: A high-efficiency multimodal fusion framework for speech recognition
- Defending MoE LLMs against Harmful Fine-Tuning via Safety Routing Alignment
- MIRAGE: Multi-hop Reasoning with Ambiguity Evaluation for Illusory Questions
- Variance-Bounded Evaluation without Ground Truth: VB-Score
- Self-driving cars: Are we there yet?
- Persistent Autoregressive Mapping with Traffic Rules for Autonomous Driving
- Red Teaming Quantum-Resistant Cryptographic Standards: A Penetration Testing Framework Integrating AI and Quantum Security
- MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning
- UESA-Net: U-Shaped Embedded Multidirectional Shrinkage Attention Network for Ultrasound Nodule Segmentation
- In-Context Learning can Perform Continual Learning Like Humans
- A theoretical guarantee for SyncRank
- Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression
- Generative Modeling and Decision Fusion for Unknown Event Detection and Classification Using Synchrophasor Data
- VideoScore2: Think before You Score in Generative Video Evaluation
- MTRec: Learning to Align with User Preferences via Mental Reward Models
- MMPB: It's Time for Multi-Modal Personalization
- Dynamic Buffers: Cost-Efficient Planning for Tabletop Rearrangement with Stacking
- Efficient Fine-Grained GPU Performance Modeling for Distributed Deep Learning of LLM
- Bridging Language Models and Formal Methods for Intent-Driven Optical Network Design
- Seeing Isn't Believing: Context-Aware Adversarial Patch Synthesis via Conditional GAN
- Multimodal Slice Interaction Network Enhanced by Transfer Learning for Precise Segmentation of Internal Gross Tumor Volume in Lung Cancer PET/CT Imaging
- On the Self-awareness of Large Reasoning Models' Capability Boundaries
- Spatial-Functional awareness Transformer-based graph archetype contrastive learning for Decoding Visual Neural Representations from EEG
- From Ambiguity to Verdict: A Semiotic-Grounded Multi-Perspective Agent for LLM Logical Reasoning
- TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models
- Query Circuits: Explaining How Language Models Answer User Prompts
- Pushing LLMs to Their Logical Reasoning Bound: The Role of Data Reasoning Intensity
- PhysicsMinions: Winning Gold Medals in the Latest Physics Olympiads with a Coevolutionary Multimodal Multi-Agent System
- The Emergence of Social Science of Large Language Models
- RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark
- Neural network embeddings recover value dimensions from psychometric survey items on par with human data
- Meta-Learning Theory-Informed Inductive Biases using Deep Kernel Gaussian Processes
- MASLegalBench: Benchmarking Multi-Agent Systems in Deductive Legal Reasoning
- When Autonomous Vehicle Meets V2X Cooperative Perception: How Far Are We?
- KIRETT - A wearable device to support rescue operations using artificial intelligence to improve first aid
- Agentic Exploration of Physics Models
- CLPO: Curriculum Learning meets Policy Optimization for LLM Reasoning
- Scaling Synthetic Task Generation for Agents via Exploration
- Cogito, Ergo Ludo: An Agent that Learns to Play by Reasoning and Planning
- HeDA: An Intelligent Agent System for Heatwave Risk Discovery through Automated Knowledge Graph Construction and Multi-layer Risk Propagation Analysis
- From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones
- The Era of Real-World Human Interaction: RL from User Conversations
- Vision-and-Language Navigation with Analogical Textual Descriptions in LLMs
- ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
- Visual serial processing deficits explain divergences in human and VLM reasoning
- UniAPL: A Unified Adversarial Preference Learning Framework for Instruct-Following
- Who's Your Judge? On the Detectability of LLM-Generated Judgments
- Prosody-Adaptable Audio Codecs for Zero-Shot Voice Conversion via In-Context Learning
- BenLOC: A Benchmark for Learning to Configure MIP Optimizers
- YOLO-based Bearing Fault Diagnosis With Continuous Wavelet Transform
- Agentic DDQN-Based Scheduling for Licensed and Unlicensed Band Allocation in Sidelink Networks
- Green Learning for STAR-RIS mmWave Systems with Implicit CSI
- How are Scientific Concepts Birthed? Typing Rules of Concept Formation in Theoretical Physics Reasoning
- Sustainable LSTM-Based Precoding for RIS-Aided mmWave MIMO Systems with Implicit CSI
- GOAT: A Large Dataset of Paired Guitar Audio Recordings and Tablatures
- How good are LLMs at Retrieving Documents in a Specific Domain?
- Fairness for niche users and providers: algorithmic choice and profile portability
- Next Point-of-interest (POI) Recommendation Model Based on Multi-modal Spatio-temporal Context Feature Embedding
- PISA: An AI Pipeline for Interpretable-by-design Survival Analysis Providing Multiple Complexity-Accuracy Trade-off Models
- Learning Hyperspectral Images with Curated Text Prompts for Efficient Multimodal Alignment
- Advancing Audio-Visual Navigation Through Multi-Agent Collaboration in 3D Environments
- Enhancing Cluster Scheduling in HPC: A Continuous Transfer Learning for Real-Time Optimization
- AccessEval: Benchmarking Disability Bias in Large Language Models
- Intelligent Load Balancing in Cloud Computer Systems
- GZSL-MoE: Apprentissage Généralisé Zéro-Shot basé sur le Mélange d'Experts pour la Segmentation Sémantique de Nuages de Points 3D Appliqué à un Jeu de Données d'Environnement de Collaboration Humain-Robot
- AnveshanaAI: A Multimodal Platform for Adaptive AI/ML Education through Automated Question Generation and Interactive Assessment
- Mix-Ecom: Towards Mixed-Type E-Commerce Dialogues with Complex Domain Rules
- AgentGuard: Runtime Verification of AI Agents
- Rethinking Reward Miscalibration of GRPO in Agentic RL
- Quant Fever, Reasoning Blackholes, Schrodinger's Compliance, and More: Probing GPT-OSS-20B
- From Neural Networks to Logical Theories: The Correspondence between Fibring Modal Logics and Fibring Neural Networks
- Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models
- Automatic selection of primary studies in systematic reviews with evolutionary rule-based classification
- TusoAI: Agentic Optimization for Scientific Methods
- LLM/Agent-as-Data-Analyst: A Survey
- Future-Proofing Programmers: Optimal Knowledge Tracing for AI-Assisted Personalized Education
- Do Repetitions Matter? Strengthening Reliability in LLM Evaluations
- Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs
- Transparent, Evaluable, and Accessible Data Agents: A Proof-of-Concept Framework
- Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models
- Robust Preference Optimization: Aligning Language Models with Noisy Preference Feedback
- Humanline: Online Alignment as Perceptual Loss
- ELHPlan: Efficient Long-Horizon Task Planning for Multi-Agent Collaboration
- Learning to Ponder: Adaptive Reasoning in Latent Space
- Model Merging Scaling Laws in Large Language Models
- SpecExit: Accelerating Large Reasoning Model via Speculative Exit
- Interactive Program Synthesis for Modeling Collaborative Physical Activities from Narrated Demonstrations
- Rethinking and Benchmarking Large Language Models for Graph Reasoning
- Risk-Sensitive RL for Alleviating Exploration Dilemmas in Large Language Models
- PAME-AI: Patient Messaging Creation and Optimization using Agentic AI
- AdvChain: Adversarial Chain-of-Thought Tuning for Robust Safety Alignment of Large Reasoning Models
- G-reasoner: Foundation Models for Unified Reasoning over Graph-structured Knowledge
- SCI-Verifier: Scientific Verifier with Thinking
- Experience Paper: Adopting Activity Recognition in On-demand Food Delivery Business
- MedMMV: A Controllable Multimodal Multi-Agent Framework for Reliable and Verifiable Clinical Reasoning
- humancompatible.detect: a Python Toolkit for Detecting Bias in AI Models
- Fin-Ally: Pioneering the Development of an Advanced, Commonsense-Embedded Conversational AI for Money Matters
- From Static to Dynamic: Adaptive Monte Carlo Search for Mathematical Process Supervision
- Plan before Solving: Problem-Aware Strategy Routing for Mathematical Reasoning with LLMs
- Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention
- A Systematic Review of Digital Twin-Driven Predictive Maintenance in Industrial Engineering: Taxonomy, Architectural Elements, and Future Research Directions
- ContextPRM: Leveraging Contextual Coherence for multi-domain Test-Time Scaling
- Overcoming Over-Fitting in Constraint Acquisition via Query-Driven Interactive Refinement
- Neuroplasticity-inspired dynamic ANNs for multi-task demand forecasting
- Experience-guided reflective co-evolution of prompts and heuristics for automatic algorithm design
- Training Agents Inside of Scalable World Models
- BPMN Assistant: An LLM-Based Approach to Business Process Modeling
- LTL$_f$ Learning Meets Boolean Set Cover
- "Stop replacing salt with sugar!": Towards Intuitive Human-Agent Teaching
- Successful Misunderstandings: Learning to Coordinate Without Being Understood
- SysMoBench: Evaluating AI on Formally Modeling Complex Real-World Systems
- MathBode: Frequency-Domain Fingerprints of LLM Mathematical Reasoning
- Coordination Requires Simplification: Thermodynamic Bounds on Multi-Objective Compromise in Natural and Artificial Intelligence
- AI-Enhanced Distributed Channel Access for Collision Avoidance in Future Wi-Fi 8
- Limit Analysis for Symbolic Multi-step Reasoning Tasks with Information Propagation Rules Based on Transformers
- Understanding and Enhancing the Planning Capability of Language Models via Multi-Token Prediction
- AutoEP: LLMs-Driven Automation of Hyperparameter Evolution for Metaheuristic Algorithms
- $p$-less Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding
- Agentic AI Reasoning for Mobile Edge General Intelligence: Fundamentals, Approaches, and Directions
- Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned
- GUI-PRA: Process Reward Agent for GUI Tasks
- Socio-Economic Model of AI Agents
- Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning
- Learning How to Use Tools, Not Just When: Pattern-Aware Tool-Integrated Reasoning
- Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking
- From Conversation to Query Execution: Benchmarking User and Tool Interactions for EHR Database Agents
- Democratizing AI scientists using ToolUniverse
- Beyond Embeddings: Interpretable Feature Extraction for Binary Code Similarity
- ViTSP: A Vision Language Models Guided Framework for Large-Scale Traveling Salesman Problems
- GeoBS: Information-Theoretic Quantification of Geographic Bias in AI Models
- Accurate Predictions in Education with Discrete Variational Inference
- Mapping Overlaps in Benchmarks through Perplexity in the Wild
- Dynamic Trust Calibration Using Contextual Bandits
- Model Consistency as a Cheap yet Predictive Proxy for LLM Elo Scores
- DOoM: Difficult Olympiads of Math
- Beyond the Strongest LLM: Multi-Turn Multi-Agent Orchestration vs. Single LLMs on Benchmarks
- Formalization Driven LLM Prompt Jailbreaking via Reinforcement Learning
- A Hierarchical Structure-Enhanced Personalized Recommendation Model for Traditional Chinese Medicine Formulas Based on KG Diffusion Guidance
- Clean First, Align Later: Benchmarking Preference Data Cleaning for Reliable LLM Alignment
- BridgeDrive: Diffusion Bridge Policy for Closed-Loop Trajectory Planning in Autonomous Driving
- PSG-Agent: Personality-Aware Safety Guardrail for LLM-based Agents
- Reasoning Scaffolding: Distilling the Flow of Thought from LLMs
- How LLMs Learn to Reason: A Complex Network Perspective
- Game-Oriented ASR Error Correction via RAG-Enhanced LLM
- From Reasoning to Answer: Empirical, Attention-Based and Mechanistic Insights into Distilled DeepSeek R1 Models
- SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents
- Measuring Sparse Autoencoder Feature Sensitivity
- MedLA: A Logic-Driven Multi-Agent Framework for Complex Medical Reasoning with Large Language Models
- EAPO: Enhancing Policy Optimization with On-Demand Expert Assistance
- Diagnosing Failure Root Causes in Platform-Orchestrated Agentic Systems: Dataset, Taxonomy, and Benchmark
- GUI-Shepherd: Reliable Process Reward and Verification for Long-Sequence GUI Tasks
- Transparent Visual Reasoning via Object-Centric Agent Collaboration
- From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning
- Falcon: A Cross-Modal Evaluation Dataset for Comprehensive Safety Perception
- From Frustration to Fun: An Adaptive Problem-Solving Puzzle Game Powered by Genetic Algorithm
- Mixture-of-Visual-Thoughts: Exploring Context-Adaptive Reasoning Mode Selection for General Visual Reasoning
- Can Large Language Models Develop Gambling Addiction?
- Hilbert: Recursively Building Formal Proofs with Informal Reasoning
- Toward a Theory of Generalizability in LLM Mechanistic Interpretability Research
- JE-IRT: A Geometric Lens on LLM Abilities through Joint Embedding Item Response Theory
- Not only a helper, but also a teacher: Interactive LLM Cascade
- Towards Strategic Persuasion with Language Models
- AI Noether -- Bridging the Gap Between Scientific Laws Derived by AI Systems and Canonical Knowledge via Abductive Inference
- Creative Adversarial Testing (CAT): A Novel Framework for Evaluating Goal-Oriented Agentic AI Systems
- Deceive, Detect, and Disclose: Large Language Models Play Mini-Mafia
- Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents
- Risk Profiling and Modulation for LLMs
- Multiplayer Nash Preference Optimization
- Artificial Phantasia: Evidence for Propositional Reasoning-Based Mental Imagery in Large Language Models
- AttAnchor: Guiding Cross-Modal Token Alignment in VLMs with Attention Anchors
- Exploring LLM-based Frameworks for Fault Diagnosis
- Transferring Vision-Language-Action Models to Industry Applications: Architectures, Performance, and Challenges
- Learning Smooth State-Dependent Traversability from Dense Point Clouds
- ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting
- VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs
- Neural-Augmented Kelvinlet for Real-Time Soft Tissue Deformation Modeling
- Meta Pruning via Graph Metanetworks: A Universal Meta Learning Framework for Network Pruning
- Origins of Creativity in Attention-Based Diffusion Models
- RAM-W1K: A Multi-Task Wrist Dataset and Benchmark for Rheumatoid Arthritis
- Warm Starts Accelerate Conditional Diffusion
- Vidar: Embodied Video Diffusion Model for Generalist Manipulation
- The Geometry of Cortical Computation: Manifold Disentanglement and Predictive Dynamics in VCNet
- MoQE: Improve Quantization Model performance via Mixture of Quantization Experts
- 3D-LATTE: Latent Space 3D Editing from Textual Instructions
- Graph Alignment via Dual-Pass Spectral Encoding and Latent Space Communication
- WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance
- Joint Memory Frequency and Computing Frequency Scaling for Energy-efficient DNN Inference
- HUNT: High-Speed UAV Navigation and Tracking in Unstructured Environments via Instantaneous Relative Frames
- Implicit-ARAP: Efficient Handle-Guided Neural Field Deformation via Local Patch Meshing
- Differential Encoding for Improved Representation Learning over Graphs
- Attentive Dilated Convolution for Automatic Sleep Staging using Force-directed Layout
- Similarity-Dissimilarity Loss for Multi-label Supervised Contrastive Learning
- Chronic Obstructive Pulmonary Disease Prediction Using Deep Convolutional Network
- Freqformer: Frequency-Domain Transformer for 3-D Reconstruction and Quantification of Human Retinal Vasculature
- Towards agile multi-robot systems in the real world: Fast onboard tracking of active blinking markers for relative localization
- Generative Video Semantic Communication via Multimodal Semantic Fusion with Large Model
- GCDance: Genre-Controlled Music-Driven 3D Full Body Dance Generation
- Reconstruct Anything Model: a lightweight foundation model for computational imaging
- AdaRank: Adaptive Rank Pruning for Enhanced Model Merging
- In-2-4D: Inbetweening from Two Single-View Images to 4D Generation
- Sharpness-Aware Minimization with Z-Score Gradient Filtering
- RainPro-8: An Efficient Deep Learning Model to Estimate Rainfall Probabilities Over 8 Hours
- Visual Planning: Let's Think Only with Images
- MINGLE: Mixture of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging
- Scaling Diffusion Transformers Efficiently via $\mu$P
- AgentThink: A Unified Framework for Tool-Augmented Chain-of-Thought Reasoning in Vision-Language Models for Autonomous Driving
- Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models
- ART-DECO: Arbitrary Text Guidance for 3D Detailizer Construction
- Vision Language Models are Biased
- Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation
- Interaction Field Matching: Overcoming Limitations of Electrostatic Models
- Bridging Semantic Logic Gaps: A Cognition Inspired Multimodal Boundary Preserving Network for Image Manipulation Localization
- BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation
- G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration
- Temporal Grounding as a Learning Signal for Referring Video Object Segmentation
- Semantic Discrepancy-aware Detector for Image Forgery Identification
- SpotEdit: Evaluating Visually-Guided Image Editing Methods
- Human-like Content Analysis for Generative AI with Language-Grounded Sparse Encoders
- SemaMIL: Semantic-Aware Multiple Instance Learning with Retrieval-Guided State Space Modeling for Whole Slide Images
- Durian: Dual Reference Image-Guided Portrait Animation with Attribute Transfer
- Physics-Guided Null-Space Diffusion with Sparse Masking for Corrective Sparse-View CT Reconstruction
- Fracture Detection In X-rays Using Custom Convolutional Neural Network (CNN) And Transfer Learning Models
- Loc$^2$: Interpretable Cross-View Localization via Depth-Lifted Local Feature Matching
- MindVL: Towards Efficient and Effective Training of Multimodal Large Language Models on Ascend NPUs
- Lost in Translation? Vocabulary Alignment for Source-Free Adaptation in Open-Vocabulary Semantic Segmentation
- ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding
- Pyramid Token Pruning for High-Resolution Large Vision-Language Models via Region, Token, and Instruction-Guided Importance
- VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption
- VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
- Vid2World: Crafting Video Diffusion Models to Interactive World Models
- Better Safe Than Sorry? Overreaction Problem of Vision Language Models in Visual Emergency Recognition
- Advancing Marine Research: UWSAM Framework and UIIS10K Dataset for Precise Underwater Instance Segmentation
- OViP: Online Vision-Language Preference Learning for VLM Hallucination
- Pixels Versus Priors: Controlling Knowledge Priors in Vision-Language Models through Visual Counterfacts
- InfoDet: A Dataset for Infographic Element Detection
- T2VUnlearning: A Concept Erasing Method for Text-to-Video Diffusion Models
- Boosting Open Set Recognition Performance through Modulated Representation Learning
- CoT-RVS: Zero-Shot Chain-of-Thought Reasoning Segmentation for Videos
- Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter
- ReDDiT: Rehashing Noise for Discrete Visual Generation
- GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scenes
- Score Replacement with Bounded Deviation for Rare Prompt Generation
- MagicTryOn: Harnessing Diffusion Transformer for Garment-Preserving Video Virtual Try-on
- Any-to-Bokeh: Arbitrary-Subject Video Refocusing with Video Diffusion Model
- Cross-modal RAG: Sub-dimensional Text-to-Image Retrieval-Augmented Generation
- CryoCCD: Conditional Cycle-consistent Diffusion with Biophysical Modeling for Cryo-EM Synthesis
- ProxyThinker: Test-Time Guidance through Small Visual Reasoners
- EgoVIS@CVPR: What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning
- EarthMind: Leveraging Cross-Sensor Data for Advanced Earth Observation Interpretation with a Unified Multimodal LLM
- EgoVIS@CVPR: PAIR-Net: Enhancing Egocentric Speaker Detection via Pretrained Audio-Visual Fusion and Alignment Loss
- METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding
- Struct2D: A Perception-Guided Framework for Spatial Reasoning in Large Multimodal Models
- Training-Free Diffusion Framework for Stylized Image Generation with Identity Preservation
- Vision-EKIPL: External Knowledge-Infused Policy Learning for Visual Reasoning
- ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
- MIRAGE: Multimodal foundation model and benchmark for comprehensive retinal OCT image analysis
- Revisiting Visual Understanding in Multimodal Reasoning through a Lens of Image Perturbation
- DART: Differentiable Dynamic Adaptive Region Tokenizer for Vision Foundation Models
- Decoupled Classifier-Free Guidance for Counterfactual Diffusion Models
- FindingDory: A Benchmark to Evaluate Memory in Embodied Agents
- Do We Need Large VLMs for Spotting Soccer Actions?
- From Drawings to Decisions: A Hybrid Vision-Language Framework for Parsing 2D Engineering Drawings into Structured Manufacturing Knowledge
- Improving Black-Box Generative Attacks via Generator Semantic Consistency
- OmniGen2: Exploration to Advanced Multimodal Generation
- Light of Normals: Unified Feature Representation for Universal Photometric Stereo
- SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution
- XTransfer: Modality-Agnostic Few-Shot Model Transfer for Human Sensing at the Edge
- Controllable Reference Guided Diffusion with Local Global Fusion for Real World Remote Sensing Image Super Resolution
- Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think
- FA: Forced Prompt Learning of Vision-Language Models for Out-of-Distribution Detection
- Counterfactual Visual Explanation via Causally-Guided Adversarial Steering
- 3DGAA: Realistic and Robust 3D Gaussian-based Adversarial Attack for Autonomous Driving
- NoiseSDF2NoiseSDF: Learning Clean Neural Fields from Noisy Supervision
- CHROMA: Consistent Harmonization of Multi-View Appearance via Bilateral Grid Prediction
- Disentangling Regional Primitives for Image Generation
- DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing
- CART: Compositional Auto-Regressive Transformer for Image Generation
- Continuous Speculative Decoding for Autoregressive Image Generation
- Open-Vocabulary Online Semantic Mapping for SLAM
- GenMix: Effective Data Augmentation with Generative Diffusion Model Image Editing
- LoRACLR: Contrastive Adaptation for Customization of Diffusion Models
- Measurement of Medial Elbow Joint Space using Landmark Detection
- Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval
- PERSE: Personalized 3D Generative Avatars from A Single Portrait
- Training-Free Defense Against Adversarial Attacks in Deep Learning MRI Reconstruction
- MIAFEx: An Attention-based Feature Extraction Method for Medical Image Classification
- CGI: Identifying Conditional Generative Models with Example Images
- Med-PU: Point Cloud Upsampling for High-Fidelity 3D Medical Shape Reconstruction
- DeepFRC: An End-to-End Deep Learning Model for Functional Registration and Classification
- Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
- 3D Foundation Model for Generalizable Disease Detection in Head Computed Tomography
- PoI: A Filter to Extract Pixel of Interest from Novel View Synthesis for Scene Coordinate Regression
- Bidirectional Uncertainty-Aware Region Learning for Semi-Supervised Medical Image Segmentation
- IM360: Large-scale Indoor Mapping with 360 Cameras
- VPNeXt -- Rethinking Dense Decoding for Plain Vision Transformer
- Spiking Meets Attention: Efficient Remote Sensing Image Super-Resolution with Attention Spiking Neural Networks
- High-Precision Dichotomous Image Segmentation via Depth Integrity-Prior and Fine-Grained Patch Strategy
- Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization
- Exploring Representation Invariance in Finetuning
- UniF$^2$ace: A Unified Fine-grained Face Understanding and Generation Model
- Controllable Adversarial Makeup for Privacy via Text-Guided Diffusion
- A Survey on Self-supervised Contrastive Learning for Multimodal Text-Image Analysis
- Interpretable 3D Neural Object Volumes for Robust Conceptual Reasoning
- DPFlow: Adaptive Optical Flow Estimation with a Dual-Pyramid Framework
- Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
- Efficient Self-Supervised Adaptation for Medical Image Analysis
- Audio-centric Video Understanding Benchmark without Text Shortcut
- Beyond Synthetic Replays: Turning Diffusion Features into Few-Shot Class-Incremental Learning Knowledge
- SCRAMBLe: Enhancing Multimodal LLM Compositionality with Synthetic Preference Data
- From Specificity to Generality: Revisiting Generalizable Artifacts in Detecting Face Deepfakes
- VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
- Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization
- HSACNet: Hierarchical Scale-Aware Consistency Regularized Semi-Supervised Change Detection
- Model-based Metric 3D Shape and Motion Reconstruction of Wild Bottlenose Dolphins in Drone-Shot Videos
- DreamO: A Unified Framework for Image Customization
- S2S-Net: Addressing the Domain Gap of Heterogeneous Sensor Systems in LiDAR-Based Collective Perception
- ReactDance: Hierarchical Representation for High-Fidelity and Coherent Long-Form Reactive Dance Generation
- Dynamic Uncertainty Learning with Noisy Correspondence for Text-Based Person Search
- QVGen: Pushing the Limit of Quantized Video Generative Models
- ZeroScene: A Zero-Shot Framework for 3D Scene Generation from a Single Image and Controllable Texture Editing
- Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Multi-Scale Global-Local Attention
- Focusing on What Matters: Object-Agent-centric Tokenization for Vision Language Action models
- DFG-PCN: Point Cloud Completion with Degree-Flexible Point Graph
- StrucADT: Generating Structure-controlled 3D Point Clouds with Adjacency Diffusion Transformer
- Diff-3DCap: Shape Captioning with Diffusion Models
- GBSK: Skeleton Clustering via Granular-ball Computing and Multi-Sampling for Large-Scale Data
- Accuracy-Robustness Trade Off via Spiking Neural Network Gradient Sparsity Trail
- ReLumix: Extending Image Relighting to Video via Video Diffusion Models
- FedAgentBench: Towards Automating Real-world Federated Medical Image Analysis with Server-Client LLM Agents
- AISHELL6-whisper: A Chinese Mandarin Audio-visual Whisper Speech Dataset with Speech Recognition Baselines
- Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation
- Taught Well Learned Ill: Towards Distillation-conditional Backdoor Attack
- Interpreting deep learning-based stellar mass estimation via causal analysis and mutual information decomposition
- A University of Texas Medical Branch Case Study on Aortic Calcification Detection
- SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention
- GPS-MTM: Capturing Pattern of Normalcy in GPS-Trajectories with self-supervised learning
- End-to-end Topographic Auditory Models Replicate Signatures of Human Auditory Cortex
- AQUAIR: A High-Resolution Indoor Environmental Quality Dataset for Smart Aquaculture Monitoring
- Clebsch-Gordan Transformer: Fast and Global Equivariant Attention
- Mash, Spread, Slice! Learning to Manipulate Object States via Visual Spatial Progress
- Neural Visibility of Point Sets
- Semantic Editing with Coupled Stochastic Differential Equations
- Non-Invasive Detection of PROState Cancer with Novel Time-Dependent Diffusion MRI and AI-Enhanced Quantitative Radiological Interpretation: PROS-TD-AI
- PROFusion: Robust and Accurate Dense Reconstruction via Camera Pose Regression and Optimization
- Rethinking JEPA: Compute-Efficient Video SSL with Frozen Teachers
- ReCon-GS: Continuum-Preserved Gaussian Streaming for Fast and Compact Reconstruction of Dynamic Scenes
- TraitSpaces: Towards Interpretable Visual Creativity for Human-AI Co-Creation
- Wavelet-Assisted Mamba for Satellite-Derived Sea Surface Temperature Super-Resolution
- Hybrid Layer-Wise ANN-SNN With Surrogate Spike Encoding-Decoding Structure
- A Novel Preprocessing Unit for Effective Deep Learning based Classification and Grading of Diabetic Retinopathy
- SAIP: A Plug-and-Play Scale-adaptive Module in Diffusion-based Inverse Problems
- Discovering "Words" in Music: Unsupervised Learning of Compositional Sparse Code for Symbolic Music
- CEDex: Cross-Embodiment Dexterous Grasp Generation at Scale from Human-like Contact Representations
- A TRIANGLE Enables Multimodal Alignment Beyond Cosine Similarity
- VSSFlow: Unifying Video-conditioned Sound and Speech Generation via Joint Learning
- Of-SemWat: High-payload text embedding for semantic watermarking of AI-generated images with arbitrary size
- DRCP: Diffusion on Reinforced Cooperative Perception for Perceiving Beyond Limits
- Light-SQ: Structure-aware Shape Abstraction with Superquadrics for Generated Meshes
- Score-based Membership Inference on Diffusion Models
- Uncertainty-Aware Deep Learning for Wildfire Danger Forecasting
- AIRoA MoMa Dataset: A Large-Scale Hierarchical Dataset for Mobile Manipulation
- CharGen: Fast and Fluent Portrait Modification
- Unsupervised Representation Learning for 3D Mesh Parameterization with Semantic and Visibility Objectives
- MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech
- LayerD: Decomposing Raster Graphic Designs into Layers
- Vision-and-Language Navigation with Analogical Textual Descriptions in LLMs
- Learning to Infer Unseen Single-/Multi-Attribute-Object Compositions with Graph Networks
- TRIPS: Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection
- Self-Supervised Geometry-Guided Initialization for Robust Monocular Visual Odometry
- fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence
- Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis
- VT-FSL: Bridging Vision and Text with LLMs for Few-Shot Learning
- Fast Real-Time Pipeline for Robust Arm Gesture Recognition
- A Scalable Distributed Framework for Multimodal GigaVoxel Image Registration
- GEM: 3D Gaussian Splatting for Efficient and Accurate Cryo-EM Reconstruction
- BRIDGE - Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation
- UniLat3D: Geometry-Appearance Unified Latents for Single-Stage 3D Generation
- MANI-Pure: Magnitude-Adaptive Noise Injection for Adversarial Purification
- Triangle Splatting+: Differentiable Rendering with Opaque Triangles
- Score Distillation of Flow Matching Models
- TemMed-Bench: Evaluating Temporal Medical Image Reasoning in Vision-Language Models
- Fast Feature Field ($\text{F}^3$): A Predictive Representation of Events
- VideoAnchor: Reinforcing Subspace-Structured Visual Cues for Coherent Visual-Spatial Reasoning
- GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts
- Rolling Forcing: Autoregressive Long Video Diffusion in Real Time
- Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models
- YOLO26: Key Architectural Enhancements and Performance Benchmarking for Real-Time Object Detection
- Personalized Vision via Visual In-Context Learning
- Mitigating Hallucination in Multimodal LLMs with Layer Contrastive Decoding
- GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs
- DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space
- DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder
- PAD3R: Pose-Aware Dynamic 3D Reconstruction from Casual Videos
- PixelCraft: A Multi-Agent System for High-Fidelity Visual Reasoning on Structured Images
- FlashI2V: Fourier-Guided Latent Shifting Prevents Conditional Image Leakage in Image-to-Video Generation
- Visual Jigsaw Post-Training Improves MLLMs
- VGGT-X: When VGGT Meets Dense Novel View Synthesis
- Q2E: Query-to-Event Decomposition for Zero-Shot Multilingual Text-to-Video Retrieval
- YOLO-based Bearing Fault Diagnosis With Continuous Wavelet Transform
- VIRTUS-FPP: Virtual Sensor Modeling for Fringe Projection Profilometry in NVIDIA Isaac Sim
- ReSeFlow: Rectifying SE(3)-Equivariant Policy Learning Flows
- Explainable Deep Learning for Cataract Detection in Retinal Images: A Dual-Eye and Knowledge Distillation Approach
- Localizing Adversarial Attacks To Produce More Imperceptible Noise
- Achieving Fair Skin Lesion Detection through Skin Tone Normalization and Channel Pruning
- Responsible Diffusion: A Comprehensive Survey on Safety, Ethics, and Trust in Diffusion Models
- Consistency Models as Plug-and-Play Priors for Inverse Problems
- Self-driving cars: Are we there yet?
- Introducing Multimodal Paradigm for Learning Sleep Staging PSG via General-Purpose Model
- MonoCon: A general framework for learning ultra-compact high-fidelity representations using monotonicity constraints
- LLMs Behind the Scenes: Enabling Narrative Scene Illustration
- Robot Learning from Any Images
- ADAM: A Diverse Archive of Mankind for Evaluating and Enhancing LLMs in Biographical Reasoning
- UniPrototype: Human-Robot Skill Learning with Uniform Prototypes
- Leave No Observation Behind: Real-time Correction for VLA Action Chunks
- Robust Fine-Tuning from Non-Robust Pretrained Models: Mitigating Suboptimal Transfer With Adversarial Scheduling
- Targeted perturbations reveal brain-like local coding axes in robustified, but not standard, ANN-based brain models
- DiffTex: Differentiable Texturing for Architectural Proxy Models
- Graph Your Own Prompt
- CCD: Mitigating Hallucinations in Radiology MLLMs via Clinical Contrastive Decoding
- S$^3$F-Net: A Multi-Modal Approach to Medical Image Classification via Spatial-Spectral Summarizer Fusion Network
- Temporal Generalization: A Reality Check
- RAVEN: Resilient Aerial Navigation via Open-Set Semantic Memory and Behavior Adaptation
- Automated design of compound lenses with discrete-continuous optimization
- StolenLoRA: Exploring LoRA Extraction Attacks via Synthetic Data
- Perceive, Reflect and Understand Long Video: Progressive Multi-Granular Clue Exploration with Interactive Agents
- Evaluating Temperature Scaling Calibration Effectiveness for CNNs under Varying Noise Levels in Brain Tumour Detection
- Social 3D Scene Graphs: Modeling Human Actions and Relations for Interactive Service Robots
- Event-based Facial Keypoint Alignment via Cross-Modal Fusion Attention and Self-Supervised Multi-Event Representation Learning
- On-the-Fly Data Augmentation for Brain Tumor Segmentation
- Wan-Alpha: High-Quality Text-to-Video Generation with Alpha Channel
- SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation
- PanoWorld-X: Generating Explorable Panoramic Worlds via Sphere-Aware Video Diffusion
- LVT: Large-Scale Scene Reconstruction via Local View Transformers
- CLASP: Adaptive Spectral Clustering for Unsupervised Per-Image Segmentation
- GeoVLM-R1: Reinforcement Fine-Tuning for Improved Remote Sensing Reasoning
- STAGE: Stable and Generalizable GRPO for Autoregressive Image Generation
- TokenSwap: Backdoor Attack on the Compositional Understanding of Large Vision-Language Models
- SCOPE: Semantic Conditioning for Sim2Real Category-Level Object Pose Estimation in Robotics
- BFSM: 3D Bidirectional Face-Skull Morphable Model
- Comprehensive Benchmarking of YOLOv11 Architectures for Scalable and Granular Peripheral Blood Cell Detection
- Biomechanical-phase based Temporal Segmentation in Sports Videos: a Demonstration on Javelin-Throw
- FreeRet: MLLMs as Training-Free Retrievers
- Can you SPLICE it together? A Human Curated Benchmark for Probing Visual Reasoning in VLMs
- RIFLE: Removal of Image Flicker-Banding via Latent Diffusion Enhancement
- Learning Object-Centric Representations Based on Slots in Real World Scenarios
- VNODE: A Piecewise Continuous Volterra Neural Network
- Classifier-Centric Adaptive Framework for Open-Vocabulary Camouflaged Object Segmentation
- Traumatic Brain Injury Segmentation using an Ensemble of Encoder-decoder Models
- SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
- Enhancing Physical Plausibility in Video Generation by Reasoning the Implausibility
- IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?
- Evaluation of Polarimetric Fusion for Semantic Segmentation in Aquatic Environments
- Toward a Vision-Language Foundation Model for Medical Data: Multimodal Dataset and Benchmarks for Vietnamese PET/CT Report Generation
- Collaborating Vision, Depth, and Thermal Signals for Multi-Modal Tracking: Dataset and Algorithm
- ExGS: Extreme 3D Gaussian Compression with Diffusion Priors
- VTPerception-R1: Enhancing Multimodal Reasoning via Explicit Visual and Textual Perceptual Grounding
- SkyLink: Unifying Street-Satellite Geo-Localization via UAV-Mediated 3D Scene Alignment
- LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning
- Vision Function Layer in Multimodal LLMs
- Causal-Adapter: Taming Text-to-Image Diffusion for Faithful Counterfactual Generation
- TACO-Net: Topological Signatures Triumph in 3D Object Classification
- UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections
- Training-Free Token Pruning via Zeroth-Order Gradient Estimation in Vision-Language Models
- PHASE-Net: Physics-Grounded Harmonic Attention System for Efficient Remote Photoplethysmography Measurement
- ELPG-DTFS: Prior-Guided Adaptive Time-Frequency Graph Neural Network for EEG Depression Diagnosis
- Vision At Night: Exploring Biologically Inspired Preprocessing For Improved Robustness Via Color And Contrast Transformations
- StreamForest: Efficient Online Video Understanding with Persistent Event Memory
- Environment-Aware Satellite Image Generation with Diffusion Models
- ThermalGen: Style-Disentangled Flow-Based Generative Models for RGB-to-Thermal Image Translation
- Vehicle Classification under Extreme Imbalance: A Comparative Study of Ensemble Learning and CNNs
- MMRQA: Signal-Enhanced Multimodal Large Language Models for MRI Quality Assessment
- VAGUEGAN: Stealthy Poisoning and Backdoor Attacks on Image Generative Pipelines
- DWGS: Enhancing Sparse-View Gaussian Splatting with Hybrid-Loss Depth Estimation and Bidirectional Warping
- DAM: Dual Active Learning with Multimodal Foundation Model for Source-Free Domain Adaptation
- Accurate Cobb Angle Estimation via SVD-Based Curve Detection and Vertebral Wedging Quantification
- Attention Surgery: An Efficient Recipe to Linearize Your Video Diffusion Transformer
- OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing
- Learning Goal-Oriented Language-Guided Navigation with Self-Improving Demonstrations at Scale
- Segmentor-Guided Counterfactual Fine-Tuning for Image Synthesis
- Scalable GANs with Transformers
- Combining Discrepancy-Confusion Uncertainty and Calibration Diversity for Active Fine-Grained Image Classification
- Tumor Synthesis conditioned on Radiomics
- Simulating Post-Neoadjuvant Chemotherapy Breast Cancer MRI via Diffusion Model with Prompt Tuning
- Talk in Pieces, See in Whole: Disentangling and Hierarchical Aggregating Representations for Language-based Object Detection
- An Efficient 3D Latent Diffusion Model for T1-contrast Enhanced MRI Generation
- UniVid: The Open-Source Unified Video Model
- BALR-SAM: Boundary-Aware Low-Rank Adaptation of SAM for Resource-Efficient Medical Image Segmentation
- Forge4D: Feed-Forward 4D Human Reconstruction and Interpolation from Uncalibrated Sparse-view Videos
- Scalable Audio-Visual Masked Autoencoders for Efficient Affective Video Facial Analysis
- EVLF-FM: Explainable Vision Language Foundation Model for Medicine
- FreeAction: Training-Free Techniques for Enhanced Fidelity of Trajectory-to-Video Generation
- Latent Visual Reasoning
- When MLLMs Meet Compression Distortion: A Coding Paradigm Tailored to MLLMs
- S$^2$NN: Sub-bit Spiking Neural Networks
- Cycle Diffusion Model for Counterfactual Image Generation
- Skeleton-based Robust Registration Framework for Corrupted 3D Point Clouds
- Robust Partial 3D Point Cloud Registration via Confidence Estimation under Global Context
- ASIA: Adaptive 3D Segmentation using Few Image Annotations
- SVGThinker: Instruction-Aligned and Reasoning-Driven Text-to-SVG Generation
- FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting
- OMeGa: Joint Optimization of Explicit Meshes and Gaussian Splats for Robust Scene-Level Surface Reconstruction
- Towards Foundation Models for Cryo-ET Subtomogram Analysis
- Similarity-Aware Selective State-Space Modeling for Semantic Correspondence
- TP-MVCC: Tri-plane Multi-view Fusion Model for Silkie Chicken Counting
- Hyperspherical Latents Improve Continuous-Token Autoregressive Generation
- Dynamic Orchestration of Multi-Agent System for Real-World Multi-Image Agricultural VQA
- NeRV-Diffusion: Diffuse Implicit Neural Representations for Video Synthesis
- An Enhanced Pyramid Feature Network Based on Long-Range Dependencies for Multi-Organ Medical Image Segmentation
- DRIFT: Divergent Response in Filtered Transformations for Robust Adversarial Defense
- UI-UG: A Unified MLLM for UI Understanding and Generation
- Uni-X: Mitigating Modality Conflict with a Two-End-Separated Architecture for Unified Multimodal Models
- Real-Aware Residual Model Merging for Deepfake Detection
- From Satellite to Street: A Hybrid Framework Integrating Stable Diffusion and PanoGAN for Consistent Cross-View Synthesis
- DINOReg: Strong Point Cloud Registration with Vision Foundation Model
- Mask Clustering-based Annotation Engine for Large-Scale Submeter Land Cover Mapping
- REALIGN: Regularized Procedure Alignment with Matching Video Embeddings via Partial Gromov-Wasserstein Optimal Transport
- Vid-LLM: A Compact Video-based 3D Multimodal LLM with Reconstruction-Reasoning Synergy
- PCICF: A Pedestrian Crossing Identification and Classification Framework
- RapidMV: Leveraging Spatio-Angular Representations for Efficient and Consistent Text-to-Multi-View Synthesis
- CLQ: Cross-Layer Guided Orthogonal-based Quantization for Diffusion Transformers
- A Data-Centric Perspective on the Influence of Image Data Quality in Machine Learning Models
- Proxy-GS: Efficient 3D Gaussian Splatting via Proxy Mesh
- Rethinking Unsupervised Cross-modal Flow Estimation: Learning from Decoupled Optimization and Consistency Constraint
- UI2V-Bench: An Understanding-based Image-to-video Generation Benchmark
- NeoWorld: Neural Simulation of Explorable Virtual Worlds via Progressive 3D Unfolding
- Beyond Isolated Facts: Synthesizing Narrative and Grounded Supervision for VideoQA
- Generalist Multi-Class Anomaly Detection via Distillation to Two Heterogeneous Student Networks
- LaMoGen: Laban Movement-Guided Diffusion for Text-to-Motion Generation
- Euclid's Gift: Enhancing Spatial Perception and Reasoning in Vision-Language Models via Geometric Surrogate Tasks
- Performance-Efficiency Trade-off for Fashion Image Retrieval
- Mitigating Visual Hallucinations via Semantic Curriculum Preference Optimization in MLLMs
- Robust Multimodal Semantic Segmentation with Balanced Modality Contributions
- Instruction Guided Multi Object Image Editing with Quantity and Layout Consistency
- CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map Models
- CORE-3D: Context-aware Open-vocabulary Retrieval by Embeddings in 3D
- Diffusion Bridge or Flow Matching? A Unifying Framework and Comparative Analysis
- Foggy Crowd Counting: Combining Physical Priors and KAN-Graph
- NeMo: Needle in a Montage for Video-Language Understanding
- Assessing Visual Privacy Risks in Multimodal AI: A Novel Taxonomy-Grounded Evaluation of Vision-Language Models
- Uni4D-LLM: A Unified SpatioTemporal-Aware VLM for 4D Understanding and Generation
- 2nd Place Report of MOSEv2 Challenge 2025: Concept Guided Video Object Segmentation via SeC
- Towards Fine-Grained Text-to-3D Quality Assessment: A Benchmark and A Two-Stage Rank-Learning Metric
- CE-FAM: Concept-Based Explanation via Fusion of Activation Maps
- FairViT-GAN: A Hybrid Vision Transformer with Adversarial Debiasing for Fair and Explainable Facial Beauty Prediction
- Sim-DETR: Unlock DETR for Temporal Sentence Grounding
- Not All Tokens are Guided Equal: Improving Guidance in Visual Autoregressive Models
- PCRI: Measuring Context Robustness in Multimodal Models for Enterprise Applications
- Learning Adaptive Pseudo-Label Selection for Semi-Supervised 3D Object Detection
- Tunable-Generalization Diffusion Powered by Self-Supervised Contextual Sub-Data for Low-Dose CT Reconstruction
- AssemblyHands-X: Modeling 3D Hand-Body Coordination for Understanding Bimanual Human Activities
- LifeCLEF Plant Identification Task 2015
- Preserving Cross-Modal Stability for Visual Unlearning in Multimodal Scenarios
- Q-FSRU: Quantum-Augmented Frequency-Spectral For Medical Visual Question Answering
- LifeCLEF Plant Identification Task 2014
- EWC-Guided Diffusion Replay for Exemplar-Free Continual Learning in Medical Imaging
- Adversarial Versus Federated: An Adversarial Learning based Multi-Modality Cross-Domain Federated Medical Segmentation
- EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
- MoReact: Generating Reactive Motion from Textual Descriptions
- Revisit the Imbalance Optimization in Multi-task Learning: An Experimental Analysis
- Bridging the Task Gap: Multi-Task Adversarial Transferability in CLIP and Its Derivatives
- Token Painter: Training-Free Text-Guided Image Inpainting via Mask Autoregressive Models
- DriveE2E: Closed-Loop Benchmark for End-to-End Autonomous Driving through Real-to-Simulation
- Learning Encoding-Decoding Direction Pairs to Unveil Concepts of Influence in Deep Vision Networks
- SAR-KnowLIP: Towards Multimodal Foundation Models for Remote Sensing
- AutoPrune: Each Complexity Deserves a Pruning Policy
- CrashSplat: 2D to 3D Vehicle Damage Segmentation in Gaussian Splatting
- HunyuanImage 3.0 Technical Report
- ColLab: A Collaborative Spatial Progressive Data Engine for Referring Expression Comprehension and Generation
- Reinforcement Learning with Inverse Rewards for World Model Post-training
- A Novel Hybrid Deep Learning and Chaotic Dynamics Approach for Thyroid Cancer Classification
- VFSI: Validity First Spatial Intelligence for Constraint-Guided Traffic Diffusion
- Towards Redundancy Reduction in Diffusion Models for Efficient Video Super-Resolution
- RPG360: Robust 360 Depth Estimation with Perspective Foundation Models and Graph Optimization
- Advancing Multi-agent Traffic Simulation via R1-Style Reinforcement Fine-Tuning
- TREAT-Net: Tabular-Referenced Echocardiography Analysis for Acute Coronary Syndrome Treatment Prediction
- Gaze Estimation for Human-Robot Interaction: Analysis Using the NICO Platform
- SIE3D: Single-image Expressive 3D Avatar generation via Semantic Embedding and Perceptual Expression Loss
- FrameMind: Frame-Interleaved Chain-of-Thought for Video Reasoning via Reinforcement Learning
- Generalized Category Discovery in Hyperspectral Images via Prototype Subspace Modeling
- Hazy Pedestrian Trajectory Prediction via Physical Priors and Graph-Mamba
- $\mathbf{R}^3$: Reconstruction, Raw, and Rain: Deraining Directly in the Bayer Domain
- Joint Superpixel and Self-Representation Learning for Scalable Hyperspectral Image Clustering
- A Second-Order Perspective on Pruning at Initialization and Knowledge Transfer
- Uncovering Grounding IDs: How External Cues Shape Multi-Modal Binding
- Autoregressive Video Generation beyond Next Frames Prediction
- Unified Multi-Modal Interactive & Reactive 3D Motion Generation via Rectified Flow
- SVAC: Scaling Is All You Need For Referring Video Object Segmentation
- GANji: A Framework for Introductory AI Image Generation
- Generalist Scanner Meets Specialist Locator: A Synergistic Coarse-to-Fine Framework for Robust GUI Grounding
- EYE-DEX: Eye Disease Detection and EXplanation System
- Analysis of Bias in Deep Learning Facial Beauty Regressors
- Asymmetric VAE for One-Step Video Super-Resolution Acceleration
- Accelerating Cerebral Diagnostics with BrainFusion: A Comprehensive MRI Tumor Framework
- LatXGen: Towards Radiation-Free and Accurate Quantitative Analysis of Sagittal Spinal Alignment Via Cross-Modal Radiographic View Synthesis
- High-Order Progressive Trajectory Matching for Medical Image Dataset Distillation
- Evaluating point-light biological motion in multimodal large language models
- Imaging-Based Mortality Prediction in Patients with Systemic Sclerosis
- Calibrated and Resource-Aware Super-Resolution for Reliable Driver Behavior Analysis
- OVSeg3R: Learn Open-vocabulary Instance Segmentation from 2D via 3D Reconstruction
- From Fields to Splats: A Cross-Domain Survey of Real-Time Neural Scene Representations
- Pancreas Part Segmentation under Federated Learning Paradigm
- Towards Interpretable Visual Decoding with Attention to Brain Representations
- RobuQ: Pushing DiTs to W1.58A2 via Robust Activation Quantization
- VividFace: High-Quality and Efficient One-Step Diffusion For Video Face Enhancement
- Multi-Level Heterogeneous Knowledge Transfer Network on Forward Scattering Center Model for Limited Samples SAR ATR
- VAMamba: An Efficient Visual Adaptive Mamba for Image Restoration
- Deep Taxonomic Networks for Unsupervised Hierarchical Prototype Discovery
- MAN: Latent Diffusion Enhanced Multistage Anti-Noise Network for Efficient and High-Quality Low-Dose CT Image Denoising
- VMDiff: Visual Mixing Diffusion for Limitless Cross-Object Synthesis
- FlowLUT: Efficient Image Enhancement via Differentiable LUTs and Iterative Flow Matching
- InteractMove: Text-Controlled Human-Object Interaction Generation in 3D Scenes with Movable Objects
- BioVessel-Net and RetinaMix: Unsupervised Retinal Vessel Segmentation from OCTA Images
- DiffInk: Glyph- and Style-Aware Latent Diffusion Transformer for Text to Online Handwriting Generation
- RIV: Recursive Introspection Mask Diffusion Vision Language Model
- Efficient Domain-Adaptive Multi-Task Dense Prediction with Vision Foundation Models
- MotionVerse: A Unified Multimodal Framework for Motion Comprehension, Generation and Editing
- LightFair: Towards an Efficient Alternative for Fair T2I Diffusion via Debiasing Pre-trained Text Encoders
- EfficientMIL: Efficient Linear-Complexity MIL Method for WSI Classification
- From Static to Dynamic: a Survey of Topology-Aware Perception in Autonomous Driving
- Griffin: Generative Reference and Layout Guided Image Composition
- Sparse-Up: Learnable Sparse Upsampling for 3D Generation with High-Fidelity Textures
- Color-Pair Guided Robust Zero-Shot 6D Pose Estimation and Tracking of Cluttered Objects on Edge Devices
- ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis
- LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training
- HIVTP: A Training-Free Method to Improve VLMs Efficiency via Hierarchical Visual Token Pruning Using Middle-Layer-Based Importance Score
- Token Merging via Spatiotemporal Information Mining for Surgical Video Understanding
- RCI: A Score for Evaluating Global and Local Reasoning in Multimodal Benchmarks
- MSD-KMamba: Bidirectional Spatial-Aware Multi-Modal 3D Brain Segmentation via Multi-scale Self-Distilled Fusion Strategy
- QuantSparse: Comprehensively Compressing Video Diffusion Transformer with Model Quantization and Attention Sparsification
- HomeSafeBench: A Benchmark for Embodied Vision-Language Models in Free-Exploration Home Safety Inspection
- Confidence Aware SSD Ensemble with Weighted Boxes Fusion for Weapon Detection
- INSTINCT: Instance-Level Interaction Architecture for Query-Based Collaborative Perception
- CrimEdit: Controllable Editing for Counterfactual Object Removal, Insertion, and Movement
- PD-Diag-Net: Clinical-Priors guided Network on Brain MRI for Auxiliary Diagnosis of Parkinson's Disease
- DiffPCN: Latent Diffusion Model Based on Multi-view Depth Images for Point Cloud Completion
- Video Panels for Long Video Understanding
- M3DLayout: A Multi-Source Dataset of 3D Indoor Layouts and Structured Descriptions for 3D Generation
- LUQ: Layerwise Ultra-Low Bit Quantization for Multimodal Large Language Models
- FastViDAR: Real-Time Omnidirectional Depth Estimation via Alternative Hierarchical Attention
- HieraTok: Multi-Scale Visual Tokenizer Improves Image Reconstruction and Generation
- GRS-SLAM3R: Real-Time Dense SLAM with Gated Recurrent State
- ResAD++: Towards Class Agnostic Anomaly Detection via Residual Feature Learning
- Poivre: Self-Refining Visual Pointing with Reinforcement Learning
- PVTAdpNet: Polyp Segmentation using Pyramid vision transformer with a novel Adapter block
- UniAlignment: Semantic Alignment for Unified Image Generation, Understanding, Manipulation and Perception
- GenView++: Unifying Adaptive View Generation and Quality-Driven Supervision for Contrastive Representation Learning
- A Modality-Tailored Graph Modeling Framework for Urban Region Representation via Contrastive Learning
- Texture Vector-Quantization and Reconstruction Aware Prediction for Generative Super-Resolution
- GroupCoOp: Group-robust Fine-tuning via Group Prompt Learning
- From Unstable to Playable: Stabilizing Angry Birds Levels via Object Segmentation
- Controllable Generation of Large-Scale 3D Urban Layouts with Semantic and Structural Guidance
- A Multi-Camera Vision-Based Approach for Fine-Grained Assembly Quality Control
- Real-World Transferable Adversarial Attack on Face-Recognition Systems
- UltraUNet: Real-Time Ultrasound Tongue Segmentation for Diverse Linguistic and Imaging Conditions
- Patch Rebirth: Toward Fast and Transferable Model Inversion of Vision Transformers
- Self-Consistency as a Free Lunch: Reducing Hallucinations in Vision-Language Models via Self-Reflection
- TATTOO: Training-free AesTheTic-aware Outfit recOmmendation
- Increasing the Diversity in RGB-to-Thermal Image Translation for Automotive Applications
- LiDAR-based Human Activity Recognition through Laplacian Spectral Analysis
- OracleGS: Grounding Generative Priors for Sparse-View Gaussian Splatting
- Learning Regional Monsoon Patterns with a Multimodal Attention U-Net
- SynDoc: A Hybrid Discriminative-Generative Framework for Enhancing Synthetic Domain-Adaptive Document Key Information Extraction
- Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing
- Seeing Through the Blur: Unlocking Defocus Maps for Deepfake Detection
- Seeing the Unseen in Low-light Spike Streams
- Balanced Diffusion-Guided Fusion for Multimodal Remote Sensing Classification
- Seeing Symbols, Missing Cultures: Probing Vision-Language Models' Reasoning on Fire Imagery and Cultural Meaning
- C3-OWD: A Curriculum Cross-modal Contrastive Learning Framework for Open-World Detection
- Spatial-Spectral Binarized Neural Network for Panchromatic and Multi-spectral Images Fusion
- Decoupling Reasoning and Perception: An LLM-LMM Framework for Faithful Visual Reasoning
- DDP: Dual-Decoupled Prompting for Multi-Label Class-Incremental Learning
- LRPO: Enhancing Blind Face Restoration through Online Reinforcement Learning
- DentVLM: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice
- Dynamic-TreeRPO: Breaking the Independent Trajectory Bottleneck with Structured Sampling
- Test-time Uncertainty Estimation for Medical Image Registration via Transformation Equivariance
- GRAPE: Let GPRO Supervise Query Rewriting by Ranking for Retrieval
- CasPoinTr: Point Cloud Completion with Cascaded Networks and Knowledge Distillation
- UniPose: Unified Cross-modality Pose Prior Propagation towards RGB-D data for Weakly Supervised 3D Human Pose Estimation
- Generative Modeling of Shape-Dependent Self-Contact Human Poses
- WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving
- Enhanced Fracture Diagnosis Based on Critical Regional and Scale Aware in YOLO
- FracDetNet: Advanced Fracture Detection via Dual-Focus Attention and Multi-scale Calibration in Medical X-ray Imaging
- SPIKE-RL: Video-LLMs meet Bayesian Surprise
- FM-SIREN & FM-FINER: Nyquist-Informed Frequency Multiplier for Implicit Neural Representation with Periodic Activation
- FoR-SALE: Frame of Reference-guided Spatial Adjustment in LLM-based Diffusion Editing
- 3DPCNet: Pose Canonicalization for Robust Viewpoint-Invariant 3D Kinematic Analysis from Monocular RGB cameras
- No Concept Left Behind: Test-Time Optimization for Compositional Text-to-Image Generation
- Robust Multi-Modal Face Anti-Spoofing with Domain Adaptation: Tackling Missing Modalities, Noisy Pseudo-Labels, and Model Degradation
- RestoRect: Degraded Image Restoration via Latent Rectified Flow & Feature Distillation
- Orientation-anchored Hyper-Gaussian for 4D Reconstruction from Casual Videos
- Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional
- Enhancing Polyp Segmentation via Encoder Attention and Dynamic Kernel Update
- Mask What Matters: Controllable Text-Guided Masking for Self-Supervised Medical Image Analysis
- FMC-DETR: Frequency-Decoupled Multi-Domain Coordination for Aerial-View Object Detection
- Follow-Your-Preference: Towards Preference-Aligned Image Inpainting
- Streamline pathology foundation model by cross-magnification distillation
- CoPatch: Zero-Shot Referring Image Segmentation by Leveraging Untapped Spatial Knowledge in CLIP
- Deep Learning for Oral Health: Benchmarking ViT, DeiT, BEiT, ConvNeXt, and Swin Transformer
- HTMA-Net: Towards Multiplication-Avoiding Neural Networks via Hadamard Transform and In-Memory Computing
- Towards Comprehensive Interactive Change Understanding in Remote Sensing: A Large-scale Dataset and Dual-granularity Enhanced VLM
- Stochastic Interpolants via Conditional Dependent Coupling
- Benchmarking DINOv3 for Multi-Task Stroke Analysis on Non-Contrast CT
- Earth-Agent: Unlocking the Full Landscape of Earth Observation with Agents
- WeatherCycle: Unpaired Multi-Weather Restoration via Color Space Decoupled Cycle Learning
- Sparse2Dense: A Keypoint-driven Generative Framework for Human Video Compression and Vertex Prediction
- TRAX: TRacking Axles for Accurate Axle Count Estimation
- Confidence-Calibrating Regularization for Robust Brain MRI Segmentation Under Domain Shift
- Unsupervised Online 3D Instance Segmentation with Synthetic Sequences and Dynamic Loss
- Pathological Truth Bias in Vision-Language Models
- Scale and Rotation Estimation of Similarity-Transformed Images via Cross-Correlation Maximization Based on Auxiliary Function Method
- Robust Object Detection for Autonomous Driving via Curriculum-Guided Group Relative Policy Optimization
- Graph-Theoretic Consistency for Robust and Topology-Aware Semi-Supervised Histopathology Segmentation
- A review of Recent Techniques for Person Re-Identification
- Sequential Token Merging: Revisiting Hidden States
- Deep Learning Empowered Super-Resolution: A Comprehensive Survey and Future Prospects
- Learning Hyperspectral Images with Curated Text Prompts for Efficient Multimodal Alignment
- Global Prompt Refinement with Non-Interfering Attention Masking for One-Shot Federated Learning
- GZSL-MoE: Apprentissage Généralisé Zéro-Shot basé sur le Mélange d'Experts pour la Segmentation Sémantique de Nuages de Points 3D Appliqué à un Jeu de Données d'Environnement de Collaboration Humain-Robot
- IBiT: Utilizing Inductive Biases to Create a More Data Efficient Attention Mechanism
- LayoutAgent: A Vision-Language Agent Guided Compositional Diffusion for Spatial Layout Planning
- CompareBench: A Benchmark for Visual Comparison Reasoning in Vision-Language Models
- MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning
- UESA-Net: U-Shaped Embedded Multidirectional Shrinkage Attention Network for Ultrasound Nodule Segmentation
- PartCo: Part-Level Correspondence Priors Enhance Category Discovery
- DEFT: Decompositional Efficient Fine-Tuning for Text-to-Image Models
- VideoScore2: Think before You Score in Generative Video Evaluation
- TRUST: Test-Time Refinement using Uncertainty-Guided SSM Traverses
- MMPB: It's Time for Multi-Modal Personalization
- Seeing Isn't Believing: Context-Aware Adversarial Patch Synthesis via Conditional GAN
- Learning Temporal Saliency for Time Series Forecasting with Cross-Scale Attention
- Multimodal Slice Interaction Network Enhanced by Transfer Learning for Precise Segmentation of Internal Gross Tumor Volume in Lung Cancer PET/CT Imaging
- ControlEvents: Controllable Synthesis of Event Camera Data with Foundational Prior from Image Diffusion Models
- Learning KAN-based Implicit Neural Representations for Deformable Image Registration
- Convolutional Set Transformer
- TY-RIST: Tactical YOLO Tricks for Real-time Infrared Small Target Detection
- Learning Unified Representation of 3D Gaussian Splatting
- Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings
- FishAI 2.0: Marine Fish Image Classification with Multi-modal Few-shot Learning
- Brain Tumor Classification from MRI Scans via Transfer Learning and Enhanced Feature Representation
- Hemorica: A Comprehensive CT Scan Dataset for Automated Brain Hemorrhage Classification, Segmentation, and Detection
- ARSS: Taming Decoder-only Autoregressive Visual Generation for View Synthesis From Single View
- Disentangling Static and Dynamic Information for Reducing Static Bias in Action Recognition
- Desensitizing for Improving Corruption Robustness in Point Cloud Classification through Adversarial Training
- Geometry-Aware Losses for Structure-Preserving Text-to-Sign Language Generation
- Planning with Unified Multimodal Models
- Copyright Infringement Detection in Text-to-Image Diffusion Models via Differential Privacy
- Perceptual Influence: Improving the Perceptual Loss Design for Low-Dose CT Enhancement
- Sensor-Adaptive Flood Mapping with Pre-trained Multi-Modal Transformers across SAR and Multispectral Modalities
- GeLoc3r: Enhancing Relative Camera Pose Regression with Geometric Consistency Regularization
- MMeViT: Multi-Modal ensemble ViT for Post-Stroke Rehabilitation Action Recognition
- Activation Matching for Explanation Generation
- InfoDet: A Dataset for Infographic Element Detection
- On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning
- Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models
- Reward Model Overoptimisation in Iterated RLHF
- TabularGSM: Understanding the Limitations of LLMs in Tabular Math Reasoning
- HS-STaR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation
- Cross-modal RAG: Sub-dimensional Text-to-Image Retrieval-Augmented Generation
- ProxyThinker: Test-Time Guidance through Small Visual Reasoners
- Comba: Improving Bilinear RNNs with Closed-loop Control
- VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
- InstructPro: Natural Language Guided Ligand-Binding Protein Design
- One Patient, Many Contexts: Scaling Medical AI with Contextual Intelligence
- Beyond Jailbreaking: Auditing Contextual Privacy in LLM Agents
- Discrete Audio Tokens: More Than a Survey!
- Enhancing Delta Compression in LLMs via SVD-based Quantization Error Minimization
- OmniGen2: Exploration to Advanced Multimodal Generation
- Overcoming Long-Context Limitations of State-Space Models via Context-Dependent Sparse Attention
- One Token to Fool LLM-as-a-Judge
- MetaLint: Generalizable Idiomatic Code Quality Analysis through Instruction-Following and Easy-to-Hard Generalization
- Probabilistic Soundness Guarantees in LLM Reasoning Chains
- A Markov Categorical Framework for Language Modeling
- Can Language Models Discover Scaling Laws?
- CADDesigner: Conceptual Design of CAD Models Based on General-Purpose Agent
- Trainable Dynamic Mask Sparse Attention
- AttriLens-Mol: Attribute Guided Reinforcement Learning for Molecular Property Prediction with Large Language Models
- Attention Layers Add Into Low-Dimensional Residual Subspaces
- Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
- Hierarchical Task Environments as the Next Frontier for Embodied World Models in Robot Soccer
- FuseCodec: Semantic-Contextual Fusion and Supervision for Neural Codecs
- TDRM: Smooth Reward Models with Temporal Difference for LLM RL and Inference
- Patterns in the Transition From Founder-Leadership to Community Governance of Open Source
- PiERN: Token-Level Routing for Integrating High-Precision Computation and Reasoning
- The Thinking Spectrum: An Empirical Study of Tunable Reasoning in LLMs through Model Merging
- Reasoning Isn't Enough: Examining Truth-Bias and Sycophancy in LLMs
- Don't Trust Generative Agents to Mimic Communication on Social Networks Unless You Benchmarked their Empirical Realism
- Semantic-guided Diverse Decoding for Large Language Model
- Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer
- PRIME: Large Language Model Personalization with Cognitive Dual-Memory and Personalized Thought Process
- CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering
- ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation
- Entropy-Memorization Law: Evaluating Memorization Difficulty of Data in LLMs
- Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition
- Making Language Model a Hierarchical Classifier
- LionGuard 2: Building Lightweight, Data-Efficient & Localised Multilingual Content Moderators
- The Ever-Evolving Science Exam
- CTTS: Collective Test-Time Scaling
- Discerning minds or generic tutors? Evaluating instructional guidance capabilities in Socratic LLMs
- Coarse-to-Fine Personalized LLM Impressions for Streamlined Radiology Reports
- Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
- CORE-RAG: Lossless Compression for Retrieval-Augmented LLMs via Reinforcement Learning
- Automatic Question & Answer Generation Using Generative Large Language Model (LLM)
- When Thinking Backfires: Mechanistic Insights Into Reasoning-Induced Misalignment
- Causal Attention with Lookahead Keys
- ATTS: Asynchronous Test-Time Scaling via Conformal Prediction
- Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning
- Agentic Reinforcement Learning with Implicit Step Rewards
- Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration
- Position: Towards Bidirectional Human-AI Alignment
- A Voter-Based Stochastic Rejection-Method Framework for Asymptotically Safe Language Model Outputs
- NextLocLLM: Location Semantics Modeling and Coordinate-Based Next Location Prediction with LLMs
- Similarity-Dissimilarity Loss for Multi-label Supervised Contrastive Learning
- CoT-TL: Low-Resource Temporal Knowledge Representation of Planning Instructions Using Chain-of-Thought Reasoning
- A Neurosymbolic Fast and Slow Architecture for Graph Coloring
- Sometimes I am a Tree: Data Drives Unstable Hierarchical Generalization
- Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration
- Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval
- Mind the Value-Action Gap: Do LLMs Act in Alignment with Their Values?
- vCache: Verified Semantic Prompt Caching
- SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors
- Mitigating Barren Plateaus in Quantum Neural Networks via an AI-Driven Submartingale-Based Framework
- Reasoning to Learn from Latent Thoughts
- MaintainCoder: Maintainable Code Generation Under Dynamic Requirements
- Do Larger Language Models Generalize Better? A Scaling Law for Implicit Reasoning at Pretraining Time
- Visual Planning: Let's Think Only with Images
- Signal in the Noise: Polysemantic Interference Transfers and Predicts Cross-Model Influence
- AgentThink: A Unified Framework for Tool-Augmented Chain-of-Thought Reasoning in Vision-Language Models for Autonomous Driving
- Better Safe Than Sorry? Overreaction Problem of Vision Language Models in Visual Emergency Recognition
- OViP: Online Vision-Language Preference Learning for VLM Hallucination
- AdaSTaR: Adaptive Data Sampling for Training Self-Taught Reasoners
- DEL-ToM: Inference-Time Scaling for Theory-of-Mind Reasoning via Dynamic Epistemic Logic
- LLMs Are In-Context Bandit Reinforcement Learners
- AERA Chat: An Interactive Platform for Automated Explainable Student Answer Assessment
- DM-Codec: Distilling Multimodal Representations for Speech Tokenization
- When Speculation Spills Secrets: Side Channels via Speculative Decoding In LLMs
- Adapting Chat Language Models Using Only Target Unlabeled Language Data
- LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation
- A Partition Cover Approach to Tokenization
- A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models
- ESGSenticNet: A Neurosymbolic Knowledge Base for Corporate Sustainability Analysis
- Beyond checkmate: exploring the creative chokepoints in AI text
- Which Words Matter Most in Zero-Shot Prompts?
- UltraIF: Advancing Instruction Following from the Wild
- Group-Adaptive Threshold Optimization for Robust AI-Generated Text Detection
- Confidence Improves Self-Consistency in LLMs
- PAFT: Prompt-Agnostic Fine-Tuning
- B-cos LM: Efficiently Transforming Pre-trained Language Models for Improved Explainability
- PropXplain: Can LLMs Enable Explainable Propaganda Detection?
- MemeIntel: Explainable Detection of Propagandistic and Hateful Memes
- Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time
- How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation
- Adaptive Group Policy Optimization: Towards Stable Training and Token-Efficient Reasoning
- SUV: Scalable Large Language Model Copyright Compliance with Regularized Selective Unlearning
- XL-Suite: Cross-Lingual Synthetic Training and Evaluation Data for Open-Ended Generation
- SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching
- AnesSuite: A Comprehensive Benchmark and Dataset Suite for Anesthesiology Reasoning in LLMs
- A Practical Synthesis of Detecting AI-Generated Textual, Visual, and Audio Content
- Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs
- DataPuzzle: Breaking Free from the Hallucinated Promise of LLMs in Data Analysis
- Efficient Reasoning Models: A Survey
- IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property
- Dynamic Early Exit in Reasoning Models
- TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation
- Cooking Up Creativity: Enhancing LLM Creativity through Structured Recombination
- $\textit{New News}$: System-2 Fine-tuning for Robust Integration of New Knowledge
- References Indeed Matter? Reference-Free Preference Optimization for Conversational Query Reformulation
- OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit
- VeriFact: Enhancing Long-Form Factuality Evaluation with Refined Fact Extraction and Reference Facts
- Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
- The Counting Power of Transformers
- Critique-Guided Distillation for Efficient and Robust Language Model Reasoning
- AdaBoN: Adaptive Best-of-N Alignment
- MobileIPL: Enhancing Mobile Agents Thinking Process via Iterative Preference Learning
- Automatically Advancing LLM Expertise in Technology Judgment
- Is Active Persona Inference Necessary for Aligning Small Models to Personal Preferences?
- Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
- Mechanistic Fine-tuning for In-context Learning
- Internal Chain-of-Thought: Empirical Evidence for Layer-wise Subtask Scheduling in LLMs
- Language Models Optimized to Fool Detectors Still Have a Distinct Style (And How to Change It)
- ReflAct: World-Grounded Decision Making in LLM Agents via Goal-State Reflection
- Multilingual Prompting for Improving LLM Generation Diversity
- Generalizable Process Reward Models via Formally Verified Training Data
- Leveraging Online Data to Enhance Medical Knowledge in a Small Persian Language Model
- Align-GRAG: Reasoning-Guided Dual Alignment for Graph Retrieval-Augmented Generation
- ToDi: Token-wise Distillation via Fine-Grained Divergence Control
- Nested Named Entity Recognition as Single-Pass Sequence Labeling
- A Survey on Stereotype Detection in Natural Language Processing
- BRIT: Bidirectional Retrieval over Unified Image-Text Graph
- MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems
- A Necessary Step toward Faithfulness: Measuring and Improving Consistency in Free-Text Explanations
- From Single to Multi-Granularity: Toward Long-Term Memory Association and Selection of Conversational Agents
- TrojanStego: Your Language Model Can Secretly Be A Steganographic Privacy Leaking Agent
- Long Context Scaling: Divide and Conquer via Multi-Agent Question-driven Collaboration
- SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences
- Evaluating and Steering Modality Preferences in Multimodal Large Language Model
- Pearl: A Multimodal Culturally-Aware Arabic Instruction Dataset
- Semi-structured LLM Reasoners Can Be Rigorously Audited
- Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs
- Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation
- Answer Convergence as a Signal for Early Stopping in Reasoning
- Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs
- Improving LLM Reasoning through Interpretable Role-Playing Steering
- What Do Indonesians Really Need from Language Technology? A Nationwide Survey
- AdversariaL attacK sAfety aLIgnment (ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement - Introducing Adversarial Vulnerability Quality Index (AVQI)
- Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking
- Curriculum-Guided Layer Scaling for Language Model Pretraining
- Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index
- BOW: Reinforcement Learning for Bottlenecked Next Word Prediction
- Long-Context Generalization with Sparse Attention
- GRAF: Multi-turn Jailbreaking via Global Refinement and Active Fabrication
- Beyond Isolated Facts: Synthesizing Narrative and Grounded Supervision for VideoQA
- Euclid's Gift: Enhancing Spatial Perception and Reasoning in Vision-Language Models via Geometric Surrogate Tasks
- Experience-guided reflective co-evolution of prompts and heuristics for automatic algorithm design
- LEAF: A Robust Expert-Based Framework for Few-Shot Continual Event Detection
- NeMo: Needle in a Montage for Video-Language Understanding
- OrthAlign: Orthogonal Subspace Decomposition for Non-Interfering Multi-Objective Alignment
- On the Self-awareness of Large Reasoning Models' Capability Boundaries
- VSSFlow: Unifying Video-conditioned Sound and Speech Generation via Joint Learning
- Pushing LLMs to Their Logical Reasoning Bound: The Role of Data Reasoning Intensity
- Retro*: Optimizing LLMs for Reasoning-Intensive Document Retrieval
- MMRQA: Signal-Enhanced Multimodal Large Language Models for MRI Quality Assessment
- Neural network embeddings recover value dimensions from psychometric survey items on par with human data
- MASLegalBench: Benchmarking Multi-Agent Systems in Deductive Legal Reasoning
- When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training
- DiffTester: Accelerating Unit Test Generation for Diffusion LLMs via Repetitive Pattern
- Learning from Convenience Samples: A Case Study on Fine-Tuning LLMs for Survey Non-response in the German Longitudinal Election Study
- Scaling with Collapse: Efficient and Predictable Training of LLM Families
- ORPO-Distill: Mixed-Policy Preference Optimization for Cross-Architecture LLM Distillation
- From $f(x)$ and $g(x)$ to $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones
- MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech
- Rethinking Entropy Regularization in Large Reasoning Models
- The Era of Real-World Human Interaction: RL from User Conversations
- ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
- TemMed-Bench: Evaluating Temporal Medical Image Reasoning in Vision-Language Models
- GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts
- SIRI: Scaling Iterative Reinforcement Learning with Interleaved Compression
- WordAlchemy: A transformer-based Reverse Dictionary
- Continual Dialogue State Tracking via Example-Guided Question Answering
- CGELBank Annotation Manual v1.2
- Machines Do See Color: A Guideline to Classify Different Forms of Racist Discourse in Large Corpora
- Enhancing Textual Personality Detection toward Social Media: Integrating Long-term and Short-term Perspectives
- Multi-Head RAG: Solving Multi-Aspect Problems with LLMs
- SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
- Sheaf Discovery with Joint Computation Graph Pruning and Flexible Granularity
- CiteFusion: An Ensemble Framework for Citation Intent Classification Harnessing Dual-Model Binary Couples and SHAP Analyses
- LLM-3D Print: Large Language Models To Monitor and Control 3D Printing
- Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization
- Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking
- SPIKE-RL: Video-LLMs meet Bayesian Surprise
- FoR-SALE: Frame of Reference-guided Spatial Adjustment in LLM-based Diffusion Editing
- MaskSQL: Safeguarding Privacy for LLM-Based Text-to-SQL via Abstraction
- Temporal Generalization: A Reality Check
- Mapping Overlaps in Benchmarks through Perplexity in the Wild
- Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional
- Clean First, Align Later: Benchmarking Preference Data Cleaning for Reliable LLM Alignment
- RIV: Recursive Introspection Mask Diffusion Vision Language Model
- From Past To Path: Masked History Learning for Next-Item Prediction in Generative Recommendation
- RCI: A Score for Evaluating Global and Local Reasoning in Multimodal Benchmarks
- From Reasoning to Answer: Empirical, Attention-Based and Mechanistic Insights into Distilled DeepSeek R1 Models
- Towards a Comprehensive Scaling Law of Mixture-of-Experts
- HomeSafeBench: A Benchmark for Embodied Vision-Language Models in Free-Exploration Home Safety Inspection
- SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents
- Beyond Game Theory Optimal: Profit-Maximizing Poker Agents for No-Limit Holdem
- Anchored Supervised Fine-Tuning
- From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning
- Knowledge Homophily in Large Language Models
- Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR
- PCRI: Measuring Context Robustness in Multimodal Models for Enterprise Applications
- Dynamic Orthogonal Continual Fine-tuning for Mitigating Catastrophic Forgettings
- Beyond Benchmarks: Understanding Mixture-of-Experts Models through Internal Mechanisms
- Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm
- Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models
- Detecting and Rectifying Noisy Labels: A Similarity-based Approach
- The Role of Logic and Automata in Understanding Transformers
- Do Repetitions Matter? Strengthening Reliability in LLM Evaluations
- Generalist Scanner Meets Specialist Locator: A Synergistic Coarse-to-Fine Framework for Robust GUI Grounding
- Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models
- Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
- Metamorphic Testing for Audio Content Moderation Software
- Learning to Ponder: Adaptive Reasoning in Latent Space
- SpecExit: Accelerating Large Reasoning Model via Speculative Exit
- Latent Visual Reasoning
- Extracting the Structure of Press Releases for Predicting Earnings Announcement Returns
- PAME-AI: Patient Messaging Creation and Optimization using Agentic AI
- AdvChain: Adversarial Chain-of-Thought Tuning for Robust Safety Alignment of Large Reasoning Models
- Overview of SCIDOCA 2025 Shared Task on Citation Prediction, Discovery, and Placement
- SCI-Verifier: Scientific Verifier with Thinking
- Bridging the behavior-neural gap: A multimodal AI reveals the brain's geometry of emotion more accurately than human self-reports
- MAS$^2$: Self-Generative, Self-Configuring, Self-Rectifying Multi-Agent Systems
- Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention
- Beyond Repetition: Text Simplification and Curriculum Learning for Data-Constrained Pretraining
- Reinforcement Mid-Training
- HarmMetric Eval: Benchmarking Metrics and Judges for LLM Harmfulness Assessment
- LLaDA-MoE: A Sparse MoE Diffusion Language Model
- Agentar-Scale-SQL: Advancing Text-to-SQL through Orchestrated Test-Time Scaling
- Multilingual Text-to-SQL: Benchmarking the Limits of Language Models with Collaborative Language Agents
- CDT: A Comprehensive Capability Framework for Large Language Models Across Cognition, Domain, and Task
- Alternatives To Next Token Prediction In Text Generation - A Survey
- Bias Mitigation or Cultural Commonsense? Evaluating LLMs with a Japanese Dataset
- A Text-To-Text Alignment Algorithm for Better Evaluation of Modern Speech Recognition Systems
- Sanitize Your Responses: Mitigating Privacy Leakage in Large Language Models
- GRPO-MA: Multi-Answer Generation in GRPO for Stable and Efficient Chain-of-Thought Training
- Knowledge Editing with Subspace-Aware Key-Value Mappings
- Building Benchmarks from the Ground Up: Community-Centered Evaluation of LLMs in Healthcare Chatbot Settings
- AdaThink-Med: Medical Adaptive Thinking with Uncertainty-Guided Length Calibration
- Inducing Dyslexia in Vision Language Models
- HiKE: Hierarchical Evaluation Framework for Korean-English Code-Switching Speech Recognition
- Hype or not? Formalizing Automatic Promotional Language Detection in Biomedical Research
- InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation
- Understanding the Dilemma of Unlearning for Large Language Models
- Reference-Free Rating of LLM Responses via Latent Information
- MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
- Socratic-Zero: Bootstrapping Reasoning via Data-Free Agent Co-evolution
- ProxyAttn: Guided Sparse Attention via Representative Heads
- LatentEvolve: Self-Evolving Test-Time Scaling in Latent Space
- SeaPO: Strategic Error Amplification for Robust Preference Optimization of Large Language Models
- Evaluating Spatiotemporal Consistency in Automatically Generated Sewing Instructions
- KnowGuard: Knowledge-Driven Abstention for Multi-Round Clinical Reasoning
- DiaCDM: Cognitive Diagnosis in Teacher-Student Dialogues using the Initiation-Response-Evaluation Framework
- SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching
- Hierarchical Error Correction for Large Language Models: A Systematic Framework for Domain-Specific AI Quality Enhancement
- Between Help and Harm: An Evaluation of Mental Health Crisis Handling by LLMs
- Metaphor identification using large language models: A comparison of RAG, prompt engineering, and fine-tuning
- Expanding Computation Spaces of LLMs at Inference Time
- BOE-XSUM: Extreme Summarization in Clear Language of Spanish Legal Decrees and Notifications
- How Well Do LLMs Imitate Human Writing Style?
- MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes
- The Dialogue That Heals: A Comprehensive Evaluation of Doctor Agents' Inquiry Capability
- SemanticShield: LLM-Powered Audits Expose Shilling Attacks in Recommender Systems
- Generalized Correctness Models: Learning Calibrated and Model-Agnostic Correctness Predictors from Historical Patterns
- Circuit Distillation
- Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct
- GateMABSA: Aspect-Image Gated Fusion for Multimodal Aspect-based Sentiment Analysis
- Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures
- Confidence-Guided Error Correction for Disordered Speech Recognition
- An empirical study on the limitation of Transformers in program trace generation
- Scaling Generalist Data-Analytic Agents
- jina-reranker-v3: Last but Not Late Interaction for Document Reranking
- Towards Trustworthy Lexical Simplification: Exploring Safety and Efficiency with Small LLMs
- Towards Personalized Deep Research: Benchmarks and Evaluations
- Knowledge Extraction on Semi-Structured Content: Does It Remain Relevant for Question Answering in the Era of LLMs?
- Investigating Language and Retrieval Bias in Multilingual Previously Fact-Checked Claim Detection
- Paired by the Teacher: Turning Unpaired Data into High-Fidelity Pairs for Low-Resource Text Generation
- Pretraining Large Language Models with NVFP4
- EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering
- NAIPv2: Debiased Pairwise Learning for Efficient Paper Quality Estimation
- Incentive-Aligned Multi-Source LLM Summaries
- Learning to Parallel: Accelerating Diffusion Large Language Models via Adaptive Parallel Decoding
- InfoAgent: Advancing Autonomous Information-Seeking Agents
- CAOTE: KV Cache Selection for LLMs via Attention Output Error-Based Token Eviction
- Multiplicative-Additive Constrained Models: Toward Joint Visualization of Interactive and Independent Effects
- Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective
- DiaMoE-TTS: A Unified IPA-Based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation
- VideoScore2: Think before You Score in Generative Video Evaluation
- Toward a Theory of Generalizability in LLM Mechanistic Interpretability Research
- Adaptive Margin RLHF via Preference over Preferences
- Patient-specific Biomolecular Instruction Tuning
- JE-IRT: A Geometric Lens on LLM Abilities through Joint Embedding Item Response Theory
- Not only a helper, but also a teacher: Interactive LLM Cascade
- Geometry-Aware Losses for Structure-Preserving Text-to-Sign Language Generation
- Tracing the Representation Geometry of Language Models from Pretraining to Post-training
- Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data
- Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents
- Causally-Enhanced Reinforcement Policy Optimization
- Multiplayer Nash Preference Optimization
- RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility
- C$^2$GSPG: Confidence-calibrated Group Sequence Policy Gradient towards Self-aware Reasoning
- SPEC-RL: Accelerating On-Policy Reinforcement Learning via Speculative Rollouts
- $p$-less Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding
- Learning How to Use Tools, Not Just When: Pattern-Aware Tool-Integrated Reasoning
- Seeing Symbols, Missing Cultures: Probing Vision-Language Models' Reasoning on Fire Imagery and Cultural Meaning
- PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation
- Guard Vector: Beyond English LLM Guardrails with Task-Vector Composition and Streaming-Aware Prefix SFT
- Train Once, Answer All: Many Pretraining Experiments for the Cost of One
- No Loss, No Gain: Gated Refinement and Adaptive Compression for Prompt Optimization
- Liaozhai through the Looking-Glass: On Paratextual Explicitation of Culture-Bound Terms in Machine Translation
- Comparison of Scoring Rationales Between Large Language Models and Human Raters
- Retrieval-Constrained Decoding Reveals Underestimated Parametric Knowledge in Language Models
- Cognition-of-Thought Elicits Social-Aligned Reasoning in Large Language Models
- Text-Based Approaches to Item Difficulty Modeling in Large-Scale Assessments: A Systematic Review
- The Impact of Role Design in In-Context Learning for Large Language Models
- AraS2P: Arabic Speech-to-Phonemes System
- From Human Annotation to Automation: LLM-in-the-Loop Active Learning for Arabic Sentiment Analysis
- On the Shelf Life of Fine-Tuned LLM Judges: Future Proofing, Backward Compatibility, and Question Generalization
- Automatic Speech Recognition for Greek Medical Dictation
- Towards Efficient CoT Distillation: Self-Guided Rationale Selector for Better Performance with Fewer Rationales
- Jackal: A Real-World Execution-Based Benchmark Evaluating Large Language Models on Text-to-JQL Tasks
- LLM Hallucination Detection: HSAD
- Timber: Training-free Instruct Model Refining with Base via Effective Rank
- Fast Thinking for Large Language Models
- Don't Settle Too Early: Self-Reflective Remasking for Diffusion Language Models
- Beyond English-Centric Training: How Reinforcement Learning Improves Cross-Lingual Reasoning in LLMs
- Aligning LLMs for Multilingual Consistency in Enterprise Applications
- TF-Bench: Evaluating Program Semantics Reasoning with Type Inference in System F
- VIVA+: Human-Centered Situational Decision-Making
- Collaboration of Fusion and Independence: Hypercomplex-driven Robust Multi-Modal Knowledge Graph Completion
- Do LLMs Understand Romanian Driving Laws? A Study on Multimodal and Fine-Tuned Question Answering
- Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning
- Understanding Textual Capability Degradation in Speech LLMs via Parameter Importance Analysis
- Knowledge-Level Consistency Reinforcement Learning: Dual-Fact Alignment for Long-Form Factuality
- From Personal to Collective: On the Role of Local and Global Memory in LLM Personalization
- Bridging the Knowledge-Prediction Gap in LLMs on Multiple-Choice Questions
- Transformer Tafsir at QIAS 2025 Shared Task: Hybrid Retrieval-Augmented Generation for Islamic Knowledge Question Answering
- Open-DeBias: Toward Mitigating Open-Set Bias in Language Models
- SPELL: Self-Play Reinforcement Learning for evolving Long-Context Language Models
- Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning
- DocPruner: A Storage-Efficient Framework for Multi-Vector Visual Document Retrieval via Adaptive Patch-Level Embedding Pruning
- Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Step
- Assessing Large Language Models in Updating Their Forecasts with New Information
- Easy Turn: Integrating Acoustic and Linguistic Modalities for Robust Turn-Taking in Full-Duplex Spoken Dialogue Systems
- Vision-Grounded Machine Interpreting: Improving the Translation Process through Visual Cues
- HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs
- ByteSized32Refactored: Towards an Extensible Interactive Text Games Corpus for LLM World Modeling and Evaluation
- Toward Preference-aligned Large Language Models via Residual-based Model Steering
- The Hidden Costs of Translation Accuracy: Distillation, Quantization, and Environmental Impact
- The AI Agent Code of Conduct: Automated Guardrail Policy-as-Prompt Synthesis
- MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use
- Sequential Diffusion Language Models
- SparseD: Sparse Attention for Diffusion Language Models
- ResFormer: All-Time Reservoir Memory for Long Sequence Classification
- Ensembling Multilingual Transformers for Robust Sentiment Analysis of Tweets
- Large-Scale Constraint Generation - Can LLMs Parse Hundreds of Constraints?
- GEAR: A General Evaluation Framework for Abductive Reasoning
- BTC-SAM: Leveraging LLMs for Generation of Bias Test Cases for Sentiment Analysis Models
- Pragmatic Inference for Moral Reasoning Acquisition: Generalization via Distributional Semantics
- Dual-Scale World Models for LLM Agents Towards Hard-Exploration Problems
- EduVidQA: Generating and Evaluating Long-form Answers to Student Questions based on Lecture Videos
- Beyond Magic Words: Sharpness-Aware Prompt Evolving for Robust Large Language Models with TARE
- Your thoughts tell who you are: Characterize the reasoning patterns of LRMs
- Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis
- Task Vectors, Learned Not Extracted: Performance Gains and Mechanistic Insight
- Retrieval-augmented GUI Agents with Generative Guidelines
- Beyond Overall Accuracy: A Psychometric Deep Dive into the Topic-Specific Medical Capabilities of 80 Large Language Models
- PET: Preference Evolution Tracking with LLM-Generated Explainable Distribution
- AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play
- Can Large Language Models Express Uncertainty Like Human?
- BeyondBench: Benchmark-Free Evaluation of Reasoning in Language Models
- ScenarioBench: Trace-Grounded Compliance Evaluation for Text-to-SQL and RAG
- MoVa: Towards Generalizable Classification of Human Morals and Values
- Model Fusion with Multi-LoRA Inference for Tool-Enhanced Game Dialogue Agents
- Prompt and Parameter Co-Optimization for Large Language Models
- MRAG-Suite: A Diagnostic Evaluation Platform for Visual Retrieval-Augmented Generation
- SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents
- Let LLMs Speak Embedding Languages: Generative Text Embeddings via Iterative Contrastive Refinement
- LOGOS: LLM-driven End-to-End Grounded Theory Development and Schema Induction for Qualitative Research
- DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models
- Q-Mirror: Unlocking the Multi-Modal Potential of Scientific Text-Only QA Pairs
- Dual Mechanisms of Value Expression: Intrinsic vs. Prompted Values in LLMs
- Multimodal Large Language Models Meet Multimodal Emotion Recognition and Reasoning: A Survey
- Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding
- AlignX: Advancing Multilingual Large Language Models with Multilingual Representation Alignment
- Multi-Modal Sentiment Analysis with Dynamic Attention Fusion
- Enabling Approximate Joint Sampling in Diffusion LMs
- Painless Activation Steering: An Automated, Lightweight Approach for Post-Training Large Language Models
- MIRAGE: Multi-hop Reasoning with Ambiguity Evaluation for Illusory Questions
- ML2B: Multi-Lingual ML Benchmark For AutoML
- ArFake: A Multi-Dialect Benchmark and Baselines for Arabic Spoof-Speech Detection
- EditGRPO: Reinforcement Learning with Post-Rollout Edits for Clinically Accurate Chest X-Ray Report Generation
- Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning
- ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents
- Learning to Detect Relevant Contexts and Knowledge for Response Selection in Retrieval-based Dialogue Systems
- Towards Generalizable Implicit In-Context Learning with Attention Routing
- The Bias is in the Details: An Assessment of Cognitive Bias in LLMs
- Lexicon-Enriched Graph Modeling for Arabic Document Readability Prediction
- HEART: Emotionally-driven test-time scaling of Language Models
- Infusing Theory of Mind into Socially Intelligent LLM Agents
- Extract-0: A Specialized Language Model for Document Information Extraction
- Large language models management of medications: three performance analyses
- LLMs Behind the Scenes: Enabling Narrative Scene Illustration
- What Matters More For In-Context Learning under Matched Compute Budgets: Pretraining on Natural Text or Incorporating Targeted Synthetic Examples?
- Emergent morpho-phonological representations in self-supervised speech models
- Same Content, Different Representations: A Controlled Study for Table QA
- ADAM: A Diverse Archive of Mankind for Evaluating and Enhancing LLMs in Biographical Reasoning
- AI Brown and AI Koditex: LLM-Generated Corpora Comparable to Traditional Corpora of English and Czech Texts
- Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents
- Peacemaker or Troublemaker: How Sycophancy Shapes Multi-Agent Debate
- Semantic Voting: A Self-Evaluation-Free Approach for Efficient LLM Self-Improvement on Unverifiable Open-ended Tasks
- From Evidence to Trajectory: Abductive Reasoning Path Synthesis for Training Retrieval-Augmented Generation Agents
- The Geometry of Creative Variability: How Credal Sets Expose Calibration Gaps in Language Models
- d$^2$Cache: Accelerating Diffusion-Based LLMs via Dual Adaptive Caching
- How to Make Large Language Models Generate 100% Valid Molecules?
- Non-Collaborative User Simulators for Tool Agents
- Tagging the Thought: Unlocking Personalization Reasoning via Reinforcement Learning
- Tree Reward-Aligned Search for TReASURe in Masked Diffusion Language Models
- Test-Time Policy Adaptation for Enhanced Multi-Turn Interactions with LLMs
- Pretraining LLM with Latent Thoughts in Continuous Space
- Diagnose, Localize, Align: A Full-Stack Framework for Reliable LLM Multi-Agent Systems under Instruction Conflicts
- Estimating the strength and timing of syntactic structure building in naturalistic reading
- From Harm to Help: Turning Reasoning In-Context Demos into Assets for Reasoning LMs
- Global Beats, Local Tongue: Studying Code Switching in K-pop Hits on Billboard Charts
- Steering Prepositional Phrases in Language Models: A Case of with-headed Adjectival and Adverbial Complements in Gemma-2
- PARL-MT: Learning to Call Functions in Multi-Turn Conversation with Progress Awareness
- A Structured Framework for Evaluating and Enhancing Interpretive Capabilities of Multimodal LLMs in Culturally Situated Tasks
- Detecting Corpus-Level Knowledge Inconsistencies in Wikipedia with Large Language Models
- Fin-ExBERT: User Intent based Text Extraction in Financial Context using Graph-Augmented BERT and trainable Plugin
- A2D: Any-Order, Any-Step Safety Alignment for Diffusion Language Models
- Scaling Policy Compliance Assessment in Language Models with Policy Reasoning Traces
- Learning to Reason in Structured In-context Environments with Reinforcement Learning
- C-Evolve: Consensus-based Evolution for Prompt Groups
- Dual-Space Smoothness for Robust and Balanced LLM Unlearning
- MedCritical: Enhancing Medical Reasoning in Small Language Models via Self-Collaborative Correction
- Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization
- CCD: Mitigating Hallucinations in Radiology MLLMs via Clinical Contrastive Decoding
- Are you sure? Measuring models bias in content moderation through uncertainty
- AccessEval: Benchmarking Disability Bias in Large Language Models
- RAR$^2$: Retrieval-Augmented Medical Reasoning via Thought-Driven Retrieval
- TRUEBench: Can LLM Response Meet Real-world Constraints as Productivity Assistant?
- Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning
- DuetGraph: Coarse-to-Fine Knowledge Graph Reasoning with Dual-Pathway Global-Local Fusion
- MetaLint: Generalizable Idiomatic Code Quality Analysis through Instruction-Following and Easy-to-Hard Generalization
- BenchRL-QAS: Benchmarking reinforcement learning algorithms for quantum architecture search
- LionGuard 2: Building Lightweight, Data-Efficient & Localised Multilingual Content Moderators
- Diffusion models for multivariate subsurface generation and efficient probabilistic inversion
- When Engineering Outruns Intelligence: Rethinking Instruction-Guided Navigation
- Trainable Dynamic Mask Sparse Attention
- Communicating Plans, Not Percepts: Scalable Multi-Agent Coordination with Embodied World Models
- The Geometry of Cortical Computation: Manifold Disentanglement and Predictive Dynamics in VCNet
- Sculptor: Empowering LLMs with Cognitive Agency via Active Context Management
- PakBBQ: A Culturally Adapted Bias Benchmark for QA
- BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation
- Flow Matching for Efficient and Scalable Data Assimilation
- ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signals
- Transduction is All You Need for Structured Data Workflows
- Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
- SQL-of-Thought: Multi-agentic Text-to-SQL with Guided Error Correction
- Constrained Decoding for Robotics Foundation Models
- MAUSAM: An Observations-focused assessment of Global AI Weather Prediction Models During the South Asian Monsoon
- Hierarchical Task Environments as the Next Frontier for Embodied World Models in Robot Soccer
- Code2MCP: Transforming Code Repositories into MCP Services
- HealthSLM-Bench: Benchmarking Small Language Models for Mobile and Wearable Healthcare Monitoring
- Do Natural Language Descriptions of Model Activations Convey Privileged Information?
- Imagined Autocurricula
- TreeIRL: Safe Urban Driving with Tree Search and Inverse Reinforcement Learning
- Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning
- Multi-Scenario Highway Lane-Change Intention Prediction: A Physics-Informed AI Framework for Three-Class Classification
- Diversity Boosts AI-Generated Text Detection
- Diffusion-Based Impedance Learning for Contact-Rich Manipulation Tasks
- Experience Deploying Containerized GenAI Services at an HPC Center
- Combinatorial Creativity: A New Frontier in Generalization Abilities
- Min-Max Optimisation for Nonconvex-Nonconcave Functions Using a Random Zeroth-Order Extragradient Algorithm
- When Federated Learning Meets Quantum Computing: Survey and Research Opportunities
- Evolution Meets Diffusion: Efficient Neural Architecture Generation
- TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation
- Sobolev norm inconsistency of kernel interpolation
- Cooking Up Creativity: Enhancing LLM Creativity through Structured Recombination
- $\textit{New News}$: System-2 Fine-tuning for Robust Integration of New Knowledge
- ReactDance: Hierarchical Representation for High-Fidelity and Coherent Long-Form Reactive Dance Generation
- References Indeed Matter? Reference-Free Preference Optimization for Conversational Query Reformulation
- OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit
- The Counting Power of Transformers
- Critique-Guided Distillation for Efficient and Robust Language Model Reasoning
- Fine-grained Contrastive Learning for ECG-Report Alignment with Waveform Enhancement
- AdaBoN: Adaptive Best-of-N Alignment
- FRABench and UFEval: Unified Fine-grained Evaluation with Task and Aspect Generalization
- Is Active Persona Inference Necessary for Aligning Small Models to Personal Preferences?
- TranSUN: A Preemptive Paradigm to Eradicate Retransformation Bias Intrinsically from Regression Models in Recommender Systems
- Mechanistic Fine-tuning for In-context Learning
- Vid2World: Crafting Video Diffusion Models to Interactive World Models
- Internal Chain-of-Thought: Empirical Evidence for Layer-wise Subtask Scheduling in LLMs
- Language Models Optimized to Fool Detectors Still Have a Distinct Style (And How to Change It)
- ReflAct: World-Grounded Decision Making in LLM Agents via Goal-State Reflection
- Pixels Versus Priors: Controlling Knowledge Priors in Vision-Language Models through Visual Counterfacts
- Flexible MOF Generation with Torsion-Aware Flow Matching
- Boosting Open Set Recognition Performance through Modulated Representation Learning
- PoliCon: Evaluating LLMs on Achieving Diverse Political Consensus Objectives
- HS-STaR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation
- SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences
- Convergence of Clipped-SGD for Convex $(L_0,L_1)$-Smooth Optimization with Heavy-Tailed Noise
- Unveiling Impact of Frequency Components on Membership Inference Attacks for Diffusion Models
- Cross-modal RAG: Sub-dimensional Text-to-Image Retrieval-Augmented Generation
- ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting
- Neural-Augmented Kelvinlet for Real-Time Soft Tissue Deformation Modeling
- Flexible and Efficient Drift Detection without Labels
- AdversariaL attacK sAfety aLIgnment (ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement - Introducing Adversarial Vulnerability Quality Index (AVQI)
- When Is Diversity Rewarded in Cooperative Multi-Agent Learning?
- Constant Bit-size Transformers Are Turing Complete
- Do We Need Large VLMs for Spotting Soccer Actions?
- Prover Agent: An Agent-Based Framework for Formal Mathematical Proofs
- R1-Ranker: Teaching LLM Rankers to Reason
- Breaking Rank Bottlenecks in Knowledge Graph Embeddings
- XTransfer: Modality-Agnostic Few-Shot Model Transfer for Human Sensing at the Edge
- Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer
- Almost Sure Convergence for the Last Iterate of Stochastic Gradient Descent Schemes
- IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing
- Mitigating Watermark Forgery in Generative Models via Randomized Key Selection
- CLEAR: Calibrated Learning for Epistemic and Aleatoric Risk
- Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition
- Is Thompson Sampling Susceptible to Algorithmic Collusion?
- SEMF: Supervised Expectation-Maximization Framework for Predicting Intervals
- FusionDTI: Fine-grained Binding Discovery with Token-level Fusion for Drug-Target Interaction
- CommonPower: A Framework for Safe Data-Driven Smart Grid Control
- fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence
- Optimal thresholds and algorithms for a model of multi-modal learning in high dimensions
- Sheaf Discovery with Joint Computation Graph Pruning and Flexible Granularity
- Bayesian Autoregressive Online Change-Point Detection with Time-Varying Parameters
- A Voter-Based Stochastic Rejection-Method Framework for Asymptotically Safe Language Model Outputs
- Robot Navigation with Entity-Based Collision Avoidance using Deep Reinforcement Learning
- LLM-3D Print: Large Language Models To Monitor and Control 3D Printing
- Attentive Dilated Convolution for Automatic Sleep Staging using Force-directed Layout
- An Empirical Study on the Computation Budget of Co-Optimization of Robot Design and Control in Simulation
- On the Effect of Instability on Learning Continuous-Time Linear Control Systems
- Disentangling Regional Primitives for Image Generation
- LLMs Are In-Context Bandit Reinforcement Learners
- A quantitative Robbins-Siegmund theorem
- CoT-TL: Low-Resource Temporal Knowledge Representation of Planning Instructions Using Chain-of-Thought Reasoning
- PACER: Physics Informed Uncertainty Aware Climate Emulator
- When Speculation Spills Secrets: Side Channels via Speculative Decoding In LLMs
- UniTraj: Learning a Universal Trajectory Foundation Model from Billion-Scale Worldwide Traces
- CART: Compositional Auto-Regressive Transformer for Image Generation
- Gaussian Process Priors for Boundary Value Problems of Linear Partial Differential Equations
- Break the ID-Language Barrier: An Adaption Framework for LLM-based Sequential Recommendation
- Invariant Measures in Time-Delay Coordinates for Unique Dynamical System Identification
- A learning-based approach to stochastic optimal control under reach-avoid constraint
- Order Matters! An Empirical Study on Large Language Models' Input Order Bias in Software Fault Localization
- Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval
- Training-Free Defense Against Adversarial Attacks in Deep Learning MRI Reconstruction
- Improving the adaptive and continuous learning capabilities of artificial neural networks: Lessons from multi-neuromodulatory dynamics
- Gaussian Universality for Diffusion Models
- MIAFEx: An Attention-based Feature Extraction Method for Medical Image Classification
- Nirvana AI Governance: How AI Policymaking Is Committing Three Old Fallacies
- Non-Expansive Mappings in Two-Time-Scale Stochastic Approximation: Finite-Time Analysis
- A Unified Information-Theoretic Framework for Meta-Learning Generalization
- DeepFRC: An End-to-End Deep Learning Model for Functional Registration and Classification
- Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
- An Empirical Analysis of Machine Learning Model and Dataset Documentation, Supply Chain, and Licensing Challenges on Hugging Face
- Group-Adaptive Threshold Optimization for Robust AI-Generated Text Detection
- Noise Sensitivity and Learning Lower Bounds for Hierarchical Functions
- OrderFusion: Encoding Orderbook for End-to-End Probabilistic Intraday Electricity Price Forecasting
- Accelerated Parallel Tempering via Neural Transports
- Mitigating Barren Plateaus in Quantum Neural Networks via an AI-Driven Submartingale-Based Framework
- Optimal and Provable Calibration in High-Dimensional Binary Classification: Angular Calibration and Platt Scaling
- Finite-Sample Analysis of Policy Evaluation for Robust Average Reward Reinforcement Learning
- Conformal prediction of future insurance claims in the regression problem
- UniF$^2$ace: A Unified Fine-grained Face Understanding and Generation Model
- A Survey on Self-supervised Contrastive Learning for Multimodal Text-Image Analysis
- Lightweight Learning for Grant-Free Activity Detection in Cell-Free Massive MIMO Networks
- Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning
- A categorical embedding discontinuity-capturing shallow neural network for anisotropic elliptic interface problems
- Machine Learning-Driven Materials Discovery: Unlocking Next-Generation Functional Materials - A Review
- Grasping a Handful: Sequential Multi-Object Dexterous Grasp Generation
- SUV: Scalable Large Language Model Copyright Compliance with Regularized Selective Unlearning
- MoQE: Improve Quantization Model performance via Mixture of Quantization Experts
- DHG-Bench: A Comprehensive Benchmark for Deep Hypergraph Learning
- Contrastive Representations for Temporal Reasoning
- Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration
- Hard Examples Are All You Need: Maximizing GRPO Post-Training Under Annotation Budgets
- Speculative Safety-Aware Decoding
- Type-Compliant Adaptation Cascades: Adapting Programmatic LM Workflows to Data
- What Matters in Data for DPO?
- End-to-End On-Device Quantization-Aware Training for LLMs at Inference Cost
- T-MLP: Tailed Multi-Layer Perceptron for Level-of-Detail Signal Representation
- Metis: Training Large Language Models with Advanced Low-Bit Quantization
- Differentiable Expectation-Maximisation and Applications to Gaussian Mixture Model Optimal Transport
- Small Vectors, Big Effects: A Mechanistic Study of RL-Induced Reasoning via Steering Vectors
- Graph Alignment via Dual-Pass Spectral Encoding and Latent Space Communication
- Learning to Route: Per-Sample Adaptive Routing for Multimodal Multitask Prediction
- TDRM: Smooth Reward Models with Temporal Difference for LLM RL and Inference
- Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search
- VQEzy: An Open-Source Dataset for Parameter Initialization in Variational Quantum Eigensolvers
- Joint Memory Frequency and Computing Frequency Scaling for Energy-efficient DNN Inference
- PiERN: Token-Level Routing for Integrating High-Precision Computation and Reasoning
- PipelineRL: Faster On-policy Reinforcement Learning for Long Sequence Generation
- Network inference via process motifs for lagged correlation in linear stochastic processes
- Explicit Second-Order Min-Max Optimization: Practical Algorithms and Complexity Analysis
- Policy Gradient Algorithms for Robust MDPs with Non-Rectangular Uncertainty Sets
- A Double Machine Learning Approach to Combining Experimental and Observational Data
- Few-shot Personalized Saliency Prediction Based on Interpersonal Gaze Patterns
- Information theory for data-driven model reduction in physics and biology
- A Proximal Gradient Method With Probabilistic Multi-Gossip Communications for Decentralized Composite Optimization
- Machines Do See Color: A Guideline to Classify Different Forms of Racist Discourse in Large Corpora
- Off-Policy Evaluation in Markov Decision Processes under Weak Distributional Overlap
- Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration
- Angles Don't Lie: Unlocking Training-Efficient RL Through the Model's Own Signals
- Comba: Improving Bilinear RNNs with Closed-loop Control
- QKV Projections Require a Fraction of Their Memory
- Interaction Field Matching: Overcoming Limitations of Electrostatic Models
- Curse of Slicing: Why Sliced Mutual Information is a Deceptive Measure of Statistical Dependence
- Bridging the Performance Gap Between Target-Free and Target-Based Reinforcement Learning
- Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order
- Towards Better Generalization via Distributional Input Projection Network
- Reshaping Reasoning in LLMs: A Theoretical Analysis of RL Training Dynamics through Pattern Selection
- TreeRPO: Tree Relative Policy Optimization
- Flow-Attentional Graph Neural Networks
- Can In-Context Reinforcement Learning Recover From Reward Poisoning Attacks?
- InverseScope: Scalable Activation Inversion for Interpreting Large Language Models
- Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion
- Towards Robust Real-World Multivariate Time Series Forecasting: A Unified Framework for Dependency, Asynchrony, and Missingness
- InstructPro: Natural Language Guided Ligand-Binding Protein Design
- Foundation Models for Causal Inference via Prior-Data Fitted Networks
- Enhancing Delta Compression in LLMs via SVD-based Quantization Error Minimization
- Meta Pruning via Graph Metanetworks: A Universal Meta Learning Framework for Network Pruning
- Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs
- Muon Optimizes Under Spectral Norm Constraints
- Reward-Agnostic Prompt Optimization for Text-to-Image Diffusion Models
- Adaptive Sample Scheduling for Direct Preference Optimization
- Origins of Creativity in Attention-Based Diffusion Models
- Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling?
- Mitigating Semantic Collapse in Generative Personalization with Test-Time Embedding Adjustment
- Data Uniformity Improves Training Efficiency and More, with a Convergence Framework Beyond the NTK Regime
- Theoretical Modeling of LLM Self-Improvement Training Dynamics Through Solver-Verifier Gap
- Cooperative Sheaf Neural Networks
- Learning to Segment for Vehicle Routing Problems
- JAX-MPM: A Learning-Augmented Differentiable Meshfree Framework for GPU-Accelerated Lagrangian Simulation and Geophysical Inverse Modeling
- Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs
- Discrete Diffusion Trajectory Alignment via Stepwise Decomposition
- Robust Deep Network Learning of Nonlinear Regression Tasks by Parametric Leaky Exponential Linear Units (LELUs) and a Diffusion Metric
- One Token to Fool LLM-as-a-Judge
- Warm Starts Accelerate Conditional Diffusion
- FusionFactory: Fusing LLM Capabilities with Multi-LLM Log Data
- A Graph-in-Graph Learning Framework for Drug-Target Interaction Prediction
- Vidar: Embodied Video Diffusion Model for Generalist Manipulation
- Probabilistic Soundness Guarantees in LLM Reasoning Chains
- Learning to summarize user information for personalized reinforcement learning from human feedback
- GRID: Scalable Task-Agnostic Prompt-Based Continual Learning for Language Models
- Omni-Thinker: Scaling Multi-Task RL in LLMs with Hybrid Reward and Task Scheduling
- U-Cast: Learning Hierarchical Structures for High-Dimensional Time Series Forecasting
- Enhancing Stability of Physics-Informed Neural Network Training Through Saddle-Point Reformulation
- GLANCE: Graph Logic Attention Network with Cluster Enhancement for Heterophilous Graph Representation Learning
- Moving Out: Physically-grounded Human-AI Collaboration
- A Markov Categorical Framework for Language Modeling
- Can Language Models Discover Scaling Laws?
- Merging Memory and Space: A State Space Neural Operator
- Signals, Concepts, and Laws: Toward Universal, Explainable Time-Series Forecasting
- Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts
- Runtime Adaptive Pruning for LLM Inference
- On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning
- Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models
- What Do You Need for Diverse Trajectory Composition in Diffusion Planning?
- Reward Model Overoptimisation in Iterated RLHF
- Improved Sample Complexity For Diffusion Model Training Without Empirical Risk Minimizer Access
- ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation
- LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning
- Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models
- Logic Gate Neural Networks are Good for Verification
- ePC: Overcoming Exponential Signal Decay in Deep Predictive Coding Networks
- Variational Deep Learning via Implicit Regularization
- PDFBench: A Benchmark for De novo Protein Design from Function
- Stochastic Primal-Dual Double Block-Coordinate for Two-way Partial AUC Maximization
- Equivariant Spherical Transformer for Efficient Molecular Modeling
- Efficient AllReduce with Stragglers
- Continuous Chain of Thought Enables Parallel Exploration and Reasoning
- Vision Language Models are Biased
- On the Emergence of Weak-to-Strong Generalization: A Bias-Variance Perspective
- ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration
- Weight-Space Linear Recurrent Neural Networks
- Chunking the Critic: A Transformer-based Soft Actor-Critic with N-Step Returns
- What Makes a Reward Model a Good Teacher? An Optimization Perspective
- On The Sample Complexity Bounds In Bilevel Reinforcement Learning
- Finite-Time Bounds for Two-Time-Scale Stochastic Approximation with Arbitrary Norm Contractions and Markovian Noise
- Reasoning to Learn from Latent Thoughts
- AdaRank: Adaptive Rank Pruning for Enhanced Model Merging
- Pairwise Optimal Transports for Training All-to-All Flow-Based Condition Transfer Model
- Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning?
- Efficient Generative Model Training via Embedded Representation Warmup
- A Model Zoo on Phase Transitions in Neural Networks
- A Unified MDL-based Binning and Tensor Factorization Framework for PDF Estimation
- Sharpness-Aware Minimization with Z-Score Gradient Filtering
- Localized Diffusion Models
- Generating Full-field Evolution of Physical Dynamics from Irregular Sparse Observations
- RainPro-8: An Efficient Deep Learning Model to Estimate Rainfall Probabilities Over 8 Hours
- Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps
- The Final Layer Holds the Key: A Unified and Efficient GNN Calibration Framework
- Visual Planning: Let's Think Only with Images
- Stepwise Guided Policy Optimization: Coloring your Incorrect Reasoning in GRPO
- Continuous Optimization for Feature Selection with Permutation-Invariant Embedding and Policy-Guided Search
- MINGLE: Mixture of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging
- AltLoRA: Towards Better Gradient Approximation in Low-Rank Adaptation with Alternating Projections
- Hamiltonian Neural PDE Solvers through Functional Approximation
- Causes and Consequences of Representational Similarity in Machine Learning Models
- VAMO: Efficient Zeroth-Order Variance Reduction for SGD with Faster Convergence
- Learning with Local Search MCMC Layers
- PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration
- Scaling Diffusion Transformers Efficiently via $\mu$P
- Certified Neural Approximations of Nonlinear Dynamics
- Towards Identifiability of Interventional Stochastic Differential Equations
- Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models
- Scalable Graph Generative Modeling via Substructure Sequences
- AdaSTaR: Adaptive Data Sampling for Training Self-Taught Reasoners
- ICYM2I: The illusion of multimodal informativeness under missingness
- Generalized Tangent Kernel: A Unified Geometric Foundation for Natural Gradient and Standard Gradient
- TabText: Language-Based Representations of Tabular Health Data for Predictive Modelling
- Beyond Losses Reweighting: Empowering Multi-Task Learning via the Generalization Perspective
- AQuaMaM: An Autoregressive, Quaternion Manifold Model for Rapidly Estimating Complex SO(3) Distributions
- Symbolic Imitation Learning: From Black-Box to Explainable Driving Policies
- Double Machine Learning Based Structure Identification from Temporal Data
- EUGENE: Explainable Structure-aware Graph Edit Distance Estimation with Generalized Edit Costs
- The Clever Hans Mirage: A Comprehensive Survey on Spurious Correlations in Machine Learning
- Federated Learning Resilient to Byzantine Attacks and Data Heterogeneity
- Data Imputation by Pursuing Better Classification: A Supervised Kernel-Based Method
- PLEIADES: Building Temporal Kernels with Orthogonal Polynomials
- A Comprehensive Graph Pooling Benchmark: Effectiveness, Robustness and Generalizability
- Differential Encoding for Improved Representation Learning over Graphs
- Deep Time Series Models: A Comprehensive Survey and Benchmark
- Can DPO Learn Diverse Human Values? A Theoretical Scaling Law
- Understanding Transformer Architecture through Continuous Dynamics: A Partial Differential Equation Perspective
- Efficient Federated Learning against Byzantine Attacks and Data Heterogeneity via Aggregating Normalized Gradients
- A GREAT Architecture for Edge-Based Graph Problems Like TSP
- Sparse Covariance Neural Networks
- DeepONet for Solving Nonlinear Partial Differential Equations with Physics-Informed Training
- Extracting Moore Machines from Transformers using Queries and Counterexamples
- Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs
- Deeper Insights into Deep Graph Convolutional Networks: Stability and Generalization
- NextLocLLM: Location Semantics Modeling and Coordinate-Based Next Location Prediction with LLMs
- Gradient-Free Training of Quantized Neural Networks
- Similarity-Dissimilarity Loss for Multi-label Supervised Contrastive Learning
- A Predictive Approach To Enhance Time-Series Forecasting
- Benchmarking Computational Methods for Emerging Drug-Drug Interaction Prediction
- Self-Normalized Resets for Plasticity in Continual Learning
- Haar-Laplacian for directed graphs
- Sometimes I am a Tree: Data Drives Unstable Hierarchical Generalization
- Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures
- Euclidean Fast Attention - Machine Learning Global Atomic Representations at Linear Cost
- Learning Randomized Reductions
- Toward Model-centric Heterogeneous Federated Graph Learning: A Knowledge-driven Approach
- Norm-Bounded Low-Rank Adaptation
- Principal Components for Neural Network Initialization
- Federated Sketching LoRA: A Flexible Framework for Heterogeneous Collaborative Fine-Tuning of LLMs
- Vintix: Action Model via In-Context Reinforcement Learning
- DAL: A Practical Prior-Free Black-Box Framework for Non-Stationary Bandits
- A Statistical Learning Perspective on Semi-dual Adversarial Neural Optimal Transport Solvers
- InfoBridge: Mutual Information estimation via Bridge Matching
- LEAD: Large Foundation Model for EEG-Based Alzheimer's Disease Detection
- Progressive Binarization with Semi-Structured Pruning for LLMs
- Towards Large-Scale In-Context Reinforcement Learning by Meta-Training in Randomized Worlds
- Pre-training Epidemic Time Series Forecasters with Compartmental Prototypes
- vCache: Verified Semantic Prompt Caching
- Functional Complexity-adaptive Temporal Tensor Decomposition
- Recurrent Memory for Online Interdomain Gaussian Processes
- The Accuracy Cost of Weakness: A Theoretical Analysis of Fixed-Segment Weak Labeling for Events in Time
- Comprehensive Review of Neural Differential Equations for Time Series Analysis
- Learning to Explain Air Traffic Situation
- Collaborative Deterministic-Probabilistic Forecasting for Diverse Spatiotemporal Systems
- SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors
- TGT: A Temporal Gating Transformer for Smartphone App Usage Prediction
- Joint Value Estimation and Bidding in Repeated First-Price Auctions
- Meta-Learning to Explore via Memory Density Feedback
- Neuroplasticity-inspired dynamic ANNs for multi-task demand forecasting
- CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map Models
- Training Agents Inside of Scalable World Models
- Quantitative convergence of trained single layer neural networks to Gaussian processes
- Bandits roaming Hilbert space
- Prompting Robot Teams with Natural Language
- Inducing Dyslexia in Vision Language Models
- Algorithms and data structures for automatic precision estimation of neural networks
- Hype or not? Formalizing Automatic Promotional Language Detection in Biomedical Research
- InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation
- Reference-Free Rating of LLM Responses via Latent Information
- Stabilizing Humanoid Robot Trajectory Generation via Physics-Informed Learning and Control-Informed Steering
- MAD: Manifold Attracted Diffusion
- Bundle Network: a Machine Learning-Based Bundle Method
- ProxyAttn: Guided Sparse Attention via Representative Heads
- Spatial-Functional awareness Transformer-based graph archetype contrastive learning for Decoding Visual Neural Representations from EEG
- Sparse Autoencoders Make Audio Foundation Models more Explainable
- Fidelity-Aware Data Composition for Robust Robot Generalization
- TACO-Net: Topological Signatures Triumph in 3D Object Classification
- A Greedy PDE Router for Blending Neural Operators and Classical Methods
- Efficient Sketching and Nearest Neighbor Search Algorithms for Sparse Vector Sets
- Of-SemWat: High-payload text embedding for semantic watermarking of AI-generated images with arbitrary size
- Pushing LLMs to Their Logical Reasoning Bound: The Role of Data Reasoning Intensity
- Environment-Aware Satellite Image Generation with Diffusion Models
- VAGUEGAN: Stealthy Poisoning and Backdoor Attacks on Image Generative Pipelines
- Improved Stochastic Optimization of LogSumExp
- Unmute the Patch Tokens: Rethinking Probing in Multi-Label Audio Classification
- When Scores Learn Geometry: Rate Separations under the Manifold Hypothesis
- Inductive Bias and Spectral Properties of Single-Head Attention in High Dimensions
- From Code to Action: Hierarchical Learning of Diffusion-VLM Policies
- A Spectral-Grassmann Wasserstein metric for operator representations of dynamical systems
- Graph Theory Meets Federated Learning over Satellite Constellations: Spanning Aggregations, Network Formation, and Performance Optimization
- Scalable GANs with Transformers
- MSG: Multi-Stream Generative Policies for Sample-Efficient Robotic Manipulation
- Embedded Deep Learning for Bio-hybrid Plant Sensors to Detect Increased Heat and Ozone Levels
- LVT: Large-Scale Scene Reconstruction via Local View Transformers
- CLASP: Adaptive Spectral Clustering for Unsupervised Per-Image Segmentation
- VT-FSL: Bridging Vision and Text with LLMs for Few-Shot Learning
- Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct
- Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures
- Symmetry-Aware Bayesian Optimization via Max Kernels
- Cogito, Ergo Ludo: An Agent that Learns to Play by Reasoning and Planning
- Optimizing Privacy-Preserving Primitives to Support LLM-Scale Applications
- Scaling Generalist Data-Analytic Agents
- Benchmarking ECG Foundational Models: A Reality Check Across Clinical Tasks
- Curriculum Imitation Learning of Distributed Multi-Robot Policies
- On Spectral Learning for Odeco Tensors: Perturbation, Initialization, and Algorithms
- Score Distillation of Flow Matching Models
- The Era of Real-World Human Interaction: RL from User Conversations
- Paired by the Teacher: Turning Unpaired Data into High-Fidelity Pairs for Low-Resource Text Generation
- Fast Feature Field ($\text{F}^3$): A Predictive Representation of Events
- Pretraining Large Language Models with NVFP4
- Context-Driven Performance Modeling for Causal Inference Operators on Neural Processing Units
- Personalized Vision via Visual In-Context Learning
- GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs
- Last iterate convergence in no-regret learning: constrained min-max optimization for convex-concave landscapes
- CRAUM-Net: Contextual Recursive Attention with Uncertainty Modeling for Salient Object Detection
- DFG-PCN: Point Cloud Completion with Degree-Flexible Point Graph
- StrucADT: Generating Structure-controlled 3D Point Clouds with Adjacency Diffusion Transformer
- Do LLMs Understand Romanian Driving Laws? A Study on Multimodal and Fine-Tuned Question Answering
- Diff-3DCap: Shape Captioning with Diffusion Models
- LUQ: Layerwise Ultra-Low Bit Quantization for Multimodal Large Language Models
- VioPTT: Violin Technique-Aware Transcription from Synthetic Data Augmentation
- Knowledge-Level Consistency Reinforcement Learning: Dual-Fact Alignment for Long-Form Factuality
- Sequence Pathfinder for Multi-Agent Pickup and Delivery in the Warehouse
- Define latent spaces by example: optimisation over the outputs of generative models
- Influence-Guided Concolic Testing of Transformer Robustness
- A Multi-Camera Vision-Based Approach for Fine-Grained Assembly Quality Control
- Assessing Visual Privacy Risks in Multimodal AI: A Novel Taxonomy-Grounded Evaluation of Vision-Language Models
- Taught Well Learned Ill: Towards Distillation-conditional Backdoor Attack
- Learning-Based Testing for Deep Learning: Enhancing Model Robustness with Adversarial Input Prioritization
- Equation-Free Coarse Control of Distributed Parameter Systems via Local Neural Operators
- Toward Preference-aligned Large Language Models via Residual-based Model Steering
- TREAT-Net: Tabular-Referenced Echocardiography Analysis for Acute Coronary Syndrome Treatment Prediction
- Sequential Diffusion Language Models
- The Role of Logic and Automata in Understanding Transformers
- Singleton-Optimized Conformal Prediction
- GEAR: A General Evaluation Framework for Abductive Reasoning
- SpeedCP: Fast Kernel-based Conditional Conformal Prediction
- Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs
- Ancestry Tree Clustering for Particle Filter Diversity Maintenance
- ASTROCO: Self-Supervised Conformer-Style Transformers for Light-Curve Embeddings
- EYE-DEX: Eye Disease Detection and EXplanation System
- Your thoughts tell who you are: Characterize the reasoning patterns of LRMs
- Accelerating Cerebral Diagnostics with BrainFusion: A Comprehensive MRI Tumor Framework
- STRAPSim: A Portfolio Similarity Metric for ETF Alignment and Portfolio Trades
- Memory Transfer Planning: LLM-driven Context-Aware Code Adaptation for Robot Manipulation
- Retrieval-augmented GUI Agents with Generative Guidelines
- AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play
- BeyondBench: Benchmark-Free Evaluation of Reasoning in Language Models
- ViReSkill: Vision-Grounded Replanning with Skill Memory for LLM-Based Planning in Lifelong Robot Learning
- Uni-NTFM: A Unified Foundation Model for EEG Signal Representation Learning
- Non-Invasive Detection of PROState Cancer with Novel Time-Dependent Diffusion MRI and AI-Enhanced Quantitative Radiological Interpretation: PROS-TD-AI
- SpecExit: Accelerating Large Reasoning Model via Speculative Exit
- Interactive Program Synthesis for Modeling Collaborative Physical Activities from Narrated Demonstrations
- Extracting the Structure of Press Releases for Predicting Earnings Announcement Returns
- Understanding Cognitive States from Head & Hand Motion Data
- VeriLLM: A Lightweight Framework for Publicly Verifiable Decentralized Inference
- Risk-Sensitive RL for Alleviating Exploration Dilemmas in Large Language Models
- LAMP-PRo: Label-aware Attention for Multi-label Prediction of DNA- and RNA-binding Proteins using Protein Language Models
- Graph-Based Learning of Free Surface Dynamics in Generalized Newtonian Fluids using Smoothed Particle Hydrodynamics
- Skeleton-based Robust Registration Framework for Corrupted 3D Point Clouds
- SCI-Verifier: Scientific Verifier with Thinking
- ActiveCQ: Active Estimation of Causal Quantities
- PEARL: Performance-Enhanced Aggregated Representation Learning
- Inferring Cosmological Parameters with Evidential Physics-Informed Neural Networks
- Hyperspherical Latents Improve Continuous-Token Autoregressive Generation
- DRIFT: Divergent Response in Filtered Transformations for Robust Adversarial Defense
- Prediction-Powered Communication with Distortion Guarantees
- From Sound to Setting: AI-Based Equalizer Parameter Prediction for Piano Tone Replication
- FuncPoison: Poisoning Function Library to Hijack Multi-agent Autonomous Driving Systems
- Multi-Item-Query Attention for Stable Sequential Recommendation
- Contrastive Learning for Correlating Network Incidents
- Euclid's Gift: Enhancing Spatial Perception and Reasoning in Vision-Language Models via Geometric Surrogate Tasks
- Sanitize Your Responses: Mitigating Privacy Leakage in Large Language Models
- Overcoming Over-Fitting in Constraint Acquisition via Query-Driven Interactive Refinement
- Preference-Based Dynamic Ranking Structure Recognition
- EKF-Based Fusion of Wi-Fi/LiDAR/IMU for Indoor Localization and Navigation
- Impact of Environmental Factors on LoRa 2.4 GHz Time of Flight Ranging Outdoors
- Statistical Inference for Gradient Boosting Regression
- Conditional Risk Minimization with Side Information: A Tractable, Universal Optimal Transport Framework
- MathBode: Frequency-Domain Fingerprints of LLM Mathematical Reasoning
- Tree Reward-Aligned Search for TReASURe in Masked Diffusion Language Models
- AI-Enhanced Distributed Channel Access for Collision Avoidance in Future Wi-Fi 8
- Grouped Satisficing Paths in Pure Strategy Games: a Topological Perspective
- Understanding and Enhancing the Planning Capability of Language Models via Multi-Token Prediction
- UltraUNet: Real-Time Ultrasound Tongue Segmentation for Diverse Linguistic and Imaging Conditions
- A Generative Model for Controllable Feature Heterophily in Graphs
- Explicit modelling of subject dependency in BCI decoding
- Learning Regional Monsoon Patterns with a Multimodal Attention U-Net
- Scaling Policy Compliance Assessment in Language Models with Policy Reasoning Traces
- Multifractal features of multimodal cardiac signals: Nonlinear dynamics of exercise recovery
- Space Robotics Bench: Robot Learning Beyond Earth
- Targeted perturbations reveal brain-like local coding axes in robustified, but not standard, ANN-based brain models
- PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation
- CrediBench: Building Web-Scale Network Datasets for Information Integrity
- AI-Assisted Music Production: A User Study on Text-to-Music Models
- Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization
- An Accelerated Newton-GMRES Method for Multilinear PageRank
- Train Once, Answer All: Many Pretraining Experiments for the Cost of One
- Flow Matching for Robust Simulation-Based Inference under Model Misspecification
- Optimizing the Network Topology of a Linear Reservoir Computer
- Comparison of Scoring Rationales Between Large Language Models and Human Raters
- Democratizing AI scientists using ToolUniverse
- New Insights and Algorithms for Optimal Diagonal Preconditioning
- S³F-Net: A Multi-Modal Approach to Medical Image Classification via Spatial-Spectral Summarizer Fusion Network
- AudioFuse: Unified Spectral-Temporal Learning via a Hybrid ViT-1D CNN Architecture for Robust Phonocardiogram Classification
- 3DPCNet: Pose Canonicalization for Robust Viewpoint-Invariant 3D Kinematic Analysis from Monocular RGB cameras
- Multi-Modal Manipulation via Multi-Modal Policy Consensus
- Dynamic Trust Calibration Using Contextual Bandits
- Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional
- Network-Optimised Spiking Neural Network for Event-Driven Networking
- On the Shelf Life of Fine-Tuned LLM Judges: Future Proofing, Backward Compatibility, and Question Generalization
- End-to-End Deep Learning for Predicting Metric Space-Valued Outputs
- RAVEN: Resilient Aerial Navigation via Open-Set Semantic Memory and Behavior Adaptation
- Node Classification via Simplicial Interaction with Augmented Maximal Clique Selection
- BridgeDrive: Diffusion Bridge Policy for Closed-Loop Trajectory Planning in Autonomous Driving
- Large Language Models and Futures Price Factors in China
- Spatially Parallel All-optical Neural Networks
- Communication-aware Wide-Area Damping Control using Risk-Constrained Reinforcement Learning
- RIV: Recursive Introspection Mask Diffusion Vision Language Model
- How LLMs Learn to Reason: A Complex Network Perspective
- LightFair: Towards an Efficient Alternative for Fair T2I Diffusion via Debiasing Pre-trained Text Encoders
- Focusing on What Matters: Object-Agent-centric Tokenization for Vision Language Action models
- Confidence Aware SSD Ensemble with Weighted Boxes Fusion for Weapon Detection
- FMC-DETR: Frequency-Decoupled Multi-Domain Coordination for Aerial-View Object Detection
- Risk Profiling and Modulation for LLMs
- Sparse Deep Additive Model with Interactions: Enhancing Interpretability and Predictability
- FedBit: Accelerating Privacy-Preserving Federated Learning via Bit-Interleaved Packing and Cross-Layer Co-Design
- How to Make Large Language Models Generate 100% Valid Molecules?
- Physics-Informed Inductive Biases for Voltage Prediction in Distribution Grids
- GLASS Flows: Transition Sampling for Alignment of Flow and Diffusion Models
- TR2-D2: Tree Search Guided Trajectory-Aware Fine-Tuning for Discrete Diffusion
- XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning
- SIRI: Scaling Iterative Reinforcement Learning with Interleaved Compression
- Exploring Large Language Models for Translating Romanian Computational Problems into English
- BacPrep: An Experimental Platform for Evaluating LLM-Based Bacalaureat Assessment
- Leveraging Generative AI for Enhancing Automated Assessment in Programming Education Contests
- A Culturally-Rich Romanian NLP Dataset from "Who Wants to Be a Millionaire?" Videos
- VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs
- MateInfoUB: A Real-World Benchmark for Testing LLMs in Competitive, Multilingual, and Multimodal Educational Tasks
- GRILE: A Benchmark for Grammar Reasoning and Explanation in Romanian LLMs
- Stable and Interpretable Jet Physics with IRC-Safe Equivariant Feature Extraction
- A Comprehensive Analysis of Churn Prediction in Telecommunications Using Machine Learning
- Forecasting West Nile virus with deep graph encoders
- A Comparison of Surrogate Constitutive Models for Viscoplastic Creep Simulation of HT-9 Steel
- Semantic-Aware Edge Intelligence for UAV Handover in 6G Networks
- PISA: An AI Pipeline for Interpretable-by-design Survival Analysis Providing Multiple Complexity-Accuracy Trade-off Models
- Profit over Proxies: A Scalable Bayesian Decision Framework for Optimizing Multi-Variant Online Experiments
- Learning Hyperspectral Images with Curated Text Prompts for Efficient Multimodal Alignment
- Enhancing Cluster Scheduling in HPC: A Continuous Transfer Learning for Real-Time Optimization
- Metadata-Guided Adaptable Frequency Scaling across Heterogeneous Applications and Devices
- GZSL-MoE: Apprentissage Généralisé Zéro-Shot basé sur le Mélange d'Experts pour la Segmentation Sémantique de Nuages de Points 3D Appliqué à un Jeu de Données d'Environnement de Collaboration Humain-Robot
- IBiT: Utilizing Inductive Biases to Create a More Data Efficient Attention Mechanism
- LayoutAgent: A Vision-Language Agent Guided Compositional Diffusion for Spatial Layout Planning
- A Data-Driven Framework for Digital Transformation in Smart Cities: Integrating AI, Dashboards, and IoT Readiness
- Consistency Models as Plug-and-Play Priors for Inverse Problems
- Enabling Approximate Joint Sampling in Diffusion LMs
- Painless Activation Steering: An Automated, Lightweight Approach for Post-Training Large Language Models
- Generalization Analysis for Classification on Korobov Space
- Variance-Bounded Evaluation without Ground Truth: VB-Score
- Concept activation vectors: a unifying view and adversarial attacks
- Identifying Memory Effects in Epidemics via a Fractional SEIRD Model and Physics-Informed Neural Networks
- UESA-Net: U-Shaped Embedded Multidirectional Shrinkage Attention Network for Ultrasound Nodule Segmentation
- A theoretical guarantee for SyncRank
- Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression
- What Do They Fix? LLM-Aided Categorization of Security Patches for Critical Memory Bugs
- Hilbert: Recursively Building Formal Proofs with Informal Reasoning
- Efficient Fine-Grained GPU Performance Modeling for Distributed Deep Learning of LLM
- Text-Independent Speaker Identification Using Audio Looping With Margin Based Loss Functions
- Learning Temporal Saliency for Time Series Forecasting with Cross-Scale Attention
- Learning to Detect Relevant Contexts and Knowledge for Response Selection in Retrieval-based Dialogue Systems
- Parameterized Hardness of Zonotope Containment and Neural Network Verification
- Patient-specific Biomolecular Instruction Tuning
- Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity
- HEART: Emotionally-driven test-time scaling of Language Models
- Mixtures Closest to a Given Measure: A Semidefinite Programming Approach
- Convolutional Set Transformer
- A benchmark for vericoding: formally verified program synthesis
- TY-RIST: Tactical YOLO Tricks for Real-time Infrared Small Target Detection
- Label-Guided Imputation via Forest-Based Proximities for Improved Time Series Classification
- Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings
- Localized Uncertainty Quantification in Random Forests via Proximities
- What Matters More For In-Context Learning under Matched Compute Budgets: Pretraining on Natural Text or Incorporating Targeted Synthetic Examples?
- Robot Learning from Any Images
- ADAM: A Diverse Archive of Mankind for Evaluating and Enhancing LLMs in Biographical Reasoning
- Unsupervised Conformal Inference: Bootstrapping and Alignment to Control LLM Uncertainty
- Activation Matching for Explanation Generation
- Deep Reinforcement Learning in Action: Real-Time Control of Vortex-Induced Vibrations
- Emergent World Representations in OpenVLA
- Learning to Solve Optimization Problems Constrained with Partial Differential Equations
- SAIP: A Plug-and-Play Scale-adaptive Module in Diffusion-based Inverse Problems
- CURA: Size Isn't All You Need - A Compact Universal Architecture for On-Device Intelligence
- Evaluating classification performance across operating contexts: A comparison of decision curve analysis and cost curves
- OrthAlign: Orthogonal Subspace Decomposition for Non-Interfering Multi-Objective Alignment
- Learning Hamiltonian Dynamics at Scale: A Differential-Geometric Approach
- Identity Bridge: Enabling Implicit Reasoning via Shared Latent Memory
- HyperHELM: Hyperbolic Hierarchy Encoding for mRNA Language Modeling
- T-POP: Test-Time Personalization with Online Preference Feedback
- FedPOB: Sample-Efficient Federated Prompt Optimization via Bandits
- Circuit-Aware Reward Training: A Mechanistic Framework for Longtail Robustness in RLHF
- Discrete Variational Autoencoding via Policy Search
- Q-Net: Transferable Queue Length Estimation via Kalman-based Neural Networks
- Beyond Softmax: A Natural Parameterization for Categorical Random Variables
- Who invented deep residual learning?
- A TRIANGLE Enables Multimodal Alignment Beyond Cosine Similarity
- Robust Policy Expansion for Offline-to-Online RL under Diverse Data Corruption
- In-Context Learning of Temporal Point Processes with Foundation Inference Models
- Neural Message-Passing on Attention Graphs for Hallucination Detection
- MarS-FM: Generative Modeling of Molecular Dynamics via Markov State Models
- Quantifying Generalisation in Imitation Learning
- Assessing the risk of future Dunkelflaute events for Germany using generative deep learning
- Fidel-TS: A High-Fidelity Benchmark for Multimodal Time Series Forecasting
- DSAT-HD: Dual-Stream Adaptive Transformer with Hybrid Decomposition for Multivariate Time Series Forecasting
- Physics-informed learning under mixing: How physical knowledge speeds up learning
- DyMoDreamer: World Modeling with Dynamic Modulation
- Putnam-like dataset summary: LLMs as mathematical competition contestants
- Cell2Text: Multimodal LLM for Generating Single-Cell Descriptions from RNA-Seq Data
- Beyond the Hook: Predicting Billboard Hot 100 Chart Inclusion with Machine Learning from Streaming, Audio Signals, and Perceptual Features
- DRIFT-Net: A Spectral-Coupled Neural Operator for PDEs Learning
- Uncertainty-Guided Expert-AI Collaboration for Efficient Soil Horizon Annotation
- Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime
- Adaptive Canonicalization with Application to Invariant Anisotropic Geometric Networks
- Towards Understanding the Shape of Representations in Protein Language Models
- When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training
- Is Sequence Information All You Need for Bayesian Optimization of Antibodies?
- OAT-FM: Optimal Acceleration Transport for Improved Flow Matching
- Learning Distinguishable Representations in Deep Q-Networks for Linear Transfer
- Intra-request branch orchestration for efficient LLM reasoning
- Overlap-Adaptive Regularization for Conditional Average Treatment Effect Estimation
- Double Descent as a Lens for Sample Efficiency in Autoregressive vs. Discrete Diffusion Models
- Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
- Sampling Complexity of TD and PPO in RKHS
- Score-based Membership Inference on Diffusion Models
- Uncertainty-Aware Deep Learning for Wildfire Danger Forecasting
- MARCOS: Deep Thinking by Markov Chain of Continuous Thoughts
- Bayesian Surrogates for Risk-Aware Pre-Assessment of Aging Bridge Portfolios
- A multiscale analysis of mean-field transformers in the moderate interaction regime
- Efficient Hyperparameter Tuning via Trajectory Invariance Principle
- Advantage Weighted Matching: Aligning RL with Pretraining in Diffusion Models
- Towards a Certificate of Trust: Task-Aware OOD Detection for Scientific AI
- Scaling with Collapse: Efficient and Predictable Training of LLM Families
- ORPO-Distill: Mixed-Policy Preference Optimization for Cross-Architecture LLM Distillation
- Towards generalizable deep ptychography neural networks
- Rethinking Entropy Regularization in Large Reasoning Models
- Learning in an Echo Chamber: Online Learning with Replay Adversary
- BALF: Budgeted Activation-Aware Low-Rank Factorization for Fine-Tuning-Free Model Compression
- High-Dimensional Analysis of Single-Layer Attention for Sparse-Token Classification
- Chance-constrained Flow Matching for High-Fidelity Constraint-aware Generation
- Does Weak-to-strong Generalization Happen under Spurious Correlations?
- SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention
- Pretraining Scaling Laws for Generative Evaluations of Language Models
- GPS-MTM: Capturing Pattern of Normalcy in GPS-Trajectories with self-supervised learning
- Optimism as Risk-Seeking in Multi-Agent Reinforcement Learning
- Collaborative Device-Cloud LLM Inference through Reinforcement Learning
- On The Variability of Concept Activation Vectors
- In-Context Compositional Q-Learning for Offline Reinforcement Learning
- A Small Math Model: Recasting Strategy Choice Theory in an LLM-Inspired Architecture
- AQUAIR: A High-Resolution Indoor Environmental Quality Dataset for Smart Aquaculture Monitoring
- A Family of Kernelized Matrix Costs for Multiple-Output Mixture Neural Networks
- Demographic-Agnostic Fairness without Harm
- PEARL: Peer-Enhanced Adaptive Radio via On-Device LLM
- Clebsch-Gordan Transformer: Fast and Global Equivariant Attention
- ADAPT: Lightweight, Long-Range Machine Learning Force Fields Without Graphs
- GeoFunFlow: Geometric Function Flow Matching for Inverse Operator Learning over Complex Geometries
- HyMaTE: A Hybrid Mamba and Transformer Model for EHR Representation Learning
- Echo Flow Networks
- The Impossibility of Inverse Permutation Learning in Transformer Models
- A signal separation view of classification
- Evaluation of Machine and Deep Learning Techniques for Cyclone Trajectory Regression and Status Classification by Time Series Data
- Stable Forgetting: Bounded Parameter-Efficient Unlearning in LLMs
- Multi-Scale Geometric Autoencoder
- Model Correlation Detection via Random Selection Probing
- FM-FoG: A Real-Time Foundation Model-based Wearable System for Freezing-of-Gait Mitigation
- Negative Pre-activations Differentiate Syntax
- Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
- MDD-Thinker: Towards Large Reasoning Models for Major Depressive Disorder Diagnosis
- Conda: Column-Normalized Adam for Training Large Language Models Faster
- Semantic Editing with Coupled Stochastic Differential Equations
- Proposing a Framework for Machine Learning Adoption on Legacy Systems
- Accessible, Realistic, and Fair Evaluation of Positive-Unlabeled Learning Algorithms
- ChessArena: A Chess Testbed for Evaluating Strategic Reasoning Capabilities of Large Language Models
- Graph Foundation Models: Bridging Language Model Paradigms and Graph Optimization
- Adversarial Reinforcement Learning Framework for ESP Cheater Simulation
- ELASTIQ: EEG-Language Alignment with Semantic Task Instruction and Querying
- Asynchronous Policy Gradient Aggregation for Efficient Distributed Reinforcement Learning
- A study of Universal ODE approaches to predicting soil organic carbon
- Rethinking JEPA: Compute-Efficient Video SSL with Frozen Teachers
- AuON: A Linear-time Alternative to Semi-Orthogonal Momentum Updates
- H+: An Efficient Similarity-Aware Aggregation for Byzantine Resilient Federated Learning
- Towards Generalizable PDE Dynamics Forecasting via Physics-Guided Invariant Learning
- Expanding Horizons of Level Diversity via Multi-objective Evolutionary Learning
- Watermarking Diffusion Language Models
- Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning
- AXIS: Explainable Time Series Anomaly Detection with Large Language Models
- Muon: Training and Trade-offs with Latent Attention and MoE
- ScatterAD: Temporal-Topological Scattering Mechanism for Time Series Anomaly Detection
- BiHDTrans: binary hyperdimensional transformer for efficient multivariate time series classification
- Semantic Compression via Multimodal Representation Learning
- EOE: Evolutionary Optimization of Experts for Training Language Models
- Distributionally Robust Federated Learning with Outlier Resilience
- Interpretable Kernel Representation Learning at Scale: A Unified Framework Utilizing Nyström Approximation
- FS-KAN: Permutation Equivariant Kolmogorov-Arnold Networks via Function Sharing
- One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning
- Guided Uncertainty Learning Using a Post-Hoc Evidential Meta-Model
- LLM DNA: Tracing Model Evolution via Functional Representations
- Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models
- Trading Carbon for Physics: On the Resource Efficiency of Machine Learning for Spatio-Temporal Forecasting
- LEAF: A Robust Expert-Based Framework for Few-Shot Continual Event Detection
- Training-Free Multimodal Guidance for Video to Audio Generation
- Short window attention enables long-term memorization
- EVO-LRP: Evolutionary Optimization of LRP for Interpretable Model Explanations
- Sketching Low-Rank Plus Diagonal Matrices
- Toward a Holistic Approach to Continual Model Merging
- Avoid Catastrophic Forgetting with Rank-1 Fisher from Diffusion Models
- Characteristic Root Analysis and Regularization for Linear Time Series Forecasting
- GraphIFE: Rethinking Graph Imbalance Node Classification via Invariant Learning
- DRIK: Distribution-Robust Inductive Kriging without Information Leakage
- PreScope: Unleashing the Power of Prefetching for Resource-Constrained MoE Inference
- Virtual Nodes based Heterogeneous Graph Convolutional Neural Network for Efficient Long-Range Information Aggregation
- Pure Node Selection for Imbalanced Graph Node Classification
- Calibration Meets Reality: Making Machine Learning Predictions Trustworthy
- Beyond Greedy Exits: Improved Early Exit Decisions for Risk Control and Reliability
- Why Alignment Must Precede Distillation: A Minimal Working Explanation
- Multi-Scale Spatial-Temporal Hypergraph Network with Lead-Lag Structures for Stock Time Series Forecasting
- Graph Neural Networks with Diversity-aware Neighbor Selection and Dynamic Multi-scale Fusion for Multivariate Time Series Forecasting
- Towards a Comprehensive Scaling Law of Mixture-of-Experts
- Decentralized Dynamic Cooperation of Personalized Models for Federated Continual Learning
- Hedonic Neurons: A Mechanistic Mapping of Latent Coalitions in Transformer MLPs
- FedDAPL: Toward Client-Private Generalization in Federated Learning
- Merge Now, Regret Later: The Hidden Cost of Model Merging is Adversarial Transferability
- Estimating Time Series Foundation Model Transferability via In-Context Learning
- Bridging Discrete and Continuous RL: Stable Deterministic Policy Gradient with Martingale Characterization
- FraudTransformer: Time-Aware GPT for Transaction Fraud Detection
- A Self-Adaptive Frequency Domain Network for Continuous Intraoperative Hypotension Prediction
- GBSK: Skeleton Clustering via Granular-ball Computing and Multi-Sampling for Large-Scale Data
- Time-Shifted Token Scheduling for Symbolic Music Generation
- An Investigation of Batch Normalization in Off-Policy Actor-Critic Algorithms
- Anchored Supervised Fine-Tuning
- SHAPoint: Task-Agnostic, Efficient, and Interpretable Point-Based Risk Scoring via Shapley Values
- Knowledge Homophily in Large Language Models
- Trained Mamba Emulates Online Gradient Descent in In-Context Linear Regression
- Visual CoT Makes VLMs Smarter but More Fragile
- Enhancing LLM Steering through Sparse Autoencoder-Based Vector Refinement
- STAIR: Addressing Stage Misalignment through Temporal-Aligned Preference Reinforcement Learning
- FedAgentBench: Towards Automating Real-world Federated Medical Image Analysis with Server-Client LLM Agents
- Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR
- Tequila: Trapping-free Ternary Quantization for Large Language Models
- IndexNet: Timestamp and Variable-Aware Modeling for Time Series Forecasting
- Test-time GNN Model Evaluation on Dynamic Graphs
- Space Group Conditional Flow Matching
- Electric Currents for Discrete Data Generation
- Bayesian Mixture-of-Experts: Towards Making LLMs Know What They Don't Know
- Adversarial Diffusion for Robust Reinforcement Learning
- Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation
- Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer
- Gradient Flow Convergence Guarantee for General Neural Network Architectures
- Dynamic Orthogonal Continual Fine-tuning for Mitigating Catastrophic Forgettings
- Differentiable Sparsity via D-Gating: Simple and Versatile Structured Penalization
- Integrated Communication and Control for Energy-Efficient UAV Swarms: A Multi-Agent Reinforcement Learning Approach
- Graph Mixing Additive Networks
- HiViS: Hiding Visual Tokens from the Drafter for Speculative Decoding in Vision-Language Models
- Beyond Benchmarks: Understanding Mixture-of-Experts Models through Internal Mechanisms
- Diffusion Models are Kelly Gamblers
- Brain-language fusion enables interactive neural readout and in-silico experimentation
- Efficient Identification of High Similarity Clusters in Polygon Datasets
- Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm
- DiBS-MTL: Transformation-Invariant Multitask Learning with Direction Oracles
- Evaluating the Robustness of Chinchilla Compute-Optimal Scaling
- Detecting and Rectifying Noisy Labels: A Similarity-based Approach
- Curriculum-Guided Reinforcement Learning for Synthesizing Gas-Efficient Financial Derivatives Contracts
- Guide: Generalized-Prior and Data Encoders for DAG Estimation
- Drift-Adapter: A Practical Approach to Near Zero-Downtime Embedding Model Upgrades in Vector Databases
- Memory-Efficient Fine-Tuning via Low-Rank Activation Compression
- Statistical Learning Guarantees for Group-Invariant Barron Functions
- Temporal Generalization: A Reality Check
- Revisiting Multivariate Time Series Forecasting with Missing Values
- Beyond Outliers: A Study of Optimizers Under Quantization
- Disentanglement of Variations with Multimodal Generative Modeling
- Fusing Sequence Motifs and Pan-Genomic Features: Antimicrobial Resistance Prediction using an Explainable Lightweight 1D CNN-XGBoost Ensemble
- Improving constraint-based discovery with robust propagation and reliable LLM priors
- RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility
- Impute-MACFM: Imputation based on Mask-Aware Flow Matching
- C²GSPG: Confidence-calibrated Group Sequence Policy Gradient towards Self-aware Reasoning
- Trust Region Reward Optimization and Proximal Inverse Reward Optimization Algorithm
- Beyond Heuristics: Globally Optimal Configuration of Implicit Neural Representations
- TimeExpert: Boosting Long Time Series Forecasting with Temporal Mix of Experts
- Critique to Verify: Accurate and Honest Test-Time Scaling with RL-Trained Verifiers
- CrystalGym: A New Benchmark for Materials Discovery Using Reinforcement Learning
- Deep Learning-Based Detection of Cognitive Impairment from Passive Smartphone Sensing with Routine-Aware Augmentation and Demographic Personalization
- ProtoTS: Learning Hierarchical Prototypes for Explainable Time Series Forecasting
- Dense associative memory on the Bures-Wasserstein space
- F-Adapter: Frequency-Adaptive Parameter-Efficient Fine-Tuning in Scientific Machine Learning
- ZeroSiam: An Efficient Siamese for Test-Time Entropy Optimization without Collapse
- CoSIFL: Collaborative Secure and Incentivized Federated Learning with Differential Privacy
- Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization
- Towards Monotonic Improvement in In-Context Reinforcement Learning
- One-Shot Multi-Label Causal Discovery in High-Dimensional Event Sequences
- WirelessMathLM: Teaching Mathematical Reasoning for LLMs in Wireless Communications with Reinforcement Learning
- SPEC-RL: Accelerating On-Policy Reinforcement Learning via Speculative Rollouts
- More Data or Better Algorithms: Latent Diffusion Augmentation for Deep Imbalanced Regression
- Adaptive Token-Weighted Differential Privacy for LLMs: Not All Tokens Require Equal Protection
- Deep Learning for Subspace Regression
- NanoFlux: Adversarial Dual-LLM Evaluation and Distillation For Multi-Domain Reasoning
- ABConformer: Physics-inspired Sliding Attention for Antibody-Antigen Interface Prediction
- CREPE: Controlling Diffusion with Replica Exchange
- Transfer Learning and Machine Learning for Training Five Year Survival Prognostic Models in Early Breast Cancer
- Continuous-Time Reinforcement Learning for Asset-Liability Management
- A Neural ODE Approach to Aircraft Flight Dynamics Modelling
- ASTGI: Adaptive Spatio-Temporal Graph Interactions for Irregular Multivariate Time Series Forecasting
- Two-Scale Latent Dynamics for Recurrent-Depth Transformers
- MELCOT: A Hybrid Learning Architecture with Marginal Preservation for Matrix-Valued Regression
- LLM Interpretability with Identifiable Temporal-Instantaneous Representation
- Robust Fine-Tuning from Non-Robust Pretrained Models: Mitigating Suboptimal Transfer With Adversarial Scheduling
- Entering the Era of Discrete Diffusion Models: A Benchmark for Schrödinger Bridges and Entropic Optimal Transport
- Landing with the Score: Riemannian Optimization through Denoising
- Emergence of Superposition: Unveiling the Training Dynamics of Chain of Continuous Thought
- Splines-Based Feature Importance in Kolmogorov-Arnold Networks: A Framework for Supervised Tabular Data Dimensionality Reduction
- Graph Your Own Prompt
- Planner Aware Path Learning in Diffusion Language Models Training
- Mind the Links: Cross-Layer Attention for Link Prediction in Multiplex Networks
- PATCH: Learnable Tile-level Hybrid Sparsity for LLMs
- URS: A Unified Neural Routing Solver for Cross-Problem Zero-Shot Generalization
- LOTFormer: Doubly-Stochastic Linear Attention via Low-Rank Optimal Transport
- Better Hessians Matter: Studying the Impact of Curvature Approximations in Influence Functions
- Factor Decorrelation Enhanced Data Removal from Deep Predictive Models
- PHASE: Physics-Integrated, Heterogeneity-Aware Surrogates for Scientific Simulations
- Data-Efficient Training by Evolved Sampling
- Generative Evolutionary Meta-Solver (GEMS): Scalable Surrogate-Free Multi-Agent Learning
- Solve Smart, Not Often: Policy Learning for Costly MILP Re-solving
- Localizing Adversarial Attacks To Produces More Imperceptible Noise
- In-Context Learning can Perform Continual Learning Like Humans
- Communication-Efficient and Interoperable Distributed Learning
- On the Capacity of Self-Attention
- Boundary on the Table: Efficient Black-Box Decision-Based Attacks for Structured Data
- Adaptive Margin RLHF via Preference over Preferences
- Observation-Free Attacks on Online Learning to Rank
- Neighborhood Sampling Does Not Learn the Same Graph Neural Network
- From Noise to Knowledge: A Comparative Study of Acoustic Anomaly Detection Models in Pumped-storage Hydropower Plants
- FedCF: Fair Federated Conformal Prediction
- Guided Manifold Alignment with Geometry-Regularized Twin Autoencoders
- Rethinking Large Language Model Distillation: A Constrained Markov Decision Process Perspective
- MonoCon: A general framework for learning ultra-compact high-fidelity representations using monotonicity constraints
- Compute-Optimal Quantization-Aware Training
- Understanding SOAP from the Perspective of Gradient Whitening
- SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights
- Meta-Learning Fourier Neural Operators for Hessian Inversion and Enhanced Variational Data Assimilation
- GDR-learners: Orthogonal Learning of Generative Models for Potential Outcomes
- Doubly-Robust LLM-as-a-Judge: Externally Valid Estimation with Imperfect Personas
- Reinforcement Learning with Discrete Diffusion Policies for Combinatorial Action Spaces
- Functional Critic Modeling for Provably Convergent Off-Policy Actor-Critic
- Shape-Informed Clustering of Multi-Dimensional Functional Data via Deep Functional Autoencoders
- OptiMind: Teaching LLMs to Think Like Optimization Experts
- MDP modeling for multi-stage stochastic programs
- T-TAMER: Provably Taming Trade-offs in ML Serving
- Analysis of Variational Autoencoders
- Sample-efficient Multiclass Calibration under ℓ_p Error
- Physically Plausible Multi-System Trajectory Generation and Symmetry Discovery
- MoE-PHDS: One MoE checkpoint for flexible runtime sparsity
- On the Sheafification of Higher-Order Message Passing
- Tracing the Representation Geometry of Language Models from Pretraining to Post-training
- Understanding Catastrophic Interference On the Identifiability of Latent Representations
- DPFNAS: Differential Privacy-Enhanced Federated Neural Architecture Search for 6G Edge Intelligence
- GuardNet: Graph-Attention Filtering for Jailbreak Defense in Large Language Models
- IsingFormer: Augmenting Parallel Tempering With Learned Proposals
- Beyond Aggregation: Guiding Clients in Heterogeneous Federated Learning
- Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding
- Dynamics of Learning: Generative Schedules from Latent ODEs
- Beyond Model Ranking: Predictability-Aligned Evaluation for Time Series Forecasting
- CLAD-Net: Continual Activity Recognition in Multi-Sensor Wearable Systems
- Signal Preserving Weight Initialization for Odd-Sigmoid Activations
- Unleashing Flow Policies with Distributional Critics
- Demystifying Network Foundation Models
- Sensitivity Analysis for Diffusion Models
- Causally-Enhanced Reinforcement Policy Optimization
- Towards Quantum-Ready Blockchain Fraud Detection via Ensemble Graph Neural Networks
- Effective Quantization of Muon Optimizer States
Research Sources: 2736 | Generated: 9/30/2025