AI RESEARCH PAPERS & ACADEMIC SOURCES
- APTx Neuron: A Unified Trainable Neuron Architecture Integrating Activation and Computation
- Multimodal Recurrent Ensembles for Predicting Brain Responses to Naturalistic Movies (Algonauts 2025)
- Recent Advancements in Microscopy Image Enhancement using Deep Learning: A Survey
- SeamCrafter: Enhancing Mesh Seam Generation for Artist UV Unwrapping via Reinforcement Learning
- FERD: Fairness-Enhanced Data-Free Robustness Distillation
- Differential-Integral Neural Operator for Long-Term Turbulence Forecasting
- TAMMs: Temporal-Aware Multimodal Model for Satellite Image Change Understanding and Forecasting
- Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy
- Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning
- SIU3R: Simultaneous Scene Understanding and 3D Reconstruction Beyond Feature Alignment
- Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders
- pFedMMA: Personalized Federated Fine-Tuning with Multi-Modal Adapter for Vision-Language Models
- NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation
- STQE: Spatial-Temporal Attribute Quality Enhancement for G-PCC Compressed Dynamic Point Clouds
- DriveAgent-R1: Advancing VLM-based Autonomous Driving with Active Perception and Hybrid Thinking
- $A^2R^2$: Advancing Img2LaTeX Conversion via Visual Reasoning with Attention-Guided Refinement
- Content-Aware Mamba for Learned Image Compression
- Small Dents, Big Impact: A Dataset and Deep Learning Approach for Vehicle Dent Detection
- Re-Densification Meets Cross-Scale Propagation: Real-Time Neural Compression of LiDAR Point Clouds
- Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing
- GLEAM: Learning to Match and Explain in Cross-View Geo-Localization
- Deep Learning for Clouds and Cloud Shadow Segmentation in Methane Satellite and Airborne Imaging Spectroscopy
- Diffence: Fencing Membership Privacy With Diffusion Models
- Metric-Guided Conformal Bounds for Probabilistic Image Reconstruction
- STHN: Deep Homography Estimation for UAV Thermal Geo-localization with Satellite Imagery
- Diverse Subset Selection via Norm-Based Sampling and Orthogonality
- Excavating in the Wild: The GOOSE-Ex Dataset for Semantic Segmentation
- DOTA: Distributional Test-Time Adaptation of Vision-Language Models
- Unstable Unlearning: The Hidden Risk of Concept Resurgence in Diffusion Models
- Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
- Surgical Vision World Model
- Learning Personalized Driving Styles via Reinforcement Learning from Human Feedback
- Texture or Semantics? Vision-Language Models Get Lost in Font Recognition
- Can Diffusion Models Disentangle? A Theoretical Perspective
- Geometry aware inference of steady state PDEs using Equivariant Neural Fields representations
- Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling
- Mobi-$\pi$: Mobilizing Your Robot Learning Policy
- iTACO: Interactable Digital Twins of Articulated Objects from Casually Captured RGBD Videos
- NeuVAS: Neural Implicit Surfaces for Variational Shape Modeling
- Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities
- Multi-View Hypercomplex Learning for Breast Cancer Screening
- Frequency-Domain Refinement with Multiscale Diffusion for Super Resolution
- Leveraging Model Guidance to Extract Training Data from Personalized Diffusion Models
- Pose Prior Learner: Unsupervised Categorical Prior Learning for Pose Estimation
- Diffusion Curriculum: Synthetic-to-Real Data Curriculum via Image-Guided Diffusion
- Large Pre-Training Datasets Don't Always Guarantee Robustness after Fine-Tuning
- TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video
- DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation
- Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
- Self-Guidance: Boosting Flow and Diffusion Generation on Their Own
- LOGen: Toward Lidar Object Generation by Point Diffusion
- UIP2P: Unsupervised Instruction-based Image Editing via Edit Reversibility Constraint
- Unforgettable Lessons from Forgettable Images: Intra-Class Memorability Matters in Computer Vision
- LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation
- Single-weight Model Editing for Post-hoc Spurious Correlation Neutralization
- PDV: Prompt Directional Vectors for Zero-shot Composed Image Retrieval
- VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation
- GeoDANO: Geometric VLM with Domain Agnostic Vision Encoder
- SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models
- Semantic Consistent Language Gaussian Splatting for Point-Level Open-vocabulary Querying
- DanceText: A Training-Free Layered Framework for Controllable Multilingual Text Transformation in Images
- CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis
- Image Recognition with Online Lightweight Vision Transformer: A Survey
- Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
- OS-W2S: An Automatic Labeling Engine for Language-Guided Open-Set Aerial Object Detection
- Intentional Gesture: Deliver Your Intentions with Gestures for Speech
- Octic Vision Transformers: Quicker ViTs Through Equivariance
- PhyMAGIC: Physical Motion-Aware Generative Inference with Confidence-guided LLM
- Deeper Diffusion Models Amplify Bias
- DVD-Quant: Data-free Video Diffusion Transformers Quantization
- ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation
- CARE: Confidence-aware Ratio Estimation for Medical Biomarkers
- Mamba-Driven Topology Fusion for Monocular 3D Human Pose Estimation
- Towards Scalable Language-Image Pre-training for 3D Medical Imaging
- Pose-free 3D Gaussian splatting via shape-ray estimation
- Physics-Guided Motion Loss for Video Generation Model
- ReSpace: Text-Driven 3D Indoor Scene Synthesis and Editing with Preference Alignment
- Dual Branch VideoMamba with Gated Class Token Fusion for Violence Detection
- Astraea: A Token-wise Acceleration Framework for Video Diffusion Transformers
- DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models
- Structure before the Machine: Input Space is the Prerequisite for Concepts
- HiSin: A Sinogram-Aware Framework for Efficient High-Resolution Inpainting
- VidBridge-R1: Bridging QA and Captioning for RL-based Video Understanding Models with Intermediate Proxy Tasks
- LEO-VL: Efficient Scene Representation for Scalable 3D Vision-Language Learning
- Think With Videos For Agentic Long-Video Understanding
- video-SALMONN 2: Caption-Enhanced Audio-Visual Large Language Models
- HyCoVAD: A Hybrid SSL-LLM Model for Complex Video Anomaly Detection
- JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation
- SpikeMatch: Semi-Supervised Learning with Temporal Dynamics of Spiking Neural Networks
- Vision-Language Alignment from Compressed Image Representations using 2D Gaussian Splatting
- LongLive: Real-time Interactive Long Video Generation
- SPARK: Synergistic Policy And Reward Co-Evolving Framework
- CCNeXt: An Effective Self-Supervised Stereo Depth Estimation Approach
- UML-CoT: Structured Reasoning and Planning with Unified Modeling Language for Robotic Room Cleaning
- LABELING COPILOT: A Deep Research Agent for Automated Data Curation in Computer Vision
- Training-Free Synthetic Data Generation with Dual IP-Adapter Guidance
- Scale-Wise VAR is Secretly Discrete Diffusion
- Hierarchical Representation Matching for CLIP-based Class-Incremental Learning
- Learning Human-Perceived Fakeness in AI-Generated Videos via Multimodal LLMs
- CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
- RefAM: Attention Magnets for Zero-Shot Referral Segmentation
- SGAligner++: Cross-Modal Language-Aided 3D Scene Graph Alignment
- Cross-Modal Retrieval with Cauchy-Schwarz Divergence
- Language-in-the-Loop Culvert Inspection on the Erie Canal
- Are Hallucinations Bad Estimations?
- VISION: Prompting Ocean Vertical Velocity Reconstruction from Incomplete Observations
- SlimDiff: Training-Free, Activation-Guided Hands-free Slimming of Diffusion Models
- DistillKac: Few-Step Image Generation via Damped Wave Equations
- TRiCo: Triadic Game-Theoretic Co-Training for Robust Semi-Supervised Learning
- Patch-Based Diffusion for Data-Efficient, Radiologist-Preferred MRI Reconstruction
- ControlHair: Physically-based Video Diffusion for Controllable Dynamic Hair Rendering
- Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow
- Perception-Consistency Multimodal Large Language Models Reasoning via Caption-Regularized Policy Optimization
- Closing the Oracle Gap: Increment Vector Transformation for Class Incremental Learning
- Comparative Analysis of GAN and Diffusion for MRI-to-CT translation
- Enriching Knowledge Distillation with Intra-Class Contrastive Learning
- Guidance Watermarking for Diffusion Models
- Rigidity-Aware 3D Gaussian Deformation from a Single Image
- Aerial Path Planning for Urban Geometry and Texture Co-Capture
- COMPASS: Robust Feature Conformal Prediction for Medical Segmentation Metrics
- Clinical Uncertainty Impacts Machine Learning Evaluations
- RoboView-Bias: Benchmarking Visual Bias in Embodied Agents for Robotic Manipulation
- Deep Learning-Based Cross-Anatomy CT Synthesis Using Adapted nnResU-Net with Anatomical Feature Prioritized Loss
- Adaptive Dual-Mode Distillation with Incentive Schemes for Scalable, Heterogeneous Federated Learning on Non-IID Data
- JointDiff: Bridging Continuous and Discrete in Multi-Agent Trajectory Generation
- Activation Function Design Sustains Plasticity in Continual Learning
- MINT-RVAE: Multi-Cues Intention Prediction of Human-Robot Interaction using Human Pose and Emotion Information from RGB-only Camera Data
- Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
- WoW: Towards a World omniscient World model Through Embodied Interaction
- VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing
- Pixel Motion Diffusion is What We Need for Robot Control
- See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation
- Self-Supervised Point Cloud Completion based on Multi-View Augmentations of Single Partial Point Cloud
- REFINE-CONTROL: A Semi-supervised Distillation Method For Conditional Image Generation
- Joint graph entropy knowledge distillation for point cloud classification and robustness against corruptions
- MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models
- DragGANSpace: Latent Space Exploration and Control for GANs
- MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
- Towards Faithful Reasoning in Remote Sensing: A Perceptually-Grounded GeoSpatial Chain-of-Thought for Vision-Language Models
- Polysemous Language Gaussian Splatting via Matching-based Mask Lifting
- UrbanFeel: A Comprehensive Benchmark for Temporal and Perceptual Understanding of City Scenes through Human Perspective
- A Tale of Two Experts: Cooperative Learning for Source-Free Unsupervised Domain Adaptation
- FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing
- Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks
- UniMapGen: A Generative Framework for Large-Scale Map Construction from Multi-modal Data
- GS-2M: Gaussian Splatting for Joint Mesh Reconstruction and Material Decomposition
- MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning
- Rule-Based Reinforcement Learning for Document Image Classification with Vision Language Models
- Jailbreaking on Text-to-Video Models via Scene Splitting Strategy
- HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion Models
- Johnson-Lindenstrauss Lemma Guided Network for Efficient 3D Medical Segmentation
- NIFTY: a Non-Local Image Flow Matching for Texture Synthesis
- RAPID^3: Tri-Level Reinforced Acceleration Policies for Diffusion Transformer
- Pedestrian Attribute Recognition via Hierarchical Cross-Modality HyperGraph Learning
- CircuitSense: A Hierarchical Circuit System Benchmark Bridging Visual Comprehension and Symbolic Reasoning in Engineering Design Process
- HierLight-YOLO: A Hierarchical and Lightweight Object Detection Network for UAV Photography
- Effectiveness of Large Multimodal Models in Detecting Disinformation: Experimental Results
- GPT-4 for Occlusion Order Recovery
- Gradient-based multi-focus image fusion with focus-aware saliency enhancement
- Text Adversarial Attacks with Dynamic Outputs
- Integrating Background Knowledge in Medical Semantic Segmentation with Logic Tensor Networks
- Closing the Safety Gap: Surgical Concept Erasure in Visual Autoregressive Models
- RAU: Reference-based Anatomical Understanding with Vision Language Models
- FreqDebias: Towards Generalizable Deepfake Detection via Consistency-Driven Frequency Debiasing
- LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer
- Explaining multimodal LLMs via intra-modal token interactions
- U-MAN: U-Net with Multi-scale Adaptive KAN Network for Medical Image Segmentation
- $\gamma$-Quant: Towards Learnable Quantization for Low-bit Pattern Recognition
- SSVIF: Self-Supervised Segmentation-Oriented Visible and Infrared Image Fusion
- Bézier Meets Diffusion: Robust Generation Across Domains for Medical Image Segmentation
- PSTTS: A Plug-and-Play Token Selector for Efficient Event-based Spatio-temporal Representation Learning
- Group Critical-token Policy Optimization for Autoregressive Image Generation
- Where MLLMs Attend and What They Rely On: Explaining Autoregressive Token Generation
- Color Names in Vision-Language Models
- EfficientDepth: A Fast and Detail-Preserving Monocular Depth Estimation Model
- Category Discovery: An Open-World Perspective
- MIRG-RL: Multi-Image Reasoning and Grounding with Reinforcement Learning
- LongScape: Advancing Long-Horizon Embodied World Models with Context-Aware MoE
- MoWM: Mixture-of-World-Models for Embodied Planning via Latent-to-Pixel Feature Modulation
- DiTraj: training-free trajectory control for video diffusion transformer
- A Comprehensive Evaluation of Transformer-Based Question Answering Models and RAG-Enhanced Design
- Dynamic Novel View Synthesis in High Dynamic Range
- SRHand: Super-Resolving Hand Images and 3D Shapes via View/Pose-aware Neural Image Representations and Explicit 3D Meshes
- Deepfakes: we need to re-think the concept of "real" images
- Unlocking the Essence of Beauty: Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization
- StableDub: Taming Diffusion Prior for Generalized and Efficient Visual Dubbing
- Drag4D: Align Your Motion with Text-Driven 3D Scene Generation
- Syncphony: Synchronized Audio-to-Video Generation with Diffusion Transformers
- LG-CD: Enhancing Language-Guided Change Detection through SAM2 Adaptation
- TDEdit: A Unified Diffusion Framework for Text-Drag Guided Image Manipulation
- Enhancing Vehicle Detection under Adverse Weather Conditions with Contrastive Learning
- Taming Flow-based I2V Models for Creative Video Editing
- Multi-View Crowd Counting With Self-Supervised Learning
- Spatial Reasoning in Foundation Models: Benchmarking Object-Centric Spatial Understanding
- PANICL: Mitigating Over-Reliance on Single Prompt in Visual In-Context Learning
- SingRef6D: Monocular Novel Object Pose Estimation with a Single RGB Reference
- DynaNav: Dynamic Feature and Layer Selection for Efficient Visual Navigation
- SemanticControl: A Training-Free Approach for Handling Loosely Aligned Visual Conditions in ControlNet
- Customizing Visual Emotion Evaluation for MLLMs: An Open-vocabulary, Multifaceted, and Scalable Approach
- MultiCrafter: High-Fidelity Multi-Subject Generation via Spatially Disentangled Attention and Identity-Aware Reinforcement Learning
- PartSAM: A Scalable Promptable Part Segmentation Model Trained on Native 3D Data
- No-Reference Image Contrast Assessment with Customized EfficientNet-B0
- Geo-R1: Improving Few-Shot Geospatial Referring Expression Understanding with Reinforcement Fine-Tuning
- Benchmarking and Mitigate Psychological Sycophancy in Medical Vision-Language Models
- Resolving Ambiguity in Gaze-Facilitated Visual Assistant Interaction Paradigm
- From Bias to Balance: Exploring and Mitigating Spatial Bias in LVLMs
- Mind-the-Glitch: Visual Correspondence for Detecting Inconsistencies in Subject-Driven Generation
- WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM
- ERGO: Efficient High-Resolution Visual Understanding for Vision-Language Models
- DualFocus: Depth from Focus with Spatio-Focal Dual Variational Constraints
- Rate-Distortion Optimized Communication for Collaborative Perception
- FailureAtlas: Mapping the Failure Landscape of T2I Models via Active Exploration
- Exposing Hallucinations To Suppress Them: VLMs Representation Editing With Generative Anchors
- CoFFT: Chain of Foresight-Focus Thought for Visual Language Models
- Lightweight Structured Multimodal Reasoning for Clinical Scene Understanding in Robotics
- EgoInstruct: An Egocentric Video Dataset of Face-to-face Instructional Interactions with Multi-modal LLM Benchmarking
- High-Quality Sound Separation Across Diverse Categories via Visually-Guided Generative Modeling
- SpecXNet: A Dual-Domain Convolutional Network for Robust Deepfake Detection
- Large Material Gaussian Model for Relightable 3D Generation
- On the Status of Foundation Models for SAR Imagery
- UISim: An Interactive Image-Based UI Simulator for Dynamic Mobile Environments
- LFA-Net: A Lightweight Network with LiteFusion Attention for Retinal Vessel Segmentation
- Incorporating Scene Context and Semantic Labels for Enhanced Group-level Emotion Recognition
- KG-SAM: Injecting Anatomical Knowledge into Segment Anything Models via Conditional Random Fields
- UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models
- CubistMerge: Spatial-Preserving Token Merging For Diverse ViT Backbones
- Training-Free Multimodal Deepfake Detection via Graph Reasoning
- Prompt-guided Representation Disentanglement for Action Recognition
- DeHate: A Stable Diffusion-based Multimodal Approach to Mitigate Hate Speech in Images
- R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning
- Resource Consumption Red-Teaming for Large Vision-Language Models
- Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders
- EigenBench: A Comparative Behavioral Measure of Value Alignment
- Cognitive Load Limits in Large Language Models: Benchmarking Multi-Hop Reasoning
- TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
- Expanding Reasoning Potential in Foundation Model by Learning Diverse Chains of Thought Patterns
- Random Direct Preference Optimization for Radiography Report Generation
- Improving Autism Detection with Multimodal Behavioral Analysis
- KV-Efficient VLA: A Method of Speed up Vision Language Model with RNN-Gated Chunked KV Cache
- Phrase-grounded Fact-checking for Automatically Generated Chest X-ray Reports
- MDF-MLLM: Deep Fusion Through Cross-Modal Feature Alignment for Contextually Aware Fundoscopic Image Classification
- Multimodal Prompt Decoupling Attack on the Safety Filters in Text-to-Image Models
- A Mutual Learning Method for Salient Object Detection with intertwined Multi-Supervision--Revised
- MAJORScore: A Novel Metric for Evaluating Multimodal Relevance via Joint Representation
- Safety Assessment of Scaffolding on Construction Site using AI
- Automated Prompt Generation for Creative and Counterfactual Text-to-image Synthesis
- In silico Deep Learning Protocols for Label-Free Super-Resolution Microscopy: A Comparative Study of Network Architectures and SNR Dependence
- Dynamic Multi-Target Fusion for Efficient Audio-Visual Navigation
- SAEmnesia: Erasing Concepts in Diffusion Models with Sparse Autoencoders
- Coreset selection based on Intra-class diversity
- The LongiMam model for improved breast cancer risk prediction using longitudinal mammograms
- Assessing the Alignment of Popular CNNs to the Brain for Valence Appraisal
- Debugging Concept Bottleneck Models through Removal and Retraining
- ShipwreckFinder: A QGIS Tool for Shipwreck Detection in Multibeam Sonar Data
- Do Sparse Subnetworks Exhibit Cognitively Aligned Attention? Effects of Pruning on Saliency Map Fidelity, Sparsity, and Concept Coherence
- TUN3D: Towards Real-World Scene Understanding from Unposed Images
- Large AI Model-Enabled Generative Semantic Communications for Image Transmission
- mmHSense: Multi-Modal and Distributed mmWave ISAC Datasets for Human Sensing
- Skeleton Sparsification and Densification Scale-Spaces
- Downscaling climate projections to 1 km with single-image super resolution
- JaiLIP: Jailbreaking Vision-Language Models via Loss Guided Image Perturbation
- Overview of ExpertLifeCLEF 2018: how far automated identification systems are from the best experts?
- QuadGPT: Native Quadrilateral Mesh Generation with Autoregressive Models
- DyME: Dynamic Multi-Concept Erasure in Diffusion Models with Bi-Level Orthogonal LoRA Adaptation
- VideoJudge: Bootstrapping Enables Scalable Supervision of MLLM-as-a-Judge for Video Understanding
- Residual Vector Quantization For Communication-Efficient Multi-Agent Perception
- Gender Stereotypes in Professional Roles Among Saudis: An Analytical Study of AI-Generated Images Using Language Models
- Reasoning-Enhanced Domain-Adaptive Pretraining of Multimodal Large Language Models for Short Video Content Moderation
- Learning GUI Grounding with Spatial Reasoning from Visual Feedback
- X-CoT: Explainable Text-to-Video Retrieval via LLM-based Chain-of-Thought Reasoning
- Unsupervised Defect Detection for Surgical Instruments
- No Alignment Needed for Generation: Learning Linearly Separable Representations in Diffusion Models
- Enhancing Contrastive Learning for Geolocalization by Discovering Hard Negatives on Semivariograms
- X-Streamer: Unified Human World Modeling with Audiovisual Interaction
- What Happens Next? Anticipating Future Motion by Generating Point Trajectories
- Temporal vs. Spatial: Comparing DINOv3 and V-JEPA2 Feature Representations for Video Action Analysis
- VLCE: A Knowledge-Enhanced Framework for Image Description in Disaster Assessment
- A Data-driven Typology of Vision Models from Integrated Representational Metrics
- FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction
- MORPH: Shape-agnostic PDE Foundation Models
- MS-YOLO: Infrared Object Detection for Edge Deployment via MobileNetV4 and SlideLoss
- Motion-Aware Transformer for Multi-Object Tracking
- DeLiVR: Differential Spatiotemporal Lie Bias for Efficient Video Deraining
- InfiMed: Low-Resource Medical MLLMs with Advancing Understanding and Reasoning
- RARE: Retrieval-Aware Robustness Evaluation for Retrieval-Augmented Generation Systems
- Probing Neural Topology of Large Language Models
- Resisting Contextual Interference in RAG via Parametric-Knowledge Reinforcement
- Personalized LLM Decoding via Contrasting Personal Preference
- MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models
- KaLM-Embedding-V2: Superior Training Techniques and Data Inspire A Versatile Embedding Model
- VAT-KG: Knowledge-Intensive Multimodal Knowledge Graph Dataset for Retrieval-Augmented Generation
- WildSpeech-Bench: Benchmarking End-to-End SpeechLLMs in the Wild
- Why Reinforcement Fine-Tuning Enables MLLMs Preserve Prior Knowledge Better: A Data Perspective
- Unveiling the Potential of Diffusion Large Language Model in Controllable Generation
- Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions
- Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning
- What Factors Affect LLMs and RLLMs in Financial Question Answering?
- KV Cache Steering for Controlling Frozen LLMs
- The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
- LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues
- Persona-Augmented Benchmarking: Evaluating LLMs Across Diverse Writing Styles
- DAMR: Efficient and Adaptive Context-Aware Knowledge Graph Question Answering with LLM-Guided MCTS
- MLP Memory: A Retriever-Pretrained Memory for Large Language Models
- Conflict-Aware Soft Prompting for Retrieval-Augmented Generation
- Influence-driven Curriculum Learning for Pre-training on Limited Data
- Dream to Chat: Model-based Reinforcement Learning on Dialogues with User Belief Modeling
- JudgeAgent: Knowledge-wise and Dynamic LLM Evaluation with Agent-as-Interviewer
- CMRAG: Co-modality-based visual document retrieval and question answering
- Chain or tree? Re-evaluating complex reasoning from the perspective of a matrix of thought
- Towards an AI Musician: Synthesizing Sheet Music Problems for Musical Reasoning
- Positional Encoding via Token-Aware Phase Attention
- DivLogicEval: A Framework for Benchmarking Logical Reasoning Evaluation in Large Language Models
- Distribution-Aligned Decoding for Efficient LLM Task Adaptation
- RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation
- HiCoLoRA: Addressing Context-Prompt Misalignment via Hierarchical Collaborative LoRA for Zero-Shot DST
- A Critical Look At Tokenwise Reward-Guided Text Generation
- Large Language Models versus Classical Machine Learning: Performance in COVID-19 Mortality Prediction Using High-Dimensional Tabular Data
- On the Within-class Variation Issue in Alzheimer's Disease Detection
- Development and Validation of a Large Language Model for Generating Fully-Structured Radiology Reports
- Training-Free Bayesianization for Low-Rank Adapters of Large Language Models
- Detecting and Interpreting NSFW Prompts in Text-to-Image Models through Uncovering Harmful Semantics
- Process Reinforcement through Implicit Rewards
- Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models
- Recursive Training Loops in LLMs: How training data properties modulate distribution shift in generated data?
- TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning
- Feature Hedging: Correlated Features Break Narrow Sparse Autoencoders
- The Polar Express: Optimal Matrix Sign Methods and Their Application to the Muon Algorithm
- Domain-Aware Tensor Network Structure Search
- Security Degradation in Iterative AI Code Generation -- A Systematic Analysis of the Paradox
- Latent Concept Disentanglement in Transformer-based Language Models
- MultiVox: A Benchmark for Evaluating Voice Assistants for Multimodal Interactions
- From Roots to Rewards: Dynamic Tree Reasoning with Reinforcement Learning
- The Invisible Leash: Why RLVR May or May Not Escape Its Origin
- Library Hallucinations in LLMs: Risk Analysis Grounded in Developer Queries
- InfiMed-Foundation: Pioneering Advanced Multimodal Medical Models with Compute-Efficient Pre-Training and Multi-Stage Fine-Tuning
- PRIME: Planning and Retrieval-Integrated Memory for Enhanced Reasoning
- Can Synthetic Query Rewrites Capture User Intent Better than Humans in Retrieval-Augmented Generation?
- Bridging Kolmogorov Complexity and Deep Learning: Asymptotically Optimal Description Length Objectives for Transformers
- MDAR: A Multi-scene Dynamic Audio Reasoning Benchmark
- IIET: Efficient Numerical Transformer via Implicit Iterative Euler Method
- Mental Health Impacts of AI Companions: Triangulating Social Media Quasi-Experiments, User Perspectives, and Relational Theory
- Does AI Coaching Prepare us for Workplace Negotiations?
- Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time
- EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
- IA2: Alignment with ICL Activations Improves Supervised Fine-Tuning
- Towards Efficient Online Exploration for Reinforcement Learning with Human Feedback
- Constituency Parsing using LLMs
- TEXT2AFFORD: Probing Object Affordance Prediction abilities of Language Models solely from Text
- Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs
- LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking
- Position IDs Matter: An Enhanced Position Layout for Efficient Context Compression in Large Language Models
- Stuffed Mamba: Oversized States Lead to the Inability to Forget
- Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning
- Vulnerability of LLMs to Vertically Aligned Text Manipulations
- Semantic Component Analysis: Introducing Multi-Topic Distributions to Clustering-Based Topic Modeling
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
- AtomR: Atomic Operator-Empowered Large Language Models for Heterogeneous Knowledge Reasoning
- Can LLMs be Good Graph Judge for Knowledge Graph Construction?
- Demystifying Domain-adaptive Post-training for Financial LLMs
- Demystifying Multilingual Chain-of-Thought in Process Reward Modeling
- Elucidating Mechanisms of Demographic Bias in LLMs for Healthcare
- LoRA-MGPO: Mitigating Double Descent in Low-Rank Adaptation via Momentum-Guided Perturbation Optimization
- RuCCoD: Towards Automated ICD Coding in Russian
- How LLMs Fail to Support Fact-Checking
- Adaptively profiling models with task elicitation
- Generator-Assistant Stepwise Rollback Framework for Large Language Model Agent
- Improving LLM-as-a-Judge Inference with the Judgment Distribution
- InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
- Cost-Optimal Grouped-Query Attention for Long-Context Modeling
- Retrieval-Augmented Generation with Hierarchical Knowledge
- CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives
- SOLAR: Towards Characterizing Subjectivity of Individuals through Modeling Value Conflicts and Trade-offs
- MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety
- Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review
- LLM-OptiRA: LLM-Driven Optimization of Resource Allocation for Non-Convex Problems in Wireless Communications
- Follow the Path: Reasoning over Knowledge Graph Paths to Improve LLM Factuality
- SuperCoder: Assembly Program Superoptimization with Large Language Models
- HiddenBench: Assessing Collective Reasoning in Multi-Agent LLMs via Hidden Profile Tasks
- ZeroTuning: Unlocking the Initial Token's Power to Enhance Large Language Models Without Training
- HBO: Hierarchical Balancing Optimization for Fine-Tuning Large Language Models
- ExpertSteer: Intervening in LLMs through Expert Knowledge
- Shadow-FT: Tuning Instruct Model via Training on Paired Base Model
- Language-Specific Latent Process Hinders Cross-Lingual Performance
- UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Language Models
- UniErase: Towards Balanced and Precise Unlearning in Language Models
- Beyond Early-Token Bias: Model-Specific and Language-Specific Position Effects in Multilingual LLMs
- Attributing Response to Context: A Jensen-Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation
- Beyond Static Testbeds: An Interaction-Centric Agent Simulation Platform for Dynamic Recommender Systems
- Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs
- BP-Seg: A graphical model approach to unsupervised and non-contiguous text segmentation using belief propagation
- From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning
- Prompting is not Enough: Exploring Knowledge Integration and Controllable Generation on Large Language Models
- BiomedSQL: Text-to-SQL for Scientific Reasoning on Biomedical Knowledge Bases
- EmoBench-UA: A Benchmark Dataset for Emotion Detection in Ukrainian
- Table-R1: Inference-Time Scaling for Table Reasoning
- FeatBench: Evaluating Coding Agents on Feature Implementation for Vibe Coding
- FLEXI: Benchmarking Full-duplex Human-LLM Speech Interaction
- Safety Compliance: Rethinking LLM Safety Reasoning through the Lens of Compliance
- Beyond Textual Context: Structural Graph Encoding with Adaptive Space Alignment to alleviate the hallucination of LLMs
- Bridging Fairness and Explainability: Can Input-Based Explanations Promote Fairness in Hate Speech Detection?
- Advancing Natural Language Formalization to First Order Logic with Fine-tuned LLMs
- Transformers Can Learn Connectivity in Some Graphs but Not Others
- The InviTE Corpus: Annotating Invectives in Tudor English Texts for Computational Modeling
- Conversational Implicatures: Modelling Relevance Theory Probabilistically
- CHRONOBERG: Capturing Language Evolution and Temporal Awareness in Foundation Models
- Exploratory Semantic Reliability Analysis of Wind Turbine Maintenance Logs using Large Language Models
- What Is The Political Content in LLMs' Pre- and Post-Training Data?
- Chimera: Diagnosing Shortcut Learning in Visual-Language Understanding
- Detecting (Un)answerability in Large Language Models with Linear Directions
- Evaluating the Limits of Large Language Models in Multilingual Legal Reasoning
- NeLLCom-Lex: A Neural-agent Framework to Study the Interplay between Lexical Systems and Language Use
- Exploring Solution Divergence and Its Effect on Large Language Model Problem Solving
- JGU Mainz's Submission to the WMT25 Shared Task on LLMs with Limited Resources for Slavic Languages: MT and QA
- Representing LLMs in Prompt Semantic Task Space
- We Think, Therefore We Align LLMs to Helpful, Harmless and Honest Before They Go Wrong
- InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models
- Think Socially via Cognitive Reasoning
- Retrieval-Augmented Guardrails for AI-Drafted Patient-Portal Messages: Error Taxonomy Construction and Large-Scale Evaluation
- Fine-Grained Detection of Context-Grounded Hallucinations Using LLMs
- ArabJobs: A Multinational Corpus of Arabic Job Ads
- From Formal Language Theory to Statistical Learning: Finite Observability of Subregular Languages
- Capturing Opinion Shifts in Deliberative Discourse through Frequency-based Quantum deep learning methods
- From tests to effect sizes: Quantifying uncertainty and statistical variability in multilingual and multitask NLP evaluation benchmarks
- StateX: Enhancing RNN Recall via Post-training State Expansion
- Variational Reasoning for Language Models
- Language Models Can Learn from Verbal Feedback Without Scalar Rewards
- Death of the Novel(ty): Beyond n-Gram Novelty as a Metric for Textual Creativity
- WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning
- VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing
- Accelerate Creation of Product Claims Using Generative AI
- HetaRAG: Hybrid Deep Retrieval-Augmented Generation across Heterogeneous Data Stores
- Towards mitigating information leakage when evaluating safety monitors
- Random Direct Preference Optimization for Radiography Report Generation
- ReGeS: Reciprocal Retrieval-Generation Synergy for Conversational Recommender Systems
- LLMs for Bayesian Optimization in Scientific Domains: Are We There Yet?
- ARTI-6: Towards Six-dimensional Articulatory Speech Encoding
- VideoJudge: Bootstrapping Enables Scalable Supervision of MLLM-as-a-Judge for Video Understanding
- Gender Stereotypes in Professional Roles Among Saudis: An Analytical Study of AI-Generated Images Using Language Models
- Are Hallucinations Bad Estimations?
- LLM Agent Meets Agentic AI: Can LLM Agents Simulate Customers to Evaluate Agentic-AI-based Shopping Assistants?
- Uncertainty-Aware Knowledge Tracing Models
- C-QUERI: Congressional Questions, Exchanges, and Responses in Institutions Dataset
- Learning GUI Grounding with Spatial Reasoning from Visual Feedback
- Leveraging Big Data Frameworks for Spam Detection in Amazon Reviews
- AUDDT: Audio Unified Deepfake Detection Benchmark Toolkit
- InvBench: Can LLMs Accelerate Program Verification with Invariant Synthesis?
- UISim: An Interactive Image-Based UI Simulator for Dynamic Mobile Environments
- UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios
- DeHate: A Stable Diffusion-based Multimodal Approach to Mitigate Hate Speech in Images
- Compiling by Proving: Language-Agnostic Automatic Optimization from Formal Semantics
- SBFA: Single Sneaky Bit Flip Attack to Break Large Language Models
- What Makes LLM Agent Simulations Useful for Policy? Insights From an Iterative Design Engagement in Emergency Preparedness
- You Can't Steal Nothing: Mitigating Prompt Leakages in LLMs via System Vectors
- AgentPack: A Dataset of Code Changes, Co-Authored by Agents and Humans
- Evaluating Open-Source Large Language Models for Technical Telecom Question Answering
- RISK: A Framework for GUI Agents in E-commerce Risk Management
- From Bias to Balance: Exploring and Mitigating Spatial Bias in LVLMs
- ERGO: Efficient High-Resolution Visual Understanding for Vision-Language Models
- The Thinking Spectrum: An Empirical Study of Tunable Reasoning in LLMs through Model Merging
- A2R: An Asymmetric Two-Stage Reasoning Framework for Parallel Reasoning
- Speak Your Mind: The Speech Continuation Task as a Probe of Voice-Based Model Bias
- SecureAgentBench: Benchmarking Secure Code Generation under Realistic Vulnerability Scenarios
- MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
- Thinking in Many Modes: How Composite Reasoning Elevates Large Language Model Performance with Limited Data
- In Their Own Words: Reasoning Traces Tailored for Small Models Make Them Better Reasoners
- Influence Guided Context Selection for Effective Retrieval-Augmented Generation
- Context Is What You Need: The Maximum Effective Context Window for Real World Limits of LLMs
- How Large Language Models Need Symbolism
- One Model, Many Morals: Uncovering Cross-Linguistic Misalignments in Computational Moral Reasoning
- LLM-Based Support for Diabetes Diagnosis: Opportunities, Scenarios, and Challenges with GPT-5
- Diagnosing the Performance Trade-off in Moral Alignment: A Case Study on Gender Stereotypes
- A State-of-the-Art SQL Reasoning Model using RLVR
- Learning to Reason with Mixture of Tokens
- Dual-Head Reasoning Distillation: Improving Classifier Accuracy with Train-Time-Only Reasoning
- On Code-Induced Reasoning in LLMs
- Agribot: agriculture-specific question answer system
- Domain-Aware Speaker Diarization On African-Accented English
- Generation-Time vs. Post-hoc Citation: A Holistic Evaluation of LLM Attribution
- Comparative Personalization for Multi-document Summarization
- Vision Language Models Cannot Plan, but Can They Formalize?
- "Be My Cheese?": Assessing Cultural Nuance in Multilingual LLM Translations
- Multi-Objective Reinforcement Learning for Large Language Model Optimization: Visionary Perspective
- OjaKV: Context-Aware Online Low-Rank KV Cache Compression with Oja's Rule
- Towards Transparent AI: A Survey on Explainable Language Models
- ReviewScore: Misinformed Peer Review Detection with Large Language Models
- GRAB: A Risk Taxonomy--Grounded Benchmark for Unsupervised Topic Discovery in Financial Disclosures
- Think-on-Graph 3.0: Efficient and Adaptive LLM Reasoning on Heterogeneous Graphs via Multi-Agent Dual-Evolving Context Retrieval
- ProPerSim: Developing Proactive and Personalized AI Assistants through User-Assistant Simulation
- How Accurate Are LLMs at Multi-Question Answering on Conversational Transcripts?
- Self-Speculative Biased Decoding for Faster Live Translation
- Thinking with Sound: Audio Chain-of-Thought Enables Multimodal Reasoning in Large Audio-Language Models
- SynerGen: Contextualized Generative Recommender for Unified Search and Recommendation
- Navigating the Impact of Structured Output Format on Large Language Models through the Compass of Causal Inference
- Evaluating and Improving Cultural Awareness of Reward Models for LLM Alignment
- Redefining Machine Simultaneous Interpretation: From Incremental Translation to Human-Like Strategies
- Towards Minimal Causal Representations for Human Multimodal Language Understanding
- Can LLMs Solve and Generate Linguistic Olympiad Puzzles?
- ResT: Reshaping Token-Level Policy Gradients for Tool-Use Large Language Models
- Semantic Agreement Enables Efficient Open-Ended LLM Cascades
- Following the TRACE: A Structured Path to Empathetic Response Generation with Multi-Agent Models
- KnowMT-Bench: Benchmarking Knowledge-Intensive Long-Form Question Answering in Multi-Turn Dialogues
- Enhancing Low-Rank Adaptation with Structured Nonlinear Transformations
- LUMINA: Detecting Hallucinations in RAG System with Context-Knowledge Signals
- No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping
- QoNext: Towards Next-generation QoE for Foundation Models
- Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts
- A Large-Scale Dataset and Citation Intent Classification in Turkish with LLMs
- AutoSCORE: Enhancing Automated Scoring with Multi-Agent Large Language Models via Structured Component Recognition
- SimulSense: Sense-Driven Interpreting for Efficient Simultaneous Speech Translation
- Why Chain of Thought Fails in Clinical Text Understanding
- Debiasing Large Language Models in Thai Political Stance Detection via Counterfactual Calibration
- MotivGraph-SoIQ: Integrating Motivational Knowledge Graphs and Socratic Dialogue for Enhanced LLM Ideation
- Black-Box Hallucination Detection via Consistency Under the Uncertain Expression
- GraphSearch: An Agentic Deep Searching Workflow for Graph Retrieval-Augmented Generation
- From Outliers to Topics in Language Models: Anticipating Trends in News Corpora
- Taxonomy of Comprehensive Safety for Clinical Agents
- Fuzzy Reasoning Chain (FRC): An Innovative Reasoning Framework from Fuzziness to Clarity
- RedNote-Vibe: A Dataset for Capturing Temporal Dynamics of AI-Generated Text in Social Media
- The QCET Taxonomy of Standard Quality Criterion Names and Definitions for the Evaluation of NLP Systems
- Fine-tuning Done Right in Model Editing
- COSPADI: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning
- Multilingual Dialogue Generation and Localization with Dialogue Act Scripting
- S2J: Bridging the Gap Between Solving and Judging Ability in Generative Reward Models
- Think Right, Not More: Test-Time Scaling for Numerical Claim Verification
- Universal Legal Article Prediction via Tight Collaboration between Supervised Classification Model and LLM
- Multilingual Vision-Language Models, A Survey
- FoodSEM: Large Language Model Specialized in Food Named-Entity Linking
- R-Capsule: Compressing High-Level Plans for Efficient Large Language Model Reasoning
- Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
- NFDI4DS Shared Tasks for Scholarly Document Processing
- From Long to Lean: Performance-aware and Adaptive Chain-of-Thought Compression via Multi-round Refinement
- Mixture of Detectors: A Compact View of Machine-Generated Text Detection
- Context Parametrization with Compositional Adapters
- When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance
- The Outputs of Large Language Models are Meaningless
- Question-Driven Analysis and Synthesis: Building Interpretable Thematic Trees with LLMs for Text Clustering and Controllable Generation
- StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
- A Novel Differential Feature Learning for Effective Hallucination Detection and Classification
- Intuition emerges in Maximum Caliber models at criticality
- Mitigating Exponential Mixed Frequency Growth through Frequency Selection
- Partially Functional Dynamic Backdoor Diffusion-based Causal Model
- EigenBench: A Comparative Behavioral Measure of Value Alignment
- Decentralized Stochastic Nonconvex Optimization under the Relaxed Smoothness
- Scaling to Multimodal and Multichannel Heart Sound Classification: Fine-Tuning Wav2Vec 2.0 with Synthetic and Augmented Biosignals
- DivLogicEval: A Framework for Benchmarking Logical Reasoning Evaluation in Large Language Models
- Audio Super-Resolution with Latent Bridge Models
- Residual Off-Policy RL for Finetuning Behavior Cloning Policies
- Cognitive Load Limits in Large Language Models: Benchmarking Multi-Hop Reasoning
- Benchmarking LLMs in Web API Integration Tasks
- Thinking Augmented Pre-training
- IntSR: An Integrated Generative Framework for Search and Recommendation
- Data-driven Neural Networks for Windkessel Parameter Calibration
- Pre-Training Representations of Binary Code Using Contrastive Learning
- Data-driven Piecewise Affine Decision Rules for Stochastic Programming with Covariate Information
- QECO: A QoE-Oriented Computation Offloading Algorithm based on Deep Reinforcement Learning for Mobile Edge Computing
- Diffence: Fencing Membership Privacy With Diffusion Models
- DoDo-Code: an Efficient Levenshtein Distance Embedding-based Code for 4-ary IDS Channel
- Online Resource Allocation with Average Budget Constraints
- Discretization Error of Fourier Neural Operators
- On the Within-class Variation Issue in Alzheimer's Disease Detection
- Leveraging Model Guidance to Extract Training Data from Personalized Diffusion Models
- A Survey on LLM-based Code Generation for Low-Resource and Domain-Specific Programming Languages
- GraphSCENE: On-Demand Critical Scenario Generation for Autonomous Vehicles in Simulation
- Conditional Latent Space Molecular Scaffold Optimization for Accelerated Molecular Design
- Effectively Leveraging Momentum Terms in Stochastic Line Search Frameworks for Fast Optimization of Finite-Sum Problems
- Training-Free Bayesianization for Low-Rank Adapters of Large Language Models
- IP$^{2}$-RSNN: Bi-level Intrinsic Plasticity Enables Learning-to-learn in Recurrent Spiking Neural Networks
- Forecasting the future development in quality and value of professional football players
- VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation
- Surgical Vision World Model
- Learning Personalized Driving Styles via Reinforcement Learning from Human Feedback
- Ethical AI for Young Digital Citizens: A Call to Action on Privacy Governance
- Detecting Scarce and Sparse Anomalous: Solving Dual Imbalance in Multi-Instance Learning
- Do Data Valuations Make Good Data Prices?
- Multi-Agent Reinforcement Learning for Greenhouse Gas Offset Credit Markets
- Can Code Language Models Learn Clarification-Seeking Behaviors?
- CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis
- From Grunts to Lexicons: Emergent Language from Cooperative Foraging
- Octic Vision Transformers: Quicker ViTs Through Equivariance
- Distillation-Enabled Knowledge Alignment Protocol for Semantic Communication in AI Agent Networks
- Transfer learning for multifidelity simulation-based inference in cosmology
- Mobi-$\pi$: Mobilizing Your Robot Learning Policy
- Two failure modes of deep transformers and how to avoid them: a unified theory of signal propagation at initialisation
- Scalable In-Context Q-Learning
- Dual Branch VideoMamba with Gated Class Token Fusion for Violence Detection
- SNR and Resource Adaptive Deep JSCC for Distributed IoT Image Classification
- Security Degradation in Iterative AI Code Generation -- A Systematic Analysis of the Paradox
- HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data
- MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models
- Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning
- A Unified Empirical Risk Minimization Framework for Flexible N-Tuples Weak Supervision
- Tricks and Plug-ins for Gradient Boosting in Image Classification
- Hallucination to Truth: A Review of Fact-Checking and Factuality Evaluation in Large Language Models
- Beyond the Proxy: Trajectory-Distilled Guidance for Offline GFlowNet Training
- Practical estimation of the optimal classification error with soft labels and calibration
- Spectral-inspired Operator Learning with Limited Data and Unknown Physics
- SDPO: Importance-Sampled Direct Preference Optimization for Stable Diffusion Training
- Model-Preserving Adaptive Rounding
- Domain-Aware Tensor Network Structure Search
- Mamba Integrated with Physics Principles Masters Long-term Chaotic System Forecasting
- Intercept Cancer: Cancer Pre-Screening with Large Scale Healthcare Foundation Models
- RsGCN: Subgraph-Based Rescaling Enhances Generalization of GCNs for Solving Traveling Salesman Problems
- WeightLoRA: Keep Only Necessary Adapters
- Sign-SGD is the Golden Gate between Multi-Node to Single-Node Learning: Significant Boost via Parameter-Free Optimization
- OrthoGrad Improves Neural Calibration
- Spectral Graph Neural Networks are Incomplete on Graphs with a Simple Spectrum
- AMPED: Adaptive Multi-objective Projection for balancing Exploration and skill Diversification
- Caterpillar GNN: Replacing Message Passing with Efficient Aggregation
- Aircraft Trajectory Dataset Augmentation in Latent Space
- Exploiting Block Coordinate Descent for Cost-Effective LLM Model Training
- RL-Obfuscation: Can Language Models Learn to Evade Latent-Space Monitors?
- SecP-Tuning: Efficient Privacy-Preserving Prompt Tuning for Large Language Models via MPC
- Latent Concept Disentanglement in Transformer-based Language Models
- Online Multi-Agent Control with Adversarial Disturbances
- On the Necessity of Output Distribution Reweighting for Effective Class Unlearning
- Beyond Simple Graphs: Neural Multi-Objective Routing on Multigraphs
- Whom to Trust? Adaptive Collaboration in Personalized Federated Learning
- Neural-Network solver of ideal MHD equilibria
- Lightweight MSA Design Advances Protein Folding From Evolutionary Embeddings
- Relative Entropy Pathwise Policy Optimization
- The Invisible Leash: Why RLVR May or May Not Escape Its Origin
- SpectrumWorld: Artificial Intelligence Foundation for Spectroscopy
- Tricks and Plug-ins for Gradient Boosting with Transformers
- Graph is a Natural Regularization: Revisiting Vector Quantization for Graph Representation Learning
- ERIS: An Energy-Guided Feature Disentanglement Framework for Out-of-Distribution Time Series Classification
- Multi-Channel Differential Transformer for Cross-Domain Sleep Stage Classification with Heterogeneous EEG and EOG
- Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders
- In-Context Algorithm Emulation in Fixed-Weight Transformers
- Scalable Option Learning in High-Throughput Environments
- Challenges in Non-Polymeric Crystal Structure Prediction: Why a Geometric, Permutation-Invariant Loss is Needed
- AI for Scientific Discovery is a Social Problem
- Towards a Physics Foundation Model
- A Variational Framework for Residual-Based Adaptivity in Neural PDE Solvers and Operator Learning
- GPU Temperature Simulation-Based Testing for In-Vehicle Deep Learning Frameworks
- TimeMosaic: Temporal Heterogeneity Guided Time Series Forecasting via Adaptive Granularity Patch and Segment-wise Decoding
- The Limit Points of (Optimistic) Gradient Descent in Min-Max Optimization
- Multi-View Hypercomplex Learning for Breast Cancer Screening
- Efficient Epistemic Uncertainty Estimation in Regression Ensemble Models Using Pairwise-Distance Estimators
- VDFD: Multi-Agent Value Decomposition Framework with Disentangled World Model
- Fast Partition-Based Cross-Validation With Centering and Scaling for $\mathbf{X}^\mathbf{T}\mathbf{X}$ and $\mathbf{X}^\mathbf{T}\mathbf{Y}$
- Metric-Guided Conformal Bounds for Probabilistic Image Reconstruction
- A Notion of Uniqueness for the Adversarial Bayes Classifier
- Machine Learning-Assisted Sustainable Remanufacturing, Reusing and Recycling for Lithium-ion Batteries
- Diverse Subset Selection via Norm-Based Sampling and Orthogonality
- A Critical Look At Tokenwise Reward-Guided Text Generation
- VeriFlow: Modeling Distributions for Neural Network Verification
- Large Language Models versus Classical Machine Learning: Performance in COVID-19 Mortality Prediction Using High-Dimensional Tabular Data
- Closed-Form Interpretation of Neural Network Latent Spaces with Symbolic Gradients
- DOTA: Distributional Test-Time Adaptation of Vision-Language Models
- Degree-Conscious Spiking Graph for Cross-Domain Adaptation
- Unstable Unlearning: The Hidden Risk of Concept Resurgence in Diffusion Models
- Measurability in the Fundamental Theorem of Statistical Learning
- Capacity-Aware Planning and Scheduling in Budget-Constrained Multi-Agent MDPs: A Meta-RL Approach
- Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
- Machine Unlearning for Speaker-Agnostic Detection of Gender-Based Violence Condition in Speech
- How Strategic Agents Respond: Comparing Analytical Models with LLM-Generated Responses in Strategic Classification
- Avoiding $\mathbf{exp(R_{max})}$ scaling in RLHF through Preference-based Exploration
- Efficient Prior Selection in Gaussian Process Bandits with Thompson Sampling
- Process Reinforcement through Implicit Rewards
- GNN-DT: Graph Neural Network Enhanced Decision Transformer for Efficient Optimization in Dynamic Environments
- ReciNet: Reciprocal Space-Aware Long-Range Modeling for Crystalline Property Prediction
- Mechanisms of Projective Composition of Diffusion Models
- LDC-MTL: Balancing Multi-Task Learning through Scalable Loss Discrepancy Control
- Beyond Shallow Behavior: Task-Efficient Value-Based Multi-Task Offline MARL via Skill Discovery
- Fused Partial Gromov-Wasserstein for Structured Objects
- Tokenizing Single-Channel EEG with Time-Frequency Motif Learning
- Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models
- Multi-View Causal Discovery without Non-Gaussianity: Identifiability and Algorithms
- BPINN-EM-Post: Bayesian Physics-Informed Neural Network based Stochastic Electromigration Damage Analysis in the Post-void Phase
- MNT-TNN: Spatiotemporal Traffic Data Imputation via Compact Multimode Nonlinear Transform-based Tensor Nuclear Norm
- Can Diffusion Models Disentangle? A Theoretical Perspective
- CSF: Fixed-outline Floorplanning Based on the Conjugate Subgradient Algorithm Assisted by Q-Learning
- Recursive Training Loops in LLMs: How training data properties modulate distribution shift in generated data?
- Sparsity Forcing: Reinforcing Token Sparsity of MLLMs
- Geometry aware inference of steady state PDEs using Equivariant Neural Fields representations
- Trial and Trust: Addressing Byzantine Attacks with Comprehensive Defense Strategy
- Exploiting the Asymmetric Uncertainty Structure of Pre-trained VLMs on the Unit Hypersphere
- Efficient Orthogonal Fine-Tuning with Principal Subspace Adaptation
- Quantization Meets Reasoning: Exploring and Mitigating Degradation of Low-Bit LLMs in Mathematical Reasoning
- TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning
- Feature Hedging: Correlated Features Break Narrow Sparse Autoencoders
- Latent Veracity Inference for Identifying Errors in Stepwise Reasoning
- Structured Relational Representations
- Implicit bias produces neural scaling laws in learning curves, from perceptrons to deep networks
- Forward-only Diffusion Probabilistic Models
- Learning Flexible Forward Trajectories for Masked Molecular Diffusion
- The Polar Express: Optimal Matrix Sign Methods and Their Application to the Muon Algorithm
- SPAR: Self-supervised Placement-Aware Representation Learning for Distributed Sensing
- Bottlenecked Transformers: Periodic KV Cache Consolidation for Generalised Reasoning
- Navigate the Unknown: Enhancing LLM Reasoning with Intrinsic Motivation Guided Exploration
- FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of Large Language Models
- Can LLMs Alleviate Catastrophic Forgetting in Graph Continual Learning? A Systematic Study
- HD-PiSSA: High-Rank Distributed Orthogonal Adaptation
- ExPLAIND: Unifying Model, Data, and Training Attribution to Study Model Behavior
- Scale-Wise VAR is Secretly Discrete Diffusion
- See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation
- Adaptive Policy Learning to Additional Tasks
- A Random Matrix Perspective of Echo State Networks: From Precise Bias--Variance Characterization to Optimal Regularization
- Exploring the Early Universe with Deep Learning
- Comparative Analysis of GAN and Diffusion for MRI-to-CT translation
- Direct Bias-Correction Term Estimation for Propensity Scores and Average Treatment Effect Estimation
- Incorporating priors in learning: a random matrix study under a teacher-student framework
- Multi-Agent Path Finding via Offline RL and LLM Collaboration
- DragGANSpace: Latent Space Exploration and Control for GANs
- COMPASS: Robust Feature Conformal Prediction for Medical Segmentation Metrics
- Clinical Uncertainty Impacts Machine Learning Evaluations
- Structured Sparse Transition Matrices to Enable State Tracking in State-Space Models
- HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion Models
- NIFTY: a Non-Local Image Flow Matching for Texture Synthesis
- Preventing Model Collapse Under Overparametrization: Optimal Mixing Ratios for Interpolation Learning and Ridge Regression
- Multi-channel convolutional neural quantum embedding
- Multidimensional Uncertainty Quantification via Optimal Transport
- Integrating Background Knowledge in Medical Semantic Segmentation with Logic Tensor Networks
- NeuroScalar: A Deep Learning Framework for Fast, Accurate, and In-the-Wild Cycle-Level Performance Prediction
- Learning to Ball: Composing Policies for Long-Horizon Basketball Moves
- Universal Inverse Distillation for Matching Models with Real-Data Supervision (No GANs)
- CausalKANs: interpretable treatment effect estimation with Kolmogorov-Arnold networks
- Estimating the Empowerment of Language Model Agents
- TrueGradeAI: Retrieval-Augmented and Bias-Resistant AI for Transparent and Explainable Digital Assessments
- REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model
- Smoothing-Based Conformal Prediction for Balancing Efficiency and Interpretability
- Debiased Front-Door Learners for Heterogeneous Effects
- Metrics for Parametric Families of Networks
- ConQuER: Modular Architectures for Control and Bias Mitigation in IQP Quantum Generative Models
- Linear Causal Representation Learning by Topological Ordering, Pruning, and Disentanglement
- Nearly Tight Regret Bounds for Profit Maximization in Bilateral Trade
- Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time
- Effective Policy Learning for Multi-Agent Online Coordination Beyond Submodular Objectives
- Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective
- SPARK: Synergistic Policy And Reward Co-Evolving Framework
- Towards Efficient Online Exploration for Reinforcement Learning with Human Feedback
- Training-Free Synthetic Data Generation with Dual IP-Adapter Guidance
- New Algorithmic Directions in Optimal Transport and Applications for Product Spaces
- AutoClimDS: Climate Data Science Agentic AI -- A Knowledge Graph is All You Need
- No Alignment Needed for Generation: Learning Linearly Separable Representations in Diffusion Models
- EEG-Based Consumer Behaviour Prediction: An Exploration from Classical Machine Learning to Graph Neural Networks
- General Pruning Criteria for Fast SBL
- IndiSeek learns information-guided disentangled representations
- What Happens Next? Anticipating Future Motion by Generating Point Trajectories
- Automated and Interpretable Survival Analysis from Multimodal Data
- VLCE: A Knowledge-Enhanced Framework for Image Description in Disaster Assessment
- Effective continuous equations for adaptive SGD: a stochastic analysis view
- Guiding Audio Editing with Audio Language Model
- InvBench: Can LLMs Accelerate Program Verification with Invariant Synthesis?
- MobiLLM: An Agentic AI Framework for Closed-Loop Threat Mitigation in 6G Open RANs
- Automated Machine Learning Pipeline for Training and Analysis Using Large Language Models
- A regret minimization approach to fixed-point iterations
- Automating Sensor Characterization with Bayesian Optimization
- Generating Stable Placements via Physics-guided Diffusion Models
- MORPH: Shape-agnostic PDE Foundation Models
- HuLA: Prosody-Aware Anti-Spoofing with Multi-Task Learning for Expressive and Emotional Synthetic Speech
- SADA: Safe and Adaptive Inference with Multiple Black-Box Predictions
- Multi-modal Bayesian Neural Network Surrogates with Conjugate Last-Layer Estimation
- Align2Speak: Improving TTS for Low Resource Languages via ASR-Guided Online Preference Optimization
- UISim: An Interactive Image-Based UI Simulator for Dynamic Mobile Environments
- Noise-to-Notes: Diffusion-based Generation and Refinement for Automatic Drum Transcription
- Self-Speculative Biased Decoding for Faster Live Translation
- Retrieval-of-Thought: Efficient Reasoning via Reusing Thoughts
- Reinforcement Learning Based Traffic Signal Design to Minimize Queue Lengths
- CubistMerge: Spatial-Preserving Token Merging For Diverse ViT Backbones
- Lifelong Learning with Behavior Consolidation for Vehicle Routing
- Navigating the Impact of Structured Output Format on Large Language Models through the Compass of Causal Inference
- SBFA: Single Sneaky Bit Flip Attack to Break Large Language Models
- Causal-EPIG: A Prediction-Oriented Active Learning Framework for CATE Estimation
- No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping
- Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts
- Error Analysis of Discrete Flow with Generator Matching
- Sequential 1-bit Mean Estimation with Near-Optimal Sample Complexity
- Outlier Detection in Plantar Pressure: Human-Centered Comparison of Statistical Parametric Mapping and Explainable Machine Learning
- Learnable Conformal Prediction with Context-Aware Nonconformity Functions for Robotic Planning and Perception
- FlowDrive: moderated flow matching with data balancing for trajectory planning
- ERGO: Efficient High-Resolution Visual Understanding for Vision-Language Models
- Bilinear relational structure fixes reversal curse and enables consistent model editing
- A Nonparametric Discrete Hawkes Model with a Collapsed Gaussian-Process Prior
- GSM-Agent: Understanding Agentic Reasoning Using Controllable Environments
- From Parameters to Behavior: Unsupervised Compression of the Policy Space
- Machine learning approaches to seismic event classification in the Ostrava region
- EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
- The Lie of the Average: How Class Incremental Learning Evaluation Deceives You?
- Transport Based Mean Flows for Generative Modeling
- Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
- Quantile Advantage Estimation for Entropy-Safe Reasoning
- IA2: Alignment with ICL Activations Improves Supervised Fine-Tuning
- A Theoretical Analysis of Discrete Flow Matching Generative Models
- Learning Admissible Heuristics for A*: Theory and Practice
- Assessment of deep learning models integrated with weather and environmental variables for wildfire spread prediction and a case study of the 2023 Maui fires
- Interpretable Spectral Features Predict Conductivity in Self-Driving Doped Conjugated Polymer Labs
- Seismic Velocity Inversion from Multi-Source Shot Gathers Using Deep Segmentation Networks: Benchmarking U-Net Variants and SeismoLabV3+
- Cycle is All You Need: More Is Different
- From Embeddings to Equations: Genetic-Programming Surrogates for Interpretable Transformer Classification
- SGNNBench: A Holistic Evaluation of Spiking Graph Neural Network on Large-scale Graph
- Towards mitigating information leakage when evaluating safety monitors
- Spiking Neural Networks for Mental Workload Classification with a Multimodal Approach
- Accurate typhoon intensity forecasts using a non-iterative spatiotemporal transformer model
- Improving Autism Detection with Multimodal Behavioral Analysis
- Data-driven approach to the design of complexing agents for trivalent transuranium elements
- ReGeS: Reciprocal Retrieval-Generation Synergy for Conversational Recommender Systems
- Coreset selection based on Intra-class diversity
- The LongiMam model for improved breast cancer risk prediction using longitudinal mammograms
- Debugging Concept Bottleneck Models through Removal and Retraining
- Do Sparse Subnetworks Exhibit Cognitively Aligned Attention? Effects of Pruning on Saliency Map Fidelity, Sparsity, and Concept Coherence
- mmHSense: Multi-Modal and Distributed mmWave ISAC Datasets for Human Sensing
- Downscaling climate projections to 1 km with single-image super resolution
- Near-Optimal Experiment Design in Linear non-Gaussian Cyclic Models
- DyME: Dynamic Multi-Concept Erasure in Diffusion Models with Bi-Level Orthogonal LoRA Adaptation
- Foundation models for high-energy physics
- A State-of-the-Art SQL Reasoning Model using RLVR
- Enhanced Generative Machine Listener
- Learning to Reason with Mixture of Tokens
- Context-Aware Hybrid Routing in Bluetooth Mesh Networks Using Multi-Model Machine Learning and AODV Fallback
- Functional Encryption in Secure Neural Network Training: Data Leakage and Practical Mitigations
- ASSESS: A Semantic and Structural Evaluation Framework for Statement Similarity
- Wavelet-Induced Rotary Encodings: RoPE Meets Graphs
- Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning
- Towards a more realistic evaluation of machine learning models for bearing fault diagnosis
- Fine-Grained Uncertainty Decomposition in Large Language Models: A Spectral Approach
- Unlocking the Power of Mixture-of-Experts for Task-Aware Time Series Analytics
- Conditional Denoising Diffusion Autoencoders for Wireless Semantic Communications
- A Multi-Level Framework for Multi-Objective Hypergraph Partitioning: Combining Minimum Spanning Tree and Proximal Gradient
- Aurora: Towards Universal Generative Multimodal Time Series Forecasting
- HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space
- SoDaDE: Solvent Data-Driven Embeddings with Small Transformer Models
- Adaptive Policy Backbone via Shared Network
- Progressive Weight Loading: Accelerating Initial Inference and Gradually Boosting Performance on Resource-Constrained Environments
- Distributed Associative Memory via Online Convex Optimization
- Spectral Collapse Drives Loss of Plasticity in Deep Continual Learning
- SurvDiff: A Diffusion Model for Generating Synthetic Data in Survival Analysis
- Context and Diversity Matter: The Emergence of In-Context Learning in World Models
- Stochastic activations
- Neural Feature Geometry Evolves as Discrete Ricci Flow
- Investigating Faithfulness in Large Audio Language Models
- Role-Aware Multi-modal federated learning system for detecting phishing webpages
- Enhancing Credit Risk Prediction: A Meta-Learning Framework Integrating Baseline Models, LASSO, and ECOC for Superior Accuracy
- (Sometimes) Less is More: Mitigating the Complexity of Rule-based Representation for Interpretable Classification
- SpinGPT: A Large-Language-Model Approach to Playing Poker Correctly
- Improving accuracy in short mortality rate series: Exploring Multi-step Forecasting Approaches in Hybrid Systems
- ReLAM: Learning Anticipation Model for Rewarding Visual Robotic Manipulation
- MoveFM-R: Advancing Mobility Foundation Models via Language-driven Semantic Reasoning
- Fast-Forward Lattice Boltzmann: Learning Kinetic Behaviour with Physics-Informed Neural Operators
- One Prompt Fits All: Universal Graph Adaptation for Pretrained Models
- Partial Parameter Updates for Efficient Distributed Training
- Learning from Delayed Feedback in Games via Extra Prediction
- The Flood Complex: Large-Scale Persistent Homology on Millions of Points
- Global Convergence in Neural ODEs: Impact of Activation Functions
- Bridging Kolmogorov Complexity and Deep Learning: Asymptotically Optimal Description Length Objectives for Transformers
- Overclocking Electrostatic Generative Models
- Physics-informed GNN for medium-high voltage AC power flow with edge-aware attention and line search correction operator
- Nonlinear Optimization with GPU-Accelerated Neural Network Constraints
- IIET: Efficient Numerical Transformer via Implicit Iterative Euler Method
- Learning the Neighborhood: Contrast-Free Multimodal Self-Supervised Molecular Graph Pretraining
- Bayesian Transfer Operators in Reproducing Kernel Hilbert Spaces
- OFMU: Optimization-Driven Framework for Machine Unlearning
- A Machine Learning Pipeline for Multiple Sclerosis Biomarker Discovery: Comparing explainable AI and Traditional Statistical Approaches
- Dual Optimistic Ascent (PI Control) is the Augmented Lagrangian Method in Disguise
- Adaptive Dual-Mode Distillation with Incentive Schemes for Scalable, Heterogeneous Federated Learning on Non-IID Data
- JointDiff: Bridging Continuous and Discrete in Multi-Agent Trajectory Generation
- ECHO: Toward Contextual Seq2Seq Paradigms in Large EEG Models
- Learning to Price Bundles: A GCN Approach for Mixed Bundling
- Activation Function Design Sustains Plasticity in Continual Learning
- Preference-Guided Learning for Sparse-Reward Multi-Agent Reinforcement Learning
- On the Complexity Theory of Masked Discrete Diffusion: From $\mathrm{poly}(1/\epsilon)$ to Nearly $\epsilon$-Free
- Beyond Johnson-Lindenstrauss: Uniform Bounds for Sketched Bilinear Forms
- Graph of Agents: Principled Long Context Modeling by Emergent Multi-Agent Collaboration
- MolSpectLLM: A Molecular Foundation Model Bridging Spectroscopy, Molecule Elucidation, and 3D Structure Generation
- Beyond RAG vs. Long-Context: Learning Distraction-Aware Retrieval for Efficient Knowledge Grounding
- Abductive Logical Rule Induction by Bridging Inductive Logic Programming and Multimodal Large Language Models
- Zubov-Net: Adaptive Stability for Neural ODEs Reconciling Accuracy with Robustness
- Position: The Hidden Costs and Measurement Gaps of Reinforcement Learning with Verifiable Rewards
- Why High-rank Neural Networks Generalize?: An Algebraic Framework with RKHSs
- Closing the Oracle Gap: Increment Vector Transformation for Class Incremental Learning
- Discrete Guidance Matching: Exact Guidance for Discrete Flow Matching
- Multiplicative-Additive Constrained Models: Toward Joint Visualization of Interactive and Independent Effects
- Generation Properties of Stochastic Interpolation under Finite Training Set
- Extracting Actionable Insights from Building Energy Data using Vision LLMs on Wavelet and 3D Recurrence Representations
- Statistical Advantage of Softmax Attention: Insights from Single-Location Regression
- Structural Information-based Hierarchical Diffusion for Offline Reinforcement Learning
- Active Attacks: Red-teaming LLMs via Adaptive Environments
- Think Smart, Not Hard: Difficulty Adaptive Reasoning for Large Audio Language Models
- GRAM-TDI: adaptive multimodal representation learning for drug target interaction prediction
- Stage-wise Dynamics of Classifier-Free Guidance in Diffusion Models
- Goal-Guided Efficient Exploration via Large Language Model in Reinforcement Learning
- Concept-SAE: Active Causal Probing of Visual Model Behavior
- AEGIS: Authentic Edge Growth In Sparsity for Link Prediction in Edge-Sparse Bipartite Knowledge Graphs
- Task-Adaptive Parameter-Efficient Fine-Tuning for Weather Foundation Models
- Teaching Transformers to Solve Combinatorial Problems through Efficient Trial & Error
- MCGM: Multi-stage Clustered Global Modeling for Long-range Interactions in Molecules
- OrtSAE: Orthogonal Sparse Autoencoders Uncover Atomic Features
- Latent Diffusion: Multi-Dimension Stable Diffusion Latent Space Explorer
- Convexity-Driven Projection for Point Cloud Dimensionality Reduction
- MO-GRPO: Mitigating Reward Hacking of Group Relative Policy Optimization on Multi-Objective Problems
- BrainPro: Towards Large-scale Brain State-aware EEG Representation Learning
- Enriching Knowledge Distillation with Intra-Class Contrastive Learning
- Towards Understanding Feature Learning in Parameter Transfer
- The Rogue Scalpel: Activation Steering Compromises LLM Safety
- Non-Linear Trajectory Modeling for Multi-Step Gradient Inversion Attacks in Federated Learning
- SHAKE-GNN: Scalable Hierarchical Kirchhoff-Forest Graph Neural Network
- Reinforcement Learning for Durable Algorithmic Recourse
- Modeling Psychological Profiles in Volleyball via Mixed-Type Bayesian Networks
- Countering adversarial evasion in regression analysis
- Learning More with Less: A Dynamic Dual-Level Down-Sampling Framework for Efficient Policy Optimization
- Mind the Missing: Variable-Aware Representation Learning for Irregular EHR Time Series using Large Language Models
- Slicing Wasserstein Over Wasserstein Via Functional Optimal Transport
- Pushing Toward the Simplex Vertices: A Simple Remedy for Code Collapse in Smoothed Vector Quantization
- Lightweight error mitigation strategies for post-training N:M activation sparsity in LLMs
- Efficiency Boost in Decentralized Optimization: Reimagining Neighborhood Aggregation with Minimal Overhead
- Learning Equivariant Functions via Quadratic Forms
- Mechanistic Independence: A Principle for Identifiable Disentangled Representations
- Kernel Regression of Multi-Way Data via Tensor Trains with Hadamard Overparametrization: The Dynamic Graph Flow Case
- Reversible GNS for Dissipative Fluids with Consistent Bidirectional Dynamics
- A Law of Data Reconstruction for Random Features (and Beyond)
- Automatic Discovery of One Parameter Subgroups of $SO(n)$
- Fairness-Aware Reinforcement Learning (FAReL): A Framework for Transparent and Balanced Sequential Decision-Making
- Limitations on Safe, Trusted, Artificial General Intelligence
- DriftLite: Lightweight Drift Control for Inference-Time Scaling of Diffusion Models
- Differentiable Structure Learning for General Binary Data
- RED-DiffEq: Regularization by denoising diffusion models for solving inverse PDE problems with application to full waveform inversion
- A Systematic Review of Conformal Inference Procedures for Treatment Effect Estimation: Methods and Challenges
- MMPlanner: Zero-Shot Multimodal Procedural Planning with Chain-of-Thought Object State Reasoning
- Logic of Hypotheses: from Zero to Full Knowledge in Neurosymbolic Integration
- DIM: Enforcing Domain-Informed Monotonicity in Deep Neural Networks
- Neuroprobe: Evaluating Intracranial Brain Responses to Naturalistic Stimuli
- SlotFM: A Motion Foundation Model with Slot Attention for Diverse Downstream Tasks
- Scalable Second-order Riemannian Optimization for $K$-means Clustering
- Prophecy: Inferring Formal Properties from Neuron Activations
- SpecMER: Fast Protein Generation with K-mer Guided Speculative Decoding
- Wav2Arrest 2.0: Long-Horizon Cardiac Arrest Prediction with Time-to-Event Modeling, Identity-Invariance, and Pseudo-Lab Alignment
- Exact Subgraph Isomorphism Network for Predictive Graph Mining
- Downscaling human mobility data based on demographic socioeconomic and commuting characteristics using interpretable machine learning methods
- PQFed: A Privacy-Preserving Quality-Controlled Federated Learning Framework
- A Unifying Framework for Parallelizing Sequential Models with Linear Dynamical Systems
- Information-Theoretic Bayesian Optimization for Bilevel Optimization Problems
- Uncovering Alzheimer's Disease Progression via SDE-based Spatio-Temporal Graph Deep Learning on Longitudinal Brain Networks
- POLO: Preference-Guided Multi-Turn Reinforcement Learning for Lead Optimization
- Brain PathoGraph Learning
- HyperCore: Coreset Selection under Noise via Hypersphere Models
- SubZeroCore: A Submodular Approach with Zero Training for Coreset Selection
- Reparameterizing 4DVAR with neural fields
- Machine Learning and AI Applied to fNIRS Data Reveals Novel Brain Activity Biomarkers in Stable Subclinical Multiple Sclerosis
- Beyond Formula Complexity: Effective Information Criterion Improves Performance and Interpretability for Symbolic Regression
- FastGRPO: Accelerating Policy Optimization via Concurrency-aware Speculative Decoding and Online Draft Learning
- Exploring the Relationships Between Physiological Signals During Automated Fatigue Detection
- ChaosNexus: A Foundation Model for Universal Chaotic System Forecasting with Multi-scale Representations
- Scaling Laws for Neural Material Models
- Sharpness-Aware Minimization Can Hallucinate Minimizers
- High-Probability Analysis of Online and Federated Zero-Order Optimisation
- Neural Operators for Mathematical Modeling of Transient Fluid Flow in Subsurface Reservoir Systems
- GraphPFN: A Prior-Data Fitted Graph Foundation Model
- SlimDiff: Training-Free, Activation-Guided Hands-free Slimming of Diffusion Models
- Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training
- Contrastive Mutual Information Learning: Toward Robust Representations without Positive-Pair Augmentations
- DistillKac: Few-Step Image Generation via Damped Wave Equations
- Uncertainty-Aware Knowledge Tracing Models
- $\mathbf{Li_2}$: A Framework on Dynamics of Feature Emergence and Delayed Generalization
- TRiCo: Triadic Game-Theoretic Co-Training for Robust Semi-Supervised Learning
- Preemptive Detection and Steering of LLM Misalignment via Latent Reachability
- Expert-guided Clinical Text Augmentation via Query-Based Model Collaboration
- A circuit for predicting hierarchical structure in-context in Large Language Models
- Evidence for Limited Metacognition in LLMs
- Machine Learning. The Science of Selection under Uncertainty
- Interpretable time series analysis with Gumbel dynamics
- Leveraging Big Data Frameworks for Spam Detection in Amazon Reviews
- GenUQ: Predictive Uncertainty Estimates via Generative Hyper-Networks
- Task-Agnostic Federated Continual Learning via Replay-Free Gradient Projection
- Causal Abstraction Inference under Lossy Representations
- LANCE: Low Rank Activation Compression for Efficient On-Device Continual Learning
- PreLoRA: Hybrid Pre-training of Vision Transformers with Full Training and Low-Rank Adapters
- Shoot from the HIP: Hessian Interatomic Potentials without derivatives
- Blockwise Hadamard high-Rank Adaptation for Parameter-Efficient LLM Fine-Tuning
- Understanding and Enhancing Mask-Based Pretraining towards Universal Representations
- Hallucination to Truth: A Review of Fact-Checking and Factuality Evaluation in Large Language Models
- GTPO and GRPO-S: Token and Sequence-Level Reward Shaping with Policy Entropy
- StreetReaderAI: Making Street View Accessible Using Context-Aware Multimodal AI
- Scalable Option Learning in High-Throughput Environments
- JudgeAgent: Knowledge-wise and Dynamic LLM Evaluation with Agent-as-Interviewer
- Chain or tree? Re-evaluating complex reasoning from the perspective of a matrix of thought
- Positional Encoding via Token-Aware Phase Attention
- Justice in Judgment: Unveiling (Hidden) Bias in LLM-assisted Peer Reviews
- Towards a Physics Foundation Model
- Constructive Conflict-Driven Multi-Agent Reinforcement Learning for Strategic Diversity
- Comparing RAG and GraphRAG for Page-Level Retrieval Question Answering on Math Textbook
- Dendritic Resonate-and-Fire Neuron for Effective and Efficient Long Sequence Modeling
- Automated Facility Enumeration for Building Compliance Checking using Door Detection and Large Language Models
- Diffusion-Augmented Contrastive Learning: A Noise-Robust Encoder for Biosignal Representations
- AnchDrive: Bootstrapping Diffusion Policies with Hybrid Trajectory Anchors for End-to-End Driving
- Discovering and Analyzing Stochastic Processes to Reduce Waste in Food Retail
- Impact of Loss Weight and Model Complexity on Physics-Informed Neural Networks for Computational Fluid Dynamics
- LLMs for Bayesian Optimization in Scientific Domains: Are We There Yet?
- Object Identification Under Known Dynamics: A PIRNN Approach for UAV Classification
- Null-Space Filtering for Data-Free Continual Model Merging: Preserving Transparency, Promoting Fidelity
- Forecasting Seismic Waveforms: A Deep Learning Approach for Einstein Telescope
- Talking Trees: Reasoning-Assisted Induction of Decision Trees for Tabular Data
- Score-based Idempotent Distillation of Diffusion Models
- Are Hallucinations Bad Estimations?
- d2: Improved Techniques for Training Reasoning Diffusion Language Models
- VISION: Prompting Ocean Vertical Velocity Reconstruction from Incomplete Observations
- Filtering with Confidence: When Data Augmentation Meets Conformal Prediction
- SDPO: Importance-Sampled Direct Preference Optimization for Stable Diffusion Training
- DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation
- Model-Preserving Adaptive Rounding
- Mamba Integrated with Physics Principles Masters Long-term Chaotic System Forecasting
- InfiMed: Low-Resource Medical MLLMs with Advancing Understanding and Reasoning
- Probing Neural Topology of Large Language Models
- Physics-Guided Motion Loss for Video Generation Model
- CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech
- Dual Branch VideoMamba with Gated Class Token Fusion for Violence Detection
- Resisting Contextual Interference in RAG via Parametric-Knowledge Reinforcement
- DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models
- AMPED: Adaptive Multi-objective Projection for balancing Exploration and skill Diversification
- Position: Simulating Society Requires Simulating Thought
- VidBridge-R1: Bridging QA and Captioning for RL-based Video Understanding Models with Intermediate Proxy Tasks
- Think With Videos For Agentic Long-Video Understanding
- Security Degradation in Iterative AI Code Generation -- A Systematic Analysis of the Paradox
- Exploiting Block Coordinate Descent for Cost-Effective LLM Model Training
- Personalized LLM Decoding via Contrasting Personal Preference
- Latent Concept Disentanglement in Transformer-based Language Models
- On the Necessity of Output Distribution Reweighting for Effective Class Unlearning
- Beyond Simple Graphs: Neural Multi-Objective Routing on Multigraphs
- Why Reinforcement Fine-Tuning Enables MLLMs Preserve Prior Knowledge Better: A Data Perspective
- Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions
- Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning
- Lightweight MSA Design Advances Protein Folding From Evolutionary Embeddings
- KV Cache Steering for Controlling Frozen LLMs
- Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities
- LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues
- The Invisible Leash: Why RLVR May or May Not Escape Its Origin
- R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning
- DAMR: Efficient and Adaptive Context-Aware Knowledge Graph Question Answering with LLM-Guided MCTS
- Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review
- Geometry aware inference of steady state PDEs using Equivariant Neural Fields representations
- CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis
- Neural Orchestration for Multi-Agent Systems: A Deep Learning Framework for Optimal Agent Selection in Multi-Domain Task Environments
- Formalising Human-in-the-Loop: Computational Reductions, Failure Modes, and Legal-Moral Responsibility
- Follow the Path: Reasoning over Knowledge Graph Paths to Improve LLM Factuality
- SuperCoder: Assembly Program Superoptimization with Large Language Models
- HiddenBench: Assessing Collective Reasoning in Multi-Agent LLMs via Hidden Profile Tasks
- Quantization Meets Reasoning: Exploring and Mitigating Degradation of Low-Bit LLMs in Mathematical Reasoning
- TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning
- ZeroTuning: Unlocking the Initial Token's Power to Enhance Large Language Models Without Training
- Feature Hedging: Correlated Features Break Narrow Sparse Autoencoders
- Latent Veracity Inference for Identifying Errors in Stepwise Reasoning
- Structured Relational Representations
- Shadow-FT: Tuning Instruct Model via Training on Paired Base Model
- Learning Hierarchical Domain Models Through Environment-Grounded Interaction
- VocalAgent: Large Language Models for Vocal Health Diagnostics with Safety-Aware Evaluation
- UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Language Models
- Intentional Gesture: Deliver Your Intentions with Gestures for Speech
- Octic Vision Transformers: Quicker ViTs Through Equivariance
- UniErase: Towards Balanced and Precise Unlearning in Language Models
- Attributing Response to Context: A Jensen-Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation
- Beyond Static Testbeds: An Interaction-Centric Agent Simulation Platform for Dynamic Recommender Systems
- Learning Flexible Forward Trajectories for Masked Molecular Diffusion
- Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs
- The Polar Express: Optimal Matrix Sign Methods and Their Application to the Muon Algorithm
- Bottlenecked Transformers: Periodic KV Cache Consolidation for Generalised Reasoning
- BP-Seg: A graphical model approach to unsupervised and non-contiguous text segmentation using belief propagation
- From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning
- FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of Large Language Models
- Can LLMs Alleviate Catastrophic Forgetting in Graph Continual Learning? A Systematic Study
- HD-PiSSA: High-Rank Distributed Orthogonal Adaptation
- Prompting is not Enough: Exploring Knowledge Integration and Controllable Generation on Large Language Models
- Beyond the Proxy: Trajectory-Distilled Guidance for Offline GFlowNet Training
- BiomedSQL: Text-to-SQL for Scientific Reasoning on Biomedical Knowledge Bases
- Spectral-inspired Operator Learning with Limited Data and Unknown Physics
- Large Language Models versus Classical Machine Learning: Performance in COVID-19 Mortality Prediction Using High-Dimensional Tabular Data
- Closed-Form Interpretation of Neural Network Latent Spaces with Symbolic Gradients
- On the Within-class Variation Issue in Alzheimer's Disease Detection
- DOTA: Distributional Test-Time Adaptation of Vision-Language Models
- Leveraging Model Guidance to Extract Training Data from Personalized Diffusion Models
- Degree-Conscious Spiking Graph for Cross-Domain Adaptation
- Stuffed Mamba: Oversized States Lead to the Inability to Forget
- Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning
- Diffusion Curriculum: Synthetic-to-Real Data Curriculum via Image-Guided Diffusion
- Capacity-Aware Planning and Scheduling in Budget-Constrained Multi-Agent MDPs: A Meta-RL Approach
- Large Pre-Training Datasets Don't Always Guarantee Robustness after Fine-Tuning
- Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
- Conditional Latent Space Molecular Scaffold Optimization for Accelerated Molecular Design
- Can LLMs be Good Graph Judge for Knowledge Graph Construction?
- Training-Free Bayesianization for Low-Rank Adapters of Large Language Models
- Demystifying Domain-adaptive Post-training for Financial LLMs
- How Strategic Agents Respond: Comparing Analytical Models with LLM-Generated Responses in Strategic Classification
- Avoiding $\mathbf{exp(R_{max})}$ scaling in RLHF through Preference-based Exploration
- Process Reinforcement through Implicit Rewards
- VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation
- Beyond Shallow Behavior: Task-Efficient Value-Based Multi-Task Offline MARL via Skill Discovery
- Tokenizing Single-Channel EEG with Time-Frequency Motif Learning
- RuCCoD: Towards Automated ICD Coding in Russian
- How LLMs Fail to Support Fact-Checking
- Adaptively profiling models with task elicitation
- InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
- Cost-Optimal Grouped-Query Attention for Long-Context Modeling
- Retrieval-Augmented Generation with Hierarchical Knowledge
- Ethical AI for Young Digital Citizens: A Call to Action on Privacy Governance
- Detecting Scarce and Sparse Anomalous: Solving Dual Imbalance in Multi-Instance Learning
- Can Diffusion Models Disentangle? A Theoretical Perspective
- Recursive Training Loops in LLMs: How training data properties modulate distribution shift in generated data?
- Prima.cpp: Fast 30-70B LLM Inference on Heterogeneous and Low-Resource Home Clusters
- CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives
- CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
- Toward a Physics of Deep Learning and Brains
- VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing
- See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation
- A critical review of methods and challenges in large language models
- Attributing Responsibility in AI-Induced Incidents: A Computational Reflective Equilibrium Framework for Accountability
- Development and Validation of a Large Language Model for Generating Fully-Structured Radiology Reports
- Grounding Multimodal LLMs to Embodied Agents that Ask for Help with Reinforcement Learning
- A Domain-Agnostic Scalable AI Safety Ensuring Framework
- Reasoning BO: Enhancing Bayesian Optimization with Long-Context Reasoning Power of LLMs
- From Grunts to Lexicons: Emergent Language from Cooperative Foraging
- The First Impression Problem: Internal Bias Triggers Overthinking in Reasoning Models
- XBOUND: Exploring Capability Boundaries of Device-Control Agents at the State Level
- Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents
- Scalable In-Context Q-Learning
- Mitigating Hallucination Through Theory-Consistent Symmetric Multimodal Preference Optimization
- AgentOrchestra: Orchestrating Hierarchical Multi-Agent Intelligence with the Tool-Environment-Agent (TEA) Protocol
- LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning
- From Roots to Rewards: Dynamic Tree Reasoning with Reinforcement Learning
- MMSearch-Plus: Benchmarking Provenance-Aware Search for Multimodal Browsing Agents
- Towards Agentic OS: An LLM Agent Framework for Linux Schedulers
- EigenBench: A Comparative Behavioral Measure of Value Alignment
- The Anatomy of Alignment: Decomposing Preference Optimization by Steering Sparse Features
- The STAR-XAI Protocol: A Framework for Inducing and Verifying Agency, Reasoning, and Reliability in AI Agents
- Multi-View Hypercomplex Learning for Breast Cancer Screening
- Efficient Epistemic Uncertainty Estimation in Regression Ensemble Models Using Pairwise-Distance Estimators
- VDFD: Multi-Agent Value Decomposition Framework with Disentangled World Model
- Biospheric AI
- Machine Learning-Assisted Sustainable Remanufacturing, Reusing and Recycling for Lithium-ion Batteries
- Diverse Subset Selection via Norm-Based Sampling and Orthogonality
- VeriFlow: Modeling Distributions for Neural Network Verification
- CHRONOBERG: Capturing Language Evolution and Temporal Awareness in Foundation Models
- What Is The Political Content in LLMs' Pre- and Post-Training Data?
- Zero-Effort Image-to-Music Generation: An Interpretable RAG-based VLM Approach
- SpinGPT: A Large-Language-Model Approach to Playing Poker Correctly
- Deep Learning-Based Cross-Anatomy CT Synthesis Using Adapted nnResU-Net with Anatomical Feature Prioritized Loss
- RAU: Reference-based Anatomical Understanding with Vision Language Models
- Explaining multimodal LLMs via intra-modal token interactions
- Partial Parameter Updates for Efficient Distributed Training
- An Ontology for Unified Modeling of Tasks, Actions, Environments, and Capabilities in Personal Service Robotics
- Global Convergence in Neural ODEs: Impact of Activation Functions
- Chimera: Diagnosing Shortcut Learning in Visual-Language Understanding
- Learning to Ball: Composing Policies for Long-Horizon Basketball Moves
- Bridging Kolmogorov Complexity and Deep Learning: Asymptotically Optimal Description Length Objectives for Transformers
- Physics-informed GNN for medium-high voltage AC power flow with edge-aware attention and line search correction operator
- MDAR: A Multi-scene Dynamic Audio Reasoning Benchmark
- Learning the Neighborhood: Contrast-Free Multimodal Self-Supervised Molecular Graph Pretraining
- Evaluating the Limits of Large Language Models in Multilingual Legal Reasoning
- Exploring Solution Divergence and Its Effect on Large Language Model Problem Solving
- OFMU: Optimization-Driven Framework for Machine Unlearning
- A Machine Learning Pipeline for Multiple Sclerosis Biomarker Discovery: Comparing explainable AI and Traditional Statistical Approaches
- Ontological foundations for contrastive explanatory narration of robot plans
- Mental Health Impacts of AI Companions: Triangulating Social Media Quasi-Experiments, User Perspectives, and Relational Theory
- InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models
- Does AI Coaching Prepare us for Workplace Negotiations?
- ConQuER: Modular Architectures for Control and Bias Mitigation in IQP Quantum Generative Models
- Activation Function Design Sustains Plasticity in Continual Learning
- Retrieval-Augmented Guardrails for AI-Drafted Patient-Portal Messages: Error Taxonomy Construction and Large-Scale Evaluation
- From Parameters to Behavior: Unsupervised Compression of the Policy Space
- Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
- Quantile Advantage Estimation for Entropy-Safe Reasoning
- Vision-Language Alignment from Compressed Image Representations using 2D Gaussian Splatting
- IA2: Alignment with ICL Activations Improves Supervised Fine-Tuning
- A Theoretical Analysis of Discrete Flow Matching Generative Models
- Learning Admissible Heuristics for A*: Theory and Practice
- StateX: Enhancing RNN Recall via Post-training State Expansion
- Towards Efficient Online Exploration for Reinforcement Learning with Human Feedback
- Variational Reasoning for Language Models
- Language Models Can Learn from Verbal Feedback Without Scalar Rewards
- Death of the Novel(ty): Beyond n-Gram Novelty as a Metric for Textual Creativity
- WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning
- Hierarchical Representation Matching for CLIP-based Class-Incremental Learning
- Learning Human-Perceived Fakeness in AI-Generated Videos via Multimodal LLMs
- Thinking in Many Modes: How Composite Reasoning Elevates Large Language Model Performance with Limited Data
- Polysemous Language Gaussian Splatting via Matching-based Mask Lifting
- Fairness-Aware Reinforcement Learning (FAReL): A Framework for Transparent and Balanced Sequential Decision-Making
- FeatBench: Evaluating Coding Agents on Feature Implementation for Vibe Coding
- ASSESS: A Semantic and Structural Evaluation Framework for Statement Similarity
- Safety Compliance: Rethinking LLM Safety Reasoning through the Lens of Compliance
- Beyond Textual Context: Structural Graph Encoding with Adaptive Space Alignment to alleviate the hallucination of LLMs
- Secure and Efficient Access Control for Computer-Use Agents via Context Space
- Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks
- Wavelet-Induced Rotary Encodings: RoPE Meets Graphs
- A Global Analysis of Cyber Threats to the Energy Sector: "Currents of Conflict" from a Geopolitical Perspective
- Leveraging Large Language Models for Robot-Assisted Learning of Morphological Structures in Preschool Children with Language Vulnerabilities
- Bridging Fairness and Explainability: Can Input-Based Explanations Promote Fairness in Hate Speech Detection?
- Jailbreaking on Text-to-Video Models via Scene Splitting Strategy
- HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space
- HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion Models
- Adaptive Policy Backbone via Shared Network
- Progressive Weight Loading: Accelerating Initial Inference and Gradually Boosting Performance on Resource-Constrained Environments
- Pedestrian Attribute Recognition via Hierarchical Cross-Modality HyperGraph Learning
- Spectral Collapse Drives Loss of Plasticity in Deep Continual Learning
- Advancing Natural Language Formalization to First Order Logic with Fine-tuned LLMs
- Transformers Can Learn Connectivity in Some Graphs but Not Others
- SurvDiff: A Diffusion Model for Generating Synthetic Data in Survival Analysis
- Context and Diversity Matter: The Emergence of In-Context Learning in World Models
- Stochastic activations
- Forecasting the Future with Yesterday's Climate: Temperature Bias in AI Weather and Climate Models
- REFINE-CONTROL: A Semi-supervised Distillation Method For Conditional Image Generation
- From Long to Lean: Performance-aware and Adaptive Chain-of-Thought Compression via Multi-round Refinement
- Pushing Toward the Simplex Vertices: A Simple Remedy for Code Collapse in Smoothed Vector Quantization
- Lightweight error mitigation strategies for post-training N:M activation sparsity in LLMs
- Teaching AI to Feel: A Collaborative, Full-Body Exploration of Emotive Communication
- Efficiency Boost in Decentralized Optimization: Reimagining Neighborhood Aggregation with Minimal Overhead
- Learning Equivariant Functions via Quadratic Forms
- MimicDreamer: Aligning Human and Robot Demonstrations for Scalable VLA Training
- The Outputs of Large Language Models are Meaningless
- Reversible GNS for Dissipative Fluids with Consistent Bidirectional Dynamics
- Question-Driven Analysis and Synthesis: Building Interpretable Thematic Trees with LLMs for Text Clustering and Controllable Generation
- Impact of Collective Behaviors of Autonomous Vehicles on Urban Traffic Dynamics: A Multi-Agent Reinforcement Learning Approach
- VizGen: Data Exploration and Visualization from Natural Language via a Multi-Agent AI Architecture
- Automatic Discovery of One Parameter Subgroups of $SO(n)$
- Rigidity-Aware 3D Gaussian Deformation from a Single Image
- SAGE: Scene Graph-Aware Guidance and Execution for Long-Horizon Manipulation Tasks
- Why Chain of Thought Fails in Clinical Text Understanding
- SemanticControl: A Training-Free Approach for Handling Loosely Aligned Visual Conditions in ControlNet
- Unveiling Many Faces of Surrogate Models for Configuration Tuning: A Fitness Landscape Analysis Perspective
- Debiasing Large Language Models in Thai Political Stance Detection via Counterfactual Calibration
- Active Attacks: Red-teaming LLMs via Adaptive Environments
- FlowDrive: moderated flow matching with data balancing for trajectory planning
- No-Reference Image Contrast Assessment with Customized EfficientNet-B0
- From Superficial Outputs to Superficial Learning: Risks of Large Language Models in Education
- Geo-R1: Improving Few-Shot Geospatial Referring Expression Understanding with Reinforcement Fine-Tuning
- Benchmarking and Mitigating Psychological Sycophancy in Medical Vision-Language Models
- Hybrid Diffusion for Simultaneous Symbolic and Continuous Planning
- Developing Vision-Language-Action Model from Egocentric Videos
- ERGO: Efficient High-Resolution Visual Understanding for Vision-Language Models
- Black-Box Hallucination Detection via Consistency Under the Uncertain Expression
- Lightweight Structured Multimodal Reasoning for Clinical Scene Understanding in Robotics
- Latent Diffusion: Multi-Dimension Stable Diffusion Latent Space Explorer
- Fuzzy Reasoning Chain (FRC): An Innovative Reasoning Framework from Fuzziness to Clarity
- An Adaptive ICP LiDAR Odometry Based on Reliable Initial Pose
- Decoding Deception: Understanding Automatic Speech Recognition Vulnerabilities in Evasion and Poisoning Attacks
- The QCET Taxonomy of Standard Quality Criterion Names and Definitions for the Evaluation of NLP Systems
- The Rogue Scalpel: Activation Steering Compromises LLM Safety
- Action-aware Dynamic Pruning for Efficient Vision-Language-Action Manipulation
- SecureAgentBench: Benchmarking Secure Code Generation under Realistic Vulnerability Scenarios
- Reinforcement Learning for Durable Algorithmic Recourse
- Learning More with Less: A Dynamic Dual-Level Down-Sampling Framework for Efficient Policy Optimization
- The AI_INFN Platform: Artificial Intelligence Development in the Cloud
- Universal Legal Article Prediction via Tight Collaboration between Supervised Classification Model and LLM
- Multi-Agent Path Finding via Offline RL and LLM Collaboration
- R-Capsule: Compressing High-Level Plans for Efficient Large Language Model Reasoning
- Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
- DIM: Enforcing Domain-Informed Monotonicity in Deep Neural Networks
- MORPH: Shape-agnostic PDE Foundation Models
- SlotFM: A Motion Foundation Model with Slot Attention for Diverse Downstream Tasks
- QueryGym: Step-by-Step Interaction with Relational Databases
- Optimizing the non-Clifford-count in unitary synthesis using Reinforcement Learning
- Not My Agent, Not My Boundary? Elicitation of Personal Privacy Boundaries in AI-Delegated Information Sharing
- Developing Strategies to Increase Capacity in AI Education
- UISim: An Interactive Image-Based UI Simulator for Dynamic Mobile Environments
- Uncovering Alzheimer's Disease Progression via SDE-based Spatio-Temporal Graph Deep Learning on Longitudinal Brain Networks
- POLO: Preference-Guided Multi-Turn Reinforcement Learning for Lead Optimization
- LFA-Net: A Lightweight Network with LiteFusion Attention for Retinal Vessel Segmentation
- Self-Speculative Biased Decoding for Faster Live Translation
- Brain PathoGraph Learning
- HyperCore: Coreset Selection under Noise via Hypersphere Models
- SubZeroCore: A Submodular Approach with Zero Training for Coreset Selection
- Backdoor Attribution: Elucidating and Controlling Backdoor in Language Models
- Beyond Structure: Invariant Crystal Property Prediction with Pseudo-Particle Ray Diffraction
- Unbiased Binning: Fairness-aware Attribute Representation
- FastGRPO: Accelerating Policy Optimization via Concurrency-aware Speculative Decoding and Online Draft Learning
- Evaluating and Improving Cultural Awareness of Reward Models for LLM Alignment
- ChaosNexus: A Foundation Model for Universal Chaotic System Forecasting with Multi-scale Representations
- DiTraj: training-free trajectory control for video diffusion transformer
- Can Large Language Models Autoformalize Kinematics?
- Beyond Johnson-Lindenstrauss: Uniform Bounds for Sketched Bilinear Forms
- Graph of Agents: Principled Long Context Modeling by Emergent Multi-Agent Collaboration
- Enhancing Low-Rank Adaptation with Structured Nonlinear Transformations
- Unlocking the Essence of Beauty: Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization
- No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping
- Position: The Hidden Costs and Measurement Gaps of Reinforcement Learning with Verifiable Rewards
- You Can't Steal Nothing: Mitigating Prompt Leakages in LLMs via System Vectors
- Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts
- A Large-Scale Dataset and Citation Intent Classification in Turkish with LLMs
- AutoSCORE: Enhancing Automated Scoring with Multi-Agent Large Language Models via Structured Component Recognition
- EqDiff-CT: Equivariant Conditional Diffusion model for CT Image Synthesis from CBCT
- Generation Properties of Stochastic Interpolation under Finite Training Set
- One Model, Many Morals: Uncovering Cross-Linguistic Misalignments in Computational Moral Reasoning
- ARTI-6: Towards Six-dimensional Articulatory Speech Encoding
- A State-of-the-Art SQL Reasoning Model using RLVR
- Enhanced Generative Machine Listener
- Gender Stereotypes in Professional Roles Among Saudis: An Analytical Study of AI-Generated Images Using Language Models
- Score-based Idempotent Distillation of Diffusion Models
- Are Hallucinations Bad Estimations?
- Learning to Reason with Mixture of Tokens
- Neural Operators for Mathematical Modeling of Transient Fluid Flow in Subsurface Reservoir Systems
- Dual-Head Reasoning Distillation: Improving Classifier Accuracy with Train-Time-Only Reasoning
- Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training
- New Algorithmic Directions in Optimal Transport and Applications for Product Spaces
- DistillKac: Few-Step Image Generation via Damped Wave Equations
- $\mathbf{Li_2}$: A Framework on Dynamics of Feature Emergence and Delayed Generalization
- Shortcut Flow Matching for Speech Enhancement: Step-Invariant flows via single stage training
- Preemptive Detection and Steering of LLM Misalignment via Latent Reachability
- Agribot: agriculture-specific question answer system
- Psychological and behavioural responses in human-agent vs. human-human interactions: a systematic review and meta-analysis
- Domain-Aware Speaker Diarization On African-Accented English
- No Alignment Needed for Generation: Learning Linearly Separable Representations in Diffusion Models
- Enhancing Contrastive Learning for Geolocalization by Discovering Hard Negatives on Semivariograms
- What Happens Next? Anticipating Future Motion by Generating Point Trajectories
- Temporal vs. Spatial: Comparing DINOv3 and V-JEPA2 Feature Representations for Video Action Analysis
- Multi-Objective Reinforcement Learning for Large Language Model Optimization: Visionary Perspective
- LANCE: Low Rank Activation Compression for Efficient On-Device Continual Learning
- OjaKV: Context-Aware Online Low-Rank KV Cache Compression with Oja's Rule
- Guiding Audio Editing with Audio Language Model
- A Data-driven Typology of Vision Models from Integrated Representational Metrics
- InvBench: Can LLMs Accelerate Program Verification with Invariant Synthesis?
- MobiLLM: An Agentic AI Framework for Closed-Loop Threat Mitigation in 6G Open RANs
- Limitations on Safe, Trusted, Artificial General Intelligence
- Logic of Hypotheses: from Zero to Full Knowledge in Neurosymbolic Integration
- StepORLM: A Self-Evolving Framework With Generative Process Supervision For Operations Research Language Models
- UniMIC: Token-Based Multimodal Interactive Coding for Human-AI Collaboration
- Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time
- Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective
- From Search to Reasoning: A Five-Level RAG Capability Framework for Enterprise Data
- PIR-RAG: A System for Private Information Retrieval in Retrieval-Augmented Generation
- Assessment of deep learning models integrated with weather and environmental variables for wildfire spread prediction and a case study of the 2023 Maui fires
- Seismic Velocity Inversion from Multi-Source Shot Gathers Using Deep Segmentation Networks: Benchmarking U-Net Variants and SeismoLabV3+
- Cross-Modal Retrieval with Cauchy-Schwarz Divergence
- Cycle is All You Need: More Is Different
- From Embeddings to Equations: Genetic-Programming Surrogates for Interpretable Transformer Classification
- SGNNBench: A Holistic Evaluation of Spiking Graph Neural Network on Large-scale Graph
- Random Direct Preference Optimization for Radiography Report Generation
- KV-Efficient VLA: A Method of Speed up Vision Language Model with RNN-Gated Chunked KV Cache
- Domain-Informed Genetic Superposition Programming: A Case Study on SFRC Beams
- Phrase-grounded Fact-checking for Automatically Generated Chest X-ray Reports
- A Novel Differential Feature Learning for Effective Hallucination Detection and Classification
- MDF-MLLM: Deep Fusion Through Cross-Modal Feature Alignment for Contextually Aware Fundoscopic Image Classification
- Influence Guided Context Selection for Effective Retrieval-Augmented Generation
- Multimodal Prompt Decoupling Attack on the Safety Filters in Text-to-Image Models
- Context Is What You Need: The Maximum Effective Context Window for Real World Limits of LLMs
- A Mutual Learning Method for Salient Object Detection with intertwined Multi-Supervision--Revised
- MAJORScore: A Novel Metric for Evaluating Multimodal Relevance via Joint Representation
- Design and Implementation of a Secure RAG-Enhanced AI Chatbot for Smart Tourism Customer Service: Defending Against Prompt Injection Attacks -- A Case Study of Hsinchu, Taiwan
- Safety Assessment of Scaffolding on Construction Site using AI
- ReGeS: Reciprocal Retrieval-Generation Synergy for Conversational Recommender Systems
- Automated Prompt Generation for Creative and Counterfactual Text-to-image Synthesis
- In silico Deep Learning Protocols for Label-Free Super-Resolution Microscopy: A Comparative Study of Network Architectures and SNR Dependence
- Dynamic Multi-Target Fusion for Efficient Audio-Visual Navigation
- SAEmnesia: Erasing Concepts in Diffusion Models with Sparse Autoencoders
- Toward a Realistic Encoding Model of Auditory Affective Understanding in the Brain
- Do Sparse Subnetworks Exhibit Cognitively Aligned Attention? Effects of Pruning on Saliency Map Fidelity, Sparsity, and Concept Coherence
- Towards Adapting Federated & Quantum Machine Learning for Network Intrusion Detection: A Survey
- MIXRAG: Mixture-of-Experts Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering
- Large AI Model-Enabled Generative Semantic Communications for Image Transmission
- How Large Language Models Need Symbolism
- Near-Optimal Experiment Design in Linear non-Gaussian Cyclic Models
- PhenoMoler: Phenotype-Guided Molecular Optimization via Chemistry Large Language Model
- DyME: Dynamic Multi-Concept Erasure in Diffusion Models with Bi-Level Orthogonal LoRA Adaptation
- Foundation models for high-energy physics
- Can AI Perceive Physical Danger and Intervene?
- Align2Speak: Improving TTS for Low Resource Languages via ASR-Guided Online Preference Optimization
- Retrieval-of-Thought: Efficient Reasoning via Reusing Thoughts
- Lifelong Learning with Behavior Consolidation for Vehicle Routing
- UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios
- Benchmarking MLLM-based Web Understanding: Reasoning, Robustness and Safety
- D-Artemis: A Deliberative Cognitive Framework for Mobile GUI Multi-Agents
- ProRe: A Proactive Reward System for GUI Agents via Reasoner-Actor Collaboration
- DS-STAR: Data Science Agent via Iterative Planning and Verification
- Axiomatic Choice and the Decision-Evaluation Paradox
- DeepTravel: An End-to-End Agentic Reinforcement Learning Framework for Autonomous Travel Planning Agents
- Reimagining Agent-based Modeling with Large Language Model Agents via Shachi
- TRACE: Learning to Compute on Graphs
- GenesisGeo: Technical Report
- DyRo-MCTS: A Robust Monte Carlo Tree Search Approach to Dynamic Job Shop Scheduling
- Outlier Detection in Plantar Pressure: Human-Centered Comparison of Statistical Parametric Mapping and Explainable Machine Learning
- CoBel-World: Harnessing LLM Reasoning to Build a Collaborative Belief World for Optimizing Embodied Multi-Agent Collaboration
- RISK: A Framework for GUI Agents in E-commerce Risk Management
- Bilinear relational structure fixes reversal curse and enables consistent model editing
- GSM-Agent: Understanding Agentic Reasoning Using Controllable Environments
- The Thinking Spectrum: An Empirical Study of Tunable Reasoning in LLMs through Model Merging
- A2R: An Asymmetric Two-Stage Reasoning Framework for Parallel Reasoning
- Generalizing Multi-Objective Search via Objective-Aggregation Functions
- Ground-Truthing AI Energy Consumption: Validating CodeCarbon Against External Measurements
- Log2Plan: An Adaptive GUI Automation Framework Integrated with Task Mining Approach
- Clinical Uncertainty Impacts Machine Learning Evaluations
- Evaluating LLMs for Combinatorial Optimization: One-Phase and Two-Phase Heuristics for 2D Bin-Packing
- InfiMed-Foundation: Pioneering Advanced Multimodal Medical Models with Compute-Efficient Pre-Training and Multi-Stage Fine-Tuning
- Structured Sparse Transition Matrices to Enable State Tracking in State-Space Models
- Large Language Models as Nondeterministic Causal Models
- PRIME: Planning and Retrieval-Integrated Memory for Enhanced Reasoning
- Do LLM Agents Know How to Ground, Recover, and Assess? A Benchmark for Epistemic Competence in Information-Seeking Agents
- EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer
- Guiding Evolution of Artificial Life Using Vision-Language Models
- GeoSketch: A Neural-Symbolic Approach to Geometric Multimodal Reasoning with Auxiliary Line Construction and Affine Transformation
- InfiAgent: Self-Evolving Pyramid Agent Framework for Infinite Scenarios
- Estimating the Empowerment of Language Model Agents
- TrueGradeAI: Retrieval-Augmented and Bias-Resistant AI for Transparent and Explainable Digital Assessments
- REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model
- The Emergence of Altruism in Large-Language-Model Agents Society
- Towards mitigating information leakage when evaluating safety monitors
- Correct Reasoning Paths Visit Shared Decision Pivots
- AutoClimDS: Climate Data Science Agentic AI -- A Knowledge Graph is All You Need
- EEG-Based Consumer Behaviour Prediction: An Exploration from Classical Machine Learning to Graph Neural Networks
- GeoEvolve: Automating Geospatial Model Discovery via Multi-Agent Large Language Models
- Automated and Interpretable Survival Analysis from Multimodal Data
- Semantic F1 Scores: Fair Evaluation Under Fuzzy Class Boundaries
Research Sources: 1438 | Generated: 9/29/2025