AI RESEARCH PAPERS & ACADEMIC SOURCES
- A deep learning framework for bone fragment classification in owl pellets using YOLOv12
- Use AI in the classroom to bring problems to life
- Margaret Boden obituary: cognitive scientist who explored how machines might emulate human imagination
- AI content is tainting preprints: how moderators are fighting back
- Predicting antidepressant response via local-global graph neural network and neuroimaging biomarkers
- An analysis of the real world performance of an artificial intelligence based autism diagnostic
- DiffVolume: Diffusion Models for Volume Generation in Limit Order Books
- CRADLE: Conversational RTL Design Space Exploration with LLM-based Multi-Agent Systems
- Subsampling Factorization Machine Annealing
- Image selective encryption analysis using mutual information in CNN based embedding space
- Sound Signal Synthesis with Auxiliary Classifier GAN, COVID-19 cough as an example
- Chartwin: a Case Study on Channel Charting-aided Localization in Dynamic Digital Network Twins
- Developing a Transferable Federated Network Intrusion Detection System
- Constrained free energy minimization for the design of thermal states and stabilizer thermodynamic systems
- Tame Riemannian Stochastic Approximation
- Combat Urban Congestion via Collaboration: Heterogeneous GNN-based MARL for Coordinated Platooning and Traffic Signal Control
- A DNN Biophysics Model with Topological and Electrostatic Features
- Multi-modal Policies with Physics-informed Representations in Complex Fluid Environments
- Cross-Modal Temporal Fusion for Financial Market Forecasting
- Hyperbolic Fuzzy C-Means with Adaptive Weight-based Filtering for Efficient Clustering
- Few-Shot Adversarial Low-Rank Fine-Tuning of Vision-Language Models
- Vulnerability-Aware Alignment: Mitigating Uneven Forgetting in Harmful Fine-Tuning
- Learning Generative Models for Climbing Aircraft from Radar Data
- Learning Optimal and Fair Policies for Online Allocation of Scarce Societal Resources from Data Collected in Deployment
- Whispers in the Machine: Confidentiality in Agentic Systems
- Evaluating lightweight unsupervised online IDS for masquerade attacks in CAN
- Touch and Tell: Multimodal Decoding of Human Emotions and Social Gestures for Robots
- Fast Tensor Completion via Approximate Richardson Iteration
- Gait in Eight: Efficient On-Robot Learning for Omnidirectional Quadruped Locomotion
- Efficient Learning on Large Graphs using a Densifying Regularity Lemma
- Discrete and Continuous Difference of Submodular Minimization
- Patient-Specific Deep Reinforcement Learning for Automatic Replanning in Head-and-Neck Cancer Proton Therapy
- Efficient and Effective Query Context-Aware Learning-to-Rank Model for Sequential Recommendation
- Probabilistic Emissivity Retrieval from Hyperspectral Data via Physics-Guided Variational Inference
- Comparative study of machine learning and statistical methods for automatic identification and quantification in γ-ray spectrometry
- Weather-Driven Agricultural Decision-Making Using Digital Twins Under Imperfect Conditions
- SHeRL-FL: When Representation Learning Meets Split Learning in Hierarchical Federated Learning
- Discrete Diffusion-Based Model-Level Explanation of Heterogeneous GNNs with Node Features
- Sparse Partial Optimal Transport via Quadratic Regularization
- Biased Local SGD for Efficient Deep Learning on Heterogeneous Systems
- SHEFL: Resource-Aware Aggregation and Sparsification in Heterogeneous Ensemble Federated Learning
- Dynamic Rank Adjustment for Accurate and Efficient Neural Network Training
- Classifier Language Models: Unifying Sparse Finetuning and Adaptive Tokenization for Specialized Classification Tasks
- Expert-Guided Diffusion Planner for Auto-bidding
- Elucidating Rectified Flow with Deterministic Sampler: Polynomial Discretization Complexity for Multi and One-step Models
- Interpretable Reward Model via Sparse Autoencoder
- Differentiated Information Mining: A Semi-supervised Learning Framework for GNNs
- Flow Battery Manifold Design with Heterogeneous Inputs Through Generative Adversarial Neural Networks
- Towards Scalable Lottery Ticket Networks using Genetic Algorithms
- Stationarity Exploration for Multivariate Time Series Forecasting
- Exploring Cross-Stage Adversarial Transferability in Class-Incremental Continual Learning
- LNN-PINN: A Unified Physics-Only Training Framework with Liquid Residual Blocks
- GRAVITY: A Controversial Graph Representation Learning for Vertex Classification
- Fre-CW: Targeted Attack on Time Series Forecasting using Frequency Domain Loss
- Low-Regret and Low-Complexity Learning for Hierarchical Inference
- MechaFormer: Sequence Learning for Kinematic Mechanism Design Automation
- FetFIDS: A Feature Embedding Attention based Federated Network Intrusion Detection Algorithm
- Causal Machine Learning for Patient-Level Intraoperative Opioid Dose Prediction from Electronic Health Records
- Meta-learning optimizes predictions of missing links in real-world networks
- Chi-Geometry: A Library for Benchmarking Chirality Prediction of GNNs
- Bridging Formal Language with Chain-of-Thought Reasoning to Geometry Problem Solving
- Deep Neural Network Calibration by Reducing Classifier Shift with Stochastic Masking
- Evaluating Imputation Techniques for Short-Term Gaps in Heart Rate Data
- CFM-GP: Unified Conditional Flow Matching to Learn Gene Perturbation Across Cell Types
- Synthesize, Retrieve, and Propagate: A Unified Predictive Modeling Framework for Relational Databases
- Selective KV-Cache Sharing to Mitigate Timing Side-Channels in LLM Inference
- Language Models Can Understand Spectra: A Multimodal Model for Molecular Structure Elucidation
- Benchmarking Federated Learning for Throughput Prediction in 5G Live Streaming Applications
- Multi-Target Backdoor Attacks Against Speaker Recognition
- Flexible Prefrontal Control over Hippocampal Episodic Memory for Goal-Directed Generalization
- AI-induced sexual harassment: Investigating Contextual Characteristics and User Reactions of Sexual Harassment by a Companion Chatbot
- Evaluating Trust in AI, Human, and Co-produced Feedback Among Undergraduate Students
- Democracy of AI Numerical Weather Models: An Example of Global Forecasting with FourCastNetv2 Made by a University Research Lab Using GPU
- Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence
- Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey
- To Judge or not to Judge: Using LLM Judgements for Advertiser Keyphrase Relevance at eBay
- Saturation Self-Organizing Map
- RAGtifier: Evaluating RAG Generation Approaches of State-of-the-Art RAG Systems for the SIGIR LiveRAG Competition
- Alternates, Assemble! Selecting Optimal Alternates for Citizens' Assemblies
- Using LLMs to Capture Users' Temporal Context for Recommendation
- StreetViewAI: Making Street View Accessible Using Context-Aware Multimodal AI
- Playing Atari Space Invaders with Sparse Cosine Optimized Policy Evolution
- LLM-Driven Adaptive 6G-Ready Wireless Body Area Networks: Survey and Framework
- M3-Net: A Cost-Effective Graph-Free MLP-Based Model for Traffic Prediction
- AI Agents and the Law
- OmniLLP: Enhancing LLM-based Log Level Prediction with Context-Aware Retrieval
- UQGNN: Uncertainty Quantification of Graph Neural Networks for Multivariate Spatiotemporal Prediction
- Who pays the RENT? Implications of Spatial Inequality for Prediction-Based Allocation Policies
- AI Security Map: Holistic Organization of AI Security Technologies and Impacts on Stakeholders
- Generative AI for Critical Infrastructure in Smart Grids: A Unified Framework for Synthetic Data Generation and Anomaly Detection
- QoE-Aware Service Provision for Mobile AR Rendering: An Agent-Driven Approach
- Securing Educational LLMs: A Generalised Taxonomy of Attacks on LLMs and DREAD Risk Assessment
- Hallucinations in Code Change to Natural Language Generation: Prevalence and Evaluation of Detection Metrics
- Imposing AI: Deceptive design patterns against sustainability
- Generative Modeling for Robust Deep Reinforcement Learning on the Traveling Salesman Problem
- Visual Prompting for Robotic Manipulation with Annotation-Guided Pick-and-Place Using ACT
- Evaluating Podcast Recommendations with Profile-Aware LLM-as-a-Judge
- TechOps: Technical Documentation Templates for the AI Act
- Opening Musical Creativity? Embedded Ideologies in Generative-AI Music Systems
- Not in My Backyard! Temporal Voting Over Public Chores
- TempOpt - Unsupervised Alarm Relation Learning for Telecommunication Networks
- OISMA: On-the-fly In-memory Stochastic Multiplication Architecture for Matrix-Multiplication Workloads
- Wavelet Mixture of Experts for Time Series Forecasting
- Geometry-Aware Global Feature Aggregation for Real-Time Indirect Illumination
- EditMF: Drawing an Invisible Fingerprint for Your Large Language Models
- The Roots of International Perceptions: Simulating US Attitude Changes Towards China with LLM Agents
- Oblivionis: A Lightweight Learning and Unlearning Framework for Federated Large Language Models
- EGGCodec: A Robust Neural Encodec Framework for EGG Reconstruction and F0 Extraction
- Generalising Traffic Forecasting to Regions without Traffic Observations
- QAMRO: Quality-aware Adaptive Margin Ranking Optimization for Human-aligned Assessment of Audio Generation Systems
- Urban-STA4CLC: Urban Theory-Informed Spatio-Temporal Attention Model for Predicting Post-Disaster Commercial Land Use Change
- Unsupervised Skill Discovery as Exploration for Learning Agile Locomotion
- Rational Inverse Reasoning
- Attacks and Defenses Against LLM Fingerprinting
- Can We Trust AI to Govern AI? Benchmarking LLM Performance on Privacy and AI Governance Exams
- Dynamic Uncertainty-aware Multimodal Fusion for Outdoor Health Monitoring
- SPARC: Soft Probabilistic Adaptive multi-interest Retrieval Model via Codebooks for recommender system
- Towards Universal Neural Inference
- System 2 Reasoning for Human-AI Alignment: Generality and Adaptivity via ARC-AGI
- Effort-aware Fairness: Incorporating a Philosophy-informed, Human-centered Notion of Effort into Algorithmic Fairness Metrics
- Artificial Intelligence Software Structured to Simulate Human Working Memory, Mental Imagery, and Mental Continuity
- BELLA: Black box model Explanations by Local Linear Approximations
- Keep Your Friends Close: Leveraging Affinity Groups to Accelerate AI Inference Workflows
- Multidimensional Adaptive Coefficient for Inference Trajectory Optimization in Flow and Diffusion
- MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention
- Return Prediction for Mean-Variance Portfolio Selection: How Decision-Focused Learning Shapes Forecasting Models
- Hypergraph-based Motion Generation with Multi-modal Interaction Relational Reasoning
- Dynamic Spectrum Access for Ambient Backscatter Communication-assisted D2D Systems with Quantum Reinforcement Learning
- Learning Marmoset Vocal Patterns with a Masked Autoencoder for Robust Call Segmentation, Classification, and Caller Identification
- Adaptive Informed Deep Neural Networks for Power Flow Analysis
- Chemist-aligned retrosynthesis by ensembling diverse inductive bias models
- FBFL: A Field-Based Coordination Approach for Data Heterogeneity in Federated Learning
- Forget the Data and Fine-Tuning! Just Fold the Network to Compress
- Topos Theory for Generative AI and LLMs
- Topos Causal Models
- An Efficient Application of Goal Programming to Tackle Multiobjective Problems with Recurring Fitness Landscapes
- LLM-BI: Towards Fully Automated Bayesian Inference with Large Language Models
- First Ask Then Answer: A Framework Design for AI Dialogue Based on Supplementary Questioning with Large Language Models
- What Breaks Knowledge Graph based RAG? Empirical Insights into Reasoning under Incomplete Knowledge
- UrzaGPT: LoRA-Tuned Large Language Models for Card Selection in Collectible Card Games
- Solver-Aided Expansion of Loops to Avoid Generate-and-Test
- OverFill: Two-Stage Models for Efficient Language Model Decoding
- A Fast GRASP Metaheuristic for the Trigger Arc TSP with MIP-Based Construction and Multi-Neighborhood Local Search
- Beyond Ordinal Preferences: Why Alignment Needs Cardinal Human Feedback
- POMO+: Leveraging starting nodes in POMO for solving Capacitated Vehicle Routing Problem
- Large Language Models as Oracles for Ontology Alignment
- GVGAI-LLM: Evaluating Large Language Model Agents with Infinite Games
- SynLLM: A Comparative Analysis of Large Language Models for Medical Tabular Synthetic Data Generation via Prompt Engineering
- UGM2N: An Unsupervised and Generalizable Mesh Movement Network via M-Uniform Loss
- AgriGPT: a Large Language Model Ecosystem for Agriculture
- Diminution: On Reducing the Size of Grounding ASP Programs
- P-CAFE: Personalized Cost-Aware Incremental Feature Selection For Electronic Health Records
- Prompt-and-Check: Using Large Language Models to Evaluate Communication Protocol Compliance in Simulation-Based Training
- Hybrid Node-Destroyer Model with Large Neighborhood Search for Solving the Capacitated Vehicle Routing Problem
- Aryabhata: An exam-focused language model for JEE Math
- Simulating Generative Social Agents via Theory-Informed Workflow Design
- GRainsaCK: a Comprehensive Software Library for Benchmarking Explanations of Link Prediction Tasks on Knowledge Graphs
- Efficient Agent: Optimizing Planning Capability for Multimodal Retrieval Augmented Generation
- Reducing Cognitive Load in Multi-Agent Reinforcement Learning for Mathematical Problem Solving: Decoupling Reasoning and Code Generation
- Compass-Thinker-7B Technical Report
- Safe Semantics, Unsafe Interpretations: Tackling Implicit Reasoning Safety in Large Vision-Language Models
- Prospect Theory Fails for LLMs: Revealing Instability of Decision-Making under Epistemic Uncertainty
- Intrinsic Memory Agents: Heterogeneous Multi-Agent LLM Systems through Structured Contextual Memory
- Activation Steering for Bias Mitigation: An Interpretable Approach to Safer LLMs
- A First Look at Predictability and Explainability of Pre-request Passenger Waiting Time in Ridesharing Systems
- CVCM Track Circuits Pre-emptive Failure Diagnostics for Predictive Maintenance Using Deep Neural Networks
- SMA: Who Said That? Auditing Membership Leakage in Semi-Black-box RAG Controlling
- BrowseMaster: Towards Scalable Web Browsing via Tool-Augmented Programmatic Agent Pair
- A New Parallel Cooperative Landscape Smoothing Algorithm and Its Applications on TSP and UBQP
- On the Effects of Smoothing Rugged Landscape by Different Toy Problems: A Case Study on UBQP
- emg2tendon: From sEMG Signals to Tendon Control in Musculoskeletal Hands
- Towards Heterogeneity-Aware and Energy-Efficient Topology Optimization for Decentralized Federated Learning in Edge Environment
- XFMNet: Decoding Cross-Site and Nonstationary Water Patterns via Stepwise Multimodal Fusion for Long-Term Water Quality Forecasting
- Multi-grained spatial-temporal feature complementarity for accurate online cellular traffic prediction
- Understanding Transformers through the Lens of Pavlovian Conditioning
- Channel-Wise MLPs Improve the Generalization of Recurrent Convolutional Networks
- Constrained PSLQ Search for Machin-like Identities Achieving Record-Low Lehmer Measures
- Assessing the Quality of AI-Generated Exams: A Large-Scale Field Study
- EU Digital Regulation and Guatemala: AI, 5G, and Cybersecurity
- Between Fear and Desire, the Monster Artificial Intelligence (AI): Analysis through the Lenses of Monster Theory
- Context Engineering for Multi-Agent LLM Code Assistants Using Elicit, NotebookLM, ChatGPT, and Claude Code
- Algorithmic Collusion of Pricing and Advertising on E-commerce Platforms
- Energy-Aware Code Generation with LLMs: Benchmarking Small vs. Large Language Models for Sustainable AI Programming
- Normative Moral Pluralism for AI: A Framework for Deliberation in Complex Moral Contexts
- HSA-Net: Hierarchical and Structure-Aware Framework for Efficient and Scalable Molecular Language Modeling
- Algorithmic Fairness amid Social Determinants: Reflection, Characterization, and Approach
- Do AI Companies Make Good on Voluntary Commitments to the White House?
- Fuzzy-Pattern Tsetlin Machine
- Processing of synthetic data in AI development for healthcare and the definition of personal data in EU law
- The DNA of nuclear models: How AI predicts nuclear masses
- Generating Query-Relevant Document Summaries via Reinforcement Learning
- Fast weight programming and linear transformers: from machine learning to neurobiology
- Temporal User Profiling with LLMs: Balancing Short-Term and Long-Term Preferences for Recommendations
- Empowering Children to Create AI-Enabled Augmented Reality Experiences
- When the Domain Expert Has No Time and the LLM Developer Has No Clinical Expertise: Real-World Lessons from LLM Co-Design in a Safety-Net Hospital
- On Experiments
- Projection-based multifidelity linear regression for data-scarce applications
- In-Context Learning as Nonparametric Conditional Probability Estimation: Risk Bounds and Optimality
- Hierarchical Variable Importance with Statistical Control for Medical Data-Based Prediction
- Bio-Inspired Artificial Neural Networks based on Predictive Coding
- Scaled-Dot-Product Attention as One-Sided Entropic Optimal Transport
- Regret minimization in Linear Bandits with offline data via extended D-optimal exploration
- Differentiable Cyclic Causal Discovery Under Unmeasured Confounders
- Kernel Two-Sample Testing via Directional Components Analysis
- Distributed optimization: designed for federated learning
- Sensitivity Analysis to Unobserved Confounding with Copula-based Normalizing Flows
- ReQuestNet: A Foundational Learning model for Channel Estimation
- Hi-fi functional priors by learning activations
- Position: Causal Machine Learning Requires Rigorous Synthetic Experiments for Broader Adoption
- Integrating attention into explanation frameworks for language and vision transformers
- Scaling Up Active Testing to Large Language Models
- fastkqr: A Fast Algorithm for Kernel Quantile Regression
- Online Covariance Estimation in Nonsmooth Stochastic Approximation
- Randomised Postiterations for Calibrated BayesCG
- Neural Operator Variational Inference based on Regularized Stein Discrepancy for Deep Gaussian Processes
- Finite-Sample Guarantees for Learning Dynamics in Zero-Sum Polymatrix Games
- Understanding Aggregations of Proper Learners in Multiclass Classification
- LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization
- Equivariance Everywhere All At Once: A Recipe for Graph Foundation Models
- A Parametric Bi-Directional Curvature-Based Framework for Image Artifact Classification and Quantification
- Adaptive High-Frequency Preprocessing for Video Coding
- GaussianUpdate: Continual 3D Gaussian Splatting Update for Changing Environments
- Preview WB-DH: Towards Whole Body Digital Human Bench for the Generation of Whole-body Talking Avatar Videos
- A Robust Epipolar-Domain Regularization Algorithm for Light Field Depth Estimation
- Masked Clustering Prediction for Unsupervised Point Cloud Pre-training
- Automatic and standardized surgical reporting for central nervous system tumors
- A Pseudo Global Fusion Paradigm-Based Cross-View Network for LiDAR-Based Place Recognition
- Shape Completion and Real-Time Visualization in Robotic Ultrasound Spine Acquisitions
- Accelerated Volumetric Compression without Hierarchies: A Fourier Feature Based Implicit Neural Representation Approach
- MADPromptS: Unlocking Zero-Shot Morphing Attack Detection with Multiple Prompt Aggregation
- UniSTFormer: Unified Spatio-Temporal Lightweight Transformer for Efficient Skeleton-Based Action Recognition
- Lay2Story: Extending Diffusion Transformers for Layout-Togglable Story Generation
- Text-conditioned State Space Model For Domain-generalized Change Detection Visual Question Answering
- TaoCache: Structure-Maintained Video Generation Acceleration
- ColorGPT: Leveraging Large Language Models for Multimodal Color Recommendation
- KFFocus: Highlighting Keyframes for Enhanced Video Understanding
- Spatial-Temporal Multi-Scale Quantization for Flexible Motion Generation
- UniConvNet: Expanding Effective Receptive Field while Maintaining Asymptotically Gaussian Distribution for ConvNets of Any Scale
- Towards Perfection: Building Inter-component Mutual Correction for Retinex-based Low-light Image Enhancement
- Uncertainty-aware Cross-training for Semi-supervised Medical Image Segmentation
- When Deepfakes Look Real: Detecting AI-Generated Faces with Unlabeled Data due to Annotation Challenges
- Spatial Traces: Enhancing VLA Models with Spatial-Temporal Understanding
- Per-Query Visual Concept Learning
- ALFred: An Active Learning Framework for Real-world Semi-supervised Anomaly Detection with Adaptive Thresholds
- VLM-3D: End-to-End Vision-Language Models for Open-World 3D Perception
- Scaling Learned Image Compression Models up to 1 Billion
- Addressing Bias in VLMs for Glaucoma Detection Without Protected Attribute Supervision
- Deep Learning Models for Robust Facial Liveness Detection
- Turbo-VAED: Fast and Stable Transfer of Video-VAEs to Mobile Devices
- HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis
- MoSSDA: A Semi-Supervised Domain Adaptation Framework for Multivariate Time-Series Classification using Momentum Encoder
- Variational volume reconstruction with the Deep Ritz Method
- Spatiotemporally Consistent Indoor Lighting Estimation with Diffusion Priors
- Improving Facial Rig Semantics for Tracking and Retargeting
- Preprocessing Algorithm Leveraging Geometric Modeling for Scale Correction in Hyperspectral Images for Improved Unmixing Performance
- Enhanced Liver Tumor Detection in CT Images Using 3D U-Net and Bat Algorithm for Hyperparameter Optimization
- Hybrid Long and Short Range Flows for Point Cloud Filtering
- Multi-level Collaborative Distillation Meets Global Workspace Model: A Unified Framework for OCIL
- STELAR-VISION: Self-Topology-Aware Efficient Learning for Aligned Reasoning in Vision
- Exploring Palette based Color Guidance in Diffusion Models
- Silicon Minds versus Human Hearts: The Wisdom of Crowds Beats the Wisdom of AI in Emotion Recognition
- DiffPhysCam: Differentiable Physics-Based Camera Simulation for Inverse Rendering and Embodied AI
- Frequency-Assisted Adaptive Sharpening Scheme Considering Bitrate and Quality Tradeoff
- VertexRegen: Mesh Generation with Continuous Level of Detail
- A new dataset and comparison for multi-camera frame synthesis
- Efficient motion-based metrics for video frame interpolation
- OpenCUA: Open Foundations for Computer-Use Agents
- Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer
- Efficient Annotation of Medieval Charters
- Box2Poly: Memory-Efficient Polygon Prediction of Arbitrarily Shaped and Rotated Text
- SSPFusion: A Semantic Structure-Preserving Approach for Infrared and Visible Image Fusion
- Un-EVIMO: Unsupervised Event-Based Independent Motion Segmentation
- From Lab to Field: Real-World Evaluation of an AI-Driven Smart Video Solution to Enhance Community Safety
- PointDreamer: Zero-shot 3D Textured Mesh Reconstruction from Colored Point Cloud
- DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion
- OE3DIS: Open-Ended 3D Point Cloud Instance Segmentation
- 3DFacePolicy: Audio-Driven 3D Facial Animation Based on Action Control
- SynFER: Towards Boosting Facial Expression Recognition with Synthetic Data
- A Survey on All-in-One Image Restoration: Taxonomy, Evaluation and Future Trends
- REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents
- Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation
- PAD-F: Prior-Aware Debiasing Framework for Long-Tailed X-ray Prohibited Item Detection
- Deblur4DGS: 4D Gaussian Splatting from Blurry Monocular Video
- From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
- Memory Storyboard: Leveraging Temporal Segmentation for Streaming Self-Supervised Learning from Egocentric Videos
- Zero-shot Emotion Annotation in Facial Images Using Large Multimodal Models: Benchmarking and Prospects for Multi-Class, Multi-Frame Approaches
- TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation
- Achieving More with Less: Additive Prompt Tuning for Rehearsal-Free Class-Incremental Learning
- Triad: Empowering LMM-based Anomaly Detection with Vision Expert-guided Visual Tokenizer and Manufacturing Process
- SketchSplat: 3D Edge Reconstruction via Differentiable Multi-view Sketch Splatting
- Enhancing Wide-Angle Image Using Narrow-Angle View of the Same Scene
- Masked Autoencoder Self Pre-Training for Defect Detection in Microelectronics
- Learning to Harmonize Cross-vendor X-ray Images by Non-linear Image Dynamics Correction
- Minimal Sensing for Orienting a Solar Panel
- SPIE: Semantic and Structural Post-Training of Image Editing Diffusion Models with AI feedback
- LM-MCVT: A Lightweight Multi-modal Multi-view Convolutional-Vision Transformer Approach for 3D Object Recognition
- SoftHGNN: Soft Hypergraph Neural Networks for General Visual Recognition
- ViStoryBench: Comprehensive Benchmark Suite for Story Visualization
- Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval
- Multiple Stochastic Prompt Tuning for Few-shot Adaptation under Extreme Domain Shift
- HypeVPR: Exploring Hyperbolic Space for Perspective to Equirectangular Visual Place Recognition
- Investigating the Relationship between the Weighted Figure of Merit and Rosin's Measure
- MUG: Pseudo Labeling Augmented Audio-Visual Mamba Network for Audio-Visual Video Parsing
- See the Forest and the Trees: A Synergistic Reasoning Framework for Knowledge-Based Visual Question Answering
- A Data-driven Loss Weighting Scheme across Heterogeneous Tasks for Image Denoising
- Style transfer between Microscopy and Magnetic Resonance Imaging via Generative Adversarial Network in small sample size settings
- Vision-Based Adaptive Robotics for Autonomous Surface Crack Repair
- Zero-Shot Generalization of Vision-Based RL Without Data Augmentation
- Gotta Hear Them All: Towards Sound Source Aware Audio Generation
- UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI
- Automated Muscle and Fat Segmentation in Computed Tomography for Comprehensive Body Composition Analysis
- Multi-Keypoint Affordance Representation for Functional Dexterous Grasping
- Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference
- Euclid Quick Data Release (Q1). Active galactic nuclei identification using diffusion-based inpainting of Euclid VIS images
- PC-SRGAN: Physically Consistent Super-Resolution Generative Adversarial Network for General Transient Simulations
- A Fast Unsupervised Scheme for Polygonal Approximation
- When Imitation Learning Outperforms Reinforcement Learning in Surgical Action Planning
- LayLens: Improving Deepfake Understanding through Simplified Explanations
- Digital and Robotic Twinning for Validation of Proximity Operations and Formation Flying
- SinLlama - A Large Language Model for Sinhala
- OdysseyBench: Evaluating LLM Agents on Long-Horizon Complex Office Application Workflows
- Complex Logical Instruction Generation
- Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models
- Benchmarking Large Language Models for Geolocating Colonial Virginia Land Grants
- Doctor Sun: A Bilingual Multimodal Large Language Model for Biomedical AI
- Maximizing GPU Efficiency via Optimal Adapter Caching: An Analytical Approach for Multi-Tenant LLM Serving
- Exploring the Technical Knowledge Interaction of Global Digital Humanities: Three-decade Evidence from Bibliometric-based perspectives
- Bilevel MCTS for Amortized O(1) Node Selection in Classical Planning
- Re:Verse - Can Your VLM Read a Manga?
- Fine-grained Video Dubbing Duration Alignment with Segment Supervised Preference Optimization
- Adaptive Personalized Conversational Information Retrieval
- MiGrATe: Mixed-Policy GRPO for Adaptation at Test-Time
- M²LLM: Multi-view Molecular Representation Learning with Large Language Models
- MultiAiTutor: Child-Friendly Educational Multilingual Speech Generation Tutor with LLMs
- Designing Memory-Augmented AR Agents for Spatiotemporal Reasoning in Personalized Task Assistance
- A Dual-Axis Taxonomy of Knowledge Editing for LLMs: From Mechanisms to Functions
- Revealing the Role of Audio Channels in ASR Performance Degradation
- E3-Rewrite: Learning to Rewrite SQL for Executability, Equivalence, and Efficiency
- P/D-Device: Disaggregated Large Language Model between Cloud and Devices
- Quantifying Gender Biases Towards Politicians on Reddit
- Utilizing Large Language Models for Information Extraction from Real Estate Transactions
- From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models
- AdEval: Alignment-based Dynamic Evaluation to Mitigate Data Contamination in Large Language Models
- EvoP: Robust LLM Inference via Evolutionary Pruning
- Conformal Linguistic Calibration: Trading-off between Factuality and Specificity
- Evaluating Large Language Models for Automated Clinical Abstraction in Pulmonary Embolism Registries: Performance Across Model Sizes, Versions, and Parameters
- Opioid Named Entity Recognition (ONER-2025) from Reddit
- CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation
- ChatBench: From Static Benchmarks to Human-AI Evaluation
- Retrieval-Augmented Generation with Conflicting Evidence
- Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness
- Mind the Gap: Benchmarking LLM Uncertainty, Discrimination, and Calibration in Specialty-Aware Clinical QA
- Unsupervised Document and Template Clustering using Multimodal Embeddings
- AIOS: LLM Agent Operating System
- VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge
- Task Diversity Shortens the ICL Plateau
- A Risk Taxonomy and Reflection Tool for Large Language Model Adoption in Public Health
- Decoding-based Regression
- Sleepless Nights, Sugary Days: Creating Synthetic Users with Health Conditions for Realistic Coaching Agent Interactions
- OSMa-Bench: Evaluating Open Semantic Mapping Under Varying Lighting Conditions
- Adaptive Computation Pruning for the Forgetting Transformer
- Mitigating Hallucination in Large Vision-Language Models via Adaptive Attention Calibration
- CulturalFrames: Assessing Cultural Expectation Alignment in Text-to-Image Models and Evaluation Metrics
- Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes?
- Evaluation of State-of-the-Art Deep Learning Techniques for Plant Disease and Pest Detection
- ImageDDI: Image-enhanced Molecular Motif Sequence Representation for Drug-Drug Interaction Prediction
- Designing Object Detection Models for TinyML: Foundations, Comparative Analysis, Challenges, and Emerging Solutions
- Neural Tangent Knowledge Distillation for Optical Convolutional Networks
- MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling
- MuGa-VTON: Multi-Garment Virtual Try-On via Diffusion Transformers with Prompt Customization
- CObL: Toward Zero-Shot Ordinal Layering without User Prompting
- SharpXR: Structure-Aware Denoising for Pediatric Chest X-Rays
- VISOR: Visual Input-based Steering for Output Redirection in Vision-Language Models
- Training Kindai OCR with parallel textline images and self-attention feature distance-based loss
- Calibration Attention: Instance-wise Temperature Scaling for Vision Transformers
- Boosting Generic Semi-Supervised Medical Image Segmentation via Diverse Teaching and Label Propagation
- Unlocking the Potential of Diffusion Priors in Blind Face Restoration
- Think as Cardiac Sonographers: Marrying SAM with Left Ventricular Indicators Measurements According to Clinical Guidelines
- Superclass-Guided Representation Disentanglement for Spurious Correlation Mitigation
- RealisMotion: Decomposed Human Motion Control and Video Generation in the World Space
- DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding
- QueryCraft: Transformer-Guided Query Initialization for Enhanced Human-Object Interaction Detection
- Yan: Foundational Interactive Video Generation
- Transferable Model-agnostic Vision-Language Model Adaptation for Efficient Weak-to-Strong Generalization
- SelfHVD: Self-Supervised Handheld Video Deblurring for Mobile Phones
- Neural Artistic Style and Color Transfer Using Deep Learning
- Hierarchical Visual Prompt Learning for Continual Video Instance Segmentation
- AME: Aligned Manifold Entropy for Robust Vision-Language Distillation
- Unified and Semantically Grounded Domain Adaptation for Medical Image Segmentation
- Learning Generalizable and Efficient Image Watermarking via Hierarchical Two-Stage Optimization
- MMIF-AMIN: Adaptive Loss-Driven Multi-Scale Invertible Dense Network for Multimodal Medical Image Fusion
- PADReg: Physics-Aware Deformable Registration Guided by Contact Force for Ultrasound Sequences
- ROD: RGB-Only Fast and Efficient Off-road Freespace Detection
- Subjective and Objective Quality Assessment of Banding Artifacts on Compressed Videos
- SafeFix: Targeted Model Repair via Controlled Image Generation
- Adaptive Confidence-Wise Loss for Improved Lens Structure Segmentation in AS-OCT
- Bridging the Gap: A Framework for Real-World Video Deepfake Detection via Social Network Compression Emulation
- SHREC 2025: Retrieval of Optimal Objects for Multi-modal Enhanced Language and Spatial Assistance (ROOMELSA)
- DiffPose-Animal: A Language-Conditioned Diffusion Framework for Animal Pose Estimation
- Region-Adaptive Video Sharpening via Rate-Perception Optimization
- MonoPartNeRF: Human Reconstruction from Monocular Video via Part-Based Neural Radiance Fields
- Identity-Preserving Aging and De-Aging of Faces in the StyleGAN Latent Space
- Revisiting Efficient Semantic Segmentation: Learning Offsets for Better Spatial and Class Feature Alignment
- TARA: Token-Aware LoRA for Composable Personalization in Diffusion Models
- 3DFroMLLM: 3D Prototype Generation only from Pretrained Multimodal LLMs
- Argument Quality Annotation and Gender Bias Detection in Financial Communication through Large Language Models
- TurQUaz at CheckThat! 2025: Debating Large Language Models for Scientific Web Discourse Detection
- Heartificial Intelligence: Exploring Empathy in Language Models
- Real-time News Story Identification
- TT-XAI: Trustworthy Clinical Text Explanations via Keyword Distillation and LLM Reasoning
- Distilling Knowledge from Large Language Models: A Concept Bottleneck Model for Hate and Counter Speech Recognition
- MLLM-CBench: A Comprehensive Benchmark for Continual Instruction Tuning of Multimodal LLMs with Chain-of-Thought Reasoning Analysis
- Evaluating Contrast Localizer for Identifying Causal Units in Social & Mathematical Tasks in Language Models
- Objective Metrics for Evaluating Large Language Models Using External Data Sources
- MinionsLLM: a Task-adaptive Framework For The Training and Control of Multi-Agent Systems Through Natural Language
- The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs
- Sacred or Synthetic? Evaluating LLM Reliability and Abstention for Religious Questions
- Putnam-AXIOM: A Functional and Static Benchmark
- CoDAE: Adapting Large Language Models for Education via Chain-of-Thought Data Augmentation
- Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery
- Rethinking Tokenization for Rich Morphology: The Dominance of Unigram over BPE and Morphological Alignment
- Enhancing Small LLM Alignment through Margin-Based Objective Modifications under Resource Constraints
- Momentum Point-Perplexity Mechanics in Large Language Models
- Steerable Pluralism: Pluralistic Alignment via Few-Shot Comparative Regression
- DeCAL Tokenwise Compression
- DepressLLM: Interpretable domain-adapted language model for depression detection from real-world narratives
- Optimizing Retrieval-Augmented Generation (RAG) for Colloquial Cantonese: A LoRA-Based Systematic Review
- InternBootcamp Technical Report: Boosting LLM Reasoning with Verifiable Task Scaling
- Quick on the Uptake: Eliciting Implicit Intents from Human Demonstrations for Personalized Mobile-Use Agents
- LLaMA-Based Models for Aspect-Based Sentiment Analysis
- UWB at WASSA-2024 Shared Task 2: Cross-lingual Emotion Detection
- Prompt-Based Approach for Czech Sentiment Analysis
- LLM driven Text-to-Table Generation through Sub-Tasks Guidance and Iterative Refinement
- TopXGen: Topic-Diverse Parallel Data Generation for Low-Resource Machine Translation
- Out of the Box, into the Clinic? Evaluating State-of-the-Art ASR for Clinical Applications for Older Adults
- A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models
- IROTE: Human-like Traits Elicitation of Large Language Model via In-Context Self-Reflective Optimization
- Magical: Medical Lay Language Generation via Semantic Invariance and Layperson-tailored Adaptation
- SciRerankBench: Benchmarking Rerankers Towards Scientific Retrieval-Augmented Generated LLMs
- DevNous: An LLM-Based Multi-Agent System for Grounding IT Project Management in Unstructured Conversation
- Privacy-protected Retrieval-Augmented Generation for Knowledge Graph Question Answering
- Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments
- TiMoE: Time-Aware Mixture of Language Experts
- An Investigation of Robustness of LLMs in Mathematical Reasoning: Benchmarking with Mathematically-Equivalent Transformation of Advanced Mathematical Problems
- Steering Towards Fairness: Mitigating Political Bias in LLMs
- BiasGym: Fantastic Biases and How to Find (and Remove) Them
- Weakly Supervised Fine-grained Span-Level Framework for Chinese Radiology Report Quality Assurance
- Entangled in Representations: Mechanistic Investigation of Cultural Biases in Large Language Models
- ASPD: Unlocking Adaptive Serial-Parallel Decoding by Exploring Intrinsic Parallelism in LLMs
- Munsit at NADI 2025 Shared Task 2: Pushing the Boundaries of Multidialectal Arabic ASR with Weakly Supervised Pretraining and Continual Supervised Fine-tuning
- Reveal-Bangla: A Dataset for Cross-Lingual Multi-Step Reasoning Evaluation
- Train Long, Think Short: Curriculum Learning for Efficient Reasoning
- Jointly Generating and Attributing Answers using Logits of Document-Identifier Tokens
- Retrospective Sparse Attention for Efficient Long-Context Generation
- LyS at SemEval 2025 Task 8: Zero-Shot Code Generation for Tabular QA
- A Survey on Training-free Alignment of Large Language Models
- LLM-as-a-Supervisor: Mistaken Therapeutic Behaviors Trigger Targeted Supervisory Feedback
- MVISU-Bench: Benchmarking Mobile Agents for Real-World Tasks by Multi-App, Vague, Interactive, Single-App and Unethical Instructions
- READER: Retrieval-Assisted Drafter for Efficient LLM Inference
- CPO: Addressing Reward Ambiguity in Role-playing Dialogue via Comparative Policy Optimization
- Utilizing Multilingual Encoders to Improve Large Language Models for Low-Resource Languages
- Link Prediction for Event Logs in the Process Industry
- AutoCodeBench: Large Language Models are Automatic Code Benchmark Generators
Research Sources: 463 | Generated: 8/25/2025