2025-05-27

Title: Model-Distributed Inference for Large Language Models at the Edge

Title: Emotion Knowledge Enhancement for Vision Large Language Models: A Self-Verification Approach for High-Quality Emotion Instruction Data Generation

Title: Riemannian Flow Matching for Brain Connectivity Matrices via Pullback Geometry

Title: Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality

Title: Follow the Energy, Find the Path: Riemannian Metrics from Energy-Based Models

Title: Decomposition of Water Demand Patterns Using Skewed Gaussian Distributions for Behavioral Insights and Operational Planning

Title: CONCORD: Concept-Informed Diffusion for Dataset Distillation

Title: Applications of Modular Co-Design for De Novo 3D Molecule Generation

Title: Taming Diffusion for Dataset Distillation with High Representativeness

Title: Rehabilitation Exercise Quality Assessment and Feedback Generation Using Large Language Models with Prompt Engineering

Title: TNG-CLIP: Training-Time Negation Data Generation for Negation Awareness of CLIP

Title: Performance and Generalizability Impacts of Incorporating Geolocation into Deep Learning for Dynamic PM2.5 Estimation

Title: HonestFace: Towards Honest Face Restoration with One-Step Diffusion Model

Title: Syn3DTxt: Embedding 3D Cues for Scene Text Generation

Title: The Prompt is Mightier than the Example

Title: Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking

Title: Focus on What Matters: Enhancing Medical Vision-Language Models with Automatic Attention Alignment Tuning

Title: Improved Immiscible Diffusion: Accelerate Diffusion Training by Reducing Its Miscibility

Title: Joint-stochastic-approximation Autoencoders with Application to Semi-supervised Learning

Title: On Denoising Walking Videos for Gait Recognition

Title: EvdCLIP: Improving Vision-Language Retrieval with Entity Visual Descriptions from Large Language Models

Title: Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment

Title: Rethinking Causal Mask Attention for Vision-Language Inference

Title: Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter

Title: Flow Matching for Geometric Trajectory Simulation

Title: SuperGS: Consistent and Detailed 3D Super-Resolution Scene Reconstruction via Gaussian Splatting

Title: ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos

Title: So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection

Title: DVD-Quant: Data-free Video Diffusion Transformers Quantization

Title: ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation

Title: Does Representation Intervention Really Identify Desired Concepts and Elicit Alignment?

Title: Restoring Real-World Images with an Internal Detail Enhancement Diffusion Model

Title: Manifold-aware Representation Learning for Degradation-agnostic Image Restoration

Title: Align Beyond Prompts: Evaluating World Knowledge Alignment in Text-to-Image Generation

Title: Smart Energy Guardian: A Hybrid Deep Learning Model for Detecting Fraudulent PV Generation

Title: GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning

Title: Multiple Wasserstein Gradient Descent Algorithm for Multi-Objective Distributional Optimization

Title: StyleGuard: Preventing Text-to-Image-Model-based Style Mimicry Attacks by Style Perturbations

Title: Dual-Path Stable Soft Prompt Generation for Domain Generalization

Title: OmniGenBench: A Benchmark for Omnipotent Multimodal Generation across 50+ Tasks

Title: HD-PiSSA: High-Rank Distributed Orthogonal Adaptation

Title: Think Twice before Adaptation: Improving Adaptability of DeepFake Detection via Online Test-Time Adaptation

Title: VORTA: Efficient Video Diffusion via Routing Sparse Attention

Title: How to build a consistency model: Learning flow maps via self-distillation

Title: Localizing Knowledge in Diffusion Transformers

Title: Eye-See-You: Reverse Pass-Through VR and Head Avatars

Title: Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

Title: REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing

Title: SD-OVON: A Semantics-aware Dataset and Benchmark Generation Pipeline for Open-Vocabulary Object Navigation in Dynamic Scenes

Title: Partition Generative Modeling: Masked Modeling Without Masks

Title: PromptWise: Online Learning for Cost-Aware Prompt Assignment in Generative Models

Title: Hybrid Neural-MPM for Interactive Fluid Simulations in Real-Time

Title: How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation

Title: CDPDNet: Integrating Text Guidance with Hybrid Vision Encoders for Medical Image Segmentation

Title: MGD$^3$: Mode-Guided Dataset Distillation using Diffusion Models

Title: Protein Design with Dynamic Protein Vocabulary

Title: GhostPrompt: Jailbreaking Text-to-image Generative Models based on Dynamic Optimization

Title: STRICT: Stress Test of Rendering Images Containing Text

Title: NTIRE 2025 Challenge on Video Quality Enhancement for Video Conferencing: Datasets, Methods and Results

Title: Training-free Stylized Text-to-Image Generation with Fast Inference

Title: MMP-2K: A Benchmark Multi-Labeled Macro Photography Image Quality Assessment Database

Title: Towards Generalized Proactive Defense against Face Swapping with Contour-Hybrid Watermark

Title: Jodi: Unification of Visual Generation and Understanding via Joint Modeling

Title: Plug-and-Play Context Feature Reuse for Efficient Masked Generation

Title: CreatiDesign: A Unified Multi-Conditional Diffusion Transformer for Creative Graphic Design

Title: Freqformer: Image-Demoiréing Transformer via Efficient Frequency Decomposition

Title: Exploring Magnitude Preservation and Rotation Modulation in Diffusion Transformers

Title: Benchmarking Laparoscopic Surgical Image Restoration and Beyond

Title: Towards Understanding the Mechanisms of Classifier-Free Guidance

Title: RAISE: Realness Assessment for Image Synthesis and Evaluation

Title: DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving

Title: ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment

Title: Enhancing Text-to-Image Diffusion Transformer via Split-Text Conditioning

Title: Hypercube-RAG: Hypercube-Based Retrieval-Augmented Generation for In-domain Scientific Question-Answering

Title: TextDiffuser-RL: Efficient and Robust Text Layout Optimization for High-Fidelity Text-to-Image Synthesis

Title: Alchemist: Turning Public Text-to-Image Data into Generative Gold

Title: Concept Reachability in Diffusion Models: Beyond Dataset Constraints

Title: Likert or Not: LLM Absolute Relevance Judgments on Fine-Grained Ordinal Scales

Title: Beyond Editing Pairs: Fine-Grained Instructional Image Editing via Multi-Scale Learnable Regions

Title: Absolute Coordinates Make Motion Generation Easy

Title: Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals

Title: Erasing Concepts, Steering Generations: A Comprehensive Survey of Concept Suppression

Title: MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models

Title: LlamaSeg: Image Segmentation via Autoregressive Mask Generation

Title: Structure Disruption: Subverting Malicious Diffusion-Based Inpainting via Self-Attention Query Perturbation

Title: Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression

Title: Your Classifier Can Do More: Towards Bridging the Gaps in Classification, Robustness, and Generation

Title: Diversity-Driven Generative Dataset Distillation Based on Diffusion Model with Self-Adaptive Memory

Title: Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs

Title: The Role of Video Generation in Enhancing Data-Limited Action Understanding

Title: Enhancing Visual Reliance in Text Generation: A Bayesian Perspective on Mitigating Hallucination in Large Vision-Language Models

Title: DOGe: Defensive Output Generation for LLM Protection Against Knowledge Distillation

Title: Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models

Title: Toward Patient-specific Partial Point Cloud to Surface Completion for Pre- to Intra-operative Registration in Image-guided Liver Interventions

Title: Regularized Personalization of Text-to-Image Diffusion Models without Distributional Drift

Title: Applications and Effect Evaluation of Generative Adversarial Networks in Semi-Supervised Learning

Title: TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs

Title: On scalable and efficient training of diffusion samplers

Title: Aggregated Structural Representation with Large Language Models for Human-Centric Layout Generation

Title: What You Perceive Is What You Conceive: A Cognition-Inspired Framework for Open Vocabulary Image Segmentation

Title: VTBench: Comprehensive Benchmark Suite Towards Real-World Virtual Try-on Models

Title: Guard Me If You Know Me: Protecting Specific Face-Identity from Deepfakes

Title: Learning to Reason without External Rewards

Title: Preference Optimization by Estimating the Ratio of the Data Distribution

Title: SESaMo: Symmetry-Enforcing Stochastic Modulation for Normalizing Flows

Title: HF-VTON: High-Fidelity Virtual Try-On via Consistent Geometric and Semantic Alignment

Title: Energy-based generator matching: A neural sampler for general state space

Title: ReDDiT: Rehashing Noise for Discrete Visual Generation

Title: Burst Image Super-Resolution via Multi-Cross Attention Encoding and Multi-Scan State-Space Decoding

Title: Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling

Title: Graph Guided Diffusion: Unified Guidance for Conditional Graph Generation

Title: DriveCamSim: Generalizable Camera Simulation via Explicit Camera Modeling for Autonomous Driving

Title: Modeling Beyond MOS: Quality Assessment Models Must Integrate Context, Reasoning, and Multimodality

Title: Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments

Title: On the Relation between Rectified Flows and Optimal Transport

Title: HAODiff: Human-Aware One-Step Diffusion via Dual-Prompt Guidance

Title: Improving Heart Rejection Detection in XPCI Images Using Synthetic Data Augmentation

Title: Discrete Markov Bridge

Title: The Missing Point in Vision Transformers for Universal Image Segmentation

Title: A Regularization-Guided Equivariant Approach for Image Restoration

Title: Zero-Shot Pseudo Labels Generation Using SAM and CLIP for Semi-Supervised Semantic Segmentation

Title: A Unified Solution to Video Fusion: From Multi-Frame Learning to Benchmarking

Title: Deep Active Inference Agents for Delayed and Long-Horizon Environments

Title: Harnessing the Power of Training-Free Techniques in Text-to-2D Generation for Text-to-3D Generation via Score Distillation Sampling

Title: Deep Spectral Prior

Title: StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation

Title: Dynamic-I2V: Exploring Image-to-Video Generation Models via Multimodal LLM

Title: Attention! Your Vision Language Model Could Be Maliciously Manipulated

Title: An Explainable Diagnostic Framework for Neurodegenerative Dementias via Reinforcement-Optimized LLM Reasoning

Title: MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research

Title: UltraVSR: Achieving Ultra-Realistic Video Super-Resolution with Efficient One-Step Diffusion Space

Title: Learning to Select In-Context Demonstration Preferred by Large Language Model

Title: PHI: Bridging Domain Shift in Long-Term Action Quality Assessment via Progressive Hierarchical Instruction

Title: Rethinking Probabilistic Circuit Parameter Learning

Title: NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID

Title: TabPFN: One Model to Rule Them All?

Title: Gradient Inversion Transcript: Leveraging Robust Generative Priors to Reconstruct Training Data from Gradient Leakage

Title: Data-Free Class-Incremental Gesture Recognition with Prototype-Guided Pseudo Feature Replay

Title: Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion

Title: PAMD: Plausibility-Aware Motion Diffusion Model for Long Dance Generation

Title: From Data to Modeling: Fully Open-vocabulary Scene Graph Generation

Title: Refining Few-Step Text-to-Multiview Diffusion via Reinforcement Learning

Title: Proxy-Free GFlowNet

Title: MEBench: A Novel Benchmark for Understanding Mutual Exclusivity Bias in Vision-Language Models

Title: Understanding Generalization in Diffusion Models via Probability Flow Distance

Title: Agentic 3D Scene Generation with Spatially Contextualized VLMs

Title: FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities

Title: Hard Negative Contrastive Learning for Fine-Grained Geometric Understanding in Large Multimodal Models

Title: Fine-grained List-wise Alignment for Generative Medication Recommendation

Title: Multimodal Federated Learning With Missing Modalities through Feature Imputation Network

Title: AniCrafter: Customizing Realistic Human-Centric Animation via Avatar-Background Conditioning in Video Diffusion Models

Title: In-Context Brush: Zero-shot Customized Subject Insertion with Context-Aware Latent Space Manipulation

Title: ImgEdit: A Unified Image Editing Dataset and Benchmark

Title: MotionPro: A Precise Motion Controller for Image-to-Video Generation

Title: Hierarchical Masked Autoregressive Models with Low-Resolution Token Pivots

Title: Visualized Text-to-Image Retrieval

Title: OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation

Title: DiSA: Diffusion Step Annealing in Autoregressive Image Generation