2025-03-27

Title: Reverse Prompt: Cracking the Recipe Inside Text-to-Image Generation

Title: Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals

Title: The Coralscapes Dataset: Semantic Scene Understanding in Coral Reefs

Title: Can Multi-modal (reasoning) LLMs work as deepfake detectors?

Title: Look Before Leap: Look-Ahead Planning with Uncertainty in Reinforcement Learning

Title: AIGC-assisted Federated Learning for Edge Intelligence: Architecture Design, Research Challenges and Future Directions

Title: Guiding Human-Object Interactions with Rich Geometry and Relations

Title: Devil is in the Uniformity: Exploring Diverse Learners within Transformer for Image Restoration

Title: Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector

Title: Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models

Title: Video Motion Graphs

Title: DINeMo: Learning Neural Mesh Models with no 3D Annotations

Title: Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models

Title: ViLBench: A Suite for Vision-Language Process Reward Modeling

Title: RelTriple: Learning Plausible Indoor Layouts by Integrating Relationship Triples into the Diffusion Process

Title: Traversing Distortion-Perception Tradeoff using a Single Score-Based Generative Model

Title: Wan: Open and Advanced Large-Scale Video Generative Models

Title: Progressive Focused Transformer for Single Image Super-Resolution

Title: Consistency Trajectory Matching for One-Step Generative Super-Resolution

Title: FastFT: Accelerating Reinforced Feature Transformation via Advanced Exploration Strategies

Title: Active Data Sampling and Generation for Bias Remediation

Title: Latent Beam Diffusion Models for Decoding Image Sequences

Title: Dissecting and Mitigating Diffusion Bias via Mechanistic Interpretability

Title: VPO: Aligning Text-to-Video Generation Models with Prompt Optimization

Title: Towards Efficient and General-Purpose Few-Shot Misclassification Detection for Vision-Language Models

Title: Small Object Detection: A Comprehensive Survey on Challenges, Techniques and Real-World Applications

Title: MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation

Title: GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving

Title: TD-BFR: Truncated Diffusion Model for Efficient Blind Face Restoration

Title: Diffusion Counterfactuals for Image Regressors

Title: MMGen: Unified Multi-modal Image Generation and Understanding in One Go

Title: Imitating Radiological Scrolling: A Global-Local Attention Model for 3D Chest CT Volumes Multi-Label Anomaly Classification

Title: AccidentSim: Generating Physically Realistic Vehicle Collision Videos from Real-World Accident Reports

Title: ARMO: Autoregressive Rigging for Multi-Category Objects

Title: BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation

Title: Mitigating Low-Level Visual Hallucinations Requires Self-Awareness: Database, Model and Training Strategy

Title: GLRD: Global-Local Collaborative Reason and Debate with PSL for 3D Open-Vocabulary Detection

Title: Flip Learning: Weakly Supervised Erase to Segment Nodules in Breast Ultrasound

Title: Semi-supervised Node Importance Estimation with Informative Distribution Modeling for Uncertainty Regularization

Title: Learning Straight Flows by Learning Curved Interpolants

Title: RecTable: Fast Modeling Tabular Data with Rectified Flow

Title: High Quality Diffusion Distillation on a Single GPU with Relative and Absolute Position Matching

Title: MindfulLIME: A Stable Solution for Explanations of Machine Learning Models with Enhanced Localization Precision -- A Medical Image Case Study

Title: Reliable algorithm selection for machine learning-guided design

Title: Disentangled Source-Free Personalization for Facial Expression Recognition with Neutral Target Data

Title: FB-4D: Spatial-Temporal Coherent Dynamic 3D Content Generation with Feature Banks

Title: Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency