2025-03-19

Title: Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection

Title: Context-aware Multimodal AI Reveals Hidden Pathways in Five Centuries of Art Evolution

Title: Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception

Title: FiVE: A Fine-grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models

Title: SED-MVS: Segmentation-Driven and Edge-Aligned Deformation Multi-View Stereo with Depth Restoration and Occlusion Constraint

Title: Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition

Title: TextInVision: Text and Prompt Complexity Driven Visual Text Generation Benchmark

Title: C2D-ISR: Optimizing Attention-based Image Super-resolution from Continuous to Discrete Scales

Title: FedVSR: Towards Model-Agnostic Federated Learning in Video Super-Resolution

Title: Continual Unlearning for Foundational Text-to-Image Models without Generalization Erosion

Title: LED: LLM Enhanced Open-Vocabulary Object Detection without Human Curated Data Generation

Title: FusDreamer: Label-efficient Remote Sensing World Model for Multimodal Data Classification

Title: MOSAIC: Generating Consistent, Privacy-Preserving Scenes from Multiple Depth Views in Multi-Room Environments

Title: Scale-Aware Contrastive Reverse Distillation for Unsupervised Medical Anomaly Detection

Title: SALAD: Skeleton-aware Latent Diffusion for Text-driven Motion Generation and Editing

Title: Less is More: Improving Motion Diffusion Models with Sparse Keyframes

Title: RAD: Retrieval-Augmented Decision-Making of Meta-Actions with Vision-Language Models in Autonomous Driving

Title: MoK-RAG: Mixture of Knowledge Paths Enhanced Retrieval-Augmented Generation for Embodied AI Environments

Title: Where do Large Vision-Language Models Look at when Answering Questions?

Title: ChatBEV: A Visual Language Model that Understands BEV Maps

Title: Make the Most of Everything: Further Considerations on Disrupting Diffusion-based Customization

Title: Conformal Prediction and MLLM aided Uncertainty Quantification in Scene Graph Generation

Title: Light4GS: Lightweight Compact 4D Gaussian Splatting Generation via Context Model

Title: SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model

Title: DIFFVSGG: Diffusion-Driven Online Video Scene Graph Generation

Title: MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding

Title: DefectFill: Realistic Defect Generation with Inpainting Diffusion Model for Visual Inspection

Title: MeshFleet: Filtered and Annotated 3D Vehicle Dataset for Domain Specific Generative Modeling

Title: Boosting Semi-Supervised Medical Image Segmentation via Masked Image Consistency and Discrepancy Learning

Title: Intra and Inter Parser-Prompted Transformers for Effective Image Restoration

Title: AIGVE-Tool: AI-Generated Video Evaluation Toolkit with Multifaceted Benchmark

Title: Fast Autoregressive Video Generation with Diagonal Decoding

Title: Growing a Twig to Accelerate Large Vision-Language Models

Title: Theoretical Foundation of Flow-Based Time Series Generation: Provable Approximation, Generalization, and Efficiency

Title: Towards properties of adversarial image perturbations

Title: Condensing Action Segmentation Datasets via Generative Network Inversion

Title: Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding

Title: Concat-ID: Towards Universal Identity-Preserving Video Synthesis

Title: Speculative Decoding for Verilog: Speed and Quality, All in One

Title: RBFIM: Perceptual Quality Assessment for Compressed Point Clouds Using Radial Basis Function Interpolation

Title: Decision Tree Induction Through LLMs via Semantically-Aware Evolution

Title: Segmentation-Guided Neural Radiance Fields for Novel Street View Synthesis

Title: Quantization-Free Autoregressive Action Transformer

Title: CTSR: Controllable Fidelity-Realness Trade-off Distillation for Real-World Image Super Resolution

Title: Free-Lunch Color-Texture Disentanglement for Stylized Image Generation

Title: Towards synthetic generation of realistic wooden logs

Title: Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs

Title: PC-Talk: Precise Facial Animation Control for Audio-Driven Talking Face Generation

Title: DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies

Title: LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models

Title: EvolvingGrasp: Evolutionary Grasp Generation via Efficient Preference Alignment

Title: Revealing higher-order neural representations with generative artificial intelligence

Title: PENCIL: Long Thoughts with Short Memory

Title: VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation

Title: RFMI: Estimating Mutual Information on Rectified Flow for Text-to-Image Alignment

Title: Impossible Videos

Title: Diffusion-based Facial Aesthetics Enhancement with 3D Structure Guidance

Title: MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation

Title: LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers

Title: Graph-CNNs for RF Imaging: Learning the Electric Field Integral Equations

Title: Bolt3D: Generating 3D Scenes in Seconds

Title: SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model

Title: Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM

Title: ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing

Title: DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers

Title: Stable Virtual Camera: Generative View Synthesis with Diffusion Models

Title: Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control

Title: Deeply Supervised Flow-Based Generative Models

Title: Advances in 4D Generation: A Survey

Title: The Power of Context: How Multimodality Improves Image Super-Resolution

Title: MusicInfuser: Making Video Diffusion Listen and Dance