2025-04-17

Title: Flux Already Knows - Activating Subject-Driven Image Generation without Training

Title: 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float

Title: Co-STAR: Collaborative Curriculum Self-Training with Adaptive Regularization for Source-Free Video Domain Adaptation

Title: Can GPT tell us why these images are synthesized? Empowering Multimodal Large Language Models for Forensics

Title: Learning What NOT to Count

Title: Towards Safe Synthetic Image Generation On the Web: A Multimodal Robust NSFW Defense and Million Scale Dataset

Title: Adjoint Sampling: Highly Scalable Diffusion Samplers via Adjoint Matching

Title: EgoExo-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos

Title: DVLTA-VQA: Decoupled Vision-Language Modeling with Text-Guided Adaptation for Blind Video Quality Assessment

Title: The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation

Title: SkeletonX: Data-Efficient Skeleton-based Action Recognition via Cross-sample Feature Aggregation

Title: GrabS: Generative Embodied Agent for 3D Object Segmentation without Scene Supervision

Title: DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation

Title: Real-World Depth Recovery via Structure Uncertainty Modeling and Inaccurate GT Depth Fitting

Title: A Visual RAG Pipeline for Few-Shot Fine-Grained Product Classification

Title: ACE: Attentional Concept Erasure in Diffusion Models

Title: Synthetic Data for Blood Vessel Network Extraction

Title: AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection

Title: Beyond Words: Augmenting Discriminative Richness via Diffusions in Unsupervised Prompt Learning

Title: R-Meshfusion: Reinforcement Learning Powered Sparse-View Mesh Reconstruction with Diffusion Priors

Title: Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions

Title: Instruction-augmented Multimodal Alignment for Image-Text and Element Matching

Title: Modular-Cam: Modular Dynamic Camera-view Video Generation with LLM

Title: Generative Deep Learning Framework for Inverse Design of Fuels

Title: Generalized Visual Relation Detection with Diffusion Models

Title: Anti-Aesthetics: Protecting Facial Privacy against Customized Text-to-Image Synthesis

Title: Coding-Prior Guided Diffusion Network for Video Deblurring

Title: Cobra: Efficient Line Art COlorization with BRoAder References

Title: SIDME: Self-supervised Image Demoiréing via Masked Encoder-Decoder Reconstruction

Title: FLIP Reasoning Challenge

Title: VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate