2025-05-06

Title: Multi-party Collaborative Attention Control for Image Customization

Title: Deconstructing Bias: A Multifaceted Framework for Diagnosing Cultural and Compositional Inequities in Text-to-Image Generative Models

Title: Global Stress Generation and Spatiotemporal Super-Resolution Physics-Informed Operator under Dynamic Loading for Two-Phase Random Materials

Title: OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models

Title: Towards Film-Making Production Dialogue, Narration, Monologue Adaptive Moving Dubbing Benchmarks

Title: VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos

Title: WorldGenBench: A World-Knowledge-Integrated Benchmark for Reasoning-Driven Text-to-Image Generation

Title: A Sensor Agnostic Domain Generalization Framework for Leveraging Geospatial Foundation Models: Enhancing Semantic Segmentation via Synergistic Pseudo-Labeling and Generative Learning

Title: Automated ARAT Scoring Using Multimodal Video Analysis, Multi-View Fusion, and Hierarchical Bayesian Models: A Clinician Study

Title: Knowledge-Augmented Language Models Interpreting Structured Chest X-Ray Findings

Title: PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth

Title: Co$^{3}$Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion

Title: Enhancing the Learning Experience: Using Vision-Language Models to Generate Questions for Educational Videos

Title: AquaGS: Fast Underwater Scene Reconstruction with SfM-Free Gaussian Splatting

Title: Efficient 3D Full-Body Motion Generation from Sparse Tracking Inputs with Temporal Windows

Title: Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning

Title: PhytoSynth: Leveraging Multi-modal Generative Models for Crop Disease Data Generation with Novel Benchmarking and Prompt Engineering Approach

Title: DualDiff: Dual-branch Diffusion Model for Autonomous Driving with Semantic Fusion

Title: Rethinking Score Distilling Sampling for 3D Editing and Generation

Title: OODTE: A Differential Testing Engine for the ONNX Optimizer

Title: LookAlike: Consistent Distractor Generation in Math MCQs

Title: BOOM: Benchmarking Out-Of-distribution Molecular Property Predictions of Machine Learning Models

Title: HybridGS: High-Efficiency Gaussian Splatting Data Compression using Dual-Channel Sparse Representation and Point Cloud Encoder

Title: Semantic Probabilistic Control of Language Models

Title: Secrets of GFlowNets' Learning Behavior: A Theoretical Study

Title: Regression is all you need for medical image translation

Title: SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations

Title: Unaligned RGB Guided Hyperspectral Image Super-Resolution with Spatial-Spectral Concordance

Title: HiLLIE: Human-in-the-Loop Training for Low-Light Image Enhancement

Title: Small Clips, Big Gains: Learning Long-Range Refocused Temporal Information for Video Super-Resolution

Title: Robust AI-Generated Face Detection with Imbalanced Data

Title: DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization

Title: Improving Physical Object State Representation in Text-to-Image Generative Systems

Title: Federated Causal Inference in Healthcare: Methods, Challenges, and Applications

Title: Quantizing Diffusion Models from a Sampling-Aware Perspective

Title: Enhancing AI Face Realism: Cost-Efficient Quality Improvement in Distilled Diffusion Models with a Fully Synthetic Dataset

Title: Entropy-Guided Sampling of Flat Modes in Discrete Spaces

Title: SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing

Title: T2S: High-resolution Time Series Generation with Text-to-Series Diffusion Models

Title: FairPO: Robust Preference Optimization for Fair Multi-Label Learning

Title: Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction

Title: Corr2Distrib: Making Ambiguous Correspondences an Ally to Predict Reliable 6D Pose Distributions

Title: Text to Image Generation and Editing: A Survey

Title: Bielik v3 Small: Technical Report

Title: Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

Title: MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation

Title: Sim2Real in endoscopy segmentation with a novel structure aware image translation

Title: A Note on Statistically Accurate Tabular Data Generation Using Large Language Models

Title: Cooperative Bayesian and variance networks disentangle aleatoric and epistemic uncertainties

Title: Advances in Automated Fetal Brain MRI Segmentation and Biometry: Insights from the FeTA 2024 Challenge

Title: Towards Dataset Copyright Evasion Attack against Personalized Text-to-Image Diffusion Models

Title: AOR: Anatomical Ontology-Guided Reasoning for Medical Large Multimodal Model in Chest X-Ray Interpretation

Title: No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves

Title: Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation