2025-06-13

Title: Multimodal Cinematic Video Synthesis Using Text-to-Image and Audio Generation Models

Title: Optimizing Latent Dimension Allocation in Hierarchical VAEs: Balancing Attenuation and Information Retention for OOD Detection

Title: NnD: Diffusion-based Generation of Physically-Nonnegative Objects

Title: ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs

Title: The 2025 PNPL Competition: Speech Detection and Phoneme Classification in the LibriBrain Dataset

Title: SPARKE: Scalable Prompt-Aware Diversity Guidance in Diffusion Models via RKE Score

Title: Retrieval of Surface Solar Radiation through Implicit Albedo Recovery from Temporal Context

Title: Geometric Regularity in Deterministic Sampling of Diffusion-based Generative Models

Title: Scalable Non-Equivariant 3D Molecule Generation via Rotational Alignment

Title: LaMAGIC2: Advanced Circuit Formulations for Language Model-Based Analog Topology Generation

Title: HalLoc: Token-level Localization of Hallucinations for Vision Language Models

Title: Towards Scalable SOAP Note Generation: A Weakly Supervised Multimodal Framework

Title: Research on Audio-Visual Quality Assessment Dataset and Method for User-Generated Omnidirectional Video

Title: GeoCAD: Local Geometry-Controllable CAD Generation

Title: UrbanSense: A Framework for Quantitative Analysis of Urban Streetscapes leveraging Vision Large Language Models

Title: PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation

Title: Motion-R1: Chain-of-Thought Reasoning and Reinforcement Learning for Human Motion Generation

Title: Can We Infer Confidential Properties of Training Data from LLMs?

Title: EQA-RM: A Generative Embodied Reward Model with Test-time Scaling

Title: ReconMOST: Multi-Layer Sea Temperature Reconstruction with Observations-Guided Diffusion

Title: Pisces: An Auto-regressive Foundation Model for Image Understanding and Generation

Title: Time To Impeach LLM-as-a-Judge: Programs are the Future of Evaluation

Title: Generative Algorithms for Wildfire Progression Reconstruction from Multi-Modal Satellite Active Fire Measurements and Terrain Height

Title: Rethinking Generative Human Video Coding with Implicit Motion Transformation

Title: LLMs Are Not Yet Ready for Deepfake Image Detection

Title: Equivariant Neural Diffusion for Molecule Generation

Title: DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers

Title: Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration

Title: DanceChat: Large Language Model-Guided Music-to-Dance Generation

Title: Text to Image for Multi-Label Image Recognition with Joint Prompt-Adapter Learning

Title: Harmonizing Geometry and Uncertainty: Diffusion with Hyperspheres

Title: High-resolution efficient image generation from WiFi CSI using a pretrained latent diffusion model

Title: Hessian Geometry of Latent Space in Generative Models

Title: Symmetrical Flow Matching: Unified Image Generation, Segmentation, and Classification with Score-Based Generative Models

Title: GigaVideo-1: Advancing Video Generation via Automatic Feedback with 4 GPU-Hours Fine-Tuning

Title: Uncertainty-Masked Bernoulli Diffusion for Camouflaged Object Detection Refinement

Title: IQE-CLIP: Instance-aware Query Embedding for Zero-/Few-shot Anomaly Detection in Medical Domain

Title: PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework

Title: Neural at ArchEHR-QA 2025: Agentic Prompt Optimization for Evidence-Grounded Clinical Question Answering

Title: Stroke-based Cyclic Amplifier: Image Super-Resolution at Arbitrary Ultra-Large Scales

Title: Dense Associative Memory with Epanechnikov Energy

Title: CreatiPoster: Towards Editable and Controllable Multi-Layer Graphic Design Generation

Title: The Diffusion Duality

Title: AIR: Zero-shot Generative Model Adaptation with Iterative Refinement

Title: M4V: Multi-Modal Mamba for Text-to-Video Generation

Title: VINCIE: Unlocking In-context Image Editing from Video

Title: Self-Adapting Language Models

Title: Execution Guided Line-by-Line Code Generation

Title: ReGuidance: A Simple Diffusion Wrapper for Boosting Sample Quality on Hard Inverse Problems

Title: SpectralAR: Spectral Autoregressive Visual Generation

Title: MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning

Title: GenWorld: Towards Detecting AI-generated Real-world Simulation Videos

Title: Fine-Grained Perturbation Guidance via Attention Head Selection

Title: SceneCompleter: Dense 3D Scene Completion for Generative Novel View Synthesis