2025-06-30

Title: APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization

Title: TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation

Title: $\textrm{ODE}_t \left(\textrm{ODE}_l \right)$: Shortcutting the Time and Length in Diffusion and Flow Models for Faster Sampling

Title: Elucidating and Endowing the Diffusion Training Paradigm for General Image Restoration

Title: Exploring Image Generation via Mutually Exclusive Probability Spaces and Local Correlation Hypothesis

Title: M3PO: Massively Multi-Task Model-Based Policy Optimization

Title: CAT-SG: A Large Dynamic Scene Graph Dataset for Fine-Grained Understanding of Cataract Surgery

Title: TaleForge: Interactive Multimodal System for Personalized Story Creation

Title: GenEscape: Hierarchical Multi-Agent Generation of Escape Room Puzzles

Title: Generating Attribute-Aware Human Motions from Textual Prompt

Title: Quality Assessment and Distortion-aware Saliency Prediction for AI-Generated Omnidirectional Images

Title: Physics-informed network paradigm with data generation and background noise removal for diverse distributed acoustic sensing applications

Title: Optimal Return-to-Go Guided Decision Transformer for Auto-Bidding in Advertisement

Title: SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model

Title: RoboEnvision: A Long-Horizon Video Generation Model for Multi-Task Robot Manipulation

Title: Few-Shot Identity Adaptation for 3D Talking Heads via Global Gaussian Field

Title: MirrorMe: Towards Realtime and High Fidelity Audio-Driven Halfbody Animation

Title: EAMamba: Efficient All-Around Vision State Space Model for Image Restoration

Title: COOCO -- Common Objects Out-of-Context -- Semantic Violation in Scenes: Investigating Multimodal Context in Referential Communication

Title: RoomCraft: Controllable and Complete 3D Indoor Scene Generation

Title: Unfolding Generative Flows with Koopman Operators: Fast and Interpretable Sampling

Title: A Deep Learning framework for building damage assessment using VHR SAR and geospatial data: demonstration on the 2023 Turkiye Earthquake

Title: Sheaf-Based Decentralized Multimodal Learning for Next-Generation Wireless Communication Systems

Title: Can Video Large Multimodal Models Think Like Doubters-or Double-Down: A Study on Defeasible Video Entailment

Title: Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy