2025-08-12

Title: MILD: Multi-Layer Diffusion Strategy for Complex and Precise Multi-IP Aware Human Erasing

Title: Statistical Confidence Rescoring for Robust 3D Scene Graph Generation from Multi-View Images

Title: Slice or the Whole Pie? Utility Control for AI Models

Title: GFlowNets for Learning Better Drug-Drug Interaction Representations

Title: Generative Artificial Intelligence Extracts Structure-Function Relationships from Plants for New Materials

Title: Local Diffusion Models and Phases of Data Distributions

Title: CycleDiff: Cycle Diffusion Models for Unpaired Image-to-image Translation

Title: Using Imperfect Synthetic Data in Downstream Inference Tasks

Title: Privacy-Preserving Tabular Synthetic Data Generation Using TabularARGN

Title: Towards Robust Red-Green Watermarking for Autoregressive Image Generators

Title: Fourier Optics and Deep Learning Methods for Fast 3D Reconstruction in Digital Holography

Title: Restage4D: Reanimating Deformable 3D Reconstruction from a Single Video

Title: PANAMA: A Network-Aware MARL Framework for Multi-Agent Path Finding in Digital Twin Ecosystems

Title: Offline-to-Online Reinforcement Learning with Classifier-Free Diffusion Generation

Title: AGIC: Attention-Guided Image Captioning to Improve Caption Relevance

Title: Advancements in Chinese font generation since deep learning era: A survey

Title: MultiRef: Controllable Image Generation with Multiple Visual References

Title: QuiZSF: An efficient data-model interaction framework for zero-shot time-series forecasting

Title: Talk2Image: A Multi-Agent System for Multi-Turn Image Generation and Editing

Title: AR-GRPO: Training Autoregressive Image Generation Models via Reinforcement Learning

Title: CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing

Title: Discovery Learning accelerates battery design evaluation

Title: TADoc: Robust Time-Aware Document Image Dewarping

Title: S2-UniSeg: Fast Universal Agglomerative Pooling for Scalable Segment Anything without Supervision

Title: Spatio-Temporal Conditional Diffusion Models for Forecasting Future Multiple Sclerosis Lesion Masks Conditioned on Treatments

Title: HiMat: DiT-based Ultra-High Resolution SVBRDF Generation

Title: DocRefine: An Intelligent Framework for Scientific Document Understanding and Content Optimization based on Multimodal Large Model Agents

Title: Trustworthy Medical Imaging with Large Language Models: A Study of Hallucinations Across Modalities

Title: A Stage-Aware Mixture of Experts Framework for Neurodegenerative Disease Progression Modelling

Title: 3DGS-VBench: A Comprehensive Video Quality Evaluation Benchmark for 3DGS Compression

Title: Towards High-Order Mean Flow Generative Models: Feasibility, Expressivity, and Provably Efficient Criteria

Title: Perceptual Evaluation of GANs and Diffusion Models for Generating X-rays

Title: CMAMRNet: A Contextual Mask-Aware Network Enhancing Mural Restoration Through Comprehensive Mask Guidance

Title: CoopDiff: Anticipating 3D Human-object Interactions via Contact-consistent Decoupled Diffusion

Title: Large-scale Multi-sequence Pretraining for Generalizable MRI Analysis in Versatile Clinical Applications

Title: Similarity Matters: A Novel Depth-guided Network for Image Restoration and A New Dataset

Title: Unsupervised Real-World Super-Resolution via Rectified Flow Degradation Modelling

Title: Bridging Semantic Logic Gaps: A Cognition-Inspired Multimodal Boundary-Preserving Network for Image Manipulation Localization

Title: EDGE: A Theoretical Framework for Misconception-Aware Adaptive Learning

Title: HaDM-ST: Histology-Assisted Differential Modeling for Spatial Transcriptomics Generation

Title: Consistent and Controllable Image Animation with Motion Linear Diffusion Transformers

Title: SynMatch: Rethinking Consistency in Medical Image Segmentation with Sparse Annotations

Title: DragonFruitQualityNet: A Lightweight Convolutional Neural Network for Real-Time Dragon Fruit Quality Inspection on Mobile Devices

Title: RORPCap: Retrieval-based Objects and Relations Prompt for Image Captioning

Title: Planner-Refiner: Dynamic Space-Time Refinement for Vision-Language Alignment in Videos

Title: Finite-Time Convergence Analysis of ODE-based Generative Models for Stochastic Interpolants

Title: CoAR: Concept Injection into Autoregressive Models for Personalized Text-to-Image Generation

Title: SODiff: Semantic-Oriented Diffusion Model for JPEG Compression Artifacts Removal

Title: DIP-GS: Deep Image Prior For Gaussian Splatting Sparse View Recovery

Title: Tight Bounds for Schrödinger Potential Estimation in Unpaired Image-to-Image Translation Problems

Title: CLUE: Leveraging Low-Rank Adaptation to Capture Latent Uncovered Evidence for Image Forgery Localization

Title: VisR-Bench: An Empirical Study on Visual Retrieval-Augmented Generation for Multilingual Long Document Understanding

Title: Enhanced Generative Structure Prior for Chinese Text Image Super-resolution

Title: CoT-Pose: Chain-of-Thought Reasoning for 3D Pose Generation from Abstract Prompts

Title: Commentary Generation for Soccer Highlights

Title: Splat4D: Diffusion-Enhanced 4D Gaussian Splatting for Temporally and Spatially Consistent Content Creation

Title: When and how can inexact generative models still sample from the data manifold?

Title: ShoulderShot: Generating Over-the-Shoulder Dialogue Videos

Title: LaVieID: Local Autoregressive Diffusion Transformers for Identity-Preserving Video Creation

Title: X2Edit: Revisiting Arbitrary-Instruction Image Editing through Self-Constructed Data and Task-Aware Representation Learning

Title: Efficient Approximate Posterior Sampling with Annealed Langevin Monte Carlo

Title: LaRender: Training-Free Occlusion Control in Image Generation via Latent Rendering

Title: GLiClass: Generalist Lightweight Model for Sequence Classification Tasks

Title: Undress to Redress: A Training-Free Framework for Virtual Try-On

Title: TAR-TVG: Enhancing VLMs with Timestamp Anchor-Constrained Reasoning for Temporal Video Grounding

Title: Make Your MoVe: Make Your 3D Contents by Adapting Multi-View Diffusion Models to External Editing

Title: Enhancing Small-Scale Dataset Expansion with Triplet-Connection-based Sample Re-Weighting

Title: Grouped Speculative Decoding for Autoregressive Image Generation

Title: Learning to Align, Aligning to Learn: A Unified Approach for Self-Optimized Alignment

Title: Comparison Reveals Commonality: Customized Image Generation through Contrastive Inversion

Title: Sparse Probabilistic Graph Circuits

Title: UniSVG: A Unified Dataset for Vector Graphic Understanding and Generation with Multimodal Large Language Models

Title: Dream4D: Lifting Camera-Controlled I2V towards Spatiotemporally Consistent 4D Generation

Title: Power Battery Detection

Title: Pose-RFT: Enhancing MLLMs for 3D Pose Generation via Hybrid Action Reinforcement Fine-Tuning

Title: DiTVR: Zero-Shot Diffusion Transformer for Video Restoration

Title: Segmenting and Understanding: Region-aware Semantic Attention for Fine-grained Image Quality Assessment with Large Language Models

Title: Being-M0.5: A Real-Time Controllable Vision-Language-Motion Model

Title: TAP: Parameter-efficient Task-Aware Prompting for Adverse Weather Removal

Title: Not Yet AlphaFold for the Mind: Evaluating Centaur as a Synthetic Participant

Title: Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation

Title: Diffusing the Blind Spot: Uterine MRI Synthesis with Diffusion Models

Title: Generative Video Matting

Title: RSVLM-QA: A Benchmark Dataset for Remote Sensing Vision Language Model-based Question Answering

Title: Safeguarding Generative AI Applications in Preclinical Imaging through Hybrid Anomaly Detection

Title: Score Augmentation for Diffusion Models

Title: Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation

Title: Mitigating Biases in Surgical Operating Rooms with Geometry

Title: TRIDE: A Text-assisted Radar-Image weather-aware fusion network for Depth Estimation

Title: S^2VG: 3D Stereoscopic and Spatial Video Generation via Denoising Frame Matrix

Title: Matrix-3D: Omnidirectional Explorable 3D World Generation

Title: TBAC-UniImage: Unified Understanding and Generation by Ladder-Side Diffusion Tuning

Title: FantasyStyle: Controllable Stylized Distillation for 3D Gaussian Splatting

Title: MuaLLM: A Multimodal Large Language Model Agent for Circuit Design Assistance with Hybrid Contextual Retrieval-Augmented Generation

Title: Pindrop it! Audio and Visual Deepfake Countermeasures for Robust Detection and Fine Grained-Localization

Title: CD-TVD: Contrastive Diffusion for 3D Super-Resolution with Scarce High-Resolution Time-Varying Data

Title: PP-Motion: Physical-Perceptual Fidelity Evaluation for Human Motion Generation

Title: Reinforcement Learning in Vision: A Survey

Title: SAGOnline: Segment Any Gaussians Online

Title: Learning User Preferences for Image Generation Model

Title: OMGSR: You Only Need One Mid-timestep Guidance for Real-World Image Super-Resolution

Title: Cut2Next: Generating Next Shot via In-Context Tuning

Title: StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation