2025-06-09

Title: Scalable Generation of Spatial Transcriptomics from Histology Images via Whole-Slide Flow Matching

Title: Text2Stereo: Repurposing Stable Diffusion for Stereo Generation with Consistency Rewards

Title: Speaking images. A novel framework for the automated self-description of artworks

Title: An Independent Discriminant Network Towards Identification of Counterfeit Images and Videos

Title: Q-Ponder: A Unified Training Pipeline for Reasoning-based Visual Quality Assessment

Title: TriPSS: A Tri-Modal Keyframe Extraction Framework Using Perceptual, Structural, and Semantic Representations

Title: Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation

Title: Diffusion with a Linguistic Compass: Steering the Generation of Clinically Plausible Future sMRI Representations for Early MCI Conversion Prediction

Title: BYO-Eval: Build Your Own Dataset for Fine-Grained Visual Assessment of Multimodal Language Models

Title: Degradation-Aware Image Enhancement via Vision-Language Classification

Title: Implicit Neural Representation for Video Restoration

Title: Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models

Title: The Generative Leap: Sharp Sample Complexity for Efficiently Learning Gaussian Multi-Index Models

Title: FocusDiff: Advancing Fine-Grained Text-Image Alignment for Autoregressive Visual Generation through RL

Title: MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning

Title: On Fitting Flow Models with Large Sinkhorn Couplings

Title: SocialDF: Benchmark Dataset and Detection Model for Mitigating Harmful Deepfake Content on Social Media Platforms

Title: EX-4D: EXtreme Viewpoint 4D Video Synthesis via Depth Watertight Mesh

Title: PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers

Title: UniRes: Universal Image Restoration for Complex Degradations

Title: Controlled Data Rebalancing in Multi-Task Learning for Real-World Image Super-Resolution

Title: GP-MoLFormer-Sim: Test Time Molecular Optimization through Contextual Similarity Guidance

Title: Projectable Models: One-Shot Generation of Small Specialized Transformers from Large Ones

Title: Learning to Weight Parameters for Data Attribution

Title: Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds' Annotated Imagery

Title: Learning Design-Score Manifold to Guide Diffusion Models for Offline Optimization

Title: Multi-Modal Multi-Task Federated Foundation Models for Next-Generation Extended Reality Systems: Towards Privacy-Preserving Distributed Intelligence in AR/VR/MR

Title: Token Transforming: A Unified and Training-Free Token Compression Framework for Vision Transformer Acceleration

Title: Latent Diffusion Model Based Denoising Receiver for 6G Semantic Communication: From Stochastic Differential Theory to Application

Title: BiTrajDiff: Bidirectional Trajectory Generation with Diffusion Models for Offline Reinforcement Learning

Title: LLIA -- Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models

Title: NTIRE 2025 Challenge on HR Depth from Images of Specular and Transparent Surfaces

Title: Learning Along the Arrow of Time: Hyperbolic Geometry for Backward-Compatible Representation Learning

Title: FontAdapter: Instant Font Adaptation in Visual Text Generation

Title: Domain-RAG: Retrieval-Guided Compositional Image Generation for Cross-Domain Few-Shot Object Detection

Title: Exponential Family Variational Flow Matching for Tabular Data Generation

Title: MOGO: Residual Quantized Hierarchical Causal Transformer for High-Quality and Real-Time 3D Human Motion Generation

Title: AQUATIC-Diff: Additive Quantization for Truly Tiny Compressed Diffusion Models

Title: Restereo: Diffusion stereo video generation and restoration

Title: Sample-Specific Noise Injection For Diffusion-Based Adversarial Purification

Title: HAVIR: HierArchical Vision to Image Reconstruction using CLIP-Guided Versatile Diffusion

Title: Feedback Guidance of Diffusion Models

Title: Synthetic Tabular Data: Methods, Attacks and Defenses

Title: Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small Language Models

Title: ENMA: Tokenwise Autoregression for Generative Neural PDE Operators

Title: Model-Driven Graph Contrastive Learning

Title: Corrector Sampling in Language Models

Title: GenIR: Generative Visual Feedback for Mental Image Retrieval

Title: Challenging Vision-Language Models with Surgical Data: A New Dataset and Broad Benchmarking Study

Title: STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis