2025-09-30

Title: Pathological Truth Bias in Vision-Language Models

Title: Deep Learning Empowered Super-Resolution: A Comprehensive Survey and Future Prospects

Title: GZSL-MoE: Apprentissage Généralisé Zéro-Shot basé sur le Mélange d'Experts pour la Segmentation Sémantique de Nuages de Points 3D Appliqué à un Jeu de Données d'Environnement de Collaboration Humain-Robot

Title: LayoutAgent: A Vision-Language Agent Guided Compositional Diffusion for Spatial Layout Planning

Title: MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning

Title: DEFT: Decompositional Efficient Fine-Tuning for Text-to-Image Models

Title: VideoScore2: Think before You Score in Generative Video Evaluation

Title: Seeing Isn't Believing: Context-Aware Adversarial Patch Synthesis via Conditional GAN

Title: Adaptive Margin RLHF via Preference over Preferences

Title: ControlEvents: Controllable Synthesis of Event Camera Data with Foundational Prior from Image Diffusion Models

Title: Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings

Title: FishAI 2.0: Marine Fish Image Classification with Multi-modal Few-shot Learning

Title: GDR-learners: Orthogonal Learning of Generative Models for Potential Outcomes

Title: Doubly-Robust LLM-as-a-Judge: Externally Valid Estimation with Imperfect Personas

Title: Reinforcement Learning with Discrete Diffusion Policies for Combinatorial Action Spaces

Title: Physically Plausible Multi-System Trajectory Generation and Symmetry Discovery

Title: ARSS: Taming Decoder-only Autoregressive Visual Generation for View Synthesis From Single View

Title: Geometry-Aware Losses for Structure-Preserving Text-to-Sign Language Generation

Title: Planning with Unified Multimodal Models

Title: Copyright Infringement Detection in Text-to-Image Diffusion Models via Differential Privacy

Title: Tracing the Representation Geometry of Language Models from Pretraining to Post-training

Title: DPFNAS: Differential Privacy-Enhanced Federated Neural Architecture Search for 6G Edge Intelligence

Title: Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding

Title: Activation Matching for Explanation Generation

Title: Dynamics of Learning: Generative Schedules from Latent ODEs

Title: Follow-Your-Preference: Towards Preference-Aligned Image Inpainting

Title: Causally-Enhanced Reinforcement Policy Optimization

Title: Stochastic Interpolants via Conditional Dependent Coupling

Title: Impute-MACFM: Imputation based on Mask-Aware Flow Matching

Title: Earth-Agent: Unlocking the Full Landscape of Earth Observation with Agents

Title: WeatherCycle: Unpaired Multi-Weather Restoration via Color Space Decoupled Cycle Learning

Title: CrystalGym: A New Benchmark for Materials Discovery Using Reinforcement Learning

Title: Dense associative memory on the Bures-Wasserstein space

Title: Sparse2Dense: A Keypoint-driven Generative Framework for Human Video Compression and Vertex Prediction

Title: Unsupervised Online 3D Instance Segmentation with Synthetic Sequences and Dynamic Loss

Title: SPEC-RL: Accelerating On-Policy Reinforcement Learning via Speculative Rollouts

Title: More Data or Better Algorithms: Latent Diffusion Augmentation for Deep Imbalanced Regression

Title: NanoFlux: Adversarial Dual-LLM Evaluation and Distillation For Multi-Domain Reasoning

Title: OracleGS: Grounding Generative Priors for Sparse-View Gaussian Splatting

Title: SynDoc: A Hybrid Discriminative-Generative Framework for Enhancing Synthetic Domain-Adaptive Document Key Information Extraction

Title: Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing

Title: Seeing Through the Blur: Unlocking Defocus Maps for Deepfake Detection

Title: Seeing the Unseen in Low-light Spike Streams

Title: A Neural ODE Approach to Aircraft Flight Dynamics Modelling

Title: LRPO: Enhancing Blind Face Restoration through Online Reinforcement Learning

Title: Entering the Era of Discrete Diffusion Models: A Benchmark for Schrödinger Bridges and Entropic Optimal Transport

Title: Dynamic-TreeRPO: Breaking the Independent Trajectory Bottleneck with Structured Sampling

Title: Landing with the Score: Riemannian Optimization through Denoising

Title: Emergence of Superposition: Unveiling the Training Dynamics of Chain of Continuous Thought

Title: Generative Modeling of Shape-Dependent Self-Contact Human Poses

Title: WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving

Title: Planner Aware Path Learning in Diffusion Language Models Training

Title: FoR-SALE: Frame of Reference-guided Spatial Adjustment in LLM-based Diffusion Editing

Title: No Concept Left Behind: Test-Time Optimization for Compositional Text-to-Image Generation

Title: Generative Evolutionary Meta-Solver (GEMS): Scalable Surrogate-Free Multi-Agent Learning

Title: RestoRect: Degraded Image Restoration via Latent Rectified Flow & Feature Distillation

Title: Calibrated and Resource-Aware Super-Resolution for Reliable Driver Behavior Analysis

Title: Disentanglement of Variations with Multimodal Generative Modeling

Title: From Fields to Splats: A Cross-Domain Survey of Real-Time Neural Scene Representations

Title: Towards Interpretable Visual Decoding with Attention to Brain Representations

Title: RobuQ: Pushing DiTs to W1.58A2 via Robust Activation Quantization

Title: VividFace: High-Quality and Efficient One-Step Diffusion For Video Face Enhancement

Title: Avoid Catastrophic Forgetting with Rank-1 Fisher from Diffusion Models

Title: VAMamba: An Efficient Visual Adaptive Mamba for Image Restoration

Title: MAN: Latent Diffusion Enhanced Multistage Anti-Noise Network for Efficient and High-Quality Low-Dose CT Image Denoising

Title: VMDiff: Visual Mixing Diffusion for Limitless Cross-Object Synthesis

Title: InteractMove: Text-Controlled Human-Object Interaction Generation in 3D Scenes with Movable Objects

Title: BioVessel-Net and RetinaMix: Unsupervised Retinal Vessel Segmentation from OCTA Images

Title: DiffInk: Glyph- and Style-Aware Latent Diffusion Transformer for Text to Online Handwriting Generation

Title: MotionVerse: A Unified Multimodal Framework for Motion Comprehension, Generation and Editing

Title: LightFair: Towards an Efficient Alternative for Fair T2I Diffusion via Debiasing Pre-trained Text Encoders

Title: Griffin: Generative Reference and Layout Guided Image Composition

Title: Sparse-Up: Learnable Sparse Upsampling for 3D Generation with High-Fidelity Textures

Title: HIVTP: A Training-Free Method to Improve VLMs Efficiency via Hierarchical Visual Token Pruning Using Middle-Layer-Based Importance Score

Title: Beyond Greedy Exits: Improved Early Exit Decisions for Risk Control and Reliability

Title: QuantSparse: Comprehensively Compressing Video Diffusion Transformer with Model Quantization and Attention Sparsification

Title: PD-Diag-Net: Clinical-Priors guided Network on Brain MRI for Auxiliary Diagnosis of Parkinson's Disease

Title: DiffPCN: Latent Diffusion Model Based on Multi-view Depth Images for Point Cloud Completion

Title: M3DLayout: A Multi-Source Dataset of 3D Indoor Layouts and Structured Descriptions for 3D Generation

Title: HieraTok: Multi-Scale Visual Tokenizer Improves Image Reconstruction and Generation

Title: Time-Shifted Token Scheduling for Symbolic Music Generation

Title: Anchored Supervised Fine-Tuning

Title: UniAlignment: Semantic Alignment for Unified Image Generation, Understanding, Manipulation and Perception

Title: GenView++: Unifying Adaptive View Generation and Quality-Driven Supervision for Contrastive Representation Learning

Title: Texture Vector-Quantization and Reconstruction Aware Prediction for Generative Super-Resolution

Title: From Unstable to Playable: Stabilizing Angry Birds Levels via Object Segmentation

Title: Controllable Generation of Large-Scale 3D Urban Layouts with Semantic and Structural Guidance

Title: Space Group Conditional Flow Matching

Title: Electric Currents for Discrete Data Generation

Title: Uni4D-LLM: A Unified SpatioTemporal-Aware VLM for 4D Understanding and Generation

Title: Towards Fine-Grained Text-to-3D Quality Assessment: A Benchmark and A Two-Stage Rank-Learning Metric

Title: Not All Tokens are Guided Equal: Improving Guidance in Visual Autoregressive Models

Title: Q-FSRU: Quantum-Augmented Frequency-Spectral For Medical Visual Question Answering

Title: EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling

Title: MoReact: Generating Reactive Motion from Textual Descriptions

Title: Token Painter: Training-Free Text-Guided Image Inpainting via Mask Autoregressive Models

Title: HiViS: Hiding Visual Tokens from the Drafter for Speculative Decoding in Vision-Language Models

Title: Brain-language fusion enables interactive neural readout and in-silico experimentation

Title: Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm

Title: HunyuanImage 3.0 Technical Report

Title: ColLab: A Collaborative Spatial Progressive Data Engine for Referring Expression Comprehension and Generation

Title: Towards Redundancy Reduction in Diffusion Models for Efficient Video Super-Resolution

Title: SIE3D: Single-image Expressive 3D Avatar generation via Semantic Embedding and Perceptual Expression Loss

Title: SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention

Title: Pretraining Scaling Laws for Generative Evaluations of Language Models

Title: A Family of Kernelized Matrix Costs for Multiple-Output Mixture Neural Networks

Title: Autoregressive Video Generation beyond Next Frames Prediction

Title: Unified Multi-Modal Interactive & Reactive 3D Motion Generation via Rectified Flow

Title: GANji: A Framework for Introductory AI Image Generation

Title: Asymmetric VAE for One-Step Video Super-Resolution Acceleration

Title: LatXGen: Towards Radiation-Free and Accurate Quantitative Analysis of Sagittal Spinal Alignment Via Cross-Modal Radiographic View Synthesis

Title: Tumor Synthesis conditioned on Radiomics

Title: Simulating Post-Neoadjuvant Chemotherapy Breast Cancer MRI via Diffusion Model with Prompt Tuning

Title: An Efficient 3D Latent Diffusion Model for T1-contrast Enhanced MRI Generation

Title: UniVid: The Open-Source Unified Video Model

Title: Semantic Editing with Coupled Stochastic Differential Equations

Title: FreeAction: Training-Free Techniques for Enhanced Fidelity of Trajectory-to-Video Generation

Title: Latent Visual Reasoning

Title: Graph Foundation Models: Bridging Language Model Paradigms and Graph Optimization

Title: Cycle Diffusion Model for Counterfactual Image Generation

Title: SVGThinker: Instruction-Aligned and Reasoning-Driven Text-to-SVG Generation

Title: Hyperspherical Latents Improve Continuous-Token Autoregressive Generation

Title: Expanding Horizons of Level Diversity via Multi-objective Evolutionary Learning

Title: NeRV-Diffusion: Diffuse Implicit Neural Representations for Video Synthesis

Title: UI-UG: A Unified MLLM for UI Understanding and Generation

Title: Uni-X: Mitigating Modality Conflict with a Two-End-Separated Architecture for Unified Multimodal Models

Title: Watermarking Diffusion Language Models

Title: From Satellite to Street: A Hybrid Framework Integrating Stable Diffusion and PanoGAN for Consistent Cross-View Synthesis

Title: Mask Clustering-based Annotation Engine for Large-Scale Submeter Land Cover Mapping

Title: RapidMV: Leveraging Spatio-Angular Representations for Efficient and Consistent Text-to-Multi-View Synthesis

Title: CLQ: Cross-Layer Guided Orthogonal-based Quantization for Diffusion Transformers

Title: A Data-Centric Perspective on the Influence of Image Data Quality in Machine Learning Models

Title: UI2V-Bench: An Understanding-based Image-to-video Generation Benchmark

Title: NeoWorld: Neural Simulation of Explorable Virtual Worlds via Progressive 3D Unfolding

Title: Beyond Isolated Facts: Synthesizing Narrative and Grounded Supervision for VideoQA

Title: LaMoGen: Laban Movement-Guided Diffusion for Text-to-Motion Generation

Title: CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map Models

Title: CORE-3D: Context-aware Open-vocabulary Retrieval by Embeddings in 3D

Title: Diffusion Bridge or Flow Matching? A Unifying Framework and Comparative Analysis

Title: Training-Free Multimodal Guidance for Video to Audio Generation

Title: NeMo: Needle in a Montage for Video-Language Understanding

Title: SAIP: A Plug-and-Play Scale-adaptive Module in Diffusion-based Inverse Problems

Title: FreeRet: MLLMs as Training-Free Retrievers

Title: RIFLE: Removal of Image Flicker-Banding via Latent Diffusion Enhancement

Title: Learning Object-Centric Representations Based on Slots in Real World Scenarios

Title: SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer

Title: T-POP: Test-Time Personalization with Online Preference Feedback

Title: Enhancing Physical Plausibility in Video Generation by Reasoning the Implausibility

Title: IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?

Title: Toward a Vision-Language Foundation Model for Medical Data: Multimodal Dataset and Benchmarks for Vietnamese PET/CT Report Generation

Title: ExGS: Extreme 3D Gaussian Compression with Diffusion Priors

Title: MarS-FM: Generative Modeling of Molecular Dynamics via Markov State Models

Title: Assessing the risk of future Dunkelflaute events for Germany using generative deep learning

Title: Causal-Adapter: Taming Text-to-Image Diffusion for Faithful Counterfactual Generation

Title: Cell2Text: Multimodal LLM for Generating Single-Cell Descriptions from RNA-Seq Data

Title: ELPG-DTFS: Prior-Guided Adaptive Time-Frequency Graph Neural Network for EEG Depression Diagnosis

Title: Environment-Aware Satellite Image Generation with Diffusion Models

Title: ThermalGen: Style-Disentangled Flow-Based Generative Models for RGB-to-Thermal Image Translation

Title: MMRQA: Signal-Enhanced Multimodal Large Language Models for MRI Quality Assessment

Title: VAGUEGAN: Stealthy Poisoning and Backdoor Attacks on Image Generative Pipelines

Title: Attention Surgery: An Efficient Recipe to Linearize Your Video Diffusion Transformer

Title: OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing

Title: Segmentor-Guided Counterfactual Fine-Tuning for Image Synthesis

Title: Scalable GANs with Transformers

Title: OAT-FM: Optimal Acceleration Transport for Improved Flow Matching

Title: On-the-Fly Data Augmentation for Brain Tumor Segmentation

Title: Wan-Alpha: High-Quality Text-to-Video Generation with Alpha Channel

Title: SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation

Title: PanoWorld-X: Generating Explorable Panoramic Worlds via Sphere-Aware Video Diffusion

Title: Uncertainty-Aware Deep Learning for Wildfire Danger Forecasting

Title: MARCOS: Deep Thinking by Markov Chain of Continuous Thoughts

Title: STAGE: Stable and Generalizable GRPO for Autoregressive Image Generation

Title: Advantage Weighted Matching: Aligning RL with Pretraining in Diffusion Models

Title: BRIDGE - Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation

Title: UniLat3D: Geometry-Appearance Unified Latents for Single-Stage 3D Generation

Title: Towards generalizable deep ptychography neural networks

Title: Score Distillation of Flow Matching Models

Title: TemMed-Bench: Evaluating Temporal Medical Image Reasoning in Vision-Language Models

Title: Chance-constrained Flow Matching for High-Fidelity Constraint-aware Generation

Title: GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts

Title: Rolling Forcing: Autoregressive Long Video Diffusion in Real Time

Title: Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models

Title: GLASS Flows: Transition Sampling for Alignment of Flow and Diffusion Models

Title: TR2-D2: Tree Search Guided Trajectory-Aware Fine-Tuning for Discrete Diffusion

Title: Personalized Vision via Visual In-Context Learning

Title: GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs

Title: DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space

Title: DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder

Title: PAD3R: Pose-Aware Dynamic 3D Reconstruction from Casual Videos

Title: FlashI2V: Fourier-Guided Latent Shifting Prevents Conditional Image Leakage in Image-to-Video Generation

Title: Visual Jigsaw Post-Training Improves MLLMs