2025-12-03

Title: Pharmacophore-based design by learning on voxel grids

Title: Mirror, Mirror on the Wall -- Which is the Best Model of Them All?

Title: Beyond Confidence: Adaptive and Coherent Decoding for Diffusion Language Models

Title: Leveraging AI multimodal geospatial foundation models for improved near-real-time flood mapping at a global scale

Title: Reversing Large Language Models for Efficient Training and Fine-Tuning

Title: FineGRAIN: Evaluating Failure Modes of Text-to-Image Models with Vision Language Model Judges

Title: CLEF: Clinically-Guided Contrastive Learning for Electrocardiogram Foundation Models

Title: Spatiotemporal Pyramid Flow Matching for Climate Emulation

Title: Enhancing Cross Domain SAR Oil Spill Segmentation via Morphological Region Perturbation and Synthetic Label-to-SAR Generation

Title: Unlocking the Power of Boltzmann Machines by Parallelizable Sampler and Efficient Temperature Estimation

Title: Video Diffusion Models Excel at Tracking Similar-Looking Objects Without Supervision

Title: TALO: Pushing 3D Vision Foundation Models Towards Globally Consistent Online Reconstruction

Title: WSCF-MVCC: Weakly-supervised Calibration-free Multi-view Crowd Counting

Title: LightHCG: a Lightweight yet powerful HSIC Disentanglement based Causal Glaucoma Detection Model framework

Title: Does Hearing Help Seeing? Investigating Audio-Video Joint Denoising for Video Generation

Title: YingVideo-MV: Music-Driven Multi-Stage Video Generation

Title: GeoDiT: A Diffusion-based Vision-Language Model for Geospatial Understanding

Title: Two-Stage Vision Transformer for Image Restoration: Colorization Pretraining + Residual Upsampling

Title: SkyMoE: A Vision-Language Foundation Model for Enhancing Geospatial Interpretation with Mixture of Experts

Title: On the Problem of Consistent Anomalies in Zero-Shot Anomaly Detection

Title: WeMMU: Enhanced Bridging of Vision-Language Models and Diffusion Models via Noisy Query Tokens

Title: In-Context Distillation with Self-Consistency Cascades: A Simple, Training-Free Way to Reduce LLM Agent Costs

Title: Co-speech Gesture Video Generation via Motion-Based Graph Retrieval

Title: GoRL: An Algorithm-Agnostic Framework for Online Reinforcement Learning with Generative Policies

Title: Modeling and Inverse Identification of Interfacial Heat Conduction in Finite Layer and Semi-Infinite Substrate Systems via a Physics-Guided Neural Framework

Title: RULER-Bench: Probing Rule-based Reasoning Abilities of Next-level Video Generation Models for Vision Foundation Intelligence

Title: Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models

Title: Leveraging Large-Scale Pretrained Spatial-Spectral Priors for General Zero-Shot Pansharpening

Title: Distill, Forget, Repeat: A Framework for Continual Unlearning in Text-to-Image Diffusion Models

Title: Graph VQ-Transformer (GVT): Fast and Accurate Molecular Generation via High-Fidelity Discrete Latents

Title: PGP-DiffSR: Phase-Guided Progressive Pruning for Efficient Diffusion-based Image Super-Resolution

Title: Unsupervised Structural Scene Decomposition via Foreground-Aware Slot Attention with Pseudo-Mask Guidance

Title: ClimaOoD: Improving Anomaly Segmentation via Physically Realistic Synthetic Data

Title: GeoBridge: A Semantic-Anchored Multi-View Foundation Model Bridging Images and Text for Geo-Localization

Title: FGC-Comp: Adaptive Neighbor-Grouped Attribute Completion for Graph-based Anomaly Detection

Title: Beyond Paired Data: Self-Supervised UAV Geo-Localization from Reference Imagery Alone

Title: Rethinking Surgical Smoke: A Smoke-Type-Aware Laparoscopic Video Desmoking Method and Dataset

Title: LumiX: Structured and Coherent Text-to-Intrinsic Generation

Title: IC-World: In-Context Generation for Shared World Modeling

Title: PhyCustom: Towards Realistic Physical Customization in Text-to-Image Generation

Title: From Navigation to Refinement: Revealing the Two-Stage Nature of Flow-based Diffusion Models through Oracle Velocity

Title: A Comparative Study on How Data Normalization Affects Zero-Shot Generalization in Time Series Foundation Models

Title: Taming Camera-Controlled Video Generation with Verifiable Geometry Reward

Title: Fast-Decoding Diffusion Language Models via Progress-Aware Confidence Schedules

Title: Polar Perspectives: Evaluating 2-D LiDAR Projections for Robust Place Recognition with Visual Foundation Models

Title: Glance: Accelerating Diffusion Models with 1 Sample

Title: FAIRY2I: Universal Extremely-Low Bit QAT framework via Widely-Linear Representation and Phase-Aware Quantization

Title: DiverseAR: Boosting Diversity in Bitwise Autoregressive Image Generation

Title: LoVoRA: Text-guided and Mask-free Video Object Removal and Addition with Learnable Object-aware Localization

Title: BEVDilation: LiDAR-Centric Multi-Modal Fusion for 3D Object Detection

Title: U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences

Title: TEXTRIX: Latent Attribute Grid for Native Texture Generation and Beyond

Title: DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling

Title: DGGT: Feedforward 4D Reconstruction of Dynamic Driving Scenes using Unposed Images

Title: In-Context Sync-LoRA for Portrait Video Editing

Title: Unrolled Networks are Conditional Probability Flows in MRI Reconstruction

Title: MAViD: A Multimodal Framework for Audio-Visual Dialogue Understanding and Generation

Title: Video4Spatial: Towards Visuospatial Intelligence with Context-Guided Video Generation

Title: CAMEO: Correspondence-Attention Alignment for Multi-View Diffusion Models

Title: MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues