2025-12-03

Title: Pharmacophore-based design by learning on voxel grids

Title: Opening the Black Box: An Explainable, Few-shot AI4E Framework Informed by Physics and Expert Knowledge for Materials Engineering

Title: FineGRAIN: Evaluating Failure Modes of Text-to-Image Models with Vision Language Model Judges

Title: SplatSuRe: Selective Super-Resolution for Multi-view Consistent 3D Gaussian Splatting

Title: WhAM: Towards A Translative Model of Sperm Whale Vocalization

Title: InstructLR: A Scalable Approach to Create Instruction Dataset for Under-Resourced Languages

Title: Uncertainty Reasoning with Photonic Bayesian Machines

Title: Towards Unified Video Quality Assessment

Title: Spatiotemporal Pyramid Flow Matching for Climate Emulation

Title: Progressive Image Restoration via Text-Conditioned Video Generation

Title: Enhancing Cross Domain SAR Oil Spill Segmentation via Morphological Region Perturbation and Synthetic Label-to-SAR Generation

Title: Unlocking the Power of Boltzmann Machines by Parallelizable Sampler and Efficient Temperature Estimation

Title: SpecPV: Improving Self-Speculative Decoding for Long-Context Generation via Partial Verification

Title: Understanding and Harnessing Sparsity in Unified Multimodal Models

Title: On-the-fly Feedback SfM: Online Explore-and-Exploit UAV Photogrammetry with Incremental Mesh Quality-Aware Indicator and Predictive Path Planning

Title: ESACT: An End-to-End Sparse Accelerator for Compute-Intensive Transformers via Local Similarity

Title: Data Curation Through the Lens of Spectral Dynamics: Static Limits, Dynamic Acceleration, and Practical Oracles

Title: MitUNet: Enhancing Floor Plan Recognition using a Hybrid Mix-Transformer and U-Net Architecture

Title: LightHCG: a Lightweight yet powerful HSIC Disentanglement based Causal Glaucoma Detection Model framework

Title: Temporal Dynamics Enhancer for Directly Trained Spiking Object Detectors

Title: ClusterStyle: Modeling Intra-Style Diversity with Prototypical Clustering for Stylized Motion Generation

Title: Does Hearing Help Seeing? Investigating Audio-Video Joint Denoising for Video Generation

Title: WorldPack: Compressed Memory Improves Spatial Consistency in Video World Modeling

Title: YingVideo-MV: Music-Driven Multi-Stage Video Generation

Title: dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model

Title: GeoDiT: A Diffusion-based Vision-Language Model for Geospatial Understanding

Title: Water Quality Estimation Through Machine Learning Multivariate Analysis

Title: Two-Stage Vision Transformer for Image Restoration: Colorization Pretraining + Residual Upsampling

Title: OmniPerson: Unified Identity-Preserving Pedestrian Generation

Title: Co-speech Gesture Video Generation via Motion-Based Graph Retrieval

Title: GoRL: An Algorithm-Agnostic Framework for Online Reinforcement Learning with Generative Policies

Title: RULER-Bench: Probing Rule-based Reasoning Abilities of Next-level Video Generation Models for Vision Foundation Intelligence

Title: PPTBench: Towards Holistic Evaluation of Large Language Models for PowerPoint Layout and Design Understanding

Title: Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models

Title: Leveraging Large-Scale Pretrained Spatial-Spectral Priors for General Zero-Shot Pansharpening

Title: Hear What Matters! Text-conditioned Selective Video-to-Audio Generation

Title: Distill, Forget, Repeat: A Framework for Continual Unlearning in Text-to-Image Diffusion Models

Title: Spatially-Grounded Document Retrieval via Patch-to-Region Relevance Propagation

Title: Graph VQ-Transformer (GVT): Fast and Accurate Molecular Generation via High-Fidelity Discrete Latents

Title: PGP-DiffSR: Phase-Guided Progressive Pruning for Efficient Diffusion-based Image Super-Resolution

Title: ClimaOoD: Improving Anomaly Segmentation via Physically Realistic Synthetic Data

Title: LumiX: Structured and Coherent Text-to-Intrinsic Generation

Title: IC-World: In-Context Generation for Shared World Modeling

Title: PhyCustom: Towards Realistic Physical Customization in Text-to-Image Generation

Title: From Navigation to Refinement: Revealing the Two-Stage Nature of Flow-based Diffusion Models through Oracle Velocity

Title: Taming Camera-Controlled Video Generation with Verifiable Geometry Reward

Title: MindGPT-4ov: An Enhanced MLLM via a Multi-Stage Post-Training Paradigm

Title: Glance: Accelerating Diffusion Models with 1 Sample

Title: MRD: Multi-resolution Retrieval-Detection Fusion for High-Resolution Image Understanding

Title: DiverseAR: Boosting Diversity in Bitwise Autoregressive Image Generation

Title: Benchmarking Scientific Understanding and Reasoning for Video Generation using VideoScience-Bench

Title: Layout Anything: One Transformer for Universal Room Layout Estimation

Title: Pruning AMR: Efficient Visualization of Implicit Neural Representations via Weight Matrix Analysis

Title: U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences

Title: TEXTRIX: Latent Attribute Grid for Native Texture Generation and Beyond

Title: AutoBrep: Autoregressive B-Rep Generation with Unified Topology and Geometry

Title: Unrolled Networks are Conditional Probability Flows in MRI Reconstruction

Title: MAViD: A Multimodal Framework for Audio-Visual Dialogue Understanding and Generation

Title: ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation

Title: Video4Spatial: Towards Visuospatial Intelligence with Context-Guided Video Generation

Title: MultiShotMaster: A Controllable Multi-Shot Video Generation Framework

Title: PPTArena: A Benchmark for Agentic PowerPoint Editing

Title: CAMEO: Correspondence-Attention Alignment for Multi-View Diffusion Models

Title: MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues