2025-12-12

Title: Diffusion Is Your Friend in Show, Suggest and Tell

Title: MetaVoxel: Joint Diffusion Modeling of Imaging and Clinical Metadata

Title: Local LLM Ensembles for Zero-shot Portuguese Named Entity Recognition

Title: Detailed balance in large language model-driven agents

Title: Independent Density Estimation

Title: Murmur2Vec: A Hashing Based Solution For Embedding Generation Of COVID-19 Spike Sequences

Title: CIEGAD: Cluster-Conditioned Interpolative and Extrapolative Framework for Geometry-Aware and Domain-Aligned Data Augmentation

Title: RobustSora: De-Watermarked Benchmark for Robust AI-Generated Video Detection

Title: MotionEdit: Benchmarking and Learning Motion-Centric Image Editing

Title: ShotDirector: Directorially Controllable Multi-Shot Video Generation with Cinematographic Transitions

Title: Physically Aware 360$^\circ$ View Generation from a Single Image using Disentangled Scene Embeddings

Title: Point2Pose: A Generative Framework for 3D Human Pose Estimation with Multi-View Point Cloud Dataset

Title: A Conditional Generative Framework for Synthetic Data Augmentation in Segmenting Thin and Elongated Structures in Biological Images

Title: Zero-shot Adaptation of Stable Diffusion via Plug-in Hierarchical Degradation Representation for Real-World Super-Resolution

Title: Topology-Agnostic Animal Motion Generation from Text Prompt

Title: Breaking the Vicious Cycle: Coherent 3D Gaussian Splatting from Sparse and Motion-Blurred Views

Title: RaLiFlow: Scene Flow Estimation with 4D Radar and LiDAR Point Clouds

Title: Disentangled and Distilled Encoder for Out-of-Distribution Reasoning with Rademacher Guarantees

Title: Mode-Seeking for Inverse Problems with Diffusion Models

Title: Blink: Dynamic Visual Token Resolution for Enhanced Multimodal Understanding

Title: Audio-sync Video Instance Editing with Granularity-Aware Mask Refiner

Title: Unleashing Degradation-Carrying Features in Symmetric U-Net: Simpler and Stronger Baselines for All-in-One Image Restoration

Title: Lang2Motion: Bridging Language and Motion through Joint Embedding Spaces

Title: DOCR-Inspector: Fine-Grained and Automated Evaluation of Document Parsing with VLM

Title: TriDF: Evaluating Perception, Detection, and Hallucination for Interpretable DeepFake Detection

Title: Learning by Analogy: A Causal Framework for Composition Generalization

Title: CheXmask-U: Quantifying uncertainty in landmark-based anatomical segmentation for X-ray images

Title: Beyond the Black Box: Identifiable Interpretation and Control in Generative Models via Causal Minimality

Title: IRG-MotionLLM: Interleaving Motion Generation, Assessment and Refinement for Text-to-Motion Generation

Title: LDP: Parameter-Efficient Fine-Tuning of Multimodal LLM for Medical Report Generation

Title: Blood Pressure Prediction for Coronary Artery Disease Diagnosis using Coronary Computed Tomography Angiography

Title: What matters for Representation Alignment: Global Information or Spatial Structure?

Title: Interpretable and Steerable Concept Bottleneck Sparse Autoencoders

Title: Generative Modeling from Black-box Corruptions via Self-Consistent Stochastic Interpolants

Title: SWiT-4D: Sliding-Window Transformer for Lossless and Parameter-Free Temporal 4D Generation

Title: DuetSVG: Unified Multimodal SVG Generation with Internal Visual Guidance

Title: Stronger Normalization-Free Transformers

Title: GaussianHeadTalk: Wobble-Free 3D Talking Heads with Audio Driven Gaussian Splatting

Title: OmniView: An All-Seeing Diffusion Model for 3D and 4D View Synthesis

Title: Mull-Tokens: Modality-Agnostic Latent Thinking

Title: VL-JEPA: Joint Embedding Predictive Architecture for Vision-language

Title: AlcheMinT: Fine-grained Temporal Control for Multi-Reference Consistent Video Generation

Title: MeViS: A Multi-Modal Dataset for Referring Motion Expression Video Segmentation

Title: ClusIR: Towards Cluster-Guided All-in-One Image Restoration

Title: Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation

Title: Bidirectional Normalizing Flow: From Data to Noise and Back

Title: Group Diffusion: Enhancing Image Generation by Unlocking Cross-Sample Collaboration

Title: Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization

Title: SceneMaker: Open-set 3D Scene Generation with Decoupled De-occlusion and Pose Estimation Model

Title: WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World

Title: StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space