2025-03-25

Title: State Fourier Diffusion Language Model (SFDLM): A Scalable, Novel Iterative Approach to Language Modeling

Title: ChatGPT or A Silent Everywhere Helper: A Survey of Large Language Models

Title: IRef-VLA: A Benchmark for Interactive Referential Grounding with Imperfect Language in 3D Scenes

Title: Generative Modeling of Class Probability for Multi-Modal Representation Learning

Title: CausalRivers -- Scaling up benchmarking of causal discovery for real-world time-series

Title: Bayesian generative models can flag performance loss, bias, and out-of-distribution image content

Title: What's Producible May Not Be Reachable: Measuring the Steerability of Generative Models

Title: ProDehaze: Prompting Diffusion Models Toward Faithful Image Dehazing

Title: Judge Anything: MLLM as a Judge Across Any Modality

Title: Towards Understanding the Benefits of Neural Network Parameterizations in Geophysical Inversions: A Study With Neural Fields

Title: Should we pre-train a decoder in contrastive learning for dense prediction tasks?

Title: DermDiff: Generative Diffusion Model for Mitigating Racial Biases in Dermatology Diagnosis

Title: Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks

Title: PRIMAL: Physically Reactive and Interactive Motor Model for Avatar Learning

Title: Measuring the Robustness of Audio Deepfake Detectors

Title: Guidance Free Image Editing via Explicit Conditioning

Title: On The Sample Complexity Bounds In Bilevel Reinforcement Learning

Title: Efficient Diffusion Training through Parallelization with Truncated Karhunen-Loève Expansion

Title: OMR-Diffusion: Optimizing Multi-Round Enhanced Training in Diffusion Models for Improved Intent Understanding

Title: Towards Transformer-Based Aligned Generation with Self-Coherence Guidance

Title: MotionDiff: Training-free Zero-shot Interactive Motion Editing via Flow-assisted Multi-view Diffusion

Title: Multi-modality Anomaly Segmentation on the Road

Title: EMPLACE: Self-Supervised Urban Scene Change Detection

Title: Towards Invisible Backdoor Attack on Text-to-Image Diffusion Model

Title: DynASyn: Multi-Subject Personalization Enabling Dynamic Action Synthesis

Title: Serial Low-rank Adaptation of Vision Transformer

Title: Aligning Foundation Model Priors and Diffusion-Based Hand Interactions for Occlusion-Resistant Two-Hand Reconstruction

Title: Progressive Prompt Detailing for Improved Alignment in Text-to-Image Generative Models

Title: Relation Extraction with Instance-Adapted Predicate Descriptions

Title: Neural Network Approach to Stochastic Dynamics for Smooth Multimodal Density Estimation

Title: Satisfactory Medical Consultation based on Terminology-Enhanced Information Retrieval and Emotional In-Context Learning

Title: GLADMamba: Unsupervised Graph-Level Anomaly Detection Powered by Selective State Space Model

Title: Guided Diffusion for the Extension of Machine Vision to Human Visual Perception

Title: Does GCL Need a Large Number of Negative Samples? Enhancing Graph Contrastive Learning with Effective and Efficient Negative Sampling

Title: TransAnimate: Taming Layer Diffusion to Generate RGBA Video

Title: FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation

Title: Real-World Remote Sensing Image Dehazing: Benchmark and Baseline

Title: PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos

Title: PIM: Physics-Informed Multi-task Pre-training for Improving Inertial Sensor-Based Human Activity Recognition

Title: OmnimatteZero: Training-free Real-time Omnimatte with Pre-trained Video Diffusion Models

Title: Interpretable Feature Interaction via Statistical Self-supervised Learning on Tabular Data

Title: SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining

Title: PolarFree: Polarization-based Reflection-free Imaging

Title: Unseen from Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language Navigation

Title: Model-Guardian: Protecting against Data-Free Model Stealing Using Gradient Representations and Deceptive Predictions

Title: Vehicular Road Crack Detection with Deep Learning: A New Online Benchmark for Comprehensive Evaluation of Existing Algorithms

Title: Unified Geometry and Color Compression Framework for Point Clouds via Generative Diffusion Priors

Title: Anomize: Better Open Vocabulary Video Anomaly Detection

Title: An Image-like Diffusion Method for Human-Object Interaction Detection

Title: TCFG: Tangential Damping Classifier-free Guidance

Title: AGIR: Assessing 3D Gait Impairment with Reasoning based on LLMs

Title: LocDiffusion: Identifying Locations on Earth by Diffusing in the Hilbert Space

Title: LongDiff: Training-Free Long Video Generation in One Go

Title: DiffusionTalker: Efficient and Compact Speech-Driven 3D Talking Head via Personalizer-Guided Distillation

Title: Self-Attention Diffusion Models for Zero-Shot Biomedical Image Segmentation: Unlocking New Frontiers in Medical Imaging

Title: SimMotionEdit: Text-Based Human Motion Editing with Motion Similarity Prediction

Title: A Framework for Finding Local Saddle Points in Two-Player Zero-Sum Black-Box Games

Title: CustomKD: Customizing Large Vision Foundation for Edge Model Improvement via Knowledge Distillation

Title: DiffGED: Computing Graph Edit Distance via Diffusion-based Graph Matching

Title: Surface-Aware Distilled 3D Semantic Features

Title: CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI

Title: Diff-Palm: Realistic Palmprint Generation with Polynomial Creases and Intra-Class Variation Controllable Diffusion Models

Title: Knowledge Transfer from LLMs to Provenance Analysis: A Semantic-Augmented Method for APT Detection

Title: Improved Rates of Differentially Private Nonconvex-Strongly-Concave Minimax Optimization

Title: Plug-and-Play Interpretable Responsible Text-to-Image Generation via Dual-Space Multi-facet Concept Control

Title: Towards Training-free Anomaly Detection with Vision and Language Foundation Models

Title: Latent Embedding Adaptation for Human Preference Alignment in Diffusion Planners

Title: Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models

Title: DiffusedWrinkles: A Diffusion-Based Model for Data-Driven Garment Animation

Title: Do Your Best and Get Enough Rest for Continual Learning

Title: RoCA: Robust Contrastive One-class Time Series Anomaly Detection with Contaminated Data

Title: Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance

Title: PDDM: Pseudo Depth Diffusion Model for RGB-PD Semantic Segmentation Based in Complex Indoor Scenes

Title: Knowledge Graph Enhanced Generative Multi-modal Models for Class-Incremental Learning

Title: Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning

Title: U-REPA: Aligning Diffusion U-Nets to ViTs

Title: Panorama Generation From NFoV Image Done Right

Title: Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation

Title: ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation

Title: Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models

Title: InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment

Title: Hiding Images in Diffusion Models by Editing Learned Score Functions

Title: PALATE: Peculiar Application of the Law of Total Expectation to Enhance the Evaluation of Deep Generative Models

Title: Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding

Title: Uncertainty-guided Perturbation for Image Super-Resolution Diffusion Model

Title: SciClaims: An End-to-End Generative System for Biomedical Claim Analysis

Title: AIM2PC: Aerial Image to 3D Building Point Cloud Reconstruction

Title: DiN: Diffusion Model for Robust Medical VQA with Semantic Noisy Labels

Title: HiRes-FusedMIM: A High-Resolution RGB-DSM Pre-trained Model for Building-Level Remote Sensing Applications

Title: Discriminative protein sequence modelling with Latent Space Diffusion

Title: EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation

Title: Anchor-based oversampling for imbalanced tabular data via contrastive and adversarial learning

Title: Adapting Video Diffusion Models for Time-Lapse Microscopy

Title: Unified Uncertainty-Aware Diffusion for Multi-Agent Trajectory Modeling

Title: Adventurer: Exploration with BiGAN for Deep Reinforcement Learning

Title: Generative Dataset Distillation using Min-Max Diffusion Model

Title: Dig2DIG: Dig into Diffusion Information Gains for Image Fusion

Title: Adaptive Machine Learning for Resource-Constrained Environments

Title: Human Motion Unlearning

Title: NullSwap: Proactive Identity Cloaking Against Deepfake Face Swapping

Title: OCRT: Boosting Foundation Models in the Open World with Object-Concept-Relation Triad

Title: Revisiting Automatic Data Curation for Vision Foundation Models in Digital Pathology

Title: GS-Marker: Generalizable and Robust Watermarking for 3D Gaussian Splatting

Title: Boosting Resolution Generalization of Diffusion Transformers with Randomized Positional Encodings

Title: Predicting the Road Ahead: A Knowledge Graph based Foundation Model for Scene Understanding in Autonomous Driving

Title: Thermalizer: Stable autoregressive neural emulation of spatiotemporal chaos

Title: Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition

Title: Self-Supervised Learning based on Transformed Image Reconstruction for Equivariance-Coherent Feature Representation

Title: Good Keypoints for the Two-View Geometry Estimation Problem

Title: CRCL: Causal Representation Consistency Learning for Anomaly Detection in Surveillance Videos

Title: SKDU at De-Factify 4.0: Vision Transformer with Data Augmentation for AI-Generated Image Detection

Title: HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation

Title: Efficient Self-Supervised Adaptation for Medical Image Analysis

Title: A semantic communication-based workload-adjustable transceiver for wireless AI-generated content (AIGC) delivery

Title: CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models

Title: CoMP: Continual Multimodal Pre-training for Vision Foundation Models

Title: SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction

Title: Training-free Diffusion Acceleration with Bottleneck Sampling

Title: Video-T1: Test-Time Scaling for Video Generation

Title: DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation

Title: Aether: Geometric-Aware Unified World Modeling

Title: Tuning-Free Amodal Segmentation via the Occlusion-Free Bias of Inpainting Models

Title: Equivariant Image Modeling

Title: Target-Aware Video Diffusion Models