2025-03-11

Title: A Materials Foundation Model via Hybrid Invariant-Equivariant Architectures

Title: GeoJEPA: Towards Eliminating Augmentation- and Sampling Bias in Multimodal Geospatial Learning

Title: Medical Hallucinations in Foundation Models and Their Impact on Healthcare

Title: Emergent Abilities in Large Language Models: A Survey

Title: Multi-agent Auto-Bidding with Latent Graph Diffusion Models

Title: Zero-shot Medical Event Prediction Using a Generative Pre-trained Transformer on Electronic Health Records

Title: From Style to Facts: Mapping the Boundaries of Knowledge Injection with Finetuning

Title: IDEA Prune: An Integrated Enlarge-and-Prune Pipeline in Generative Language Model Pretraining

Title: Bayesian Fields: Task-driven Open-Set Semantic Gaussian Splatting

Title: A Survey on Tabular Data Generation: Utility, Alignment, Fidelity, Privacy, and Beyond

Title: SANDWiCH: Semantical Analysis of Neighbours for Disambiguating Words in Context ad Hoc

Title: Validating LLM-as-a-Judge Systems in the Absence of Gold Labels

Title: Generative Multi-Agent Q-Learning for Policy Optimization: Decentralized Wireless Networks

Title: MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice

Title: Towards Ambiguity-Free Spatial Foundation Model: Rethinking and Decoupling Depth Ambiguity

Title: Towards Universal Text-driven CT Image Segmentation

Title: Fine-Grained Bias Detection in LLM: Enhancing detection mechanisms for nuanced biases

Title: Towards Conversational AI for Disease Management

Title: Exploring Interpretability for Visual Prompt Tuning with Hierarchical Concepts

Title: Theta Theory: operads and coloring

Title: PointDiffuse: A Dual-Conditional Diffusion Model for Enhanced Point Cloud Semantic Segmentation

Title: Patch-Depth Fusion: Dichotomous Image Segmentation via Fine-Grained Patch Strategy and Depth Integrity-Prior

Title: Unlocking Pretrained LLMs for Motion-Related Multimodal Generation: A Fine-Tuning Approach to Unify Diffusion and Next-Token Prediction

Title: USP: Unified Self-Supervised Pretraining for Image Generation and Understanding

Title: X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation

Title: GSV3D: Gaussian Splatting-based Geometric Distillation with Stable Video Diffusion for Single-Image 3D Object Generation

Title: VLForgery Face Triad: Detection, Localization and Attribution via Multimodal Large Language Models

Title: BioMoDiffuse: Physics-Guided Biomechanical Diffusion for Controllable and Authentic Human Motion Synthesis

Title: Feature-EndoGaussian: Feature Distilled Gaussian Splatting in Surgical Deformable Scene Reconstruction

Title: ROCM: RLHF on consistency models

Title: ForestSplats: Deformable transient field for Gaussian Splatting in the Wild

Title: FORESCENE: FOREcasting human activity via latent SCENE graphs diffusion

Title: PTDiffusion: Free Lunch for Generating Optical Illusion Hidden Pictures with Phase-Transferred Diffusion Model

Title: Explainable Synthetic Image Detection through Diffusion Timestep Ensembling

Title: Reinforced Diffuser for Red Teaming Large Vision-Language Models

Title: WaveStitch: Flexible and Fast Conditional Time Series Generation with Diffusion Models

Title: Get In Video: Add Anything You Want to the Video

Title: STiL: Semi-supervised Tabular-Image Learning for Comprehensive Task-Relevant Information Exploration in Multimodal Classification

Title: Text2Story: Advancing Video Storytelling with Text Guidance

Title: GeoLangBind: Unifying Earth Observation with Agglomerative Vision-Language Foundation Models

Title: Accurate and Efficient Two-Stage Gun Detection in Video

Title: Pretraining Generative Flow Networks with Inexpensive Rewards for Molecular Graph Generation

Title: Learning to Unlearn while Retaining: Combating Gradient Conflicts in Machine Unlearning

Title: Backdoor Attacks on Discrete Graph Diffusion Models

Title: GIN-Graph: A Generative Interpretation Network for Model-Level Explanation of Graph Neural Networks

Title: Adversarial Robustness of Discriminative Self-Supervised Learning in Vision

Title: Generative Video Bi-flow

Title: VORTEX: Challenging CNNs at Texture Recognition by using Vision Transformers with Orderless and Randomized Token Encodings

Title: TI-JEPA: An Innovative Energy-based Joint Embedding Strategy for Text-Image Multimodal Systems

Title: EPR-GAIL: An EPR-Enhanced Hierarchical Imitation Learning Framework to Simulate Complex User Consumption Behaviors

Title: Removing Averaging: Personalized Lip-Sync Driven Characters Based on Identity Adapter

Title: Consistent Image Layout Editing with Diffusion Models

Title: Training LLM-based Tutors to Improve Student Learning Outcomes in Dialogues

Title: Federated Learning for Diffusion Models

Title: Pre-Training Meta-Rule Selection Policy for Visual Generative Abductive Learning

Title: Graph Retrieval-Augmented LLM for Conversational Recommendation Systems

Title: CtrTab: Tabular Data Synthesis with High-Dimensional and Limited Data

Title: A Quantitative Evaluation of the Expressivity of BMI, Pose and Gender in Body Embeddings for Recognition and Identification

Title: NaviDet: Efficient Input-level Backdoor Detection on Text-to-Image Synthesis via Neuron Activation Variation

Title: PathVQ: Reforming Computational Pathology Foundation Model for Whole Slide Image Analysis via Vector Quantization

Title: A Mesh Is Worth 512 Numbers: Spectral-domain Diffusion Modeling for High-dimension Shape Generation

Title: ExGes: Expressive Human Motion Retrieval and Modulation for Audio-Driven Gesture Synthesis

Title: Fine-Grained Alignment and Noise Refinement for Compositional Text-to-Image Generation

Title: GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks

Title: One-Step Diffusion Model for Image Motion-Deblurring

Title: QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation

Title: Generative modelling with jump-diffusions

Title: TR-DQ: Time-Rotation Diffusion Quantization

Title: Conceptrol: Concept Control of Zero-shot Personalized Image Generation

Title: Synthetic Data Generation for Minimum-Exposure Navigation in a Time-Varying Environment using Generative AI Models

Title: Deep Cut-informed Graph Embedding and Clustering

Title: CLAD: Constrained Latent Action Diffusion for Vision-Language Procedure Planning

Title: Adding Additional Control to One-Step Diffusion with Joint Distribution Matching

Title: AxisPose: Model-Free Matching-Free Single-Shot 6D Object Pose Estimation via Axis Generation

Title: AA-CLIP: Enhancing Zero-shot Anomaly Detection via Anomaly-Aware CLIP

Title: Learning Few-Step Diffusion Models by Trajectory Distribution Matching

Title: PixelPonder: Dynamic Patch Adaptation for Enhanced Multi-Conditional Text-to-Image Generation

Title: Asymmetric Decision-Making in Online Knowledge Distillation: Unifying Consensus and Divergence

Title: UniGenX: Unified Generation of Sequence and Structure with Autoregressive Diffusion

Title: What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization

Title: D3DR: Lighting-Aware Object Insertion in Gaussian Splatting

Title: CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving

Title: Color Alignment in Diffusion

Title: DiffAtlas: GenAI-fying Atlas Segmentation via Image-Mask Diffusion

Title: Primal-Dual Sample Complexity Bounds for Constrained Markov Decision Processes with Multiple Constraints

Title: GenDR: Lightning Generative Detail Restorator

Title: VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation

Title: HierDAMap: Towards Universal Domain Adaptive BEV Mapping via Hierarchical Perspective Priors

Title: AttFC: Attention Fully-Connected Layer for Large-Scale Face Recognition with One GPU

Title: Text-to-Image Diffusion Models Cannot Count, and Prompt Refinement Cannot Help

Title: ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks

Title: KwaiChat: A Large-Scale Video-Driven Multilingual Mixed-Type Dialogue Corpus

Title: DirectTriGS: Triplane-based Gaussian Splatting Field Representation for 3D Generation

Title: From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers

Title: Post-Training Quantization for Diffusion Transformer via Hierarchical Timestep Grouping

Title: CineBrain: A Large-Scale Multi-Modal Brain Dataset During Naturalistic Audiovisual Narrative Processing

Title: Motion Anything: Any to Motion Generation

Title: A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis

Title: Task-Specific Knowledge Distillation from the Vision Foundation Model for Enhanced Medical Image Segmentation

Title: Learning Decision Trees as Amortized Structure Inference

Title: TiGer: Self-Supervised Purification for Time-evolving Graphs

Title: SOYO: A Tuning-Free Approach for Video Style Morphing via Style-Adaptive Interpolation in Diffusion Models

Title: Erase Diffusion: Empowering Object Removal Through Calibrating Diffusion Pathways

Title: EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer

Title: Recovering Partially Corrupted Major Objects through Tri-modality Based Image Completion

Title: TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation

Title: Generative method for aerodynamic optimization based on classifier-free guided denoising diffusion probabilistic model

Title: Exposure Bias Reduction for Enhancing Diffusion Transformer Feature Caching

Title: Learning A Zero-shot Occupancy Network from Vision Foundation Models via Self-supervised Adaptation

Title: Controllable 3D Outdoor Scene Generation via Scene Graphs

Title: Ideas in Inference-time Scaling can Benefit Generative Pre-training Algorithms

Title: MIRAM: Masked Image Reconstruction Across Multiple Scales for Breast Lesion Risk Prediction

Title: Temporal Overlapping Prediction: A Self-supervised Pre-training Method for LiDAR Moving Object Segmentation

Title: Strategies for political-statement segmentation and labelling in unstructured text

Title: Effective and Efficient Masked Image Generation Models

Title: Endo-FASt3r: Endoscopic Foundation model Adaptation for Structure from motion

Title: Synthetic Lung X-ray Generation through Cross-Attention and Affinity Transformation

Title: Boosting Diffusion-Based Text Image Super-Resolution Model Towards Generalized Real-World Scenarios

Title: Semantic Communications with Computer Vision Sensing for Edge Video Transmission

Title: AnomalyPainter: Vision-Language-Diffusion Synergy for Zero-Shot Realistic and Diverse Industrial Anomaly Synthesis

Title: COMODO: Cross-Modal Video-to-IMU Distillation for Efficient Egocentric Human Activity Recognition

Title: Efficient Distillation of Classifier-Free Guidance using Adapters

Title: AttenST: A Training-Free Attention-Driven Style Transfer Framework with Pre-Trained Diffusion Models

Title: DaD: Distilled Reinforcement Learning for Diverse Keypoint Detection

Title: Fully Unsupervised Annotation of C. Elegans

Title: HGO-YOLO: Advancing Anomaly Behavior Detection with Hierarchical Features and Lightweight Optimized Detection

Title: Probabilistic Segmentation for Robust Field of View Estimation

Title: TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models

Title: PersonaBooth: Personalized Text-to-Motion Generation

Title: SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models

Title: Keeping Representation Similarity in Finetuning for Medical Image Analysis

Title: TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision

Title: AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion

Title: Divide and Conquer Self-Supervised Learning for High-Content Imaging

Title: Is a Good Foundation Necessary for Efficient Reinforcement Learning? The Computational Role of the Base Model in Exploration

Title: Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction

Title: ADROIT: A Self-Supervised Framework for Learning Robust Representations for Active Learning

Title: Inductive Moment Matching

Title: Runtime Detection of Adversarial Attacks in AI Accelerators Using Performance Counters

Title: Denoising Score Distillation: From Noisy Diffusion Pretraining to One-Step High-Quality Generation

Title: Detection Avoidance Techniques for Large Language Models

Title: VACE: All-in-One Video Creation and Editing

Title: Balanced Image Stylization with Style Matching Score

Title: VoD: Learning Volume of Differences for Video-Based Deepfake Detection