diffusion

Title: Predicated Diffusion: Predicate Logic-Based Attention Guidance for Text-to-Image Diffusion Models. (arXiv:2311.16117v1 [cs.CV])

Title: Effective Quantization for Diffusion Models on CPUs. (arXiv:2311.16133v1 [cs.CV])

Title: Shortcut Bias Mitigation via Ensemble Diversity Using Diffusion Probabilistic Models. (arXiv:2311.16176v1 [cs.LG])

Title: Improving Denoising Diffusion Probabilistic Models via Exploiting Shared Representations. (arXiv:2311.16353v1 [cs.LG])

Title: Manifold Preserving Guided Diffusion. (arXiv:2311.16424v1 [cs.LG])

Title: TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering. (arXiv:2311.16465v1 [cs.CV])

Title: Efficient Multimodal Diffusion Models Using Joint Data Infilling with Partially Shared U-Net. (arXiv:2311.16488v1 [cs.CV])

Title: $Z^*$: Zero-shot Style Transfer via Attention Rearrangement. (arXiv:2311.16491v1 [cs.CV])

Title: Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement. (arXiv:2311.16495v1 [cs.CV])

Title: MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model. (arXiv:2311.16498v1 [cs.CV])

Title: Deceptive-Human: Prompt-to-NeRF 3D Human Generation with 3D-Consistent Synthetic Images. (arXiv:2311.16499v1 [cs.CV])

Title: LLMGA: Multimodal Large Language Model based Generation Assistant. (arXiv:2311.16500v1 [cs.CV])

Title: TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models. (arXiv:2311.16503v1 [cs.CV])

Title: Exploring Straighter Trajectories of Flow Matching with Diffusion Guidance. (arXiv:2311.16507v1 [cs.CV])

Title: GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation. (arXiv:2311.16511v1 [cs.CV])

Title: CoSeR: Bridging Image and Language for Cognitive Super-Resolution. (arXiv:2311.16512v1 [cs.CV])

Title: Fine-grained Appearance Transfer with Diffusion Models. (arXiv:2311.16513v1 [cs.CV])

Title: SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution. (arXiv:2311.16518v1 [cs.CV])

Title: Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models. (arXiv:2311.16555v1 [cs.CV])

Title: DiffusionTalker: Personalization and Acceleration for Speech-Driven 3D Face Diffuser. (arXiv:2311.16565v1 [cs.CV])

Title: MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices. (arXiv:2311.16567v1 [cs.CV])

Title: LEDITS++: Limitless Image Editing using Text-to-Image Models. (arXiv:2311.16711v1 [cs.CV])

Title: As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors. (arXiv:2311.16739v1 [cs.CV])

Title: ChatTraffic: Text-to-Traffic Generation via Diffusion Model. (arXiv:2311.16203v1 [cs.LG])

Title: DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification. (arXiv:2311.16124v1 [cs.CR])

Title: Federated Learning with Diffusion Models for Privacy-Sensitive Vision Tasks. (arXiv:2311.16538v1 [cs.LG])

Title: Inexpensive High Fidelity Melt Pool Models in Additive Manufacturing Using Generative Deep Diffusion. (arXiv:2311.16168v1 [cs.LG])

Title: Symphony: Symmetry-Equivariant Point-Centered Spherical Harmonics for Molecule Generation. (arXiv:2311.16199v1 [cs.LG])

Title: Personalized Predictions of Glioblastoma Infiltration: Mathematical Models, Physics-Informed Neural Networks and Multimodal Scans. (arXiv:2311.16536v1 [cs.LG])

self-supervised

Title: Progressive Target-Styled Feature Augmentation for Unsupervised Domain Adaptation on Point Clouds. (arXiv:2311.16474v1 [cs.CV])

Title: Augmenting x-ray single particle imaging reconstruction with self-supervised machine learning. (arXiv:2311.16652v1 [cs.CV])

Title: StyleCap: Automatic Speaking-Style Captioning from Speech Based on Speech and Language Self-supervised Learning Models. (arXiv:2311.16509v1 [cs.CL])

Title: Making Self-supervised Learning Robust to Spurious Correlation via Learning-speed Aware Sampling. (arXiv:2311.16361v1 [cs.LG])

Title: Contrastive encoder pre-training-based clustered federated learning for heterogeneous data. (arXiv:2311.16535v1 [cs.LG])

Title: MultiModal-Learning for Predicting Molecular Properties: A Framework Based on Image and Graph Structures. (arXiv:2311.16666v1 [cs.LG])

foundation model

Title: Adapting Segment Anything Model (SAM) through Prompt-based Learning for Enhanced Protein Identification in Cryo-EM Micrographs. (arXiv:2311.16140v1 [cs.CV])

Title: MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI. (arXiv:2311.16502v1 [cs.CL])

Title: Source-Free Domain Adaptation with Frozen Multimodal Foundation Model. (arXiv:2311.16510v1 [cs.CV])

Title: Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine. (arXiv:2311.16452v1 [cs.CL])

generative

Title: Semantic Generative Augmentations for Few-Shot Counting. (arXiv:2311.16122v1 [cs.CV])

Title: RelVAE: Generative Pretraining for few-shot Visual Relationship Detection. (arXiv:2311.16261v1 [cs.CV])

Title: MI-Gen: Multiple Instance Generation of Pathology Reports for Gigapixel Whole-Slide Images. (arXiv:2311.16480v1 [cs.CV])

Title: PISA: Point-cloud-based Instructed Scene Augmentation. (arXiv:2311.16501v1 [cs.CV])

Title: Improving Lane Detection Generalization: A Novel Framework using HD Maps for Boosting Diversity. (arXiv:2311.16589v1 [cs.CV])

Title: MedGen: A Python Natural Language Processing Toolkit for Medical Text Processing. (arXiv:2311.16588v1 [cs.CL])

Title: Deep Learning for Time Series Classification of Parkinson's Disease Eye Tracking Data. (arXiv:2311.16381v1 [cs.LG])

anomaly

Title: Video Anomaly Detection via Spatio-Temporal Pseudo-Anomaly Generation: A Unified Approach. (arXiv:2311.16514v1 [cs.CV])

Title: Segment Every Out-of-Distribution Object. (arXiv:2311.16516v1 [cs.CV])

Title: A Unified Hardware-based Threat Detector for AI Accelerators. (arXiv:2311.16684v1 [cs.CR])

Title: MACE: A Multi-pattern Accommodated and Efficient Anomaly Detection Method in the Frequency Domain. (arXiv:2311.16191v1 [cs.LG])

in-context