diffusion

Title: Perceptual Similarity guidance and text guidance optimization for Editing Real Images using Guided Diffusion Models. (arXiv:2312.06680v1 [cs.CV])

Title: SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction. (arXiv:2312.06704v1 [cs.CV])

Title: Neutral Editing Framework for Diffusion-based Video Editing. (arXiv:2312.06708v1 [cs.CV])

Title: Separate-and-Enhance: Compositional Finetuning for Text2Image Diffusion Models. (arXiv:2312.06712v1 [cs.CV])

Title: EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion. (arXiv:2312.06725v1 [cs.CV])

Title: DiffCast: A Unified Framework via Residual Diffusion for Precipitation Nowcasting. (arXiv:2312.06734v1 [cs.CV])

Title: InstructAny2Pix: Flexible Visual Editing via Multimodal Instruction Following. (arXiv:2312.06738v1 [cs.CV])

Title: SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models. (arXiv:2312.06739v1 [cs.CV])

Title: Relightful Harmonization: Lighting-aware Portrait Background Replacement. (arXiv:2312.06886v1 [cs.CV])

Title: LoRA-Enhanced Distillation on Guided Diffusion Models. (arXiv:2312.06899v1 [cs.CV])

Title: CCM: Adding Conditional Controls to Text-to-Image Consistency Models. (arXiv:2312.06971v1 [cs.CV])

Title: Diff-OP3D: Bridging 2D Diffusion for Open Pose 3D Zero-Shot Classification. (arXiv:2312.07039v1 [cs.CV])

Title: Template Free Reconstruction of Human-object Interaction with Procedural Interaction Generation. (arXiv:2312.07063v1 [cs.CV])

Title: DiffuVST: Narrating Fictional Scenes with Global-History-Guided Denoising Models. (arXiv:2312.07066v1 [cs.CL])

Title: Text2AC-Zero: Consistent Synthesis of Animated Characters using 2D Diffusion. (arXiv:2312.07133v1 [cs.CV])

Title: Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation. (arXiv:2312.07231v1 [cs.CV])

Title: Scalable Motion Style Transfer with Constrained Diffusion Generation. (arXiv:2312.07311v1 [cs.CV])

Title: GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos. (arXiv:2312.07322v1 [cs.CV])

Title: Learned representation-guided diffusion models for large-image generation. (arXiv:2312.07330v1 [cs.CV])

Title: Boosting Latent Diffusion with Flow Matching. (arXiv:2312.07360v1 [cs.CV])

Title: DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing. (arXiv:2312.07409v1 [cs.CV])

Title: MinD-3D: Reconstruct High-quality 3D objects in Human Brain. (arXiv:2312.07485v1 [cs.CV])

Title: Class-Prototype Conditional Diffusion Model for Continual Learning with Generative Replay. (arXiv:2312.06710v1 [cs.LG])

Title: Generating High-Resolution Regional Precipitation Using Conditional Diffusion Model. (arXiv:2312.07112v1 [cs.LG])

Title: Equivariant Flow Matching with Hybrid Probability Transport. (arXiv:2312.07168v1 [cs.LG])

Title: Momentum Particle Maximum Likelihood. (arXiv:2312.07335v1 [cs.LG])

self-supervised

Title: Benchmarking Pretrained Vision Embeddings for Near- and Duplicate Detection in Medical Images. (arXiv:2312.07273v1 [cs.CV])

Title: NearbyPatchCL: Leveraging Nearby Patches for Self-Supervised Patch-Level Multi-Class Classification in Whole-Slide Images. (arXiv:2312.07489v1 [cs.CV])

Title: Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus. (arXiv:2312.06668v1 [cs.CL])

Title: Multimodal Pretraining of Medical Time Series and Notes. (arXiv:2312.06855v1 [cs.LG])

Title: Self-supervised Adaptive Pre-training of Multilingual Speech Models for Language and Dialect Identification. (arXiv:2312.07338v1 [cs.CL])

foundation model

Title: AM-RADIO: Agglomerative Model -- Reduce All Domains Into One. (arXiv:2312.06709v1 [cs.CV])

Title: SqueezeSAM: User friendly mobile interactive segmentation. (arXiv:2312.06736v1 [cs.CV])

Title: Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment. (arXiv:2312.06960v1 [cs.CV])

Title: Efficient Few-Shot Clinical Task Adaptation with Large Language Models. (arXiv:2312.07125v1 [cs.CV])

Title: How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation. (arXiv:2312.07424v1 [cs.LG])

Title: Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks. (arXiv:2312.06795v1 [cs.LG])

generative

Title: Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning. (arXiv:2312.06699v1 [cs.CV])

Title: Deciphering 'What' and 'Where' Visual Pathways from Spectral Clustering of Layer-Distributed Neural Representations. (arXiv:2312.06716v1 [cs.CV])

Title: Image Content Generation with Causal Reasoning. (arXiv:2312.07132v1 [cs.CV])

Title: SocialStigmaQA: A Benchmark to Uncover Stigma Amplification in Generative Language Models. (arXiv:2312.07492v1 [cs.CL])

anomaly

in-context

Title: Safety Alignment in NLP Tasks: Weakly Aligned Summarization as an In-Context Attack. (arXiv:2312.06924v1 [cs.CL])

Title: Improving Factual Error Correction by Learning to Inject Factual Errors. (arXiv:2312.07049v1 [cs.CL])

Title: ICL Markup: Structuring In-Context Learning using Soft-Token Tags. (arXiv:2312.07405v1 [cs.CL])

Title: Comparable Demonstrations are Important in In-Context Learning: A Novel Perspective on Demonstration Selection. (arXiv:2312.07476v1 [cs.CL])