

Title: Plasticine3D: Non-rigid 3D editing with text guidance. (arXiv:2312.10111v1 [cs.CV])

Title: Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation. (arXiv:2312.10113v1 [cs.CV])

Title: MVHuman: Tailoring 2D Diffusion with Multi-view Sampling For Realistic 3D Human Generation. (arXiv:2312.10120v1 [cs.CV])

Title: Tell Me What You See: Text-Guided Real-World Image Denoising. (arXiv:2312.10191v1 [cs.CV])


Title: From-Ground-To-Objects: Coarse-to-Fine Self-supervised Monocular Depth Estimation of Dynamic Objects with Ground Contact Prior. (arXiv:2312.10118v1 [cs.CV])

Title: Test-Time Domain Adaptation by Learning Domain-Aware Batch Normalization. (arXiv:2312.10165v1 [cs.CV])

Title: T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning. (arXiv:2312.10217v1 [cs.CV])

foundation model

Title: FoMo-Bench: a multi-modal, multi-scale and multi-task Forest Monitoring Benchmark for remote sensing foundation models. (arXiv:2312.10114v1 [cs.CV])

Title: SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery. (arXiv:2312.10115v1 [cs.CV])

Title: Towards the Unification of Generative and Discriminative Visual Foundation Model: A Survey. (arXiv:2312.10163v1 [cs.CV])

Title: Low-resource classification of mobility functioning information in clinical sentences using large language models. (arXiv:2312.10202v1 [cs.CL])


Title: NM-FlowGAN: Modeling sRGB Noise with a Hybrid Approach based on Normalizing Flows and Generative Adversarial Networks. (arXiv:2312.10112v1 [cs.CV])

Title: Data-Efficient Multimodal Fusion on a Single GPU. (arXiv:2312.10144v1 [cs.LG])



Title: ICD-LM: Configuring Vision-Language In-Context Demonstrations by Language Modeling. (arXiv:2312.10104v1 [cs.CV])