2024-12-31

Title: Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

Title: A Review of Latent Representation Models in Neuroimaging

Title: Symbolic Disentangled Representations for Images

Title: Generative Landmarks Guided Eyeglasses Removal 3D Face Reconstruction

Title: Conditional Balance: Improving Multi-Conditioning Trade-Offs in Image Generation

Title: UniAvatar: Taming Lifelike Audio-Driven Talking Head Generation with Comprehensive Motion and Lighting Control

Title: Data-Free Group-Wise Fully Quantized Winograd Convolution via Learnable Scales

Title: Minimax-Optimal Multi-Agent Robust Reinforcement Learning

Title: YOLO-MST: Multiscale deep learning method for infrared small target detection based on super-resolution and YOLO

Title: Char-SAM: Turning Segment Anything Model into Scene Text Segmentation Annotator with Character-level Visual Prompts

Title: Data-driven tool wear prediction in milling, based on a process-integrated single-sensor approach

Title: ErgoChat: a Visual Query System for the Ergonomic Risk Assessment of Construction Workers

Title: MAKIMA: Tuning-free Multi-Attribute Open-domain Video Editing via Mask-Guided Attention Modulation

Title: An Ordinary Differential Equation Sampler with Stochastic Start for Diffusion Bridge Models

Title: Comprehensive Review of EEG-to-Output Research: Decoding Neural Signals into Images, Videos, and Audio

Title: A Robust Adversarial Ensemble with Causal (Feature Interaction) Interpretations for Image Classification

Title: MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration

Title: UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity

Title: Multi-Modality Driven LoRA for Adverse Condition Depth Estimation

Title: StyleAutoEncoder for manipulating image attributes using pre-trained StyleGAN

Title: Mining Platoon Patterns from Traffic Videos

Title: Generative Regression Based Watch Time Prediction for Video Recommendation: Model and Performance

Title: Motion Transfer-Driven intra-class data augmentation for Finger Vein Recognition

Title: FairDiffusion: Enhancing Equity in Latent Diffusion Models via Fair Bayesian Perturbation

Title: Tri-Ergon: Fine-grained Video-to-Audio Generation with Multi-modal Conditions and LUFS Control

Title: Protégé: Learn and Generate Basic Makeup Styles with Generative Adversarial Networks (GANs)

Title: Open-Sora: Democratizing Efficient Video Production for All

Title: EraseAnything: Enabling Concept Erasure in Rectified Flow Transformers

Title: Bringing Objects to Life: 4D generation from 3D objects

Title: ESVQA: Perceptual Quality Assessment of Egocentric Spatial Videos

Title: Image Augmentation Agent for Weakly Supervised Semantic Segmentation

Title: JADE: Joint-aware Latent Diffusion for 3D Human Generative Modeling

Title: Toward Scene Graph and Layout Guided Complex 3D Scene Generation

Title: Multimodal Variational Autoencoder: a Barycentric View

Title: DPBridge: Latent Diffusion Bridge for Dense Prediction

Title: Goal-Conditioned Data Augmentation for Offline Reinforcement Learning

Title: Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond)

Title: NetFlowGen: Leveraging Generative Pre-training for Network Traffic Dynamics

Title: SafeSynthDP: Leveraging Large Language Models for Privacy-Preserving Synthetic Data Generation Using Differential Privacy

Title: Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis

Title: Overcoming Class Imbalance: Unified GNN Learning with Structural and Semantic Connectivity Representations

Title: Enhancing Table Recognition with Vision LLMs: A Benchmark and Neighbor-Guided Toolchain Reasoner

Title: HFI: A unified framework for training-free detection and implicit watermarking of latent diffusion model generated images

Title: 4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives

Title: Dialogue Director: Bridging the Gap in Dialogue Visualization for Multimodal Storytelling

Title: Advancing Parkinson's Disease Progression Prediction: Comparing Long Short-Term Memory Networks and Kolmogorov-Arnold Networks

Title: VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control

Title: TimeRAF: Retrieval-Augmented Foundation model for Zero-shot Time Series Forecasting

Title: ReFlow6D: Refraction-Guided Transparent Object 6D Pose Estimation via Intermediate Representation Learning

Title: Inclusion 2024 Global Multimedia Deepfake Detection: Towards Multi-dimensional Facial Forgery Detection

Title: DDIM sampling for Generative AIBIM, a faster intelligent structural design framework

Title: ILDiff: Generate Transparent Animated Stickers by Implicit Layout Distillation

Title: Low-Light Image Enhancement via Generative Perceptual Priors

Title: HisynSeg: Weakly-Supervised Histopathological Image Segmentation via Image-Mixing Synthesis and Consistency Regularization

Title: Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering

Title: Efficiently Serving LLM Reasoning Programs with Certaindex

Title: EdgeRAG: Online-Indexed RAG for Edge Devices

Title: Visual Style Prompt Learning Using Diffusion Models for Blind Face Restoration

Title: E2EDiff: Direct Mapping from Noise to Data for Enhanced Diffusion Models

Title: Towards Effective Discrimination Testing for Generative AI

Title: VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation

Title: Varformer: Adapting VAR's Generative Prior for Image Restoration

Title: Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model

Title: Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation

Title: PERSE: Personalized 3D Generative Avatars from A Single Portrait