2024-12-10

Title: TagFog: Textual Anchor Guidance and Fake Outlier Generation for Visual Out-of-Distribution Detection

Title: FodFoM: Fake Outlier Data by Foundation Models Creates Stronger Visual Out-of-Distribution Detector

Title: Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models

Title: Generative Model-Based Fusion for Improved Few-Shot Semantic Segmentation of Infrared Images

Title: Tabular data generation with tensor contraction layers and transformers

Title: HiVeGen -- Hierarchical LLM-based Verilog Generation for Scalable Chip Design

Title: DART-Eval: A Comprehensive DNA Language Model Evaluation Benchmark on Regulatory DNA

Title: UniScene: Unified Occupancy-centric Driving Scene Generation

Title: A Graph-Based Approach for Conversational AI-Driven Personal Memory Capture and Retrieval in a Real-world Application

Title: CigTime: Corrective Instruction Generation Through Inverse Motion Editing

Title: AI-powered Digital Twin of the Ocean: Reliable Uncertainty Quantification for Real-time Wave Height Prediction with Deep Ensemble

Title: Enhancing Sample Generation of Diffusion Models using Noise Level Correction

Title: Uncovering Vision Modality Threats in Image-to-Image Tasks

Title: Text-to-3D Gaussian Splatting with Physics-Grounded Motion Generation

Title: TB-HSU: Hierarchical 3D Scene Understanding with Contextual Affordances

Title: RefSAM3D: Adapting SAM with Cross-modal Reference for 3D Medical Image Segmentation

Title: Do We Need to Design Specific Diffusion Models for Different Tasks? Try ONE-PIC

Title: Remix-DiT: Mixing Diffusion Transformers for Multi-Expert Denoising

Title: Biological Brain Age Estimation using Sex-Aware Adversarial Variational Autoencoder with Multimodal Neuroimages

Title: Efficient Continuous Video Flow Model for Video Prediction

Title: HMGIE: Hierarchical and Multi-Grained Inconsistency Evaluation for Vision-Language Data Cleansing

Title: Jointly RS Image Deblurring and Super-Resolution with Adjustable-Kernel and Multi-Domain Attention

Title: A Tiered GAN Approach for Monet-Style Image Generation

Title: Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events

Title: Compositional Image Retrieval via Instruction-Aware Contrastive Learning

Title: ProtGO: A Transformer based Fusion Model for accurately predicting Gene Ontology (GO) Terms from full scale Protein Sequences

Title: BudgetFusion: Perceptually-Guided Adaptive Diffusion Models

Title: Open-Source Acceleration of Stable-Diffusion.cpp

Title: Language-Guided Image Tokenization for Generation

Title: SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation

Title: Self-Guidance: Boosting Flow and Diffusion Generation on Their Own

Title: CSG: A Context-Semantic Guided Diffusion Approach in De Novo Musculoskeletal Ultrasound Image Generation

Title: MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation

Title: 3D-Consistent Image Inpainting with Diffusion Models

Title: XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference

Title: Accelerating Video Diffusion Models via Distribution Matching

Title: GBR: Generative Bundle Refinement for High-fidelity Gaussian Splatting and Meshing

Title: BiDM: Pushing the Limit of Quantization for Diffusion Models

Title: Enhanced 3D Generation by 2D Editing

Title: Accelerating Manufacturing Scale-Up from Material Discovery Using Agentic Web Navigation and Retrieval-Augmented AI for Process Engineering Schematics Design

Title: Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models

Title: Anti-Reference: Universal and Immediate Defense Against Reference-Based Generation

Title: Nested Diffusion Models Using Hierarchical Latent Priors

Title: Enhancing Content Representation for AR Image Quality Assessment Using Knowledge Distillation

Title: Post-hoc Probabilistic Vision-Language Models

Title: Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

Title: FlexDiT: Dynamic Token Density Control for Diffusion Transformer

Title: Latent-Reframe: Enabling Camera Control for Video Diffusion Model without Training

Title: GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis

Title: PowerMamba: A Deep State Space Model and Comprehensive Benchmark for Time Series Prediction in Electric Power Systems

Title: SGIA: Enhancing Fine-Grained Visual Classification with Sequence Generative Image Augmentation

Title: MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization

Title: AgentAlign: Misalignment-Adapted Multi-Agent Perception for Resilient Inter-Agent Sensor Correlations

Title: Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters

Title: An Effective and Resilient Backdoor Attack Framework against Deep Neural Networks and Vision Transformers

Title: ASGDiffusion: Parallel High-Resolution Generation with Asynchronous Structure Guidance

Title: AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement

Title: Towards Long Video Understanding via Fine-detailed Video Story Generation

Title: You KAN Do It in a Single Shot: Plug-and-Play Methods with Single-Instance Priors

Title: Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment

Title: MSCrackMamba: Leveraging Vision Mamba for Crack Detection in Fused Multispectral Imagery

Title: Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction

Title: VariFace: Fair and Diverse Synthetic Dataset Generation for Face Recognition

Title: Flow Matching Guide and Code

Title: Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction

Title: Neural Garment Dynamic Super-Resolution

Title: LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations

Title: HAIFAI: Human-AI Collaboration for Mental Face Reconstruction

Title: Normalizing Flows are Capable Generative Models

Title: UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts

Title: Exploring Memorization and Copyright Violation in Frontier LLMs: A Study of the New York Times v. OpenAI 2023 Lawsuit

Title: Exploring the Impact of Synthetic Data on Human Gesture Recognition Tasks Using GANs

Title: Generative Lines Matching Models

Title: World-Consistent Data Generation for Vision-and-Language Navigation

Title: How Certain are Uncertainty Estimates? Three Novel Earth Observation Datasets for Benchmarking Uncertainty Quantification in Machine Learning

Title: AnomalyControl: Learning Cross-modal Semantic Features for Controllable Anomaly Synthesis

Title: When Dimensionality Reduction Meets Graph (Drawing) Theory: Introducing a Common Framework, Challenges and Opportunities

Title: MVReward: Better Aligning and Evaluating Multi-View Diffusion Models with Human Preferences

Title: Copyright-Protected Language Generation via Adaptive Model Fusion

Title: The Narrow Gate: Localized Image-Text Communication in Vision-Language Models

Title: Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion

Title: ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance

Title: EMOv2: Pushing 5M Vision Model Frontier

Title: Exploring Critical Testing Scenarios for Decision-Making Policies: An LLM Approach

Title: Gen-3Diffusion: Realistic Image-to-3D Generation via 2D & 3D Diffusion Synergy

Title: You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale

Title: Parkinson's Disease Diagnosis Through Deep Learning: A Novel LSTM-Based Approach for Freezing of Gait Detection

Title: ContRail: A Framework for Realistic Railway Image Synthesis using ControlNet

Title: Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models

Title: InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention

Title: Ranking-aware adapter for text-driven image ordering with CLIP

Title: Visual Lexicon: Rich Image Features in Language Space

Title: Diverse Score Distillation

Title: Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation

Title: Tactile DreamFusion: Exploiting Tactile Sensing for 3D Generation

Title: Retrieving Semantics from the Deep: an RAG Solution for Gesture Synthesis

Title: [MASK] is All You Need