2025-04-01

Title: PowerGNN: A Topology-Aware Graph Neural Network for Electricity Grids

Title: A Spatial-temporal Deep Probabilistic Diffusion Model for Reliable Hail Nowcasting with Radar Echo Extrapolation

Title: Reasoning Beyond Limits: Advances and Open Problems for LLMs

Title: Cyborg Data: Merging Human with AI Generated Training Data

Title: Uncertainty-Aware Graph Self-Training with Expectation-Maximization Regularization

Title: Graph-Based Uncertainty-Aware Self-Training with Stochastic Node Labeling

Title: Ignite Forecasting with SPARK: An Efficient Generative Framework for Refining LLMs in Temporal Knowledge Graph Forecasting

Title: Patronus: Bringing Transparency to Diffusion Models with Prototypes

Title: DiTFastAttnV2: Head-wise Attention Compression for Multi-Modality Diffusion Transformers

Title: SIGHT: Single-Image Conditioned Generation of Hand Trajectories for Hand-Object Interaction

Title: Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models

Title: AutoComPose: Automatic Generation of Pose Transition Descriptions for Composed Pose Retrieval Using Multimodal LLMs

Title: Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models

Title: Bi-Level Multi-View Fuzzy Clustering with Exponential Distance

Title: From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D

Title: indiSplit: Bringing Severity Cognizance to Image Decomposition in Fluorescence Microscopy

Title: On Geometrical Properties of Text Token Embeddings for Strong Semantic Binding in Text-to-Image Generation

Title: MeshCraft: Exploring Efficient and Controllable Mesh Generation with Flow-based DiTs

Title: Shape and Texture Recognition in Large Vision-Language Models

Title: Efficient Explicit Joint-level Interaction Modeling with Mamba for Text-guided HOI Generation

Title: Evaluating Compositional Scene Understanding in Multimodal Generative Models

Title: Can DeepSeek-V3 Reason Like a Surgeon? An Empirical Evaluation for Vision-Language Understanding in Robotic-Assisted Surgery

Title: Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

Title: A GAN-Enhanced Deep Learning Framework for Rooftop Detection from Historical Aerial Imagery

Title: Synthetic Art Generation and DeepFake Detection: A Study on Jamini Roy Inspired Dataset

Title: Citegeist: Automated Generation of Related Work Analysis on the arXiv Corpus

Title: SalesRLAgent: A Reinforcement Learning Approach for Real-Time Sales Conversion Prediction and Optimization

Title: MoCha: Towards Movie-Grade Talking Character Synthesis

Title: HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation

Title: TraceMark-LDM: Authenticatable Watermarking for Latent Diffusion Models via Binary-Guided Rearrangement

Title: Object Isolated Attention for Consistent Story Visualization

Title: ControlFusion: A Controllable Image Fusion Framework with Language-Vision Degradation Prompts

Title: VideoFusion: A Spatio-Temporal Collaborative Network for Multi-modal Video Fusion and Restoration

Title: FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning

Title: Towards Physically Plausible Video Generation via VLM Planning

Title: Map Feature Perception Metric for Map Generation Quality Assessment and Loss Optimization

Title: JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization

Title: COSMIC: Clique-Oriented Semantic Multi-space Integration for Robust CLIP Test-Time Adaptation

Title: A Large Scale Analysis of Gender Biases in Text-to-Image Generative Models

Title: Diffusion Meets Few-shot Class Incremental Learning

Title: GMapLatent: Geometric Mapping in Latent Space

Title: VideoGen-Eval: Agent-based System for Video Generation Evaluation

Title: Efficient Token Compression for Vision Transformer with Spatial Information Preserved

Title: TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes

Title: OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model

Title: A Survey on Unlearnable Data

Title: Enhancing Creative Generation on Stable Diffusion-based Models

Title: DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution

Title: Make Autoregressive Great Again: Diffusion-Free Graph Generation with Next-Scale Prediction

Title: Graph-Eq: Discovering Mathematical Equations using Graph Generative Models

Title: Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging

Title: Language-Guided Trajectory Traversal in Disentangled Stable Diffusion Latent Space for Factorized Medical Image Generation

Title: DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance

Title: Context-Independent OCR with Multimodal LLMs: Effects of Image Resolution and Visual Complexity

Title: Expanding-and-Shrinking Binary Neural Networks

Title: HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation

Title: Effective Cloud Removal for Remote Sensing Images by an Improved Mean-Reverting Denoising Model with Elucidated Design Space

Title: KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language

Title: Every Painting Awakened: A Training-free Framework for Painting-to-Animation Generation

Title: Time-Series Forecasting via Topological Information Supervised Framework with Efficient Topological Feature Learning

Title: Accelerating High-Efficiency Organic Photovoltaic Discovery via Pretrained Graph Neural Networks and Generative Reinforcement Learning

Title: On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices

Title: Learned Image Compression and Restoration for Digital Pathology

Title: MuseFace: Text-driven Face Editing via Diffusion-based Mask Generation Approach

Title: DiffScale: Continuous Downscaling and Bias Correction of Subseasonal Wind Speed Forecasts using Diffusion Models

Title: Boosting MLLM Reasoning with Text-Debiased Hint-GRPO

Title: FineCausal: A Causal-Based Framework for Interpretable Fine-Grained Action Quality Assessment

Title: Green MLOps to Green GenOps: An Empirical Study of Energy Consumption in Discriminative and Generative AI Operations

Title: JointTuner: Appearance-Motion Adaptive Joint Training for Customized Video Generation

Title: AirCache: Activating Inter-modal Relevancy KV Cache Compression for Efficient Large Vision-Language Model Inference

Title: Local Information Matters: Inference Acceleration For Grounded Conversation Generation Models Through Adaptive Local-Aware Token Pruning

Title: DenseFormer: Learning Dense Depth Map from Sparse Depth and Image via Conditional Diffusion Model

Title: HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation

Title: TransMamba: Flexibly Switching between Transformer and Mamba

Title: DANTE-AD: Dual-Vision Attention Network for Long-Term Audio Description

Title: Level the Level: Balancing Game Levels for Asymmetric Player Archetypes With Reinforcement Learning

Title: Learning a Canonical Basis of Human Preferences from Binary Ratings

Title: Predicting Targeted Therapy Resistance in Non-Small Cell Lung Cancer Using Multimodal Machine Learning

Title: Many-to-Many Matching via Sparsity Controlled Optimal Transport

Title: Pre-training with 3D Synthetic Data: Learning 3D Point Cloud Instance Segmentation from 3D Synthetic Scenes

Title: Beyond a Single Mode: GAN Ensembles for Diverse Medical Data Generation

Title: FakeScope: Large Multimodal Expert Model for Transparent AI-Generated Image Forensics

Title: Visual Acoustic Fields

Title: Learning Velocity and Acceleration: Self-Supervised Motion Consistency for Pedestrian Trajectory Prediction

Title: Style Quantization for Data-Efficient GAN Training

Title: PathOrchestra: A Comprehensive Foundation Model for Computational Pathology with Over 100 Diverse Clinical-Grade Tasks

Title: ORAL: Prompting Your Large-Scale LoRAs via Conditional Recurrent Diffusion

Title: InstructRestore: Region-Customized Image Restoration with Human Instructions

Title: Effectively Controlling Reasoning Models through Thinking Intervention

Title: Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1

Title: Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation

Title: Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views

Title: Consistent Subject Generation via Contrastive Instantiated Concepts