2026-03-31

Title: Language-Conditioned World Modeling for Visual Navigation

Title: From Diffusion To Flow: Efficient Motion Generation In MotionGPT3

Title: Survey on Remote Sensing Scene Classification: From Traditional Methods to Large Generative AI Models

Title: Generating Synthetic Wildlife Health Data from Camera Trap Imagery: A Pipeline for Alopecia and Body Condition Training Data

Title: Physics-Aware Diffusion for LiDAR Point Cloud Densification

Title: A training-free framework for high-fidelity appearance transfer via diffusion transformers

Title: Aesthetic Assessment of Chinese Handwritings Based on Vision Language Models

Title: From Content to Audience: A Multimodal Annotation Framework for Broadcast Television Analytics

Title: Elucidating the Design Space of Flow Matching for Cellular Microscopy

Title: MemGuard-Alpha: Detecting and Filtering Memorization-Contaminated Signals in LLM-Based Financial Forecasting via Membership Inference and Cross-Model Disagreement

Title: Gaussian Joint Embeddings For Self-Supervised Representation Learning

Title: Throughput Optimization as a Strategic Lever in Large-Scale AI Systems: Evidence from Dataloader and Memory Profiling Innovations

Title: Central-to-Local Adaptive Generative Diffusion Framework for Improving Gene Expression Prediction in Data-Limited Spatial Transcriptomics

Title: Envisioning global urban development with satellite imagery and generative AI

Title: Beyond Textual Knowledge: Leveraging Multimodal Knowledge Bases for Enhancing Vision-and-Language Navigation

Title: LACON: Training Text-to-Image Model from Uncurated Data

Title: Property-Guided Molecular Generation and Optimization via Latent Flows

Title: Strategic Candidacy in Generative AI Arenas

Title: Leveraging Avatar Fingerprinting: A Multi-Generator Photorealistic Talking-Head Public Database and Benchmark

Title: High dimensional theory of two-phase optimizers

Title: Probabilistic Forecasting of Localized Wildfire Spread Based on Conditional Flow Matching

Title: Generative Shape Reconstruction with Geometry-Guided Langevin Dynamics

Title: Unified Number-Free Text-to-Motion Generation Via Flow Matching

Title: Unsupervised Behavioral Compression: Learning Low-Dimensional Policy Manifolds through State-Occupancy Matching

Title: SceneExpander: Expanding 3D Scenes with Free-Form Inserted Views

Title: Hierarchy-Guided Topology Latent Flow for Molecular Graph Generation

Title: SJD-VP: Speculative Jacobi Decoding with Verification Prediction for Autoregressive Image Generation

Title: Spectral-Aware Text-to-Time Series Generation with Billion-Scale Multimodal Meteorological Data

Title: MotionRFT: Unified Reinforcement Fine-Tuning for Text-to-Motion Generation

Title: Let Triggers Control: Frequency-Aware Dropout for Effective Token Control

Title: Understanding and Mitigating Hallucinations in Multimodal Chain-of-Thought Models

Title: Make It Up: Fake Images, Real Gains in Generalized Few-shot Semantic Segmentation

Title: LightMover: Generative Light Movement with Color and Intensity Controls

Title: Seeing the Scene Matters: Revealing Forgetting in Video Understanding Models with a Scene-Aware Long-Video Benchmark

Title: TrendGen: An Outfit Recommendation and Display System

Title: Dual-Path Learning based on Frequency Structural Decoupling and Regional-Aware Fusion for Low-Light Image Super-Resolution

Title: Unsafe by Reciprocity: How Generation-Understanding Coupling Undermines Safety in Unified Multimodal Models

Title: Falcon Perception

Title: The Geometry of Harmful Intent: Training-Free Anomaly Detection via Angular Deviation in LLM Residual Streams

Title: Mind the Shape Gap: A Benchmark and Baseline for Deformation-Aware 6D Pose Estimation of Agricultural Produce

Title: SpatialStack: Layered Geometry-Language Fusion for 3D VLM Spatial Reasoning

Title: GIFT: Bootstrapping Image-to-CAD Program Synthesis via Geometric Feedback

Title: LOME: Learning Human-Object Manipulation with Action-Conditioned Egocentric World Model

Title: FlowRL: A Taxonomy and Modular Framework for Reinforcement Learning with Diffusion Policies

Title: KV Cache Quantization for Self-Forcing Video Generation: A 33-Method Empirical Study

Title: Variational Learning of Fractional Posteriors

Title: Understanding Semantic Perturbations on In-Processing Generative Image Watermarks

Title: TokenDial: Continuous Attribute Control in Text-to-Video via Spatiotemporal Token Offsets

Title: OmniColor: A Unified Framework for Multi-modal Lineart Colorization

Title: LongCat-Next: Lexicalizing Modalities as Discrete Tokens

Title: Annotation-Free Detection of Drivable Areas and Curbs Leveraging LiDAR Point Cloud Maps

Title: A Robust Low-Rank Prior Model for Structured Cartoon-Texture Image Decomposition with Heavy-Tailed Noise

Title: You Only Erase Once: Erasing Anything without Bringing Unexpected Content

Title: OPRO: Orthogonal Panel-Relative Operators for Panel-Aware In-Context Image Generation

Title: Test-Time Instance-Specific Parameter Composition: A New Paradigm for Adaptive Generative Modeling

Title: Gated Condition Injection without Multimodal Attention: Towards Controllable Linear-Attention Transformers

Title: Customized Visual Storytelling with Unified Multimodal LLMs

Title: LVRPO: Language-Visual Alignment with GRPO for Multimodal Understanding and Generation

Title: Can Unsupervised Segmentation Reduce Annotation Costs for Video Semantic Segmentation?

Title: Look, Compare and Draw: Differential Query Transformer for Automatic Oil Painting

Title: TIR-Agent: Training an Explorative and Efficient Agent for Image Restoration

Title: AI-Powered Facial Mask Removal Is Not Suitable For Biometric Identification

Title: When Surfaces Lie: Exploiting Wrinkle-Induced Attention Shift to Attack Vision-Language Models

Title: Inference-time Trajectory Optimization for Manga Image Editing

Title: What-If Explanations Over Time: Counterfactuals for Time Series Classification

Title: Towards Emotion Recognition with 3D Pointclouds Obtained from Facial Expression Images

Title: Diversity Matters: Dataset Diversification and Dual-Branch Network for Generalized AI-Generated Image Detection

Title: Wan-R1: Verifiable-Reinforcement Learning for Video Reasoning

Title: SAGE: Sink-Aware Grounded Decoding for Multimodal Hallucination Mitigation

Title: ATLAS-RTC: Closing the Loop on LLM Agent Output with Token-Level Runtime Control

Title: FlashSign: Pose-Free Guidance for Efficient Sign Language Video Generation

Title: ForestSim: A Synthetic Benchmark for Intelligent Vehicle Perception in Unstructured Forest Environments

Title: Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute

Title: MathGen: Revealing the Illusion of Mathematical Competence through Text-to-Image Generation

Title: RetinexDualV2: Physically-Grounded Dual Retinex for Generalized UHD Image Restoration

Title: From Independent to Correlated Diffusion: Generalized Generative Modeling with Probabilistic Computers

Title: Drift-AR: Single-Step Visual Autoregressive Generation via Anti-Symmetric Drifting

Title: From Vessel Trajectories to Safety-Critical Encounter Scenarios: A Generative AI Framework for Autonomous Ship Digital Testing

Title: AIBench: Evaluating Visual-Logical Consistency in Academic Illustration Generation

Title: SIMR-NO: A Spectrally-Informed Multi-Resolution Neural Operator for Turbulent Flow Super-Resolution

Title: LogiStory: A Logic-Aware Framework for Multi-Image Story Visualization

Title: GEMS: Agent-Native Multimodal Generation with Memory and Skills

Title: Heddle: A Distributed Orchestration System for Agentic RL Rollout

Title: ORACAL: A Robust and Explainable Multimodal Framework for Smart Contract Vulnerability Detection with Causal Graph Enrichment

Title: ColorFLUX: A Structure-Color Decoupling Framework for Old Photo Colorization

Title: Automating Early Disease Prediction Via Structured and Unstructured Clinical Data

Title: ToLL: Topological Layout Learning with Structural Multi-view Augmentation for 3D Scene Graph Pretraining

Title: MR-ImagenTime: Multi-Resolution Time Series Generation through Dual Image Representations

Title: Integrating Multimodal Large Language Model Knowledge into Amodal Completion

Title: VistaGEN: Consistent Driving Video Generation with Fine-Grained Control Using Multiview Visual-Language Reasoning

Title: AutoCut: End-to-end advertisement video editing based on multimodal discretization and controllable generation

Title: Rethinking Structure Preservation in Text-Guided Image Editing with Visual Autoregressive Models

Title: EdgeDiT: Hardware-Aware Diffusion Transformers for Efficient On-Device Image Generation

Title: Unified Restoration-Perception Learning: Maritime Infrared-Visible Image Fusion and Segmentation

Title: Evolutionary Discovery of Reinforcement Learning Algorithms via Large Language Models

Title: $R_{dm}$: Re-conceptualizing Distribution Matching as a Reward for Diffusion Distillation

Title: CiQi-Agent: Aligning Vision, Tools and Aesthetics in Multimodal Agent for Cultural Reasoning on Chinese Porcelains

Title: ConceptWeaver: Weaving Disentangled Concepts with Flow

Title: Generalizable Detection of AI Generated Images with Large Models and Fuzzy Decision Tree

Title: Seen2Scene: Completing Realistic 3D Scenes with Visibility-Guided Flow

Title: Hydra: Unifying Document Retrieval and Generation in a Single Vision-Language Model

Title: Unrestrained Simplex Denoising for Discrete Data. A Non-Markovian Approach Applied to Graph Generation

Title: ORSIFlow: Saliency-Guided Rectified Flow for Optical Remote Sensing Salient Object Detection

Title: TGIF2: Extended Text-Guided Inpainting Forgery Dataset & Benchmark

Title: Divide and Restore: A Modular Task-Decoupled Framework for Universal Image Restoration

Title: DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing

Title: Stepwise Credit Assignment for GRPO on Flow-Matching Models

Title: SonoWorld: From One Image to a 3D Audio-Visual Scene

Title: On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers

Title: PoseDreamer: Scalable and Photorealistic Human Data Generation Pipeline with Diffusion Models

Title: HandX: Scaling Bimanual Motion and Interaction Generation

Title: Gen-Searcher: Reinforcing Agentic Search for Image Generation