2026-03-11

Title: VisionCreator-R1: A Reflection-Enhanced Native Visual-Generation Agentic Model

Title: Are Expressive Encoders Necessary for Discrete Graph Generation?

Title: HECTOR: Hybrid Editable Compositional Object References for Video Generation

Title: Towards Visual Query Segmentation in the Wild

Title: A New Modeling to Feature Selection Based on the Fuzzy Rough Set Theory in Normal and Optimistic States on Hybrid Information Systems

Title: MEGC2026: Micro-Expression Grand Challenge on Visual Question Answering

Title: TIDE: Text-Informed Dynamic Extrapolation with Step-Aware Temperature Control for Diffusion Transformers

Title: Using Vision Language Foundation Models to Generate Plant Simulation Configurations via In-Context Learning

Title: SVG-EAR: Parameter-Free Linear Compensation for Sparse Video Generation via Error-aware Routing

Title: Diffusion-Based Authentication of Copy Detection Patterns: A Multimodal Framework with Printer Signature Conditioning

Title: The Coupling Within: Flow Matching via Distilled Normalizing Flows

Title: Spectral-Structured Diffusion for Single-Image Rain Removal

Title: GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models

Title: OmniEdit: A Training-free framework for Lip Synchronization and Audio-Visual Editing

Title: Chain of Event-Centric Causal Thought for Physically Plausible Video Generation

Title: Training-free Motion Factorization for Compositional Video Generation

Title: Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards

Title: QUSR: Quality-Aware and Uncertainty-Guided Image Super-Resolution Diffusion Model

Title: Rotation Equivariant Mamba for Vision Tasks

Title: Agentic AI as a Network Control-Plane Intelligence Layer for Federated Learning over 6G

Title: RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning

Title: Wrong Code, Right Structure: Learning Netlist Representations from Imperfect LLM-Generated RTL

Title: Progressive Split Mamba: Effective State Space Modelling for Image Restoration

Title: Latent-DARM: Bridging Discrete Diffusion And Autoregressive Models For Reasoning

Title: TubeMLLM: A Foundation Model for Topology Knowledge Exploration in Vessel-like Anatomy

Title: RAE-NWM: Navigation World Model in Dense Visual Representation Space

Title: When Detectors Forget Forensics: Blocking Semantic Shortcuts for Generalizable AI-Generated Image Detection

Title: ForgeDreamer: Industrial Text-to-3D Generation with Multi-Expert LoRA and Cross-View Hypergraph

Title: From Ideal to Real: Stable Video Object Removal under Imperfect Conditions

Title: CogBlender: Towards Continuous Cognitive Intervention in Text-to-Image Generation

Title: IntroSVG: Learning from Rendering Feedback for Text-to-SVG Generation via an Introspective Generator-Critic Framework

Title: Interactive 3D visualization of surface roughness predictions in additive manufacturing: A data-driven framework

Title: Reviving ConvNeXt for Efficient Convolutional Diffusion Models

Title: Prune Redundancy, Preserve Essence: Vision Token Compression in VLMs via Synergistic Importance-Diversity

Title: Component-Aware Sketch-to-Image Generation Using Self-Attention Encoding and Coordinate-Preserving Fusion

Title: Streaming Autoregressive Video Generation via Diagonal Distillation

Title: Probing the Reliability of Driving VLMs: From Inconsistent Responses to Grounded Temporal Reasoning

Title: Efficiently Aligning Draft Models via Parameter- and Data-Efficient Adaptation

Title: Towards Unified Multimodal Interleaved Generation via Group Relative Policy Optimization

Title: Compiler-First State Space Duality and Portable $O(1)$ Autoregressive Caching for Inference

Title: ParTY: Part-Guidance for Expressive Text-to-Motion Synthesis

Title: Physics-Driven 3D Gaussian Rendering for Zero-Shot MRI Super-Resolution

Title: Decoder-Free Distillation for Quantized Image Restoration

Title: Grounding Synthetic Data Generation With Vision and Language Models

Title: X-GS: An Extensible Open Framework Unifying 3DGS Architectures with Downstream Multimodal Models

Title: Well Log-Guided Synthesis of Subsurface Images from Sparse Petrography Data Using cGANs

Title: When to Lock Attention: Training-Free KV Control in Video Diffusion

Title: ActiveUltraFeedback: Efficient Preference Data Generation using Active Learning

Title: TriFusion-SR: Joint Tri-Modal Medical Image Fusion and SR

Title: FrameDiT: Diffusion Transformer with Frame-Level Matrix Attention for Efficient Video Generation

Title: LogoDiffuser: Training-Free Multilingual Logo Generation and Stylization via Letter-Aware Attention Control

Title: ConfCtrl: Enabling Precise Camera Control in Video Diffusion via Confidence-Aware Interpolation

Title: CarbonBench: A Global Benchmark for Upscaling of Carbon Fluxes Using Zero-Shot Learning

Title: InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing

Title: DISPLAY: Directable Human-Object Interaction Video Generation via Sparse Motion Guidance and Multi-Task Auxiliary

Title: Stepping VLMs onto the Court: Benchmarking Spatial Intelligence in Sports

Title: WikiCLIP: An Efficient Contrastive Baseline for Open-domain Visual Entity Recognition

Title: On the Structural Failure of Chamfer Distance in 3D Shape Optimization

Title: Adaptive Clinical-Aware Latent Diffusion for Multimodal Brain Image Generation and Missing Modality Imputation

Title: Generative Drifting is Secretly Score Matching: a Spectral and Variational Perspective

Title: Towards a Neural Debugger for Python