2026-03-19

Title: Multi-Modal Multi-Agent Reinforcement Learning for Radiology Report Generation: Radiologist-Like Workflow with Clinically Verifiable Rewards

Title: Script-to-Slide Grounding: Grounding Script Sentences to Slide Objects for Automatic Instructional Video Generation

Title: AgriChat: A Multimodal Large Language Model for Agriculture Image Understanding

Title: TDMM-LM: Bridging Facial Understanding and Animation via Language Models

Title: KGS-GCN: Enhancing Sparse Skeleton Sensing via Kinematics-Driven Gaussian Splatting and Probabilistic Topology for Action Recognition

Title: Omni IIE Bench: Benchmarking the Practical Capabilities of Image Editing Models

Title: PhysQuantAgent: An Inference Pipeline of Mass Estimation for Vision-Language Models

Title: Do Understanding and Generation Fight? A Diagnostic Study of DPO for Unified Multimodal Models

Title: SCE-LITE-HQ: Smooth visual counterfactual explanations with generative foundation models

Title: Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models

Title: Early Quantization Shrinks Codebook: A Simple Fix for Diversity-Preserving Tokenization

Title: PaAgent: Portrait-Aware Image Restoration Agent via Subjective-Objective Reinforcement Learning

Title: CircuitBuilder: From Polynomials to Circuits via Reinforcement Learning

Title: SENSE: Efficient EEG-to-Text via Privacy-Preserving Semantic Retrieval

Title: Pixel-level Counterfactual Contrastive Learning for Medical Image Segmentation

Title: MosaicMem: Hybrid Spatial Memory for Controllable Video World Models

Title: SMAL-pets: SMAL Based Avatars of Pets from Single Image

Title: Catching rationalization in the act: detecting motivated reasoning before and after CoT via activation probing

Title: GigaWorld-Policy: An Efficient Action-Centered World-Action Model

Title: Variational Rectification Inference for Learning with Noisy Labels

Title: Directing the Narrative: A Finetuning Method for Controlling Coherence and Style in Story Generation

Title: WINFlowNets: Warm-up Integrated Networks Training of Generative Flow Networks for Robotics and Machine Fault Adaptation

Title: Stereo World Model: Camera-Guided Stereo Video Generation

Title: SCALE: Scalable Conditional Atlas-Level Endpoint transport for virtual cell perturbation prediction

Title: Cohomological Obstructions to Global Counterfactuals: A Sheaf-Theoretic Foundation for Generative Causal Models

Title: The Causal Uncertainty Principle: Manifold Tearing and the Topological Limits of Counterfactual Interventions

Title: Toward Phonology-Guided Sign Language Motion Generation: A Diffusion Baseline and Conditioning Analysis

Title: Harnessing the Power of Foundation Models for Accurate Material Classification

Title: Motion-Adaptive Temporal Attention for Lightweight Video Generation with Stable Diffusion

Title: Large-Scale 3D Ground-Motion Synthesis with Physics-Inspired Latent Operator Flow Matching

Title: Joint Degradation-Aware Arbitrary-Scale Super-Resolution for Variable-Rate Extreme Image Compression

Title: ECHO: Towards Emotionally Appropriate and Contextually Aware Interactive Head Generation

Title: FACE-net: Factual Calibration and Emotion Augmentation for Retrieval-enhanced Emotional Video Captioning

Title: AR-CoPO: Align Autoregressive Video Generation with Contrastive Policy Optimization

Title: UniSAFE: A Comprehensive Benchmark for Safety Evaluation of Unified Multimodal Models

Title: Omni-I2C: A Holistic Benchmark for High-Fidelity Image-to-Code Generation

Title: Translation Invariance of Neural Operators for the FitzHugh-Nagumo Model

Title: ProGVC: Progressive-based Generative Video Compression via Auto-Regressive Context Modeling

Title: FrescoDiffusion: 4K Image-to-Video with Prior-Regularized Tiled Diffusion

Title: Face anonymization preserving facial expressions and photometric realism

Title: FoMo X: Modular Explainability Signals for Outlier Detection Foundation Models

Title: Edit-As-Act: Goal-Regressive Planning for Open-Vocabulary 3D Indoor Scene Editing

Title: ReLaGS: Relational Language Gaussian Splatting

Title: Benchmarking Reinforcement Learning via Stochastic Converse Optimality: Generating Systems with Known Optimal Policies

Title: DSS-GAN: Directional State Space GAN with Mamba backbone for Class-Conditional Image Synthesis

Title: Anchoring and Rescaling Attention for Semantically Coherent Inbetweening

Title: Few-Step Diffusion Sampling Through Instance-Aware Discretizations

Title: Flow Matching Policy with Entropy Regularization

Title: Learning Transferable Temporal Primitives for Video Reasoning via Synthetic Videos

Title: DiffVP: Differential Visual Semantic Prompting for LLM-Based CT Report Generation

Title: TAPESTRY: From Geometry to Appearance via Consistent Turntable Videos

Title: Towards Infinitely Long Neural Simulations: Self-Refining Neural Surrogate Models for Dynamical Systems

Title: ChopGrad: Pixel-Wise Losses for Latent Video Diffusion via Truncated Backpropagation

Title: Symmetry-Reduced Physics-Informed Learning of Tensegrity Dynamics

Title: Steering Video Diffusion Transformers with Massive Activations

Title: TINA: Text-Free Inversion Attack for Unlearned Text-to-Image Diffusion Models

Title: Omni-3DEdit: Generalized Versatile 3D Editing in One-Pass

Title: Revisiting foundation models for cell instance segmentation

Title: Physics-Aware Machine Learning for Seismic and Volcanic Signal Interpretation

Title: Procedural Generation of Algorithm Discovery Tasks in Machine Learning

Title: Differential Attention-Augmented BiomedCLIP with Asymmetric Focal Optimization for Imbalanced Multi-Label Video Capsule Endoscopy Classification

Title: Identity as Presence: Towards Appearance and Voice Personalized Joint Audio-Video Generation

Title: A Creative Agent is Worth a 64-Token Template

Title: SegFly: A 2D-3D-2D Paradigm for Aerial RGB-Thermal Semantic Segmentation at Scale

Title: TransText: Transparency Aware Image-to-Video Typography Animation

Title: LaDe: Unified Multi-Layered Graphic Media Generation and Decomposition

Title: Versatile Editing of Video Content, Actions, and Dynamics without Training

Title: LoST: Level of Semantics Tokenization for 3D Shapes

Title: The Unreasonable Effectiveness of Text Embedding Interpolation for Continuous Image Steering

Title: EchoGen: Cycle-Consistent Learning for Unified Layout-Image Generation and Understanding