2025-12-30

Title: Towards Unsupervised Causal Representation Learning via Latent Additive Noise Model Causal Autoencoders

Title: SoliReward: Mitigating Susceptibility to Reward Hacking and Annotation Noise in Video Generation Reward Models

Title: Wireless Traffic Prediction with Large Language Model

Title: ReGAIN: Retrieval-Grounded AI Framework for Network Traffic Analysis

Title: Calibrating LLM Judges: Linear Probes for Fast and Reliable Uncertainty Estimation

Title: The Physics Constraint Paradox: When Removing Explicit Constraints Improves Physics-Informed Data for Machine Learning

Title: Human-Aligned Generative Perception: Bridging Psychophysics and Generative Models

Title: GeCo: A Differentiable Geometric Consistency Metric for Video Generation

Title: The Illusion of Clinical Reasoning: A Benchmark Reveals the Pervasive Gap in Vision-Language Models for Clinical Competency

Title: Cluster Aggregated GAN (CAG): A Cluster-Based Hybrid Model for Appliance Pattern Generation

Title: Co-GRPO: Co-Optimized Group Relative Policy Optimization for Masked Diffusion Model

Title: A Three-Level Alignment Framework for Large-Scale 3D Retrieval and Controlled 4D Generation

Title: Real-Time In-Cabin Driver Behavior Recognition on Low-Cost Edge Hardware

Title: MoFu: Scale-Aware Modulation and Fourier Fusion for Multi-Subject Video Generation

Title: LangPrecip: Language-Aware Multimodal Precipitation Nowcasting

Title: DeMoGen: Towards Decompositional Human Motion Generation with Energy-Based Diffusion Models

Title: Self-Evaluation Unlocks Any-Step Text-to-Image Generation

Title: iOSPointMapper: RealTime Pedestrian and Accessibility Mapping with Mobile AI

Title: DeFloMat: Detection with Flow Matching for Stable and Efficient Generative Object Localization

Title: EmoCtrl: Controllable Emotional Image Content Generation

Title: Towards Robust Optical-SAR Object Detection under Missing Modalities: A Dynamic Quality-Aware Fusion Framework

Title: Pose-Guided Residual Refinement for Interpretable Text-to-Motion Generation and Editing

Title: Decomposing Task Vectors for Refined Model Editing

Title: DreamOmni3: Scribble-based Editing and Generation

Title: CoAgent: Collaborative Planning and Consistency Agent for Coherent Video Generation

Title: Self-Rewarded Multimodal Coherent Reasoning Across Diverse Visual Domains

Title: Energy-Guided Flow Matching Enables Few-Step Conformer Generation and Ground-State Identification

Title: PTalker: Personalized Speech-Driven 3D Talking Head Animation via Style Disentanglement and Modality Alignment

Title: Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone

Title: Rethinking Memory Design in SAM-Based Visual Object Tracking

Title: Envision: Embodied Visual Planning via Goal-Imagery Video Diffusion

Title: FinPercep-RM: A Fine-grained Reward Model and Co-evolutionary Curriculum for RL-based Real-world Super-Resolution

Title: Scaling Unverifiable Rewards: A Case Study on Visual Insights

Title: Visual Autoregressive Modelling for Monocular Depth Estimation

Title: Quantum Generative Models for Computational Fluid Dynamics: A First Exploration of Latent Space Learning in Lattice Boltzmann Simulations

Title: CritiFusion: Semantic Critique and Spectral Alignment for Faithful Text-to-Image Generation

Title: Autoregressive Flow Matching for Motion Prediction

Title: SCPainter: A Unified Framework for Realistic 3D Asset Insertion and Novel View Synthesis

Title: GRExplainer: A Universal Explanation Method for Temporal Graph Neural Networks

Title: Plug In, Grade Right: Psychology-Inspired AGIQA

Title: Parallel Diffusion Solver via Residual Dirichlet Policy Optimization

Title: ReDiF: Reinforced Distillation for Few Step Diffusion

Title: EgoReAct: Egocentric Video-Driven 3D Human Reaction Generation

Title: KANO: Kolmogorov-Arnold Neural Operator for Image Super-Resolution

Title: ByteLoom: Weaving Geometry-Consistent Human-Object Interactions through Progressive Curriculum Learning

Title: M-ErasureBench: A Comprehensive Multimodal Evaluation Benchmark for Concept Erasure in Diffusion Models

Title: Guided Path Sampling: Steering Diffusion Models Back on Track with Principled Path Guidance

Title: JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation

Title: ColaVLA: Leveraging Cognitive Latent Reasoning for Hierarchical Parallel Trajectory Planning in Autonomous Driving

Title: Learning Where to Focus: Density-Driven Guidance for Detecting Dense Tiny Objects

Title: FLOW: A Feedback-Driven Synthetic Longitudinal Dataset of Work and Wellbeing

Title: RealCamo: Boosting Real Camouflage Synthesis with Layout Controls and Textual-Visual Guidance

Title: Reverse Personalization

Title: A Low-Cost UAV Deep Learning Pipeline for Integrated Apple Disease Diagnosis, Freshness Assessment, and Fruit Detection

Title: How Much Data Is Enough? Uniform Convergence Bounds for Generative & Vision-Language Models under Low-Dimensional Structure

Title: PathoSyn: Imaging-Pathology MRI Synthesis via Disentangled Deviation Diffusion

Title: GaussianDWM: 3D Gaussian Driving World Model for Unified Scene Understanding and Multi-Modal Generation

Title: Task-oriented Learnable Diffusion Timesteps for Universal Few-shot Learning of Dense Tasks

Title: Bridging Your Imagination with Audio-Video Generation via a Unified Director

Title: Anomaly Detection by Effectively Leveraging Synthetic Images

Title: KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta

Title: RS-Prune: Training-Free Data Pruning at High Ratios for Efficient Remote Sensing Diffusion Foundation Models

Title: ASemConsist: Adaptive Semantic Feature Control for Training-Free Identity-Consistent Generation

Title: Plug-and-Play Fidelity Optimization for Diffusion Transformer Acceleration via Cumulative Error Minimization

Title: On the Inverse Flow Matching Problem in the One-Dimensional and Gaussian Cases

Title: CME-CAD: Heterogeneous Collaborative Multi-Expert Reinforcement Learning for CAD Code Generation

Title: SpatialMosaic: A Multiview VLM Dataset for Partial Visibility

Title: Post-Training Quantization of OpenPangu Models for Efficient Deployment on Atlas A2

Title: NeXT-IMDL: Build Benchmark for NeXT-Generation Image Manipulation Detection & Localization

Title: Diffusion priors enhanced velocity model building from time-lag images using a neural operator

Title: SoulX-LiveTalk Technical Report

Title: Bridging Cognitive Gap: Hierarchical Description Learning for Artistic Image Aesthetics Assessment

Title: DriveLaW: Unifying Planning and Video Generation in a Latent Driving World

Title: Direct Diffusion Score Preference Optimization via Stepwise Contrastive Policy-Pair Supervision

Title: RealX3D: A Physically-Degraded 3D Benchmark for Multi-view Visual Restoration and Reconstruction

Title: Stochastic Siamese MAE Pretraining for Longitudinal Medical Images

Title: CoFi-Dec: Hallucination-Resistant Decoding via Coarse-to-Fine Generative Feedback in Large Vision-Language Models

Title: Automated river gauge plate reading using a hybrid object detection and generative AI framework in the Limpopo River Basin

Title: Deterministic Image-to-Image Translation via Denoising Brownian Bridge Models with Dual Approximators

Title: HY-Motion 1.0: Scaling Flow Matching Models for Text-To-Motion Generation

Title: SC-Net: Robust Correspondence Learning via Spatial and Cross-Channel Context

Title: IdentityStory: Taming Your Identity-Preserving Generator for Human-Centric Story Generation

Title: Iterative Inference-time Scaling with Adaptive Frequency Steering for Image Super-Resolution

Title: AnyMS: Bottom-up Attention Decoupling for Layout-guided and Training-free Multi-subject Customization

Title: PurifyGen: A Risk-Discrimination and Semantic-Purification Model for Safe Text-to-Image Generation

Title: ThinkGen: Generalized Thinking for Visual Generation

Title: ProGuard: Towards Proactive Multimodal Safeguard

Title: LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

Title: Memorization in 3D Shape Generation: An Empirical Study

Title: OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding

Title: IDT: A Physically Grounded Transformer for Feed-Forward Multi-View Intrinsic Decomposition

Title: Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation

Title: Training AI Co-Scientists Using Rubric Rewards

Title: Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion