2025-07-18

Title: MindJourney: Test-Time Scaling with World Models for Spatial Reasoning

Title: Assay2Mol: large language model-based drug design using BioAssay context

Title: Reconstruct, Inpaint, Finetune: Dynamic Novel-view Synthesis from Monocular Videos

Title: PinFM: Foundation Model for User Activity Sequences at a Billion-scale Visual Discovery Platform

Title: AudioJudge: Understanding What Works in Large Audio Model Based Speech Evaluation

Title: Multimodal-Guided Dynamic Dataset Pruning for Robust and Efficient Data-Centric Learning

Title: World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving

Title: AnyPos: Automated Task-Agnostic Actions for Bimanual Manipulation

Title: Local Representative Token Guided Merging for Text-to-Image Generation

Title: A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models

Title: ATL-Diff: Audio-Driven Talking Head Generation with Early Landmarks-Guide Noise Diffusion

Title: Semantic-guided Fine-tuning of Foundation Model for Long-tailed Visual Recognition

Title: Large Language Models' Internal Perception of Symbolic Music

Title: AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning

Title: Generalist Bimanual Manipulation via Foundation Video Diffusion Models

Title: Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models

Title: DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization

Title: LoViC: Efficient Long Video Generation with Context Compression

Title: FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers

Title: RGB Pre-Training Enhanced Unobservable Feature Latent Diffusion Model for Spectral Reconstruction

Title: A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints

Title: Beyond Fully Supervised Pixel Annotations: Scribble-Driven Weakly-Supervised Framework for Image Manipulation Localization

Title: Fault detection and diagnosis for the engine electrical system of a space launcher based on a temporal convolutional autoencoder and calibrated classifiers

Title: Label-Consistent Dataset Distillation with Detector-Guided Refinement

Title: DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model

Title: Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction

Title: R^2MoE: Redundancy-Removal Mixture of Experts for Lifelong Concept Learning

Title: 3DKeyAD: High-Resolution 3D Point Cloud Anomaly Detection via Keypoint-Guided Point Clustering

Title: A Computational Framework to Identify Self-Aspects in Text

Title: NGTM: Substructure-based Neural Graph Topic Model for Interpretable Graph Generation

Title: Assessing the Reliability of LLMs Annotations in the Context of Demographic Bias and Model Explanation

Title: DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model

Title: MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling

Title: Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection

Title: Leveraging Pre-Trained Visual Models for AI-Generated Video Detection

Title: VITA: Vision-to-Action Flow Matching Policy

Title: Enhancing Cross-task Transfer of Large Language Models via Activation Steering

Title: DiffClean: Diffusion-based Makeup Removal for Accurate Age Estimation

Title: FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization

Title: Taming Diffusion Transformer for Real-Time Mobile Video Generation

Title: Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models

Title: Hierarchical Rectified Flow Matching with Mini-Batch Couplings