2025-12-04

Title: Safe and Sustainable Electric Bus Charging Scheduling with Constrained Hierarchical DRL

Title: Optimizing Life Sciences Agents in Real-Time using Reinforcement Learning

Title: Mitigating Intra- and Inter-modal Forgetting in Continual Learning of Unified Multimodal Models

Title: Atomic Diffusion Models for Small Molecule Structure Elucidation from NMR Spectra

Title: Does Head Pose Correction Improve Biometric Facial Recognition?

Title: SPARK: Stepwise Process-Aware Rewards for Reference-Free Reinforcement Learning

Title: PyroFocus: A Deep Learning Approach to Real-Time Wildfire Detection in Multispectral Remote Sensing Imagery

Title: Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs

Title: Step-by-step Layered Design Generation

Title: HalluGen: Synthesizing Realistic and Controllable Hallucinations for Evaluating Image Restoration

Title: SeeU: Seeing the Unseen World via 4D Dynamics-aware Generation

Title: FireSentry: A Multi-Modal Spatio-temporal Benchmark Dataset for Fine-Grained Wildfire Spread Forecasting

Title: MAGE-ID: A Multimodal Generative Framework for Intrusion Detection Systems

Title: MOS: Mitigating Optical-SAR Modality Gap for Cross-Modal Ship Re-Identification

Title: Multi-Aspect Knowledge-Enhanced Medical Vision-Language Pretraining with Multi-Agent Data Generation

Title: KeyPointDiffuser: Unsupervised 3D Keypoint Learning via Latent Diffusion Models

Title: GalaxyDiT: Efficient Video Generation with Guidance Alignment and Adaptive Proxy in Diffusion Transformers

Title: GeoVideo: Introducing Geometric Regularization into Video Generation Model

Title: Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles

Title: Text-Printed Image: Bridging the Image-Text Modality Gap for Text-centric Training of Large Vision-Language Models

Title: Joint Progression Modeling (JPM): A Probabilistic Framework for Mixed-Pathology Progression

Title: Towards Object-centric Understanding for Instructional Videos

Title: CSMapping: Scalable Crowdsourced Semantic Mapping and Topology Inference for Autonomous Driving

Title: FloodDiffusion: Tailored Diffusion Forcing for Streaming Motion Generation

Title: Adaptive sampling using variational autoencoder and reinforcement learning

Title: OpenTrack3D: Towards Accurate and Generalizable Open-Vocabulary 3D Instance Segmentation

Title: Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation

Title: CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image Generation

Title: Towards Irreversible Machine Unlearning for Diffusion Models

Title: GAOT: Generating Articulated Objects Through Text-Guided Diffusion Models

Title: Beyond Boundary Frames: Audio-Visual Semantic Guidance for Context-Aware Video Interpolation

Title: Harnessing Hypergraphs in Geometric Deep Learning for 3D RNA Inverse Folding

Title: LAMP: Language-Assisted Motion Planning for Controllable Video Generation

Title: ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation

Title: The promising potential of vision language models for the generation of textual weather forecasts

Title: Dynamically Scaled Activation Steering

Title: Structured Uncertainty Similarity Score (SUSS): Learning a Probabilistic, Interpretable, Perceptual Metric Between Images

Title: PosA-VLA: Enhancing Action Generation via Pose-Conditioned Anchor Attention

Title: LSRS: Latent Scale Rejection Sampling for Visual Autoregressive Modeling

Title: CoDA: From Text-to-Image Diffusion Models to Training-Free Dataset Distillation

Title: PULSE: A Unified Multi-Task Architecture for Cardiac Segmentation, Diagnosis, and Few-Shot Cross-Modality Clinical Adaptation

Title: Traffic Image Restoration under Adverse Weather via Frequency-Aware Mamba

Title: Automatic Attack Discovery for Few-Shot Class-Incremental Learning via Large Language Models

Title: Probabilistic Foundations of Fuzzy Simplicial Sets for Nonlinear Dimensionality Reduction

Title: UniMo: Unifying 2D Video and 3D Human Motion with an Autoregressive Framework

Title: Beyond the Ground Truth: Enhanced Supervision for Image Restoration

Title: Technical Report on Text Dataset Distillation

Title: BlurDM: A Blur Diffusion Model for Image Deblurring

Title: DirectDrag: High-Fidelity, Mask-Free, Prompt-Free Drag-based Image Editing via Readout-Guided Feature Alignment

Title: Highly Efficient Test-Time Scaling for T2I Diffusion Models with Text Embedding Perturbation

Title: C3G: Learning Compact 3D Representations with 2K Gaussians

Title: PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation

Title: Fast & Efficient Normalizing Flows and Applications of Image Generative Models

Title: RELIC: Interactive Video World Model with Long-Horizon Memory

Title: MarkTune: Improving the Quality-Detectability Trade-off in Open-Weight LLM Watermarking

Title: Stable Signer: Hierarchical Sign Language Generative Model

Title: PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design

Title: SimFlow: Simplified and End-to-End Training of Latent Normalizing Flows