2025-05-06

Title: Multi-party Collaborative Attention Control for Image Customization

Title: Deconstructing Bias: A Multifaceted Framework for Diagnosing Cultural and Compositional Inequities in Text-to-Image Generative Models

Title: Global Stress Generation and Spatiotemporal Super-Resolution Physics-Informed Operator under Dynamic Loading for Two-Phase Random Materials

Title: OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models

Title: COSMOS: Predictable and Cost-Effective Adaptation of LLMs

Title: VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos

Title: Explainable Machine Learning for Cyberattack Identification from Traffic Flows

Title: WorldGenBench: A World-Knowledge-Integrated Benchmark for Reasoning-Driven Text-to-Image Generation

Title: The DCR Delusion: Measuring the Privacy Risk of Synthetic Data

Title: Contextures: Representations from Contexts

Title: A Sensor Agnostic Domain Generalization Framework for Leveraging Geospatial Foundation Models: Enhancing Semantic Segmentation via Synergistic Pseudo-Labeling and Generative Learning

Title: PainFormer: a Vision Foundation Model for Automatic Pain Assessment

Title: Focal-SAM: Focal Sharpness-Aware Minimization for Long-Tailed Classification

Title: Knowledge-Augmented Language Models Interpreting Structured Chest X-Ray Findings

Title: Vision and Intention Boost Large Language Model in Long-Term Action Anticipation

Title: PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth

Title: Co$^{3}$Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion

Title: Context-Aware Online Conformal Anomaly Detection with Prediction-Powered Data Acquisition

Title: Distinguishing AI-Generated and Human-Written Text Through Psycholinguistic Analysis

Title: $\textit{New News}$: System-2 Fine-tuning for Robust Integration of New Knowledge

Title: Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning

Title: PhytoSynth: Leveraging Multi-modal Generative Models for Crop Disease Data Generation with Novel Benchmarking and Prompt Engineering Approach

Title: DualDiff: Dual-branch Diffusion Model for Autonomous Driving with Semantic Fusion

Title: Rethinking Score Distilling Sampling for 3D Editing and Generation

Title: BOOM: Benchmarking Out-Of-distribution Molecular Property Predictions of Machine Learning Models

Title: MC3D-AD: A Unified Geometry-aware Reconstruction Model for Multi-category 3D Anomaly Detection

Title: Lifelong Whole Slide Image Analysis: Online Vision-Language Adaptation and Past-to-Present Gradient Distillation

Title: Always Skip Attention

Title: GraphPrompter: Multi-stage Adaptive Prompt Optimization for Graph In-Context Learning

Title: Secrets of GFlowNets' Learning Behavior: A Theoretical Study

Title: Regression is all you need for medical image translation

Title: What do Language Model Probabilities Represent? From Distribution Estimation to Response Prediction

Title: Lightweight Defense Against Adversarial Attacks in Time Series Classification

Title: Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation

Title: Spotting the Unexpected (STU): A 3D LiDAR Dataset for Anomaly Segmentation in Autonomous Driving

Title: Sparfels: Fast Reconstruction from Sparse Unposed Imagery

Title: ProDisc-VAD: An Efficient System for Weakly-Supervised Anomaly Detection in Video Surveillance Applications

Title: Robust AI-Generated Face Detection with Imbalanced Data

Title: DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization

Title: Improving Physical Object State Representation in Text-to-Image Generative Systems

Title: Quantizing Diffusion Models from a Sampling-Aware Perspective

Title: Enhancing AI Face Realism: Cost-Efficient Quality Improvement in Distilled Diffusion Models with a Fully Synthetic Dataset

Title: Entropy-Guided Sampling of Flat Modes in Discrete Spaces

Title: Generative Sign-description Prompts with Multi-positive Contrastive Learning for Sign Language Recognition

Title: VAEmo: Efficient Representation Learning for Visual-Audio Emotion with Knowledge Injection

Title: RM-R1: Reward Modeling as Reasoning

Title: Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection

Title: T2S: High-resolution Time Series Generation with Text-to-Series Diffusion Models

Title: Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction

Title: Text to Image Generation and Editing: A Survey

Title: Bielik v3 Small: Technical Report

Title: Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

Title: RGBX-DiffusionDet: A Framework for Multi-Modal RGB-X Object Detection Using DiffusionDet

Title: Mirror Mean-Field Langevin Dynamics

Title: Detect, Classify, Act: Categorizing Industrial Anomalies with Multi-Modal Large Language Models

Title: MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation

Title: Sim2Real in endoscopy segmentation with a novel structure aware image translation

Title: Multimodal Deep Learning for Stroke Prediction and Detection using Retinal Imaging and Clinical Data

Title: fastabx: A library for efficient computation of ABX discriminability

Title: Using Knowledge Graphs to harvest datasets for efficient CLIP model training

Title: Advancing Generalizable Tumor Segmentation with Anomaly-Aware Open-Vocabulary Attention Maps and Frozen Foundation Diffusion Models

Title: Bye-bye, Bluebook? Automating Legal Procedure with Large Language Models

Title: Towards Dataset Copyright Evasion Attack against Personalized Text-to-Image Diffusion Models

Title: No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves