2025-06-30

Title: Debunk and Infer: Multimodal Fake News Detection via Diffusion-Generated Evidence and LLM Reasoning

Title: GraphLAMA: Enabling Efficient Adaptation of Graph Language Models with Limited Annotations

Title: FloorPlan-DeepSeek (FPDS): A multimodal approach to floorplan generation using vector-based next room prediction

Title: VIDEE: Visual and Interactive Decomposition, Execution, and Evaluation of Text Analytics with Intelligent Agents

Title: Evaluation of LLM-based Strategies for the Extraction of Food Product Information from Online Shops

Title: Does Multimodality Lead to Better Time Series Forecasting?

Title: Performance of diverse evaluation metrics in NLP-based assessment and text generation of consumer complaints

Title: TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation

Title: Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers

Title: $\textrm{ODE}_t \left(\textrm{ODE}_l \right)$: Shortcutting the Time and Length in Diffusion and Flow Models for Faster Sampling

Title: Elucidating and Endowing the Diffusion Training Paradigm for General Image Restoration

Title: Asymmetric Dual Self-Distillation for 3D Self-Supervised Representation Learning

Title: Exploring Image Generation via Mutually Exclusive Probability Spaces and Local Correlation Hypothesis

Title: M3PO: Massively Multi-Task Model-Based Policy Optimization

Title: Evaluating List Construction and Temporal Understanding capabilities of Large Language Models

Title: Multi-task parallelism for robust pre-training of graph foundation models on multi-source, multi-fidelity atomistic modeling data

Title: Few-Shot Segmentation of Historical Maps via Linear Probing of Vision Foundation Models

Title: TaleForge: Interactive Multimodal System for Personalized Story Creation

Title: PrefPaint: Enhancing Image Inpainting through Expert Human Feedback

Title: ProSAM: Enhancing the Robustness of SAM-based Visual Reference Segmentation with Probabilistic Prompts

Title: PARSI: Persian Authorship Recognition via Stylometric Integration

Title: 3D-Telepathy: Reconstructing 3D Objects from EEG Signals

Title: Periodic-MAE: Periodic Video Masked Autoencoder for rPPG Estimation

Title: SPADE: Spatial Transcriptomics and Pathology Alignment Using a Mixture of Data Experts for an Expressive Latent Space

Title: On the Feasibility of Poisoning Text-to-Image AI Models via Adversarial Mislabeling

Title: TOAST: Task-Oriented Adaptive Semantic Transmission over Dynamic Wireless Environments

Title: Physics-informed network paradigm with data generation and background noise removal for diverse distributed acoustic sensing applications

Title: Optimal Return-to-Go Guided Decision Transformer for Auto-Bidding in Advertisement

Title: Exploring Semantic Masked Autoencoder for Self-supervised Point Cloud Understanding

Title: Don't Trust Generative Agents to Mimic Communication on Social Networks Unless You Benchmarked their Empirical Realism

Title: TASeg: Text-aware RGB-T Semantic Segmentation based on Fine-tuning Vision Foundation Models

Title: SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model

Title: RoboEnvision: A Long-Horizon Video Generation Model for Multi-Task Robot Manipulation

Title: Hyper-modal Imputation Diffusion Embedding with Dual-Distillation for Federated Multimodal Knowledge Graph Completion

Title: UniCA: Adapting Time Series Foundation Model to General Covariate-Aware Forecasting

Title: Few-Shot Identity Adaptation for 3D Talking Heads via Global Gaussian Field

Title: MirrorMe: Towards Realtime and High Fidelity Audio-Driven Halfbody Animation

Title: Reasoning in machine vision: learning to think fast and slow

Title: Tied Prototype Model for Few-Shot Medical Image Segmentation

Title: RetFiner: A Vision-Language Refinement Scheme for Retinal Foundation Models

Title: Leveraging In-Context Learning for Political Bias Testing of LLMs

Title: OutDreamer: Video Outpainting with a Diffusion Transformer

Title: Unfolding Generative Flows with Koopman Operators: Fast and Interpretable Sampling

Title: Can Video Large Multimodal Models Think Like Doubters-or Double-Down: A Study on Defeasible Video Entailment

Title: HyperCLOVA X THINK Technical Report

Title: Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy

Title: MiCo: Multi-image Contrast for Reinforcement Visual Reasoning