2025-07-18

Title: MindJourney: Test-Time Scaling with World Models for Spatial Reasoning

Title: Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility

Title: Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models

Title: Safeguarding Federated Learning-based Road Condition Classification

Title: Assay2Mol: large language model-based drug design using BioAssay context

Title: Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows

Title: MS-DGCNN++: A Multi-Scale Fusion Dynamic Graph Neural Network with Biological Knowledge Integration for LiDAR Tree Species Classification

Title: Learning What Matters: Probabilistic Task Selection via Mutual Information for Model Finetuning

Title: BootSeer: Analyzing and Mitigating Initialization Bottlenecks in Large-Scale LLM Training

Title: Funnel-HOI: Top-Down Perception for Zero-Shot HOI Detection

Title: Reconstruct, Inpaint, Finetune: Dynamic Novel-view Synthesis from Monocular Videos

Title: Federated Learning in Open- and Closed-Loop EMG Decoding: A Privacy and Performance Perspective

Title: Improving physics-informed neural network extrapolation via transfer learning and adaptive activation functions

Title: Integrated Oculomics and Lipidomics Reveal Microvascular Metabolic Signatures Associated with Cardiovascular Health in a Healthy Cohort

Title: The first open machine translation system for the Chechen language

Title: FORTRESS: Function-composition Optimized Real-Time Resilient Structural Segmentation via Kolmogorov-Arnold Enhanced Spatial Attention Networks

Title: Improving Drug Identification in Overdose Death Surveillance using Large Language Models

Title: AdaptiSent: Context-Aware Adaptive Attention for Multimodal Aspect-Based Sentiment Analysis

Title: PinFM: Foundation Model for User Activity Sequences at a Billion-scale Visual Discovery Platform

Title: AudioJudge: Understanding What Works in Large Audio Model Based Speech Evaluation

Title: From SGD to Spectra: A Theory of Neural Network Weight Dynamics

Title: A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique

Title: Strategy Adaptation in Large Language Model Werewolf Agents

Title: Transformer-based Spatial Grounding: A Comprehensive Survey

Title: Multimodal-Guided Dynamic Dataset Pruning for Robust and Efficient Data-Centric Learning

Title: Domain-Enhanced Dual-Branch Model for Efficient and Interpretable Accident Anticipation

Title: HairShifter: Consistent and High-Fidelity Video Hair Transfer via Anchor-Guided Animation

Title: Unified Medical Image Segmentation with State Space Modeling Snake

Title: Think-Before-Draw: Decomposing Emotion Semantics & Fine-Grained Controllable Expressive Talking Head Generation

Title: World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving

Title: Continuous Marine Tracking via Autonomous UAV Handoff

Title: Synergy: End-to-end Concept Model

Title: Local Representative Token Guided Merging for Text-to-Image Generation

Title: A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models

Title: Compact Vision Transformer by Reduction of Kernel Complexity

Title: Learning Robust Negation Text Representations

Title: Multi-Channel Graph Neural Network for Financial Risk Prediction of NEEQ Enterprises

Title: DeQA-Doc: Adapting DeQA-Score to Document Image Quality Assessment

Title: FLDmamba: Integrating Fourier and Laplace Transform Decomposition with Mamba for Enhanced Time Series Prediction

Title: ATL-Diff: Audio-Driven Talking Head Generation with Early Landmarks-Guide Noise Diffusion

Title: PMKLC: Parallel Multi-Knowledge Learning-based Lossless Compression for Large-Scale Genomics Database

Title: Large Language Models' Internal Perception of Symbolic Music

Title: RONOM: Reduced-Order Neural Operator Modeling

Title: From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning

Title: MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval

Title: FAR-Net: Multi-Stage Fusion Network with Enhanced Semantic Alignment and Adaptive Reconciliation for Composed Image Retrieval

Title: Feature-Enhanced TResNet for Fine-Grained Food Image Classification

Title: Are Knowledge and Reference in Multilingual Language Models Cross-Lingually Consistent?

Title: SEMT: Static-Expansion-Mesh Transformer Network Architecture for Remote Sensing Image Captioning

Title: Transformer-Based Person Identification via Wi-Fi CSI Amplitude and Phase Perturbations

Title: Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved)

Title: SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation

Title: WhoFi: Deep Person Re-Identification via Wi-Fi Channel Signal Encoding

Title: An Investigation of Ear-EEG Signals for a Novel Biometric Authentication System

Title: HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation

Title: From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation

Title: LanePerf: a Performance Estimation Framework for Lane Detection

Title: Generalist Bimanual Manipulation via Foundation Video Diffusion Models

Title: Federated Learning for Commercial Image Sources

Title: Fremer: Lightweight and Effective Frequency Transformer for Workload Forecasting in Cloud Services

Title: Robust Explanations Through Uncertainty Decomposition: A Path to Trustworthier AI

Title: Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models

Title: Architectural Backdoors in Deep Learning: A Survey of Vulnerabilities, Detection, and Defense

Title: DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization

Title: Enterprise Security Incident Analysis and Countermeasures Based on the T-Mobile Data Breach

Title: A Deep-Learning Framework for Land-Sliding Classification from Remote Sensing Image

Title: Weakly Supervised Visible-Infrared Person Re-Identification via Heterogeneous Expert Collaborative Consistency Learning

Title: Analysis of Image-and-Text Uncertainty Propagation in Multimodal Large Language Models with Cardiac MR-Based Applications

Title: Probabilistic Soundness Guarantees in LLM Reasoning Chains

Title: Insights into a radiology-specialised multimodal large language model with sparse autoencoders

Title: LoViC: Efficient Long Video Generation with Context Compression

Title: cIDIR: Conditioned Implicit Neural Representation for Regularized Deformable Image Registration

Title: FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers

Title: A Spectral Interpretation of Redundancy in a Graph Reservoir

Title: RGB Pre-Training Enhanced Unobservable Feature Latent Diffusion Model for Spectral Reconstruction

Title: WaveletInception Networks for Drive-by Vibration-Based Infrastructure Health Monitoring

Title: A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints

Title: FedGA: A Fair Federated Learning Framework Based on the Gini Coefficient

Title: Teach Old SAEs New Domain Tricks with Boosting

Title: Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning

Title: Measuring CEX-DEX Extracted Value and Searcher Profitability: The Darkest of the MEV Dark Forest

Title: From Paranoia to Compliance: The Bumpy Road of System Hardening Practices on Stack Exchange

Title: Confidence-Filtered Relevance (CFR): An Interpretable and Uncertainty-Aware Machine Learning Framework for Naturalness Assessment in Satellite Imagery

Title: MAD-Spear: A Conformity-Driven Prompt Injection Attack on Multi-Agent Debate Systems

Title: Backscattering-Based Security in Wireless Power Transfer Applied to Battery-Free BLE Sensors

Title: The Power of Architecture: Deep Dive into Transformer Architectures for Long-Term Time Series Forecasting

Title: Advancing Complex Wide-Area Scene Understanding with Hierarchical Coresets Selection

Title: Label-Consistent Dataset Distillation with Detector-Guided Refinement

Title: Formalizing Attack Scenario Description: A Proposed Model

Title: DASViT: Differentiable Architecture Search for Vision Transformer

Title: Channel-wise Motion Features for Efficient Motion Segmentation

Title: Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection

Title: DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model

Title: GLAD: Generalizable Tuning for Vision-Language Models

Title: MUPAX: Multidimensional Problem Agnostic eXplainable AI

Title: Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction

Title: R^2MoE: Redundancy-Removal Mixture of Experts for Lifelong Concept Learning

Title: A Computational Framework to Identify Self-Aspects in Text

Title: NGTM: Substructure-based Neural Graph Topic Model for Interpretable Graph Generation

Title: Assessing the Reliability of LLMs Annotations in the Context of Demographic Bias and Model Explanation

Title: DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model

Title: SE-VLN: A Self-Evolving Vision-Language Navigation Framework Based on Multimodal Large Language Models

Title: Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities

Title: Prompt Injection 2.0: Hybrid AI Threats

Title: Automatically assessing oral narratives of Afrikaans and isiXhosa children

Title: MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling

Title: Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection

Title: Leveraging Pre-Trained Visual Models for AI-Generated Video Detection

Title: $S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation

Title: VITA: Vision-to-Action Flow Matching Policy

Title: Enhancing Cross-task Transfer of Large Language Models via Activation Steering

Title: HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models

Title: Leveraging Asynchronous Cross-border Market Data for Improved Day-Ahead Electricity Price Forecasting in European Markets

Title: Automating Steering for Safe Multimodal Large Language Models

Title: Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy

Title: Merge Kernel for Bayesian Optimization on Permutation Space

Title: Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management

Title: Multi-Agent Synergy-Driven Iterative Visual Narrative Synthesis

Title: DiffClean: Diffusion-based Makeup Removal for Accurate Age Estimation

Title: AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research

Title: Boosting Team Modeling through Tempo-Relational Representation Learning

Title: FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization

Title: A Crowdsensing Intrusion Detection Dataset For Decentralized Federated Learning Models

Title: Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark

Title: GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM

Title: The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner

Title: A Survey of Context Engineering for Large Language Models

Title: Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes

Title: Training Transformers with Enforced Lipschitz Constants

Title: Taming Diffusion Transformer for Real-Time Mobile Video Generation

Title: Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models

Title: $π^3$: Scalable Permutation-Equivariant Visual Geometry Learning

Title: Hierarchical Rectified Flow Matching with Mini-Batch Couplings

Title: VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding