2025-06-02

Title: Meaning Is Not A Metric: Using LLMs to make cultural context legible at scale

Title: Nine Ways to Break Copyright Law and Why Our LLM Won't: A Fair Use Aligned Generation Framework

Title: Conversational Exploration of Literature Landscape with LitChat

Title: Rethinking the Understanding Ability across LLMs through Mutual Information

Title: R3-RAG: Learning Step-by-Step Reasoning and Retrieval for LLMs via Reinforcement Learning

Title: Emergent LLM behaviors are observationally equivalent to data leakage

Title: My Answer Is NOT 'Fair': Mitigating Social Bias in Vision-Language Models via Fair and Biased Residuals

Title: Estimating LLM Consistency: A User Baseline vs Surrogate Metrics

Title: MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks

Title: Calibrating LLMs for Text-to-SQL Parsing by Leveraging Sub-clause Frequencies

Title: MedOrchestra: A Hybrid Cloud-Local LLM Approach for Clinical Data Interpretation

Title: DLP: Dynamic Layerwise Pruning in Large Language Models

Title: DenseLoRA: Dense Low-Rank Adaptation of Large Language Models

Title: LLM-Driven E-Commerce Marketing Content Optimization: Balancing Creativity and Conversion

Title: MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation

Title: LayerIF: Estimating Layer Quality for Large Language Models using Influence Functions

Title: Aligning LLMs by Predicting Preferences from User Writing Samples

Title: A Course Correction in Steerability Evaluation: Revealing Miscalibration and Side Effects in LLMs

Title: Arbiters of Ambivalence: Challenges of Using LLMs in No-Consensus Tasks

Title: Speech as a Multimodal Digital Phenotype for Multi-Task LLM-based Mental Health Prediction

Title: RAGPPI: RAG Benchmark for Protein-Protein Interactions in Drug Discovery

Title: Reviewing Scientific Papers for Critical Problems With Reasoning LLMs: Baseline Approaches and Automatic Evaluation

Title: ValueSim: Generating Backstories to Model Individual Value Systems

Title: BiasFilter: An Inference-Time Debiasing Framework for Large Language Models

Title: EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models

Title: ICH-Qwen: A Large Language Model Towards Chinese Intangible Cultural Heritage

Title: Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective

Title: Say What You Mean: Natural Language Access Control with Large Language Models for Internet of Things

Title: Large Language Models Often Know When They Are Being Evaluated

Title: CoMaPOI: A Collaborative Multi-Agent Framework for Next POI Prediction Bridging the Gap Between Trajectory and Language

Title: Exploring the Landscape of Text-to-SQL with Large Language Models: Progresses, Challenges and Opportunities

Title: Measuring Sycophancy of Language Models in Multi-turn Dialogues

Title: Document Valuation in LLM Summaries: A Cluster Shapley Approach

Title: Evaluation Hallucination in Multi-Round Incomplete Information Lateral-Driven Reasoning Tasks

Title: Enabling Flexible Multi-LLM Integration for Scalable Knowledge Aggregation

Title: Read Your Own Mind: Reasoning Helps Surface Self-Confidence Signals in LLMs

Title: Scalable, Symbiotic, AI and Non-AI Agent Based Parallel Discrete Event Simulations

Title: Derailing Non-Answers via Logit Suppression at Output Subspace Boundaries in RLHF-Aligned Language Models

Title: ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark

Title: Large Language Model-Based Agents for Automated Research Reproducibility: An Exploratory Study in Alzheimer's Disease

Title: Revisiting Uncertainty Estimation and Calibration of Large Language Models

Title: OMNIGUARD: An Efficient Approach for AI Safety Moderation Across Modalities

Title: Infi-Med: Low-Resource Medical MLLMs with Robust Reasoning Evaluation

Title: One Task Vector is not Enough: A Large-Scale Study for In-Context Learning

Title: Reinforcement Learning for Better Verbalized Confidence in Long-Form Generation

Title: Probing Association Biases in LLM Moderation Over-Sensitivity

Title: ChARM: Character-based Act-adaptive Reward Modeling for Advanced Role-Playing Language Agents

Title: SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving

Title: Retrieval Augmented Generation based Large Language Models for Causality Mining

Title: A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models

Title: FLAT-LLM: Fine-grained Low-rank Activation Space Transformation for Large Language Model Compression

Title: Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs

Title: Diversity of Transformer Layers: One Aspect of Parameter Scaling Laws

Title: Large Language Model Meets Constraint Propagation

Title: BeaverTalk: Oregon State University's IWSLT 2025 Simultaneous Speech Translation System

Title: Hidden Persuasion: Detecting Manipulative Narratives on Social Media During the 2022 Russian Invasion of Ukraine

Title: MedPAIR: Measuring Physicians and AI Relevance Alignment in Medical Question Answering

Title: TCM-Ladder: A Benchmark for Multimodal Question Answering on Traditional Chinese Medicine

Title: HardTests: Synthesizing High-Quality Test Cases for LLM Coding

Title: Training LLMs for EHR-Based Reasoning Tasks via Reinforcement Learning

Title: The State of Multilingual LLM Safety Research: From Measuring the Language Gap to Mitigating It

Title: R-KV: Redundancy-aware KV Cache Compression for Training-Free Reasoning Models Acceleration

Title: CrossICL: Cross-Task In-Context Learning via Unsupervised Demonstration Transfer

Title: Rationales Are Not Silver Bullets: Measuring the Impact of Rationales on Model Performance and Reliability

Title: LKD-KGC: Domain-Specific KG Construction via LLM-driven Knowledge Dependency Parsing

Title: Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models

Title: Adaptive LoRA Merge with Parameter Pruning for Low-Resource Generation

Title: Beyond Exponential Decay: Rethinking Error Accumulation in Large Language Models

Title: CLaSp: In-Context Layer Skip for Self-Speculative Decoding

Title: Intuitionistic Fuzzy Sets for Large Language Model Data Annotation: A Novel Approach to Side-by-Side Preference Labeling

Title: Semi-structured LLM Reasoners Can Be Rigorously Audited

Title: Automated Structured Radiology Report Generation

Title: Dynamic Context-Aware Streaming Pretrained Language Model For Inverse Text Normalization

Title: Advantageous Parameter Expansion Training Makes Better Large Language Models

Title: Mamba Knockout for Unraveling Factual Information Flow

Title: Proactive Guidance of Multi-Turn Conversation in Industrial Search

Title: Effects of Theory of Mind and Prosocial Beliefs on Steering Human-Aligned Behaviors of LLMs in Ultimatum Games

Title: Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation

Title: Faithful and Robust LLM-Driven Theorem Proving for NLI Explanations

Title: ScienceMeter: Tracking Scientific Knowledge Updates in Language Models

Title: HiCaM: A Hierarchical-Causal Modification Framework for Long-Form Text Modification

Title: Context-Aware Sentiment Forecasting via LLM-based Multi-Perspective Role-Playing Agents

Title: Pangu DeepDiver: Adaptive Search Intensity Scaling via Open-Web Reinforcement Learning

Title: Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings

Title: Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction

Title: Unifying Language Agent Algorithms with Graph-based Orchestration Engine for Reproducible Agent Research

Title: Knowing Before Saying: LLM Representations Encode Information About Chain-of-Thought Success Before Completion

Title: LLM Inference Enhanced by External Knowledge: A Survey

Title: ClueAnchor: Clue-Anchored Knowledge Reasoning Exploration and Optimization for Retrieval-Augmented Generation

Title: LLMs Are Globally Multilingual Yet Locally Monolingual: Exploring Knowledge Transfer via Language and Thought Theory

Title: MMAFFBen: A Multilingual and Multimodal Affective Analysis Benchmark for Evaluating LLMs and VLMs

Title: Model Unlearning via Sparse Autoencoder Subspace Guided Projections

Title: Exploring the Impact of Occupational Personas on Domain-Specific QA

Title: When Large Multimodal Models Confront Evolving Knowledge: Challenges and Pathways

Title: CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation

Title: VietMix: A Naturally Occurring Vietnamese-English Code-Mixed Corpus with Iterative Augmentation for Machine Translation

Title: TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning for Enhancing LLMs' Social Intelligence

Title: Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors

Title: DEEPQUESTION: Systematic Generation of Real-World Challenges for Evaluating LLMs Performance

Title: Don't Erase, Inform! Detecting and Contextualizing Harmful Language in Cultural Heritage Collections

Title: Localizing Persona Representations in LLMs

Title: Cross-Attention Speculative Decoding

Title: A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings

Title: CREFT: Sequential Multi-Agent LLM for Character Relation Extraction

Title: Bench4KE: Benchmarking Automated Competency Question Generation

Title: NexusSum: Hierarchical LLM Agents for Long-Form Narrative Summarization

Title: When Harry Meets Superman: The Role of The Interlocutor in Persona-Based Dialogue Generation

Title: Harnessing Large Language Models for Scientific Novelty Detection

Title: Eye of Judgement: Dissecting the Evaluation of Russian-speaking LLMs with POLLUX

Title: Benchmarking Large Language Models for Cryptanalysis and Mismatched-Generalization

Title: The Hallucination Dilemma: Factuality-Aware Reinforcement Learning for Large Reasoning Models

Title: Disentangling Language and Culture for Evaluating Multilingual Large Language Models

Title: Efficient Text Encoders for Labor Market Analysis

Title: Are Optimal Algorithms Still Optimal? Rethinking Sorting in LLM-Based Pairwise Ranking with Batching and Caching

Title: Multiple LLM Agents Debate for Equitable Cultural Alignment

Title: TRIDENT: Enhancing Large Language Model Safety with Tri-Dimensional Diversified Red-Teaming Data Synthesis

Title: A Simple Linear Patch Revives Layer-Pruned Large Language Models

Title: Should I Share this Translation? Evaluating Quality Feedback for User Reliance on Machine Translation

Title: Soft Reasoning: Navigating Solution Spaces in Large Language Models through Controlled Embedding Exploration

Title: BPE Stays on SCRIPT: Structured Encoding for Robust Multilingual Pretokenization

Title: Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios

Title: Multi-Domain ABSA Conversation Dataset Generation via LLMs for Real-World Evaluation and Model Comparison

Title: HESEIA: A community-based dataset for evaluating social biases in large language models, co-designed in real school settings in Latin America

Title: FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation

Title: Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Title: Circuit Stability Characterizes Language Model Generalization

Title: LGAR: Zero-Shot LLM-Guided Neural Ranking for Abstract Screening in Systematic Literature Reviews

Title: From Macro to Micro: Probing Dataset Diversity in Language Model Fine-Tuning

Title: Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models' Uncertainty?

Title: Drop Dropout on Single-Epoch Language Model Pretraining

Title: Guiding Generative Storytelling with Knowledge Graphs

Title: LegalEval-Q: A New Benchmark for The Quality Evaluation of LLM-Generated Legal Text

Title: Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs

Title: How much do language models memorize?

Title: MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs

Title: ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models