2025-10-29

Title: Evaluating Long-Term Memory for Long-Context Question Answering

Title: BitSkip: An Empirical Analysis of Quantization and Early Exit Composition

Title: Beyond Understanding: Evaluating the Pragmatic Gap in LLMs' Cultural Processing of Figurative Language

Title: How Pragmatics Shape Articulation: A Computational Case Study in STEM ASL Discourse

Title: CRADLE Bench: A Clinician-Annotated Benchmark for Multi-Faceted Mental Health Crisis and Safety Risk Detection

Title: Temporal Blindness in Multi-Turn LLM Agents: Misaligned Tool Use vs. Human Time Perception

Title: Can LLMs Narrate Tabular Data? An Evaluation Framework for Natural Language Representations of Text-to-SQL System Outputs

Title: OraPlan-SQL: A Planning-Centric Framework for Complex Bilingual NL2SQL Reasoning

Title: Language Models for Longitudinal Clinical Prediction

Title: AfriMTEB and AfriE5: Benchmarking and Adapting Text Embedding Models for African Languages

Title: Breaking the Benchmark: Revealing LLM Bias via Minimal Contextual Augmentation

Title: Agent-based Automated Claim Matching with Instruction-following LLMs

Title: Auto prompting without training labels: An LLM cascade for product quality assessment in e-commerce catalogs

Title: Leveraging LLMs for Early Alzheimer's Prediction

Title: Uncovering the Potential Risks in Unlearning: Danger of English-only Unlearning in Multilingual LLMs

Title: M-Eval: A Heterogeneity-Based Framework for Multi-evidence Validation in Medical RAG Systems

Title: PICOs-RAG: PICO-supported Query Rewriting for Retrieval-Augmented Generation in Evidence-Based Medicine

Title: META-RAG: Meta-Analysis-Inspired Evidence-Re-Ranking Method for Retrieval-Augmented Generation in Evidence-Based Medicine

Title: TEXT2DB: Integration-Aware Information Extraction with Large Language Model Agents

Title: Teaching LLMs to Abstain via Fine-Grained Semantic Confidence Reward

Title: SpecKD: Speculative Decoding for Effective Knowledge Distillation of LLMs

Title: Pie: A Programmable Serving System for Emerging LLM Applications

Title: Challenging Multilingual LLMs: A New Taxonomy and Benchmark for Unraveling Hallucination in Translation

Title: Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures

Title: Reinforcement Learning for Long-Horizon Multi-Turn Search Agents

Title: Beyond Line-Level Filtering for the Pretraining Corpora of LLMs

Title: Ko-MuSR: A Multistep Soft Reasoning Benchmark for LLMs Capable of Understanding Korean

Title: MuSaG: A Multimodal German Sarcasm Dataset with Full-Modal Annotations

Title: Beyond Neural Incompatibility: Easing Cross-Scale Knowledge Transfer in Large Language Models through Latent Semantic Alignment

Title: HACK: Hallucinations Along Certainty and Knowledge Axes

Title: Towards Transparent Reasoning: What Drives Faithfulness in Large Language Models?

Title: Evaluating LLMs on Generating Age-Appropriate Child-Like Conversations

Title: From Memorization to Reasoning in the Spectrum of Loss Curvature

Title: Can LLMs Translate Human Instructions into a Reinforcement Learning Agent's Internal Emergent Symbolic Representation?

Title: MERGE: Minimal Expression-Replacement GEneralization Test for Natural Language Inference

Title: Lookahead Tree-Based Rollouts for Enhanced Trajectory-Level Exploration in Reinforcement Learning with Verifiable Rewards

Title: Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning

Title: Beyond MCQ: An Open-Ended Arabic Cultural QA Benchmark with Dialect Variants

Title: LongWeave: A Long-Form Generation Benchmark Bridging Real-World Relevance and Verifiability

Title: Text Simplification with Sentence Embeddings

Title: SynthWorlds: Controlled Parallel Worlds for Disentangling Reasoning and Knowledge in Language Models

Title: LuxIT: A Luxembourgish Instruction Tuning Dataset from Monolingual Seed Data

Title: Can LLMs Write Faithfully? An Agent-Based Evaluation of LLM-generated Islamic Content

Title: SPARTA: Evaluating Reasoning Segmentation Robustness through Black-Box Adversarial Paraphrasing in Text Autoencoder Latent Space

Title: Charting the European LLM Benchmarking Landscape: A New Taxonomy and a Set of Best Practices

Title: Iterative Critique-Refine Framework for Enhancing LLM Personalization

Title: Mitigating Hallucination in Large Language Models (LLMs): An Application-Oriented Survey on RAG, Reasoning, and Agentic Systems

Title: A word association network methodology for evaluating implicit biases in LLMs compared to humans

Title: CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration?

Title: Dark & Stormy: Modeling Humor in the Worst Sentences Ever Written

Title: Open Korean Historical Corpus: A Millennia-Scale Diachronic Collection of Public Domain Texts

Title: ReplicationBench: Can AI Agents Replicate Astrophysics Research Papers?

Title: ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization

Title: Diffusion LLM with Native Variable Generation Lengths: Let [EOS] Lead the Way

Title: Long-Context Modeling with Dynamic Hierarchical Sparse Attention for On-Device LLMs

Title: Zero-Shot Cross-Lingual Transfer using Prefix-Based Adaptation

Title: Relative Scaling Laws for LLMs

Title: "Mm, Wat?" Detecting Other-initiated Repair Requests in Dialogue

Title: OpenReward: Learning to Reward Long-form Agentic Tasks via Reinforcement Learning

Title: Optimizing Retrieval for RAG via Reinforced Contrastive Learning

Title: Evolving Diagnostic Agents in a Virtual Clinical Environment

Title: InteractComp: Evaluating Search Agents With Ambiguous Queries

Title: Dissecting Role Cognition in Medical LLMs via Neuronal Ablation

Title: Repurposing Synthetic Data for Fine-grained Search Agent Supervision

Title: AgentFrontier: Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis

Title: WebLeaper: Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking

Title: ParallelMuse: Agentic Parallel Thinking for Deep Information Seeking

Title: AgentFold: Long-Horizon Web Agents with Proactive Context Management

Title: Tongyi DeepResearch Technical Report

Title: Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

Title: ComboBench: Can LLMs Manipulate Physical Devices to Play Virtual Reality Games?