2025-10-27

Title: Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People

Title: Code-enabled language models can outperform reasoning models on diverse tasks

Title: FicSim: A Dataset for Multi-Faceted Semantic Similarity in Long-Form Fiction

Title: Do LLMs Truly Understand When a Precedent Is Overruled?

Title: Irish-BLiMP: A Linguistic Benchmark for Evaluating Human and Language Model Performance in a Low-Resource Setting

Title: Can Confidence Estimates Decide When Chain-of-thought is Necessary for LLMs?

Title: Input Matters: Evaluating Input Structure's Impact on LLM Summaries of Sports Play-by-Play

Title: Reasoning's Razor: Reasoning Improves Accuracy but Can Hurt Recall at Critical Operating Points in Safety and Hallucination Detection

Title: Dynamic Retriever for In-Context Knowledge Editing via Policy Optimization

Title: Bridging Language Gaps with Adaptive RAG: Improving Indonesian Language Question Answering

Title: CDrugRed: A Chinese Drug Recommendation Dataset for Discharge Medications in Metabolic Diseases

Title: Self-Rewarding PPO: Aligning Large Language Models with Demonstrations Only

Title: The Gray Zone of Faithfulness: Taming Ambiguity in Unfaithfulness Detection

Title: Large Language Models Meet Text-Attributed Graphs: A Survey of Integration Frameworks and Applications

Title: Social Simulations with Large Language Model Risk Utopian Illusion

Title: Estonian Native Large Language Model Benchmark

Title: DispatchMAS: Fusing taxonomy and artificial intelligence agents for emergency medical services

Title: Correlation Dimension of Auto-Regressive Large Language Models

Title: Sparser Block-Sparse Attention via Token Permutation

Title: PARL: Prompt-based Agents for Reinforcement Learning

Title: Efficient semantic uncertainty quantification in language models via diversity-steered sampling

Title: Typoglycemia under the Hood: Investigating Language Models' Understanding of Scrambled Words

Title: TripTide: A Benchmark for Adaptive Travel Planning under Disruptions

Title: Multi-turn Training with Basic Human Feedback Helps Little on LLM Reasoning

Title: SindBERT, the Sailor: Charting the Seas of Turkish NLP

Title: Vision Language Models for Dynamic Human Activity Recognition in Healthcare Settings

Title: Redefining Retrieval Evaluation in the Era of LLMs

Title: REMONI: An Autonomous System Integrating Wearables and Multimodal Large Language Models for Enhanced Remote Health Monitoring

Title: MRO: Enhancing Reasoning in Diffusion Language Models via Multi-Reward Optimization

Title: Brain-tuning Improves Generalizability and Efficiency of Brain Alignment in Speech Models

Title: InterpDetect: Interpretable Signals for Detecting Hallucinations in Retrieval-Augmented Generation

Title: Are the LLMs Capable of Maintaining at Least the Language Genus?

Title: From Polyester Girlfriends to Blind Mice: Creating the First Pragmatics Understanding Benchmarks for Slovene

Title: RETuning: Upgrading Inference-Time Scaling for Stock Movement Prediction with Large Language Models

Title: The Universal Landscape of Human Reasoning