2026-01-22

Title: The Slow Drift of Support: Boundary Failures in Multi-Turn Mental Health LLM Dialogues

Title: Opening the Black Box: A Survey on the Mechanisms of Multi-Step Reasoning in Large Language Models

Title: Hallucination-Free Automatic Question & Answer Generation for Intuitive Learning

Title: RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension

Title: Project Aletheia: Verifier-Guided Distillation of Backtracking for Small Language Models

Title: Guided by the Plan: Enhancing Faithful Autoregressive Text-to-Audio Generation with Guided Decoding

Title: Large Language Models for Large-Scale, Rigorous Qualitative Analysis in Applied Health Services Research

Title: Can LLM Reasoning Be Trusted? A Comparative Study: Using Human Benchmarking on Statistical Tasks

Title: Business Logic-Driven Text-to-SQL Data Synthesis for Business Intelligence

Title: Towards Execution-Grounded Automated AI Research

Title: Self-Blinding and Counterfactual Self-Simulation Mitigate Biases and Sycophancy in Large Language Models

Title: Rewarding How Models Think Pedagogically: Integrating Pedagogical Reasoning and Thinking Rewards for LLMs in Education

Title: Social Caption: Evaluating Social Understanding in Multimodal Models

Title: SearchGym: Bootstrapping Real-World Search Agents via Cost-Effective and High-Fidelity Environment Simulation

Title: Say Anything but This: When Tokenizer Betrays Reasoning in LLMs

Title: AdaTIR: Adaptive Tool-Integrated Reasoning via Difficulty-Aware Policy Optimization

Title: ClaimDB: A Fact Verification Benchmark over Large Structured Data

Title: DARL: Encouraging Diverse Answers for General Reasoning without Verifiers

Title: Typhoon OCR: Open Vision-Language Model For Thai Document Extraction

Title: Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning

Title: RECAP: Resistance Capture in Text-based Mental Health Counseling with Large Language Models

Title: Comparative Study of Large Language Models on Chinese Film Script Continuation: An Empirical Analysis Based on GPT-5.2 and Qwen-Max

Title: HiNS: Hierarchical Negative Sampling for More Comprehensive Memory Retrieval Embedding Model

Title: Language-Coupled Reinforcement Learning for Multilingual Retrieval-Augmented Generation

Title: PodBench: A Comprehensive Benchmark for Instruction-Aware Audio-Oriented Podcast Script Generation

Title: CodeDelegator: Mitigating Context Pollution via Role Separation in Code-as-Action Agents

Title: The GDN-CC Dataset: Automatic Corpus Clarification for AI-enhanced Democratic Citizen Consultations

Title: CorpusQA: A 10 Million Token Benchmark for Corpus-Level Analysis and Reasoning

Title: A Comprehensive Benchmark of Language Models on Unicode and Romanized Sinhala

Title: Obscuring Data Contamination Through Translation: Evidence from Arabic Corpora

Title: Knowledge Restoration-driven Prompt Optimization: Unlocking LLM Potential for Open-Domain Relational Triplet Extraction

Title: LogicScore: Fine-grained Logic Evaluation of Conciseness, Completeness, and Determinateness in Attributed Question Answering

Title: Multi-Agent Constraint Factorization Reveals Latent Invariant Solution Structure

Title: RSNA Large Language Model Benchmark Dataset for Chest Radiographs of Cardiothoracic Disease: Radiologist Evaluation and Validation Enhanced by AI Labels (REVEAL-CXR)

Title: Automated Rubrics for Reliable Evaluation of Medical Dialogue Systems

Title: The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models

Title: Is Peer Review Really in Decline? Analyzing Review Quality across Venues and Time

Title: Supporting Humans in Evaluating AI Summaries of Legal Depositions

Title: Privacy Collapse: Benign Fine-Tuning Can Break Contextual Privacy in Language Models

Title: Metadata Conditioned Large Language Models for Localization

Title: Taxonomy-Aligned Risk Extraction from 10-K Filings with Autonomous Improvement Using LLMs

Title: The Effect of Scripts and Formats on LLM Numeracy

Title: Robust Fake News Detection using Large Language Models under Adversarial Sentiment Attacks