2025-10-09

Title: OpenStaxQA: A multilingual dataset based on open-source college textbooks

Title: Knowledge Graph-Guided Multi-Agent Distillation for Reliable Industrial Question Answering with Datasets

Title: Transparent Reference-free Automated Evaluation of Open-Ended User Survey Responses

Title: CoT Referring: Improving Referring Expression Tasks with Grounded Reasoning

Title: TRepLiNa: Layer-wise CKA+REPINA Alignment Improves Low-Resource Machine Translation in Aya-23 8B

Title: Scalable multilingual PII annotation for responsible AI in LLMs

Title: Dual-stage and Lightweight Patient Chart Summarization for Emergency Physicians

Title: A Comprehensive Survey of Hallucination in Large Language Models: Causes, Detection, and Mitigation

Title: Language models for longitudinal analysis of abusive content in Billboard Music Charts

Title: Reproducibility Study of "XRec: Large Language Models for Explainable Recommendation"

Title: LLM Bias Detection and Mitigation through the Lens of Desired Distributions

Title: EVALUESTEER: Measuring Reward Model Steerability Towards Values and Preference

Title: EverydayMMQA: A Multilingual and Multimodal Framework for Culturally Grounded Spoken Visual QA

Title: Semantic Regexes: Auto-Interpreting LLM Features with a Structured Language

Title: Protecting De-identified Documents from Search-based Linkage Attacks

Title: Reward Model Perspectives: Whose Opinions Do Reward Models Reward?

Title: Instructional Goal-Aligned Question Generation for Student Evaluation in Virtual Lab Settings: How Closely Do LLMs Actually Align?

Title: FinLFQA: Evaluating Attributed Text Generation of LLMs in Financial Long-Form Question Answering

Title: MathRobust-LV: Evaluation of Large Language Models' Robustness to Linguistic Variations in Mathematical Reasoning

Title: A Survey on Agentic Security: Applications, Threats and Defenses

Title: Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels

Title: From Acceleration to Saturation: Scaling Behavior of Bootstrapped Language Model Pretraining

Title: Flipping the Dialogue: Training and Evaluating User Language Models

Title: The Algebra of Meaning: Why Machines Need Montague More Than Moore's Law

Title: TinyScientist: An Interactive, Extensible, and Controllable Framework for Building Research Agents

Title: Do Internal Layers of LLMs Reveal Patterns for Jailbreak Detection?

Title: Aligning Large Language Models via Fully Self-Synthetic Data

Title: ToolMem: Enhancing Multimodal Agents with Learnable Tool Capability Memory

Title: PIKA: Expert-Level Synthetic Datasets for Post-Training Alignment from Scratch

Title: Incremental Summarization for Customer Support via Progressive Note-Taking and Agent Feedback

Title: Learning to Rewrite Prompts for Bootstrapping LLMs on Downstream Tasks

Title: How Language Models Conflate Logical Validity with Plausibility: A Representational Analysis of Content Effects

Title: Scaling LLM Multi-turn RL with End-to-end Summarization-based Context Management

Title: PTEB: Towards Robust Text Embedding Evaluation via Stochastic Paraphrasing at Evaluation Time with LLMs

Title: Are LLMs Reliable Rankers? Rank Manipulation via Two-Stage Token Optimization

Title: AWM: Accurate Weight-Matrix Fingerprint for Large Language Models

Title: TWIST: Training-free and Label-free Short Text Clustering through Iterative Vector Updating with LLMs

Title: Gold-Switch: Training-Free Superposition of Slow- and Fast- Thinking LLMs

Title: Adaptive LLM-Symbolic Reasoning via Dynamic Logical Solver Composition

Title: Foundations of LLM Knowledge Materialization: Termination, Reproducibility, Robustness

Title: FURINA: A Fully Customizable Role-Playing Benchmark via Scalable Multi-Agent Collaboration Pipeline

Title: Overview of the Plagiarism Detection Task at PAN 2025

Title: BlackboxNLP-2025 MIB Shared Task: Exploring Ensemble Strategies for Circuit Localization Methods

Title: Adaptive Tool Generation with Models as Tools and Reinforcement Learning

Title: Mid-Training of Large Language Models: A Survey

Title: SID: Multi-LLM Debate Driven by Self Signals

Title: OpenJAI-v1.0: An Open Thai Large Language Model

Title: Unlocking Latent Discourse Translation in LLMs Through Quality-Aware Decoding

Title: λ-GRPO: Unifying the GRPO Frameworks with Learnable Token Preferences

Title: MeXtract: Light-Weight Metadata Extraction from Scientific Papers

Title: LongRM: Revealing and Unlocking the Context Boundary of Reward Modeling

Title: SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models

Title: Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation

Title: EDUMATH: Generating Standards-aligned Educational Math Word Problems

Title: Probing Social Identity Bias in Chinese LLMs with Gendered Pronouns and Social Groups

Title: Towards Reliable Retrieval in RAG Systems for Large Legal Datasets

Title: Pragyaan: Designing and Curating High-Quality Cultural Post-Training Datasets for Indian Languages

Title: Native Hybrid Attention for Efficient Sequence Modeling

Title: Mining the Mind: What 100M Beliefs Reveal About Frontier LLM Knowledge

Title: Beyond Monolingual Assumptions: A Survey of Code-Switched NLP in the Era of Large Language Models

Title: Search-R3: Unifying Reasoning and Embedding Generation in Large Language Models

Title: Revisiting Metric Reliability for Fine-grained Evaluation of Machine Translation and Summarization in Indian Languages

Title: LuxInstruct: A Cross-Lingual Instruction Tuning Dataset For Luxembourgish

Title: Accelerating Diffusion LLM Inference via Local Determinism Propagation

Title: All Claims Are Equal, but Some Claims Are More Equal Than Others: Importance-Sensitive Factuality Evaluation of LLM Generations

Title: Making Machines Sound Sarcastic: LLM-Enhanced and Retrieval-Guided Sarcastic Speech Synthesis

Title: TALENT: Table VQA via Augmented Language-Enhanced Natural-text Transcription

Title: Opt-ICL at LeWiDi-2025: Maximizing In-Context Signal from Rater Examples via Meta-Learning

Title: TRIM: Token-wise Attention-Derived Saliency for Data-Efficient Instruction Tuning

Title: Comparing human and language models sentence processing difficulties on complex structures

Title: Reasoning for Hierarchical Text Classification: The Case of Patents

Title: More Data or Better Data? A Critical Analysis of Data Selection and Synthesis for Mathematical Reasoning

Title: NurseLLM: The First Specialized Language Model for Nursing

Title: Quantifying Data Contamination in Psychometric Evaluations of LLMs

Title: CARPAS: Towards Content-Aware Refinement of Provided Aspects for Summarization in Large Language Models

Title: Biasless Language Models Learn Unnaturally: How LLMs Fail to Distinguish the Possible from the Impossible

Title: Sunflower: A New Approach To Expanding Coverage of African Languages in Large Language Models

Title: Language Lives in Sparse Dimensions: Toward Interpretable and Efficient Multilingual Control for Large Language Models

Title: Where to Begin: Efficient Pretraining via Subnetwork Selection and Distillation

Title: Customer-R1: Personalized Simulation of Human Behaviors via RL-based LLM Agent in Online Shopping

Title: Benchmarking LLM Causal Reasoning with Scientifically Validated Relationships

Title: LAD-RAG: Layout-aware Dynamic RAG for Visually-Rich Document Understanding

Title: When Benchmarks Age: Temporal Misalignment through Large Language Model Factuality Evaluation

Title: Red-Bandit: Test-Time Adaptation for LLM Red-Teaming via Bandit-Guided LoRA Experts

Title: Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense

Title: LeMAJ (Legal LLM-as-a-Judge): Bridging Legal Reasoning and LLM Evaluation

Title: Don't Adapt Small Language Models for Tools; Adapt Tool Schemas to the Models

Title: Online Rubrics Elicitation from Pairwise Comparisons

Title: On the Convergence of Moral Self-Correction in Large Language Models

Title: Agent Bain vs. Agent McKinsey: A New Text-to-SQL Benchmark for the Business Domain

Title: Vibe Checker: Aligning Code Evaluation with Human Preference