2025-10-01

Title: Cyclic Ablation: Testing Concept Localization against Functional Regeneration in AI

Title: From Internal Representations to Text Quality: A Geometric Approach to LLM Evaluation

Title: Generative Value Conflicts Reveal LLM Priorities

Title: From Faithfulness to Correctness: Generative Reward Models that Think Critically

Title: SimulRAG: Simulator-based RAG for Grounding LLMs in Long-form Scientific QA

Title: The Rise of AfricaNLP: Contributions, Contributors, and Community Impact (2005-2025)

Title: Not Wrong, But Untrue: LLM Overconfidence in Document-Based Queries

Title: MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources

Title: Calibrating Verbalized Confidence with Self-Generated Distractors

Title: Self-Rewarding Rubric-Based Reinforcement Learning for Open-Ended Reasoning

Title: Aligning Multilingual Reasoning with Verifiable Semantics from a High-Resource Expert Model

Title: Probing the Limits of Stylistic Alignment in Vision-Language Models

Title: RFG: Test-Time Scaling for Diffusion Large Language Model Reasoning with Reward-Free Guidance

Title: Transformers through the lens of support-preserving maps between measures

Title: The Media Bias Detector: A Framework for Annotating and Analyzing the News at Scale

Title: QFrBLiMP: a Quebec-French Benchmark of Linguistic Minimal Pairs

Title: Mitigating Biases in Language Models via Bias Unlearning

Title: LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts

Title: Atomic Thinking of LLMs: Decoupling and Exploring Mathematical Reasoning Abilities

Title: CATCH: A Novel Data Synthesis Framework for High Therapy Fidelity and Memory-Driven Planning Chain of Thought in AI Counseling

Title: Think Less, Label Better: Multi-Stage Domain-Grounded Synthetic Data Generation for Fine-Tuning Large Language Models in Telecommunications

Title: TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

Title: Assessing Algorithmic Bias in Language-Based Depression Detection: A Comparison of DNN and LLM Approaches

Title: RoBiologyDataChoiceQA: A Romanian Dataset for improving Biology understanding of Large Language Models

Title: Personalized Scientific Figure Caption Generation: An Empirical Study on Author-Specific Writing Style Transfer

Title: Believing without Seeing: Quality Scores for Contextualizing Vision-Language Model Explanations

Title: ReFACT: A Benchmark for Scientific Confabulation Detection with Positional Error Annotations

Title: RoleConflictBench: A Benchmark of Role Conflict Scenarios for Evaluating LLMs' Contextual Sensitivity

Title: PerQ: Efficient Evaluation of Multilingual Text Personalization Quality

Title: Mem-α: Learning Memory Construction via Reinforcement Learning

Title: Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel

Title: Bringing Emerging Architectures to Sequence Labeling in NLP

Title: Reliability Crisis of Reference-free Metrics for Grammatical Error Correction

Title: RAGferee: Building Contextual Reward Models for Retrieval-Augmented Generation

Title: RE$^2$: Improving Chinese Grammatical Error Correction via Retrieving Appropriate Examples with Explanation

Title: Unspoken Hints: Accuracy Without Acknowledgement in LLM Reasoning

Title: RE-Searcher: Robust Agentic Search with Goal-oriented Planning and Self-reflection

Title: DyFlow: Dynamic Workflow Framework for Agentic Reasoning

Title: The Silent Judge: Unacknowledged Shortcut Bias in LLM-as-a-Judge

Title: Limited Preference Data? Learning Better Reward Model with Latent Space Synthesis

Title: IMProofBench: Benchmarking AI on Research-Level Mathematical Proof Generation

Title: Reinforced Strategy Optimization for Conversational Recommender Systems via Network-of-Experts

Title: End-to-End Aspect-Guided Review Summarization at Scale

Title: Vocabulary Customization for Efficient Domain-Specific LLM Deployment

Title: The Hunger Game Debate: On the Emergence of Over-Competition in Multi-Agent Systems

Title: CliniBench: A Clinical Outcome Prediction Benchmark for Generative and Encoder-Based Language Models

Title: MGen: Millions of Naturally Occurring Generics in Context

Title: Explaining novel senses using definition generation with open language models

Title: VietBinoculars: A Zero-Shot Approach for Detecting Vietnamese LLM-Generated Text

Title: Type-Less yet Type-Aware Inductive Link Prediction with Pretrained Language Models

Title: Finetune Once: Decoupling General & Domain Learning with Dynamic Boosted Annealing

Title: Optimizing Speech Language Models for Acoustic Consistency

Title: QUARTZ: QA-based Unsupervised Abstractive Refinement for Task-oriented Dialogue Summarization

Title: Feedback Forensics: A Toolkit to Measure AI Personality

Title: One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient

Title: Latent Thinking Optimization: Your Latent Reasoning Language Model Secretly Encodes Reward Signals in its Latent Thoughts

Title: Fast-dLLM v2: Efficient Block-Diffusion LLM

Title: Efficient and Transferable Agentic Knowledge Graph RAG via Reinforcement Learning

Title: Automatic Fact-checking in English and Telugu

Title: Text-Based Approaches to Item Alignment to Content Standards in Large-Scale Reading & Writing Tests

Title: Adaptive Planning for Multi-Attribute Controllable Summarization with Monte Carlo Tree Search

Title: CreAgentive: An Agent Workflow Driven Multi-Category Creative Generation Engine

Title: Regression Language Models for Code

Title: dParallel: Learnable Parallel Decoding for dLLMs

Title: VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications

Title: BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs

Title: Training Matryoshka Mixture-of-Experts for Elastic Inference-Time Expert Utilization

Title: OceanGym: A Benchmark Environment for Underwater Embodied Agents

Title: Towards Reliable Benchmarking: A Contamination Free, Controllable Evaluation Framework for Multi-step LLM Function Calling

Title: Generating Difficult-to-Translate Texts

Title: Deconstructing Self-Bias in LLM-generated Translation Benchmarks

Title: MENLO: From Preferences to Proficiency - Evaluating and Modeling Native-like Quality Across 47 Languages

Title: Scaling Spoken Language Models with Syllabic Speech Tokenization

Title: Convergence and Divergence of Language Models under Different Random Seeds