2025-12-22

Title: A Women's Health Benchmark for Large Language Models

Title: Knowledge Distillation with Structured Chain-of-Thought for Text-to-SQL

Title: XLM: A Python package for non-autoregressive language models

Title: Perturb Your Data: Paraphrase-Guided Training Data Watermarking

Title: When F1 Fails: Granularity-Aware Evaluation for Dialogue Topic Segmentation

Title: Data Augmentation Supporting a Conversational Agent Designed for Smoking Cessation Support Groups

Title: Enhancing Long Document Long Form Summarisation with Self-Planning

Title: Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding

Title: Incorporating Error Level Noise Embedding for Improving LLM-Assisted Robustness in Persian Speech Recognition

Title: Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

Title: AutoMetrics: Approximate Human Judgements with Automatically Generated Evaluators

Title: Subjective Question Generation and Answer Evaluation using NLP

Title: Governance-Aware Hybrid Fine-Tuning for Multilingual Large Language Models

Title: Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers

Title: UCoder: Unsupervised Code Generation by Internal Probing of Large Language Models

Title: Are Vision Language Models Cross-Cultural Theory of Mind Reasoners?

Title: Confidence-Credibility Aware Weighted Ensembles of Small LLMs Outperform Large LLMs in Emotion Detection

Title: Linear Personality Probing and Steering in LLMs: A Big Five Study

Title: Toward Ethical AI Through Bayesian Uncertainty in Neural Question Answering

Title: When the Gold Standard isn't Necessarily Standard: Challenges of Evaluating the Translation of User-Generated Content

Title: AncientBench: Towards Comprehensive Evaluation on Excavated and Transmitted Chinese Corpora

Title: DEER: A Comprehensive and Reliable Benchmark for Deep-Research Expert Reports

Title: ShareChat: A Dataset of Chatbot Conversations in the Wild