2025-02-27

Title: MixLLM: Dynamic Routing in Mixed Large Language Models

Title: FactReasoner: A Probabilistic Approach to Long-Form Factuality Assessment for Large Language Models

Title: Scalable Best-of-N Selection for Large Language Models via Self-Certainty

Title: Chain of Draft: Thinking Faster by Writing Less

Title: Steered Generation via Gradient Descent on Sparse Features

Title: Single- vs. Dual-Prompt Dialogue Generation with LLMs for Job Interviews in Human Resources

Title: Enhancing Text Classification with a Novel Multi-Agent Collaboration Framework Leveraging BERT

Title: Discriminative Finetuning of Generative Large Language Models without Reward Models and Preference Data

Title: MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment

Title: Random Forest-of-Thoughts: Uncertainty-aware Reasoning for Computational Social Science

Title: Automatic Prompt Optimization via Heuristic Search: A Survey

Title: Plutus: Benchmarking Large Language Models in Low-Resource Greek Finance

Title: Active Few-Shot Learning for Text Classification

Title: Seeing the Forest for the Trees: A Large Scale, Continuously Updating Meta-Analysis of Frontier LLMs

Title: Anything Goes? A Crosslinguistic Study of (Im)possible Language Learning in LMs

Title: ANPMI: Assessing the True Comprehension Capabilities of LLMs for Multiple Choice Questions

Title: Language Models Grow Less Humanlike beyond Phase Transition

Title: Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models

Title: Evidence-Driven Marker Extraction for Social Media Suicide Risk Detection

Title: Sliding Window Attention Training for Efficient Large Language Models

Title: A Causal Lens for Evaluating Faithfulness Metrics

Title: Learning to Align Multi-Faceted Evaluation: A Unified and Robust Framework

Title: Learning to Generate Structured Output with Schema Reinforcement Learning

Title: On Pruning State-Space LLMs

Title: From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens

Title: END: Early Noise Dropping for Efficient and Effective Context Denoising

Title: Kanana: Compute-efficient Bilingual Language Models

Title: JailBench: A Comprehensive Chinese Security Assessment Benchmark for Large Language Models

Title: MathTutorBench: A Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors

Title: Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles

Title: Low-Confidence Gold: Refining Low-Confidence Samples for Efficient Instruction Tuning

Title: PEToolLLM: Towards Personalized Tool Learning in Large Language Models

Title: GenTool: Enhancing Tool Generalization in Language Models through Zero-to-One and Weak-to-Strong Simulation

Title: MEBench: Benchmarking Large Language Models for Cross-Document Multi-Entity Question Answering

Title: Binary Neural Networks for Large Language Model: A Survey

Title: MathClean: A Benchmark for Synthetic Mathematical Data Cleaning

Title: Can Large Language Models Outperform Non-Experts in Poetry Evaluation? A Comparative Study Using the Consensual Assessment Technique

Title: Improving the quality of Web-mined Parallel Corpora of Low-Resource Languages using Debiasing Heuristics

Title: Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs

Title: LongEval: A Comprehensive Analysis of Long-Text Generation Through a Plan-based Paradigm

Title: Evaluating Gender Bias in German Machine Translation

Title: Conformal Linguistic Calibration: Trading-off between Factuality and Specificity

Title: Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement

Title: Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs

Title: When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning

Title: Detecting Linguistic Indicators for Stereotype Assessment with Large Language Models

Title: TestNUC: Enhancing Test-Time Computing Approaches through Neighboring Unlabeled Data Consistency

Title: MEDDxAgent: A Unified Modular Agent Framework for Explainable Automatic Differential Diagnosis

Title: BIG-Bench Extra Hard

Title: LiGT: Layout-infused Generative Transformer for Visual Question Answering on Vietnamese Receipts

Title: FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge

Title: Bi'an: A Bilingual Benchmark and Model for Hallucination Detection in Retrieval-Augmented Generation

Title: Negation-Induced Forgetting in LLMs

Title: Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time

Title: Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases

Title: Disentangled VAD Representations via a Variational Framework for Political Stance Detection

Title: CritiQ: Mining Data Quality Criteria from Human Preferences

Title: Shh, don't say that! Domain Certification in LLMs

Title: Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems

Title: Evaluating LLMs and Pre-trained Models for Text Summarization Across Diverse Datasets

Title: Controlled Diversity: Length-optimized Natural Language Generation

Title: Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?

Title: DataMan: Data Manager for Pre-training Large Language Models

Title: Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs

Title: The Mighty ToRR: A Benchmark for Table Reasoning and Robustness

Title: Norm Growth and Stability Challenges in Localized Sequential Knowledge Editing