2025-07-04

Title: McBE: A Multi-task Chinese Bias Evaluation Benchmark for Large Language Models

Title: Reasoning or Not? A Comprehensive Evaluation of Reasoning LLMs for Dialogue Summarization

Title: Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer

Title: GDC Cohort Copilot: An AI Copilot for Curating Cohorts from the Genomic Data Commons

Title: MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent

Title: DoMIX: An Efficient Framework for Exploiting Domain Knowledge in Fine-Tuning

Title: Coling-UniA at SciVQA 2025: Few-Shot Example Retrieval and Confidence-Informed Ensembling for Multimodal Large Language Models

Title: Efficient Code LLM Training via Distribution-Consistent and Diversity-Aware Data Selection

Title: IndianBailJudgments-1200: A Multi-Attribute Dataset for Legal NLP on Indian Bail Orders

Title: WebSailor: Navigating Super-human Reasoning for Web Agent

Title: Revisiting Active Learning under (Human) Label Variation

Title: MPF: Aligning and Debiasing Language Models post Deployment via Multi Perspective Fusion

Title: Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers

Title: Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs

Title: Is Reasoning All You Need? Probing Bias in the Age of Reasoning Language Models

Title: Multimodal Mathematical Reasoning with Diverse Solving Perspective

Title: SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model

Title: Generalizing Verifiable Instruction Following

Title: LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users

Title: MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs

Title: Answer Matching Outperforms Multiple Choice for Language Model Evaluation