2026-02-06

Title: BioACE: An Automated Framework for Biomedical Answer and Citation Evaluations

Title: CoWork-X: Experience-Optimized Co-Evolution for Multi-Agent Collaboration System

Title: Capacity Constraints and the Multilingual Penalty for Lexical Disambiguation

Title: Locas: Your Models are Principled Initializers of Locally-Supported Parametric Memories

Title: Data Kernel Perspective Space Performance Guarantees for Synthetic Data from Transformer Models

Title: GreekMMLU: A Native-Sourced Multitask Benchmark for Evaluating Language Models in Greek

Title: Among Us: Measuring and Mitigating Malicious Contributions in Model Collaboration Systems

Title: The Single-Multi Evolution Loop for Self-Improving Model Collaboration Systems

Title: Are Open-Weight LLMs Ready for Social Media Moderation? A Comparative Study on Bluesky

Title: Aligning Large Language Model Behavior with Human Citation Preferences

Title: FedMosaic: Federated Retrieval-Augmented Generation via Parametric Adapters

Title: Copyright Detective: A Forensic System to Evidence LLMs Flickering Copyright Leakage Risks

Title: CoPE: Clipped RoPE as A Scalable Free Lunch for Long Context LLMs

Title: Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR

Title: Towards a Science of Collective AI: LLM-based Multi-Agent Systems Need a Transition from Blind Trial-and-Error to Rigorous Science

Title: MentorCollab: Selective Large-to-Small Inference-Time Guidance for Efficient Reasoning

Title: How Do Language Models Acquire Character-Level Information?

Title: PACE: Defying the Scaling Hypothesis of Exploration in Iterative Alignment for Mathematical Reasoning

Title: Cross-Lingual Empirical Evaluation of Large Language Models for Arabic Medical Tasks

Title: IESR: Efficient MCTS-Based Modular Reasoning for Text-to-SQL with Large Language Models

Title: Beyond Length: Context-Aware Expansion and Independence as Developmentally Sensitive Evaluation in Child Utterances

Title: Late-to-Early Training: LET LLMs Learn Earlier, So Faster and Better

Title: OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration

Title: Once Correct, Still Wrong: Counterfactual Hallucination in Multilingual Vision-Language Models

Title: Causal Front-Door Adjustment for Robust Jailbreak Attacks on LLMs

Title: Structured Context Engineering for File-Native Agentic Systems: Evaluating Schema Accuracy, Format Effectiveness, and Multi-File Navigation at Scale

Title: LinguistAgent: A Reflective Multi-Model Platform for Automated Linguistic Annotation

Title: Transport and Merge: Cross-Architecture Merging for Large Language Models

Title: A Human-in-the-Loop, LLM-Centered Architecture for Knowledge-Graph Question Answering

Title: Multi-Task GRPO: Reliable LLM Reasoning Across Tasks

Title: CASTLE: A Comprehensive Benchmark for Evaluating Student-Tailored Personalized Safety in Large Language Models

Title: MedErrBench: A Fine-Grained Multilingual Benchmark for Medical Error Detection and Correction with Clinical Expert Annotations

Title: Consensus-Aligned Neuron Efficient Fine-Tuning Large Language Models for Multi-Domain Machine Translation

Title: CompactRAG: Reducing LLM Calls and Token Overhead in Multi-Hop Question Answering

Title: LongR: Unleashing Long-Context Reasoning via Reinforcement Learning with Dense Utility Rewards

Title: Different Time, Different Language: Revisiting the Bias Against Non-Native Speakers in GPT Detectors

Title: Reinforcement World Model Learning for LLM-based Agents

Title: OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions

Title: RRAttention: Dynamic Block Sparse Attention via Per-Head Round-Robin Shifts for Long-Context Inference

Title: xList-Hate: A Checklist-Based Framework for Interpretable and Generalizable Hate Speech Detection

Title: EuroLLM-22B: Technical Report

Title: Stop Rewarding Hallucinated Steps: Faithfulness-Aware Step-Level Reinforcement Learning for Small Reasoning Models

Title: Codified Finite-state Machines for Role-playing

Title: KV-CoRE: Benchmarking Data-Dependent Low-Rank Compressibility of KV-Caches in LLMs

Title: Polyglots or Multitudes? Multilingual LLM Answers to Value-laden Multiple-Choice Questions

Title: DSB: Dynamic Sliding Block Scheduling for Diffusion LLMs

Title: A Systematic Evaluation of Large Language Models for PTSD Severity Estimation: The Role of Contextual Knowledge and Modeling Strategies

Title: Multi-Token Prediction via Self-Distillation

Title: Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory

Title: DFlash: Block Diffusion for Flash Speculative Decoding