2026-01-07

Title: WearVox: An Egocentric Multichannel Voice Assistant Benchmark for Wearables

Title: PCEval: A Benchmark for Evaluating Physical Computing Capabilities of Large Language Models

Title: ModeX: Evaluator-Free Best-of-N Selection for Open-Ended Generation

Title: LoRA-Drop: Temporal LoRA Decoding for Efficient LLM Inference

Title: Fact-Checking with Large Language Models via Probabilistic Certainty and Consistency

Title: DataParasite Enables Scalable and Repurposable Online Data Curation

Title: Reconstructing Item Characteristic Curves using Fine-Tuned Large Language Models

Title: FlowPlan-G2P: A Structured Generation Framework for Transforming Scientific Papers into Patent Descriptions

Title: Scalable Construction of a Lung Cancer Knowledge Base: Profiling Semantic Reasoning in LLMs

Title: Improved Evidence Extraction for Document Inconsistency Detection with LLMs

Title: Empirical Comparison of Encoder-Based Language Models and Feature-Based Supervised Machine Learning Approaches to Automated Scoring of Long Essays

Title: When Do Tools and Planning Help LLMs Think? A Cost- and Latency-Aware Benchmark

Title: Towards Comprehensive Stage-wise Benchmarking of Large Language Models in Fact-Checking

Title: Multi-Turn Jailbreaking of Aligned LLMs via Lexical Anchor Tree Search

Title: Extracting books from production language models

Title: Iterative Structured Pruning for Large Language Models with Multi-Domain Calibration

Title: EvoRoute: Experience-Driven Self-Routing LLM Agent Systems

Title: Mitigating Prompt-Induced Hallucinations in Large Language Models via Structured Reasoning

Title: SYNAPSE: Empowering LLM Agents with Episodic-Semantic Memory via Spreading Activation

Title: Window-based Membership Inference Attacks Against Fine-tuned Large Language Models

Title: EComStage: Stage-wise and Orientation-specific Benchmarking for Large Language Models in E-commerce

Title: MiMo-V2-Flash Technical Report

Title: Punctuation-aware Hybrid Trainable Sparse Attention for Large Language Models

Title: The performances of the Chinese and U.S. Large Language Models on the Topic of Chinese Culture

Title: TiMem: Temporal-Hierarchical Memory Consolidation for Long-Horizon Conversational Agents

Title: To Generate or Discriminate? Methodological Considerations for Measuring Cultural Alignment in LLMs

Title: Training Language Models with homotokens Leads to Delayed Overfitting

Title: LongBench Pro: A More Realistic and Comprehensive Bilingual Long-Context Evaluation Benchmark

Title: Revisiting Data Compression with Language Modeling

Title: Beyond the Black Box: Theory and Mechanism of Large Language Models

Title: Image, Word and Thought: A More Challenging Language Task for the Iterated Learning Model

Title: RAL2M: Retrieval Augmented Learning-To-Match Against Hallucination in Compliance-Guaranteed Service Systems

Title: Memorization, Emergence, and Explaining Reversal Failures: A Controlled Study of Relational Semantics in LLMs

Title: Enhancing Multilingual RAG Systems with Debiased Language Preference-Guided Query Fusion

Title: LLM-Augmented Changepoint Detection: A Framework for Ensemble Detection and Automated Explanation

Title: Reliability-Aware Adaptive Self-Consistency for Efficient Sampling in LLM Reasoning

Title: Correct, Concise and Complete: Multi-stage Training For Adaptive Reasoning

Title: Mechanistic Knobs in LLMs: Retrieving and Steering High-Order Semantic Features via Sparse Autoencoders

Title: Mechanistic Interpretability of Large-Scale Counting in LLMs through a System-2 Strategy

Title: Stable-RAG: Mitigating Retrieval-Permutation-Induced Hallucinations in Retrieval-Augmented Generation

Title: Large Reasoning Models Are (Not Yet) Multilingual Latent Reasoners

Title: SentGraph: Hierarchical Sentence Graph for Multi-hop Retrieval-Augmented Question Answering

Title: MMFormalizer: Multimodal Autoformalization in the Wild

Title: Dementia-R1: Reinforced Pretraining and Reasoning from Unstructured Clinical Notes for Real-World Dementia Prognosis

Title: MedDialogRubrics: A Comprehensive Benchmark and Evaluation Framework for Multi-turn Medical Consultations in Large Language Models

Title: LittiChoQA: Literary Texts in Indic Languages Chosen for Question Answering

Title: Reducing Hallucinations in LLMs via Factuality-Aware Preference Learning

Title: NorwAI's Large Language Models: Technical Report

Title: BaseCal: Unsupervised Confidence Calibration via Base Model Signals

Title: Lil: Less is Less When Applying Post-Training Sparse-Attention Algorithms in Long-Decode Stage

Title: Temporal Graph Network: Hallucination Detection in Multi-Turn Conversation

Title: Detecting Hallucinations in Retrieval-Augmented Generation via Semantic-level Internal Reasoning Graph

Title: Do LLMs Encode Functional Importance of Reasoning Tokens?

Title: Learning to Diagnose and Correct Moral Errors: Towards Enhancing Moral Sensitivity in Large Language Models

Title: Grad-ELLM: Gradient-based Explanations for Decoder-only LLMs

Title: Who Laughs with Whom? Disentangling Influential Factors in Humor Preferences across User Clusters and LLMs

Title: Discovering and Causally Validating Emotion-Sensitive Neurons in Large Audio-Language Models

Title: ToxiGAN: Toxic Data Augmentation via LLM-Guided Directional Adversarial Generation

Title: The Anatomy of Conversational Scams: A Topic-Based Red Teaming Analysis of Multi-Turn Interactions in LLMs

Title: Self-Verification is All You Need To Pass The Japanese Bar Examination

Title: Decoupling the Effect of Chain-of-Thought Reasoning: A Human Label Variation Perspective

Title: WebAnchor: Anchoring Agent Planning to Stabilize Long-Horizon Web Reasoning

Title: Maximizing Local Entropy Where It Matters: Prefix-Aware Localized LLM Unlearning

Title: MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory

Title: X-MuTeST: A Multilingual Benchmark for Explainable Hate Speech Detection and A Novel LLM-consulted Explanation Framework

Title: DIP: Dynamic In-Context Planner For Diffusion Language Models

Title: UltraLogic: Enhancing LLM Reasoning through Large-Scale Data Synthesis and Bipolar Float Reward

Title: MalruleLib: Large-Scale Executable Misconception Reasoning with Step Traces for Modeling Student Thinking in Mathematics

Title: Multi-RADS Synthetic Radiology Report Dataset and Head-to-Head Benchmarking of 41 Open-Weight and Proprietary Language Models

Title: STReasoner: Empowering LLMs for Spatio-Temporal Reasoning in Time Series via Spatial-Aware Reinforcement Learning

Title: Automated Semantic Rules Detection (ASRD) for Emergent Communication Interpretation