2026-02-04

Title: The Hypocrisy Gap: Quantifying Divergence Between Internal Belief and Chain-of-Thought Explanation via Sparse Autoencoders

Title: STEMVerse: A Dual-Axis Diagnostic Framework for STEM Reasoning in Large Language Models

Title: Test-Time Detoxification without Training or Learning Anything

Title: ROSA-Tuning: Enhancing Long-Context Modeling via Suffix Matching

Title: Graph-Augmented Reasoning with Large Language Models for Tobacco Pest and Disease Management

Title: WideSeek: Advancing Wide Research via Multi-Agent Scaling

Title: Monotonicity as an Architectural Bias for Robust Language Models

Title: InfMem: Learning System-2 Memory Control for Long-Context Agent

Title: Predicting first-episode homelessness among US Veterans using longitudinal EHR data: time-varying models and social risk factors

Title: From Task Solving to Robust Real-World Adaptation in LLM Agents

Title: AmharicStoryQA: A Multicultural Story Question Answering Benchmark in Amharic

Title: R2-Router: A New Paradigm for LLM Routing with Reasoning

Title: CATNIP: LLM Unlearning via Calibrated and Tokenized Negative Preference Alignment

Title: Act or Clarify? Modeling Sensitivity to Uncertainty and Cost in Communication

Title: Which course? Discourse! Teaching Discourse and Generation in the Era of LLMs

Title: HALT: Hallucination Assessment via Log-probs as Time series

Title: Equal Access, Unequal Interaction: A Counterfactual Audit of LLM Fairness

Title: Where Norms and References Collide: Evaluating LLMs on Normative Reasoning

Title: CPMobius: Iterative Coach-Player Reasoning for Data-Free Reinforcement Learning

Title: LatentMem: Customizing Latent Memory for Multi-Agent Systems

Title: SAES-SVD: Self-Adaptive Suppression of Accumulated and Local Errors for SVD-based LLM Compression

Title: ReMiT: RL-Guided Mid-Training for Iterative LLM Evolution

Title: AERO: Autonomous Evolutionary Reasoning Optimization via Endogenous Dual-Loop Feedback

Title: Test-time Recursive Thinking: Self-Improvement without External Feedback

Title: Task-Specificity Score: Measuring How Much Instructions Really Matter for Supervision

Title: The Mask of Civility: Benchmarking Chinese Mock Politeness Comprehension in Large Language Models

Title: ChemPro: A Progressive Chemistry Benchmark for Large Language Models

Title: One Model, All Roles: Multi-Turn, Multi-Agent Self-Play Reinforcement Learning for Conversational Social Intelligence

Title: FASA: Frequency-aware Sparse Attention

Title: Privasis: Synthesizing the Largest "Public" Private Dataset from Scratch

Title: ForesightKV: Optimizing KV Cache Eviction for Reasoning Models by Learning Long-Term Contribution

Title: Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection

Title: ATACompressor: Adaptive Task-Aware Compression for Efficient Long-Context Processing in LLMs

Title: POP: Prefill-Only Pruning for Efficient Large Model Inference

Title: MIRROR: A Multi-Agent Framework with Iterative Adaptive Revision and Hierarchical Retrieval for Optimization Modeling in Operations Research

Title: Accurate Failure Prediction in Agents Does Not Imply Effective Failure Prevention

Title: PEGRL: Improving Machine Translation by Post-Editing Guided Reinforcement Learning

Title: Pursuing Best Industrial Practices for Retrieval-Augmented Generation in the Medical Domain

Title: Towards Distillation-Resistant Large Language Models: An Information-Theoretic Perspective

Title: Verified Critical Step Optimization for LLM Agents

Title: FactNet: A Billion-Scale Knowledge Graph for Multilingual Factual Grounding

Title: A-RAG: Scaling Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces

Title: Preferences for Idiomatic Language are Acquired Slowly -- and Forgotten Quickly: A Case Study on Swedish

Title: Self-Verification Dilemma: Experience-Driven Suppression of Overused Checking in LLM Reasoning

Title: Learning to Reason Faithfully through Step-Level Faithfulness Maximization

Title: Can Large Language Models Generalize Procedures Across Representations?

Title: SEAD: Self-Evolving Agent for Multi-Turn Service Dialogue

Title: Assessing the Impact of Typological Features on Multilingual Machine Translation in the Age of Large Language Models

Title: Use Graph When It Needs: Efficiently and Adaptively Integrating Retrieval-Augmented Generation with Graphs

Title: $V_0$: A Generalist Value Model for Any Policy at State Zero

Title: CL-bench: A Benchmark for Context Learning

Title: Controlling Output Rankings in Generative Engines for LLM-based Search

Title: Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation

Title: BIRDTurk: Adaptation of the BIRD Text-to-SQL Dataset to Turkish

Title: TRE: Encouraging Exploration in the Trust Region

Title: RAGTurk: Best Practices for Retrieval Augmented Generation in Turkish

Title: Instruction Anchors: Dissecting the Causal Dynamics of Modality Arbitration

Title: Rethinking the Reranker: Boundary-Aware Evidence Selection for Robust Retrieval-Augmented Generation

Title: OCRTurk: A Comprehensive OCR Benchmark for Turkish

Title: Cognitively Diverse Multiple-Choice Question Generation: A Hybrid Multi-Agent Framework with Large Language Models

Title: OmniRAG-Agent: Agentic Omnimodal Reasoning for Low-Resource Long Audio-Video Question Answering

Title: Beyond Tokens: Semantic-Aware Speculative Decoding for Efficient Inference by Probing Internal States

Title: No Shortcuts to Culture: Indonesian Multi-hop Question Answering for Complex Cultural Understanding

Title: Training Multi-Turn Search Agent via Contrastive Dynamic Branch Sampling

Title: CUBO: Self-Contained Retrieval-Augmented Generation on Consumer Laptops: 10 GB Corpora, 16 GB RAM, Single-Device Deployment

Title: Context Compression via Explicit Information Transmission

Title: They Said Memes Were Harmless; We Found the Ones That Hurt: Decoding Jokes, Symbols, and Cultural References

Title: Accelerating Scientific Research with Gemini: Case Studies and Common Techniques