2026-01-08

Title: DeepResearch-Slice: Bridging the Retrieval-Utilization Gap via Explicit Text Slicing

Title: Internal Reasoning vs. External Control: A Thermodynamic Analysis of Sycophancy in Large Language Models

Title: Jailbreak-Zero: A Path to Pareto Optimal Red Teaming for Large Language Models

Title: Benchmarking and Adapting On-Device Large Language Models for Clinical Decision Support

Title: OpenAI GPT-5 System Card

Title: WRAVAL -- WRiting Assist eVALuation

Title: The Instruction Gap: LLMs get lost in Following Instruction

Title: Less is more: Not all samples are effective for evaluation

Title: GuardEval: A Multi-Perspective Benchmark for Evaluating Safety, Fairness, and Robustness in LLM Moderators

Title: LLM_annotate: A Python package for annotating and analyzing fiction characters

Title: Topic Segmentation Using Generative Language Models

Title: Bare-Metal Tensor Virtualization: Overcoming the Memory Wall in Edge-AI Inference on ARM64

Title: A path to natural language through tokenisation and transformers

Title: Metaphors are a Source of Cross-Domain Misalignment of Large Reasoning Models

Title: Breaking the Assistant Mold: Modeling Behavioral Variation in LLM Based Procedural Character Generation

Title: Rendering Data Unlearnable by Exploiting LLM Alignment Mechanisms

Title: Tigrinya Number Verbalization: Rules, Algorithm, and Implementation

Title: Implicit Graph, Explicit Retrieval: Towards Efficient and Interpretable Long-horizon Memory for Large Language Models

Title: PCoA: A New Benchmark for Medical Aspect-Based Summarization With Phrase-Level Context Attribution

Title: Training-Free Adaptation of New-Generation LLMs using Legacy Clinical Models

Title: The Critical Role of Aspects in Measuring Document Similarity

Title: Grading Scale Impact on LLM-as-a-Judge: Human-LLM Alignment Is Highest on 0-5 Grading Scale

Title: Enhancing Linguistic Competence of Language Models through Pre-training with Language Learning Tasks

Title: Prompting Underestimates LLM Capability for Time Series Classification

Title: EpiQAL: Benchmarking Large Language Models in Epidemiological Question Answering for Enhanced Alignment and Reasoning

Title: CALM: Culturally Self-Aware Language Models

Title: Submodular Evaluation Subset Selection in Automatic Prompt Optimization

Title: Beyond Perplexity: A Lightweight Benchmark for Knowledge Retention in Supervised Fine-Tuning

Title: Reasoning Pattern Alignment Merging for Adaptive Reasoning

Title: IntroLM: Introspective Language Models via Prefilling-Time Self-Evaluation

Title: Mem-Gallery: Benchmarking Multimodal Long-Term Conversational Memory for MLLM Agents

Title: PALM-Bench: A Comprehensive Benchmark for Personalized Audio-Language Models

Title: Persona-aware and Explainable Bikeability Assessment: A Vision-Language Model Approach

Title: DeepSynth-Eval: Objectively Evaluating Information Consolidation in Deep Survey Writing

Title: Layer-Order Inversion: Rethinking Latent Multi-Hop Reasoning in Large Language Models

Title: EvolMem: A Cognitive-Driven Benchmark for Multi-Session Dialogue Memory

Title: Value-Action Alignment in Large Language Models under Privacy-Prosocial Conflict

Title: Evaluating LLMs for Police Decision-Making: A Framework Based on Police Action Scenarios

Title: DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs

Title: How Do Large Language Models Learn Concepts During Continual Pre-Training?

Title: PsychEthicsBench: Evaluating Large Language Models Against Australian Mental Health Ethics

Title: OLA: Output Language Alignment in Code-Switched LLM Interactions

Title: From Chains to Graphs: Self-Structured Reasoning for General-Domain LLMs

Title: DiVA: Fine-grained Factuality Verification with Agentic-Discriminative Verifier

Title: Analyzing Reasoning Shifts in Audio Deepfake Detection under Adversarial Attacks: The Reasoning Tax versus Shield Bifurcation

Title: Evaluating the Pre-Consultation Ability of LLMs using Diagnostic Guidelines

Title: Reasoning Model Is Superior LLM-Judge, Yet Suffers from Biases

Title: Agent-Dice: Disentangling Knowledge Updates via Geometric Consensus for Agent Continual Learning

Title: LLM-MC-Affect: LLM-Based Monte Carlo Modeling of Affective Trajectories and Latent Ambiguity for Interpersonal Dynamic Insight

Title: ELO: Efficient Layer-Specific Optimization for Continual Pretraining of Multilingual LLMs

Title: SyncThink: A Training-Free Strategy to Align Inference Termination with Reasoning Saturation

Title: e5-omni: Explicit Cross-modal Alignment for Omni-modal Embeddings

Title: DisastQA: A Comprehensive Benchmark for Evaluating Question Answering in Disaster Management

Title: NeuronScope: A Multi-Agent Framework for Explaining Polysemantic Neurons in Language Models

Title: Towards Compositional Generalization of LLMs via Skill Taxonomy Guided Data Synthesis

Title: From Implicit to Explicit: Token-Efficient Logical Supervision for Mathematical Reasoning in LLMs

Title: Evaluation Framework for AI Creativity: A Case Study Based on Story Generation

Title: RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models

Title: ADEPT: Adaptive Dynamic Early-Exit Process for Transformers

Title: Visual Merit or Linguistic Crutch? A Close Look at DeepSeek-OCR

Title: MIND: From Passive Mimicry to Active Reasoning through Capability-Aware Multi-Perspective CoT Distillation

Title: Stuttering-Aware Automatic Speech Recognition for Indonesian Language

Title: O-Researcher: An Open Ended Deep Research Model via Multi-Agent Distillation and Agentic RL

Title: Whose Facts Win? LLM Source Preferences under Knowledge Conflicts

Title: Evaluation of Multilingual LLMs Personalized Text Generation Capabilities Targeting Groups and Social-Media Platforms

Title: Do LLM Self-Explanations Help Users Predict Model Behavior? Evaluating Counterfactual Simulatability with Pragmatic Perturbations

Title: Tracing the complexity profiles of different linguistic phenomena through the intrinsic dimension of LLM representations

Title: HearSay Benchmark: Do Audio LLMs Leak What They Hear?

Title: Membox: Weaving Topic Continuity into Long-Range Memory for LLM Agents

Title: Compact Example-Based Explanations for Language Models

Title: NeoAMT: Neologism-Aware Agentic Machine Translation with Reinforcement Learning

Title: Do LLMs Really Memorize Personally Identifiable Information? Revisiting PII Leakage with a Cue-Controlled Memorization Framework

Title: VietMed-MCQ: A Consistency-Filtered Data Synthesis Framework for Vietnamese Traditional Medicine Evaluation

Title: Where meaning lives: Layer-wise accessibility of psycholinguistic features in encoder and decoder language models

Title: AI Generated Text Detection

Title: Step Potential Advantage Estimation: Harnessing Intermediate Confidence and Correctness for Efficient Mathematical Reasoning

Title: What Does Loss Optimization Actually Teach, If Anything? Knowledge Dynamics in Continual Pre-training of LLMs

Title: PartisanLens: A Multilingual Dataset of Hyperpartisan and Conspiratorial Immigration Narratives in European Media

Title: What Matters For Safety Alignment?

Title: Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning

Title: Evaluating Small Decoder-Only Language Models for Grammar Correction and Text Simplification

Title: Decide Then Retrieve: A Training-Free Framework with Uncertainty-Guided Triggering and Dual-Path Retrieval

Title: When Models Decide and When They Bind: A Two-Stage Computation for Multiple-Choice Question-Answering

Title: Doc-PP: Document Policy Preservation Benchmark for Large Vision-Language Models

Title: Large-Scale Aspect-Based Sentiment Analysis with Reasoning-Infused LLMs

Title: RADAR: Retrieval-Augmented Detector with Adversarial Refinement for Robust Fake News Detection

Title: Benchmark^2: Systematic Evaluation of LLM Benchmarks

Title: VotIE: Information Extraction from Meeting Minutes

Title: Simulated Students in Tutoring Dialogues: Substance or Illusion?

Title: SpeakerSleuth: Evaluating Large Audio-Language Models as Judges for Multi-turn Speaker Consistency

Title: Analyzing and Improving Cross-lingual Knowledge Transfer for Machine Translation

Title: When Helpers Become Hazards: A Benchmark for Analyzing Multimodal LLM-Powered Safety in Daily Life

Title: Modular Prompt Optimization: Optimizing Structured Prompts with Section-Local Textual Gradients

Title: Bridging the Discrete-Continuous Gap: Unified Multimodal Generation via Coupled Manifold Discrete Absorbing Diffusion

Title: KDCM: Reducing Hallucination in LLMs through Explicit Reasoning Structures

Title: SearchAttack: Red-Teaming LLMs against Real-World Threats via Framing Unsafe Web Information-Seeking Tasks

Title: Layer-wise Positional Bias in Short-Context Language Modeling

Title: InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training

Title: ContextFocus: Activation Steering for Contextual Faithfulness in Large Language Models

Title: LLMberjack: Guided Trimming of Debate Trees for Multi-Party Conversation Creation

Title: FLEx: Language Modeling with Few-shot Language Explanations

Title: All That Glisters Is Not Gold: A Benchmark for Reference-Free Counterfactual Financial Misinformation Detection