2026-01-21

Title: Context Discipline and Performance Correlation: Analyzing LLM Performance and Quality Degradation Under Varying Context Lengths

Title: Compass-Embedding v4: Robust Contrastive Learning for Multilingual E-commerce Embeddings

Title: Measuring Stability Beyond Accuracy in Small Open-Source Medical Large Language Models for Pediatric Endocrinology

Title: An Empirical Analysis of Fine-Tuning Large Language Models on Bioinformatics Literature: PRSGPT and BioStarsGPT

Title: Concept Attractors in LLMs and their Applications

Title: LimAgents: Multi-Agent LLMs for Generating Research Limitations

Title: Bielik 11B v3: Multilingual Large Language Model for European Languages

Title: Speculative Decoding: Performance or Illusion?

Title: Entropic Context Shaping: Information-Theoretic Filtering for Context-Aware LLM Agents

Title: Towards AGI A Pragmatic Approach Towards Self Evolving Agent

Title: RAC: Retrieval-Augmented Clarification for Faithful Conversational Search

Title: Bridging Human Interpretation and Machine Representation: A Landscape of Qualitative Data Analysis in the LLM Era

Title: LIME-LLM: Probing Models with Fluent Counterfactuals, Not Broken Text

Title: Industry-Aligned Granular Topic Modeling

Title: Cleansing the Artificial Mind: A Self-Reflective Detoxification Framework for Large Language Models

Title: Translation as a Scalable Proxy for Multilingual Evaluation

Title: Beyond Tokens: Concept-Level Training Objectives for LLMs

Title: ATOD: An Evaluation Framework and Benchmark for Agentic Task-Oriented Dialogue System

Title: CTPD: Cross Tokenizer Preference Distillation

Title: Advances in LLM Reasoning Enable Flexibility in Clinical Problem-Solving

Title: Faithfulness vs. Safety: Evaluating LLM Behavior Under Counterfactual Medical Evidence

Title: PPA-Plan: Proactive Pitfall Avoidance for Reliable Planning in Long-Context LLM Reasoning

Title: LSTM-MAS: A Long Short-Term Memory Inspired Multi-Agent System for Long-Context Understanding

Title: Enhancing LLM-Based Data Annotation with Error Decomposition

Title: Event Detection with a Context-Aware Encoder and LoRA for Improved Performance on Long-Tailed Classes

Title: Double-Calibration: Towards Trustworthy LLMs via Calibrating Knowledge and Reasoning Confidence

Title: PEARL: Self-Evolving Assistant for Time Management with Reinforcement Learning

Title: MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models

Title: Acting Flatterers via LLMs Sycophancy: Combating Clickbait with LLMs Opposing-Stance Reasoning

Title: Preserving Fairness and Safety in Quantized LLMs Through Critical Weight Protection

Title: Don't Start Over: A Cost-Effective Framework for Migrating Personalized Prompts Between LLMs

Title: Codebook-Injected Dialogue Segmentation for Multi-Utterance Constructs Annotation: LLM-Assisted and Gold-Label-Free Evaluation

Title: To Copy or Not to Copy: Copying Is Easier to Induce Than Recall

Title: Optimizing User Profiles via Contextual Bandits for Retrieval-Augmented LLM Personalization

Title: Large language models struggle with ethnographic text annotation

Title: Powerful Training-Free Membership Inference Against Autoregressive Language Models

Title: Bengali Text Classification: An Evaluation of Large Language Model Approaches

Title: Analyzing Cancer Patients' Experiences with Embedding-based Topic Modeling and LLMs

Title: Tolerance Principle and Small Language Model Learning

Title: Plan, Verify and Fill: A Structured Parallel Decoding Approach for Diffusion Language Models

Title: Multimodal Generative Engine Optimization: Rank Manipulation for Vision-Language Model Rankers

Title: Simulated Annealing Enhances Theory-of-Mind Reasoning in Autoregressive Language Models

Title: Conversational Context Classification: A Representation Engineering Approach

Title: Can Deep Research Agents Find and Organize? Evaluating the Synthesis Gap with Expert Taxonomies

Title: A Scalable Entity-Based Framework for Auditing Bias in LLMs

Title: LR-DWM: Efficient Watermarking for Diffusion Language Models

Title: NADIR: Differential Attention Flow for Non-Autoregressive Transliteration in Indic Languages

Title: Legal experts disagree with rationale extraction techniques for explaining ECtHR case outcome classification

Title: System-Mediated Attention Imbalances Make Vision-Language Models Say Yes

Title: Incentivizing In-depth Reasoning over Long Contexts with Process Advantage Shaping

Title: Knowing When to Abstain: Medical LLMs Under Clinical Uncertainty

Title: DoPE: Decoy Oriented Perturbation Encapsulation Human-Readable, AI-Hostile Documents for Academic Integrity

Title: Benchmarking Concept-Spilling Across Languages in LLMs

Title: Evaluating Contextually Mediated Factual Recall in Multilingual Large Language Models

Title: A Cloud-based Multi-Agentic Workflow for Science

Title: Disagreement as Data: Reasoning Trace Analytics in Multi-Agent Systems

Title: BioPulse-QA: A Dynamic Biomedical Question-Answering Benchmark for Evaluating Factuality, Robustness, and Bias in Large Language Models

Title: Objective Matters: Fine-Tuning Objectives Shape Safety, Robustness, and Persona Drift

Title: Intelligent Documentation in Medical Education: Can AI Replace Manual Case Logging?

Title: Augmenting Question Answering with A Hybrid RAG Approach

Title: UbuntuGuard: A Culturally-Grounded Policy Benchmark for Equitable AI Safety in African Languages

Title: A Two-Stage GPU Kernel Tuner Combining Semantic Refactoring and Search-Based Optimization

Title: A Shared Geometry of Difficulty in Multilingual Language Models

Title: Towards Robust Process Reward Modeling via Noise-aware Learning

Title: VISPA: Pluralistic Alignment via Automatic Value Selection and Activation

Title: Who Does This Name Remind You of? Nationality Prediction via Large Language Model Associative Memory

Title: Do Clinical Question Answering Systems Really Need Specialised Medical Fine Tuning?

Title: Multimodal Multi-Agent Empowered Legal Judgment Prediction

Title: Race, Ethnicity and Their Implication on Bias in Large Language Models

Title: From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation

Title: Gated Differentiable Working Memory for Long-Context Language Modeling

Title: SciCoQA: Quality Assurance for Scientific Paper-Code Alignment

Title: Injecting Knowledge from Social Science Journals to Improve Indonesian Cultural Understanding by LLMs

Title: A Component-Based Survey of Interactions between Large Language Models and Multi-Armed Bandits

Title: Pardon? Evaluating Conversational Repair in Large Audio-Language Models

Title: Bridging the Knowledge-Action Gap by Evaluating LLMs in Dynamic Dental Clinical Scenarios

Title: The Bitter Lesson of Diffusion Language Models for Agentic Workflows: A Comprehensive Reality Check

Title: ChartAttack: Testing the Vulnerability of LLMs to Malicious Prompting in Chart Generation

Title: Graph Reasoning Paradigm: Structured and Symbolic Reasoning with Topology-Aware Reinforcement Learning for Large Language Models

Title: Bi-Attention HateXplain: Taking into account the sequential aspect of data during explainability in a multi-task context

Title: Tears or Cheers? Benchmarking LLMs via Culturally Elicited Distinct Affective Responses

Title: Profiling German Text Simplification with Interpretable Model-Fingerprints

Title: Alexandria: A Multi-Domain Dialectal Arabic Machine Translation Dataset for Culturally Inclusive and Linguistically Diverse LLMs

Title: Leveraging LoRA Fine-Tuning and Knowledge Bases for Construction Identification

Title: CORE-T: COherent REtrieval of Tables for Text-to-SQL

Title: Agentic Conversational Search with Contextualized Reasoning via Reinforcement Learning

Title: Adversarial Alignment: Ensuring Value Consistency in Large Language Models for Sensitive Domains

Title: Probe and Skip: Self-Predictive Token Skipping for Efficient Long-Context LLM Inference

Title: Medical Triage as Pairwise Ranking: A Benchmark for Urgency in Patient Portal Messages

Title: OpenExempt: A Diagnostic Benchmark for Legal Reasoning and a Framework for Creating Custom Benchmarks on Demand

Title: Beyond Single-shot Writing: Deep Research Agents are Unreliable at Multi-turn Report Revision

Title: Autoregressive Models Rival Diffusion Models at ANY-ORDER Generation

Title: Aligning Agentic World Models via Knowledgeable Experience Learning

Title: Beyond Cosine Similarity: Taming Semantic Drift and Antonym Intrusion in a 15-Million Node Turkish Synonym Graph

Title: Stop Taking Tokenizers for Granted: They Are Core Design Decisions in Large Language Models

Title: Unlearning in LLMs: Methods, Evaluation, and Open Challenges

Title: A BERTology View of LLM Orchestrations: Token- and Layer-Selective Probes for Efficient Single-Pass Classification

Title: OI-Bench: An Option Injection Benchmark for Evaluating LLM Susceptibility to Directive Interference

Title: Paid Voices vs. Public Feeds: Interpretable Cross-Platform Theme Modeling of Climate Discourse

Title: RegCheck: A tool for automating comparisons between study registrations and papers

Title: LLM-as-RNN: A Recurrent Language Model for Memory Updates and Sequence Prediction

Title: Sockpuppetting: Jailbreaking LLMs Without Optimization Through Output Prefix Injection

Title: Recurrent Confidence Chain: Temporal-Aware Uncertainty Quantification in Large Language Models

Title: Confidence over Time: Confidence Calibration with Temporal Logic for Large Language Model Reasoning

Title: Structured Insight from Unstructured Data: Large Language Models for SDOH-Driven Diabetes Risk Prediction

Title: Beyond Memorization: Testing LLM Reasoning on Unseen Theory of Computation Tasks

Title: Trust Me, I'm an Expert: Decoding and Steering Authority Bias in Large Language Models

Title: MOSLD-Bench: Multilingual Open-Set Learning and Discovery Benchmark for Text Categorization

Title: PhysicsSolutionAgent: Towards Multimodal Explanations for Numerical Physics Problem Solving

Title: Anonpsy: A Graph-Based Framework for Structure-Preserving De-identification of Psychiatric Narratives

Title: When Wording Steers the Evaluation: Framing Bias in LLM judges

Title: Comparing Without Saying: A Dataset and Benchmark for Implicit Comparative Opinion Mining from Same-User Reviews

Title: TREX: Tokenizer Regression for Optimal Data Mixture

Title: Vulnerability of LLMs' Belief Systems? LLMs Belief Resistance Check Through Strategic Persuasive Conversation Interventions

Title: CauScientist: Teaching LLMs to Respect Data for Causal Discovery

Title: Activation-Space Anchored Access Control for Multi-Class Permission Reasoning in Large Language Models

Title: Fairness or Fluency? An Investigation into Language Bias of Pairwise LLM-as-a-Judge

Title: Beyond Known Facts: Generating Unseen Temporal Knowledge to Address Data Contamination in LLM Evaluation

Title: CommunityBench: Benchmarking Community-Level Alignment across Diverse Groups and Tasks

Title: HeteroCache: A Dynamic Retrieval Approach to Heterogeneous KV Cache Compression for Long-Context LLM Inference

Title: Dr. Assistant: Enhancing Clinical Diagnostic Inquiry via Structured Diagnostic Reasoning Data and Reinforcement Learning

Title: Uncertainty-Aware Gradient Signal-to-Noise Data Selection for Instruction Tuning

Title: GerAV: Towards New Heights in German Authorship Verification using Fine-Tuned LLMs on a New Benchmark

Title: Simulated Ignorance Fails: A Systematic Study of LLM Behaviors on Forecasting Problems Before Model Knowledge Cutoff

Title: OP-Bench: Benchmarking Over-Personalization for Memory-Augmented Personalized Conversational Agents

Title: On Temperature-Constrained Non-Deterministic Machine Translation: Potential and Evaluation

Title: Towards robust long-context understanding of large language model via active recap learning

Title: Dimension-First Evaluation of Speech-to-Speech Models with Structured Acoustic Cues

Title: Pro-AI Bias in Large Language Models

Title: Knowledge Graph-Assisted LLM Post-Training for Enhanced Legal Reasoning

Title: FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs

Title: Pedagogical Alignment for Vision-Language-Action Models: A Comprehensive Framework for Data, Architecture, and Evaluation in Education

Title: OpenLearnLM Benchmark: A Unified Framework for Evaluating Knowledge, Skill, and Attitude in Educational Large Language Models

Title: Confident Rankings with Fewer Items: Adaptive LLM Evaluation with Continuous Scores

Title: AgentEHR: Advancing Autonomous Clinical Decision-Making via Retrospective Summarization

Title: HyperWalker: Dynamic Hypergraph-Based Deep Diagnosis for Multi-Hop Clinical Modeling across EHR and X-Ray in Medical VLMs

Title: Automatic Prompt Optimization for Dataset-Level Feature Discovery

Title: "The Whole Is Greater Than the Sum of Its Parts": A Compatibility-Aware Multi-Teacher CoT Distillation Framework

Title: From Tags to Trees: Structuring Fine-Grained Knowledge for Controllable Data Selection in LLM Instruction Tuning

Title: Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

Title: BACH-V: Bridging Abstract and Concrete Human-Values in Large Language Models

Title: RM-Distiller: Exploiting Generative LLM for Reward Model Distillation

Title: Top 10 Open Challenges Steering the Future of Diffusion Language Model and Its Variants

Title: PRiSM: Benchmarking Phone Realization in Speech Models

Title: Understanding Multilingualism in Mixture-of-Experts LLMs: Routing Mechanism, Expert Specialization, and Layerwise Steering

Title: Kakugo: Distillation of Low-Resource Languages into Small Language Models

Title: XCR-Bench: A Multi-Task Benchmark for Evaluating Cultural Reasoning in LLMs

Title: NewsRECON: News article REtrieval for image CONtextualization

Title: A Systematic Analysis of Chunking Strategies for Reliable Question Answering

Title: Style Transfer as Bias Mitigation: Diffusion Models for Synthetic Mental Health Text for Arabic

Title: Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models

Title: Domain-Adaptation through Synthetic Data: Fine-Tuning Large Language Models for German Law

Title: Human Values in a Single Sentence: Moral Presence, Hierarchies, and Transformer Ensembles on the Schwartz Continuum

Title: HALT: Hallucination Assessment via Latent Testing

Title: MASCOT: Towards Multi-Agent Socio-Collaborative Companion Systems

Title: APEX-Agents

Title: Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment