2026-03-12

Title: GhazalBench: Usage-Grounded Evaluation of LLMs on Persian Ghazals

Title: Large Language Models and Book Summarization: Reading or Remembering, Which Is Better?

Title: AraModernBERT: Transtokenized Initialization and Long-Context Encoder Modeling for Arabic

Title: The Dunning-Kruger Effect in Large Language Models: An Empirical Study of Confidence Calibration

Title: Quantifying Hallucinations in Language Models on Medical Textbooks

Title: Evolving Demonstration Optimization for Chain-of-Thought Feature Transformation

Title: Causally Grounded Mechanistic Interpretability for LLMs with Faithful Natural-Language Explanations

Title: The System Hallucination Scale (SHS): A Minimal yet Effective Human-Centered Instrument for Evaluating Hallucination-Related Behavior in Large Language Models

Title: A Two-Stage Architecture for NDA Analysis: LLM-based Segmentation and Transformer-based Clause Classification

Title: PoultryLeX-Net: Domain-Adaptive Dual-Stream Transformer Architecture for Large-Scale Poultry Stakeholder Modeling

Title: TAMUSA-Chat: A Domain-Adapted Large Language Model Conversational System for Research and Responsible Deployment

Title: CEI: A Benchmark for Evaluating Pragmatic Reasoning in Language Models

Title: Evaluating Adjective-Noun Compositionality in LLMs: Functional vs Representational Perspectives

Title: Context Over Compute: Human-in-the-Loop Outperforms Iterative Chain-of-Thought Prompting in Interview Answer Quality

Title: There Are No Silly Questions: Evaluation of Offline LLM Capabilities from a Turkish Perspective

Title: Empathy Is Not What Changed: Clinical Assessment of Psychological Safety Across GPT Model Generations

Title: Automated evaluation of LLMs for effective machine translation of Mandarin Chinese to English

Title: Beyond the Prompt in Large Language Models: Comprehension, In-Context Learning, and Chain-of-Thought

Title: Leveraging Wikidata for Geographically Informed Sociocultural Bias Dataset Creation: Application to Latin America

Title: SpreadsheetArena: Decomposing Preference in LLM Generation of Spreadsheet Workbooks

Title: Probing the Limits of the Lie Detector Approach to LLM Deception

Title: Fine-Tune, Don't Prompt, Your Language Model to Identify Biased Language in Clinical Notes

Title: SENS-ASR: Semantic Embedding injection in Neural-transducer for Streaming Automatic Speech Recognition

Title: Adaptive Engram Memory System for Indonesian Language Model: Generative AI Based on TOBA LM for Batak and Minang Language

Title: Gemma Needs Help: Investigating and Mitigating Emotional Instability in LLMs

Title: Measuring and Eliminating Refusals in Military Large Language Models

Title: A Principle-Driven Adaptive Policy for Group Cognitive Stimulation Dialogue for Elderly with Cognitive Impairment

Title: TriageSim: A Conversational Emergency Triage Simulation Framework from Structured Electronic Health Records

Title: The Generation-Recognition Asymmetry: Six Dimensions of a Fundamental Divide in Formal Language Theory

Title: Reason and Verify: A Framework for Faithful Retrieval-Augmented Generation

Title: Lost in Backpropagation: The LM Head is a Gradient Bottleneck

Title: OpenClaw-RL: Train Any Agent Simply by Talking

Title: Adaptive Activation Cancellation for Hallucination Mitigation in Large Language Models

Title: Sabiá-4 Technical Report

Title: S-GRADES -- Studying Generalization of Student Response Assessments in Diverse Evaluative Settings

Title: GR-SAP: Generative Replay for Safety Alignment Preservation during Fine-Tuning

Title: Is this Idea Novel? An Automated Benchmark for Judgment of Research Ideas

Title: Large language models can disambiguate opioid slang on social media

Title: Mitigating Translationese Bias in Multilingual LLM-as-a-Judge via Disentangled Information Bottleneck

Title: Dynamic Knowledge Fusion for Multi-Domain Dialogue State Tracking

Title: Aligning Large Language Models with Searcher Preferences

Title: Learning to Negotiate: Multi-Agent Deliberation for Collective Value Alignment in LLMs

Title: PEEM: Prompt Engineering Evaluation Metrics for Interpretable Joint Evaluation of Prompts and Responses

Title: Human-AI Co-reasoning for Clinical Diagnosis with Evidence-Integrated Language Agent

Title: VERI-DPO: Evidence-Aware Alignment for Clinical Summarization via Claim Verification and Direct Preference Optimization

Title: Safe and Scalable Web Agent Learning via Recreated Websites

Title: AILS-NTUA at SemEval-2026 Task 8: Evaluating Multi-Turn RAG Conversations

Title: Automatic End-to-End Data Integration using Large Language Models

Title: End-to-End Chatbot Evaluation with Adaptive Reasoning and Uncertainty Filtering

Title: Disentangling Similarity and Relatedness in Topic Models

Title: Making Bielik LLM Reason (Better): A Field Report

Title: Prism-Δ: Differential Subspace Steering for Prompt Highlighting in Large Language Models

Title: HeartAgent: An Autonomous Agent System for Explainable Differential Diagnosis in Cardiology

Title: mAceReason-Math: A Dataset of High-Quality Multilingual Math Problems Ready For RLVR

Title: Word Recovery in Large Language Models Enables Character-Level Tokenization Robustness

Title: Large Language Models as Annotators for Machine Translation Quality Estimation

Title: Interpretable Chinese Metaphor Identification via LLM-Assisted MIPVU Rule Script Generation: A Comparative Protocol Study

Title: PivotAttack: Rethinking the Search Trajectory in Hard-Label Text Attacks via Pivot Words

Title: An Extreme Multi-label Text Classification (XMTC) Library Dataset: What if we took "Use of Practical AI in Digital Libraries" seriously?

Title: From Images to Words: Efficient Cross-Modal Knowledge Distillation to Language Models from Black-box Teachers

Title: LLM2Vec-Gen: Generative Embeddings from Large Language Models

Title: Beyond the Illusion of Consensus: From Surface Heuristics to Knowledge-Grounded Evaluation in LLM-as-a-Judge

Title: Instruction set for the representation of graphs