2026-02-24

Title: ReportLogic: Evaluating Logical Quality in Deep Research Reports

Title: ConfSpec: Efficient Step-Level Speculative Reasoning via Confidence-Gated Verification

Title: INSURE-Dial: A Phase-Aware Conversational Dataset & Benchmark for Compliance Verification and Phase Detection

Title: Prompt Optimization Via Diffusion Language Models

Title: Asymptotic Semantic Collapse in Hierarchical Optimization

Title: Luna-2: Scalable Single-Token Evaluation with Small Language Models

Title: DP-RFT: Learning to Generate Synthetic Text via Differentially Private Reinforcement Fine-Tuning

Title: PolyFrame at MWE-2026 AdMIRe 2: When Words Are Not Enough: Multimodal Idiom Disambiguation

Title: Contradiction to Consensus: Dual Perspective, Multi Source Retrieval Based Claim Verification with Source Level Disagreement using LLM

Title: ReHear: Iterative Pseudo-Label Refinement for Semi-Supervised Speech Recognition via Audio Large Language Models

Title: Rethinking Retrieval-Augmented Generation as a Cooperative Decision-Making Problem

Title: ArabicNumBench: Evaluating Arabic Number Reading in Large Language Models

Title: BURMESE-SAN: Burmese NLP Benchmark for Evaluating Large Language Models

Title: Think$^{2}$: Grounded Metacognitive Reasoning in Large Language Models

Title: EvalSense: A Framework for Domain-Specific LLM (Meta-)Evaluation

Title: DeepInnovator: Triggering the Innovative Capabilities of LLMs

Title: Why Agent Caching Fails and How to Fix It: Structured Intent Canonicalization with Few-Shot Learning

Title: Whisper: Courtside Edition Enhancing ASR Performance Through LLM-Driven Context Generation

Title: Capable but Unreliable: Canonical Path Deviation as a Causal Mechanism of Agent Failure in Long-Horizon Tasks

Title: Uncovering Context Reliance in Unstructured Knowledge Editing

Title: IAPO: Information-Aware Policy Optimization for Token-Efficient Reasoning

Title: Do LLMs and VLMs Share Neurons for Inference? Evidence and Mechanisms of Cross-Modal Transfer

Title: Value Entanglement: Conflation Between Different Kinds of Good In (Some) Large Language Models

Title: Astra: Activation-Space Tail-Eigenvector Low-Rank Adaptation of Large Language Models

Title: How Do LLMs Encode Scientific Quality? An Empirical Study Using Monosemantic Features from Sparse Autoencoders

Title: AgenticRAGTracer: A Hop-Aware Benchmark for Diagnosing Multi-Step Retrieval Reasoning in Agentic RAG

Title: A Dataset for Named Entity Recognition and Relation Extraction from Art-historical Image Descriptions

Title: Facet-Level Persona Control by Trait-Activated Routing with Contrastive SAE for Role-Playing LLMs

Title: Next Reply Prediction X Dataset: Linguistic Discrepancies in Naively Generated Content

Title: Retrieval Augmented Enhanced Dual Co-Attention Framework for Target Aware Multimodal Bengali Hateful Meme Detection

Title: Learning to Reason for Multi-Step Retrieval of Personal Context in Personalized Question Answering

Title: Anatomy of Agentic Memory: Taxonomy and Empirical Analysis of Evaluation and System Limitations

Title: PerSoMed: A Large-Scale Balanced Dataset for Persian Social Media Text Classification

Title: Personalized Prediction of Perceived Message Effectiveness Using Large Language Model Based Digital Twins

Title: Pyramid MoA: A Probabilistic Framework for Cost-Optimized Anytime Inference

Title: How to Train Your Deep Research Agent? Prompt, Reward, and Policy Optimization in Search-R1

Title: Beyond a Single Extractor: Re-thinking HTML-to-Text Extraction for LLM Pretraining

Title: Temporal-Aware Heterogeneous Graph Reasoning with Multi-View Fusion for Temporal Question Answering

Title: Anatomy of Unlearning: The Dual Impact of Fact Salience and Model Fine-Tuning

Title: KGHaluBench: A Knowledge Graph-Based Hallucination Benchmark for Evaluating the Breadth and Depth of LLM Knowledge

Title: SAMAS: A Spectrum-Guided Multi-Agent System for Achieving Style Fidelity in Literary Translation

Title: SHIELD: Semantic Heterogeneity Integrated Embedding for Latent Discovery in Clinical Trial Safety Signals

Title: Janus-Q: End-to-End Event-Driven Trading via Hierarchical-Gated Reward Modeling

Title: Assessing Risks of Large Language Models in Mental Health Support: A Framework for Automated Clinical AI Red Teaming

Title: Unlocking Multimodal Document Intelligence: From Current Triumphs to Future Frontiers of Visual Document Retrieval

Title: ReAttn: Improving Attention-based Re-ranking via Attention Re-weighting

Title: gencat: Generative computerized adaptive testing

Title: AgenticSum: An Agentic Inference-Time Framework for Faithful Clinical Text Summarization

Title: Position: General Alignment Has Hit a Ceiling; Edge Alignment Must Be Taken Seriously

Title: Entropy in Large Language Models

Title: Multilingual Large Language Models do not comprehend all natural languages to equal degrees

Title: How Retrieved Context Shapes Internal Representations in RAG

Title: BabyLM Turns 4: Call for Papers for the 2026 BabyLM Workshop

Title: NanoKnow: How to Know What Your Language Model Knows

Title: To Reason or Not to: Selective Chain-of-Thought in Medical Question Answering

Title: KNIGHT: Knowledge Graph-Driven Multiple-Choice Question Generation with Adaptive Hardness Calibration