2026-03-10

Title: Hierarchical Latent Structures in Data Generation Process Unify Mechanistic Phenomena across Scale

Title: Hierarchical Embedding Fusion for Retrieval-Augmented Code Generation

Title: A Coin Flip for Safety: LLM Judges Fail to Reliably Measure Adversarial Robustness

Title: Rethinking Personalization in Large Language Models at the Token Level

Title: "Dark Triad" Model Organisms of Misalignment: Narrow Fine-Tuning Mirrors Human Antisocial Behavior

Title: Validation of a Small Language Model for DSM-5 Substance Category Classification in Child Welfare Records

Title: MedInjection-FR: Exploring the Role of Native, Synthetic, and Translated Data in Biomedical Instruction Tuning

Title: Language Shapes Mental Health Evaluations in Large Language Models

Title: A Dynamic Self-Evolving Extraction System

Title: Reforming the Mechanism: Editing Reasoning Patterns in LLMs with Circuit Reshaping

Title: Deep Research, Shallow Evaluation: A Case Study in Meta-Evaluation for Long-Form QA Benchmarks

Title: Elenchus: Generating Knowledge Bases from Prover-Skeptic Dialogues

Title: A Systematic Investigation of Document Chunking Strategies and Embedding Sensitivity

Title: Can Safety Emerge from Weak Supervision? A Systematic Analysis of Small Language Models

Title: AutoChecklist: Composable Pipelines for Checklist Generation and Scoring with LLM-as-a-Judge

Title: Hit-RAG: Learning to Reason with Long Contexts via Preference Alignment

Title: Language-Aware Distillation for Multilingual Instruction-Following Speech LLMs with ASR-Only Supervision

Title: Enhancing Consistency of Werewolf AI through Dialogue Summarization and Persona Information

Title: Lying to Win: Assessing LLM Deception through Human-AI Games and Parallel-World Probing

Title: Taiwan Safety Benchmark and Breeze Guard: Toward Trustworthy AI for Taiwanese Mandarin

Title: How Much Noise Can BERT Handle? Insights from Multilingual Sentence Difficulty Detection

Title: RILEC: Detection and Generation of L1 Russian Interference Errors in English Learner Texts

Title: Position: LLMs Must Use Functor-Based and RAG-Driven Bias Mitigation for Fairness

Title: Domain-Specific Quality Estimation for Machine Translation in Low-Resource Scenarios

Title: Can Large Language Models Keep Up? Benchmarking Online Adaptation to Continual Knowledge Streams

Title: Few Tokens, Big Leverage: Preserving Safety Alignment by Constraining Safety Tokens during Fine-tuning

Title: The Dual-Stream Transformer: Channelized Architecture for Interpretable Language Modeling

Title: Cross-Modal Taxonomic Generalization in (Vision-) Language Models

Title: Skip to the Good Part: Representation Structure & Inference-Time Layer Skipping in Diffusion vs. Autoregressive LLMs

Title: TableMind++: An Uncertainty-Aware Programmatic Agent for Tool-Augmented Table Reasoning

Title: MAWARITH: A Dataset and Benchmark for Legal Inheritance Reasoning with LLMs

Title: StyleBench: Evaluating Speech Language Models on Conversational Speaking Style Control

Title: KohakuRAG: A simple RAG framework with hierarchical document indexing

Title: Whitening Reveals Cluster Commitment as the Geometric Separator of Hallucination Types

Title: QuadAI at SemEval-2026 Task 3: Ensemble Learning of Hybrid RoBERTa and LLMs for Dimensional Aspect-Based Sentiment Analysis

Title: Scaling Data Difficulty: Improving Coding Models via Reinforcement Learning on Fresh and Challenging Problems

Title: Dual-Metric Evaluation of Social Bias in Large Language Models: Evidence from an Underrepresented Nepali Cultural Context

Title: Benchmarking Large Language Models for Quebec Insurance: From Closed-Book to Retrieval-Augmented Generation

Title: AI Steerability 360: A Toolkit for Steering Large Language Models

Title: An Efficient and Effective Evaluator for Text2SQL Models on Unseen and Unlabeled Data

Title: What Do AI Agents Talk About? Emergent Communication Structure in the First AI-Only Social Network

Title: CCR-Bench: A Comprehensive Benchmark for Evaluating LLMs on Complex Constraints, Control Flows, and Real-World Cases

Title: BRIDGE: Benchmark for multi-hop Reasoning In long multimodal Documents with Grounded Evidence

Title: SmartThinker: Progressive Chain-of-Thought Length Calibration for Efficient Large Language Model Reasoning

Title: ConflictBench: Evaluating Human-AI Conflict via Interactive and Visually Grounded Environments

Title: DyLLM: Efficient Diffusion LLM Inference via Saliency-based Token Selection and Partial Attention

Title: High-Fidelity Pruning for Large Language Models

Title: Toward Robust LLM-Based Judges: Taxonomic Bias Evaluation and Debiasing Optimization

Title: EvoScientist: Towards Multi-Agent Evolving AI Scientists for End-to-End Scientific Discovery

Title: Gradually Excavating External Knowledge for Implicit Complex Question Answering

Title: Gender Bias in MT for a Genderless Language: New Benchmarks for Basque

Title: RexDrug: Reliable Multi-Drug Combination Extraction through Reasoning-Enhanced LLMs

Title: Is continuous CoT better suited for multi-lingual reasoning?

Title: TildeOpen LLM: Leveraging Curriculum Learning to Achieve Equitable Language Representation

Title: Sensitivity of LLMs' Explanations to the Training Randomness: Context, Class & Task Dependencies

Title: Not All Queries Need Deep Thought: CoFiCot for Adaptive Coarse-to-fine Stateful Refinement

Title: NCL-UoR at SemEval-2026 Task 5: Embedding-Based Methods, Fine-Tuning, and LLMs for Word Sense Plausibility Rating

Title: How Much Do LLMs Hallucinate in Document Q&A Scenarios? A 172-Billion-Token Study Across Temperatures, Context Lengths, and Hardware Platforms

Title: AdaCultureSafe: Adaptive Cultural Safety Grounded by Cultural Knowledge in Large Language Models

Title: Evaluating LLM-Based Grant Proposal Review via Structured Perturbations

Title: Using Multimodal and Language-Agnostic Sentence Embeddings for Abstractive Summarization

Title: LAMUS: A Large-Scale Corpus for Legal Argument Mining from U.S. Caselaw using LLMs

Title: SPD-RAG: Sub-Agent Per Document Retrieval-Augmented Generation

Title: Do Language Models Know Theo Has a Wife? Investigating the Proviso Problem

Title: Adaptive Loops and Memory in Transformers: Think Harder or Know More?

Title: COACH meets QUORUM: A Framework and Pipeline for Aligning User, Expert and Developer Perspectives in LLM-generated Health Counselling

Title: Revealing Behavioral Plasticity in Large Language Models: A Token-Conditional Perspective

Title: Aligning to Illusions: Choice Blindness in Human and AI Feedback

Title: One Model Is Enough: Native Retrieval Embeddings from LLM Agent Hidden States

Title: A Dataset for Probing Translationese Preferences in English-to-Swedish Translation

Title: Fanar-Sadiq: A Multi-Agent Architecture for Grounded Islamic QA