2025-10-17

Title: From Explainability to Action: A Generative Operational Framework for Integrating XAI in Clinical Mental Health Screening

Title: A Linguistics-Aware LLM Watermarking via Syntactic Predictability

Title: Users as Annotators: LLM Preference Learning from Comparison Mode

Title: Informed Routing in LLMs: Smarter Token-Level Computation for Faster Inference

Title: ConDABench: Interactive Evaluation of Language Models for Data Analysis

Title: SIMBA UQ: Similarity-Based Aggregation for Uncertainty Quantification in Large Language Models

Title: Meronymic Ontology Extraction via Large Language Models

Title: ADMIT: Few-shot Knowledge Poisoning Attacks on RAG-based Fact Checking

Title: Serialized EHR make for good text representations

Title: DynaSpec: Context-aware Dynamic Speculative Sampling for Large-Vocabulary Language Models

Title: On-device System of Compositional Multi-tasking in Large Language Models

Title: Language steering in latent space to mitigate unintended code-switching

Title: Revisiting the UID Hypothesis in LLM Reasoning Traces

Title: EvoEdit: Evolving Null-space Alignment for Robust and Efficient Knowledge Editing

Title: ConsistencyAI: A Benchmark to Assess LLMs' Factual Consistency When Responding to Different Demographic Groups

Title: BenchPress: A Human-in-the-Loop Annotation System for Rapid Text-to-SQL Benchmark Curation

Title: Harnessing Consistency for Robust Test-Time LLM Ensemble

Title: Multimodal Retrieval-Augmented Generation with Large Language Models for Medical VQA

Title: ShishuLM: Lightweight Language Model with Hybrid Decoder-MLP Architecture and Paired Weight Sharing

Title: Ensembling Large Language Models to Characterize Affective Dynamics in Student-AI Tutor Dialogues

Title: Unlocking the Potential of Diffusion Language Models through Template Infilling

Title: What Layers When: Learning to Skip Compute in LLMs with Residual Gates

Title: TextBandit: Evaluating Probabilistic Reasoning in LLMs Through Language-Only Decision Tasks

Title: Catch Your Breath: Adaptive Computation for Self-Paced Sequence Production

Title: PAGE: Prompt Augmentation for text Generation Enhancement

Title: Too Open for Opinion? Embracing Open-Endedness in Large Language Models for Social Simulation

Title: Order from Chaos: Comparative Study of Ten Leading LLMs on Unstructured Data Categorization

Title: Reliable Fine-Grained Evaluation of Natural Language Math Proofs

Title: A Survey on Collaborating Small and Large Language Models for Performance, Cost-effectiveness, Cloud-edge Privacy, and Trustworthiness

Title: The Harder The Better: Maintaining Supervised Fine-tuning Generalization with Less but Harder Data

Title: Guarding the Guardrails: A Taxonomy-Driven Approach to Jailbreak Detection

Title: Attribution Quality in AI-Generated Content: Benchmarking Style Embeddings and LLM Judges

Title: Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences

Title: RAID: Refusal-Aware and Integrated Decoding for Jailbreaking LLMs

Title: Investigating Political and Demographic Associations in Large Language Models Through Moral Foundations Theory

Title: Schema for In-Context Learning

Title: LLM Prompt Duel Optimizer: Efficient Label-Free Prompt Optimization

Title: Interpreting the Latent Structure of Operator Precedence in Language Models

Title: Knowledge Reasoning Language Model: Unifying Knowledge and Language for Inductive Knowledge Graph Reasoning

Title: RAGCap-Bench: Benchmarking Capabilities of LLMs in Agentic Retrieval Augmented Generation Systems

Title: AI Debaters are More Persuasive when Arguing in Alignment with Their Own Beliefs

Title: Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms

Title: Readability $\ne$ Learnability: Rethinking the Role of Simplicity in Training Small Language Models

Title: Element2Vec: Build Chemical Element Representation from Text for Property Prediction

Title: Optimal Aggregation of LLM and PRM Signals for Efficient Test-Time Scaling

Title: FACTS: Table Summarization via Offline Template Generation with Agentic Workflows

Title: An LLM-Powered AI Agent Framework for Holistic IoT Traffic Interpretation

Title: BioMedSearch: A Multi-Source Biomedical Retrieval Framework Based on LLMs

Title: LLMs Can Get "Brain Rot"!

Title: Robust or Suggestible? Exploring Non-Clinical Induction in LLM Drug-Safety Decisions

Title: Big Reasoning with Small Models: Instruction Retrieval at Inference Time

Title: FinDeepResearch: Evaluating Deep Research Agents in Rigorous Financial Analysis

Title: Readers Prefer Outputs of AI Trained on Copyrighted Books over Expert Human Writers

Title: Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention

Title: Classifying and Addressing the Diversity of Errors in Retrieval-Augmented Generation Systems

Title: The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models

Title: CRaFT: An Explanation-Based Framework for Evaluating Cultural Reasoning in Multilingual Language Models

Title: Think Globally, Group Locally: Evaluating LLMs Using Multi-Lingual Word Grouping Games

Title: ERGO: Entropy-guided Resetting for Generation Optimization in Multi-turn Language Models

Title: Toward Cybersecurity-Expert Small Language Models

Title: RLSR: Reinforcement Learning with Supervised Reward Outperforms SFT in Instruction Following

Title: DPRF: A Generalizable Dynamic Persona Refinement Framework for Optimizing Behavior Alignment Between Personalized LLM Role-Playing Agents and Humans

Title: LiteStage: Latency-aware Layer Skipping for Multi-stage Reasoning

Title: Flip-Flop Consistency: Unsupervised Training for Robustness to Prompt Perturbations in LLMs

Title: MoM: Mixtures of Scenario-Aware Document Memories for Retrieval-Augmented Generation Systems

Title: Rewriting History: A Recipe for Interventional Analyses to Study Data Effects on Model Behavior

Title: Less is More: Denoising Knowledge Graphs For Retrieval Augmented Generation

Title: Qwen3Guard Technical Report

Title: PRISM: Agentic Retrieval with LLMs for Multi-Hop Question Answering

Title: Rethinking Schema Linking: A Context-Aware Bidirectional Retrieval Approach for Text-to-SQL

Title: Constraint-Driven Small Language Models Based on Agent and OpenAlex Knowledge Graph: Mining Conceptual Pathways and Discovering Innovation Points in Academic Papers

Title: MathMist: A Parallel Multilingual Benchmark Dataset for Mathematical Problem Solving and Reasoning

Title: MERLIN: A Testbed for Multilingual Multimodal Entity Recognition and Linking

Title: Evaluating & Reducing Deceptive Dialogue From Language Models with Multi-turn RL

Title: Beyond One World: Benchmarking Super Heros in Role-Playing Across Multiversal Contexts

Title: CURE: Confidence-driven Unified Reasoning Ensemble Framework for Medical Question Answering

Title: On the Ability of LLMs to Handle Character-Level Perturbations: How Well and How?

Title: From Binary to Bilingual: How the National Weather Service is Using Artificial Intelligence to Develop a Comprehensive Translation Program

Title: PluriHop: Exhaustive, Recall-Sensitive QA over Distractor-Rich Corpora

Title: Suicidal Comment Tree Dataset: Enhancing Risk Assessment and Prediction Through Contextual Analysis

Title: Your Next Token Prediction: A Multilingual Benchmark for Personalized Response Generation

Title: MedTrust-RAG: Evidence Verification and Trust Alignment for Biomedical Question Answering

Title: Instructions are all you need: Self-supervised Reinforcement Learning for Instruction Following

Title: Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents

Title: Natural Language Tools: A Natural Language Approach to Tool Calling In Large Language Agents

Title: LiRA: Linguistic Robust Anchoring for Cross-lingual Large Language Models

Title: Assessing Socio-Cultural Alignment and Technical Safety of Sovereign LLMs

Title: Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures

Title: Code-driven Number Sequence Calculation: Enhancing the inductive Reasoning Abilities of Large Language Models

Title: RLAIF-SPA: Optimizing LLM-based Emotional Speech Synthesis via RLAIF

Title: Intent Clustering with Shared Pseudo-Labels

Title: An Efficient Rubric-based Generative Verifier for Search-Augmented LLMs

Title: Speculative Model Risk in Healthcare AI: Using Storytelling to Surface Unintended Harms

Title: AutoRubric-R1V: Rubric-Based Generative Rewards for Faithful Multimodal Reasoning

Title: Pluto: A Benchmark for Evaluating Efficiency of LLM-generated Hardware Code

Title: COIG-Writer: A High-Quality Dataset for Chinese Creative Writing with Thought Processes

Title: Finding Answers in Thought Matters: Revisiting Evaluation on Large Language Models with Reasoning

Title: Supervised Fine-Tuning or Contrastive Learning? Towards Better Multimodal LLM Reranking

Title: Midtraining Bridges Pretraining and Posttraining Distributions

Title: Harmonizing Diverse Models: A Layer-wise Merging Strategy for Consistent Generation

Title: Predicting Task Performance with Context-aware Scaling Laws

Title: AI-Powered Early Diagnosis of Mental Health Disorders from Real-World Clinical Conversations

Title: LaSeR: Reinforcement Learning with Last-Token Self-Rewarding

Title: MetaBench: A Multi-task Benchmark for Assessing LLMs in Metabolomics

Title: DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation

Title: Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents

Title: LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training

Title: TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar

Title: Attention Is All You Need for KV Cache in Diffusion LLMs