2025-09-30

Title: Are you sure? Measuring models bias in content moderation through uncertainty

Title: AccessEval: Benchmarking Disability Bias in Large Language Models

Title: RAR$^2$: Retrieval-Augmented Medical Reasoning via Thought-Driven Retrieval

Title: TRUEBench: Can LLM Response Meet Real-world Constraints as Productivity Assistant?

Title: Multi-Modal Sentiment Analysis with Dynamic Attention Fusion

Title: Enabling Approximate Joint Sampling in Diffusion LMs

Title: Painless Activation Steering: An Automated, Lightweight Approach for Post-Training Large Language Models

Title: MIRAGE: Multi-hop Reasoning with Ambiguity Evaluation for Illusory Questions

Title: ML2B: Multi-Lingual ML Benchmark For AutoML

Title: EditGRPO: Reinforcement Learning with Post-Rollout Edits for Clinically Accurate Chest X-Ray Report Generation

Title: Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning

Title: ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents

Title: Towards Generalizable Implicit In-Context Learning with Attention Routing

Title: The Bias is in the Details: An Assessment of Cognitive Bias in LLMs

Title: HEART: Emotionally-driven test-time scaling of Language Models

Title: Infusing Theory of Mind into Socially Intelligent LLM Agents

Title: Extract-0: A Specialized Language Model for Document Information Extraction

Title: Large language models management of medications: three performance analyses

Title: LLMs Behind the Scenes: Enabling Narrative Scene Illustration

Title: What Matters More For In-Context Learning under Matched Compute Budgets: Pretraining on Natural Text or Incorporating Targeted Synthetic Examples?

Title: Same Content, Different Representations: A Controlled Study for Table QA

Title: ADAM: A Diverse Archive of Mankind for Evaluating and Enhancing LLMs in Biographical Reasoning

Title: AI Brown and AI Koditex: LLM-Generated Corpora Comparable to Traditional Corpora of English and Czech Texts

Title: Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents

Title: Peacemaker or Troublemaker: How Sycophancy Shapes Multi-Agent Debate

Title: Semantic Voting: A Self-Evaluation-Free Approach for Efficient LLM Self-Improvement on Unverifiable Open-ended Tasks

Title: From Evidence to Trajectory: Abductive Reasoning Path Synthesis for Training Retrieval-Augmented Generation Agents

Title: The Geometry of Creative Variability: How Credal Sets Expose Calibration Gaps in Language Models

Title: d$^2$Cache: Accelerating Diffusion-Based LLMs via Dual Adaptive Caching

Title: How to Make Large Language Models Generate 100% Valid Molecules?

Title: Non-Collaborative User Simulators for Tool Agents

Title: Tagging the Thought: Unlocking Personalization Reasoning via Reinforcement Learning

Title: Tree Reward-Aligned Search for TReASURe in Masked Diffusion Language Models

Title: Test-Time Policy Adaptation for Enhanced Multi-Turn Interactions with LLMs

Title: Pretraining LLM with Latent Thoughts in Continuous Space

Title: Diagnose, Localize, Align: A Full-Stack Framework for Reliable LLM Multi-Agent Systems under Instruction Conflicts

Title: From Harm to Help: Turning Reasoning In-Context Demos into Assets for Reasoning LMs

Title: Steering Prepositional Phrases in Language Models: A Case of with-headed Adjectival and Adverbial Complements in Gemma-2

Title: PARL-MT: Learning to Call Functions in Multi-Turn Conversation with Progress Awareness

Title: A Structured Framework for Evaluating and Enhancing Interpretive Capabilities of Multimodal LLMs in Culturally Situated Tasks

Title: Detecting Corpus-Level Knowledge Inconsistencies in Wikipedia with Large Language Models

Title: A2D: Any-Order, Any-Step Safety Alignment for Diffusion Language Models

Title: Scaling Policy Compliance Assessment in Language Models with Policy Reasoning Traces

Title: Learning to Reason in Structured In-context Environments with Reinforcement Learning

Title: C-Evolve: Consensus-based Evolution for Prompt Groups

Title: Dual-Space Smoothness for Robust and Balanced LLM Unlearning

Title: MedCritical: Enhancing Medical Reasoning in Small Language Models via Self-Collaborative Correction

Title: Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization

Title: CCD: Mitigating Hallucinations in Radiology MLLMs via Clinical Contrastive Decoding

Title: Guard Vector: Beyond English LLM Guardrails with Task-Vector Composition and Streaming-Aware Prefix SFT

Title: Train Once, Answer All: Many Pretraining Experiments for the Cost of One

Title: No Loss, No Gain: Gated Refinement and Adaptive Compression for Prompt Optimization

Title: Liaozhai through the Looking-Glass: On Paratextual Explicitation of Culture-Bound Terms in Machine Translation

Title: Comparison of Scoring Rationales Between Large Language Models and Human Raters

Title: Retrieval-Constrained Decoding Reveals Underestimated Parametric Knowledge in Language Models

Title: Cognition-of-Thought Elicits Social-Aligned Reasoning in Large Language Models

Title: Text-Based Approaches to Item Difficulty Modeling in Large-Scale Assessments: A Systematic Review

Title: The Impact of Role Design in In-Context Learning for Large Language Models

Title: From Human Annotation to Automation: LLM-in-the-Loop Active Learning for Arabic Sentiment Analysis

Title: On the Shelf Life of Fine-Tuned LLM Judges: Future Proofing, Backward Compatibility, and Question Generalization

Title: Towards Efficient CoT Distillation: Self-Guided Rationale Selector for Better Performance with Fewer Rationales

Title: Jackal: A Real-World Execution-Based Benchmark Evaluating Large Language Models on Text-to-JQL Tasks

Title: LLM Hallucination Detection: HSAD

Title: Fast Thinking for Large Language Models

Title: Don't Settle Too Early: Self-Reflective Remasking for Diffusion Language Models

Title: Beyond English-Centric Training: How Reinforcement Learning Improves Cross-Lingual Reasoning in LLMs

Title: Aligning LLMs for Multilingual Consistency in Enterprise Applications

Title: TF-Bench: Evaluating Program Semantics Reasoning with Type Inference in System F

Title: VIVA+: Human-Centered Situational Decision-Making

Title: Do LLMs Understand Romanian Driving Laws? A Study on Multimodal and Fine-Tuned Question Answering

Title: Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning

Title: Understanding Textual Capability Degradation in Speech LLMs via Parameter Importance Analysis

Title: Knowledge-Level Consistency Reinforcement Learning: Dual-Fact Alignment for Long-Form Factuality

Title: From Personal to Collective: On the Role of Local and Global Memory in LLM Personalization

Title: Bridging the Knowledge-Prediction Gap in LLMs on Multiple-Choice Questions

Title: Transformer Tafsir at QIAS 2025 Shared Task: Hybrid Retrieval-Augmented Generation for Islamic Knowledge Question Answering

Title: Open-DeBias: Toward Mitigating Open-Set Bias in Language Models

Title: SPELL: Self-Play Reinforcement Learning for evolving Long-Context Language Models

Title: Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning

Title: DocPruner: A Storage-Efficient Framework for Multi-Vector Visual Document Retrieval via Adaptive Patch-Level Embedding Pruning

Title: Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Step

Title: Assessing Large Language Models in Updating Their Forecasts with New Information

Title: Easy Turn: Integrating Acoustic and Linguistic Modalities for Robust Turn-Taking in Full-Duplex Spoken Dialogue Systems

Title: Vision-Grounded Machine Interpreting: Improving the Translation Process through Visual Cues

Title: HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs

Title: ByteSized32Refactored: Towards an Extensible Interactive Text Games Corpus for LLM World Modeling and Evaluation

Title: Toward Preference-aligned Large Language Models via Residual-based Model Steering

Title: The Hidden Costs of Translation Accuracy: Distillation, Quantization, and Environmental Impact

Title: The AI Agent Code of Conduct: Automated Guardrail Policy-as-Prompt Synthesis

Title: MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use

Title: Sequential Diffusion Language Models

Title: SparseD: Sparse Attention for Diffusion Language Models

Title: Ensembling Multilingual Transformers for Robust Sentiment Analysis of Tweets

Title: Large-Scale Constraint Generation - Can LLMs Parse Hundreds of Constraints?

Title: GEAR: A General Evaluation Framework for Abductive Reasoning

Title: BTC-SAM: Leveraging LLMs for Generation of Bias Test Cases for Sentiment Analysis Models

Title: Pragmatic Inference for Moral Reasoning Acquisition: Generalization via Distributional Semantics

Title: Dual-Scale World Models for LLM Agents Towards Hard-Exploration Problems

Title: EduVidQA: Generating and Evaluating Long-form Answers to Student Questions based on Lecture Videos

Title: Beyond Magic Words: Sharpness-Aware Prompt Evolving for Robust Large Language Models with TARE

Title: Your thoughts tell who you are: Characterize the reasoning patterns of LRMs

Title: Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis

Title: Task Vectors, Learned Not Extracted: Performance Gains and Mechanistic Insight

Title: Retrieval-augmented GUI Agents with Generative Guidelines

Title: Beyond Overall Accuracy: A Psychometric Deep Dive into the Topic-Specific Medical Capabilities of 80 Large Language Models

Title: PET: Preference Evolution Tracking with LLM-Generated Explainable Distribution

Title: AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play

Title: Can Large Language Models Express Uncertainty Like Human?

Title: BeyondBench: Benchmark-Free Evaluation of Reasoning in Language Models

Title: ScenarioBench: Trace-Grounded Compliance Evaluation for Text-to-SQL and RAG

Title: MoVa: Towards Generalizable Classification of Human Morals and Values

Title: Model Fusion with Multi-LoRA Inference for Tool-Enhanced Game Dialogue Agents

Title: Prompt and Parameter Co-Optimization for Large Language Models

Title: MRAG-Suite: A Diagnostic Evaluation Platform for Visual Retrieval-Augmented Generation

Title: SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents

Title: Let LLMs Speak Embedding Languages: Generative Text Embeddings via Iterative Contrastive Refinement

Title: LOGOS: LLM-driven End-to-End Grounded Theory Development and Schema Induction for Qualitative Research

Title: DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models

Title: Q-Mirror: Unlocking the Multi-Modal Potential of Scientific Text-Only QA Pairs

Title: Dual Mechanisms of Value Expression: Intrinsic vs. Prompted Values in LLMs

Title: Multimodal Large Language Models Meet Multimodal Emotion Recognition and Reasoning: A Survey

Title: Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding

Title: AlignX: Advancing Multilingual Large Language Models with Multilingual Representation Alignment

Title: Beyond Repetition: Text Simplification and Curriculum Learning for Data-Constrained Pretraining

Title: Reinforcement Mid-Training

Title: HarmMetric Eval: Benchmarking Metrics and Judges for LLM Harmfulness Assessment

Title: LLaDA-MoE: A Sparse MoE Diffusion Language Model

Title: Agentar-Scale-SQL: Advancing Text-to-SQL through Orchestrated Test-Time Scaling

Title: Multilingual Text-to-SQL: Benchmarking the Limits of Language Models with Collaborative Language Agents

Title: CDT: A Comprehensive Capability Framework for Large Language Models Across Cognition, Domain, and Task

Title: Alternatives To Next Token Prediction In Text Generation - A Survey

Title: Bias Mitigation or Cultural Commonsense? Evaluating LLMs with a Japanese Dataset

Title: Sanitize Your Responses: Mitigating Privacy Leakage in Large Language Models

Title: GRPO-MA: Multi-Answer Generation in GRPO for Stable and Efficient Chain-of-Thought Training

Title: Knowledge Editing with Subspace-Aware Key-Value Mappings

Title: Building Benchmarks from the Ground Up: Community-Centered Evaluation of LLMs in Healthcare Chatbot Settings

Title: AdaThink-Med: Medical Adaptive Thinking with Uncertainty-Guided Length Calibration

Title: Inducing Dyslexia in Vision Language Models

Title: Hype or not? Formalizing Automatic Promotional Language Detection in Biomedical Research

Title: InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation

Title: Understanding the Dilemma of Unlearning for Large Language Models

Title: Reference-Free Rating of LLM Responses via Latent Information

Title: MemGen: Weaving Generative Latent Memory for Self-Evolving Agents

Title: Socratic-Zero: Bootstrapping Reasoning via Data-Free Agent Co-evolution

Title: ProxyAttn: Guided Sparse Attention via Representative Heads

Title: LatentEvolve: Self-Evolving Test-Time Scaling in Latent Space

Title: SeaPO: Strategic Error Amplification for Robust Preference Optimization of Large Language Models

Title: Evaluating Spatiotemporal Consistency in Automatically Generated Sewing Instructions

Title: KnowGuard: Knowledge-Driven Abstention for Multi-Round Clinical Reasoning

Title: SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching

Title: Hierarchical Error Correction for Large Language Models: A Systematic Framework for Domain-Specific AI Quality Enhancement

Title: Between Help and Harm: An Evaluation of Mental Health Crisis Handling by LLMs

Title: Metaphor identification using large language models: A comparison of RAG, prompt engineering, and fine-tuning

Title: Expanding Computation Spaces of LLMs at Inference Time

Title: BOE-XSUM: Extreme Summarization in Clear Language of Spanish Legal Decrees and Notifications

Title: How Well Do LLMs Imitate Human Writing Style?

Title: MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes

Title: The Dialogue That Heals: A Comprehensive Evaluation of Doctor Agents' Inquiry Capability

Title: SemanticShield: LLM-Powered Audits Expose Shilling Attacks in Recommender Systems

Title: Generalized Correctness Models: Learning Calibrated and Model-Agnostic Correctness Predictors from Historical Patterns

Title: Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct

Title: Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures

Title: Confidence-Guided Error Correction for Disordered Speech Recognition

Title: Scaling Generalist Data-Analytic Agents

Title: Towards Trustworthy Lexical Simplification: Exploring Safety and Efficiency with Small LLMs

Title: Towards Personalized Deep Research: Benchmarks and Evaluations

Title: Knowledge Extraction on Semi-Structured Content: Does It Remain Relevant for Question Answering in the Era of LLMs?

Title: Investigating Language and Retrieval Bias in Multilingual Previously Fact-Checked Claim Detection

Title: Paired by the Teacher: Turning Unpaired Data into High-Fidelity Pairs for Low-Resource Text Generation

Title: Pretraining Large Language Models with NVFP4

Title: EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering

Title: NAIPv2: Debiased Pairwise Learning for Efficient Paper Quality Estimation

Title: Incentive-Aligned Multi-Source LLM Summaries

Title: Learning to Parallel: Accelerating Diffusion Large Language Models via Adaptive Parallel Decoding

Title: InfoAgent: Advancing Autonomous Information-Seeking Agents