2025-10-10

Title: Haystack Engineering: Context Engineering for Heterogeneous and Agentic Long-Context Evaluation

Title: Lemma Dilemma: On Lemma Generation Without Domain- or Language-Specific Training Data

Title: LASER: An LLM-based ASR Scoring and Evaluation Rubric

Title: Populism Meets AI: Advancing Populism Research with LLMs

Title: MAPRO: Recasting Multi-Agent Prompt Optimization as Maximum a Posteriori Inference

Title: AsyncSpade: Efficient Test-Time Scaling with Asynchronous Sparse Decoding

Title: Can Lessons From Human Teams Be Applied to Multi-Agent Systems? The Role of Structure, Diversity, and Interaction Dynamics

Title: Can Speech LLMs Think while Listening?

Title: When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs

Title: OWL: Overcoming Window Length-Dependence in Speculative Decoding for Long-Context Inputs

Title: Deploying Tiny LVLM Judges for Real-World Evaluation of Chart Models: Lessons Learned and Best Practices

Title: IASC: Interactive Agentic System for ConLangs

Title: Vocabulary embeddings organize linguistic structure early in language model training

Title: Toward Reliable Clinical Coding with Language Models: Verification and Lightweight Adaptation

Title: Role-Conditioned Refusals: Evaluating Access Control Reasoning in Large Language Models

Title: Banking Done Right: Redefining Retail Banking with Language-Centric AI

Title: OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference

Title: Textual Entailment and Token Probability as Bias Evaluation Metrics

Title: Stress-Testing Model Specs Reveals Character Differences among Language Models

Title: Large Language Models Meet Virtual Cell: A Survey

Title: MemWeaver: A Hierarchical Memory from Textual Interactive Behaviors for Personalized Generation

Title: SUBQRAG: Sub-Question Driven Dynamic Graph RAG

Title: Multilingual Knowledge Graph Completion via Efficient Multilingual Knowledge Sharing

Title: ToolExpander: Extending the Frontiers of Tool-Using Reinforcement Learning to Weak LLMs

Title: OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment

Title: Parallel Test-Time Scaling for Latent Reasoning Models

Title: Test-Time Reasoners Are Strategic Multiple-Choice Test-Takers

Title: ToolLibGen: Scalable Automatic Tool Creation and Aggregation for LLM Reasoning

Title: Curing Miracle Steps in LLM Mathematical Reasoning with Rubric Rewards

Title: The Unintended Trade-off of AI Alignment: Balancing Hallucination Mitigation and Safety in LLMs

Title: Drift No More? Context Equilibria in Multi-Turn LLM Interactions

Title: RCPU: Rotation-Constrained Error Compensation for Structured Pruning of a Large Language Model

Title: LLM4Cell: A Survey of Large Language and Agentic Models for Single-Cell Biology

Title: HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation

Title: Dynamic Generation of Multi-LLM Agents Communication Topologies with Graph Diffusion Models

Title: AdaSwitch: Adaptive Switching Generation for Knowledge Distillation

Title: Ready to Translate, Not to Represent? Bias and Performance Gaps in Multilingual LLMs Across Language Families and Domains

Title: Do LLMs Really Need 10+ Thoughts for "Find the Time 1000 Days Later"? Towards Structural Understanding of LLM Overthinking

Title: CS3-Bench: Evaluating and Enhancing Speech-to-Speech LLMs for Mandarin-English Code-Switching

Title: Contrastive Weak-to-strong Generalization

Title: Metric Calculating Benchmark: Code-Verifiable Complicate Instruction Following Benchmark for Large Language Models

Title: ACE: Attribution-Controlled Knowledge Editing for Multi-hop Factual Recall

Title: Towards Human-Like Grading: A Unified LLM-Enhanced Framework for Subjective Question Evaluation

Title: STEPER: Step-wise Knowledge Distillation for Enhancing Reasoning Ability in Multi-Step Retrieval-Augmented Language Models

Title: Comprehensiveness Metrics for Automatic Evaluation of Factual Recall in Text Generation

Title: Vision-Enabled LLMs in Historical Lexicography: Digitising and Enriching Estonian-German Dictionaries from the 17th and 18th Centuries

Title: A$^2$Search: Ambiguity-Aware Question Answering with Reinforcement Learning

Title: LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?

Title: Active Confusion Expression in Large Language Models: Leveraging World Models toward Better Social Reasoning

Title: Leveraging Author-Specific Context for Scientific Figure Caption Generation: 3rd SciCap Challenge

Title: Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks

Title: ChatGPT as a Translation Engine: A Case Study on Japanese-English

Title: Climate Knowledge in Large Language Models

Title: A Survey of Process Reward Models: From Outcome Signals to Process Supervisions for Large Language Models

Title: Everything is Plausible: Investigating the Impact of LLM Rationales on Human Notions of Plausibility

Title: The Price of Thought: A Multilingual Analysis of Reasoning, Performance, and Cost of Negotiation in Large Language Models

Title: Lossless Vocabulary Reduction for Auto-Regressive Language Models

Title: Evaluating LLM-Generated Legal Explanations for Regulatory Compliance in Social Media Influencer Marketing

Title: Interpreting LLM-as-a-Judge Policies via Verifiable Global Explanations

Title: Mitigating Judgment Preference Bias in Large Language Models through Group-Based Polling

Title: AI Knowledge Assist: An Automated Approach for the Creation of Knowledge Bases for Conversational AI Agents

Title: DACIP-RC: Domain Adaptive Continual Instruction Pre-Training via Reading Comprehension on Business Conversations

Title: Beyond Over-Refusal: Scenario-Based Diagnostics and Post-Hoc Mitigation for Exaggerated Refusals in LLMs

Title: METRICALARGS: A Taxonomy for Studying Metrical Poetry with LLMs

Title: Training-Free Group Relative Policy Optimization

Title: Memory Retrieval and Consolidation in Large Language Models through Function Tokens

Title: LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions

Title: SenWave: A Fine-Grained Multi-Language Sentiment Analysis Dataset Sourced from COVID-19 Tweets

Title: The Alignment Waltz: Jointly Training Agents to Collaborate for Safety

Title: Contrastive Decoding for Synthetic Data Generation in Low-Resource Language Modeling

Title: Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window

Title: Neuron-Level Analysis of Cultural Understanding in Large Language Models

Title: AutoRed: A Free-form Adversarial Prompt Generation Framework for Automated Red Teaming

Title: Two-Stage Voting for Robust and Efficient Suicide Risk Detection on Social Media

Title: On the Relationship Between the Choice of Representation and In-Context Learning

Title: If Probable, Then Acceptable? Understanding Conditional Acceptability Judgments in Large Language Models

Title: Single layer tiny Co$^4$ outpaces GPT-2 and GPT-BERT

Title: DeepPrune: Parallel Scaling without Inter-trace Redundancy

Title: Neologism Learning for Controllability and Self-Verbalization

Title: Efficient Prompt Optimisation for Legal Text Classification with Proxy Prompt Evaluator

Title: Which Heads Matter for Reasoning? RL-Guided KV Cache Compression

Title: CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards

Title: ArenaBencher: Automatic Benchmark Evolution via Multi-Model Competitive Evaluation