2025-05-30

Title: Training Language Models to Generate Quality Code with Program Analysis Feedback

Title: Climate Finance Bench

Title: Pre-Training Curriculum for Multi-Token Prediction in Language Models

Title: StressTest: Can YOUR Speech LM Handle the Stress?

Title: Automated Essay Scoring Incorporating Annotations from Automated Feedback Systems

Title: MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Chatbots and Dialogue Evaluators

Title: Can Large Language Models Match the Conclusions of Systematic Reviews?

Title: First Steps Towards Overhearing LLM Agents: A Case Study With Dungeons & Dragons Gameplay

Title: Self-Critique and Refinement for Faithful Natural Language Explanations

Title: What Has Been Lost with Synthetic Evaluation?

Title: Bayesian Attention Mechanism: A Probabilistic Framework for Positional Encoding and Context Length Extrapolation

Title: GateNLP at SemEval-2025 Task 10: Hierarchical Three-Step Prompting for Multilingual Narrative Classification

Title: When Models Reason in Your Language: Controlling Thinking Trace Language Comes at the Cost of Accuracy

Title: VIGNETTE: Socially Grounded Bias Evaluation for Vision-Language Models

Title: Talent or Luck? Evaluating Attribution Bias in Large Language Models

Title: ER-REASON: A Benchmark Dataset for LLM-Based Clinical Reasoning in the Emergency Room

Title: Structured Memory Mechanisms for Stable Context Representation in Large Language Models

Title: Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging

Title: WorkForceAgent-R1: Incentivizing Reasoning Capability in LLM-based Web Agents via Reinforcement Learning

Title: Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates

Title: OWL: Probing Cross-Lingual Recall of Memorized Texts via World Literature

Title: NegVQA: Can Vision Language Models Understand Negation?

Title: StrucSum: Graph-Structured Reasoning for Long Document Extractive Summarization with LLMs

Title: LLMs for Argument Mining: Detection, Extraction, and Relationship Classification of pre-defined Arguments in Online Comments

Title: LLM-based HSE Compliance Assessment: Benchmark, Performance, and Advancements

Title: ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind

Title: Exploring Scaling Laws for EHR Foundation Models

Title: Verify-in-the-Graph: Entity Disambiguation Enhancement for Complex Claim Verification with Interactive Graph Representation

Title: DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors

Title: A Practical Approach for Building Production-Grade Conversational Agents with Workflow Graphs

Title: Detecting Stealthy Backdoor Samples based on Intra-class Distance for Large Language Models

Title: Context Robust Knowledge Editing for Language Models

Title: Machine-Facing English: Defining a Hybrid Register Shaped by Human-AI Discourse

Title: Improving Multilingual Social Media Insights: Aspect-based Comment Analysis

Title: EL4NER: Ensemble Learning for Named Entity Recognition via Multiple Small-Parameter Large Language Models

Title: Query Routing for Retrieval-Augmented Language Models

Title: Self-Correcting Code Generation Using Small Language Models

Title: SNS-Bench-VL: Benchmarking Multimodal Large Language Models in Social Networking Services

Title: Generating Diverse Training Samples for Relation Extraction with Large Language Models

Title: Dataset Cartography for Large Language Model Alignment: Mapping and Diagnosing Preference Data

Title: ContextQFormer: A New Context Modeling Method for Multi-Turn Multi-Modal Conversations

Title: PBEBench: A Multi-Step Programming by Examples Reasoning Benchmark inspired by Historical Linguistics

Title: Enhancing Large Language Models' Machine Translation via Dynamic Focus Anchoring

Title: Cross-Domain Bilingual Lexicon Induction via Pretrained Language Models

Title: Tell, Don't Show: Leveraging Language Models' Abstractive Retellings to Model Literary Themes

Title: Map&Make: Schema Guided Text to Table Generation

Title: Infinite-Instruct: Synthesizing Scaling Code Instruction Data with Bidirectional Synthesis and Static Verification

Title: Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement

Title: Cross-Task Experiential Learning on LLM-based Multi-Agent Collaboration

Title: ExpeTrans: LLMs Are Experiential Transfer Learners

Title: MMBoundary: Advancing MLLM Knowledge Boundary Awareness through Reasoning Step Confidence Calibration

Title: MCTSr-Zero: Self-Reflective Psychological Counseling Dialogues Generation via Principles and Adaptive Exploration

Title: ChartMind: A Comprehensive Benchmark for Complex Real-world Multimodal Chart Question Answering

Title: The Arabic AI Fingerprint: Stylometric Analysis and Detection of Large Language Models Text

Title: Sentinel: Attention Probing of Proxy Models for LLM Context Compression with an Understanding Perspective

Title: ScEdit: Script-based Assessment of Knowledge Editing

Title: How Does Response Length Affect Long-Form Factuality

Title: EmoBench-UA: A Benchmark Dataset for Emotion Detection in Ukrainian

Title: Data-efficient Meta-models for Evaluation of Context-based Questions and Answers in LLMs

Title: Generalized Category Discovery in Event-Centric Contexts: Latent Pattern Mining with LLMs

Title: Proximalized Preference Optimization for Diverse Feedback Types: A Decomposed Perspective on DPO

Title: Neither Stochastic Parroting nor AGI: LLMs Solve Tasks through Context-Directed Extrapolation from Training Data Priors

Title: Discriminative Policy Optimization for Token-Level Reward Models

Title: Threading the Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label Variation

Title: Adaptive Jailbreaking Strategies Based on the Semantic Understanding Capabilities of Large Language Models

Title: From Parameters to Prompts: Understanding and Mitigating the Factuality Gap between Fine-Tuned LLMs

Title: UAQFact: Evaluating Factual Knowledge Utilization of LLMs on Unanswerable Questions

Title: Evaluating the performance and fragility of large language models on the self-assessment for neurological surgeons

Title: Revisiting Overthinking in Long Chain-of-Thought from the Perspective of Self-Doubt

Title: Spoken Language Modeling with Duration-Penalized Self-Supervised Units

Title: Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking

Title: Probability-Consistent Preference Optimization for Enhanced LLM Reasoning

Title: Translation in the Wild

Title: Understanding Refusal in Language Models with Sparse Autoencoders

Title: Evaluating AI capabilities in detecting conspiracy theories on YouTube

Title: Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering

Title: Table-R1: Inference-Time Scaling for Table Reasoning

Title: Characterizing the Expressivity of Transformer Language Models

Title: AutoSchemaKG: Autonomous Knowledge Graph Construction through Dynamic Schema Induction from Web-Scale Corpora

Title: GeNRe: A French Gender-Neutral Rewriting System Using Collective Nouns

Title: Are Reasoning Models More Prone to Hallucination?

Title: ARC: Argument Representation and Coverage Analysis for Zero-Shot Long Document Summarization with Instruction Following LLMs

Title: Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation

Title: ToolHaystack: Stress-Testing Tool-Augmented Language Models in Realistic Long-Term Interactions

Title: LoLA: Low-Rank Linear Attention With Sparse Caching

Title: Child-Directed Language Does Not Consistently Boost Syntax Learning in Language Models

Title: Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation

Title: SocialMaze: A Benchmark for Evaluating Social Reasoning in Large Language Models

Title: Don't Take the Premise for Granted: Evaluating the Premise Critique Ability of Large Language Models

Title: Label-Guided In-Context Learning for Named Entity Recognition

Title: ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering

Title: Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time

Title: ATLAS: Learning to Optimally Memorize the Context at Test Time

Title: DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning

Title: Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint

Title: From Chat Logs to Collective Insights: Aggregative Question Answering