2025-06-05

Title: Evaluating Large Language Models for Zero-Shot Disease Labeling in CT Radiology Reports Across Organ Systems

Title: A conclusive remark on linguistic theorizing and language modeling

Title: FailureSensorIQ: A Multi-Choice QA Dataset for Understanding Sensor Relationships and Failure Modes

Title: HyperSteer: Activation Steering at Scale with Hypernetworks

Title: Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem

Title: From Instructions to ODRL Usage Policies: An Ontology Guided Approach

Title: Hopscotch: Discovering and Skipping Redundancies in Language Models

Title: Ask a Local: Detecting Hallucinations With Specialized Model Divergence

Title: A Multimodal, Multilingual, and Multidimensional Pipeline for Fine-grained Crowdsourcing Earthquake Damage Evaluation

Title: Trajectory Prediction Meets Large Language Models: A Survey

Title: DistRAG: Towards Distance-Based Spatial Reasoning in LLMs

Title: Time Course MechInterp: Analyzing the Evolution of Components and Knowledge in Large Language Models

Title: Delta-KNN: Improving Demonstration Selection in In-Context Learning for Alzheimer's Disease Detection

Title: APT: Improving Specialist LLM Performance with Weakness Case Acquisition and Iterative Preference Training

Title: EpiCoDe: Boosting Model Performance Beyond Training with Extrapolation and Contrastive Decoding

Title: Beyond Memorization: A Rigorous Evaluation Framework for Medical Knowledge Editing

Title: Measuring Human Involvement in AI-Generated Text: A Case Study on Academic Writing

Title: Accurate Sublayer Pruning for Large Language Models by Exploiting Latency and Tunability Information

Title: TokAlign: Efficient Vocabulary Adaptation via Token Alignment

Title: Seed-Coder: Let the Code Model Curate Data for Itself

Title: Go-Browse: Training Web Agents with Structured Exploration

Title: Debate, Reflect, and Distill: Multi-Agent Feedback with Tree-Structured Preference Optimization for Efficient Language Model Enhancement

Title: BPO: Revisiting Preference Modeling in Direct Preference Optimization

Title: ConsistentChat: Building Skeleton-Guided Consistent Dialogues for Large Language Models from Scratch

Title: POSS: Position Specialist Generates Better Draft for Speculative Decoding

Title: MiMo-VL Technical Report

Title: FreePRM: Training Process Reward Models Without Ground Truth Process Labels

Title: Exchange of Perspective Prompting Enhances Reasoning in Large Language Models

Title: KG-BiLM: Knowledge Graph Embedding via Bidirectional Language Models

Title: Automatically Suggesting Diverse Example Sentences for L2 Japanese Learners Using Pre-Trained Language Models

Title: From Understanding to Generation: An Efficient Shortcut for Evaluating Language Models

Title: Auto prompt SQL: a resource-efficient architecture for text-to-SQL translation in constrained environments

Title: Learning to Insert [PAUSE] Tokens for Better Reasoning

Title: Do Large Language Models Know Folktales? A Case Study of Yokai in Japanese Folktales

Title: Robustness of Prompting: Enhancing Robustness of Large Language Models Against Prompting Attacks

Title: RewardAnything: Generalizable Principle-Following Reward Models

Title: Trustworthy Medical Question Answering: An Evaluation-Centric Survey

Title: Robust Preference Optimization via Dynamic Target Margins

Title: AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism

Title: ScoreRAG: A Retrieval-Augmented Generation Framework with Consistency-Relevance Scoring and Structured Summarization for News Generation

Title: Verbalized Confidence Triggers Self-Verification: Emergent Behavior Without Explicit Reasoning Supervision

Title: Act-as-Pet: Benchmarking the Abilities of Large Language Models as E-Pets in Social Network Services

Title: AhaKV: Adaptive Holistic Attention-Driven KV Cache Eviction for Efficient Inference of Large Language Models

Title: ClozeMath: Improving Mathematical Reasoning in Language Models by Learning to Fill Equations

Title: Unifying Uniform and Binary-coding Quantization for Accurate Compression of Large Language Models

Title: Knockout LLM Assessment: Using Large Language Models for Evaluations through Iterative Pairwise Comparisons

Title: Mark My Words: A Robust Multilingual Model for Punctuation in Text and Speech Transcripts

Title: PulseReddit: A Novel Reddit Dataset for Benchmarking MAS in High-Frequency Cryptocurrency Trading

Title: EuroGEST: Investigating gender stereotypes in multilingual language models

Title: RadialRouter: Structured Representation for Efficient and Robust Large Language Models Routing

Title: Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation

Title: Magic Mushroom: A Customizable Benchmark for Fine-grained Analysis of Retrieval Noise Erosion in RAG Systems

Title: HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models

Title: More or Less Wrong: A Benchmark for Directional Bias in LLM Comparative Reasoning

Title: TableEval: A Real-World Benchmark for Complex, Multilingual, and Multi-Structured Table Question Answering

Title: From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding

Title: Structured Pruning for Diverse Best-of-N Reasoning Optimization

Title: Around the World in 24 Hours: Probing LLM Knowledge of Time and Place

Title: Stronger Baselines for Retrieval-Augmented Generation with Long-Context Language Models

Title: DynTok: Dynamic Compression of Visual Tokens for Efficient and Effective Video Understanding

Title: Seeing What Tastes Good: Revisiting Multimodal Distributional Semantics in the Billion Parameter Era

Title: QQSUM: A Novel Task and Model of Quantitative Query-Focused Summarization for Review-based Product Question Answering

Title: AI Agents for Conversational Patient Triage: Preliminary Simulation-Based Evaluation with Real-World EHR Data

Title: LexTime: A Benchmark for Temporal Ordering of Legal Events

Title: Unveiling and Eliminating the Shortcut Learning for Locate-Then-Edit Knowledge Editing via Both Subject and Relation Awareness

Title: Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLMs for Countering Hate

Title: Lacuna Inc. at SemEval-2025 Task 4: LoRA-Enhanced Influence-Based Unlearning for LLMs

Title: On Support Samples of Next Word Prediction

Title: Explainability-Based Token Replacement on LLM-Generated Text

Title: High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning

Title: Progressive Mastery: Customized Curriculum Learning with Guided Prompting for Mathematical Reasoning

Title: LaF-GRPO: In-Situ Navigation Instruction Generation for the Visually Impaired via GRPO with LLM-as-Follower Reward

Title: Controlling Difficulty of Generated Text for AI-Assisted Language Learning

Title: A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions

Title: LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation

Title: EuroLLM-9B: Technical Report

Title: TextAtari: 100K Frames Game Playing with Language Agents

Title: Rectified Sparse Attention

Title: CLAIM: An Intent-Driven Multi-Agent Framework for Analyzing Manipulation in Courtroom Dialogues

Title: Are Lexicon-Based Tools Still the Gold Standard for Valence Analysis in Low-Resource Flemish?

Title: Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis

Title: A Dataset for Addressing Patient's Information Needs related to Clinical Course of Hospitalization

Title: SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling

Title: SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models

Title: Long or short CoT? Investigating Instance-level Switch of Large Reasoning Models

Title: R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning

Title: Efficient Knowledge Editing via Minimal Precomputation