2025-06-03

Title: Amadeus-Verbo Technical Report: The powerful Qwen2.5 family models trained in Portuguese

Title: Scaling Physical Reasoning with the PHYSICS Dataset

Title: From Mathematical Reasoning to Code: Generalization of Process Reward Models in Test-Time Scaling

Title: Enhancing Tool Learning in Large Language Models with Hierarchical Error Checklists

Title: Unraveling SITT: Social Influence Technique Taxonomy and Detection with LLMs

Title: Mis-prompt: Benchmarking Large Language Models for Proactive Error Handling

Title: You Prefer This One, I Prefer Yours: Using Reference Words is Harder Than Vocabulary Words for Humans and Multimodal Language Models

Title: Probing Politico-Economic Bias in Multilingual Large Language Models: A Cultural Analysis of Low-Resource Pakistani Languages

Title: Evaluating the Sensitivity of LLMs to Prior Context

Title: Gaussian mixture models as a proxy for interacting language models

Title: COSMIC: Generalized Refusal Direction Identification in LLM Activations

Title: SwitchLingua: The First Large-Scale Multilingual and Multi-Ethnic Code-Switching Dataset

Title: HD-NDEs: Neural Differential Equations for Hallucination Detection in LLMs

Title: Writing-Zero: Bridge the Gap Between Non-verifiable Problems and Verifiable Rewards

Title: Spurious Correlations and Beyond: Understanding and Mitigating Shortcut Learning in SDOH Extraction with Large Language Models

Title: LaMP-QA: A Benchmark for Personalized Long-form Question Answering

Title: Werewolf: A Straightforward Game Framework with TTS for Improved User Engagement

Title: Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences

Title: Structuring Radiology Reports: Challenging LLMs with Lightweight Models

Title: Structure-Aware Fill-in-the-Middle Pretraining for Code

Title: REIC: RAG-Enhanced Intent Classification at Scale

Title: ComposeRAG: A Modular and Composable RAG for Corpus-Grounded Multi-Hop Question Answering

Title: MedOrch: Medical Diagnosis with Tool-Augmented Reasoning Agents for Flexible Extensibility

Title: PersianMedQA: Language-Centric Evaluation of LLMs in the Persian Medical Domain

Title: Aligned but Blind: Alignment Increases Implicit Bias by Reducing Awareness of Race

Title: The Impact of Disability Disclosure on Fairness and Bias in LLM-Driven Candidate Selection

Title: MultiHoax: A Dataset of Multi-hop False-Premise Questions

Title: CASPER: A Large Scale Spontaneous Speech Dataset

Title: Hierarchical Level-Wise News Article Clustering via Multilingual Matryoshka Embeddings

Title: Emergent Abilities of Large Language Models under Continued Pretraining for Language Adaptation

Title: DLM-One: Diffusion Language Models for One-Step Sequence Generation

Title: Can LLMs Understand Unvoiced Speech? Exploring EMG-to-Text Conversion with LLMs

Title: Lossless Token Sequence Compression via Meta-Tokens

Title: An evaluation of LLMs for generating movie reviews: GPT-4o, Gemini-2.0 and DeepSeek-V3

Title: SkillVerse: Assessing and Enhancing LLMs with Tree Evaluation

Title: TreeRare: Syntax Tree-Guided Retrieval and Reasoning for Knowledge-Intensive Question Answering

Title: Disentangling Codemixing in Chats: The NUS ABC Codemixed Corpus

Title: Beyond Context to Cognitive Appraisal: Emotion Reasoning as a Theory of Mind Benchmark for Large Language Models

Title: Efficient Latent Semantic Clustering for Scaling Test-Time Computation of LLMs

Title: Adaptive-VP: A Framework for LLM-Based Virtual Patients that Adapts to Trainees' Dialogue to Facilitate Nurse Communication Training

Title: SHARE: An SLM-based Hierarchical Action CorREction Assistant for Text-to-SQL

Title: Speculative Reward Model Boosts Decision Making Ability of LLMs Cost-Effectively

Title: Scaling Textual Gradients via Sampling-Based Momentum

Title: Accelerating Diffusion LLMs via Adaptive Parallel Decoding

Title: Dual Debiasing for Noisy In-Context Learning for Text Generation

Title: Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions

Title: Inter-Passage Verification for Multi-evidence Multi-answer QA

Title: G2S: A General-to-Specific Learning Framework for Temporal Knowledge Graph Forecasting with Large Language Models

Title: Fact-Controlled Diagnosis of Hallucinations in Medical Text Summarization

Title: Massively Multilingual Adaptation of Large Language Models Using Bilingual Translation Data

Title: EffiVLM-BENCH: A Comprehensive Benchmark for Evaluating Training-Free Acceleration in Large Vision-Language Models

Title: Auto-Patching: Enhancing Multi-Hop Reasoning in Language Models

Title: Synergizing LLMs with Global Label Propagation for Multimodal Fake News Detection

Title: Exploring In-context Example Generation for Machine Translation

Title: Goal-Aware Identification and Rectification of Misinformation in Multi-Agent Systems

Title: Evaluating the Evaluation of Diversity in Commonsense Generation

Title: CausalAbstain: Enhancing Multilingual LLMs with Causal Reasoning for Trustworthy Abstention

Title: Retrieval-Augmented Generation Systems for Intellectual Property via Synthetic Multi-Angle Fine-tuning

Title: Decoupling Reasoning and Knowledge Injection for In-Context Knowledge Editing

Title: ARIA: Training Language Agents with Intention-Driven Reward Aggregation

Title: Towards Multi-dimensional Evaluation of LLM Summarization across Domains and Languages

Title: AnnaAgent: Dynamic Evolution Agent System with Multi-Session Memory for Realistic Seeker Simulation

Title: The Hidden Language of Harm: Examining the Role of Emojis in Harmful Online Communication and Content Moderation

Title: PAKTON: A Multi-Agent Framework for Question Answering in Long Legal Agreements

Title: Enhancing Clinical Multiple-Choice Questions Benchmarks with Knowledge Graph Guided Distractor Generation

Title: Social Construction of Urban Space: Understanding Neighborhood Boundaries Using Rental Listings

Title: Improving the Calibration of Confidence Scores in Text Generation Using the Output Distribution's Characteristics

Title: SATA-BENCH: Select All That Apply Benchmark for Multiple Choice Questions

Title: GuideX: Guided Synthetic Data Generation for Zero-Shot Information Extraction

Title: Sarc7: Evaluating Sarcasm Detection and Generation with Seven Types and Emotion-Informed Techniques

Title: SafeTy Reasoning Elicitation Alignment for Multi-Turn Dialogues

Title: DeepRAG: Integrating Hierarchical Reasoning and Process Supervision for Biomedical Multi-Hop QA

Title: Measuring Faithfulness and Abstention: An Automated Pipeline for Evaluating LLM-Generated 3-ply Case-Based Legal Arguments

Title: Chain-of-Thought Training for Open E2E Spoken Dialogue Systems

Title: Structured Gradient Guidance for Few-Shot Adaptation in Large Language Models

Title: Narrative Media Framing in Political Discourse

Title: DefenderBench: A Toolkit for Evaluating Language Agents in Cybersecurity Environments

Title: Data Swarms: Optimizable Generation of Synthetic Evaluation Data

Title: Assortment of Attention Heads: Accelerating Federated PEFT with Head Pruning and Strategic Client Selection

Title: Translate With Care: Addressing Gender Bias, Neutrality, and Reasoning in Large Language Model Translations

Title: Understanding and Mitigating Cross-lingual Privacy Leakage via Language-specific and Universal Privacy Neurons

Title: Dynamic Chunking and Selection for Reading Comprehension of Ultra-Long Context in Large Language Models

Title: Improving Automatic Evaluation of Large Language Models (LLMs) in Biomedical Relation Extraction via LLMs-as-the-Judge

Title: KG-TRACES: Enhancing Large Language Models with Knowledge Graph-constrained Trajectory Reasoning and Attribution Supervision

Title: Research Borderlands: Analysing Writing Across Research Cultures

Title: RARE: Retrieval-Aware Robustness Evaluation for Retrieval-Augmented Generation Systems

Title: Fast or Slow? Integrating Fast Intuition and Deliberate Thinking for Enhancing Visual Question Answering

Title: GuessBench: Sensemaking Multimodal Creativity in the Wild

Title: From Plain Text to Poetic Form: Generating Metrically-Constrained Sanskrit Verses

Title: One for All: Update Parameterized Knowledge Across Multiple Models

Title: Probing the Geometry of Truth: Consistency and Generalization of Truth Directions in LLMs Across Logical Transformations and Question Answering Tasks

Title: HERGC: Heterogeneous Experts Representation and Generative Completion for Multimodal Knowledge Graphs

Title: COMPKE: Complex Question Answering under Knowledge Editing

Title: Toward Structured Knowledge Reasoning: Contrastive Retrieval-Augmented Generation on Experience

Title: EEG2TEXT-CN: An Exploratory Study of Open-Vocabulary Chinese Text-EEG Alignment via Large Language Model and Contrastive Learning on ChineseEEG

Title: How Bidirectionality Helps Language Models Learn Better via Dynamic Bottleneck Estimation

Title: L3Cube-MahaEmotions: A Marathi Emotion Recognition Dataset with Synthetic Annotations using CoTR prompting and Large Language Models

Title: What's Missing in Vision-Language Models? Probing Their Struggles with Causal Order Reasoning

Title: CC-Tuning: A Cross-Lingual Connection Mechanism for Improving Joint Multilingual Supervised Fine-Tuning

Title: Not Every Token Needs Forgetting: Selective Unlearning to Limit Change in Utility in Large Language Model Unlearning

Title: Improve MLLM Benchmark Efficiency through Interview

Title: Affordance Benchmark for MLLMs

Title: SocialEval: Evaluating Social Intelligence of Large Language Models

Title: Pi-SQL: Enhancing Text-to-SQL with Fine-Grained Guidance from Pivot Programming Languages

Title: How do Transformer Embeddings Represent Compositions? A Functional Analysis

Title: anyECG-chat: A Generalist ECG-MLLM for Flexible ECG Input and Multi-Task Understanding

Title: Leveraging Large Language Models for Sarcastic Speech Annotation in Sarcasm Detection

Title: From Objectives to Questions: A Planning-based Framework for Educational Mathematical Question Generation

Title: ACCESS DENIED INC: The First Benchmark Environment for Sensitivity Awareness

Title: XGUARD: A Graded Benchmark for Evaluating Safety Failures of Large Language Models on Extremist Content

Title: NTPP: Generative Speech Language Modeling for Dual-Channel Spoken Dialogue via Next-Token-Pair Prediction

Title: LEMONADE: A Large Multilingual Expert-Annotated Abstractive Event Dataset for the Real World

Title: Do LLMs Understand Why We Write Diaries? A Method for Purpose Extraction and Clustering

Title: Talking to Data: Designing Smart Assistants for Humanities Databases

Title: Less is More: Local Intrinsic Dimensions of Contextual Language Models

Title: Probing Neural Topology of Large Language Models

Title: CHEER-Ekman: Fine-grained Embodied Emotion Classification

Title: SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models

Title: How Programming Concepts and Neurons Are Shared in Code Language Models

Title: zip2zip: Inference-Time Adaptive Vocabularies for Language Models via Token Compression

Title: Un-considering Contextual Information: Assessing LLMs' Understanding of Indexical Elements

Title: Contextual Candor: Enhancing LLM Trustworthiness Through Hierarchical Unanswerability Detection

Title: From Words to Waves: Analyzing Concept Formation in Speech and Text-Based Foundation Models

Title: A Word is Worth 4-bit: Efficient Log Parsing with Binary Coded Decimal Recognition

Title: The Inverse Scaling Effect of Pre-Trained Language Model Surprisal Is Not Due to Data Leakage

Title: LAQuer: Localized Attribution Queries in Content-grounded Generation

Title: Culturally-Grounded Chain-of-Thought (CG-CoT): Enhancing LLM Performance on Culturally-Specific Tasks in Low-Resource Languages

Title: CoBRA: Quantifying Strategic Language Use and LLM Pragmatics

Title: Incorporating Hierarchical Semantics in Sparse Autoencoder Architectures

Title: Trick or Neat: Adversarial Ambiguity and Language Model Evaluation

Title: Mamba Drafters for Speculative Decoding

Title: Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers

Title: Polishing Every Facet of the GEM: Testing Linguistic Competence of LLMs and Humans in Korean

Title: ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists

Title: MTCMB: A Multi-Task Benchmark Framework for Evaluating LLMs on Knowledge, Reasoning, and Safety in Traditional Chinese Medicine

Title: CoRE: Condition-based Reasoning for Identifying Outcome Variance in Complex Events

Title: DeepSeek in Healthcare: A Survey of Capabilities, Risks, and Clinical Applications of Open-Source Large Language Models

Title: Exploring the Potential of LLMs as Personalized Assistants: Dataset, Evaluation, and Analysis

Title: Beyond In-Context Learning: Aligning Long-form Generation of Large Language Models via Task-Inherent Attribute Guidelines

Title: Detoxification of Large Language Models through Output-layer Fusion with a Calibration Model

Title: Schema as Parameterized Tools for Universal Information Extraction

Title: VM14K: First Vietnamese Medical Benchmark

Title: A Platform for Investigating Public Health Content with Efficient Concern Classification

Title: Growing Through Experience: Scaling Episodic Grounding in Language Models

Title: Evaluating Large Language Models in Crisis Detection: A Real-World Benchmark from Psychological Support Hotlines

Title: Enhancing Interpretable Image Classification Through LLM Agents and Conditional Concept Bottleneck Models

Title: The Landscape of Arabic Large Language Models (ALLMs): A New Era for Arabic Language Technology

Title: TurnBench-MS: A Benchmark for Evaluating Multi-Turn, Multi-Step Reasoning in Large Language Models

Title: Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents

Title: The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning

Title: KokoroChat: A Japanese Psychological Counseling Dialogue Dataset Collected via Role-Playing by Trained Counselors

Title: MMD-Flagger: Leveraging Maximum Mean Discrepancy to Detect Hallucinations

Title: AdaRewriter: Unleashing the Power of Prompting-based Conversational Query Reformulation via Test-Time Adaptation

Title: Comparing LLM-generated and human-authored news text using formal syntactic theory

Title: UniversalCEFR: Enabling Open Multilingual Research on Language Proficiency Assessment

Title: Self-Refining Language Model Anonymizers via Adversarial Distillation

Title: Redundancy, Isotropy, and Intrinsic Dimensionality of Prompt-based Text Embeddings

Title: TalTech Systems for the Interspeech 2025 ML-SUPERB 2.0 Challenge

Title: Integrating Neural and Symbolic Components in a Model of Pragmatic Question-Answering

Title: LLM in the Loop: Creating the PARADEHATE Dataset for Hate Speech Detoxification

Title: Multilingual Definition Modeling

Title: CVC: A Large-Scale Chinese Value Rule Corpus for Value Alignment of Large Language Models

Title: Representations of Fact, Fiction and Forecast in Large Language Models: Epistemics and Attitudes

Title: FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling Agents

Title: V-VAE: A Variational Auto Encoding Framework Towards Fine-Grained Control over Human-Like Chat

Title: STORM-BORN: A Challenging Mathematical Derivations Dataset Curated via a Human-in-the-Loop Multi-Agent Framework

Title: Dictionaries to the Rescue: Cross-Lingual Vocabulary Transfer for Low-Resource Languages Using Bilingual Dictionaries

Title: Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation

Title: Prompt Engineering Large Language Models' Forecasting Capabilities

Title: Unified Large Language Models for Misinformation Detection in Low-Resource Linguistic Settings

Title: Statement-Tuning Enables Efficient Cross-lingual Generalization in Encoder-only Models

Title: IndicRAGSuite: Large-Scale Datasets and a Benchmark for Indian Language RAG Systems

Title: Domain Lexical Knowledge-based Word Embedding Learning for Text Classification under Small Data

Title: Cross-Lingual Generalization and Compression: From Language-Specific to Shared Neurons

Title: ESGenius: Benchmarking LLMs on Environmental, Social, and Governance (ESG) and Sustainability Knowledge

Title: Cross-Lingual Transfer of Cultural Knowledge: An Asymmetric Phenomenon

Title: StochasTok: Improving Fine-Grained Subword Understanding in LLMs

Title: When LLMs Team Up: The Emergence of Collaborative Affective Computing

Title: mdok of KInIT: Robustly Fine-tuned LLM for Binary and Multiclass AI-Generated Text Detection

Title: Fairness Dynamics During Training

Title: Reasoning-Table: Exploring Reinforcement Learning for Table Reasoning

Title: SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning

Title: Tug-of-war between idiom's figurative and literal meanings in LLMs

Title: Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training

Title: Benford's Curse: Tracing Digit Bias to Numerical Hallucination in LLMs

Title: Thinking in Character: Advancing Role-Playing Agents with Role-Aware Reasoning

Title: MaXIFE: Multilingual and Cross-lingual Instruction Following Evaluation

Title: iQUEST: An Iterative Question-Guided Framework for Knowledge Base Question Answering

Title: Human-Centric Evaluation for Foundation Models

Title: Read it in Two Steps: Translating Extremely Low-Resource Languages with Code-Augmented Grammar Books

Title: NAVER LABS Europe Submission to the Instruction-following Track

Title: Analysis of LLM Bias (Chinese Propaganda & Anti-US Sentiment) in DeepSeek-R1 vs. ChatGPT o3-mini-high

Title: BD at BEA 2025 Shared Task: MPNet Ensembles for Pedagogical Mistake Identification and Localization in AI Tutor Responses

Title: Not All Jokes Land: Evaluating Large Language Models Understanding of Workplace Humor

Title: Minimal Pair-Based Evaluation of Code-Switching

Title: CONFETTI: Conversational Function-Calling Evaluation Through Turn-Level Interactions

Title: Is Extending Modality The Right Path Towards Omni-Modality?

Title: Spatial Coordinates as a Cell Language: A Multi-Sentence Framework for Imaging Mass Cytometry Analysis

Title: From Guidelines to Practice: A New Paradigm for Arabic Language Model Evaluation

Title: Esoteric Language Models

Title: RewardBench 2: Advancing Reward Model Evaluation

Title: Novel Benchmark for NER in the Wastewater and Stormwater Domain

Title: Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Title: Self-ensemble: Mitigating Confidence Distortion for Large Language Models

Title: WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks

Title: DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation