2025-06-06

Title: GEM: Empowering LLM for both Embedding Generation and Language Understanding

Title: MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP

Title: Building a Few-Shot Cross-Domain Multilingual NLU Model for Customer Care

Title: MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale

Title: Unpacking Let Alone: Human-Scale Models Generalize to a Rare Construction in Form but not Meaning

Title: Zero-Shot Open-Schema Entity Structure Discovery

Title: Watermarking Degrades Alignment in Language Models: Analysis and Mitigation

Title: Aligning Large Language Models with Implicit Preferences from User-Generated Content

Title: SQLens: An End-to-End Framework for Error Detection and Correction in Text-to-SQL

Title: DRE: An Effective Dual-Refined Method for Integrating Small and Large Language Models in Open-Domain Dialogue Evaluation

Title: Please Translate Again: Two Simple Experiments on Whether Human-Like Reasoning Helps Translation

Title: Is It JUST Semantics? A Case Study of Discourse Particle Understanding in LLMs

Title: BSBench: will your LLM find the largest prime number?

Title: SSA-COMET: Do LLMs Outperform Learned Metrics in Evaluating MT for Under-Resourced African Languages?

Title: Demonstrations of Integrity Attacks in Multi-Agent Systems

Title: Reasoning or Overthinking: Evaluating Large Language Models on Financial Sentiment Analysis

Title: Are LLMs Reliable Translators of Logical Reasoning Across Lexically Diversified Contexts?

Title: Selecting Demonstrations for Many-Shot In-Context Learning via Gradient Matching

Title: SUCEA: Reasoning-Intensive Retrieval for Adversarial Fact-checking through Claim Decomposition and Editing

Title: MuSciClaims: Multimodal Scientific Claim Verification

Title: LESS: Large Language Model Enhanced Semi-Supervised Learning for Speech Foundational Models

Title: Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification

Title: A MISMATCHED Benchmark for Scientific Natural Language Inference

Title: Revisiting Test-Time Scaling: A Survey and a Diversity-Aware Method for Efficient Reasoning

Title: Subjective Perspectives within Learned Representations Predict High-Impact Innovation

Title: Advancing Tool-Augmented Large Language Models via Meta-Verification and Reflection Learning

Title: TaDA: Training-free recipe for Decoding with Adaptive KV Cache Compression and Mean-centering

Title: Flex-TravelPlanner: A Benchmark for Flexible Planning with Language Agents

Title: Normative Conflicts and Shallow AI Alignment

Title: MMRefine: Unveiling the Obstacles to Robust Refinement in Multimodal Large Language Models

Title: Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models

Title: Cracking the Code: Enhancing Implicit Hate Speech Detection through Coding Classification

Title: Accelerated Test-Time Scaling with Model-Free Speculative Sampling

Title: SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat

Title: Lifelong Evolution: Collaborative Learning between Large and Small Language Models for Continuous Emergent Fake News Detection

Title: Identifying Reliable Evaluation Metrics for Scientific Text Revision

Title: Fine-Grained Interpretation of Political Opinions in Large Language Models

Title: MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark

Title: Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques

Title: Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study

Title: Evaluating Vision-Language and Large Language Models for Automated Student Assessment in Indonesian Classrooms

Title: A Reasoning-Based Approach to Cryptic Crossword Clue Solving

Title: Joint Evaluation of Answer and Reasoning Consistency for Hallucination Detection in Large Reasoning Models

Title: Multiple-Choice Question Generation Using Large Language Models: Methodology and Educator Insights

Title: Prompting LLMs: Length Control for Isometric Machine Translation

Title: Evaluating the Effectiveness of Linguistic Knowledge in Pretrained Language Models: A Case Study of Universal Dependencies

Title: ICPC-Eval: Probing the Frontiers of LLM Reasoning with Competitive Programming Contests

Title: Verbose ListOps (VLO): Beyond Long Context -- Unmasking LLM's Reasoning Blind Spots

Title: Simulating LLM-to-LLM Tutoring for Multilingual Math Feedback

Title: ConECT Dataset: Overcoming Data Scarcity in Context-Aware E-Commerce MT

Title: From Struggle (06-2024) to Mastery (02-2025) LLMs Conquer Advanced Algorithm Exams and Pave the Way for Editorial Generation

Title: SCOP: Evaluating the Comprehension Process of Large Language Models from a Cognitive View

Title: ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development

Title: Controlling Summarization Length Through EOS Token Weighting

Title: Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers

Title: TALL -- A Trainable Architecture for Enhancing LLM Performance in Low-Resource Languages

Title: Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation

Title: Does It Make Sense to Speak of Introspection in Large Language Models?

Title: RIVAL: Reinforcement Learning with Iterative and Adversarial Optimization for Machine Translation

Title: Just a Scratch: Enhancing LLM Capabilities for Self-harm Detection through Intent Differentiation and Emoji Interpretation

Title: Parking, Perception, and Retail: Street-Level Determinants of Community Vitality in Harbin

Title: The NTNU System at the S&I Challenge 2025 SLA Open Track

Title: DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning

Title: Information Locality as an Inductive Bias for Neural Language Models

Title: AudioLens: A Closer Look at Auditory Attribute Perception of Large Audio-Language Models

Title: Do Large Language Models Judge Error Severity Like Humans?

Title: Knowledgeable-r1: Policy Optimization for Knowledge Exploration in Retrieval-Augmented Generation

Title: Dissecting Bias in LLMs: A Mechanistic Interpretability Perspective

Title: ECoRAG: Evidentiality-guided Compression for Long Context RAG

Title: Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Title: Counterfactual reasoning: an analysis of in-context emergence

Title: RELIC: Evaluating Compositional Instruction Following via Language Recognition

Title: The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

Title: Improving Low-Resource Morphological Inflection via Self-Supervised Objectives

Title: CLATTER: Comprehensive Entailment Reasoning for Hallucination Detection

Title: Micro-Act: Mitigate Knowledge Conflict in Question Answering via Actionable Self-Reasoning

Title: ProRefine: Inference-time Prompt Refinement with Textual Feedback

Title: Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models

Title: Search Arena: Analyzing Search-Augmented LLMs

Title: Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models