2025-08-19

Title: Deep Language Geometry: Constructing a Metric Space from LLM Weights

Title: Can we Evaluate RAGs with Synthetic Data?

Title: Limitation Learning: Catching Adverse Dialog with GAIL

Title: Investigating Transcription Normalization in the Faetar ASR Benchmark

Title: A Multi-Task Evaluation of LLMs' Processing of Academic Text Input

Title: LLM-Guided Planning and Summary-Based Scientific Text Simplification: DS@GT at CLEF 2025 SimpleText

Title: Hallucination Detection and Mitigation in Scientific Text Simplification using Ensemble Approaches: DS@GT at CLEF 2025 SimpleText

Title: Every 28 Days the AI Dreams of Soft Skin and Burning Stars: Scaffolding AI Agents with Hormones and Emotions

Title: When Does Language Transfer Help? Sequential Fine-Tuning for Cross-Lingual Euphemism Detection

Title: SupraTok: Cross-Boundary Tokenization for Enhanced Language Model Performance

Title: In-Context Examples Matter: Improving Emotion Recognition in Conversation with Instruction Tuning

Title: CORE: Measuring Multi-Agent LLM Interaction Quality under Game-Theoretic Pressures

Title: LLMs Struggle with NLI for Perfect Aspect: A Cross-Linguistic Study in Chinese and Japanese

Title: CAMF: Collaborative Adversarial Multi-agent Framework for Machine Generated Text Detection

Title: Learning Wisdom from Errors: Promoting LLM's Continual Relation Learning through Exploiting Error Cases

Title: Mind the Generation Process: Fine-Grained Confidence Estimation During LLM Generation

Title: J6: Jacobian-Driven Role Attribution for Multi-Objective Prompt Optimization in LLMs

Title: STEM: Efficient Relative Capability Evaluation of LLMs through Structured Transition Samples

Title: LLM-as-a-Judge for Privacy Evaluation? Exploring the Alignment of Human and LLM Perceptions of Privacy in Textual Data

Title: Structuring the Unstructured: A Systematic Review of Text-to-Structure Generation for Agentic AI with a Universal Evaluation Framework

Title: Fast, Slow, and Tool-augmented Thinking for LLMs: A Review

Title: The Self-Execution Benchmark: Measuring LLMs' Attempts to Overcome Their Lack of Self-Execution

Title: LegalΔ: Enhancing Legal Reasoning in LLMs via Reinforcement Learning with Chain-of-Thought Guided Information Gain

Title: A Question Answering Dataset for Temporal-Sensitive Retrieval-Augmented Generation

Title: Consensus or Conflict? Fine-Grained Evaluation of Conflicting Answers in Question-Answering

Title: ReaLM: Reflection-Enhanced Autonomous Reasoning with Small Language Models

Title: MedKGent: A Large Language Model Agent Framework for Constructing Temporally Evolving Medical Knowledge Graph

Title: ZigzagAttention: Efficient Long-Context Inference with Exclusive Retrieval and Streaming Heads

Title: The Cultural Gene of Large Language Models: A Study on the Impact of Cross-Corpus Training on Model Values and Biases

Title: Uncovering Emergent Physics Representations Learned In-Context by Large Language Models

Title: M3PO: Multimodal-Model-Guided Preference Optimization for Visual Instruction Following

Title: LoraxBench: A Multitask, Multilingual Benchmark Suite for 20 Indonesian Languages

Title: Is GPT-OSS Good? A Comprehensive Evaluation of OpenAI's Latest Open Source Models

Title: The Structural Sources of Verb Meaning Revisited: Large Language Models Display Syntactic Bootstrapping

Title: Mitigating Hallucinations in Large Language Models via Causal Reasoning

Title: CorrSteer: Steering Improves Task Performance and Safety in LLMs through Correlation-based Sparse Autoencoder Feature Selection

Title: Beyond Modality Limitations: A Unified MLLM Approach to Automated Speaking Assessment with Effective Curriculum Learning

Title: Semantic Anchoring in Agentic Memory: Leveraging Linguistic Structures for Persistent Conversational Context

Title: Beyond GPT-5: Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing

Title: Prompt-Induced Linguistic Fingerprints for LLM-Generated Fake News Detection

Title: Breaking Language Barriers: Equitable Performance in Multilingual Language Models

Title: Leveraging Large Language Models for Predictive Analysis of Human Misery

Title: ToolACE-MT: Non-Autoregressive Generation for Agentic Multi-Turn Interaction

Title: DESIGNER: Design-Logic-Guided Multidisciplinary Data Synthesis for LLM Reasoning

Title: LinguaSafe: A Comprehensive Multilingual Safety Benchmark for Large Language Models

Title: CRED-SQL: Enhancing Real-world Large Scale Database Text-to-SQL Parsing through Cluster Retrieval and Execution Description

Title: From SALAMANDRA to SALAMANDRATA: BSC Submission for WMT25 General Machine Translation Shared Task

Title: HeteroRAG: A Heterogeneous Retrieval-Augmented Generation Framework for Medical Vision Language Tasks

Title: Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward

Title: When Alignment Hurts: Decoupling Representational Spaces in Multilingual Models

Title: Word Meanings in Transformer Language Models

Title: An LLM Agent-Based Complex Semantic Table Annotation Approach

Title: A Stitch in Time Saves Nine: Proactive Self-Refinement for Language Models

Title: Analyzing Information Sharing and Coordination in Multi-Agent Planning

Title: WebMall -- A Multi-Shop Benchmark for Evaluating Web Agents

Title: Can Large Models Teach Student Models to Solve Mathematical Problems Like Human Beings? A Reasoning Distillation Method via Multi-LoRA Interaction

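Title: TR-MMLU Benchmark for Large Language Models: Performance Evaluation, Challenges, and Opportunities for Improvement [original Turkish title: Büyük Dil Modelleri için TR-MMLU Benchmarkı: Performans Değerlendirmesi, Zorluklar ve İyileştirme Fırsatları]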

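Title: Tokenization Standards and Measurement in Natural Language Processing: A Comparative Analysis of Large Language Models through Turkish [original Turkish title: Doğal Dil İşlemede Tokenizasyon Standartları ve Ölçümü: Türkçe Üzerinden Büyük Dil Modellerinin Karşılaştırmalı Analizi]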

Title: Reinforced Context Order Recovery for Adaptive Reasoning and Planning

Title: DocHPLT: A Massively Multilingual Document-Level Translation Dataset

Title: All for law and law for all: Adaptive RAG Pipeline for Legal Research

Title: AutoBnB-RAG: Enhancing Multi-Agent Incident Response with Retrieval-Augmented Generation

Title: Spot the BlindSpots: Systematic Identification and Quantification of Fine-Grained LLM Biases in Contact Center Summaries

Title: Improving Detection of Watermarked Language Models

Title: OptimalThinkingBench: Evaluating Over and Underthinking in LLMs

Title: Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation

Title: RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns