2025-08-25

Title: KG-o1: Enhancing Multi-hop Question Answering in Large Language Models via Knowledge Graph Integration

Title: InteChar: A Unified Oracle Bone Character List for Ancient Chinese Language Modeling

Title: Format as a Prior: Quantifying and Analyzing Bias in LLMs for Heterogeneous Data

Title: Do Language Models Agree with Human Perceptions of Suspense in Stories?

Title: Benchmarking the Legal Reasoning of LLMs in Arabic Islamic Inheritance Cases

Title: Benchmarking the Medical Understanding and Reasoning of Large Language Models in Arabic Healthcare Tasks

Title: Persuasiveness and Bias in LLM: Investigating the Impact of Persuasiveness and Reinforcement of Bias in Language Models

Title: A Framework for Processing Textual Descriptions of Business Processes using a Constrained Language -- Technical Report

Title: LingVarBench: Benchmarking LLM for Automated Named Entity Recognition in Structured Synthetic Spoken Transcriptions

Title: MAC: A Live Benchmark for Multimodal Large Language Models in Scientific Understanding

Title: ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks

Title: ALAS: Autonomous Learning Agent for Self-Updating Language Models

Title: SurfaceLogicKV: Surface and Logic Attention Behaviors are All You Need for Robust KV Cache Compression

Title: KL-based self-distillation for large language models

Title: Chain-of-Query: Unleashing the Power of LLMs in SQL-Aided Table Understanding via Multi-Agent Collaboration

Title: Detecting Hope, Hate, and Emotion in Arabic Textual Speech and Multi-modal Memes Using Large Language Models

Title: From Clicks to Preference: A Multi-stage Alignment Framework for Generative Query Suggestion in Conversational System

Title: SCOPE: A Generative Approach for LLM Prompt Compression

Title: User-Assistant Bias in LLMs

Title: Meet Your New Client: Writing Reports for AI -- Benchmarking Information Loss in Market Research Deliverables

Title: Research on intelligent generation of structural demolition suggestions based on multi-model collaboration

Title: An Auditable Pipeline for Fuzzy Full-Text Screening in Systematic Reviews: Integrating Contrastive Semantic Highlighting and LLM Judgment

Title: Enhancing Cryptocurrency Sentiment Analysis with Multimodal Features

Title: Mini-Omni-Reasoner: Token-Level Thinking-in-Speaking in Large Speech Models

Title: DAIQ: Auditing Demographic Attribute Inference from Question in LLMs

Title: Who's Asking? Investigating Bias Through the Lens of Disability Framed Queries in LLMs

Title: A Functionality-Grounded Benchmark for Evaluating Web Agents in E-commerce Domains

Title: Scalable Scientific Interest Profiling Using Large Language Models

Title: Alvorada-Bench: Can Language Models Solve Brazilian University Entrance Exams?

Title: A Review of Developmental Interpretability in Large Language Models

Title: Lexical Hints of Accuracy in LLM Reasoning Chains

Title: Coarse-to-Fine Personalized LLM Impressions for Streamlined Radiology Reports

Title: CyPortQA: Benchmarking Multimodal Large Language Models for Cyclone Preparedness in Port Operation

Title: Mechanistic Exploration of Backdoored Large Language Model Attention Patterns

Title: MedCoT-RAG: Causal Chain-of-Thought RAG for Medical Question Answering

Title: DocHop-QA: Towards Multi-Hop Reasoning over Multimodal Document Collections

Title: QU-NLP at QIAS 2025 Shared Task: A Two-Phase LLM Fine-Tuning and Retrieval-Augmented Generation Approach for Islamic Inheritance Reasoning

Title: Counterspeech for Mitigating the Influence of Media Bias: Comparing Human and LLM-Generated Responses

Title: XFinBench: Benchmarking LLMs in Complex Financial Problem Solving and Reasoning

Title: CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning

Title: NEAT: Concept driven Neuron Attribution in LLMs

Title: DeepMEL: A Multi-Agent Collaboration Framework for Multimodal Entity Linking

Title: Annif at the GermEval-2025 LLMs4Subjects Task: Traditional XMTC Augmented by Efficient LLMs

Title: Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search

Title: Evaluating Structured Decoding for Text-to-Table Generation: Evidence from Three Datasets

Title: Dancing with Deer: A Constructional Perspective on MWEs in the Era of LLMs

Title: Political Ideology Shifts in Large Language Models

Title: X-Troll: eXplainable Detection of State-Sponsored Information Operations Agents

Title: OpenWHO: A Document-Level Parallel Corpus for Health Translation in Low-Resource Languages

Title: Ethical Considerations of Large Language Models in Game Playing

Title: Less Redundancy: Boosting Practicality of Vision Language Model in Walking Assistants

Title: CEQuest: Benchmarking Large Language Models for Construction Estimation

Title: CYCLE-INSTRUCT: Fully Seed-Free Instruction Tuning via Dual Self-Training and Cycle Consistency

Title: From Indirect Object Identification to Syllogisms: Exploring Binary Mechanisms in Transformer Circuits

Title: Text Takes Over: A Study of Modality Bias in Multimodal Intent Detection

Title: XLQA: A Benchmark for Locale-Aware Multilingual Open-Domain Question Answering

Title: ParamBench: A Graduate-Level Benchmark for Evaluating LLM Understanding on Indic Subjects

Title: Seeing is Believing: Emotion-Aware Audio-Visual Language Modeling for Expressive Speech Generation

Title: CMR-SPB: Cross-Modal Multi-Hop Reasoning over Text, Image, and Speech with Path Balance

Title: TULIP: Adapting Open-Source Large Language Models for Underrepresented Languages and Specialized Financial Tasks

Title: MCPVerse: An Expansive, Real-World Benchmark for Agentic Tool Use

Title: M3TQA: Massively Multilingual Multitask Table Question Answering

Title: From Confidence to Collapse in LLM Factual Robustness

Title: LLMs that Understand Processes: Instruction-tuning for Semantics-Aware Process Mining

Title: LLMSymGuard: A Symbolic Safety Guardrail Framework Leveraging Interpretable Jailbreak Concepts

Title: MizanQA: Benchmarking Large Language Models on Moroccan Legal Question Answering

Title: The Mediomatix Corpus: Parallel Data for Romansh Idioms via Comparable Schoolbooks

Title: ChatGPT-generated texts show authorship traits that identify them as non-human

Title: RoMedQA: The First Benchmark for Romanian Medical Question Answering

Title: Cetvel: A Unified Benchmark for Evaluating Language Understanding, Generation and Cultural Capacity of LLMs for Turkish

Title: A Probabilistic Inference Scaling Theory for LLM Self-Correction

Title: LLM-as-classifier: Semi-Supervised, Iterative Framework for Hierarchical Text Classification using Large Language Models

Title: HAMSA: Hijacking Aligned Compact Models via Stealthy Automation