2025-10-03

Title: Uncovering Implicit Bias in Large Language Models with Concept Learning Dataset

Title: Towards Open-Ended Discovery for Low-Resource NLP

Title: Discourse vs emissions: Analysis of corporate narratives, symbolic practices, and mimicry through LLMs

Title: Context Matters: Comparison of commercial large language tools in veterinary medicine

Title: ClaimCheck: Real-Time Fact-Checking with Small Language Models

Title: EEFSUVA: A New Mathematical Olympiad Benchmark

Title: Who is In Charge? Dissecting Role Conflicts in Instruction Following

Title: Enhancing Transformer-Based Rerankers with Synthetic Data and LLM-Based Supervision

Title: Trustworthy Summarization via Uncertainty Quantification and Risk Awareness in Large Language Models

Title: Benchmark Profiling: Mechanistic Diagnosis of LLM Benchmarks

Title: LLMRank: Understanding LLM Strengths for Model Routing

Title: GRPO++: Enhancing Dermatological Reasoning under Low Resource Settings

Title: Confidence-Aware Routing for Large Language Model Reliability Enhancement: A Multi-Signal Approach to Pre-Generation Hallucination Mitigation

Title: Silent Tokens, Loud Effects: Padding in LLMs

Title: CIFLEX: Contextual Instruction Flow for Sub-task Execution in Multi-Turn Interactions with a Single On-Device LLM

Title: SKYLENAGE Technical Report: Mathematical Reasoning and Contest-Innovation Benchmarks for Multi-Level Math Evaluation

Title: Redundancy-as-Masking: Formalizing the Artificial Age Score (AAS) to Model Memory Aging in Generative AI

Title: Detoxifying Large Language Models via Autoregressive Reward Guided Representation Editing

Title: Feasibility of Structuring Stress Documentation Using an Ontology-Guided Large Language Model

Title: SeMob: Semantic Synthesis for Dynamic Urban Mobility Prediction

Title: A Comparative Analysis of Sparse Autoencoder and Activation Difference in Language Model Steering

Title: Let's Play Across Cultures: A Large Multilingual, Multicultural Benchmark for Assessing Language Models' Understanding of Sports

Title: SSTAG: Structure-Aware Self-Supervised Learning Method for Text-Attributed Graphs

Title: LOCA: Logical Chain Augmentation for Scientific Corpus Cleaning

Title: GemDetox at TextDetox CLEF 2025: Enhancing a Massively Multilingual Model for Text Detoxification on Low-resource Languages

Title: Efficient Uncertainty Estimation for LLM-based Entity Linking in Tabular Data

Title: GPT and Prejudice: A Sparse Approach to Understanding Learned Representations in Large Language Models

Title: Do Bias Benchmarks Generalise? Evidence from Voice-based Evaluation of Gender Bias in SpeechLLMs

Title: Longitudinal Monitoring of LLM Content Moderation of Social Issues

Title: RJE: A Retrieval-Judgment-Exploration Framework for Efficient Knowledge Graph Question Answering with LLMs

Title: Measuring Algorithmic Partisanship via Zero-Shot Classification and Its Implications on Political Discourse

Title: In AI Sweet Harmony: Sociopragmatic Guardrail Bypasses and Evaluation-Awareness in OpenAI gpt-oss-20b

Title: OpenAI's GPT-OSS-20B Model and Safety Alignment Issues in a Low-Resource Language

Title: AdaDetectGPT: Adaptive Detection of LLM-Generated Text with Statistical Guarantees

Title: Think Twice, Generate Once: Safeguarding by Progressive Self-Reflection

Title: TraceDet: Hallucination Detection from the Decoding Trace of Diffusion Large Language Models

Title: LLM Based Sentiment Classification From Bangladesh E-Commerce Reviews

Title: TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture

Title: Evaluation Sheet for Deep Research: A Use Case for Academic Survey Writing

Title: HiSpec: Hierarchical Speculative Decoding for LLMs

Title: TAG-EQA: Text-And-Graph for Event Question Answering via Structured Prompting Strategies

Title: A-VERT: Agnostic Verification with Embedding Ranking Targets

Title: One More Question is Enough, Expert Question Decomposition (EQD) Model for Domain Quantitative Reasoning

Title: ReSSFormer: A Recursive Sparse Structured Transformer for Scalable and Long-Context Reasoning

Title: CLUE: Non-parametric Verification from Experience via Hidden-State Clustering

Title: A Comparison of Independent and Joint Fine-tuning Strategies for Retrieval-Augmented Generation

Title: RAG-BioQA Retrieval-Augmented Generation for Long-Form Biomedical Question Answering

Title: Efficient Training of Robust Traditional Chinese LLaMA-1B on a Single Consumer GPU: Continual Pre-training, SFT, and DPO

Title: AMAS: Adaptively Determining Communication Topology for LLM-based Multi-Agent System

Title: NLP Methods for Detecting Novel LLM Jailbreaks and Keyword Analysis with BERT

Title: Learning to Look at the Other Side: A Semantic Probing Study of Word Embeddings in LLMs with Enabled Bidirectional Attention

Title: SoK: Measuring What Matters for Closed-Loop Security Agents

Title: MDSEval: A Meta-Evaluation Benchmark for Multimodal Dialogue Summarization

Title: FOR-Prompting: From Objection to Revision via an Asymmetric Prompting Protocol

Title: How Do Language Models Compose Functions?

Title: Format Inertia: A Failure Mechanism of LLMs in Medical Pre-Consultation

Title: What MLLMs Learn about When they Learn about Multimodal Reasoning: Perception, Reasoning, or their Integration?

Title: Can LLMs Refuse Questions They Do Not Know? Measuring Knowledge-Aware Refusal in Factual Tasks

Title: Comparison of Unsupervised Metrics for Evaluating Judicial Decision Extraction

Title: Detecting LLM-Generated Spam Reviews by Integrating Language Model Embeddings and Graph Neural Network

Title: Syntactic Blind Spots: How Misalignment Leads to LLMs Mathematical Errors

Title: SCRIBES: Web-Scale Script-Based Semi-Structured Data Extraction with Reinforcement Learning

Title: Model Merging to Maintain Language-Only Performance in Developmentally Plausible Multimodal Models

Title: REPAIR: Robust Editing via Progressive Adaptive Intervention and Reintegration

Title: Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey

Title: Inverse Language Modeling towards Robust and Grounded LLMs

Title: Veri-R1: Toward Precise and Faithful Claim Verification via Online Reinforcement Learning

Title: Taking a SEAT: Predicting Value Interpretations from Sentiment, Emotion, Argument, and Topic Annotations

Title: Exploring Database Normalization Effects on SQL Generation

Title: LLM-Based Multi-Task Bangla Hate Speech Detection: Type, Severity, and Target

Title: Style Over Story: A Process-Oriented Study of Authorial Creativity in Large Language Models

Title: Stream RAG: Instant and Accurate Spoken Dialogue Systems with Streaming Tool Usage

Title: Chain-of-Thought Reasoning in Streaming Full-Duplex End-to-End Spoken Dialogue Systems

Title: The Disparate Impacts of Speculative Decoding

Title: RESTRAIN: From Spurious Votes to Signals -- Self-Driven RL with Self-Penalization

Title: Learning to Reason for Hallucination Span Detection

Title: ARUQULA -- An LLM based Text2SPARQL Approach using ReAct and Knowledge Graph Exploration Utilities

Title: Say One Thing, Do Another? Diagnosing Reasoning-Execution Gaps in VLM-Powered Mobile-Use Agents

Title: More Than One Teacher: Adaptive Multi-Guidance Policy Optimization for Diverse Exploration

Title: AccurateRAG: A Framework for Building Accurate Retrieval-Augmented Question-Answering Applications

Title: Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation

Title: InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents

Title: From Behavioral Performance to Internal Competence: Interpreting Vision-Language Models with VLM-Lens

Title: F2LLM Technical Report: Matching SOTA Embedding Performance with 6 Million Open-Source Data

Title: Drawing Conclusions from Draws: Rethinking Preference Semantics in Arena-Style LLM Evaluation