2025-07-01

Title: Psycholinguistic Word Features: a New Approach for the Evaluation of LLMs Alignment with Humans

Title: AI Agents-as-Judge: Automated Assessment of Accuracy, Consistency, Completeness and Clarity for Enterprise Documents

Title: Hallucination Detection with Small Language Models

Title: PromptAug: Fine-grained Conflict Classification Using Data Augmentation

Title: AgentStealth: Reinforcing Large Language Model for Anonymizing User-generated Text

Title: Can "consciousness" be observed from large language model (LLM) internal states? Dissecting LLM representations obtained from Theory of Mind test with Integrated Information Theory and Span Representation analysis

Title: Weak-to-Strong GraphRAG: Aligning Weak Retrievers with Large Language Models for Graph-based Retrieval Augmented Generation

Title: RExBench: Can coding agents autonomously implement AI research extensions?

Title: Temperature Matters: Enhancing Watermark Robustness Against Paraphrasing Attacks

Title: Evaluating Hybrid Retrieval Augmented Generation using Dynamic Test Sets: LiveRAG Challenge

Title: Assessing the feasibility of Large Language Models for detecting micro-behaviors in team interactions during space missions

Title: VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs

Title: Text Production and Comprehension by Human and Artificial Intelligence: Interdisciplinary Workshop Report

Title: The Translation Barrier Hypothesis: Multilingual Generation with Large Language Models Suffers from Implicit Translation Failure

Title: Jan-nano Technical Report

Title: Teaching Models to Verbalize Reward Hacking in Chain-of-Thought Reasoning

Title: ContextCache: Context-Aware Semantic Cache for Multi-Turn Queries in Large Language Models

Title: MedEthicsQA: A Comprehensive Question Answering Benchmark for Medical Ethics Evaluation of LLMs

Title: Selecting and Merging: Towards Adaptable and Scalable Named Entity Recognition with Large Language Models

Title: Boosting CTC-Based ASR Using LLM-Based Intermediate Loss Regularization

Title: Knowledge Augmented Finetuning Matters in both RAG and Agent Based Dialog Systems

Title: DICE-BENCH: Evaluating the Tool-Use Capabilities of Large Language Models in Multi-Round, Multi-Party Dialogues

Title: Agent-to-Agent Theory of Mind: Testing Interlocutor Awareness among Large Language Models

Title: On the Generalizability of "Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals"

Title: A Systematic Study of Compositional Syntactic Transformer Language Models

Title: SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions

Title: Boosting LLM's Molecular Structure Elucidation with Knowledge Enhanced Tree Search Reasoning

Title: From Individuals to Interactions: Benchmarking Gender Bias in Multimodal Large Language Models from the Lens of Social Relationship

Title: FairI Tales: Evaluation of Fairness in Indian Contexts with a Focus on Bias and Stereotypes

Title: Decoding Memes: Benchmarking Narrative Role Classification across Multilingual and Multimodal Models

Title: Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning

Title: Format-Adapter: Improving Reasoning Capability of LLMs by Adapting Suitable Format

Title: LLM-Assisted Question-Answering on Technical Documents Using Structured Data-Aware Retrieval Augmented Generation

Title: Benchmarking Deep Search over Heterogeneous Enterprise Data

Title: Learning-to-Context Slope: Evaluating In-Context Learning Effectiveness Beyond Performance Illusions

Title: V-SYNTHESIS: Task-Agnostic Synthesis of Consistent and Diverse In-Context Demonstrations from Scratch via V-Entropy

Title: Generalist Reward Models: Found Inside Large Language Models

Title: Two Spelling Normalization Approaches Based on Large Language Models

Title: Objective-Free Local Learning and Emergent Language Structure in Thinking Machines

Title: Information Loss in LLMs' Multilingual Translation: The Role of Training Data, Language Proximity, and Language Family

Title: ATGen: A Framework for Active Text Generation

Title: Perspective Dial: Measuring Perspective of Text and Guiding LLM Outputs

Title: Hierarchical Memory Organization for Wikipedia Generation

Title: Datasets for Fairness in Language Models: An In-Depth Survey

Title: TuCo: Measuring the Contribution of Fine-Tuning to Individual Responses of LLMs

Title: What to Keep and What to Drop: Adaptive Table Filtering Framework

Title: Thought-Augmented Planning for LLM-Powered Interactive Recommender Agent

Title: Reinforcement Fine-Tuning Enables MLLMs Learning Novel Tasks Stably

Title: NEU-ESC: A Comprehensive Vietnamese dataset for Educational Sentiment analysis and topic Classification toward multitask learning

Title: On Recipe Memorization and Creativity in Large Language Models: Is Your Model a Creative Cook, a Bad Cook, or Merely a Plagiator?

Title: Semantic-guided Diverse Decoding for Large Language Model

Title: Evaluating the Simulation of Human Personality-Driven Susceptibility to Misinformation with LLMs

Title: L0: Reinforcement Learning to Become General Agents

Title: AutoEvoEval: An Automated Framework for Evolving Close-Ended LLM Evaluation Data

Title: Positional Bias in Binary Question Answering: How Uncertainty Shapes Model Preferences

Title: Garbage In, Reasoning Out? Why Benchmark Scores are Unreliable and What to Do About It

Title: Advancing Multi-Step Mathematical Reasoning in Large Language Models through Multi-Layered Self-Reflection with Auto-Prompting

Title: The Trilemma of Truth in Large Language Models

Title: IMPACT: Inflectional Morphology Probes Across Complex Typologies

Title: Leveraging the Potential of Prompt Engineering for Hate Speech Detection in Low-Resource Languages

Title: Graft: Integrating the Domain Knowledge via Efficient Parameter Synergy for MLLMs

Title: Unveiling Decision-Making in LLMs for Text Classification: Extraction of influential and interpretable concepts with Sparse Autoencoders

Title: TaP: A Taxonomy-Guided Framework for Automated and Scalable Preference Data Generation

Title: Auto-TA: Towards Scalable Automated Thematic Analysis (TA) via Multi-Agent Large Language Models with Reinforcement Learning

Title: Large Language Models Don't Make Sense of Word Problems. A Scoping Review from a Mathematics Education Perspective

Title: EXPERT: An Explainable Image Captioning Evaluation Metric with Structured Explanations

Title: STACK: Adversarial Attacks on LLM Safeguard Pipelines

Title: On the Predictive Power of Representation Dispersion in Language Models

Title: Computational Detection of Intertextual Parallels in Biblical Hebrew: A Benchmark Study Using Transformer-Based Language Models