2025-11-18

Title: TimeStampEval: A Simple LLM Eval and a Little Fuzzy Matching Trick to Improve Search Accuracy

Title: MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling

Title: On the Notion that Language Models Reason

Title: Scaling Open-Weight Large Language Models for Hydropower Regulatory Information Extraction: A Systematic Analysis

Title: Towards Autoformalization of LLM-generated Outputs for Requirement Verification

Title: Identifying Imaging Follow-Up in Radiology Reports: A Comparative Analysis of Traditional ML and LLM Approaches

Title: MedPT: A Massive Medical Question Answering Dataset for Brazilian-Portuguese Speakers

Title: ClinStructor: AI-Powered Structuring of Unstructured Clinical Texts

Title: Context-Emotion Aware Therapeutic Dialogue Generation: A Multi-component Reinforcement Learning Approach to Language Models for Mental Health Support

Title: Additive Large Language Models for Semi-Structured Text

Title: InData: Towards Secure Multi-Step, Tool-Based Data Analysis

Title: Improving LLM's Attachment to External Knowledge In Dialogue Generation Tasks Through Entity Anonymization

Title: On the Entropy Calibration of Language Models

Title: A Reasoning Paradigm for Named Entity Recognition

Title: Critical or Compliant? The Double-Edged Sword of Reasoning in Chain-of-Thought Explanations

Title: CURE: Cultural Understanding and Reasoning Evaluation - A Framework for "Thick" Culture Alignment Evaluation in LLMs

Title: LLMLagBench: Identifying Temporal Training Boundaries in Large Language Models

Title: PRISM of Opinions: A Persona-Reasoned Multimodal Framework for User-centric Conversational Stance Detection

Title: AI-Salesman: Towards Reliable Large Language Model Driven Telemarketing

Title: Seeing is Believing: Rich-Context Hallucination Detection for MLLMs via Backward Visual Grounding

Title: CriticSearch: Fine-Grained Credit Assignment for Search Agents via a Retrospective Critic

Title: MME-RAG: Multi-Manager-Expert Retrieval-Augmented Generation for Fine-Grained Entity Recognition in Task-Oriented Dialogues

Title: Consistency Is the Key: Detecting Hallucinations in LLM Generated Text By Checking Inconsistencies About Key Facts

Title: Cmprsr: Abstractive Token-Level Question-Agnostic Prompt Compressor

Title: Do LLMs and Humans Find the Same Questions Difficult? A Case Study on Japanese Quiz Answering

Title: Don't Think of the White Bear: Ironic Negation in Transformer Models Under Cognitive Load

Title: From Phonemes to Meaning: Evaluating Large Language Models on Tamil

Title: Probing Preference Representations: A Multi-Dimensional Evaluation and Analysis Method for Reward Models

Title: Assessing LLMs for Serendipity Discovery in Knowledge Graphs: A Case for Drug Repurposing

Title: SGuard-v1: Safety Guardrail for Large Language Models

Title: TAdaRAG: Task Adaptive Retrieval-Augmented Generation via On-the-Fly Knowledge Graph Construction

Title: Mitigating Length Bias in RLHF through a Causal Lens

Title: MMWOZ: Building Multimodal Agent for Task-oriented Dialogue

Title: Group-Aware Reinforcement Learning for Output Diversity in Large Language Models

Title: Knots: A Large-Scale Multi-Agent Enhanced Expert-Annotated Dataset and LLM Prompt Optimization for NOTAM Semantic Parsing

Title: Reason-KE++: Aligning the Process, Not Just the Outcome, for Faithful LLM Knowledge Editing

Title: Improving Direct Persian-English Speech-to-Speech Translation with Discrete Units and Synthetic Parallel Data

Title: Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs

Title: Adaptive Focus Memory for Language Models

Title: On the Brittleness of LLMs: A Journey around Set Membership

Title: Evidence of Phase Transitions in Small Transformer-Based Language Models

Title: LLM Reinforcement in Context

Title: Evaluating Autoformalization Robustness via Semantically Similar Paraphrasing

Title: BioMedJImpact: A Comprehensive Dataset and LLM Pipeline for AI Engagement and Scientific Impact Analysis of Biomedical Journals

Title: From Passive to Persuasive: Steering Emotional Nuance in Human-AI Negotiation

Title: NeuroLex: A Lightweight Domain Language Model for EEG Report Understanding and Generation

Title: From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models

Title: Classification of Hope in Textual Data using Transformer-Based Models

Title: Visual Room 2.0: Seeing is Not Understanding for MLLMs

Title: Fine-Tuned LLMs Know They Don't Know: A Parameter-Efficient Approach to Recovering Honesty

Title: AA-Omniscience: Evaluating Cross-Domain Knowledge Reliability in Large Language Models

Title: Spark-Prover-X1: Formal Theorem Proving Through Diverse Data Training

Title: BeDiscovER: The Benchmark of Discourse Understanding in the Era of Reasoning Language Models

Title: Evaluating the Ability of Large Language Models to Identify Adherence to CONSORT Reporting Guidelines in Randomized Controlled Trials: A Methodological Evaluation Study

Title: Extracting Events Like Code: A Multi-Agent Programming Framework for Zero-Shot Event Extraction

Title: Zero-Shot Grammar Competency Estimation Using Large Language Model Generated Pseudo Labels

Title: Distinguishing Repetition Disfluency from Morphological Reduplication in Bangla ASR Transcripts: A Novel Corpus and Benchmarking Analysis

Title: TCM-5CEval: Extended Deep Evaluation Benchmark for LLM's Comprehensive Clinical Research Competence in Traditional Chinese Medicine

Title: Evaluating Large Language Models for Diacritic Restoration in Romanian Texts: A Comparative Study

Title: Seeing isn't Hearing: Benchmarking Vision Language Models at Interpreting Spectrograms

Title: Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance

Title: Donors and Recipients: On Asymmetric Transfer Across Tasks and Languages with Parameter-Efficient Fine-Tuning

Title: Can Large Language Models Function as Qualified Pediatricians? A Systematic Evaluation in Real-World Clinical Contexts

Title: Mem-PAL: Towards Memory-based Personalized Dialogue Assistants for Long-term User-Agent Interaction

Title: Applying Large Language Models to Characterize Public Narratives

Title: Beyond SELECT: A Comprehensive Taxonomy-Guided Benchmark for Real-World Text-to-SQL Translation

Title: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents

Title: Why is "Chicago" Predictive of Deceptive Reviews? Using LLMs to Discover Language Phenomena from Lexical Cues

Title: Crossing Borders: A Multimodal Challenge for Indian Poetry Translation and Image Generation

Title: Generalist Foundation Models Are Not Clinical Enough for Hospital Operations