2025-03-26

Title: SRMIR: Shadow Reward Models Based on Introspective Reasoning for LLM Alignment

Title: LookAhead Tuning: Safer Language Models via Partial Answer Previews

Title: LLM-Based Insight Extraction for Contact Center Analytics and Cost-Efficient Deployment

Title: Masks and Mimicry: Strategic Obfuscation and Impersonation Attacks on Authorship Verification

Title: Understanding and Improving Information Preservation in Prompt Compression for LLMs

Title: Where is this coming from? Making groundedness count in the evaluation of Document VQA models

Title: Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling

Title: MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks

Title: Language Model Uncertainty Quantification with Attention Chain

Title: Evaluating Bias in LLMs for Job-Resume Matching: Gender, Race, and Education

Title: Overtrained Language Models Are Harder to Fine-Tune

Title: A Survey of Large Language Model Agents for Question Answering

Title: SCI-IDEA: Context-Aware Scientific Ideation Using Token and Sentence Embeddings

Title: Linguistic Blind Spots of Large Language Models

Title: PHEONA: An Evaluation Framework for Large Language Model-based Approaches to Computational Phenotyping

Title: MARS: Memory-Enhanced Agents with Reflective Self-improvement

Title: CoMAC: Conversational Agent for Multi-Source Auxiliary Context with Sparse and Symmetric Latent Interactions

Title: Machine-assisted writing evaluation: Exploring pre-trained language models in analyzing argumentative moves

Title: Iterative Hypothesis Generation for Scientific Discovery with Monte Carlo Nash Equilibrium Self-Refining Trees

Title: Substance over Style: Evaluating Proactive Conversational Coaching Agents

Title: DeCAP: Context-Adaptive Prompt Generation for Debiasing Zero-shot Question Answering in Large Language Models

Title: Enhancing Small Language Models for Cross-Lingual Generalized Zero-Shot Classification with Soft Prompt Tuning

Title: KSHSeek: Data-Driven Approaches to Mitigating and Detecting Knowledge-Shortcut Hallucinations in Generative Models

Title: DomainCQA: Crafting Expert-Level QA from Domain-Specific Charts

Title: FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models

Title: Scaling Laws of Synthetic Data for Language Models

Title: Context-Efficient Retrieval with Factual Decomposition

Title: Distinct social-linguistic processing between humans and large audio-language models: Evidence from model-brain alignment

Title: The Greatest Good Benchmark: Measuring LLMs' Alignment with Utilitarian Moral Dilemmas

Title: 1.4 Million Open-Source Distilled Reasoning Dataset to Empower Large Language Model Training

Title: HausaNLP at SemEval-2025 Task 3: Towards a Fine-Grained Model-Aware Hallucination Detection

Title: AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation

Title: HausaNLP at SemEval-2025 Task 2: Entity-Aware Fine-tuning vs. Prompt Engineering in Entity-Aware Machine Translation

Title: Writing as a testbed for open ended agents

Title: Gemma 3 Technical Report

Title: SemEval-2025 Task 9: The Food Hazard Detection Challenge

Title: A Comparative Analysis of Word Segmentation, Part-of-Speech Tagging, and Named Entity Recognition for Historical Chinese Sources, 1900-1950

Title: Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking

Title: Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators

Title: CausalRAG: Integrating Causal Graphs into Retrieval-Augmented Generation