2025-05-26

Title: Prompt Engineering: How Prompt Vocabulary affects Domain Knowledge

Title: Signals from the Floods: AI-Driven Disaster Analysis through Multi-Source Data Fusion

Title: VLM-KG: Multimodal Radiology Knowledge Graph Generation

Title: Assessing GPT's Bias Towards Stigmatized Social Groups: An Intersectional Case Study on Nationality Prejudice and Psychophobia

Title: Assessing the Quality of AI-Generated Clinical Notes: A Validated Evaluation of a Large Language Model Scribe

Title: Words That Unite The World: A Unified Framework for Deciphering Central Bank Communications Globally

Title: Gender and Positional Biases in LLM-Based Hiring Decisions: Evidence from Comparative CV/Résumé Evaluations

Title: Towards Robust Evaluation of STEM Education: Leveraging MLLMs in Project-Based Learning

Title: Embedding-to-Prefix: Parameter-Efficient Personalization for Pre-Trained Large Language Models

Title: SpecEdge: Scalable Edge-Assisted Serving Framework for Interactive LLMs

Title: Social preferences with unstable interactive reasoning: Large language models in economic trust games

Title: Are LLMs Ready for English Standardized Tests? A Benchmarking and Elicitation Perspective

Title: DO-RAG: A Domain-Specific QA Framework Using Knowledge Graph-Enhanced Retrieval-Augmented Generation

Title: Medalyze: Lightweight Medical Report Summarization Application Using FLAN-T5-Large

Title: SALMONN-omni: A Standalone Speech LLM without Codec Injection for Full-duplex Conversation

Title: Mixture of Decoding: An Attention-Inspired Adaptive Decoding Strategy to Mitigate Hallucinations in Large Vision-Language Models

Title: Synthetic Data RL: Task Definition Is All You Need

Title: Decoding Rarity: Large Language Models in the Diagnosis of Rare Diseases

Title: Improving endpoint detection in end-to-end streaming ASR for conversational speech

Title: What's in a prompt? Language models encode literary style in prompt embeddings

Title: Mechanistic Interpretability of GPT-like Models on Summarization Tasks

Title: Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency

Title: Development and Validation of Engagement and Rapport Scales for Evaluating User Experience in Multimodal Dialogue Systems

Title: Impact of Frame Rates on Speech Tokenizer: A Case Study on Mandarin and English

Title: GloSS over Toxicity: Understanding and Mitigating Toxicity in LLMs via Global Toxic Subspace

Title: Not Minds, but Signs: Reframing LLMs through Semiotics

Title: GemMaroc: Unlocking Darija Proficiency in LLMs with Minimal Data

Title: Scale-invariant Attention

Title: Reinforcing Question Answering Agents with Minimalist Policy Gradient Optimization

Title: Informatics for Food Processing

Title: Trust Me, I Can Handle It: Self-Generated Adversarial Scenario Extrapolation for Robust Language Models

Title: Large Language Models Implicitly Learn to See and Hear Just By Reading

Title: Are LLMs reliable? An exploration of the reliability of large language models in clinical note generation

Title: TACO: Enhancing Multimodal In-context Learning via Task Mapping-Guided Sequence Configuration

Title: Learning Interpretable Representations Leads to Semantically Faithful EEG-to-Text Generation

Title: Any Large Language Model Can Be a Reliable Judge: Debiasing with a Reasoning-based Bias Detector

Title: An approach to identify the most semantically informative deep representations of text and images

Title: BanglaByT5: Byte-Level Modelling for Bangla

Title: Forging Time Series with Language: A Large Language Model Approach to Synthetic Data Generation

Title: P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark

Title: RRTL: Red Teaming Reasoning Large Language Models in Tool Learning

Title: Multi-Modality Expansion and Retention for LLMs through Parameter Merging and Decoupling

Title: Cultural Value Alignment in Large Language Models: A Prompt-based Analysis of Schwartz Values in Gemini, ChatGPT, and DeepSeek

Title: RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors, and Natural Language

Title: Comparative Evaluation of Prompting and Fine-Tuning for Applying Large Language Models to Grid-Structured Geospatial Data

Title: From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning

Title: After Retrieval, Before Generation: Enhancing the Trustworthiness of Large Language Models in RAG

Title: Systematic Evaluation of Machine-Generated Reasoning and PHQ-9 Labeling for Depression Detection Using Large Language Models

Title: Self-Interpretability: LLMs Can Describe Complex Internal Processes that Drive Their Decisions, and Improve with Training

Title: NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation

Title: Shallow Preference Signals: Large Language Model Aligns Even Better with Truncated Data?

Title: MTR-Bench: A Comprehensive Benchmark for Multi-Turn Reasoning Evaluation

Title: Conformal Language Model Reasoning with Coherent Factuality

Title: Relative Bias: A Comparative Framework for Quantifying Bias in LLMs

Title: LongMagpie: A Self-synthesis Method for Generating Large-scale Long-context Instructions

Title: When can isotropy help adapt LLMs' next word prediction to numerical domains?

Title: Foundation Models for Geospatial Reasoning: Assessing Capabilities of Large Language Models in Understanding Geometries and Topological Spatial Relations

Title: Cog-TiPRO: Iterative Prompt Refinement with LLMs to Detect Cognitive Decline via Longitudinal Voice Assistant Commands

Title: EarthSE: A Benchmark Evaluating Earth Scientific Exploration Capability for Large Language Models

Title: Data Doping or True Intelligence? Evaluating the Transferability of Injected Knowledge in LLMs

Title: Large Language Models for Predictive Analysis: How Far Are They?

Title: Bayesian Optimization for Enhanced Language Models: Optimizing Acquisition Functions

Title: Amplify Adjacent Token Differences: Enhancing Long Chain-of-Thought Reasoning with Shift-FFN

Title: PersonaBOT: Bringing Customer Personas to Life with LLMs and RAG

Title: Harry Potter is Still Here! Probing Knowledge Leakage in Targeted Unlearned Large Language Models via Automated Adversarial Prompting

Title: CRG Score: A Distribution-Aware Clinical Metric for Radiology Report Generation

Title: Next Token Perception Score: Analytical Assessment of your LLM Perception Skills

Title: FB-RAG: Improving RAG with Forward and Backward Lookup

Title: Mitigating Gender Bias via Fostering Exploratory Thinking in LLMs

Title: Humans Hallucinate Too: Language Models Identify and Correct Subjective Annotation Errors With Label-in-a-Haystack Prompts

Title: ExeSQL: Self-Taught Text-to-SQL Models with Execution-Driven Bootstrapping for SQL Dialects

Title: Personalizing Student-Agent Interactions Using Log-Contextualized Retrieval Augmented Generation (RAG)

Title: ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models

Title: The Rise of Parameter Specialization for Knowledge Storage in Large Language Models

Title: CaseReportBench: An LLM Benchmark Dataset for Dense Information Extraction in Clinical Case Reports

Title: Select2Reason: Efficient Instruction-Tuning Data Selection for Long-CoT Reasoning

Title: GreekBarBench: A Challenging Benchmark for Free-Text Legal Reasoning and Citations

Title: Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty

Title: SELF: Self-Extend the Context Length With Logistic Growth Function

Title: Refusal Direction is Universal Across Safety-Aligned Languages

Title: From Compression to Expansion: A Layerwise Analysis of In-Context Learning

Title: GPT Editors, Not Authors: The Stylistic Footprint of LLMs in Academic Preprints

Title: SweEval: Do LLMs Really Swear? A Safety Benchmark for Testing Limits for Enterprise Use

Title: Language models should be subject to repeatable, open, domain-contextualized hallucination benchmarking

Title: A Fully Generative Motivational Interviewing Counsellor Chatbot for Moving Smokers Towards the Decision to Quit

Title: AI-Augmented LLMs Achieve Therapist-Level Responses in Motivational Interviewing

Title: WiNGPT-3.0 Technical Report

Title: Measuring diversity of synthetic prompts and data generated with fine-grained persona prompting

Title: Curriculum Guided Reinforcement Learning for Efficient Multi Hop Retrieval Augmented Generation

Title: FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow

Title: Conversations: Love Them, Hate Them, Steer Them

Title: DASH: Input-Aware Dynamic Layer Skipping for Efficient LLM Inference with Markov Decision Policies

Title: T$^2$: An Adaptive Test-Time Scaling Strategy for Contextual Question Answering

Title: Discovering Forbidden Topics in Language Models

Title: Exploring the Effect of Segmentation and Vocabulary Size on Speech Tokenization for Speech Language Models

Title: LeTS: Learning to Think-and-Search via Process-and-Outcome Reward Hybridization

Title: Towards Evaluating Proactive Risk Awareness of Multimodal Language Models

Title: Hydra: Structured Cross-Source Enhanced Large Language Model Reasoning

Title: SLearnLLM: A Self-Learning Framework for Efficient Domain-Specific Adaptation of Large Language Models

Title: FinRAGBench-V: A Benchmark for Multimodal RAG with Visual Citation in the Financial Domain

Title: MARCO: Meta-Reflection with Cross-Referencing for Code Reasoning

Title: keepitsimple at SemEval-2025 Task 3: LLM-Uncertainty based Approach for Multilingual Hallucination Span Detection

Title: Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models

Title: CReSt: A Comprehensive Benchmark for Retrieval-Augmented Generation with Complex Reasoning over Structured Documents

Title: L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models

Title: Large Language Models Do Multi-Label Classification Differently

Title: Multimodal Conversation Structure Understanding

Title: How Knowledge Popularity Influences and Enhances LLM Knowledge Boundary Perception

Title: Teaching with Lies: Curriculum DPO on Synthetic Negatives for Hallucination Detection

Title: PPT: A Process-based Preference Learning Framework for Self Improving Table Question Answering Models

Title: Reasoning Meets Personalization: Unleashing the Potential of Large Reasoning Model for Personalized Generation

Title: Wolf Hidden in Sheep's Conversations: Toward Harmless Data-Based Backdoor Attacks for Jailbreaking Large Language Models

Title: Distilling LLM Agent into Small Models with Retrieval and Code Tools

Title: Runaway is Ashamed, But Helpful: On the Early-Exit Behavior of Large Language Model-based Agents in Embodied Environments

Title: Enhancing Large Vision-Language Models with Layout Modality for Table Question Answering on Japanese Annual Securities Reports

Title: GIM: Improved Interpretability for Large Language Models

Title: EVADE: Multimodal Benchmark for Evasive Content Detection in E-Commerce Applications

Title: Too Consistent to Detect: A Study of Self-Consistent Errors in LLMs

Title: Towards Dynamic Theory of Mind: Evaluating LLM Adaptation to Temporal Evolution of Human States

Title: MIDB: Multilingual Instruction Data Booster for Enhancing Multilingual Instruction Synthesis

Title: Tuning Language Models for Robust Prediction of Diverse User Behaviors

Title: ELSPR: Evaluator LLM Training Data Self-Purification on Non-Transitive Preferences via Tournament Graph Reconstruction

Title: Activation Control for Efficiently Eliciting Long Chain-of-thought Ability of Language Models

Title: Understanding How Value Neurons Shape the Generation of Specified Values in LLMs

Title: Fast Quiet-STaR: Thinking Without Thought Tokens

Title: Discriminating Form and Meaning in Multilingual Models with Minimal-Pair ABX Tasks

Title: Resolving Conflicting Evidence in Automated Fact-Checking: A Study on Retrieval-Augmented LLMs

Title: The Real Barrier to LLM Agent Usability is Agentic ROI

Title: EXECUTE: A Multilingual Benchmark for LLM Token Understanding

Title: Compression Hacking: A Supplementary Perspective on Informatics Metric of Language Models from Geometric Distortion

Title: DialogXpert: Driving Intelligent and Emotion-Aware Conversations through Online Value-Based Reinforcement Learning with LLM Priors

Title: Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning

Title: Stepwise Reasoning Checkpoint Analysis: A Test Time Scaling Method to Enhance LLMs' Reasoning

Title: Explaining Sources of Uncertainty in Automated Fact-Checking

Title: MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback

Title: Mutarjim: Advancing Bidirectional Arabic-English Translation with a Small Language Model

Title: Language models can learn implicit multi-hop reasoning, but only if they have lots of training data

Title: Handling Symbolic Language in Student Texts: A Comparative Study of NLP Embedding Models

Title: Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL

Title: Counting Cycles with Deepseek

Title: Training with Pseudo-Code for Instruction Following

Title: Contrastive Distillation of Emotion Knowledge from LLMs for Zero-Shot Emotion Recognition

Title: MathEDU: Towards Adaptive Feedback for Student Mathematical Problem-Solving

Title: Extended Inductive Reasoning for Personalized Preference Inference from Behavioral Signals

Title: QwenLong-CPRS: Towards $\infty$-LLMs with Dynamic Context Optimization

Title: Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL

Title: ManuSearch: Democratizing Deep Search in Large Language Models with a Transparent and Open Multi-Agent Framework

Title: Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM

Title: UNJOIN: Enhancing Multi-Table Text-to-SQL Generation via Schema Simplification

Title: Frankentext: Stitching random text fragments into long-form narratives

Title: Graph-Linguistic Fusion: Using Language Models for Wikidata Vandalism Detection

Title: Lost in the Haystack: Smaller Needles are More Difficult for LLMs to Find

Title: First Finish Search: Efficient Test-Time Scaling in Large Language Models

Title: Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs

Title: The Staircase of Ethics: Probing LLM Value Priorities through Multi-Step Induction to Complex Moral Dilemmas