2025-05-09

Title: How Social is It? A Benchmark for LLMs' Capabilities in Multi-user Multi-turn Social Agent Tasks

Title: Adaptive Token Boundaries: Integrating Human Chunking Mechanisms into Multimodal LLMs

Title: A Comparative Benchmark of a Moroccan Darija Toxicity Detection Model (Typica.ai) and Major LLM-Based Moderation APIs (OpenAI, Mistral, Anthropic)

Title: ChatGPT for automated grading of short answer questions in mechanical ventilation

Title: FRAME: Feedback-Refined Agent Methodology for Enhancing Medical Research Insights

Title: Scientific Hypothesis Generation and Validation: Methods, Datasets, and Future Directions

Title: Advancing Conversational Diagnostic AI with Multimodal Reasoning

Title: A Comparative Analysis of Ethical and Safety Gaps in LLMs using Relative Danger Coefficient

Title: Integration of Large Language Models and Traditional Deep Learning for Social Determinants of Health Prediction

Title: AI-Generated Fall Data: Assessing LLMs and Diffusion Model for Wearable Fall Detection

Title: Personalized Risks and Regulatory Strategies of Large Language Models in Digital Advertising

Title: Fine-Tuning Large Language Models and Evaluating Retrieval Methods for Improved Question Answering on Building Codes

Title: Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards

Title: REVEAL: Multi-turn Evaluation of Image-Input Harms for Vision LLM

Title: SOAEsV2-7B/72B: Full-Pipeline Optimization for State-Owned Enterprise LLMs via Continual Pre-Training, Domain-Progressive SFT and Distillation-Enhanced Speculative Decoding

Title: Osiris: A Lightweight Open-Source Hallucination Detection System

Title: Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards

Title: An Open-Source Dual-Loss Embedding Model for Semantic Retrieval in Higher Education

Title: Chain-of-Thought Tokens are Computer Program Variables

Title: Latent Preference Coding: Aligning Large Language Models via Discrete Latent Codes

Title: Rethinking Invariance in In-context Learning

Title: The Pitfalls of Growing Group Complexity: LLMs and Social Choice-Based Aggregation for Group Recommendations

Title: Scalable Multi-Stage Influence Function for Large Language Models via Eigenvalue-Corrected Kronecker-Factored Parameterization

Title: G-FOCUS: Towards a Robust Method for Assessing UI Design Persuasiveness

Title: Image-Text Relation Prediction for Multilingual Tweets

Title: Performance Evaluation of Large Language Models in Bangla Consumer Health Query Summarization

Title: Reliably Bounding False Positives: A Zero-Shot Machine-Generated Text Detection Framework via Multiscaled Conformal Prediction

Title: Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders

Title: QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation

Title: Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design

Title: ICon: In-Context Contribution for Automatic Data Selection

Title: Frame In, Frame Out: Do LLMs Generate More Biased News Headlines than Humans?

Title: Crosslingual Reasoning through Test-Time Scaling

Title: Reasoning Models Don't Always Say What They Think

Title: TransProQA: an LLM-based literary Translation evaluation metric with Professional Question Answering

Title: Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data

Title: clem:todd: A Framework for the Systematic Benchmarking of LLM-Based Task-Oriented Dialogue System Realisations

Title: UKElectionNarratives: A Dataset of Misleading Narratives Surrounding Recent UK General Elections

Title: Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging

Title: ComPO: Preference Alignment via Comparison Oracles