2025-07-15

Title: SEALGuard: Safeguarding the Multilingual Conversations in Southeast Asian Languages for LLM Software Systems

Title: Evaluating LLMs in Medicine: A Call for Rigor, Transparency

Title: From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation

Title: Self-Improving Model Steering

Title: Beyond vividness: Content analysis of induced hallucinations reveals the hidden structure of individual differences in visual imagery

Title: Lizard: An Efficient Linearization Framework for Large Language Models

Title: ALIGN: Prompt-based Attribute Alignment for Reliable, Responsible, and Personalized LLM-based Decision-Making

Title: OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique

Title: Dynamic Parameter Memory: Temporary LoRA-Enhanced LLM for Long-Sequence Emotion Recognition in Conversation

Title: CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards

Title: OPENXRD: A Comprehensive Benchmark and Enhancement Framework for LLM/MLLM XRD Question Answering

Title: RAMA: Retrieval-Augmented Multi-Agent Framework for Misinformation Detection in Multimodal Fact-Checking

Title: Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models

Title: Banzhida: Advancing Large Language Models for Tibetan with Curated Data and Continual Pre-Training

Title: Psychology-Driven Enhancement of Humour Translation

Title: DATE-LM: Benchmarking Data Attribution Evaluation for Large Language Models

Title: Enhancing Clinical Text Classification via Fine-Tuned DRAGON Longformer Models

Title: Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs

Title: ViSP: A PPO-Driven Framework for Sarcasm Generation with Contrastive Learning

Title: Balanced Training Data Augmentation for Aspect-Based Sentiment Analysis

Title: GoalfyMax: A Protocol-Driven Multi-Agent System for Intelligent Experience Entities

Title: Ref-Long: Benchmarking the Long-context Referencing Capability of Long-context Language Models

Title: How Important is 'Perfect' English for Machine Translation Prompts?

Title: An Exploration of Knowledge Editing for Arabic

Title: Can Group Relative Policy Optimization Improve Thai Legal Reasoning and Question Answering?

Title: MCEval: A Dynamic Framework for Fair Multilingual Cultural Evaluation of LLMs

Title: Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces

Title: Your Pretrained Model Tells the Difficulty Itself: A Self-Adaptive Curriculum Learning Paradigm for Natural Language Understanding

Title: Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition

Title: Enhancing Retrieval Augmented Generation with Hierarchical Text Segmentation Chunking

Title: Tiny Reward Models

Title: Protective Factor-Aware Dynamic Influence Learning for Suicide Risk Prediction on Social Media

Title: GeLaCo: An Evolutionary Approach to Layer Compression

Title: Cultural Bias in Large Language Models: Evaluating AI Agents through Moral Questionnaires

Title: Enhancing Chain-of-Thought Reasoning with Critical Representation Fine-tuning

Title: Fusing Large Language Models with Temporal Transformers for Time Series Forecasting

Title: Task-Based Flexible Feature Distillation for LLMs

Title: Abusive text transformation using LLMs

Title: Absher: A Benchmark for Evaluating Large Language Models Understanding of Saudi Dialects

Title: Grammar-Guided Evolutionary Search for Discrete Prompt Optimisation

Title: Using AI to replicate human experimental results: a motion study

Title: From Sequence to Structure: Uncovering Substructure Reasoning in Transformers

Title: Referential ambiguity and clarification requests: comparing human and LLM behaviour

Title: From BERT to Qwen: Hate Detection across architectures

Title: MLAR: Multi-layer Large Language Model-based Robotic Process Automation Applicant Tracking

Title: Can You Detect the Difference?

Title: Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

Title: CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks