2025-09-22

Title: Synthetic bootstrapped pretraining

Title: Comparative Analysis of Tokenization Algorithms for Low-Resource Language Dzongkha

Title: Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore's Low-Resource Languages

Title: PolBiX: Detecting LLMs' Political Bias in Fact-Checking through X-phemisms

Title: Quantifying Self-Awareness of Knowledge in Large Language Models

Title: Real, Fake, or Manipulated? Detecting Machine-Influenced Text

Title: Beyond Spurious Signals: Debiasing Multimodal Large Language Models via Counterfactual Inference and Adaptive Expert Routing

Title: Speech Language Models for Under-Represented Languages: Insights from Wolof

Title: Frustratingly Easy Data Augmentation for Low-Resource ASR

Title: Quantifying Uncertainty in Natural Language Explanations of Large Language Models for Question Answering

Title: PILOT: Steering Synthetic Data Generation with Psychological & Linguistic Output Targeting

Title: Evaluating Multimodal Large Language Models on Spoken Sarcasm Understanding

Title: Red Teaming Multimodal Language Models: Evaluating Harm Across Prompt Modalities and Models

Title: LLM Cache Bandit Revisited: Addressing Query Heterogeneity for Cost-Effective LLM Inference

Title: How do Language Models Generate Slang: A Systematic Comparison between Human and Machine-Generated Slang Usages

Title: A method for improving multilingual quality and diversity of instruction fine-tuning datasets

Title: DNA-DetectLLM: Unveiling AI-Generated Text via a DNA-Inspired Mutation-Repair Paradigm

Title: Exploring Polyglot Harmony: On Multilingual Data Allocation for Large Language Models Pretraining

Title: LiteLong: Resource-Efficient Long-Context Data Synthesis for LLMs

Title: Relevance to Utility: Process-Supervised Rewrite for RAG

Title: DivLogicEval: A Framework for Benchmarking Logical Reasoning Evaluation in Large Language Models

Title: SciEvent: Benchmarking Multi-domain Scientific Event Extraction

Title: Concept Unlearning in Large Language Models via Self-Constructed Knowledge Triplets

Title: Sparse-Autoencoder-Guided Internal Representation Unlearning for Large Language Models

Title: Multilingual LLM Prompting Strategies for Medical English-Vietnamese Machine Translation

Title: Layer-wise Minimal Pair Probing Reveals Contextual Grammatical-Conceptual Hierarchy in Speech Representations

Title: VOX-KRIKRI: Unifying Speech and Language through Continuous Fusion

Title: Once Upon a Time: Interactive Learning for Storytelling with Small Language Models

Title: REFER: Mitigating Bias in Opinion Summarisation via Frequency Framed Prompting

Title: Can LLMs Judge Debates? Evaluating Non-Linear Reasoning via Argumentation Theory Semantics

Title: UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression

Title: Best-of-L: Cross-Lingual Reward Modeling for Mathematical Reasoning

Title: Multi-Physics: A Comprehensive Benchmark for Multimodal LLMs Reasoning on Chinese Multi-Subject Physics Problems

Title: Distribution-Aligned Decoding for Efficient LLM Task Adaptation

Title: Re-FRAME the Meeting Summarization SCOPE: Fact-Based Summarization and Personalization via Questions

Title: Beyond the Score: Uncertainty-Calibrated LLMs for Automated Essay Assessment

Title: Localmax dynamics for attention in transformers and its asymptotic behavior

Title: BEFT: Bias-Efficient Fine-Tuning of Language Models

Title: Think, Verbalize, then Speak: Bridging Complex Thoughts and Comprehensible Speech

Title: Beyond Pointwise Scores: Decomposed Criteria-Based Evaluation of LLM Responses

Title: It Depends: Resolving Referential Ambiguity in Minimal Contexts with Commonsense Knowledge

Title: CodeRAG: Finding Relevant and Necessary Knowledge for Retrieval-Augmented Repository-Level Code Completion

Title: CultureScope: A Dimensional Lens for Probing Cultural Understanding in LLMs

Title: RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation