2025-10-13

Title: Less Diverse, Less Safe: The Indirect But Pervasive Risk of Test-Time Scaling in Large Language Models

Title: Systematic Diagnosis of Brittle Reasoning in Large Language Models

Title: Confidence, Not Perplexity: A Better Metric for the Creative Era of LLMs

Title: Recover-LoRA: Data-Free Accuracy Recovery of Degraded Language Models via Low-Rank Adaptation

Title: Mnemosyne: An Unsupervised, Human-Inspired Long-Term Memory Architecture for Edge-Based LLMs

Title: Human Texts Are Outliers: Detecting LLM-generated Texts via Out-of-distribution Detection

Title: YpathRAG: A Retrieval-Augmented Generation Framework and Benchmark for Pathology

Title: LatentBreak: Jailbreaking Large Language Models through Latent Space Feedback

Title: Toward a Safer Web: Multilingual Multi-Agent LLMs for Mitigating Adversarial Misinformation Attacks

Title: MMA-ASIA: A Multilingual and Multimodal Alignment Framework for Culturally-Grounded Evaluation

Title: GraphGhost: Tracing Structures Behind Large Language Models

Title: Gender Bias in Large Language Models for Healthcare: Assignment Consistency and Clinical Implications

Title: Iterative LLM-Based Generation and Refinement of Distracting Conditions in Math Word Problems

Title: LLMs Show Surface-Form Brittleness Under Paraphrase Stress Tests

Title: JAI-1: A Thai-Centric Large Language Model

Title: From Simulation to Strategy: Automating Personalized Interaction Planning for Conversational Agents

Title: Text2Stories: Evaluating the Alignment Between Stakeholder Interviews and Generated User Stories

Title: PARSE: LLM Driven Schema Optimization for Reliable Entity Extraction

Title: Do LLMs Know They Are Being Tested? Evaluation Awareness and Incentive-Sensitive Failures in GPT-OSS-20B

Title: From What to Why: Thought-Space Recommendation with Small Language Models

Title: ExPO-HM: Learning to Explain-then-Detect for Hateful Meme Detection

Title: Next Semantic Scale Prediction via Hierarchical Diffusion Language Models

Title: Upfront Chain-of-Thought: A Cooperative Framework for Chain-of-Thought Compression

Title: Formalizing Style in Personal Narratives

Title: A Novel Framework for Augmenting Rating Scale Tests with LLM-Scored Text Data

Title: dInfer: An Efficient Inference Framework for Diffusion Language Models

Title: Scaling Laws for Code: A More Data-Hungry Regime

Title: Thinking Longer, Not Always Smarter: Evaluating LLM Capabilities in Hierarchical Legal Reasoning

Title: How Many Code and Test Cases Are Enough? Evaluating Test Cases Generation from a Binary-Matrix Perspective

Title: How Reliable is Language Model Micro-Benchmarking?

Title: Coordinates from Context: Using LLMs to Ground Complex Location References

Title: Measuring Moral LLM Responses in Multilingual Capacities

Title: Learning What to Remember: Adaptive Probabilistic Memory Retention for Memory-Efficient Language Models

Title: Benchmarking Chinese Commonsense Reasoning with a Multi-hop Reasoning Perspective

Title: MOSAIC: Multi-agent Orchestration for Task-Intelligent Scientific Coding

Title: The Model's Language Matters: A Comparative Privacy Analysis of LLMs

Title: Search-on-Graph: Iterative Informed Navigation for Large Language Model Reasoning on Knowledge Graphs

Title: Pattern Enhanced Multi-Turn Jailbreaking: Exploiting Structural Vulnerabilities in Large Language Models

Title: Quality Estimation Reranking for Document-Level Translation

Title: FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs

Title: Exploring Multi-Temperature Strategies for Token- and Rollout-Level Control in RLVR

Title: A Unified Biomedical Named Entity Recognition Framework with Large Language Models

Title: Autoencoding-Free Context Compression for LLMs via Contextual Semantic Anchors

Title: Artificial Impressions: Evaluating Large Language Model Behavior Through the Lens of Trait Impressions

Title: SOP-Maze: Evaluating Large Language Models on Complicated Business Standard Operating Procedures

Title: Creation of the Chinese Adaptive Policy Communication Corpus

Title: MASA: LLM-Driven Multi-Agent Systems for Autoformalization

Title: DARO: Difficulty-Aware Reweighting Policy Optimization

Title: Decoupling Safety into Orthogonal Subspace: Cost-Efficient and Performance-Preserving Alignment for Large Language Models

Title: LitE-SQL: A Lightweight and Efficient Text-to-SQL Framework with Vector-based Schema Linking and Execution-Guided Self-Correction

Title: Automated Refinement of Essay Scoring Rubrics for Language Models via Reflect-and-Revise

Title: Exploring Cross-Lingual Knowledge Transfer via Transliteration-Based MLM Fine-Tuning for Critically Low-resource Chakma Language

Title: Large Language Models Do NOT Really Know What They Don't Know

Title: Alif: Advancing Urdu Large Language Models via Multilingual Synthetic Data Distillation

Title: ReFIne: A Framework for Trustworthy Large Reasoning Models with Reliability, Faithfulness, and Interpretability

Title: FrameEOL: Semantic Frame Induction using Causal Language Models

Title: When Retrieval Succeeds and Fails: Rethinking Retrieval-Augmented Generation for LLMs

Title: DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation

Title: Augmenting Dialog with Think-Aloud Utterances for Modeling Individual Personality Traits by LLM

Title: LLaMAX2: Your Translation-Enhanced Model also Performs Well in Reasoning

Title: DICE: Structured Reasoning in LLMs through SLM-Guided Chain-of-Thought Correction

Title: IRIS: An Iterative and Integrated Framework for Verifiable Causal Discovery in the Absence of Tabular Data

Title: CrisiText: A dataset of warning messages for LLM training in emergency communication

Title: DSPO: Stable and Efficient Policy Optimization for Agentic Search and Reasoning

Title: Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models

Title: CFVBench: A Comprehensive Video Benchmark for Fine-grained Multimodal Retrieval-Augmented Generation

Title: Inflated Excellence or True Performance? Rethinking Medical Diagnostic Benchmarks with Dynamic Evaluation

Title: CLARity: Reasoning Consistency Alone Can Teach Reinforced Experts

Title: MaP: A Unified Framework for Reliable Evaluation of Pre-training Dynamics

Title: ShiZhi: A Chinese Lightweight Large Language Model for Court View Generation

Title: Mask Tokens as Prophet: Fine-Grained Cache Eviction for Efficient dLLM Inference

Title: Verifying Chain-of-Thought Reasoning via Its Computational Graph

Title: FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference

Title: LLP: LLM-based Product Pricing in E-commerce

Title: ReTraceQA: Evaluating Reasoning Traces of Small Language Models in Commonsense Question Answering

Title: Logit Arithmetic Elicits Long Reasoning Capabilities Without Training

Title: NL2GenSym: Natural Language to Generative Symbolic Rules for SOAR Cognitive Architecture via Large Language Models

Title: Understanding the Effects of Domain Finetuning on LLMs

Title: Token-Level Policy Optimization: Linking Group-Level Rewards to Token-Level Aggregation via Markov Likelihood

Title: Beyond Single-Granularity Prompts: A Multi-Scale Chain-of-Thought Prompt Learning for Graph

Title: Active Model Selection for Large Language Models

Title: On the Representations of Entities in Auto-regressive Large Language Models

Title: The Speech-LLM Takes It All: A Truly Fully End-to-End Spoken Dialogue State Tracking Approach

Title: KORMo: Korean Open Reasoning Model for Everyone

Title: Domain-Adapted Pre-trained Language Models for Implicit Information Extraction in Crash Narratives

Title: Getting Your Indices in a Row: Full-Text Search for LLM Training Data for Real World

Title: Hybrid Models for Natural Language Reasoning: The Case of Syllogistic Logic

Title: Multimodal Policy Internalization for Conversational Agents

Title: StatEval: A Comprehensive Benchmark for Large Language Models in Statistics

Title: Can We Reliably Rank Model Performance across Domains without Labeled Data?

Title: Evaluating Robustness of Large Language Models Against Multilingual Typographical Errors

Title: SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models

Title: Beyond Surface Reasoning: Unveiling the True Long Chain-of-Thought Capacity of Diffusion Large Language Models

Title: Hierarchical Indexing with Knowledge Enrichment for Multilingual Video Corpus Retrieval

Title: A Comprehensive Evaluation of Multilingual Chain-of-Thought Reasoning: Performance, Consistency, and Faithfulness Across Languages

Title: WUGNECTIVES: Novel Entity Inferences of Language Models from Discourse Connectives

Title: AutoPR: Let's Automate Your Academic Promotion!

Title: Dyna-Mind: Learning to Simulate from Experience for Better AI Agents

Title: Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models

Title: Prompting Test-Time Scaling Is A Strong LLM Reasoning Data Augmentation