2025-12-25

Title: Uncovering Competency Gaps in Large Language Models and Their Benchmarks

Title: SA-DiffuSeq: Addressing Computational and Scalability Challenges in Long-Document Generation with Sparse Attention

Title: TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior

Title: Adversarial Training for Failure-Sensitive User Simulation in Mental Health Dialogue Optimization

Title: Large Language Models Approach Expert Pedagogical Quality in Math Tutoring but Differ in Instructional and Linguistic Profiles

Title: Investigating Model Editing for Unlearning in Large Language Models

Title: Measuring Mechanistic Independence: Can Bias Be Removed Without Erasing Demographics?

Title: Semantic Deception: When Reasoning Models Can't Compute an Addition

Title: EssayCBM: Rubric-Aligned Concept Bottleneck Models for Transparent Essay Grading

Title: MediEval: A Unified Medical Benchmark for Patient-Contextual and Knowledge-Grounded Reasoning in LLMs

Title: Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Title: How important is Recall for Measuring Retrieval Quality?

Title: NVIDIA Nemotron 3: Efficient and Open Intelligence

Title: Architectural Trade-offs in Small Language Models Under Compute Constraints

Title: Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation

Title: Neural Probe-Based Hallucination Detection for Large Language Models

Title: MultiMind at SemEval-2025 Task 7: Crosslingual Fact-Checked Claim Retrieval via Multi-Source Alignment

Title: Reflection Pretraining Enables Token-Level Self-Correction in Biological Sequence Models

Title: Automatic Replication of LLM Mistakes in Medical Conversations

Title: Distilling the Essence: Efficient Reasoning Distillation via Sequence Truncation

Title: Rethinking Supervised Fine-Tuning: Emphasizing Key Answer Tokens for Improved LLM Accuracy

Title: Semantic Refinement with LLMs for Graph Representations

Title: Semi-Supervised Learning for Large Language Models Safety and Content Moderation

Title: ClarifyMT-Bench: Benchmarking and Improving Multi-Turn Clarification for Conversational Large Language Models

Title: SpidR-Adapt: A Universal Speech Representation Model for Few-Shot Adaptation

Title: SMART SLM: Structured Memory and Reasoning Transformer, A Small Language Model for Accurate Document Assistance

Title: Parallel Token Prediction for Language Models

Title: Your Reasoning Benchmark May Not Test Reasoning: Revealing Perception Bottleneck in Abstract Reasoning Benchmarks

Title: C2LLM Technical Report: A New Frontier in Code Retrieval via Adaptive Cross-Attention Pooling