2025-10-30

Title: Iti-Validator: A Guardrail Framework for Validating and Correcting LLM-Generated Itineraries

Title: Dingtalk DeepResearch: A Unified Multi Agent Framework for Adaptive Intelligence in Enterprise Environments

Title: Confidence is Not Competence

Title: Large Language Models Report Subjective Experience Under Self-Referential Processing

Title: COMMUNITYNOTES: A Dataset for Exploring the Helpfulness of Fact-Checking Explanations

Title: ProofSketch: Efficient Verified Reasoning for Large Language Models

Title: Towards a Method for Synthetic Generation of PWA Transcripts

Title: Parallel Loop Transformer for Efficient Test-Time Computation Scaling

Title: Do Large Language Models Grasp The Grammar? Evidence from Grammar-Book-Guided Probing in Luxembourgish

Title: Seeing Through the MiRAGE: Evaluating Multimodal Retrieval Augmented Generation

Title: Idea2Plan: Exploring AI-Powered Research Planning

Title: RiddleBench: A New Generative Reasoning Benchmark for LLMs

Title: Disaggregation Reveals Hidden Training Dynamics: The Case of Agreement Attraction

Title: SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens

Title: Language Model Behavioral Phases are Consistent Across Architecture, Training Data, and Scale

Title: Emergence of Minimal Circuits for Indirect Object Identification in Attention-Only Transformers

Title: Evaluating Emotion Recognition in Spoken Language Models on Emotionally Incongruent Speech

Title: GAPMAP: Mapping Scientific Knowledge Gaps in Biomedical Literature Using Large Language Models

Title: Can LLMs Estimate Cognitive Complexity of Reading Comprehension Items?

Title: TOPol: Capturing and Explaining Multidimensional Semantic Polarity Fields and Vectors

Title: BioCoref: Benchmarking Biomedical Coreference Resolution with LLMs

Title: DEBATE: A Large-Scale Benchmark for Role-Playing LLM Agents in Multi-Agent, Long-Form Debates

Title: A Survey on Unlearning in Large Language Models

Title: Explainable Disentanglement on Discrete Speech Representations for Noise-Robust ASR

Title: Model-Document Protocol for AI Search

Title: Testing Cross-Lingual Text Comprehension In LLMs Using Next Sentence Prediction

Title: ProMediate: A Socio-cognitive framework for evaluating proactive agents in multi-party negotiation

Title: Adapting Small Language Models to Low-Resource Domains: A Case Study in Hindi Tourism QA

Title: Teaching Sarcasm: Few-Shot Multimodal Sarcasm Detection via Distillation to a Parameter-Efficient Student

Title: Parrot: A Training Pipeline Enhances Both Program CoT and Natural Language CoT for Reasoning

Title: CRMWeaver: Building Powerful Business Agent via Agentic RL and Shared Memories

Title: Not ready for the bench: LLM legal interpretation is unstable and out of step with human judgments

Title: Monitoring Transformative Technological Convergence Through LLM-Extracted Semantic Entity Triple Graphs

Title: Hallucinations in Bibliographic Recommendation: Citation Frequency as a Proxy for Training Data Redundancy

Title: Roleplaying with Structure: Synthetic Therapist-Client Conversation Generation from Questionnaires

Title: BhashaBench V1: A Comprehensive Benchmark for the Quadrant of Indic Domains

Title: Serve Programs, Not Prompts

Title: Seeing, Signing, and Saying: A Vision-Language Model-Assisted Pipeline for Sign Language Data Acquisition and Curation from Social Media

Title: Implicature in Interaction: Understanding Implicature Improves Alignment in Human-LLM Interaction

Title: RLMEval: Evaluating Research-Level Neural Theorem Proving

Title: Depth and Autonomy: A Framework for Evaluating LLM Applications in Social Science Research

Title: A Critical Study of Automatic Evaluation in Sign Language Translation

Title: Grounded in Reality: Learning and Deploying Proactive LLM from Offline Logs

Title: Fine-Tuned Language Models for Domain-Specific Summarization and Tagging

Title: TwinVoice: A Multi-dimensional Benchmark Towards Digital Twins via LLM Persona Simulation

Title: Communication and Verification in LLM Agents towards Collaboration under Information Asymmetry

Title: FARSIQA: Faithful and Advanced RAG System for Islamic Question Answering

Title: Evaluating the Role of Verifiers in Test-Time Scaling for Legal Reasoning Tasks

Title: Are Language Models Efficient Reasoners? A Perspective from Logic Programming

Title: EHR-R1: A Reasoning-Enhanced Foundational Language Model for Electronic Health Record Analysis

Title: PairUni: Pairwise Training for Unified Multimodal Language Models

Title: Interpreting LLMs as Credit Risk Classifiers: Do Their Feature Explanations Align with Classical ML?

Title: The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Title: The Limits of Obliviate: Evaluating Unlearning in LLMs via Stimulus-Knowledge Entanglement-Behavior Framework

Title: Scaling Latent Reasoning via Looped Language Models

Title: Task Completion Agents are Not Ideal Collaborators

Title: DiagramEval: Evaluating LLM-Generated Diagrams via Graphs

Title: Decomposition-Enhanced Training for Post-Hoc Attributions In Language Models

Title: Gaperon: A Peppered English-French Generative Language Model Suite