2025-10-02

Title: Direct Token Optimization: A Self-contained Approach to Large Language Model Unlearning

Title: TAMA: Tool-Augmented Multimodal Agent for Procedural Activity Understanding

Title: DRBench: A Realistic Benchmark for Enterprise Deep Research

Title: PrimeX: A Dataset of Worldview, Opinion, and Explanation

Title: Personalized Reasoning: Just-In-Time Personalization and Why LLMs Fail At It

Title: BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses

Title: TASER: Translation Assessment via Systematic Evaluation and Reasoning

Title: Retrieval-Augmented Generation for Electrocardiogram-Language Models

Title: Judging with Confidence: Calibrating Autoraters to Preference Distributions

Title: Efficient Layer-wise LLM Fine-tuning for Revision Intention Prediction

Title: SafePassage: High-Fidelity Information Extraction with Black Box LLMs

Title: o-MEGA: Optimized Methods for Explanation Generation and Analysis

Title: CORTEX: Collaborative LLM Agents for High-Stakes Alert Triage

Title: TokMem: Tokenized Procedural Memory for Large Language Models

Title: LongCodeZip: Compress Long Context for Code Language Models

Title: Enhancing Rating Prediction with Off-the-Shelf LLMs Using In-Context User Reviews

Title: Agent Fine-tuning through Distillation for Domain-specific LLMs in Microdomains

Title: Agent-ScanKit: Unraveling Memory and Reasoning of Multimodal Agents via Sensitivity Perturbations

Title: MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance

Title: Graph2Eval: Automatic Multimodal Task Generation for Agents via Knowledge Graphs

Title: Copy-Paste to Mitigate Large Language Model Hallucinations

Title: JoyAgent-JDGenie: Technical Report on the GAIA

Title: Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum

Title: GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness

Title: Are Large Language Models Chronically Online Surfers? A Dataset for Chinese Internet Meme Explanation

Title: ReSeek: A Self-Correcting Framework for Search Agents with Instructive Rewards

Title: CoT Vectors: Transferring and Probing the Reasoning Mechanisms of LLMs

Title: MCM-DPO: Multifaceted Cross-Modal Direct Preference Optimization for Alt-text Generation

Title: Facilitating Cognitive Accessibility with LLMs: A Multi-Task Approach to Easy-to-Read Text Generation

Title: Inclusive Easy-to-Read Generation for Individuals with Cognitive Impairments

Title: ALARB: An Arabic Legal Argument Reasoning Benchmark

Title: Family Matters: Language Transfer and Merging for Adapting Small LLMs to Faroese

Title: Exposing the Cracks: Vulnerabilities of Retrieval-Augmented LLM-based Machine Translation

Title: ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs

Title: Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs

Title: HalluGuard: Evidence-Grounded Small Reasoning Models to Mitigate Hallucinations in Retrieval-Augmented Generation

Title: Span-level Detection of AI-generated Scientific Text via Contrastive Learning and Structural Calibration

Title: Benchmarking Foundation Models with Retrieval-Augmented Generation in Olympic-Level Physics Problem Solving

Title: Making, not Taking, the Best of N

Title: Analyzing Dialectical Biases in LLMs for Knowledge and Reasoning Benchmarks

Title: Syntax-Guided Diffusion Language Models with User-Integrated Personalization

Title: Interpreting Language Models Through Concept Descriptions: A Survey

Title: Hybrid Dialogue State Tracking for Persian Chatbots: A Language Model-Based Approach

Title: mR3: Multilingual Rubric-Agnostic Reward Reasoning Models

Title: Pay-Per-Search Models are Abstention Models

Title: Backdoor Attacks Against Speech Language Models

Title: Social Welfare Function Leaderboard: When LLM Agents Allocate Social Welfare

Title: GRAD: Generative Retrieval-Aligned Demonstration Sampler for Efficient Few-Shot Reasoning

Title: Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity

Title: Energy-Regularized Sequential Model Editing on Hyperspheres