2025-01-23

Title: Human-like conceptual representations emerge from language prediction

Title: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning

Title: T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation

Title: Distillation Quantification for Large Language Models

Title: The potential -- and the pitfalls -- of using pre-trained language models as cognitive science theories

Title: Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation

Title: Training Dialogue Systems by AI Feedback for Improving Overall Dialogue Impression

Title: EvidenceMap: Unleashing the Power of Small Language Models with Evidence Analysis for Biomedical Question Answering

Title: NExtLong: Toward Effective Long-Context Training without Long Documents

Title: LLMs as Repositories of Factual Knowledge: Limitations and Solutions

Title: Generating Diverse Q&A Benchmarks for RAG Evaluation with DataMorgana

Title: Open or Closed LLM for Lesser-Resourced Languages? Lessons from Greek

Title: Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home

Title: ACEBench: Who Wins the Match Point in Tool Learning?

Title: WisdomBot: Tuning Large Language Models with Artificial Intelligence Knowledge

Title: Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback

Title: Architectural Fusion Through Contextual Partitioning in Large Language Models: A Novel Approach to Parameterized Knowledge Integration

Title: FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces

Title: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Title: Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference

Title: OnionEval: An Unified Evaluation of Fact-conflicting Hallucination for Small-Large Language Models

Title: Implicit Causality-biases in humans and LLMs as a tool for benchmarking LLM discourse capabilities

Title: Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament

Title: Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning

Title: Autonomy-of-Experts Models

Title: Refining Input Guardrails: Enhancing LLM-as-a-Judge Efficiency Through Chain-of-Thought Fine-Tuning and Alignment