2025-02-28

Title: Cognitive networks highlight differences and similarities in the STEM mindsets of human and LLM-simulated trainees, experts and academics

Title: Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in QA Agents

Title: When Large Language Models Meet Speech: A Survey on Integration Approaches

Title: Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones?

Title: Stay Focused: Problem Drift in Multi-Agent Debate

Title: Do Large Language Models Know How Much They Know?

Title: Where Are We? Evaluating LLM Performance on African Languages

Title: NeoBERT: A Next-Generation BERT

Title: A City of Millions: Mapping Literary Social Networks At Scale

Title: Revisiting Word Embeddings in the LLM Era

Title: Evaluation of Hate Speech Detection Using Large Language Models and Geographical Contextualization

Title: Is Your Paper Being Reviewed by an LLM? A New Benchmark Dataset and Approach for Detecting AI Text in Peer Review

Title: Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning

Title: Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning

Title: Investigating Neurons and Heads in Transformer-based LLMs for Typographical Errors

Title: GRACE: A Granular Benchmark for Evaluating Model Calibration against Human Calibration

Title: Sensing and Steering Stereotypes: Extracting and Applying Gender Representation Vectors in LLMs

Title: Few-Shot Multilingual Open-Domain QA from 5 Examples

Title: CNsum: Automatic Summarization for Chinese News Text

Title: Preference Learning Unlocks LLMs' Psycho-Counseling Skills

Title: R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning

Title: XCOMPS: A Multilingual Benchmark of Conceptual Minimal Pairs

Title: HaLoRA: Hardware-aware Low-Rank Adaptation for Large Language Models Based on Hybrid Compute-in-Memory Architecture

Title: Beneath the Surface: How Large Language Models Reflect Hidden Bias

Title: PolyPrompt: Automating Knowledge Extraction from Multilingual Language Models with Dynamic Prompt Generation

Title: EdiText: Controllable Coarse-to-Fine Text Editing with Diffusion Language Models

Title: Do Retrieval-Augmented Language Models Adapt to Varying User Needs?

Title: Foot-In-The-Door: A Multi-turn Jailbreak for LLMs

Title: MIND: Towards Immersive Psychological Healing with Multi-agent Inner Dialogue

Title: MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge

Title: Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents

Title: GeoEdit: Geometric Knowledge Editing for Large Language Models

Title: Collaborative Stance Detection via Small-Large Language Model Consistency Verification

Title: Deterministic or probabilistic? The psychology of LLMs as random number generators

Title: The Lookahead Limitation: Why Multi-Operand Addition is Hard for LLMs

Title: Erasing Without Remembering: Safeguarding Knowledge Forgetting in Large Language Models

Title: Polish-ASTE: Aspect-Sentiment Triplet Extraction Datasets for Polish

Title: Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents

Title: LongRoPE2: Near-Lossless LLM Context Window Scaling

Title: Self-Training Elicits Concise Reasoning in Large Language Models

Title: Finite State Automata Inside Transformers with Chain-of-Thought: A Mechanistic Study on State Tracking

Title: Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge

Title: ChineseEcomQA: A Scalable E-commerce Concept Evaluation Benchmark for Large Language Models

Title: FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving

Title: From Retrieval to Generation: Comparing Different Approaches

Title: Beyond Natural Language Perplexity: Detecting Dead Code Poisoning in Code Generation Datasets

Title: LLM as a Broken Telephone: Iterative Generation Distorts Information

Title: LangProBe: a Language Programs Benchmark

Title: Long-Context Inference with Retrieval-Augmented Speculative Decoding

Title: Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models

Title: Expertise Is What We Want

Title: Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners

Title: Sparse Auto-Encoder Interprets Linguistic Features in Large Language Models

Title: KEDRec-LM: A Knowledge-distilled Explainable Drug Recommendation Large Language Model

Title: Bridging the Creativity Understanding Gap: Small-Scale Human Alignment Enables Expert-Level Humor Ranking in LLMs

Title: Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization