2025-02-25

Title: Integrating Domain Knowledge into Large Language Models for Enhanced Fashion Recommendations

Title: Town Hall Debate Prompting: Enhancing Logical Reasoning in LLMs through Multi-Persona Interaction

Title: On the Effectiveness of Large Language Models in Automating Categorization of Scientific Texts

Title: Zero-Shot Commonsense Validation and Reasoning with Large Language Models: An Evaluation on SemEval-2020 Task 4 Dataset

Title: Tabular Embeddings for Tables with Bi-Dimensional Hierarchical Metadata and Nesting

Title: Towards Robust ESG Analysis Against Greenwashing Risks: Aspect-Action Analysis with Cross-Category Generalization

Title: CoME: An Unlearning-based Approach to Conflict-free Model Editing

Title: Pragmatic Reasoning improves LLM Code Generation

Title: Soft Token Attacks Cannot Reliably Audit Unlearning in Large Language Models

Title: Hallucination Detection in Large Language Models with Metamorphic Relations

Title: Verify when Uncertain: Beyond Self-Consistency in Black Box Hallucination Detection

Title: Forecasting Frontier Language Model Agent Capabilities

Title: Control Illusion: The Failure of Instruction Hierarchies in Large Language Models

Title: PPC-GPT: Federated Task-Specific Compression of Large Language Models via Pruning and Chain-of-Thought Distillation

Title: Synthetic vs. Gold: The Role of LLM-Generated Labels and Data in Cyberbullying Detection

Title: MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use

Title: A Close Look at Decomposition-based XAI-Methods for Transformer Language Models

Title: Modality-Aware Neuron Pruning for Unlearning in Multimodal Large Language Models

Title: Mind the Gap! Static and Interactive Evaluations of Large Audio Models

Title: Self-Taught Agentic Long Context Understanding

Title: Improving Consistency in Large Language Models through Chain of Guidance

Title: CVE-LLM: Ontology-Assisted Automatic Vulnerability Evaluation Using Large Language Models

Title: AutoMedPrompt: A New Framework for Optimizing LLM Medical Prompts Using Textual Gradients

Title: MMRAG: Multi-Mode Retrieval-Augmented Generation with Large Language Models for Biomedical In-Context Learning

Title: R$^3$Mem: Bridging Memory Retention and Retrieval via Reversible Compression

Title: Sparsity May Be All You Need: Sparse Random Parameter Adaptation

Title: KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse

Title: Enhancing LLMs for Identifying and Prioritizing Important Medical Jargons from Electronic Health Record Notes Utilizing Data Augmentation

Title: Moving Beyond Medical Exam Questions: A Clinician-Annotated Dataset of Real-World Tasks and Ambiguity in Mental Healthcare

Title: Echo: A Large Language Model with Temporal Episodic Memory

Title: Be a Multitude to Itself: A Prompt Evolution Framework for Red Teaming

Title: Chain-of-Description: What I can understand, I can put into words

Title: Understanding Zero-shot Rare Word Recognition Improvements Through LLM Integration

Title: The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination

Title: Number Representations in LLMs: A Computational Parallel to Human Perception

Title: EPERM: An Evidence Path Enhanced Reasoning Model for Knowledge Graph Question and Answering

Title: Mapping 1,000+ Language Models via the Log-Likelihood Vector

Title: BiDeV: Bilateral Defusing Verification for Complex Claim Fact-Checking

Title: IPO: Your Language Model is Secretly a Preference Classifier

Title: ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning

Title: LegalBench.PT: A Benchmark for Portuguese Law

Title: Wrong Answers Can Also Be Useful: PlausibleQA -- A Large-Scale QA Dataset with Answer Plausibility Scores

Title: A generative approach to LLM harmfulness detection with special red flag tokens

Title: Instruction-Tuning LLMs for Event Extraction with Annotation Guidelines

Title: Sequence-level Large Language Model Training with Contrastive Preference Optimization

Title: Contrastive Learning of English Language and Crystal Graphs for Multimodal Representation of Materials Knowledge

Title: Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge

Title: A Fine-Tuning Approach for T5 Using Knowledge Graphs to Address Complex Tasks

Title: All That Glitters is Not Novel: Plagiarism in AI Generated Research

Title: Intrinsic Model Weaknesses: How Priming Attacks Unveil Vulnerabilities in Large Language Models

Title: FanChuan: A Multilingual and Graph-Structured Benchmark For Parody Detection and Analysis

Title: GraphCheck: Breaking Long-Term Text Barriers with Extracted Knowledge Graph-Powered Fact-Checking

Title: Pay Attention to Real World Perturbations! Natural Robustness Evaluation in Machine Reading Comprehension

Title: Retrieval-Augmented Fine-Tuning With Preference Optimization For Visual Program Generation

Title: Multilingual != Multicultural: Evaluating Gaps Between Multilingual Capabilities and Cultural Alignment in LLMs

Title: Advanced Chain-of-Thought Reasoning for Parameter Extraction from Documents Using Large Language Models

Title: Reasoning About Persuasion: Can LLMs Enable Explainable Propaganda Detection?

Title: Beyond Words: How Large Language Models Perform in Quantitative Management Problem-Solving

Title: Revealing the Pragmatic Dilemma for Moral Reasoning Acquisition in Language Models

Title: MemeIntel: Explainable Detection of Propagandistic and Hateful Memes

Title: CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models

Title: Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries

Title: CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale

Title: MimeQA: Towards Socially-Intelligent Nonverbal Foundation Models

Title: Automatic Input Rewriting Improves Translation with Large Language Models

Title: WildLong: Synthesizing Realistic Long-Context Instruction Data at Scale

Title: Toward Responsible Federated Large Language Models: Leveraging a Safety Filter and Constitutional AI

Title: Code Summarization Beyond Function Level

Title: Can ChatGPT Learn to Count Letters?

Title: Beyond Pattern Recognition: Probing Mental Representations of LMs

Title: Speed and Conversational Large Language Models: Not All Is About Tokens per Second

Title: Layer-Wise Evolution of Representations in Fine-Tuned Transformers: Insights from Sparse AutoEncoders

Title: SQLong: Enhanced NL2SQL for Longer Contexts with LLMs

Title: Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions

Title: A Hybrid Approach to Information Retrieval and Answer Generation for Regulatory Texts

Title: LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint

Title: MultiOCR-QA: Dataset for Evaluating Robustness of LLMs in Question Answering on Multilingual OCR Texts

Title: Are Large Language Models Good Data Preprocessors?

Title: Unsupervised Topic Models are Data Mixers for Pre-training Language Models

Title: CoT2Align: Cross-Chain of Thought Distillation via Optimal Transport Alignment for Language Models with Different Tokenizers

Title: Uncertainty Quantification of Large Language Models through Multi-Dimensional Responses

Title: Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization

Title: REGen: A Reliable Evaluation Framework for Generative Event Argument Extraction

Title: "Actionable Help" in Crises: A Novel Dataset and Resource-Efficient Models for Identifying Request and Offer Social Media Posts

Title: Sarang at DEFACTIFY 4.0: Detecting AI-Generated Text Using Noised Data and an Ensemble of DeBERTa Models

Title: LongAttn: Selecting Long-context Training Data via Token-level Attention

Title: CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter

Title: DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance

Title: Applying LLMs to Active Learning: Towards Cost-Efficient Cross-Task Text Classification without Manually Labeled Data

Title: Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment

Title: Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs

Title: GuidedBench: Equipping Jailbreak Evaluation with Guidelines

Title: AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models

Title: Dependency Parsing with the Structuralized Prompt Template

Title: SS-MPC: A Sequence-Structured Multi-Party Conversation System

Title: Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties

Title: A Systematic Survey of Automatic Prompt Optimization Techniques

Title: Reasoning Does Not Necessarily Improve Role-Playing Ability

Title: UrduLLaMA 1.0: Dataset Curation, Preprocessing, and Evaluation in Low-Resource Settings

Title: LongSafety: Evaluating Long-Context Safety of Large Language Models

Title: Hotter and Colder: A New Approach to Annotating Sentiment, Emotions, and Bias in Icelandic Blog Comments

Title: All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark

Title: Quantifying Logical Consistency in Transformers via Query-Key Alignment

Title: Towards Auto-Regressive Next-Token Prediction: In-Context Learning Emerges from Generalization

Title: Understanding the Uncertainty of LLM Explanations: A Perspective Based on Reasoning Topology

Title: Language Model Re-rankers are Steered by Lexical Similarities

Title: PrivaCI-Bench: Evaluating Privacy with Contextual Integrity and Legal Compliance

Title: Systematic Weight Evaluation for Pruning Large Language Models: Enhancing Performance and Sustainability

Title: Automatically Evaluating the Paper Reviewing Capability of Large Language Models

Title: WildFrame: Comparing Framing in Humans and LLMs on Naturally Occurring Texts

Title: Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration

Title: LettuceDetect: A Hallucination Detection Framework for RAG Applications

Title: Thus Spake Long-Context Large Language Model

Title: MEMERAG: A Multilingual End-to-End Meta-Evaluation Benchmark for Retrieval Augmented Generation

Title: JUREX-4E: Juridical Expert-Annotated Four-Element Knowledge Base for Legal Reasoning

Title: Logic Haystacks: Probing LLMs Long-Context Logical Reasoning (Without Easily Identifiable Unrelated Padding)

Title: Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch

Title: Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric

Title: Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks

Title: Order Matters: Investigate the Position Bias in Multi-constraint Instruction Following

Title: CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought

Title: Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction

Title: Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective

Title: MonoTODia: Translating Monologue Requests to Task-Oriented Dialogues

Title: Capability Instruction Tuning: A New Paradigm for Dynamic LLM Routing

Title: Child vs. machine language learning: Can the logical structure of human language unleash LLMs?

Title: 'Generalization is hallucination' through the lens of tensor completions

Title: HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization

Title: Turning Conversations into Workflows: A Framework to Extract and Evaluate Dialog Workflows for Service AI Agents

Title: Mutual Reinforcement of LLM Dialogue Synthesis and Summarization Capabilities for Few-Shot Dialogue Summarization

Title: On Relation-Specific Neurons in Large Language Models

Title: Bridging Gaps in Natural Language Processing for Yorùbá: A Systematic Review of a Decade of Progress and Prospects

Title: What is a Good Question? Utility Estimation with LLM-based Simulations

Title: Mitigating Bias in RAG: Controlling the Embedder

Title: Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning

Title: Reasoning with Latent Thoughts: On the Power of Looped Transformers

Title: LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification