2024-02-15

Title: PRompt Optimization in Multi-Step Tasks (PROMST): Integrating Human Feedback and Preference Alignment

Authors: Yongchao Chen, Jacob Arkin, Yilun Hao, Yang Zhang, Nicholas Roy, Chuchu Fan
Subjects: cs.CL, cs.AI, cs.HC, cs.RO
Abstract URL: https://arxiv.org/abs/2402.08702
Pdf URL: https://arxiv.org/pdf/2402.08702
Copy Paste: [[2402.08702]] PRompt Optimization in Multi-Step Tasks (PROMST): Integrating Human Feedback and Preference Alignment(https://arxiv.org/abs/2402.08702)
Keywords: language model, gpt, llm, prompt, agent
Abstract: Prompt optimization aims to find the best prompt for a large language model (LLM) on a given task. LLMs have been successfully used to help find and improve prompt candidates for single-step tasks. However, realistic tasks for agents are multi-step and introduce new challenges: (1) prompt content is likely to be more extensive and complex, making it more difficult for LLMs to analyze errors; (2) the impact of an individual step is difficult to evaluate; and (3) different people may have varied preferences about task execution. While humans struggle to optimize prompts, they are good at providing feedback about LLM outputs; we therefore introduce a new LLM-driven discrete prompt optimization framework that incorporates human-designed feedback rules about potential errors to automatically offer direct suggestions for improvement. Our framework is stylized as a genetic algorithm in which an LLM generates new candidate prompts from a parent prompt and its associated feedback; we use a learned heuristic function that predicts prompt performance to efficiently sample from these candidates. This approach significantly outperforms both human-engineered prompts and several other prompt optimization methods across eight representative multi-step tasks (average improvements of 27.7% and 28.2% over the current best methods on GPT-3.5 and GPT-4, respectively). We further show that the score function for tasks can be modified to better align with individual preferences. We believe our work can serve as a benchmark for automatic prompt optimization for LLM-driven multi-step tasks. Datasets and code are available at https://github.com/yongchao98/PROMST. The project page is available at https://yongchao98.github.io/MIT-REALM-PROMST.
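The following is a minimal Python sketch of the loop described above. It is not the authors' implementation. In each generation, feedback on a parent prompt is turned into child prompts. A cheap learned heuristic then ranks the children, and only the top few receive the expensive task evaluation. The callables `generate_candidates`, `get_feedback`, `heuristic`, and `evaluate` are hypothetical stand-ins for the LLM call, the human-designed feedback rules, the learned score model, and the task environment.

```python
from typing import Callable, List, Tuple

def optimize_prompt(
    seed_prompt: str,
    generate_candidates: Callable[[str, str], List[str]],  # hypothetical LLM call: (parent, feedback) -> children
    get_feedback: Callable[[str], str],                    # hypothetical rule-based feedback on a prompt's errors
    heuristic: Callable[[str], float],                     # hypothetical learned predictor of prompt score
    evaluate: Callable[[str], float],                      # true (expensive) task score
    generations: int = 3,
    keep_top: int = 2,
) -> Tuple[str, float]:
    best_prompt, best_score = seed_prompt, evaluate(seed_prompt)
    parents = [seed_prompt]
    for _ in range(generations):
        # Expand every parent into children using its feedback.
        children: List[str] = []
        for parent in parents:
            children.extend(generate_candidates(parent, get_feedback(parent)))
        if not children:
            break
        # Rank children with the cheap heuristic; only the shortlist is fully evaluated.
        children.sort(key=heuristic, reverse=True)
        scored = sorted(((evaluate(c), c) for c in children[:keep_top]), reverse=True)
        if scored[0][0] > best_score:
            best_score, best_prompt = scored[0]
        # Surviving candidates become the next generation's parents.
        parents = [c for _, c in scored]
    return best_prompt, best_score
```

In a toy run where the heuristic equals the true score, the loop simply follows the feedback. A real learned predictor is imperfect, which is why the sketch still evaluates each shortlisted candidate before accepting it.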

Title: PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models

Authors: Fei Deng, Qifei Wang, Wei Wei, Matthias Grundmann, Tingbo Hou
Subjects: cs.LG, cs.AI
Abstract URL: https://arxiv.org/abs/2402.08714
Pdf URL: https://arxiv.org/pdf/2402.08714
Copy Paste: [[2402.08714]] PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models(https://arxiv.org/abs/2402.08714)
Keywords: prompt
Abstract: Reward finetuning has emerged as a promising approach to aligning foundation models with downstream objectives. Remarkable success has been achieved in the language domain by using reinforcement learning (RL) to maximize rewards that reflect human preference. However, in the vision domain, existing RL-based reward finetuning methods are limited by their instability in large-scale training, rendering them incapable of generalizing to complex, unseen prompts. In this paper, we propose Proximal Reward Difference Prediction (PRDP), enabling stable black-box reward finetuning for diffusion models for the first time on large-scale prompt datasets with over 100K prompts. Our key innovation is the Reward Difference Prediction (RDP) objective that has the same optimal solution as the RL objective while enjoying better training stability. Specifically, the RDP objective is a supervised regression objective that tasks the diffusion model with predicting the reward difference of generated image pairs from their denoising trajectories. We theoretically prove that the diffusion model that obtains perfect reward difference prediction is exactly the maximizer of the RL objective. We further develop an online algorithm with proximal updates to stably optimize the RDP objective. In experiments, we demonstrate that PRDP can match the reward maximization ability of well-established RL-based methods in small-scale training. Furthermore, through large-scale training on text prompts from the Human Preference Dataset v2 and the Pick-a-Pic v1 dataset, PRDP achieves superior generation quality on a diverse set of complex, unseen prompts whereas RL-based methods completely fail.
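The following hedged NumPy sketch shows a reward-difference regression loss. It assumes, as in KL-regularized RL derivations, that the model's implied reward for an image is `beta` times the summed log-likelihood ratio between the finetuned and a reference denoiser over the denoising trajectory. The exact parameterization, the choice of reference policy, and the proximal update used in PRDP are not reproduced here, and all function names are illustrative.

```python
import numpy as np

def implied_reward(logp_theta, logp_ref, beta: float) -> float:
    """Assumed implied reward: beta * sum over denoising steps of log p_theta - log p_ref."""
    ratio = np.asarray(logp_theta, dtype=float) - np.asarray(logp_ref, dtype=float)
    return float(beta * np.sum(ratio))

def rdp_loss(logp_theta_a, logp_ref_a, logp_theta_b, logp_ref_b,
             reward_a: float, reward_b: float, beta: float = 1.0) -> float:
    """Squared error between the predicted and the observed reward difference of an image pair."""
    predicted_diff = (implied_reward(logp_theta_a, logp_ref_a, beta)
                      - implied_reward(logp_theta_b, logp_ref_b, beta))
    target_diff = reward_a - reward_b
    return (predicted_diff - target_diff) ** 2
```

Because the loss is a plain regression on reward differences, it is bounded below by zero. It is minimized when the implied reward gap matches the observed one, which is the intuition the abstract gives for its stability.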

Title: Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs

Authors: Daniel D. Johnson, Daniel Tarlow, David Duvenaud, Chris J. Maddison
Subjects: cs.LG
Abstract URL: https://arxiv.org/abs/2402.08733
Pdf URL: https://arxiv.org/pdf/2402.08733
Copy Paste: [[2402.08733]] Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs(https://arxiv.org/abs/2402.08733)
Keywords: language model
Abstract: Identifying how much a model ${\widehat{p}}_{\theta}(Y|X)$ knows about the stochastic real-world process $p(Y|X)$ it was trained on is important to ensure it avoids producing incorrect or "hallucinated" answers or taking unsafe actions. But this is difficult for generative models because probabilistic predictions do not distinguish between per-response noise (aleatoric uncertainty) and lack of knowledge about the process (epistemic uncertainty), and existing epistemic uncertainty quantification techniques tend to be overconfident when the model underfits. We propose a general strategy for teaching a model to both approximate $p(Y|X)$ and also estimate the remaining gaps between ${\widehat{p}}_{\theta}(Y|X)$ and $p(Y|X)$: train it to predict pairs of independent responses drawn from the true conditional distribution, allow it to "cheat" by observing one response while predicting the other, then measure how much it cheats. Remarkably, we prove that being good at cheating (i.e. cheating whenever it improves your prediction) is equivalent to being second-order calibrated, a principled extension of ordinary calibration that allows us to construct provably-correct frequentist confidence intervals for $p(Y|X)$ and detect incorrect responses with high probability. We demonstrate empirically that our approach accurately estimates how much models don't know across ambiguous image classification, (synthetic) language modeling, and partially-observable navigation tasks, outperforming existing techniques.
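One way to make "how much it cheats" concrete is shown below. It is offered as an illustration rather than as the paper's estimator. Suppose a model predicts a joint distribution q(y1, y2) over two responses. Its expected log-loss gain from peeking at y1 before predicting y2 then equals the mutual information between Y1 and Y2 under q. A model certain about p(Y|X) predicts independent pairs and gains nothing, so epistemic uncertainty shows up as dependence. The function name is hypothetical.

```python
import numpy as np

def cheating_gain(joint) -> float:
    """Mutual information (nats) between paired responses under a predicted joint q(y1, y2)."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()                 # normalize to a probability table
    p1 = joint.sum(axis=1, keepdims=True)       # marginal of y1, shape (n, 1)
    p2 = joint.sum(axis=0, keepdims=True)       # marginal of y2, shape (1, m)
    independent = p1 @ p2                       # outer product q(y1) q(y2)
    mask = joint > 0                            # 0 * log 0 is treated as 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / independent[mask])))
```

An independent prediction such as `np.outer([0.5, 0.5], [0.3, 0.7])` gives a gain of about zero. A prediction that the two responses always match, `[[0.5, 0], [0, 0.5]]`, gives log 2.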

Title: LLM-driven Imitation of Subrational Behavior : Illusion or Reality?

Authors: Andrea Coletta, Kshama Dwarakanath, Penghang Liu, Svitlana Vyetrenko, Tucker Balch
Subjects: cs.AI, econ.GN
Abstract URL: https://arxiv.org/abs/2402.08755
Pdf URL: https://arxiv.org/pdf/2402.08755
Copy Paste: [[2402.08755]] LLM-driven Imitation of Subrational Behavior : Illusion or Reality?(https://arxiv.org/abs/2402.08755)
Keywords: language model, llm, agent
Abstract: Modeling subrational agents, such as humans or economic households, is inherently challenging due to the difficulty of calibrating reinforcement learning models or collecting data that involves human subjects. Existing work highlights the ability of Large Language Models (LLMs) to address complex reasoning tasks and mimic human communication, while simulations using LLMs as agents show emergent social behaviors, potentially improving our comprehension of human conduct. In this paper, we investigate the use of LLMs to generate synthetic human demonstrations, which are then used to learn subrational agent policies through imitation learning. We assume that LLMs can be used as implicit computational models of humans, and propose a framework that uses synthetic demonstrations derived from LLMs to model subrational behaviors that are characteristic of humans (e.g., myopic behavior or a preference for risk aversion). We experimentally evaluate the ability of our framework to model sub-rationality through four simple scenarios, including the well-researched ultimatum game and marshmallow experiment. To validate our framework, we replicate well-established findings from prior human studies associated with these scenarios. We conclude by discussing the potential benefits, challenges and limitations of our framework.
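The imitation-learning step can be illustrated with tabular behavior cloning, which fits a policy by counting the actions taken in each state of the demonstrations. This is a minimal sketch that assumes discrete states and actions. The demonstration tuples used with it are hypothetical (in the paper, demonstrations are generated by an LLM), and the authors' actual imitation-learning method may differ.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple

def behavior_clone(demos: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Dict[Hashable, float]]:
    """Maximum-likelihood tabular policy: P(action | state) = count(state, action) / count(state)."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    return {s: {a: n / sum(c.values()) for a, n in c.items()} for s, c in counts.items()}
```

For an ultimatum-game responder, the state could be the offered share and the action "accept" or "reject". A learned policy that often rejects low offers would reproduce the subrational pattern reported in human studies.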

Title: Learning How To Ask: Cycle-Consistency Refines Prompts in Multimodal Foundation Models

Authors: Maurice Diesendruck, Jianzhe Lin, Shima Imani, Gayathri Mahalingam, Mingyang Xu, Jie Zhao
Subjects: cs.CL, cs.CV
Abstract URL: https://arxiv.org/abs/2402.08756
Pdf URL: https://arxiv.org/pdf/2402.08756
Copy Paste: [[2402.08756]] Learning How To Ask: Cycle-Consistency Refines Prompts in Multimodal Foundation Models(https://arxiv.org/abs/2402.08756)
Keywords: gpt, llm, prompt
Abstract: When LLMs perform zero-shot inference, they typically use a prompt with a task specification and generate a completion. However, no prior work has explored the reverse direction, going from completion to task specification. In this paper, we use both directions to perform cycle-supervised learning entirely in-context. Our goal is to create a forward map f : X -> Y (e.g., image -> generated caption), coupled with a backward map g : Y -> X (e.g., caption -> generated image), to construct a cycle-consistency "loss" (formulated as an update to the prompt) that enforces g(f(X)) ~= X. The technique, called CyclePrompt, uses cycle-consistency as a free supervisory signal to iteratively craft the prompt. Importantly, CyclePrompt improves model performance without expensive fine-tuning, without training data, and without the complexity of external environments (e.g., compilers, APIs). We demonstrate CyclePrompt in two domains: code generation and image captioning. Our results on the HumanEval coding benchmark put us in first place on the leaderboard among models that do not rely on extra training data or external environments, and third overall. Compared to the GPT4 baseline, we improve accuracy from 80.5% to 87.2%. In the vision-language space, we generate detailed image captions that outperform baseline zero-shot GPT4V captions when tested against natural (VQAv2) and diagrammatic (FigureQA) visual question-answering benchmarks. To the best of our knowledge, this is the first use of self-supervised learning for prompting.
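The following minimal Python sketch shows the cycle-consistency loop. It applies the forward map under the current prompt, maps the output back, and measures the reconstruction gap. It then revises the prompt based on that gap. `forward`, `backward`, `distance`, and `refine` are hypothetical callables standing in for the model calls and the prompt-update step. This is not the CyclePrompt code.

```python
from typing import Any, Callable, Tuple

def cycle_prompt(
    x: Any,
    forward: Callable[[str, Any], Any],       # hypothetical f: (prompt, x) -> y, e.g. image -> caption
    backward: Callable[[Any], Any],           # hypothetical g: y -> x_hat, e.g. caption -> image
    distance: Callable[[Any, Any], float],    # gap between x and g(f(x))
    refine: Callable[[str, Any, Any], str],   # hypothetical prompt update from the observed gap
    prompt: str,
    iterations: int = 3,
    tol: float = 0.0,
) -> Tuple[str, float]:
    best_prompt, best_dist = prompt, float("inf")
    for _ in range(iterations):
        x_hat = backward(forward(prompt, x))
        d = distance(x, x_hat)
        if d < best_dist:
            best_prompt, best_dist = prompt, d
        if d <= tol:  # cycle is consistent enough; stop refining
            break
        prompt = refine(prompt, x, x_hat)
    return best_prompt, best_dist
```

Keeping the best prompt seen so far, rather than the last one, guards against a refinement step that makes the cycle worse.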

Title: Bayesian Strategic Classification

Authors: Lee Cohen, Saeed Sharifi-Malvajerdi, Kevin Stangl, Ali Vakilian, Juba Ziani
Subjects: cs.LG, cs.GT
Abstract URL: https://arxiv.org/abs/2402.08758
Pdf URL: https://arxiv.org/pdf/2402.08758
Copy Paste: [[2402.08758]] Bayesian Strategic Classification(https://arxiv.org/abs/2402.08758)
Keywords: agent
Abstract: In strategic classification, agents modify their features, at a cost, to ideally obtain a positive classification from the learner's classifier. The typical response of the learner is to carefully modify their classifier to be robust to such strategic behavior. When reasoning about agent manipulations, most papers that study strategic classification rely on the following strong assumption: agents fully know the exact parameters of the classifier deployed by the learner. This is often an unrealistic assumption when using complex or proprietary machine learning techniques in real-world prediction tasks. We initiate the study of partial information release by the learner in strategic classification. We move away from the traditional assumption that agents have full knowledge of the classifier. Instead, we consider agents that have a common distributional prior on which classifier the learner is using. The learner in our model can reveal truthful, yet not necessarily complete, information about the deployed classifier to the agents. The learner's goal is to release just enough information about the classifier to maximize accuracy. We show how such partial information release can, counter-intuitively, benefit the learner's accuracy, despite increasing agents' abilities to manipulate. We show that while it is intractable to compute the best response of an agent in the general case, there exist oracle-efficient algorithms that can compute the agents' best responses when the learner's hypothesis class is the class of linear classifiers, or when the agents' cost function satisfies a natural notion of submodularity that we define. We then turn our attention to the learner's optimization problem and provide both positive and negative results on the algorithmic problem of how much information the learner should release about the classifier to maximize their expected accuracy.
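The following brute-force sketch computes an agent's best response when it holds a prior over linear classifiers rather than knowing the deployed one. The agent picks the move that maximizes its probability of a positive label minus its cost. The Euclidean cost, the finite candidate set, and the function name are assumptions for illustration. The paper's oracle-efficient algorithms are not reproduced.

```python
from typing import List, Sequence, Tuple
import numpy as np

def best_response(
    x: Sequence[float],
    candidates: List[Sequence[float]],                   # finite set of feature vectors the agent may move to
    classifiers: List[Tuple[Sequence[float], float]],    # linear classifiers (w, b): positive iff w.x + b >= 0
    prior: List[float],                                  # agent's prior probability for each classifier
    cost_per_unit: float = 1.0,
    value: float = 1.0,
) -> Tuple[np.ndarray, float]:
    x = np.asarray(x, dtype=float)
    best, best_utility = x, -np.inf
    for cand in [x] + [np.asarray(c, dtype=float) for c in candidates]:
        # Probability, under the prior, that this point is classified positive.
        p_positive = sum(pr for (w, b), pr in zip(classifiers, prior) if np.dot(w, cand) + b >= 0)
        utility = value * p_positive - cost_per_unit * np.linalg.norm(cand - x)
        if utility > best_utility:
            best, best_utility = cand, utility
    return best, float(best_utility)
```

Under this sketch, partial information release by the learner could be modeled by passing a posterior in place of `prior`. That posterior is the prior restricted to classifiers consistent with the released information and renormalized.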

Title: JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models

Authors: Jillian Fisher, Ximing Lu, Jaehun Jung, Liwei Jiang, Zaid Harchaoui, Yejin Choi
Subjects: cs.CL, cs.AI
Abstract URL: https://arxiv.org/abs/2402.08761
Pdf URL: https://arxiv.org/pdf/2402.08761
Copy Paste: [[2402.08761]] JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models(https://arxiv.org/abs/2402.08761)
Keywords: language model, gpt, llm
Abstract: The permanence of online content combined with enhanced authorship identification techniques calls for stronger computational methods to protect the identity and privacy of online authors when needed, e.g., blind reviews for scientific papers, anonymous online reviews, or anonymous interactions in mental health forums. In this paper, we propose an unsupervised inference-time approach to authorship obfuscation that addresses its unique challenges: the lack of supervision data for diverse authors and domains, and the need for a sufficient level of revision beyond simple paraphrasing to obfuscate authorship, all while preserving the original content and fluency. We introduce JAMDEC, a user-controlled, inference-time algorithm for authorship obfuscation that can in principle be applied to any text and author. Our approach builds on small language models such as GPT2-XL in order to help avoid disclosing the original content to proprietary LLM APIs, while also reducing the performance gap between small and large language models via algorithmic enhancement. The key idea behind our approach is to boost the creative power of smaller language models through constrained decoding, while also allowing for user-specified controls and flexibility. Experimental results demonstrate that our approach based on GPT2-XL outperforms previous state-of-the-art methods based on comparably small models, while performing competitively against GPT3.5 175B, a proprietary model that is two orders of magnitude larger.

Title: Optimal Task Assignment and Path Planning using Conflict-Based Search with Precedence and Temporal Constraints

Authors: Yu Quan Chong, Jiaoyang Li, Katia Sycara
Subjects: cs.AI, cs.MA
Abstract URL: https://arxiv.org/abs/2402.08772
Pdf URL: https://arxiv.org/pdf/2402.08772
Copy Paste: [[2402.08772]] Optimal Task Assignment and Path Planning using Conflict-Based Search with Precedence and Temporal Constraints(https://arxiv.org/abs/2402.08772)
Keywords: agent
Abstract: The Multi-Agent Path Finding (MAPF) problem entails finding collision-free paths for a set of agents, guiding them from their start to goal locations. However, MAPF does not account for several practical task-related constraints. For example, agents may need to perform actions at goal locations with specific execution times, adhering to predetermined orders and timeframes. Moreover, goal assignments may not be predefined for agents, and the optimization objective may lack an explicit definition. To incorporate task assignment, path planning, and a user-defined objective into a coherent framework, this paper examines the Task Assignment and Path Finding with Precedence and Temporal Constraints (TAPF-PTC) problem. We augment Conflict-Based Search (CBS) to simultaneously generate task assignments and collision-free paths that adhere to precedence and temporal constraints, maximizing an objective quantified by the return from a user-defined reward function in reinforcement learning (RL). Experimentally, we demonstrate that our algorithm, CBS-TA-PTC, can solve highly challenging bomb-defusing tasks with precedence and temporal constraints efficiently relative to MARL and adapted Target Assignment and Path Finding (TAPF) methods.
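Conflict-Based Search alternates between two levels. The low level plans individual paths. The high level finds the first conflict between two agents' paths and branches by adding a constraint for each agent. The sketch below covers only that conflict-detection step on grid paths. It assumes agents wait at their final cell after arriving. The task-assignment, precedence, and temporal extensions of CBS-TA-PTC are not shown, and the function names are illustrative.

```python
from typing import List, Optional, Tuple

Cell = Tuple[int, int]

def position(path: List[Cell], t: int) -> Cell:
    # An agent that has finished its path waits at its last cell.
    return path[min(t, len(path) - 1)]

def first_conflict(paths: List[List[Cell]]) -> Optional[tuple]:
    """Return the earliest vertex or edge (swap) conflict, or None if the paths are collision-free."""
    horizon = max(len(p) for p in paths)
    for t in range(horizon):
        for i in range(len(paths)):
            for j in range(i + 1, len(paths)):
                a_now, b_now = position(paths[i], t), position(paths[j], t)
                if a_now == b_now:  # two agents occupy the same cell at time t
                    return ("vertex", i, j, a_now, t)
                if t + 1 < horizon:
                    a_next, b_next = position(paths[i], t + 1), position(paths[j], t + 1)
                    if a_now == b_next and b_now == a_next:  # agents swap cells between t and t+1
                        return ("edge", i, j, (a_now, a_next), t + 1)
    return None
```

In full CBS, each returned conflict spawns two child nodes, each forbidding one of the two agents from that cell or move at that time, and the low-level planner replans only the constrained agent.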

Title: Enhanced Deep Q-Learning for 2D Self-Driving Cars: Implementation and Evaluation on a Custom Track Environment

Authors: Sagar Pathak, Bidhya Shrestha, Kritish Pahi
Subjects: cs.AI
Abstract URL: https://arxiv.org/abs/2402.08780
Pdf URL: https://arxiv.org/pdf/2402.08780
Copy Paste: [[2402.08780]] Enhanced Deep Q-Learning for 2D Self-Driving Cars: Implementation and Evaluation on a Custom Track Environment(https://arxiv.org/abs/2402.08780)
Keywords: agent
Abstract: This research project presents the implementation of a Deep Q-Learning Network (DQN) for a self-driving car on a 2-dimensional (2D) custom track, with the objective of enhancing the DQN network's performance. It encompasses the development of a custom driving environment using Pygame on a track surrounding the University of Memphis map, as well as the design and implementation of the DQN model. The algorithm utilizes data from 7 sensors installed in the car, which measure the distance between the car and the track. These sensors are positioned in front of the vehicle, spaced 20 degrees apart, enabling them to sense a wide area ahead. We successfully implemented the DQN and also a modified version of the DQN with a priority-based action selection mechanism, which we refer to as modified DQN. The model was trained over 1000 episodes, and the average reward received by the agent was found to be around 40, which is approximately 60% higher than the original DQN and around 50% higher than the vanilla neural network.
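Two pieces of the setup above are sketched here in NumPy under stated assumptions. The first gives the sensor directions for 7 rays spaced 20 degrees apart, assumed to be symmetric about the heading and therefore spanning 120 degrees. The second is the standard DQN regression target computed from a target network's Q-values. The priority-based action selection of the modified DQN is not described in enough detail to sketch, and the function names are illustrative.

```python
import numpy as np

def sensor_angles(n_sensors: int = 7, spacing_deg: float = 20.0) -> np.ndarray:
    """Ray angles in degrees relative to the car's heading, assumed symmetric about it."""
    half = (n_sensors - 1) / 2
    return (np.arange(n_sensors) - half) * spacing_deg

def dqn_targets(rewards, next_q_values, dones, gamma: float = 0.99) -> np.ndarray:
    """Standard DQN target: y = r + gamma * (1 - done) * max_a' Q_target(s', a')."""
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    return rewards + gamma * (1.0 - dones) * np.max(np.asarray(next_q_values, dtype=float), axis=1)
```

With the defaults, the sensors point at -60, -40, -20, 0, 20, 40, and 60 degrees. Terminal transitions (for example, a crash into the track edge) use the reward alone as their target.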

Title: InstructGraph: Boosting Large Language Models via Graph-centric Instruction Tuning and Preference Alignment

Authors: Jianing Wang, Junda Wu, Yupeng Hou, Yao Liu, Ming Gao, Julian McAuley
Subjects: cs.CL
Abstract URL: https://arxiv.org/abs/2402.08785
Pdf URL: https://arxiv.org/pdf/2402.08785
Copy Paste: [[2402.08785]] InstructGraph: Boosting Large Language Models via Graph-centric Instruction Tuning and Preference Alignment(https://arxiv.org/abs/2402.08785)
Keywords: language model, gpt, llm, hallucination
Abstract: Do current large language models (LLMs) better solve graph reasoning and generation tasks with parameter updates? In this paper, we propose InstructGraph, a framework that empowers LLMs with graph reasoning and generation abilities through instruction tuning and preference alignment. Specifically, we first propose a structured format verbalizer that unifies all graph data into a universal code-like format, which can represent the graph without any external graph-specific encoders. Furthermore, a graph instruction tuning stage is introduced to guide LLMs in solving graph reasoning and generation tasks. Finally, we identify potential hallucination problems in graph tasks and sample negative instances for preference alignment, with the goal of improving the reliability of the model's outputs. Extensive experiments across multiple graph-centric tasks show that InstructGraph achieves the best performance, outperforming GPT-4 and LLaMA2 by more than 13% and 38%, respectively.
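To illustrate the idea of a code-like graph verbalization, the sketch below uses a hypothetical format. It serializes a list of (head, relation, tail) triples into a single text block that an LLM can read without a separate graph encoder. The exact format InstructGraph uses may differ. `verbalize_graph` and its output layout are assumptions.

```python
from typing import Iterable, List, Tuple

def verbalize_graph(name: str, edges: Iterable[Tuple[str, str, str]]) -> str:
    """Serialize (head, relation, tail) triples into a hypothetical code-like text block."""
    nodes: List[str] = []
    edge_lines: List[str] = []
    for head, rel, tail in edges:
        for n in (head, tail):
            if n not in nodes:  # keep first-seen node order
                nodes.append(n)
        edge_lines.append(f'    ("{head}", "{rel}", "{tail}"),')
    node_list = ", ".join(f'"{n}"' for n in nodes)
    return "\n".join([
        f'Graph[name="{name}"] {{',
        f"  node_list = [{node_list}];",
        "  edge_list = [",
        *edge_lines,
        "  ];",
        "}",
    ])
```

A uniform textual format like this lets the same instruction-tuned model consume graphs from different tasks, which is the motivation the abstract gives for the verbalizer.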

Title: Rethinking Machine Unlearning for Large Language Models

Authors: Sijia Liu, Yuanshun Yao, Jinghan Jia, Stephen Casper, Nathalie Baracaldo, Peter Hase, Xiaojun Xu, Yuguang Yao, Hang Li, Kush R. Varshney, Mohit Bansal, Sanmi Koyejo, Yang Liu
Subjects: cs.LG, cs.CL
Abstract URL: https://arxiv.org/abs/2402.08787
Pdf URL: https://arxiv.org/pdf/2402.08787
Copy Paste: [[2402.08787]] Rethinking Machine Unlearning for Large Language Models(https://arxiv.org/abs/2402.08787)
Keywords: language model, llm
Abstract: We explore machine unlearning (MU) in the domain of large language models (LLMs), referred to as LLM unlearning. This initiative aims to eliminate undesirable data influence (e.g., sensitive or illegal information) and the associated model capabilities, while maintaining the integrity of essential knowledge generation and not affecting causally unrelated information. We envision LLM unlearning becoming a pivotal element in the life-cycle management of LLMs, potentially standing as an essential foundation for developing generative AI that is not only safe, secure, and trustworthy, but also resource-efficient without the need for full retraining. We survey the LLM unlearning landscape, covering its conceptual formulation, methodologies, metrics, and applications. In particular, we highlight often-overlooked aspects of existing LLM unlearning research, e.g., unlearning scope, data-model interaction, and multifaceted efficacy assessment. We also draw connections between LLM unlearning and related areas such as model editing, influence functions, model explanation, adversarial training, and reinforcement learning. Furthermore, we outline an effective assessment framework for LLM unlearning and explore its applications in copyright and privacy safeguards and sociotechnical harm reduction.

Title: Combining Insights From Multiple Large Language Models Improves Diagnostic Accuracy

Authors: Gioele Barabucci, Victor Shia, Eugene Chu, Benjamin Harack, Nathan Fu
Subjects: cs.AI
Abstract URL: https://arxiv.org/abs/2402.08806
Pdf URL: https://arxiv.org/pdf/2402.08806
Copy Paste: [[2402.08806]] Combining Insights From Multiple Large Language Models Improves Diagnostic Accuracy(https://arxiv.org/abs/2402.08806)
Keywords: language model, gpt, llm
Abstract: Background: Large language models (LLMs) such as OpenAI's GPT-4 or Google's PaLM 2 are proposed as viable diagnostic support tools or even spoken of as replacements for "curbside consults". However, even LLMs specifically trained on medical topics may lack sufficient diagnostic accuracy for real-life applications. Methods: Using collective intelligence methods and a dataset of 200 clinical vignettes of real-life cases, we assessed and compared the accuracy of differential diagnoses obtained by asking individual commercial LLMs (OpenAI GPT-4, Google PaLM 2, Cohere Command, Meta Llama 2) against the accuracy of differential diagnoses synthesized by aggregating responses from combinations of the same LLMs. Results: We find that aggregating responses from multiple, various LLMs leads to more accurate differential diagnoses (average accuracy for 3 LLMs: $75.3\%\pm 1.6pp$) compared to the differential diagnoses produced by single LLMs (average accuracy for single LLMs: $59.0\%\pm 6.1pp$). Discussion: The use of collective intelligence methods to synthesize differential diagnoses combining the responses of different LLMs achieves two of the necessary steps towards advancing acceptance of LLMs as a diagnostic support tool: (1) demonstrate high diagnostic accuracy and (2) eliminate dependence on a single commercial vendor.
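The abstract does not specify the aggregation rule. The following sketch therefore uses a Borda count, one simple collective-intelligence method. Each model's ranked differential awards more points to higher-ranked diagnoses, and the combined list is sorted by total points. Matching diagnoses by lowercase string is a simplifying assumption, since real differentials would need synonym normalization. The function name is illustrative.

```python
from collections import defaultdict
from typing import Dict, List

def aggregate_differentials(rankings: List[List[str]], top_k: int = 5) -> List[str]:
    """Borda-style aggregation: in a list of n diagnoses, rank position p earns n - p points."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for pos, dx in enumerate(ranking):
            scores[dx.strip().lower()] += n - pos
    # Highest total first; ties broken alphabetically for a deterministic order.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [dx for dx, _ in ordered[:top_k]]
```

A diagnosis that several models rank highly rises to the top even if no single model ranked it first. This is the kind of complementarity that aggregating multiple LLMs is meant to exploit.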

Title: eCeLLM: Generalizing Large Language Models for E-commerce from Large-scale, High-quality Instruction Data

Authors: Bo Peng, Xinyi Ling, Ziru Chen, Huan Sun, Xia Ning
Subjects: cs.CL, cs.AI, cs.IR
Abstract URL: https://arxiv.org/abs/2402.08831
Pdf URL: https://arxiv.org/pdf/2402.08831
Copy Paste: [[2402.08831]] eCeLLM: Generalizing Large Language Models for E-commerce from Large-scale, High-quality Instruction Data(https://arxiv.org/abs/2402.08831)
Keywords: language model, gpt, llm
Abstract: Despite tremendous efforts in developing effective e-commerce models, conventional e-commerce models show limited success in generalist e-commerce modeling and suffer from unsatisfactory performance on new users and new products, a typical out-of-domain generalization challenge. Meanwhile, large language models (LLMs) demonstrate outstanding performance in generalist modeling and out-of-domain generalizability in many fields. Toward fully unleashing their power for e-commerce, in this paper, we construct ECInstruct, the first open-sourced, large-scale, and high-quality benchmark instruction dataset for e-commerce. Leveraging ECInstruct, we develop eCeLLM, a series of e-commerce LLMs, by instruction-tuning general-purpose LLMs. Our comprehensive experiments and evaluation demonstrate that eCeLLM models substantially outperform baseline models, including the most advanced GPT-4 and the state-of-the-art task-specific models, in in-domain evaluation. Moreover, eCeLLM exhibits excellent generalizability to out-of-domain settings, including unseen products and unseen instructions, highlighting its superiority as a generalist e-commerce model. Both the ECInstruct dataset and the eCeLLM models show great potential in empowering versatile and effective LLMs for e-commerce. ECInstruct and eCeLLM models are publicly accessible through https://ninglab.github.io/eCeLLM.

Title: Intelligent Agricultural Management Considering N$_2$O Emission and Climate Variability with Uncertainties

Authors: Zhaoan Wang, Shaoping Xiao, Jun Wang, Ashwin Parab, Shivam Patel
Subjects: cs.LG, cs.AI, cs.CY
Abstract URL: https://arxiv.org/abs/2402.08832
Pdf URL: https://arxiv.org/pdf/2402.08832
Copy Paste: [[2402.08832]] Intelligent Agricultural Management Considering N$_2$O Emission and Climate Variability with Uncertainties(https://arxiv.org/abs/2402.08832)
Keywords: agent
Abstract: This study examines how artificial intelligence (AI), especially Reinforcement Learning (RL), can be used in farming to boost crop yields, fine-tune nitrogen use and watering, and reduce nitrate runoff and greenhouse gases, focusing on Nitrous Oxide (N$_2$O) emissions from soil. Facing climate change and limited agricultural knowledge, we use Partially Observable Markov Decision Processes (POMDPs) with a crop simulator to model AI agents' interactions with farming environments. We apply deep Q-learning with Recurrent Neural Network (RNN)-based Q networks for training agents on optimal actions. Also, we develop Machine Learning (ML) models to predict N$_2$O emissions, integrating these predictions into the simulator. Our research tackles uncertainties in N$_2$O emission estimates with a probabilistic ML approach and climate variability through a stochastic weather model, offering a range of emission outcomes to improve forecast reliability and decision-making. By incorporating climate change effects, we enhance agents' climate adaptability, aiming for resilient agricultural practices. Results show these agents can align crop productivity with environmental concerns by penalizing N$_2$O emissions, adapting effectively to climate shifts like warmer temperatures and less rain. This strategy improves farm management under climate change, highlighting AI's role in sustainable agriculture.

Title: Learning to Generate Context-Sensitive Backchannel Smiles for Embodied AI Agents with Applications in Mental Health Dialogues

Authors: Maneesh Bilalpur, Mert Inan, Dorsa Zeinali, Jeffrey F. Cohn, Malihe Alikhani
Subjects: cs.CL
Abstract URL: https://arxiv.org/abs/2402.08837
Pdf URL: https://arxiv.org/pdf/2402.08837
Copy Paste: [[2402.08837]] Learning to Generate Context-Sensitive Backchannel Smiles for Embodied AI Agents with Applications in Mental Health Dialogues(https://arxiv.org/abs/2402.08837)
Keywords: agent
Abstract: Addressing the critical shortage of mental health resources for effective screening, diagnosis, and treatment remains a significant challenge. This scarcity underscores the need for innovative solutions, particularly in enhancing the accessibility and efficacy of therapeutic support. Embodied agents with advanced interactive capabilities emerge as a promising and cost-effective supplement to traditional caregiving methods. Crucial to these agents' effectiveness is their ability to simulate non-verbal behaviors, like backchannels, that are pivotal in establishing rapport and understanding in therapeutic contexts but remain under-explored. To improve the rapport-building capabilities of embodied agents, we annotated backchannel smiles in videos of intimate face-to-face conversations on topics such as mental health, illness, and relationships. We hypothesized that both speaker and listener behaviors affect the duration and intensity of backchannel smiles. Using cues from speech prosody and language along with the demographics of the speaker and listener, we found that these cues contain significant predictors of the intensity of backchannel smiles. Based on our findings, we frame backchannel smile production in embodied agents as a generation problem. Our attention-based generative model suggests that listener information offers performance improvements over the baseline speaker-centric generation approach. Conditioned generation using the significant predictors of smile intensity provides statistically significant improvements in empirical measures of generation quality. Our user study, in which generated smiles were transferred to an embodied agent, suggests that an agent with backchannel smiles is perceived as more human-like and is an attractive alternative to an agent without backchannel smiles for non-personal conversations.

Title: An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

Authors: Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, Jiaming Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen
Subjects: cs.CL, cs.AI, cs.MM, cs.SD, eess.AS
Abstract URL: https://arxiv.org/abs/2402.08846
Pdf URL: https://arxiv.org/pdf/2402.08846
Copy Paste: [[2402.08846]] An Embarrassingly Simple Approach for LLM with Strong ASR Capacity(https://arxiv.org/abs/2402.08846)
Keywords: language model, llm
Abstract: In this paper, we focus on solving one of the most important tasks in the field of speech processing, automatic speech recognition (ASR), with speech foundation encoders and large language models (LLMs). Recent works have complex designs, such as temporally compressing the output of the speech encoder, tackling modal alignment for the projector, and using parameter-efficient fine-tuning for the LLM. We found that such delicate designs are not necessary. An embarrassingly simple composition of an off-the-shelf speech encoder, an LLM, and a linear projector, which is the only trainable component, is competent for the ASR task. More specifically, we benchmark and explore various combinations of LLMs and speech encoders, leading to the optimal LLM-based ASR system, which we call SLAM-ASR. The proposed SLAM-ASR provides a clean setup with little task-specific design, where only the linear projector is trained. To the best of our knowledge, SLAM-ASR achieves the best performance on the Librispeech benchmark among LLM-based ASR models and even outperforms the latest LLM-based audio-universal model trained on massive paired data. Finally, we explore the capability emergence of LLM-based ASR in the process of modal alignment. We hope that our study can facilitate research on extending LLMs with cross-modality capacity and shed light on LLM-based ASR for the community.
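The setup above keeps the speech encoder and the LLM frozen and trains only a linear projector between them. The NumPy sketch below shows the shape bookkeeping of such a projector. Several details are illustrative assumptions rather than values from the abstract:

- the downsampling by concatenating `k` consecutive encoder frames;
- the encoder width (256);
- the LLM embedding size (512);
- the names `downsample_concat` and `LinearProjector`.

```python
import numpy as np

def downsample_concat(features: np.ndarray, k: int) -> np.ndarray:
    """Concatenate every k consecutive frames: (T, D) -> (T // k, k * D).

    Trailing frames that do not fill a full group are dropped (an assumption).
    """
    t, d = features.shape
    t_trim = (t // k) * k
    return features[:t_trim].reshape(t // k, k * d)

class LinearProjector:
    """Hypothetical trainable projector into the LLM embedding space."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.weight = rng.normal(0.0, in_dim ** -0.5, size=(in_dim, out_dim))
        self.bias = np.zeros(out_dim)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return x @ self.weight + self.bias

# Hypothetical sizes: 100 frozen-encoder frames of width 256, LLM width 512.
enc_out = np.random.default_rng(1).normal(size=(100, 256))
k = 5
projector = LinearProjector(in_dim=k * 256, out_dim=512)
speech_embeds = projector(downsample_concat(enc_out, k))
print(speech_embeds.shape)  # (20, 512)
```

The projected speech embeddings would then be placed alongside the text-prompt embeddings fed to the frozen LLM. During training, gradients update only `weight` and `bias`.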

Title: Large Language Model with Graph Convolution for Recommendation

Authors: Yingpeng Du, Ziyan Wang, Zhu Sun, Haoyan Chua, Hongzhi Liu, Zhonghai Wu, Yining Ma, Jie Zhang, Youchen Sun
Subjects: cs.AI
Abstract URL: https://arxiv.org/abs/2402.08859
Pdf URL: https://arxiv.org/pdf/2402.08859
Copy Paste: [[2402.08859]] Large Language Model with Graph Convolution for Recommendation(https://arxiv.org/abs/2402.08859)
Keywords: language model, llm, hallucination, prompt
Abstract: In recent years, efforts have been made to use text information for better user profiling and item characterization in recommendations. However, text information can sometimes be of low quality, hindering its effectiveness for real-world applications. With the knowledge and reasoning capabilities encapsulated in Large Language Models (LLMs), utilizing LLMs emerges as a promising way to improve descriptions. However, existing ways of prompting LLMs with raw texts ignore the structured knowledge of user-item interactions, which may lead to hallucination problems such as inconsistent description generation. To this end, we propose a Graph-aware Convolutional LLM method that elicits LLMs to capture high-order relations in the user-item graph. To adapt text-based LLMs to structured graphs, we use the LLM as an aggregator in graph processing, allowing it to understand graph-based information step by step. Specifically, the LLM is tasked with enhancing descriptions by exploring multi-hop neighbors layer by layer, thereby propagating information progressively through the graph. To enable LLMs to capture large-scale graph information, we break down the description task into smaller parts, which drastically reduces the context length of the token input at each step. Extensive experiments on three real-world datasets show that our method consistently outperforms state-of-the-art methods.
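Layer-by-layer propagation can be illustrated on a toy user-item graph in which a stand-in aggregator plays the role of the LLM. In this sketch, `aggregate` is a hypothetical placeholder that merges a node's description with its neighbors' current descriptions via set union. In the paper, this step is an LLM prompt. The point shown is only structural: after `l` synchronous layers, each node has absorbed information from its `l`-hop neighborhood.

```python
from typing import Callable

Graph = dict[str, list[str]]
Descriptions = dict[str, frozenset[str]]

def aggregate(own: frozenset[str], neighbor_descs: list[frozenset[str]]) -> frozenset[str]:
    """Hypothetical stand-in for an LLM call that enriches a description
    with information from neighbor descriptions (here: a set union)."""
    merged = set(own)
    for desc in neighbor_descs:
        merged |= desc
    return frozenset(merged)

def propagate(graph: Graph, descs: Descriptions, num_layers: int,
              agg: Callable[[frozenset[str], list[frozenset[str]]], frozenset[str]] = aggregate
              ) -> Descriptions:
    """Run num_layers rounds of neighbor aggregation."""
    current = dict(descs)
    for _ in range(num_layers):
        # Every node reads the previous layer's descriptions (synchronous update).
        current = {node: agg(current[node], [current[n] for n in graph.get(node, [])])
                   for node in current}
    return current

# Toy bipartite user-item graph: u1 - i1 - u2 - i2.
graph = {"u1": ["i1"], "i1": ["u1", "u2"], "u2": ["i1", "i2"], "i2": ["u2"]}
descs = {node: frozenset({node}) for node in graph}
print(sorted(propagate(graph, descs, 2)["u1"]))  # ['i1', 'u1', 'u2']
```

Each call to `agg` sees only one node and its direct neighbors. This mirrors the abstract's point that splitting the task keeps each prompt's context short.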

Title: Tree-Based Hard Attention with Self-Motivation for Large Language Models

Authors: Chenxi Lin, Jiayu Ren, Guoxiu He, Zhuoren Jiang, Haiyan Yu, Xiaomin Zhu
Subjects: cs.CL
Abstract URL: https://arxiv.org/abs/2402.08874
Pdf URL: https://arxiv.org/pdf/2402.08874
Copy Paste: [[2402.08874]] Tree-Based Hard Attention with Self-Motivation for Large Language Models(https://arxiv.org/abs/2402.08874)
Keywords: language model, llm, prompt
Abstract: While large language models (LLMs) excel at understanding and generating plain text, they are not specifically tailored to handle hierarchical text structures. Extracting the task-desired property from their natural language responses typically necessitates additional processing steps. In fact, selectively comprehending the hierarchical structure of large-scale text is pivotal to understanding its substance. Aligning LLMs more closely with the classification or regression values of specific tasks through prompting also remains challenging. To this end, we propose a novel framework called Tree-Based Hard Attention with Self-Motivation for Large Language Models (TEAROOM). TEAROOM incorporates a tree-based hard attention mechanism for LLMs to process hierarchically structured text inputs. By leveraging prompting, it enables a frozen LLM to selectively focus on relevant leaves in relation to the root, generating a tailored symbolic representation of their relationship. Moreover, TEAROOM comprises a self-motivation strategy for another LLM equipped with a trainable adapter and a linear layer. The selected symbolic outcomes are integrated into another prompt, along with the predicted value for the task. We iteratively feed output values back into the prompt, enabling the trainable LLM to progressively approximate the ground truth. TEAROOM outperforms existing state-of-the-art methods in experimental evaluations across three benchmark datasets, showing its effectiveness in estimating task-specific properties. Through comprehensive experiments and analysis, we validate the ability of TEAROOM to gradually approach the underlying ground truth through multiple rounds of inference.

Title: The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes

Authors: Myeongseob Ko, Feiyang Kang, Weiyan Shi, Ming Jin, Zhou Yu, Ruoxi Jia
Subjects: cs.LG, stat.ML
Abstract URL: https://arxiv.org/abs/2402.08922
Pdf URL: https://arxiv.org/pdf/2402.08922
Copy Paste: [[2402.08922]] The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes(https://arxiv.org/abs/2402.08922)
Keywords: language model
Abstract: Large-scale black-box models have become ubiquitous across numerous applications. Understanding the influence of individual training data sources on predictions made by these models is crucial for improving their trustworthiness. Current influence estimation techniques involve computing gradients for every training point or repeated training on different subsets. These approaches face obvious computational challenges when scaled up to large datasets and models. In this paper, we introduce and explore the Mirrored Influence Hypothesis, highlighting a reciprocal nature of influence between training and test data. Specifically, it suggests that evaluating the influence of training data on test predictions can be reformulated as an equivalent, yet inverse problem: assessing how the predictions for training samples would be altered if the model were trained on specific test samples. Through both empirical and theoretical validations, we demonstrate the wide applicability of our hypothesis. Inspired by this, we introduce a new method for estimating the influence of training data, which requires calculating gradients for specific test samples, paired with a forward pass for each training point. This approach can capitalize on the common asymmetry in scenarios where the number of test samples under concurrent examination is much smaller than the scale of the training dataset, thus gaining a significant improvement in efficiency compared to existing approaches. We demonstrate the applicability of our method across a range of scenarios, including data attribution in diffusion models, data leakage detection, analysis of memorization, mislabeled data detection, and tracing behavior in language models. Our code will be made available at https://github.com/ruoxi-jia-group/Forward-INF.
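The mirrored view can be made concrete on a tiny linear-regression model. Rather than computing a gradient for every training point, take one gradient step on the test sample. Then measure, with forward passes only, how each training point's loss changes. This NumPy sketch illustrates the hypothesis under simplifying assumptions: squared loss, a single step with a hypothetical learning rate `lr`, and hypothetical function names. It is not the Forward-INF implementation.

```python
import numpy as np

def squared_loss(w: np.ndarray, x: np.ndarray, y: float) -> float:
    return 0.5 * float((x @ w - y) ** 2)

def grad_squared_loss(w: np.ndarray, x: np.ndarray, y: float) -> np.ndarray:
    return (x @ w - y) * x

def mirrored_influence(w: np.ndarray, x_test: np.ndarray, y_test: float,
                       x_train: np.ndarray, y_train: np.ndarray,
                       lr: float = 0.1) -> np.ndarray:
    """Score each training point by how much one step of training on the
    test sample reduces that point's loss (positive = mutually helpful).

    Only one gradient is computed (for the test sample); each training
    point needs just two forward passes.
    """
    w_new = w - lr * grad_squared_loss(w, x_test, y_test)
    before = np.array([squared_loss(w, x, y) for x, y in zip(x_train, y_train)])
    after = np.array([squared_loss(w_new, x, y) for x, y in zip(x_train, y_train)])
    return before - after

# Training points: same as the test point, unrelated, and conflicting.
w0 = np.zeros(2)
x_train = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
y_train = np.array([1.0, 1.0, -1.0])
scores = mirrored_influence(w0, np.array([1.0, 0.0]), 1.0, x_train, y_train)
print(scores)  # first positive, second zero, third negative
```

The efficiency argument in the abstract shows up directly here. The cost is one backward pass per test sample plus forward passes over the training set. That favors the common case where few test samples are examined at once.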

Title: MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences

Authors: Souradip Chakraborty, Jiahao Qiu, Hui Yuan, Alec Koppel, Furong Huang, Dinesh Manocha, Amrit Singh Bedi, Mengdi Wang
Subjects: cs.CL, cs.AI, cs.LG, cs.RO
Abstract URL: https://arxiv.org/abs/2402.08925
Pdf URL: https://arxiv.org/pdf/2402.08925
Copy Paste: [[2402.08925]] MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences(https://arxiv.org/abs/2402.08925)
Keywords: language model, gpt
Abstract: Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a single reward model derived from preference data. However, such an approach overlooks the rich diversity of human preferences inherent in data collected from multiple users. In this work, we first derive an impossibility result for alignment with single-reward RLHF, thereby highlighting its insufficiency in representing diverse human preferences. To provide an equitable solution to the problem, we learn a mixture of preference distributions via an expectation-maximization algorithm. We then propose a MaxMin alignment objective for policy learning, inspired by the egalitarian principle in social choice theory, to better represent diverse human preferences. We elucidate the connection of our proposed approach to distributionally robust optimization and general utility RL, thereby highlighting the generality and robustness of our proposed solution. We present comprehensive experimental results on small-scale (GPT-2) and large-scale language models (with Tulu2-7B) and show the efficacy of the proposed approach in the presence of diversity among human preferences. Our algorithm achieves an average improvement of more than 16% in win-rates over conventional RLHF algorithms and improves the win-rate (accuracy) for minority groups by over 33% without compromising the performance of majority groups, showcasing the robustness and fairness of our approach. We remark that our findings in this work are not limited to language models but also extend to reinforcement learning in general.
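The difference between optimizing one averaged reward and the MaxMin (egalitarian) objective shows up on a toy selection problem. This sketch assumes the per-group expected rewards of two candidate policies are already known. The numbers are made up for illustration and are not results from the paper. It compares choosing by the population-weighted average with choosing by the worst-off group. The actual method applies the max-min objective during policy optimization over learned reward models, not to a fixed table.

```python
import numpy as np

def select_by_average(rewards: np.ndarray, group_weights: np.ndarray) -> int:
    """Single-reward style: maximize the population-weighted mean reward.

    rewards[g, p] is the expected reward of candidate policy p for user group g.
    """
    return int(np.argmax(group_weights @ rewards))

def select_by_maxmin(rewards: np.ndarray) -> int:
    """Egalitarian style: maximize the reward of the worst-off group."""
    return int(np.argmax(rewards.min(axis=0)))

# Illustrative numbers only: a 90% majority group and a 10% minority group.
rewards = np.array([
    [0.9, 0.7],   # majority group's reward for policies 0 and 1
    [0.1, 0.6],   # minority group's reward for policies 0 and 1
])
weights = np.array([0.9, 0.1])
print(select_by_average(rewards, weights))  # 0: weighted means 0.82 vs 0.69
print(select_by_maxmin(rewards))            # 1: worst-group rewards 0.1 vs 0.6
```

The averaged objective is dominated by the majority group and picks a policy that serves the minority poorly. The max-min rule trades a little majority reward for a much better minority outcome.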

Title: Premise Order Matters in Reasoning with Large Language Models

Authors: Xinyun Chen, Ryan A. Chi, Xuezhi Wang, Denny Zhou
Subjects: cs.AI, cs.CL
Abstract URL: https://arxiv.org/abs/2402.08939
Pdf URL: https://arxiv.org/pdf/2402.08939
Copy Paste: [[2402.08939]] Premise Order Matters in Reasoning with Large Language Models(https://arxiv.org/abs/2402.08939)
Keywords: language model, llm, prompt
Abstract: Large language models (LLMs) have accomplished remarkable reasoning performance in various domains. However, in the domain of reasoning tasks, we discover a frailty: LLMs are surprisingly brittle to the ordering of the premises, despite the fact that such ordering does not alter the underlying task. In particular, we observe that LLMs achieve the best performance when the premise order aligns with the context required in intermediate reasoning steps. For example, in deductive reasoning tasks, presenting the premises in the same order as the ground truth proof in the prompt (as opposed to random ordering) drastically increases the model's accuracy. We first examine the effect of premise ordering on deductive reasoning on a variety of LLMs, and our evaluation shows that permuting the premise order can cause a performance drop of over 30%. In addition, we release the benchmark R-GSM, based on GSM8K, to examine the ordering effect for mathematical problem-solving, and we again observe a significant drop in accuracy, relative to the original GSM8K benchmark.
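An evaluation of this kind needs prompts that contain the same set of premises in different orders, so that the underlying task is unchanged. A minimal sketch of building such variants follows. The prompt template and function names are hypothetical. The fixed seed only makes the shuffles reproducible.

```python
import random

def build_prompt(premises: list[str], question: str) -> str:
    lines = [f"{i + 1}. {p}" for i, p in enumerate(premises)]
    return "Premises:\n" + "\n".join(lines) + f"\nQuestion: {question}"

def permuted_prompts(premises: list[str], question: str,
                     n_variants: int, seed: int = 0) -> list[str]:
    """Return prompts with the same premises in shuffled orders."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        order = premises[:]      # copy so the proof-ordered list is preserved
        rng.shuffle(order)
        variants.append(build_prompt(order, question))
    return variants

# Premises listed in the order a forward proof uses them.
premises = ["A is true.", "If A then B.", "If B then C."]
forward = build_prompt(premises, "Is C true?")
shuffled = permuted_prompts(premises, "Is C true?", n_variants=3)
```

Comparing a model's accuracy on `forward` against its accuracy on `shuffled` isolates the ordering effect, because every variant has the same logical content.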

Title: Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models

Authors: Martha Lewis, Melanie Mitchell
Subjects: cs.AI, cs.CL
Abstract URL: https://arxiv.org/abs/2402.08955
Pdf URL: https://arxiv.org/pdf/2402.08955
Copy Paste: [[2402.08955]] Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models(https://arxiv.org/abs/2402.08955)
Keywords: language model, gpt, llm
Abstract: Large language models (LLMs) have performed well on several reasoning benchmarks, including ones that test analogical reasoning abilities. However, it has been debated whether they are actually performing humanlike abstract reasoning or instead employing less general processes that rely on similarity to what has been seen in their training data. Here we investigate the generality of analogy-making abilities previously claimed for LLMs (Webb, Holyoak, & Lu, 2023). We take one set of analogy problems used to evaluate LLMs and create a set of "counterfactual" variants: versions that test the same abstract reasoning abilities but that are likely dissimilar from any pre-training data. We test humans and three GPT models on both the original and counterfactual problems, and show that, while the performance of humans remains high for all the problems, the GPT models' performance declines sharply on the counterfactual set. This work provides evidence that, despite previously reported successes of LLMs on analogical reasoning, these models lack the robustness and generality of human analogy-making.
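The counterfactual idea can be illustrated with letter-string analogies, assuming that problem type from the Webb, Holyoak, & Lu (2023) set. The same rule, "replace the last letter with its successor", is applied over a permuted alphabet. The abstract rule is unchanged, while the surface sequences become unfamiliar. The permutation and helper names below are hypothetical. This sketches the idea and is not the authors' problem generator.

```python
import string

def successor(letter: str, alphabet: str) -> str:
    """Next letter in the given alphabet (no wrap-around; an assumption)."""
    idx = alphabet.index(letter)
    if idx + 1 >= len(alphabet):
        raise ValueError(f"{letter!r} has no successor in this alphabet")
    return alphabet[idx + 1]

def apply_rule(s: str, alphabet: str) -> str:
    """Rule from the source pair (abc -> abd): advance the last letter."""
    return s[:-1] + successor(s[-1], alphabet)

def sequence(start: str, length: int, alphabet: str) -> str:
    """Run of consecutive letters in the given alphabet."""
    i = alphabet.index(start)
    return alphabet[i:i + length]

standard = string.ascii_lowercase
# Hypothetical counterfactual alphabet: a fixed permutation of the 26 letters.
permuted = "xyljkabcdefghimnopqrstuvwz"

print(apply_rule("ijk", standard))                       # ijl
print(apply_rule(sequence("l", 3, permuted), permuted))  # lja
```

In the permuted alphabet, the correct answer for "ljk" is "lja", whereas a solver relying on familiar alphabet order would answer "ljl". This is the kind of gap that counterfactual variants are designed to expose.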

Title: MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data

Authors: Yinya Huang, Xiaohan Lin, Zhengying Liu, Qingxing Cao, Huajian Xin, Haiming Wang, Zhenguo Li, Linqi Song, Xiaodan Liang
Subjects: cs.AI, cs.CL, cs.FL, cs.LG, cs.PL
Abstract URL: https://arxiv.org/abs/2402.08957
Pdf URL: https://arxiv.org/pdf/2402.08957
Copy Paste: [[2402.08957]] MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data(https://arxiv.org/abs/2402.08957)
Keywords: language model, llm, prompt, chain-of-thought
Abstract: Recent large language models (LLMs) have witnessed significant advancement in various tasks, including mathematical reasoning and theorem proving. As these two tasks require strict and formal multi-step inference, they are appealing domains for exploring the reasoning ability of LLMs but still face important challenges. Previous studies such as Chain-of-Thought (CoT) have revealed the effectiveness of intermediate-step guidance. However, such step-wise annotation requires heavy labor, leading to insufficient training steps for current benchmarks. To fill this gap, this work introduces MUSTARD, a data generation framework that masters uniform synthesis of theorem and proof data of high quality and diversity. MUSTARD synthesizes data in three stages: (1) It samples a few mathematical concept seeds as the problem category. (2) Then, it prompts a generative language model with the sampled concepts to obtain both the problems and their step-wise formal solutions. (3) Lastly, the framework utilizes a proof assistant (e.g., Lean Prover) to filter the valid proofs. With the proposed MUSTARD, we present a theorem-and-proof benchmark MUSTARDSAUCE with 5,866 valid data points. Each data point contains an informal statement, an informal proof, and a translated formal proof that passes the prover validation. We perform extensive analysis and demonstrate that MUSTARD generates validated high-quality step-by-step data. We further apply the MUSTARDSAUCE for fine-tuning smaller language models. The fine-tuned Llama 2-7B achieves a 15.41% average relative performance gain in automated theorem proving, and 8.18% in math word problems. Codes and data are available at https://github.com/Eleanor-H/MUSTARD.
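The three stages form a generate-then-filter loop: sample concept seeds, prompt a generator, and keep only prover-checked proofs. The sketch below captures only that control flow, under these assumptions:

- `generate` and `verify` are hypothetical stand-ins for the LLM call and the Lean check.
- The `DataPoint` fields mirror the three parts listed in the abstract.
- The stub used in the usage note is made up.

```python
import random
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class DataPoint:
    concepts: tuple[str, ...]
    informal_statement: str
    informal_proof: str
    formal_proof: str

def synthesize(concept_pool: list[str], n_samples: int, seeds_per_problem: int,
               generate: Callable[[tuple[str, ...]], DataPoint],
               verify: Callable[[str], bool], seed: int = 0) -> list[DataPoint]:
    """Generate-then-filter loop over sampled concept seeds."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_samples):
        # Stage 1: sample a few concept seeds as the problem category.
        concepts = tuple(rng.sample(concept_pool, seeds_per_problem))
        # Stage 2: prompt the generator for a problem and a step-wise solution.
        point = generate(concepts)
        # Stage 3: keep only data points whose formal proof the checker accepts.
        if verify(point.formal_proof):
            kept.append(point)
    return kept
```

For a dry run, `verify` could be a stub that rejects any proof containing Lean's `sorry` placeholder. A real pipeline would instead compile each formal proof with the proof assistant.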

Title: Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers

Authors: Junhan Kim, Kyungphil Park, Chungman Lee, Ho-young Kim, Joonyoung Kim, Yongkweon Jeon
Subjects: cs.LG, cs.AI
Abstract URL: https://arxiv.org/abs/2402.08958
Pdf URL: https://arxiv.org/pdf/2402.08958
Copy Paste: [[2402.08958]] Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers(https://arxiv.org/abs/2402.08958)
Keywords: language model
Abstract: With the increasing complexity of generative AI models, post-training quantization (PTQ) has emerged as a promising solution for deploying hyper-scale models on edge devices such as mobile devices and TVs. Existing PTQ schemes, however, consume considerable time and resources, which could be a bottleneck in real situations where frequent model updates and multiple hyper-parameter tunings are required. As a cost-effective alternative, one-shot PTQ schemes have been proposed. Still, the performance is somewhat limited because they cannot consider the inter-layer dependency within the attention module, which is a very important feature of Transformers. In this paper, we thus propose a novel PTQ algorithm that balances accuracy and efficiency. The key idea of the proposed algorithm called aespa is to perform quantization layer-wise for efficiency while considering cross-layer dependency to preserve the attention score. Through extensive experiments on various language models and complexity analysis, we demonstrate that aespa is accurate and efficient in quantizing Transformer models.
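For readers new to PTQ, the basic operation that layer-wise schemes build on is mapping a weight matrix onto a low-bit integer grid. The sketch below is plain per-output-channel round-to-nearest (RTN) quantization in NumPy. RTN is a common baseline and explicitly not aespa, which additionally accounts for cross-layer dependency inside the attention module. The bit width and function names are illustrative.

```python
import numpy as np

def quantize_rtn(w: np.ndarray, n_bits: int = 4) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Asymmetric per-row (output channel) round-to-nearest quantization.

    Returns integer codes plus the per-row scale and zero point for dequantization.
    """
    qmax = 2 ** n_bits - 1
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = np.maximum(w_max - w_min, 1e-8) / qmax   # guard against constant rows
    zero = np.round(-w_min / scale)
    q = np.clip(np.round(w / scale) + zero, 0, qmax).astype(np.int32)
    return q, scale, zero

def dequantize(q: np.ndarray, scale: np.ndarray, zero: np.ndarray) -> np.ndarray:
    """Map integer codes back to approximate floating-point weights."""
    return (q - zero) * scale
```

RTN treats each weight matrix in isolation and ignores how quantization errors in the query, key, and value projections interact in the attention scores. That interaction is the gap the abstract says aespa addresses.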

Title: GrounDial: Human-norm Grounded Safe Dialog Response Generation

Authors: Siwon Kim, Shuyang Dai, Mohammad Kachuee, Shayan Ray, Tara Taghavi, Sungroh Yoon
Subjects: cs.AI
Abstract URL: https://arxiv.org/abs/2402.08968
Pdf URL: https://arxiv.org/pdf/2402.08968
Copy Paste: [[2402.08968]] GrounDial: Human-norm Grounded Safe Dialog Response Generation(https://arxiv.org/abs/2402.08968)
Keywords: language model, llm
Abstract: Current conversational AI systems based on large language models (LLMs) are known to generate unsafe responses, agreeing to offensive user input or including toxic content. Previous research aimed to alleviate the toxicity by fine-tuning LLMs with manually annotated safe dialogue histories. However, the dependency on additional tuning incurs substantial costs. To remove the dependency, we propose GrounDial, where response safety is achieved by grounding responses to commonsense social rules without requiring fine-tuning. A hybrid approach of in-context learning and human-norm-guided decoding in GrounDial makes responses quantitatively and qualitatively safer, even without additional data or tuning.

Title: Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path

Authors: Qiwei Di, Jiafan He, Dongruo Zhou, Quanquan Gu
Subjects: cs.LG, stat.ML
Abstract URL: https://arxiv.org/abs/2402.08998
Pdf URL: https://arxiv.org/pdf/2402.08998
Copy Paste: [[2402.08998]] Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path(https://arxiv.org/abs/2402.08998)
Keywords: agent
Abstract: We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel, where an agent repeatedly interacts with a stochastic environment and seeks to reach a certain goal state while minimizing the cumulative cost. Existing works often assume a strictly positive lower bound of the cost function or an upper bound of the expected length for the optimal policy. In this paper, we propose a new algorithm to eliminate these restrictive assumptions. Our algorithm is based on extended value iteration with a fine-grained variance-aware confidence set, where the variance is estimated recursively from high-order moments. Our algorithm achieves an $\tilde{\mathcal O}(dB_*\sqrt{K})$ regret bound, where $d$ is the dimension of the feature mapping in the linear transition kernel, $B_*$ is the upper bound of the total cumulative cost for the optimal policy, and $K$ is the number of episodes. Our regret upper bound matches the $\Omega(dB_*\sqrt{K})$ lower bound of linear mixture SSPs in Min et al. (2022), which suggests that our algorithm is nearly minimax optimal.
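For reference, here is the regret being bounded, using the standard SSP definition. It is stated as background rather than quoted from the paper. The regret is the total cost incurred over $K$ episodes minus $K$ times the optimal expected cost from the initial state. Episode $k$ has length $I_k$ and visits state-action pairs $(s_{k,i}, a_{k,i})$.

```latex
R_K \;=\; \sum_{k=1}^{K} \sum_{i=1}^{I_k} c\bigl(s_{k,i}, a_{k,i}\bigr) \;-\; K \cdot V^{*}(s_{\mathrm{init}}),
\qquad V^{*}(s_{\mathrm{init}}) \le B_*,
\qquad R_K = \tilde{\mathcal{O}}\bigl(dB_*\sqrt{K}\bigr).
```

Because the lower bound is $\Omega(dB_*\sqrt{K})$, the upper bound matches it up to the logarithmic factors hidden in $\tilde{\mathcal{O}}$. That gap is what "nearly" minimax optimal refers to.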

Title: Multi-Query Focused Disaster Summarization via Instruction-Based Prompting

Authors: Philipp Seeberger, Korbinian Riedhammer
Subjects: cs.CL
Abstract URL: https://arxiv.org/abs/2402.09008
Pdf URL: https://arxiv.org/pdf/2402.09008
Copy Paste: [[2402.09008]] Multi-Query Focused Disaster Summarization via Instruction-Based Prompting(https://arxiv.org/abs/2402.09008)
Keywords: language model, llm, prompt
Abstract: Automatic summarization of mass-emergency events plays a critical role in disaster management. The second edition of CrisisFACTS aims to advance disaster summarization based on multi-stream fact-finding with a focus on web sources such as Twitter, Reddit, Facebook, and Webnews. Here, participants are asked to develop systems that can extract key facts from several disaster-related events, which ultimately serve as a summary. This paper describes our method to tackle this challenging task. We follow previous work and propose to use a combination of retrieval, reranking, and an embarrassingly simple instruction-following summarization. The two-stage retrieval pipeline relies on BM25 and MonoT5, while the summarizer module is based on the open-source Large Language Model (LLM) LLaMA-13b. For summarization, we explore a Question Answering (QA)-motivated prompting approach and find the evidence useful for extracting query-relevant facts. The automatic metrics and human evaluation show strong results but also highlight the gap between open-source and proprietary systems.
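The first retrieval stage relies on BM25. The compact scorer below shows what that stage computes, with stated simplifications:

- `k1=1.5` and `b=0.75` are common defaults, not the authors' settings.
- Whitespace tokenization is a simplification.
- The example documents are made up.
- The MonoT5 reranking and LLaMA-13b summarization stages are not shown.

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each document for the query (lowercased whitespace tokens)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            f = tf[term]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores

def top_k(query: str, docs: list[str], k: int) -> list[int]:
    """Indices of the k highest-scoring documents (candidates for reranking)."""
    scores = bm25_scores(query, docs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]

docs = [
    "bridge collapsed after the earthquake",
    "team wins the football final",
    "earthquake aftershocks reported near the coast",
]
print(top_k("earthquake bridge", docs, 2))  # [0, 2]
```

In a two-stage pipeline like the one described, only these top-`k` candidates would be passed to the more expensive neural reranker.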

Title: Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications

Authors: Negar Arabzadeh, Julia Kiseleva, Qingyun Wu, Chi Wang, Ahmed Awadallah, Victor Dibia, Adam Fourney, Charles Clarke
Subjects: cs.CL, cs.AI
Abstract URL: https://arxiv.org/abs/2402.09015
Pdf URL: https://arxiv.org/pdf/2402.09015
Copy Paste: [[2402.09015]] Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications(https://arxiv.org/abs/2402.09015)
Keywords: language model, llm, agent
Abstract: The rapid development in the field of Large Language Models (LLMs) has led to a surge in applications that facilitate collaboration among multiple agents to assist humans in their daily tasks. However, a significant gap remains in assessing whether LLM-powered applications genuinely enhance user experience and task execution efficiency. This highlights the pressing need for methods to verify the utility of LLM-powered applications, particularly by ensuring alignment between the application's functionality and end-user needs. We introduce AgentEval (which provides an implementation for math problems), a novel framework designed to simplify the utility verification process by automatically proposing a set of criteria tailored to the unique purpose of any given application. This allows for a comprehensive assessment that quantifies the utility of an application against the suggested criteria. We present a comprehensive analysis of the robustness of the quantifier's work.

Title: SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks

Authors: Jiwon Song, Kyungseok Oh, Taesu Kim, Hyungjun Kim, Yulhwa Kim, Jae-Joon Kim
Subjects: cs.CL, cs.LG
Abstract URL: https://arxiv.org/abs/2402.09025
Pdf URL: https://arxiv.org/pdf/2402.09025
Copy Paste: [[2402.09025]] SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks(https://arxiv.org/abs/2402.09025)
Keywords: language model, llm
Abstract: Large language models (LLMs) have proven to be highly effective across various natural language processing tasks. However, their large number of parameters poses significant challenges for practical deployment. Pruning, a technique aimed at reducing the size and complexity of LLMs, offers a potential solution by removing redundant components from the network. Despite the promise of pruning, existing methods often struggle to achieve substantial end-to-end LLM inference speedup. In this paper, we introduce SLEB, a novel approach designed to streamline LLMs by eliminating redundant transformer blocks. We choose the transformer block as the fundamental unit for pruning, because LLMs exhibit block-level redundancy with high similarity between the outputs of neighboring blocks. This choice allows us to effectively enhance the processing speed of LLMs. Our experimental results demonstrate that SLEB successfully accelerates LLM inference without compromising the linguistic capabilities of these models, making it a promising technique for optimizing the efficiency of LLMs. The code is available at: https://github.com/leapingjagg-dev/SLEB
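The observation that neighboring transformer blocks produce highly similar outputs suggests a simple way to rank blocks for removal. The NumPy sketch below scores each block by the cosine similarity between its input and output hidden states on some calibration data. It then proposes the most redundant blocks for removal in one shot. This simplified heuristic is written for illustration and uses hypothetical names. SLEB's actual selection metric and procedure may differ and are described in the paper and repository.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Mean cosine similarity between matching rows (tokens) of two hidden-state arrays."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + 1e-12
    return float(np.mean(num / den))

def blocks_to_remove(hidden_states: list[np.ndarray], n_remove: int) -> list[int]:
    """Rank blocks by input/output similarity, most redundant first.

    hidden_states[i] is the input to block i and hidden_states[i + 1] its
    output. A block whose output nearly equals its input contributes little,
    so it is a removal candidate.
    """
    n_blocks = len(hidden_states) - 1
    sims = [cosine_sim(hidden_states[i], hidden_states[i + 1]) for i in range(n_blocks)]
    return sorted(range(n_blocks), key=lambda i: sims[i], reverse=True)[:n_remove]
```

Removing whole blocks, rather than individual weights, shortens the sequential computation path. This is why block-level pruning can translate into end-to-end inference speedup, as the abstract emphasizes.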
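
The abstract motivates block-level pruning by the high similarity between the outputs of neighboring transformer blocks. Below is a minimal sketch of that intuition. It assumes we already have one hidden-state vector per block boundary (a hypothetical toy input). Each block gets a cosine-similarity score between its input and output, and the blocks that change their input least are ranked as removal candidates. The abstract does not specify SLEB's actual redundancy verification metric, so this code does not reproduce it.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def rank_redundant_blocks(hidden: List[List[float]], n_remove: int) -> List[int]:
    """Return indices of the blocks whose output is most similar to their input.

    hidden[i] is the (hypothetical) state entering block i and hidden[i + 1]
    the state leaving it, so there are len(hidden) - 1 blocks in total.
    """
    scores = [(cosine(hidden[i], hidden[i + 1]), i) for i in range(len(hidden) - 1)]
    scores.sort(reverse=True)  # most similar, i.e. least transformative, first
    return sorted(i for _, i in scores[:n_remove])


# Toy states for 4 blocks: block 2 leaves its input almost unchanged.
states = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.05, 1.0],
    [1.0, 1.0, 0.0],
]
print(rank_redundant_blocks(states, n_remove=1))
```

Removing one block changes the states that later blocks see. A more careful pipeline would therefore likely recompute the scores after each removal instead of ranking everything once.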

Title: FGeo-TP: A Language Model-Enhanced Solver for Geometry Problems

Authors: Yiming He, Jia Zou, Xiaokai Zhang, Na Zhu, Tuo Leng
Subjects: cs.AI
Abstract URL: https://arxiv.org/abs/2402.09047
Pdf URL: https://arxiv.org/pdf/2402.09047
Copy Paste: [[2402.09047]] FGeo-TP: A Language Model-Enhanced Solver for Geometry Problems(https://arxiv.org/abs/2402.09047)
Keywords: language model
Abstract: Applying contemporary artificial intelligence techniques to geometric problem solving and automated deductive proof has long been a grand challenge in the interdisciplinary field of mathematics and artificial intelligence. This is the fourth article in a series of our works. In our previous work, we established a geometric formalized system known as FormalGeo. Moreover, we annotated approximately 7000 geometric problems, forming the FormalGeo7k dataset. Although FGPS (Formal Geometry Problem Solver) can achieve interpretable algebraic equation solving and human-like deductive reasoning, it often experiences timeouts due to the complexity of its search strategy. In this paper, we introduce FGeo-TP (Theorem Predictor), which uses a language model to predict theorem sequences for solving geometry problems. We compare the effectiveness of various Transformer architectures, such as BART and T5, in theorem prediction, and implement pruning in the search process of FGPS, thereby improving its performance in solving geometry problems. Our results demonstrate a significant increase in the problem-solving rate of the language model-enhanced FGeo-TP on the FormalGeo7k dataset, rising from 39.7% to 80.86%. Furthermore, FGeo-TP exhibits notable reductions in solving time and search steps across problems of varying difficulty levels.
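
The abstract says predicted theorem sequences are used to prune FGPS's search. A minimal sketch of why pruning cuts search steps: a breadth-first search over theorem applications, with an optional `allowed` set that stands in for the language model's prediction. The facts, theorem names, and the hard-coded "prediction" are all hypothetical, and plain BFS is a simplification, not FGPS's actual search strategy.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Theorem = Tuple[FrozenSet[str], str]  # (premises, conclusion)


def search(
    facts: Set[str],
    goal: str,
    theorems: Dict[str, Theorem],
    allowed: Optional[Set[str]] = None,
) -> Tuple[Optional[List[str]], int]:
    """BFS over theorem applications; returns (theorem sequence or None, states expanded).

    If `allowed` is given, only those theorems are tried, mimicking pruning of the
    action space by a (hypothetical) theorem predictor.
    """
    names = sorted(allowed if allowed is not None else theorems)
    start = frozenset(facts)
    queue = deque([(start, [])])
    seen = {start}
    expanded = 0
    while queue:
        state, path = queue.popleft()
        expanded += 1
        if goal in state:
            return path, expanded
        for name in names:
            premises, conclusion = theorems[name]
            if premises <= state and conclusion not in state:
                nxt = state | {conclusion}
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [name]))
    return None, expanded


# Hypothetical toy theory: only t1 then t2 lead to the goal C.
THEOREMS = {
    "t1": (frozenset({"A"}), "B"),
    "t2": (frozenset({"B"}), "C"),
    "t3": (frozenset({"A"}), "D"),
    "t4": (frozenset({"A"}), "E"),
    "t5": (frozenset({"D"}), "F"),
}
full = search({"A"}, "C", THEOREMS)
pruned = search({"A"}, "C", THEOREMS, allowed={"t1", "t2"})
print(full, pruned)
```

Both runs find the same proof, but the pruned run expands fewer states. This is the kind of reduction in search steps the abstract reports, shown here only in miniature.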

Title: FGeo-DRL: Deductive Reasoning for Geometric Problems through Deep Reinforcement Learning

Authors: Jia Zou, Xiaokai Zhang, Yiming He, Na Zhu, Tuo Leng
Subjects: cs.AI
Abstract URL: https://arxiv.org/abs/2402.09051
Pdf URL: https://arxiv.org/pdf/2402.09051
Copy Paste: [[2402.09051]] FGeo-DRL: Deductive Reasoning for Geometric Problems through Deep Reinforcement Learning(https://arxiv.org/abs/2402.09051)
Keywords: language model, agent
Abstract: Human-like automatic deductive reasoning has always been one of the most challenging open problems at the intersection of mathematics and artificial intelligence. This paper is the third in a series of our works. We built a neural-symbolic system, called FGeoDRL, to automatically perform human-like geometric deductive reasoning. The neural part is an AI agent based on reinforcement learning, capable of autonomously learning problem-solving methods from the feedback of a formalized environment, without the need for human supervision. It leverages a pre-trained natural language model to establish a policy network for theorem selection and employs Monte Carlo Tree Search for heuristic exploration. The symbolic part is a reinforcement learning environment based on geometry formalization theory and FormalGeo, which models geometry problem solving (GPS) as a Markov Decision Process. In this formal symbolic system, the known conditions and objectives of the problem form the state space, while the set of theorems forms the action space. Leveraging FGeoDRL, we have achieved readable and verifiable automated solutions to geometric problems. Experiments conducted on the FormalGeo7k dataset have achieved a problem-solving success rate of 86.40%. The project is available at https://github.com/PersonNoName/FGeoDRL.
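
The abstract combines a policy network for theorem selection with Monte Carlo Tree Search. One common way to merge a policy prior into MCTS child selection is the PUCT rule popularized by AlphaZero: pick the child that maximizes Q + c · P · sqrt(N) / (1 + n). Whether FGeoDRL uses exactly this rule is an assumption here. The theorem names and visit statistics below are hypothetical.

```python
import math
from typing import Dict


def puct_score(q: float, prior: float, parent_visits: int, child_visits: int, c: float = 1.0) -> float:
    """Mean value q plus an exploration bonus weighted by the policy prior."""
    return q + c * prior * math.sqrt(parent_visits) / (1 + child_visits)


def select_theorem(stats: Dict[str, Dict[str, float]], c: float = 1.0) -> str:
    """Return the theorem (child action) with the highest PUCT score.

    stats maps a theorem name to {"q": mean value, "prior": policy probability, "n": visits}.
    """
    parent_visits = sum(int(s["n"]) for s in stats.values())
    return max(
        stats,
        key=lambda name: puct_score(
            stats[name]["q"], stats[name]["prior"], parent_visits, int(stats[name]["n"]), c
        ),
    )


# Hypothetical statistics at one search node.
stats = {
    "thm_a": {"q": 0.5, "prior": 0.1, "n": 10},  # well explored, decent value
    "thm_b": {"q": 0.2, "prior": 0.8, "n": 1},   # strongly favored by the policy
    "thm_c": {"q": 0.0, "prior": 0.1, "n": 0},   # never tried
}
print(select_theorem(stats))
```

With the default c = 1.0, the policy prior pulls the search toward `thm_b`. With c = 0 the rule degenerates to greedy selection on Q and picks `thm_a`.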

Title: L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects

Authors: Yutaro Yamada, Khyathi Chandu, Yuchen Lin, Jack Hessel, Ilker Yildirim, Yejin Choi
Subjects: cs.AI
Abstract URL: https://arxiv.org/abs/2402.09052
Pdf URL: https://arxiv.org/pdf/2402.09052
Copy Paste: [[2402.09052]] L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects(https://arxiv.org/abs/2402.09052)
Keywords: language model, gpt, agent
Abstract: Diffusion-based image generation models such as DALL-E 3 and Stable Diffusion-XL demonstrate remarkable capabilities in generating images with realistic and unique compositions. Yet, these models are not robust in precisely reasoning about the physical and spatial configurations of objects, especially when given unconventional, and thus out-of-distribution, descriptions such as "a chair with five legs". In this paper, we propose a language agent with chain-of-3D-thoughts (L3GO), an inference-time approach that can reason about part-based 3D mesh generation of unconventional objects that current data-driven diffusion models struggle with. More concretely, we use large language models as agents to compose a desired object via trial-and-error within the 3D simulation environment. To facilitate our investigation, we develop a new benchmark, Unconventionally Feasible Objects (UFO), as well as SimpleBlenv, a wrapper environment built on top of Blender where language agents can build and compose atomic building blocks via API calls. Human and automatic GPT-4V evaluations show that our approach surpasses the standard GPT-4 and other language agents (e.g., ReAct and Reflexion) for 3D mesh generation on ShapeNet. Moreover, when tested on our UFO benchmark, our approach outperforms other state-of-the-art text-to-2D image and text-to-3D models based on human evaluation.

Title: Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space

Authors: Leo Schwinn, David Dobre, Sophie Xhonneux, Gauthier Gidel, Stephan Gunnemann
Subjects: cs.LG
Abstract URL: https://arxiv.org/abs/2402.09063
Pdf URL: https://arxiv.org/pdf/2402.09063
Copy Paste: [[2402.09063]] Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space(https://arxiv.org/abs/2402.09063)
Keywords: llm, prompt
Abstract: Current research in adversarial robustness of LLMs focuses on discrete input manipulations in the natural language space, which can be directly transferred to closed-source models. However, this approach neglects the steady progression of open-source models. As open-source models advance in capability, ensuring their safety also becomes increasingly imperative. Yet, attacks tailored to open-source LLMs that exploit full model access remain largely unexplored. We address this research gap and propose the embedding space attack, which directly attacks the continuous embedding representation of input tokens. We find that embedding space attacks circumvent model alignments and trigger harmful behaviors more efficiently than discrete attacks or model fine-tuning. Furthermore, we present a novel threat model in the context of unlearning and show that embedding space attacks can extract supposedly deleted information from unlearned LLMs across multiple datasets and models. Our findings highlight embedding space attacks as an important threat model in open-source LLMs. Trigger Warning: the appendix contains LLM-generated text with violence and harassment.
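
The key difference from discrete attacks is that the optimization variable is the continuous input embedding, not a token choice. A minimal sketch of that idea on a toy model: a hypothetical linear classifier with logits = W e stands in for an LLM, and plain gradient descent on e minimizes cross-entropy toward a chosen target class. The model, loss, and step size are assumptions for illustration. A real attack would backpropagate through the full network with an autodiff framework.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(z: Vector) -> Vector:
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def logits(w: Matrix, e: Vector) -> Vector:
    return [sum(wi * ei for wi, ei in zip(row, e)) for row in w]


def embedding_attack(w: Matrix, e: Vector, target: int, lr: float = 0.5, steps: int = 50) -> Vector:
    """Gradient descent on the continuous input embedding e (not on discrete tokens).

    For logits = W e, the gradient of -log softmax(W e)[target] with respect to e
    is W^T (p - onehot(target)).
    """
    e = list(e)
    for _ in range(steps):
        p = softmax(logits(w, e))
        err = [pi - (1.0 if i == target else 0.0) for i, pi in enumerate(p)]
        grad = [sum(err[i] * w[i][j] for i in range(len(w))) for j in range(len(e))]
        e = [ej - lr * gj for ej, gj in zip(e, grad)]
    return e


W = [[1.0, 0.0], [0.0, 1.0]]
e0 = [2.0, 0.0]  # the toy model initially prefers class 0
e_adv = embedding_attack(W, e0, target=1)
print(softmax(logits(W, e_adv)))
```

Because the embedding is unconstrained, e can move anywhere in the continuous space. This freedom is one intuition for why such attacks can be more efficient than searching over discrete tokens.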

Title: Exploring Neuron Interactions and Emergence in LLMs: From the Multifractal Analysis Perspective

Authors: Xiongye Xiao, Chenyu Zhou, Heng Ping, Defu Cao, Yaxing Li, Yizhuo Zhou, Shixuan Li, Paul Bogdan
Subjects: cs.AI
Abstract URL: https://arxiv.org/abs/2402.09099
Pdf URL: https://arxiv.org/pdf/2402.09099
Copy Paste: [[2402.09099]] Exploring Neuron Interactions and Emergence in LLMs: From the Multifractal Analysis Perspective(https://arxiv.org/abs/2402.09099)
Keywords: language model, llm
Abstract: Prior studies on emergence in large models have primarily focused on how the functional capabilities of large language models (LLMs) scale with model size. Our research, however, transcends this traditional paradigm, aiming to deepen our understanding of emergence within LLMs by placing special emphasis not just on model size but, more significantly, on the complex behavior of neuron interactions during the training process. By introducing the concepts of "self-organization" and "multifractal analysis," we explore how neuron interactions dynamically evolve during training, leading to "emergence." This mirrors the phenomenon in natural systems where simple micro-level interactions give rise to complex macro-level behaviors. To quantitatively analyze the continuously evolving interactions among neurons in large models during training, we propose Neuron-based Multifractal Analysis (NeuroMFA). Using NeuroMFA, we conduct a comprehensive examination of the emergent behavior of LLMs through the lens of both model size and training process, paving new avenues for research into emergence in large models.
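
The abstract does not give NeuroMFA's formulas, so the sketch below only shows the generic multifractal formalism it builds on. The measure is a textbook binomial multiplicative cascade, not a neuron interaction network. The partition function Z(q, ε) = Σ μ_i^q scales as ε^τ(q), and τ(q) is estimated as the least-squares slope of log Z against log ε. For a cascade with weight p, the exact answer is τ(q) = -log2(p^q + (1 - p)^q), which gives the sketch a check.

```python
import math
from typing import Iterable, List


def binomial_cascade(p: float, level: int) -> List[float]:
    """Box masses of a binomial multiplicative cascade after `level` splits (2**level boxes)."""
    masses = [1.0]
    for _ in range(level):
        masses = [m * w for m in masses for w in (p, 1.0 - p)]
    return masses


def mass_exponent(q: float, p: float, levels: Iterable[int] = range(3, 9)) -> float:
    """Estimate tau(q) from Z(q, eps) = sum_i mu_i**q ~ eps**tau(q) by a least-squares slope."""
    xs, ys = [], []
    for k in levels:
        masses = binomial_cascade(p, k)
        xs.append(math.log(2.0 ** -k))                              # log box size eps
        ys.append(math.log(sum(m ** q for m in masses if m > 0)))   # log partition function
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


# tau(2) estimated from the data versus the closed form for p = 0.3.
print(mass_exponent(2.0, 0.3), -math.log2(0.3 ** 2 + 0.7 ** 2))
```

A τ(q) that is nonlinear in q is the signature of multifractality. Generalized dimensions follow as D_q = τ(q) / (q - 1).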

Title: Exploring the Adversarial Capabilities of Large Language Models

Authors: Lukas Struppek, Minh Hieu Le, Dominik Hintersdorf, Kristian Kersting
Subjects: cs.AI, cs.LG
Abstract URL: https://arxiv.org/abs/2402.09132
Pdf URL: https://arxiv.org/pdf/2402.09132
Copy Paste: [[2402.09132]] Exploring the Adversarial Capabilities of Large Language Models(https://arxiv.org/abs/2402.09132)
Keywords: language model, llm
Abstract: The proliferation of large language models (LLMs) has sparked widespread interest due to their strong language generation capabilities, offering great potential for both industry and research. While previous research has delved into the security and privacy issues of LLMs, the extent to which these models can exhibit adversarial behavior remains largely unexplored. Addressing this gap, we investigate whether common publicly available LLMs have inherent capabilities to perturb text samples to fool safety measures, producing so-called adversarial examples or attacks. More specifically, we investigate whether LLMs are inherently able to craft adversarial examples out of benign samples to fool existing safety rails. Our experiments, which focus on hate speech detection, reveal that LLMs succeed in finding adversarial perturbations, effectively undermining hate speech detection systems. Our findings carry significant implications for (semi-)autonomous systems relying on LLMs, highlighting potential challenges in their interaction with existing systems and safety measures.

Title: DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning

Authors: Yejie Wang, Keqing He, Guanting Dong, Pei Wang, Weihao Zeng, Muxi Diao, Yutao Mou, Mengdi Zhang, Jingang Wang, Xunliang Cai, Weiran Xu
Subjects: cs.CL, cs.AI
Abstract URL: https://arxiv.org/abs/2402.09136
Pdf URL: https://arxiv.org/pdf/2402.09136
Copy Paste: [[2402.09136]] DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning(https://arxiv.org/abs/2402.09136)
Keywords: language model, llm
Abstract: Code Large Language Models (Code LLMs) have demonstrated outstanding performance in code-related tasks. Several instruction tuning approaches have been proposed to boost the code generation performance of pre-trained Code LLMs. In this paper, we introduce a diverse instruction model (DolphCoder) with self-evaluation for code generation. It learns diverse instruction targets and combines them with a code evaluation objective to enhance its code generation ability. Our model achieves superior performance on the HumanEval and MBPP benchmarks, offering new insights for future code instruction tuning work. Our key findings are: (1) Augmenting training data with more diverse responses that follow distinct reasoning paths increases the code capability of LLMs. (2) Improving a model's ability to evaluate the correctness of code solutions also enhances its ability to create them.

Title: Into the Unknown: Self-Learning Large Language Models

Authors: Teddy Ferdinan, Jan Kocoń, Przemysław Kazienko
Subjects: cs.AI
Abstract URL: https://arxiv.org/abs/2402.09147
Pdf URL: https://arxiv.org/pdf/2402.09147
Copy Paste: [[2402.09147]] Into the Unknown: Self-Learning Large Language Models(https://arxiv.org/abs/2402.09147)
Keywords: language model, llm, hallucination
Abstract: We address the main problem of self-learning LLMs: the question of what to learn. We propose a self-learning LLM framework that enables an LLM to independently learn previously unknown knowledge through self-assessment of its own hallucinations. Using the hallucination score, we introduce a new concept of Points in The Unknown (PiUs), along with one extrinsic and three intrinsic methods for automatic PiU identification. This facilitates the creation of a self-learning loop that focuses exclusively on the knowledge gaps in the Points in The Unknown, resulting in a reduced hallucination score. We also developed evaluation metrics for gauging an LLM's self-learning capability. Our experiments revealed that 7B-Mistral models that have been fine-tuned or aligned are capable of self-learning quite well. Our self-learning concept allows more efficient LLM updates and opens new perspectives for knowledge exchange. It may also increase public trust in AI.
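
The abstract identifies Points in The Unknown from a hallucination score but does not define that score. Below is a minimal sketch assuming one plausible intrinsic score based on self-consistency. Sample several answers to the same question. The score is 1 minus the share of samples that agree with the most common answer. Questions at or above a threshold are flagged as PiUs. The score definition, the 0.5 threshold, and the example questions are assumptions, not the paper's methods.

```python
from collections import Counter
from typing import Dict, List


def hallucination_score(samples: List[str]) -> float:
    """1 minus the share of samples agreeing with the most common answer (0 = fully consistent)."""
    normalized = [s.strip().lower() for s in samples]
    most_common_count = Counter(normalized).most_common(1)[0][1]
    return 1.0 - most_common_count / len(normalized)


def points_in_the_unknown(answers: Dict[str, List[str]], threshold: float = 0.5) -> List[str]:
    """Questions whose sampled answers are inconsistent enough to be treated as PiUs."""
    return [q for q, samples in answers.items() if hallucination_score(samples) >= threshold]


# Hypothetical sampled answers from a model.
sampled = {
    "What is the capital of France?": ["Paris", "paris", "Paris", "Paris"],
    "Who won the 2031 Example Prize?": ["Alice", "Bob", "Carol", "Alice"],
}
print(points_in_the_unknown(sampled))
```

In a self-learning loop, the flagged questions would be the ones to research and learn from before re-scoring. How that knowledge is acquired and trained in is outside this sketch.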

Title: Chinese MentalBERT: Domain-Adaptive Pre-training on Social Media for Chinese Mental Health Text Analysis

Authors: Wei Zhai, Hongzhi Qi, Qing Zhao, Jianqiang Li, Ziqi Wang, Han Wang, Bing Xiang Yang, Guanghui Fu
Subjects: cs.CL, cs.LG
Abstract URL: https://arxiv.org/abs/2402.09151
Pdf URL: https://arxiv.org/pdf/2402.09151
Copy Paste: [[2402.09151]] Chinese MentalBERT: Domain-Adaptive Pre-training on Social Media for Chinese Mental Health Text Analysis(https://arxiv.org/abs/2402.09151)
Keywords: language model
Abstract: In the current environment, psychological issues are prevalent and widespread, with social media serving as a key outlet for individuals to share their feelings. This results in the generation of vast quantities of data daily, where negative emotions have the potential to precipitate crisis situations. There is a recognized need for models capable of efficient analysis. While pre-trained language models have demonstrated their effectiveness broadly, there is a noticeable gap in pre-trained models tailored to specialized domains like psychology. To address this, we have collected a huge dataset from Chinese social media platforms and enriched it with publicly available datasets to create a comprehensive database encompassing 3.36 million text entries. To enhance the model's applicability to psychological text analysis, we integrated psychological lexicons into the pre-training masking mechanism. Building on an existing Chinese language model, we performed adaptive training to develop a model specialized for the psychological domain. We assessed our model's effectiveness across four public benchmarks, where it not only surpassed the performance of standard pre-trained models but also showed an inclination toward making psychologically relevant predictions. Due to concerns regarding data privacy, the dataset will not be made publicly available. However, we have made the pre-trained models and code publicly accessible to the community via: https://github.com/zwzzzQAQ/Chinese-MentalBERT.
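
The abstract says psychological lexicons were integrated into the pre-training masking mechanism, without spelling out the policy. Below is a minimal sketch of one plausible lexicon-guided scheme. It masks lexicon words first, then fills the remaining budget with random other tokens, up to a 15% masking rate (BERT's default). The masking order, the rate, the English toy tokens, and the lexicon are assumptions. Whole-word masking and BERT's 80/10/10 replacement rule are left out.

```python
import random
from typing import List, Set, Tuple

MASK = "[MASK]"


def lexicon_guided_mask(
    tokens: List[str], lexicon: Set[str], ratio: float = 0.15, seed: int = 0
) -> Tuple[List[str], List[int]]:
    """Mask about `ratio` of the tokens, choosing lexicon words first, then random others."""
    rng = random.Random(seed)
    budget = max(1, round(len(tokens) * ratio))
    lexicon_pos = [i for i, t in enumerate(tokens) if t in lexicon]
    other_pos = [i for i, t in enumerate(tokens) if t not in lexicon]
    rng.shuffle(lexicon_pos)  # random order among lexicon hits if they exceed the budget
    rng.shuffle(other_pos)
    chosen = sorted((lexicon_pos + other_pos)[:budget])
    chosen_set = set(chosen)
    masked = [MASK if i in chosen_set else t for i, t in enumerate(tokens)]
    return masked, chosen


tokens = ["i", "have", "felt", "anxious", "and", "sleepless", "for",
          "weeks", "now", "and", "it", "is", "getting", "worse"]
lexicon = {"anxious", "sleepless"}  # hypothetical psychological lexicon entries
print(lexicon_guided_mask(tokens, lexicon))
```

With 14 tokens the 15% budget is two positions, and both go to the lexicon words. The model is therefore forced to predict domain-relevant terms from context.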

Title: Attacking Large Language Models with Projected Gradient Descent

Authors: Simon Geisler, Tom Wollschläger, M. H. I. Abdalla, Johannes Gasteiger, Stephan Günnemann
Subjects: cs.LG
Abstract URL: https://arxiv.org/abs/2402.09154
Pdf URL: https://arxiv.org/pdf/2402.09154
Copy Paste: [[2402.09154]] Attacking Large Language Models with Projected Gradient Descent(https://arxiv.org/abs/2402.09154)
Keywords: language model, llm, prompt
Abstract: Current LLM alignment methods are readily broken through specifically crafted adversarial prompts. While crafting adversarial prompts using discrete optimization is highly effective, such attacks typically use more than 100,000 LLM calls. This high computational cost makes them unsuitable for, e.g., quantitative analyses and adversarial training. To remedy this, we revisit Projected Gradient Descent (PGD) on the continuously relaxed input prompt. Although previous attempts with ordinary gradient-based attacks largely failed, we show that carefully controlling the error introduced by the continuous relaxation tremendously boosts their efficacy. Our PGD for LLMs is up to one order of magnitude faster than state-of-the-art discrete optimization to achieve the same devastating attack results.
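
Projected Gradient Descent on a continuously relaxed prompt needs a projection back onto the feasible set after each gradient step. For a relaxed one-hot token vector, a natural feasible set is the probability simplex. Below is a minimal sketch of the standard sort-based Euclidean projection onto the simplex, plus one PGD step. The abstract says the authors additionally control the error introduced by the relaxation; those extra steps are not reproduced here, and the gradient below is a made-up toy.

```python
from typing import List


def project_to_simplex(v: List[float]) -> List[float]:
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}.

    Sort-based algorithm: find the threshold theta such that max(v - theta, 0) sums to 1.
    """
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:  # the last j satisfying this condition defines theta
            theta = t
    return [max(x - theta, 0.0) for x in v]


def pgd_step(x: List[float], grad: List[float], lr: float) -> List[float]:
    """One projected gradient descent step on a relaxed one-hot token vector."""
    return project_to_simplex([xi - lr * gi for xi, gi in zip(x, grad)])


x = [1 / 3, 1 / 3, 1 / 3]  # uniform relaxation over a toy 3-token vocabulary
print(pgd_step(x, grad=[1.0, 0.0, 0.0], lr=1.0))
```

After optimization, a discrete prompt would still have to be recovered from the relaxed vectors, for example by taking the argmax per position.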

Title: Role-Playing Simulation Games using ChatGPT

Authors: Rita Stampfl, Igor Ivkić, Barbara Geyer
Subjects: cs.AI, cs.HC
Abstract URL: https://arxiv.org/abs/2402.09161
Pdf URL: https://arxiv.org/pdf/2402.09161
Copy Paste: [[2402.09161]] Role-Playing Simulation Games using ChatGPT(https://arxiv.org/abs/2402.09161)
Keywords: language model, gpt, llm, chat
Abstract: Since the COVID-19 pandemic, educational institutions have embarked on digital transformation projects. The success of these projects depends on integrating new technologies and understanding the needs of digitally literate students. The "learning by doing" approach suggests that real success in learning new skills is achieved when students can try out and practise these skills. In this article, we demonstrate how Large Language Models (LLMs) can enhance the quality of teaching by using ChatGPT in a role-playing simulation game scenario to promote active learning. Moreover, we discuss how LLMs can boost students' interest in learning by allowing them to practice real-life scenarios using ChatGPT.

Title: Leveraging the Context through Multi-Round Interactions for Jailbreaking Attacks

Authors: Yixin Cheng, Markos Georgopoulos, Volkan Cevher, Grigorios G. Chrysos
Subjects: cs.LG, cs.AI, cs.CL
Abstract URL: https://arxiv.org/abs/2402.09177
Pdf URL: https://arxiv.org/pdf/2402.09177
Copy Paste: [[2402.09177]] Leveraging the Context through Multi-Round Interactions for Jailbreaking Attacks(https://arxiv.org/abs/2402.09177)
Keywords: language model, llm
Abstract: Large Language Models (LLMs) are susceptible to Jailbreaking attacks, which aim to extract harmful information by subtly modifying the attack query. As defense mechanisms evolve, directly obtaining harmful information becomes increasingly challenging for Jailbreaking attacks. In this work, inspired by the human practice of using indirect context to elicit harmful information, we focus on a new attack form called Contextual Interaction Attack. The idea relies on the autoregressive nature of the generation process in LLMs. We contend that the prior context (the information preceding the attack query) plays a pivotal role in enabling potent Jailbreaking attacks. Specifically, we propose an approach that leverages preliminary question-answer pairs to interact with the LLM. By doing so, we guide the responses of the model toward revealing the 'desired' harmful information. We conduct experiments on four different LLMs and demonstrate the efficacy of this attack, which is black-box and can also transfer across LLMs. We believe this can lead to further developments and understanding of the context vector in LLMs.

Title: (Ir)rationality and Cognitive Biases in Large Language Models

Authors: Olivia Macmillan-Scott, Mirco Musolesi
Subjects: cs.CL, cs.AI, cs.HC
Abstract URL: https://arxiv.org/abs/2402.09193
Pdf URL: https://arxiv.org/pdf/2402.09193
Copy Paste: [[2402.09193]] (Ir)rationality and Cognitive Biases in Large Language Models(https://arxiv.org/abs/2402.09193)
Keywords: language model, llm
Abstract: Do large language models (LLMs) display rational reasoning? LLMs have been shown to contain human biases due to the data they have been trained on; whether this is reflected in rational reasoning remains less clear. In this paper, we answer this question by evaluating seven language models using tasks from the cognitive psychology literature. We find that, like humans, LLMs display irrationality in these tasks. However, the way this irrationality is displayed does not reflect that shown by humans. When incorrect answers are given by LLMs to these tasks, they are often incorrect in ways that differ from human-like biases. On top of this, the LLMs reveal an additional layer of irrationality in the significant inconsistency of the responses. Aside from the experimental results, this paper seeks to make a methodological contribution by showing how we can assess and compare different capabilities of these types of models, in this case with respect to rational reasoning.

Title: Ten Words Only Still Help: Improving Black-Box AI-Generated Text Detection via Proxy-Guided Efficient Re-Sampling

Authors: Yuhui Shi, Qiang Sheng, Juan Cao, Hao Mi, Beizhe Hu, Danding Wang
Subjects: cs.CL, cs.AI, cs.LG
Abstract URL: https://arxiv.org/abs/2402.09199
Pdf URL: https://arxiv.org/pdf/2402.09199
Copy Paste: [[2402.09199]] Ten Words Only Still Help: Improving Black-Box AI-Generated Text Detection via Proxy-Guided Efficient Re-Sampling(https://arxiv.org/abs/2402.09199)
Keywords: language model, llm
Abstract: With the rapidly increasing application of large language models (LLMs), their abuse has caused many undesirable societal problems such as fake news, academic dishonesty, and information pollution. This makes AI-generated text (AIGT) detection of great importance. Among existing methods, white-box methods are generally superior to black-box methods in terms of performance and generalizability, but they require access to LLMs' internal states and are not applicable to black-box settings. In this paper, we propose to estimate word generation probabilities as pseudo white-box features via multiple re-sampling to help improve AIGT detection under the black-box setting. Specifically, we design POGER, a proxy-guided efficient re-sampling method, which selects a small subset of representative words (e.g., 10 words) for performing multiple re-sampling in black-box AIGT detection. Experiments on datasets containing texts from humans and seven LLMs show that POGER outperforms all baselines in macro F1 under black-box, partial white-box, and out-of-distribution settings and maintains lower re-sampling costs than its existing counterparts.
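
The core idea is to turn a black-box model into pseudo white-box features by estimating word generation probabilities from repeated sampling, at only a few selected positions. Below is a minimal sketch of that estimator: P(word | prefix) is approximated by the fraction of k re-samples that reproduce the word. The `Sampler` callable and the toy 70/30 sampler are hypothetical stand-ins for querying an LLM API. POGER's proxy-guided selection of which words to re-sample is not shown; the positions are simply given.

```python
import random
from typing import Callable, Dict, List

Sampler = Callable[[str], str]  # hypothetical black box: prefix -> one sampled next word


def estimate_word_prob(sampler: Sampler, prefix: str, word: str, k: int) -> float:
    """Estimate P(word | prefix) as the share of k black-box re-samples equal to `word`."""
    hits = sum(1 for _ in range(k) if sampler(prefix) == word)
    return hits / k


def pseudo_white_box_features(
    sampler: Sampler, words: List[str], positions: List[int], k: int = 20
) -> Dict[int, float]:
    """Estimate generation probabilities only at a few selected positions (e.g. 10 words)."""
    feats = {}
    for i in positions:
        prefix = " ".join(words[:i])
        feats[i] = estimate_word_prob(sampler, prefix, words[i], k)
    return feats


rng = random.Random(0)


def toy_sampler(prefix: str) -> str:
    # Stand-in for one black-box LLM call: emits "cat" 70% of the time, "dog" otherwise.
    return "cat" if rng.random() < 0.7 else "dog"


print(pseudo_white_box_features(toy_sampler, ["the", "cat", "sat"], [1], k=2000))
```

The estimate's standard error shrinks like 1/sqrt(k), so querying only a few positions is what keeps repeated sampling affordable.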

Title: Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents

Authors: Cheng Qian, Bingxiang He, Zhong Zhuang, Jia Deng, Yujia Qin, Xin Cong, Yankai Lin, Zhong Zhang, Zhiyuan Liu, Maosong Sun
Subjects: cs.CL, cs.AI, cs.HC
Abstract URL: https://arxiv.org/abs/2402.09205
Pdf URL: https://arxiv.org/pdf/2402.09205
Copy Paste: [[2402.09205]] Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents(https://arxiv.org/abs/2402.09205)
Keywords: language model, agent
Abstract: Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions. Although adept at devising strategies and performing tasks, these agents struggle to seek clarification and grasp precise user intentions. To bridge this gap, we introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries. Next, we propose incorporating model experts upstream in agent designs to enhance user-agent interaction. Employing IN3, we empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires about user intentions, and refines them into actionable goals before starting downstream agent task execution. Integrating it into the XAgent framework, we comprehensively evaluate the enhanced agent system on user instruction understanding and execution. The evaluation reveals that our approach notably excels at identifying vague user tasks, recovering and summarizing critical missing information, setting precise and necessary agent execution goals, and minimizing redundant tool usage, thus boosting overall efficiency. All the data and code are released.

Title: Scaling the Authoring of AutoTutors with Large Language Models

Authors: Sankalan Pal Chowdhury, Vilém Zouhar, Mrinmaya Sachan
Subjects: cs.CL, cs.HC
Abstract URL: https://arxiv.org/abs/2402.09216
Pdf URL: https://arxiv.org/pdf/2402.09216
Copy Paste: [[2402.09216]] Scaling the Authoring of AutoTutors with Large Language Models(https://arxiv.org/abs/2402.09216)
Keywords: language model, gpt, llm
Abstract: Large Language Models (LLMs) have found several use cases in education, ranging from automatic question generation to essay evaluation. In this paper, we explore the potential of using LLMs to author Intelligent Tutoring Systems. A common pitfall of LLMs is that they stray from desired pedagogical strategies, for example by leaking the answer to the student, and in general they provide no guarantees. We posit that while LLMs with certain guardrails can take the place of subject experts, the overall pedagogical design still needs to be handcrafted for the best learning results. Based on this principle, we create a sample end-to-end tutoring system named MWPTutor, which uses LLMs to fill in the state space of a pre-defined finite state transducer. This approach retains the structure and pedagogy of traditional tutoring systems that have been developed over the years by learning scientists, while bringing in the additional flexibility of LLM-based approaches. Through a human evaluation study on two datasets of math word problems, we show that our hybrid approach achieves a better overall tutoring score than an instructed, but otherwise free-form, GPT-4. MWPTutor is completely modular and gives the community scope to improve its performance by improving individual modules or by using different teaching strategies that it can follow.
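
The abstract describes a hand-designed finite state transducer whose state space is filled in by LLMs. Below is a minimal sketch of that division of labor. A fixed transition table encodes the pedagogy: hint first, then step-by-step help, never jumping straight to the answer. A `classify` function, where an LLM could plug in, maps the student's free-text reply to an input symbol. The states, symbols, acts, and the keyword-based stub classifier are hypothetical, not MWPTutor's actual design.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hand-designed policy: (state, input symbol) -> (next state, tutoring act).
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("ASK", "correct"): ("DONE", "praise"),
    ("ASK", "wrong"): ("HINT", "give_hint"),
    ("HINT", "correct"): ("DONE", "praise"),
    ("HINT", "wrong"): ("STEP", "walk_through_step"),
    ("STEP", "correct"): ("DONE", "praise"),
    ("STEP", "wrong"): ("STEP", "walk_through_step"),
}


def run_tutor(replies: Iterable[str], classify: Callable[[str], str]) -> Tuple[str, List[str]]:
    """Drive the transducer over student replies; `classify` maps free text to a symbol."""
    state, acts = "ASK", []
    for reply in replies:
        if state == "DONE":
            break
        state, act = TRANSITIONS[(state, classify(reply))]
        acts.append(act)
    return state, acts


def stub_classify(reply: str) -> str:
    # Placeholder for an LLM judging the reply; here the expected answer is 42.
    return "correct" if "42" in reply else "wrong"


print(run_tutor(["40", "maybe 41?", "it is 42"], stub_classify))
```

Because the transitions are fixed, the LLM can only change how each act is worded, not which act comes next. This is one way to keep the pedagogical guarantees the abstract says free-form LLM tutors lack.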

Title: Spectral Filters, Dark Signals, and Attention Sinks

Authors: Nicola Cancedda
Subjects: cs.AI, cs.CL
Abstract URL: https://arxiv.org/abs/2402.09221
Pdf URL: https://arxiv.org/pdf/2402.09221
Copy Paste: [[2402.09221]] Spectral Filters, Dark Signals, and Attention Sinks(https://arxiv.org/abs/2402.09221)
Keywords: llm
Abstract: Projecting intermediate representations onto the vocabulary, also known as the logit lens, is an increasingly popular interpretation tool for transformer-based LLMs. We propose a quantitative extension to this approach and define spectral filters on intermediate representations based on partitioning the singular vectors of the vocabulary embedding and unembedding matrices into bands. We find that the signals exchanged in the tail end of the spectrum are responsible for attention sinking (Xiao et al. 2023), for which we provide an explanation. We find that the loss of pretrained models can be kept low despite suppressing sizable parts of the embedding spectrum in a layer-dependent way, as long as attention sinking is preserved. Finally, we discover that the representations of tokens that draw attention from many tokens have large projections on the tail end of the spectrum.

Title: Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models

Authors: Goutham Rajendran, Simon Buchholz, Bryon Aragam, Bernhard Schölkopf, Pradeep Ravikumar
Subjects: cs.LG, cs.AI, math.ST, stat.ML
Abstract URL: https://arxiv.org/abs/2402.09236
Pdf URL: https://arxiv.org/pdf/2402.09236
Copy Paste: [[2402.09236]] Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models(https://arxiv.org/abs/2402.09236)
Keywords: language model
Abstract: To build intelligent machine learning systems, there are two broad approaches. One approach is to build inherently interpretable models, as endeavored by the growing field of causal representation learning. The other approach is to build highly-performant foundation models and then invest efforts into understanding how they work. In this work, we relate these two approaches and study how to learn human-interpretable concepts from data. Weaving together ideas from both fields, we formally define a notion of concepts and show that they can be provably recovered from diverse data. Experiments on synthetic data and large language models show the utility of our unified approach.

Title: Switch EMA: A Free Lunch for Better Flatness and Sharpness

Authors: Siyuan Li, Zicheng Liu, Juanxi Tian, Ge Wang, Zedong Wang, Weiyang Jin, Di Wu, Cheng Tan, Tao Lin, Yang Liu, Baigui Sun, Stan Z. Li
Subjects: cs.LG, cs.CV
Abstract URL: https://arxiv.org/abs/2402.09240
Pdf URL: https://arxiv.org/pdf/2402.09240
Copy Paste: [[2402.09240]] Switch EMA: A Free Lunch for Better Flatness and Sharpness(https://arxiv.org/abs/2402.09240)
Keywords: language model
Abstract: Exponential Moving Average (EMA) is a widely used weight averaging (WA) regularization that learns flat optima for better generalization without extra cost in deep neural network (DNN) optimization. Despite achieving better flatness, existing WA methods may yield worse final performance or require extra test-time computation. This work unveils the full potential of EMA with a single-line modification, i.e., switching the EMA parameters to the original model after each epoch, dubbed Switch EMA (SEMA). From both theoretical and empirical perspectives, we demonstrate that SEMA can help DNNs reach generalization optima that better trade off flatness and sharpness. To verify the effectiveness of SEMA, we conduct comparison experiments with discriminative, generative, and regression tasks on vision and language datasets, including image classification, self-supervised learning, object detection and segmentation, image generation, video prediction, attribute regression, and language modeling. Comprehensive results with popular optimizers and networks show that SEMA is a free lunch for DNN training, improving performance and boosting convergence speed.
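The "single line of modification" can be illustrated with a toy SGD loop in Python. This is a sketch under assumptions: a scalar quadratic objective, plain SGD, a fixed EMA decay of 0.9, and ten steps per "epoch" are all hypothetical choices, and the paper's handling of optimizer state or decay schedules may differ. The only SEMA-specific line is the copy of the EMA weights back into the model at each epoch boundary.

```python
from typing import Callable, List, Tuple


def ema_update(ema: List[float], params: List[float], decay: float) -> None:
    """In place: ema <- decay * ema + (1 - decay) * params."""
    for i, p in enumerate(params):
        ema[i] = decay * ema[i] + (1.0 - decay) * p


def train_switch_ema(params: List[float],
                     grad_fn: Callable[[List[float]], List[float]],
                     lr: float = 0.1, decay: float = 0.9,
                     epochs: int = 3, steps_per_epoch: int = 10) -> Tuple[List[float], List[float]]:
    """Plain SGD that also tracks an EMA of the weights. At the end of every
    epoch the EMA weights are copied back into the model (the 'switch')."""
    params = list(params)
    ema = list(params)
    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            grads = grad_fn(params)
            params = [p - lr * g for p, g in zip(params, grads)]
            ema_update(ema, params, decay)
        params = list(ema)  # the one-line SEMA change; plain EMA would skip it
    return params, ema


# Toy objective 0.5 * (w - 3)^2, whose gradient is (w - 3).
final, ema = train_switch_ema([0.0], lambda w: [w[0] - 3.0])
```

Without the switch line, the EMA copy would only be used at evaluation time while training continued from the raw SGD iterate.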

Title: SyntaxShap: Syntax-aware Explainability Method for Text Generation

Authors: Kenza Amara, Rita Sevastjanova, Mennatallah El-Assady
Subjects: cs.CL, cs.AI
Abstract URL: https://arxiv.org/abs/2402.09259
Pdf URL: https://arxiv.org/pdf/2402.09259
Copy Paste: [[2402.09259]] SyntaxShap: Syntax-aware Explainability Method for Text Generation(https://arxiv.org/abs/2402.09259)
Keywords: language model
Abstract: To harness the power of large language models in safety-critical domains, we need to ensure the explainability of their predictions. However, despite the significant attention paid to model interpretability, explaining sequence-to-sequence tasks with methods tailored to textual data remains largely unexplored. This paper introduces SyntaxShap, a local, model-agnostic explainability method for text generation that takes into consideration the syntax in the text data. The presented work extends Shapley values to account for parsing-based syntactic dependencies. Taking a game-theoretic approach, SyntaxShap only considers coalitions constrained by the dependency tree. We adopt a model-based evaluation to compare SyntaxShap and its weighted form to state-of-the-art explainability methods adapted to text generation tasks, using diverse metrics including faithfulness, complexity, coherency, and semantic alignment of the explanations to the model. We show that our syntax-aware method produces more faithful, coherent, and interpretable explanations for the predictions of autoregressive models.
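One way to read "coalitions constrained by the dependency tree" is sketched below in Python: average each token's marginal contribution only over orderings in which every head enters before its dependents, so each intermediate coalition is a connected subtree. This is an illustrative assumption, not SyntaxShap's exact coalition set or weighting, and the toy `value` function stands in for the model's score of the generated token given a partially masked input.

```python
from itertools import permutations
from typing import Callable, FrozenSet, List, Optional


def tree_shapley(heads: List[Optional[int]],
                 value: Callable[[FrozenSet[int]], float]) -> List[float]:
    """Average marginal contributions over orderings that respect a dependency
    tree. heads[i] is the index of token i's head, or None for the root.
    Because every head must enter before its dependents, each intermediate
    coalition is a connected subtree that contains the root."""
    n = len(heads)
    totals = [0.0] * n
    kept = 0
    for order in permutations(range(n)):
        pos = {tok: k for k, tok in enumerate(order)}
        if any(h is not None and pos[h] > pos[i] for i, h in enumerate(heads)):
            continue  # this ordering would build a coalition that breaks the tree
        kept += 1
        coalition: FrozenSet[int] = frozenset()
        for tok in order:
            grown = coalition | {tok}
            totals[tok] += value(grown) - value(coalition)
            coalition = grown
    return [t / kept for t in totals]


# "She likes tea": She -> likes, tea -> likes, and "likes" is the root.
heads = [1, None, 1]
# Toy value: the prediction only "fires" once both subject and object are present.
phi = tree_shapley(heads, lambda s: 1.0 if {0, 2} <= s else 0.0)
```

Here the root verb gets zero credit and the subject and object split it evenly. Enumerating permutations is factorial in sentence length, so this sketch only suits very short inputs.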

Title: Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation

Authors: Xiaoying Zhang, Baolin Peng, Ye Tian, Jingyan Zhou, Lifeng Jin, Linfeng Song, Haitao Mi, Helen Meng
Subjects: cs.CL, cs.AI
Abstract URL: https://arxiv.org/abs/2402.09267
Pdf URL: https://arxiv.org/pdf/2402.09267
Copy Paste: [[2402.09267]] Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation(https://arxiv.org/abs/2402.09267)
Keywords: language model, llm, hallucination, prompt
Abstract: Despite showing increasingly human-like abilities, large language models (LLMs) often struggle with factual inaccuracies, i.e., "hallucinations", even when they hold relevant knowledge. To address these hallucinations, current approaches typically require high-quality human factuality annotations. In this work, we explore Self-Alignment for Factuality, where we leverage the self-evaluation capability of an LLM to provide training signals that steer the model towards factuality. Specifically, we incorporate Self-Eval, a self-evaluation component, to prompt an LLM to validate the factuality of its own generated responses solely based on its internal knowledge. Additionally, we design Self-Knowledge Tuning (SK-Tuning) to augment the LLM's self-evaluation ability by improving the model's confidence estimation and calibration. We then use these self-annotated responses to fine-tune the model via the Direct Preference Optimization (DPO) algorithm. We show that the proposed self-alignment approach substantially improves the factual accuracy of Llama-family models on three key knowledge-intensive tasks from TruthfulQA and BioGEN.
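The last step, fine-tuning on self-annotated responses via DPO, can be sketched in Python. The DPO per-pair loss below is the standard form, -log σ(β[(log π(y_w) - log π_ref(y_w)) - (log π(y_l) - log π_ref(y_l))]), computed from summed sequence log-probabilities. The pairing rule in `make_pair` (highest versus lowest self-evaluated confidence) is a hypothetical simplification of how self-annotations might become preference pairs, not necessarily the paper's procedure.

```python
import math
from typing import List, Tuple


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x)) = log(1 + exp(-x))."""
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one preference pair, from summed sequence log-probs
    under the policy (pi_*) and the frozen reference model (ref_*)."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return neg_log_sigmoid(beta * margin)


def make_pair(candidates: List[Tuple[str, float]]) -> Tuple[str, str]:
    """Hypothetical pairing rule: the response the model itself rates most
    likely to be factual is 'chosen', the lowest-rated one is 'rejected'.
    Each candidate is (response_text, self_eval_confidence)."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return ranked[0][0], ranked[-1][0]
```

When the policy equals the reference, the margin is zero and the loss is log 2; raising the chosen response's relative likelihood pushes it below that.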

Title: Personalized Large Language Models

Authors: Stanisław Woźniak, Bartłomiej Koptyra, Arkadiusz Janz, Przemysław Kazienko, Jan Kocoń
Subjects: cs.CL, cs.AI
Abstract URL: https://arxiv.org/abs/2402.09269
Pdf URL: https://arxiv.org/pdf/2402.09269
Copy Paste: [[2402.09269]] Personalized Large Language Models(https://arxiv.org/abs/2402.09269)
Keywords: language model, llm, chat
Abstract: Large language models (LLMs) have significantly advanced Natural Language Processing (NLP) tasks in recent years. However, their universal nature poses limitations in scenarios requiring personalized responses, such as recommendation systems and chatbots. This paper investigates methods to personalize LLMs, comparing fine-tuning and zero-shot reasoning approaches on subjective tasks. Results demonstrate that personalized fine-tuning improves model reasoning compared to non-personalized models. Experiments on datasets for emotion recognition and hate speech detection show consistent performance gains with personalized methods across different LLM architectures. These findings underscore the importance of personalization for enhancing LLM capabilities in subjective text perception tasks.

Title: Leveraging Large Language Models for Enhanced NLP Task Performance through Knowledge Distillation and Optimized Training Strategies

Authors: Yining Huang
Subjects: cs.CL
Abstract URL: https://arxiv.org/abs/2402.09282
Pdf URL: https://arxiv.org/pdf/2402.09282
Copy Paste: [[2402.09282]] Leveraging Large Language Models for Enhanced NLP Task Performance through Knowledge Distillation and Optimized Training Strategies(https://arxiv.org/abs/2402.09282)
Keywords: language model, gpt, llm, hallucination, prompt
Abstract: The integration of Large Language Models (LLMs) like GPT-4 into traditional Natural Language Processing (NLP) tasks has opened new avenues for enhancing model performance while reducing the reliance on extensive human annotations. This paper presents a novel approach that leverages the Chain of Thought (CoT) prompting technique to distill knowledge from GPT-4, subsequently applying it to improve the efficiency and effectiveness of a smaller model, BERT, on Named Entity Recognition (NER) tasks. Our method involves a two-phase training process: initially employing GPT-4 annotated data for pre-training and then refining the model with a combination of distilled and original human-annotated data. The results demonstrate that our mixed-training strategy significantly outperforms models trained solely on human annotations, achieving superior F1-scores and showcasing a cost-effective solution for resource-limited or closed-network settings. The study also discusses the challenges encountered, such as LLM output variability and the tendency towards hallucinations, proposing future work directions to enhance prompt design and annotation selection. Our findings indicate a promising synergy between LLM insights and traditional NLP techniques, paving the way for more accessible and robust NLP applications.

Title: Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey

Authors: Zhichen Dong, Zhanhui Zhou, Chao Yang, Jing Shao, Yu Qiao
Subjects: cs.CL
Abstract URL: https://arxiv.org/abs/2402.09283
Pdf URL: https://arxiv.org/pdf/2402.09283
Copy Paste: [[2402.09283]] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey(https://arxiv.org/abs/2402.09283)
Keywords: language model, llm
Abstract: Large Language Models (LLMs) are now commonplace in conversation applications. However, their risks of misuse for generating harmful responses have raised serious societal concerns and spurred recent research on LLM conversation safety. Therefore, in this survey, we provide a comprehensive overview of recent studies, covering three critical aspects of LLM conversation safety: attacks, defenses, and evaluations. Our goal is to provide a structured summary that enhances understanding of LLM conversation safety and encourages further investigation into this important subject. For easy reference, we have categorized all the studies mentioned in this survey according to our taxonomy, available at: https://github.com/niconi19/LLM-conversation-safety.

Title: ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization

Authors: Feifan Song, Yuxuan Fan, Xin Zhang, Peiyi Wang, Houfeng Wang
Subjects: cs.CL, cs.AI
Abstract URL: https://arxiv.org/abs/2402.09320
Pdf URL: https://arxiv.org/pdf/2402.09320
Copy Paste: [[2402.09320]] ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization(https://arxiv.org/abs/2402.09320)
Keywords: language model, llm
Abstract: Large Language Models (LLMs) rely on Human Preference Alignment (HPA) to ensure the generation of safe content. Due to the heavy cost of fine-tuning, fine-tuning-free methods have emerged, which typically modify LLM decoding with external auxiliary methods. However, these methods do not fundamentally enhance the LLM itself. In this paper, we revisit the derivation of DPO and, building on it, construct an instant scorer from the states of the LLM before and after In-Context Learning (ICL). Accordingly, we propose a novel approach called In-Context Direct Preference Optimization (ICDPO). It enables LLMs to borrow HPA capabilities from superior LLMs through ICL, generating well-aligned responses as estimated by the aforementioned instant scorer and thereby enhancing final performance. ICDPO can be further enhanced with a two-stage retriever and an upgraded scorer, both of which offer benefits. Extensive experiments show its effectiveness, particularly in outperforming two fine-tuning-free baselines, and show that it is competitive with SFT + LoRA. We also conduct detailed analyses to offer comprehensive insights into ICDPO.
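A minimal Python sketch of an "instant scorer built from the LLM's states before and after ICL": score each sampled candidate by how much its log-probability rises when aligned demonstrations are added to the context, a DPO-style implicit reward with the ICL-conditioned model as "policy" and the bare model as "reference". The log-probability tables are made-up toy numbers, and the exact scorer and the retrieval stages in ICDPO may differ.

```python
from typing import Callable, List


def icdpo_select(candidates: List[str],
                 logp_with_demos: Callable[[str], float],
                 logp_without_demos: Callable[[str], float]) -> str:
    """Pick the candidate whose log-probability increases most once aligned
    demonstrations are placed in the context."""
    def score(y: str) -> float:
        return logp_with_demos(y) - logp_without_demos(y)
    return max(candidates, key=score)


# Toy log-probabilities; a real system would query the same LLM twice,
# once with the demonstrations in the prompt and once without.
with_demos = {"helpful refusal": -12.0, "harmful answer": -10.0}
without_demos = {"helpful refusal": -15.0, "harmful answer": -9.0}
best = icdpo_select(list(with_demos), with_demos.__getitem__, without_demos.__getitem__)
```

Note that "harmful answer" has the higher raw log-probability even with demonstrations, but "helpful refusal" gains far more from them (+3 versus -1), so the scorer picks it.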

Title: AuditLLM: A Tool for Auditing Large Language Models Using Multiprobe Approach

Authors: Maryam Amirizaniani, Tanya Roosta, Aman Chadha, Chirag Shah
Subjects: cs.AI
Abstract URL: https://arxiv.org/abs/2402.09334
Pdf URL: https://arxiv.org/pdf/2402.09334
Copy Paste: [[2402.09334]] AuditLLM: A Tool for Auditing Large Language Models Using Multiprobe Approach(https://arxiv.org/abs/2402.09334)
Keywords: language model, llm, hallucination
Abstract: As Large Language Models (LLMs) gain wider adoption in various contexts, it becomes crucial to ensure they are reasonably safe, consistent, and reliable for the application at hand. This may require probing or auditing them. Probing LLMs with varied iterations of a single question could reveal potential inconsistencies in their knowledge or functionality. However, a tool for performing such audits with a simple workflow and a low technical barrier is lacking. In this demo, we introduce "AuditLLM," a novel tool designed to evaluate the performance of various LLMs in a methodical way. AuditLLM's core functionality lies in its ability to test a given LLM by auditing it using multiple probes generated from a single question, thereby identifying any inconsistencies in the model's understanding or operation. A reasonably robust, reliable, and consistent LLM should output semantically similar responses for a question asked differently or by different people. Based on this assumption, AuditLLM produces easily interpretable results regarding the LLM's consistency from a single question that the user enters. A certain level of inconsistency has been shown to be an indicator of potential bias, hallucinations, and other issues. One could then use the output of AuditLLM to further investigate issues with the audited LLM. To facilitate demonstration and practical use, AuditLLM offers two key modes: (1) Live mode, which allows instant auditing of LLMs by analyzing responses to real-time queries; (2) Batch mode, which facilitates comprehensive LLM auditing by processing multiple queries at once for in-depth analysis. This tool is beneficial for both researchers and general users, as it uses a standardized auditing platform to enhance our understanding of LLMs' capabilities in generating responses.
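The consistency check AuditLLM is built around ("semantically similar responses for a question asked differently") can be sketched in Python as the mean pairwise similarity of the answers to all probes of one question. As a loud assumption, a bag-of-words cosine stands in for whatever semantic similarity model the tool actually uses; an embedding model would be the natural replacement.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List


def cosine_bow(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; a crude stand-in for an
    embedding-based semantic similarity model."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def consistency_score(responses: List[str]) -> float:
    """Mean pairwise similarity of the LLM's answers to all probes
    generated from one question (1.0 = fully consistent)."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 1.0  # a single response is trivially consistent with itself
    return sum(cosine_bow(a, b) for a, b in pairs) / len(pairs)
```

A low score for one question would then flag it for the kind of follow-up investigation the abstract describes; the threshold for "inconsistent" is left to the user here.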

Title: Mitigating Reward Hacking via Information-Theoretic Reward Modeling

Authors: Yuchun Miao, Sen Zhang, Liang Ding, Rong Bao, Lefei Zhang, Dacheng Tao
Subjects: cs.LG, cs.AI
Abstract URL: https://arxiv.org/abs/2402.09345
Pdf URL: https://arxiv.org/pdf/2402.09345
Copy Paste: [[2402.09345]] Mitigating Reward Hacking via Information-Theoretic Reward Modeling(https://arxiv.org/abs/2402.09345)
Keywords: language model
Abstract: Despite the success of reinforcement learning from human feedback (RLHF) in aligning language models with human values, reward hacking, also termed reward overoptimization, remains a critical challenge. It stems primarily from limitations in reward modeling, namely the limited generalizability of the reward model and inconsistency in the preference dataset. In this work, we tackle this problem from an information-theoretic perspective and propose a generalizable and robust framework for reward modeling, InfoRM, which introduces a variational information bottleneck objective to filter out irrelevant information and a mechanism for modulating model complexity. Notably, we further identify a correlation between overoptimization and outliers in the latent space, establishing InfoRM as a promising tool for detecting reward overoptimization. Inspired by this finding, we propose the Integrated Cluster Deviation Score (ICDS), which quantifies deviations in the latent space, as an indicator of reward overoptimization to facilitate the development of online mitigation strategies. Extensive experiments on a wide range of settings and model scales (70M, 440M, 1.4B, and 7B) support the effectiveness of InfoRM. Further analyses reveal that InfoRM's overoptimization detection mechanism is effective, potentially signifying a notable advancement in the field of RLHF. Code will be released upon acceptance.
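A variational information bottleneck on a reward model is usually written as a preference loss plus a β-weighted KL penalty that pushes each response's latent code toward a standard normal. The Python sketch below shows that generic form, using the closed-form KL for a diagonal Gaussian and a Bradley-Terry preference loss. It is an assumption about the general shape of such an objective; InfoRM's exact objective, its latent parameterization, and the ICDS detector are not reproduced here.

```python
import math
from typing import Sequence


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), the usual VIB regularizer."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def vib_reward_loss(r_chosen: float, r_rejected: float,
                    kl_chosen: float, kl_rejected: float, beta: float = 1e-3) -> float:
    """Bradley-Terry preference loss, -log sigmoid(r_chosen - r_rejected),
    plus a beta-weighted bottleneck penalty on both responses' latents."""
    margin = r_chosen - r_rejected
    bt = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))  # stable -log sigmoid
    return bt + beta * (kl_chosen + kl_rejected)
```

Larger `beta` squeezes more information out of the latent code, which is the knob a bottleneck objective uses to discard features irrelevant to human preference.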

Title: Developing a Framework for Auditing Large Language Models Using Human-in-the-Loop

Authors: Maryam Amirizaniani, Jihan Yao, Adrian Lavergne, Elizabeth Snell Okada, Aman Chadha, Tanya Roosta, Chirag Shah
Subjects: cs.AI
Abstract URL: https://arxiv.org/abs/2402.09346
Pdf URL: https://arxiv.org/pdf/2402.09346
Copy Paste: [[2402.09346]] Developing a Framework for Auditing Large Language Models Using Human-in-the-Loop(https://arxiv.org/abs/2402.09346)
Keywords: language model, llm, hallucination, prompt
Abstract: As LLMs become more pervasive across various users and scenarios, identifying potential issues when using these models becomes essential. Examples include bias, inconsistencies, and hallucination. Although auditing the LLM for these problems is desirable, it is far from easy or solved. An effective method is to probe the LLM using different versions of the same question. This could expose inconsistencies in its knowledge or operation, indicating potential for bias or hallucination. However, to operationalize this auditing method at scale, we need an approach to create those probes reliably and automatically. In this paper we propose an automatic and scalable solution in which a different LLM is used together with a human in the loop. This approach offers verifiability and transparency while avoiding circular reliance on the same LLMs, and it increases scientific rigor and generalizability. Specifically, we present a novel methodology with two phases of verification using humans: standardized evaluation criteria to verify responses, and a structured prompt template to generate desired probes. Experiments on a set of questions from the TruthfulQA dataset show that we can generate a reliable set of probes from one LLM that can be used to audit inconsistencies in a different LLM. The criteria for generating and applying auditing probes are generalizable to various LLMs regardless of the underlying structure or training mechanism.

Title: Integrating ChatGPT into Secure Hospital Networks: A Case Study on Improving Radiology Report Analysis

Authors: Kyungsu Kim, Junhyun Park, Saul Langarica, Adham Mahmoud Alkhadrawi, Synho Do
Subjects: cs.AI, cs.LG
Abstract URL: https://arxiv.org/abs/2402.09358
Pdf URL: https://arxiv.org/pdf/2402.09358
Copy Paste: [[2402.09358]] Integrating ChatGPT into Secure Hospital Networks: A Case Study on Improving Radiology Report Analysis(https://arxiv.org/abs/2402.09358)
Keywords: gpt, chat
Abstract: This study demonstrates the first in-hospital adaptation of a cloud-based AI, similar to ChatGPT, into a secure model for analyzing radiology reports, prioritizing patient data privacy. By employing a unique sentence-level knowledge distillation method through contrastive learning, we achieve over 95% accuracy in detecting anomalies. The model also accurately flags uncertainties in its predictions, enhancing its reliability and interpretability for physicians with certainty indicators. These advancements represent significant progress in developing secure and efficient AI tools for healthcare, suggesting a promising future for in-hospital AI applications with minimal supervision.

Title: HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference

Authors: Yashas Samaga B L, Varun Yerram, Chong You, Srinadh Bhojanapalli, Sanjiv Kumar, Prateek Jain, Praneeth Netrapalli
Subjects: cs.LG, cs.AI
Abstract URL: https://arxiv.org/abs/2402.09360
Pdf URL: https://arxiv.org/pdf/2402.09360
Copy Paste: [[2402.09360]] HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference(https://arxiv.org/abs/2402.09360)
Keywords: language model, llm
Abstract: Autoregressive decoding with generative Large Language Models (LLMs) on accelerators (GPUs/TPUs) is often memory-bound: most of the time is spent transferring model parameters from high bandwidth memory (HBM) to cache. On the other hand, recent works show that LLMs can maintain quality with significant sparsity/redundancy in the feedforward (FFN) layers by appropriately training the model to operate on a top-$k$ fraction of rows/columns (where $k \approx 0.05$), thereby suggesting a way to reduce the transfer of model parameters, and hence latency. However, exploiting this sparsity to improve latency is hindered by the fact that identifying the top rows/columns is data-dependent and is usually performed using full matrix operations, severely limiting potential gains. To address these issues, we introduce HiRE (High Recall Approximate Top-$k$ Estimation). HiRE comprises two novel components: (i) a compression scheme to cheaply predict top-$k$ rows/columns with high recall, followed by full computation restricted to the predicted subset, and (ii) DA-TOP-$k$: an efficient multi-device approximate top-$k$ operator. We demonstrate that on a one-billion-parameter model, HiRE applied to both the softmax and feedforward layers achieves almost matching pretraining and downstream accuracy, and speeds up inference by $1.47\times$ on a single TPUv5e device.
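The two-stage idea in component (i), predicting a high-recall superset of the top-$k$ rows cheaply and then computing exactly only on that subset, can be sketched in Python. As an assumption, coarse weight rounding stands in for HiRE's compression scheme (the paper's actual scheme and its DA-TOP-$k$ operator are not reproduced), and `k_prime` is a hypothetical name for the oversampled candidate count.

```python
import heapq
from typing import Dict, Iterable, List, Sequence

Matrix = Sequence[Sequence[float]]


def row_scores(W: Matrix, x: Sequence[float], rows: Iterable[int]) -> Dict[int, float]:
    """Dot product of the selected rows of W with x."""
    return {i: sum(w * xi for w, xi in zip(W[i], x)) for i in rows}


def quantize(W: Matrix, step: float = 0.25) -> List[List[float]]:
    """Hypothetical cheap proxy of W: round each weight to a coarse grid.
    In practice this would be precomputed and stored in fewer bits."""
    return [[round(w / step) * step for w in row] for row in W]


def approx_topk(W: Matrix, W_cheap: Matrix, x: Sequence[float],
                k: int, k_prime: int) -> List[int]:
    """Two-stage top-k over the rows of W @ x.
    Stage 1: score all rows with the cheap proxy and keep k_prime >= k
    candidates, trading extra candidates for high recall.
    Stage 2: compute exact scores only for those candidates."""
    cheap = row_scores(W_cheap, x, range(len(W_cheap)))
    candidates = heapq.nlargest(k_prime, cheap, key=cheap.get)
    exact = row_scores(W, x, candidates)
    return heapq.nlargest(k, exact, key=exact.get)
```

Only `k_prime` rows of the full-precision matrix are touched in stage 2, which is where the memory-transfer savings would come from; choosing `k_prime` larger than `k` is what buys recall when the proxy misranks rows.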

Title: Copyright Traps for Large Language Models

Authors: Matthieu Meeus, Igor Shilov, Manuel Faysse, Yves-Alexandre de Montjoye
Subjects: cs.CL, cs.CR
Abstract URL: https://arxiv.org/abs/2402.09363
Pdf URL: https://arxiv.org/pdf/2402.09363
Copy Paste: [[2402.09363]] Copyright Traps for Large Language Models(https://arxiv.org/abs/2402.09363)
Keywords: language model, llm
Abstract: Questions of fair use of copyright-protected content to train Large Language Models (LLMs) are being actively debated. Document-level inference has been proposed as a new task: inferring from black-box access to the trained model whether a piece of content has been seen during training. State-of-the-art methods, however, rely on naturally occurring memorization of (part of) the content. While very effective against models that memorize a lot, we hypothesize, and later confirm, that they will not work against models that do not naturally memorize, e.g., medium-size 1B models. We here propose to use copyright traps, the inclusion of fictitious entries in original content, to detect the use of copyrighted materials in LLMs, with a focus on models where memorization does not naturally occur. We carefully design an experimental setup, randomly inserting traps into original content (books) and training a 1.3B LLM. We first validate that the use of content in our target model would be undetectable using existing methods. We then show, contrary to intuition, that even medium-length trap sentences repeated a significant number of times (100) are not detectable using existing methods. However, we show that longer sequences repeated a large number of times can be reliably detected (AUC=0.75) and used as copyright traps. We further improve these results by studying how the number of times a sequence is seen improves detectability, how sequences with higher perplexity tend to be memorized more, and how taking context into account further improves detectability.
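The detectability numbers above (e.g. AUC=0.75) come from comparing a membership score on trap sequences that were injected into training against the same score on held-out traps that were not. A minimal Python sketch of that evaluation follows. The perplexity-ratio score against a reference model is one common choice and is an assumption here, not necessarily the paper's exact detector.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def ratio_score(target_ppl: float, reference_ppl: float) -> float:
    """Hypothetical membership score: how much less 'surprised' the target
    model is than a reference model. Higher means 'more likely seen'."""
    return math.log(reference_ppl) - math.log(target_ppl)


def roc_auc(member: Sequence[float], nonmember: Sequence[float]) -> float:
    """ROC AUC as the probability that a random member outscores a random
    non-member (ties count as 1/2); 0.5 is chance, 1.0 is perfect."""
    wins = 0.0
    for m in member:
        for n in nonmember:
            wins += 1.0 if m > n else (0.5 if m == n else 0.0)
    return wins / (len(member) * len(nonmember))
```

An AUC of 0.75 under this definition means a randomly chosen injected trap outscores a randomly chosen non-injected one three times out of four.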

Title: Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking

Authors: Yi Fung, Ruining Zhao, Jae Doo, Chenkai Sun, Heng Ji
Subjects: cs.CL
Abstract URL: https://arxiv.org/abs/2402.09369
Pdf URL: https://arxiv.org/pdf/2402.09369
Copy Paste: [[2402.09369]] Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking(https://arxiv.org/abs/2402.09369)
Keywords: language model
Abstract: Pretrained large language models have revolutionized many applications but still face challenges related to cultural bias and a lack of cultural commonsense knowledge crucial for guiding cross-culture communication and interactions. Recognizing the shortcomings of existing methods in capturing the diverse and rich cultures across the world, this paper introduces a novel approach for massively multicultural knowledge acquisition. Specifically, our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages. Leveraging this valuable source of data collection, we construct the CultureAtlas dataset, which covers a wide range of sub-country level geographical regions and ethnolinguistic groups, with data cleaning and preprocessing to ensure textual assertion sentence self-containment, as well as fine-grained cultural profile information extraction. Our dataset not only facilitates the evaluation of language model performance in culturally diverse contexts but also serves as a foundational tool for the development of culturally sensitive and aware language models. Our work marks an important step towards deeper understanding and bridging the gaps of cultural disparities in AI, to promote a more inclusive and balanced representation of global cultures in the digital domain.

Title: Transformers Can Achieve Length Generalization But Not Robustly

Authors: Yongchao Zhou, Uri Alon, Xinyun Chen, Xuezhi Wang, Rishabh Agarwal, Denny Zhou
Subjects: cs.LG, cs.AI, cs.CL
Abstract URL: https://arxiv.org/abs/2402.09371
Pdf URL: https://arxiv.org/pdf/2402.09371
Copy Paste: [[2402.09371]] Transformers Can Achieve Length Generalization But Not Robustly(https://arxiv.org/abs/2402.09371)
Keywords: language model
Abstract: Length generalization, defined as the ability to extrapolate from shorter training sequences to longer test ones, is a significant challenge for language models. This issue persists even with large-scale Transformers handling relatively straightforward tasks. In this paper, we test the Transformer's length generalization ability using the task of adding two integers. We show that the success of length generalization is intricately linked to the data format and the type of position encoding. Using the right combination of data format and position encoding, we show for the first time that standard Transformers can extrapolate to a sequence length that is 2.5x the input length. Nevertheless, unlike in-distribution generalization, length generalization remains fragile: it is significantly influenced by factors like random weight initialization and training data order, leading to large variances across different random seeds.
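To make the experimental setup concrete, here is a minimal sketch of how an addition length-generalization split might be generated: train on shorter operands, test on longer ones. It assumes one data-format choice common in this line of work (writing the sum with its least-significant digit first), and the digit ranges are illustrative. It does not cover the position-encoding side or any other format details from the paper; `format_addition`, `sample_operand`, and `make_split` are hypothetical helpers.

```python
import random


def format_addition(a: int, b: int, reverse_sum: bool = True) -> str:
    """Render one training string; optionally write the sum least-significant digit first."""
    s = str(a + b)
    if reverse_sum:
        s = s[::-1]  # reversed output lets the model emit the carry-producing digits first
    return f"{a}+{b}={s}"


def sample_operand(n_digits: int, rng: random.Random) -> int:
    """Uniform integer with exactly n_digits digits (0-9 when n_digits == 1)."""
    lo = 10 ** (n_digits - 1) if n_digits > 1 else 0
    return rng.randint(lo, 10 ** n_digits - 1)


def make_split(min_digits: int, max_digits: int, n: int, seed: int) -> list:
    """n examples whose operands share a digit length drawn from [min_digits, max_digits]."""
    rng = random.Random(seed)  # data seed only; model-training seeds are a separate source of variance
    out = []
    for _ in range(n):
        d = rng.randint(min_digits, max_digits)
        out.append(format_addition(sample_operand(d, rng), sample_operand(d, rng)))
    return out
```

For example, `make_split(1, 40, 1000, seed=0)` and `make_split(41, 100, 1000, seed=1)` would produce a train set and a longer, out-of-distribution test set at a 2.5x length ratio.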

Title: HGOT: Hierarchical Graph of Thoughts for Retrieval-Augmented In-Context Learning in Factuality Evaluation

Authors: Yihao Fang, Stephen W. Thomas, Xiaodan Zhu
Subjects: cs.AI, cs.CL
Abstract URL: https://arxiv.org/abs/2402.09390
Pdf URL: https://arxiv.org/pdf/2402.09390
Copy Paste: [[2402.09390]] HGOT: Hierarchical Graph of Thoughts for Retrieval-Augmented In-Context Learning in Factuality Evaluation(https://arxiv.org/abs/2402.09390)
Keywords: language model, llm, hallucination
Abstract: With the widespread adoption of large language models (LLMs) in numerous applications, the challenge of factuality and the propensity for hallucinations raise significant concerns. To address this issue, particularly in retrieval-augmented in-context learning, we introduce the hierarchical graph of thoughts (HGOT), a structured, multi-layered graph approach designed to enhance the retrieval of pertinent passages during in-context learning. The framework utilizes the emergent planning capabilities of LLMs, employing a divide-and-conquer strategy to break down complex queries into manageable sub-queries. It refines self-consistency majority voting for answer selection by incorporating the recently proposed citation recall and precision metrics to assess the quality of thoughts, intrinsically linking an answer's credibility to the quality of its thoughts. This methodology introduces weighted majority voting, prioritizing answers based on the citation quality of their thoughts. Additionally, we propose a scoring mechanism for evaluating retrieved passages that considers factors such as citation frequency and quality, self-consistency confidence, and the retrieval module's ranking. Experiments reveal that HGOT outperforms other retrieval-augmented in-context learning methods, including Demonstrate-Search-Predict (DSP), ReAct, Self-Ask, and Retrieve-then-Read, by as much as 7% on different datasets, demonstrating its efficacy in enhancing the factuality of LLMs.
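The contrast between plain and weighted self-consistency voting can be sketched as follows. Each sampled thought contributes to its answer a weight derived from its citation recall and precision, instead of a flat count of one. The abstract does not give the exact weighting, so the sketch assumes the harmonic mean (F1) of the two metrics; `citation_f1` and `weighted_majority_vote` are hypothetical names, not HGOT's code.

```python
from collections import defaultdict


def citation_f1(recall: float, precision: float) -> float:
    """Assumed thought-quality score: harmonic mean of citation recall and precision."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def weighted_majority_vote(samples):
    """samples: iterable of (answer, citation_recall, citation_precision), one per sampled thought.

    Each answer accumulates the quality scores of the thoughts that produced it;
    the answer with the largest total wins (ties go to the first answer seen).
    """
    totals = defaultdict(float)
    for answer, rec, prec in samples:
        totals[answer] += citation_f1(rec, prec)
    return max(totals, key=totals.get)
```

With samples `("Paris", 0.9, 0.8)`, `("Lyon", 0.5, 0.5)`, and `("Lyon", 0.3, 0.2)`, a plain count picks "Lyon" (two votes to one). Under this weighting, "Paris" wins (about 0.85 vs. 0.74) because its single thought is much better supported by citations.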

Title: LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset

Authors: Botao Yu, Frazier N. Baker, Ziqi Chen, Xia Ning, Huan Sun
Subjects: cs.AI, cs.CE, cs.CL
Abstract URL: https://arxiv.org/abs/2402.09391
Pdf URL: https://arxiv.org/pdf/2402.09391
Copy Paste: [[2402.09391]] LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset(https://arxiv.org/abs/2402.09391)
Keywords: language model, gpt, llm
Abstract: Chemistry plays a crucial role in many domains, such as drug discovery and materials science. While large language models (LLMs) such as GPT-4 exhibit remarkable capabilities on natural language processing tasks, existing work shows that their performance on chemistry tasks is discouragingly low. In this paper, however, we demonstrate that our LLMs can achieve very strong results on a comprehensive set of chemistry tasks, outperforming the most advanced model, GPT-4, across all tasks by a substantial margin and approaching state-of-the-art (SoTA) task-specific models. The key to our success is a large-scale, comprehensive, high-quality instruction-tuning dataset named SMolInstruct. It contains 14 meticulously selected chemistry tasks and over three million high-quality samples, laying a solid foundation for training and evaluating LLMs for chemistry. Based on SMolInstruct, we fine-tune a set of open-source LLMs and find that Mistral serves as the best base model for chemistry tasks. We further analyze the impact of trainable parameters, providing insights for future research.

Title: Long-form evaluation of model editing

Authors: Domenic Rosati, Robie Gonzales, Jinkun Chen, Xuemin Yu, Melis Erkan, Yahya Kayani, Satya Deepika Chavatapalli, Frank Rudzicz, Hassan Sajjad
Subjects: cs.CL
Abstract URL: https://arxiv.org/abs/2402.09394
Pdf URL: https://arxiv.org/pdf/2402.09394
Copy Paste: [[2402.09394]] Long-form evaluation of model editing(https://arxiv.org/abs/2402.09394)
Keywords: prompt
Abstract: Evaluations of model editing currently use only the "next few token" completions after a prompt. As a result, the impact of these methods on longer natural language generation is largely unknown. We introduce long-form evaluation of model editing (LEME), a novel evaluation protocol that measures the efficacy and impact of model editing in long-form generative settings. Our protocol consists of a machine-rated survey and a classifier that correlates well with human ratings. Importantly, we find that our protocol has very little relationship with previous short-form metrics (despite being designed to extend efficacy, generalization, locality, and portability into a long-form setting), indicating that our method introduces a novel set of dimensions for understanding model editing methods. Using this protocol, we benchmark a number of model editing techniques and present several findings, including that while some methods (ROME and MEMIT) perform well at making consistent edits within a limited scope, they suffer much more from factual drift than other methods. Finally, we present a qualitative analysis that illustrates common failure modes in long-form generative settings, including internal consistency, lexical cohesion, and locality issues.

Title: Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference

Authors: Harry Dong, Xinyu Yang, Zhenyu Zhang, Zhangyang Wang, Yuejie Chi, Beidi Chen
Subjects: cs.LG, cs.AI
Abstract URL: https://arxiv.org/abs/2402.09398
Pdf URL: https://arxiv.org/pdf/2402.09398
Copy Paste: [[2402.09398]] Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference(https://arxiv.org/abs/2402.09398)
Keywords: language model, llm
Abstract: Many computational factors limit broader deployment of large language models. In this paper, we focus on a memory bottleneck imposed by the key-value (KV) cache, a computational shortcut that requires storing previous KV pairs during decoding. Existing KV cache methods approach this problem by pruning or evicting large swaths of relatively less important KV pairs to dramatically reduce the cache's memory footprint, but they can have limited success in tasks that require recollecting a majority of previous tokens. To alleviate this issue, we propose LESS, a simple integration of a (nearly free) constant-sized cache with eviction-based cache methods, such that all tokens can be queried at later decoding steps. Its ability to retain information over time proves useful across a variety of tasks, where we demonstrate that LESS can help narrow the performance gap to caching everything, sometimes even matching it, all while remaining efficient.
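The core idea is a small exact KV cache combined with a constant-sized state that still "remembers" evicted tokens. It can be sketched with linear-attention-style accumulators. This is a toy under stated assumptions, not LESS itself: eviction here is a simple sliding window (LESS pairs its state with other eviction policies), and the fixed ELU+1 feature map stands in for the kernels LESS learns. `LessStyleCache` and `feature_map` are hypothetical names.

```python
import numpy as np


def feature_map(x: np.ndarray) -> np.ndarray:
    # Positive feature map (ELU + 1); an assumed stand-in for LESS's learned kernels.
    return np.where(x > 0, x + 1.0, np.exp(x))


class LessStyleCache:
    """Exact KV pairs for the most recent `window` tokens, plus a constant-sized
    recurrent state that summarizes every evicted pair."""

    def __init__(self, d: int, window: int):
        self.window = window
        self.keys, self.values = [], []
        self.S = np.zeros((d, d))  # running sum of phi(k) v^T over evicted tokens
        self.z = np.zeros(d)       # running sum of phi(k) over evicted tokens

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        self.keys.append(k)
        self.values.append(v)
        if len(self.keys) > self.window:
            # Evict the oldest pair, but fold it into the recurrent state instead of dropping it.
            k_old, v_old = self.keys.pop(0), self.values.pop(0)
            fk = feature_map(k_old)
            self.S += np.outer(fk, v_old)
            self.z += fk

    def attend(self, q: np.ndarray) -> np.ndarray:
        K, V = np.stack(self.keys), np.stack(self.values)
        # Unnormalized softmax weights over the cached tokens (no max-subtraction; toy scale only).
        w = np.exp(K @ q / np.sqrt(q.shape[0]))
        fq = feature_map(q)
        numerator = w @ V + fq @ self.S   # cached contribution + evicted-token contribution
        denominator = w.sum() + fq @ self.z
        return numerator / denominator
```

When nothing has been evicted, `attend` reduces to ordinary softmax attention over the cache. After evictions, the memory cost stays at `window` KV pairs plus one d-by-d matrix, no matter how many tokens have been seen.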

Title: Reinforcement Learning from Human Feedback with Active Queries

Authors: Kaixuan Ji, Jiafan He, Quanquan Gu
Subjects: cs.LG, cs.AI, cs.CL, math.OC, stat.ML
Abstract URL: https://arxiv.org/abs/2402.09401
Pdf URL: https://arxiv.org/pdf/2402.09401
Copy Paste: [[2402.09401]] Reinforcement Learning from Human Feedback with Active Queries(https://arxiv.org/abs/2402.09401)
Keywords: language model, llm
Abstract: Aligning large language models (LLMs) with human preferences plays a key role in building modern generative models and can be achieved by reinforcement learning from human feedback (RLHF). Despite their superior performance, current RLHF approaches often require a large amount of human-labelled preference data, which is expensive to collect. In this paper, inspired by the success of active learning, we address this problem by proposing query-efficient RLHF methods. We first formalize the alignment problem as a contextual dueling bandit problem and design an active-query-based proximal policy optimization (APPO) algorithm with an $\tilde{O}(d^2/\Delta)$ regret bound and an $\tilde{O}(d^2/\Delta^2)$ query complexity, where $d$ is the dimension of the feature space and $\Delta$ is the sub-optimality gap over all contexts. We then propose ADPO, a practical version of our algorithm based on direct preference optimization (DPO), and apply it to fine-tuning LLMs. Our experiments show that ADPO matches the performance of the state-of-the-art DPO method while making only about half as many queries for human preferences.
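The query-saving idea can be illustrated by a labeling rule that asks a human only when the model is unsure which response is better, and otherwise trusts its own preference. The abstract does not state ADPO's exact rule. This sketch assumes uncertainty is measured by the margin between DPO implicit rewards, beta * log(pi_theta / pi_ref), compared against a hypothetical `threshold`. The `pair` field names and the `ask_human` callback are also hypothetical.

```python
import math


def implicit_reward(logp_policy: float, logp_ref: float, beta: float) -> float:
    # DPO's implicit reward: beta * log(pi_theta(y|x) / pi_ref(y|x)).
    return beta * (logp_policy - logp_ref)


def dpo_loss(margin: float) -> float:
    # DPO loss on a labelled pair: -log(sigmoid(margin)) = log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))


def label_or_query(pair: dict, beta: float, threshold: float, ask_human):
    """Return (preferred_index, queried_human) for a pair of responses (index 0 or 1)."""
    r0 = implicit_reward(pair["policy_logp"][0], pair["ref_logp"][0], beta)
    r1 = implicit_reward(pair["policy_logp"][1], pair["ref_logp"][1], beta)
    margin = r0 - r1
    if abs(margin) < threshold:
        # Uncertain: spend a human query on this pair.
        return ask_human(pair), True
    # Confident: use the model's own preference as a pseudo-label, saving a query.
    return (0 if margin > 0 else 1), False
```

Raising `threshold` sends more pairs to humans. Lowering it saves queries but relies more on the model's own, possibly wrong, preferences.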

Title: AQA-Bench: An Interactive Benchmark for Evaluating LLMs' Sequential Reasoning Ability

Authors: Siwei Yang, Bingchen Zhao, Cihang Xie
Subjects: cs.CL, cs.AI, cs.LG
Abstract URL: https://arxiv.org/abs/2402.09404
Pdf URL: https://arxiv.org/pdf/2402.09404
Copy Paste: [[2402.09404]] AQA-Bench: An Interactive Benchmark for Evaluating LLMs' Sequential Reasoning Ability(https://arxiv.org/abs/2402.09404)
Keywords: language model, gpt, llm
Abstract: This paper introduces AQA-Bench, a novel benchmark to assess the sequential reasoning capabilities of large language models (LLMs) in algorithmic contexts, such as depth-first search (DFS). The key feature of our benchmark is its interactive evaluation protocol. For example, in DFS, each node's connected edges become available only once the model has traversed to that node, which requires the LLM to effectively remember visited nodes and strategize subsequent moves. We build AQA-Bench with three different algorithms, namely binary search, depth-first search, and breadth-first search, and use it to evaluate the sequential reasoning ability of 12 different LLMs. Our investigations reveal several interesting findings: (1) Closed-source models like GPT-4 and Gemini generally show strong sequential reasoning ability, significantly outperforming open-source LLMs. (2) Naively providing interactive examples may inadvertently hurt few-shot performance. (3) A very limited number of predecessor steps following the optimal policy can substantially boost small models' performance. (4) The scaling correlation between performance and model size is not always significant and sometimes even shows an inverse trend. We hope our study can catalyze future work on understanding and enhancing LLMs' capabilities in sequential reasoning. The code is available at https://github.com/UCSC-VLAA/AQA-Bench.
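The interactive DFS protocol can be pictured as an environment that reveals only the current node's neighbors. The agent must therefore keep its own record of visited nodes and its backtracking path. The sketch below pairs such an environment with the optimal DFS policy playing the agent's role; in the benchmark an LLM would choose the moves instead. `InteractiveGraphEnv` and `dfs_agent` are hypothetical names, not the AQA-Bench API, and the graph is assumed undirected so the agent can backtrack along the edge it came from.

```python
class InteractiveGraphEnv:
    """Hides the graph: only the current node's neighbors are ever revealed."""

    def __init__(self, adjacency: dict, start):
        self._adj = adjacency  # undirected adjacency lists (assumed)
        self.current = start
        self.visited = {start}
        self.steps = 0

    def neighbors(self) -> list:
        return sorted(self._adj[self.current])

    def move(self, node) -> None:
        if node not in self._adj[self.current]:
            raise ValueError(f"{node} is not adjacent to {self.current}")
        self.current = node
        self.visited.add(node)
        self.steps += 1


def dfs_agent(env: InteractiveGraphEnv) -> set:
    """Optimal-policy agent: remembers visited nodes and backtracks along its own path."""
    path = [env.current]
    seen = {env.current}
    while path:
        unvisited = [n for n in env.neighbors() if n not in seen]
        if unvisited:
            nxt = unvisited[0]
            env.move(nxt)
            seen.add(nxt)
            path.append(nxt)
        else:
            path.pop()
            if path:
                env.move(path[-1])  # backtrack to the node we came from
    return seen
```

On the graph `{0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}` starting at node 0, this agent visits every node in six moves, backtracks included. An agent that forgets `seen` or `path` would revisit nodes or get stuck, which is the failure mode the interactive protocol is designed to expose.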