InstructHumanEval

Summary

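InstructHumanEval is a modified version of OpenAI HumanEval. For each prompt, we extracted its signature, its docstring and its header to create a flexible setting for evaluating instruction-tuned LLMs. The delimiters used during instruction tuning can be used to build an instruction that lets the model show its best capabilities. Here is a usage example.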

A prompt can be built from the delimiters the model was instruction-tuned with, as follows:

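from datasets import load_dataset

# Load the "test" split, the only split of this dataset.
ds = load_dataset("codeparrot/instructhumaneval", split="test")
# Wrap the instruction in the model's delimiters, then append the context.
prompt_0 = "Human:\n" + ds[0]["instruction"] + "\nAssistant:\n" + ds[0]["context"]
print(prompt_0)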

Output

Human:
Write a function has_close_elements(numbers: List[float], threshold: float) -> bool to solve the following problem:
Check if in given list of numbers, are any two numbers closer to each other than given threshold.
>>> has_close_elements([1.0, 2.0, 3.0], 0.5)
False
>>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
True 
Assistant:
from typing import List
def has_close_elements(numbers: List[float], threshold: float) -> bool:

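The model can then complete the instruction, and is expected to produce better results because the prompt matches the format it was trained on.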

You can also find code to evaluate models on this dataset in the BigCode-evaluation-harness. The following sections describe the dataset in more detail.

Dataset description

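This dataset is a modified version of OpenAI HumanEval that adapts the benchmark to instruction-tuned models. The original HumanEval evaluates the ability to complete code given a function signature, its docstring and, possibly, some auxiliary functions.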

Dataset construction

To build an instruction version of HumanEval, we extracted the relevant information from the prompt column of the original dataset:

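signature: the signature of the function to complete. It has the form def function_name(args: type) -> return_type:.
docstring: the docstring of the function. It is the text that describes the purpose of the function.
context: all the additional information provided to help the model complete the function. It includes the imports and the auxiliary functions. The idea is to move away from the original HumanEval prompt format: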

<context>
<signature>
<docstring>

and to build an instruction of the following form:

Write a function <signature> to solve the following problem:
<docstring>

From this instruction, we can design an evaluation pipeline for instruction-tuned language models.
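The construction described above can be sketched in a few lines. This is only an illustration, not the authors' extraction script: split_prompt and build_instruction are hypothetical helpers, and the sketch assumes the target function is defined at top level, that its first body statement is the docstring, and that the context keeps everything up to and including the def line (as the printed example in the summary suggests).

```python
import ast


def split_prompt(prompt: str, entry_point: str) -> dict:
    """Split a HumanEval-style prompt into context, signature and docstring.

    Hypothetical helper: assumes a top-level function whose first body
    statement is its docstring.
    """
    tree = ast.parse(prompt)
    func = next(
        node for node in tree.body
        if isinstance(node, ast.FunctionDef) and node.name == entry_point
    )
    lines = prompt.splitlines()
    # The def header runs from the def line to the line before the docstring.
    header_end = func.body[0].lineno - 1
    header = " ".join(line.strip() for line in lines[func.lineno - 1:header_end])
    # Drop the leading "def " and the trailing ":" to keep only the signature.
    signature = header[len("def "):].rstrip(": ")
    docstring = ast.get_docstring(func) or ""
    # Assumption: the context ends with the def header, so that a model
    # continuing it only has to write the function body.
    context = "\n".join(lines[:header_end]) + "\n"
    return {"context": context, "signature": signature, "docstring": docstring}


def build_instruction(signature: str, docstring: str) -> str:
    """Fill the instruction template given in the text."""
    return f"Write a function {signature} to solve the following problem:\n{docstring}"


example_prompt = (
    "from typing import List\n\n\n"
    "def has_close_elements(numbers: List[float], threshold: float) -> bool:\n"
    '    """ Check if in given list of numbers, are any two numbers closer to each other than\n'
    "    given threshold.\n"
    '    """\n'
)
parts = split_prompt(example_prompt, "has_close_elements")
print(build_instruction(parts["signature"], parts["docstring"]))
```

Parsing with ast rather than string matching keeps the sketch robust to helper functions that appear before the target function, since only the function named by entry_point is selected.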

Evaluation

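Instruction-tuned LLMs are built by fine-tuning a base LLM on an instruction dataset. This dataset contains multiple (instruction, answer) pairs, each consisting of an instruction submitted by a user and the correct answer to it. The pairs are framed as a multi-turn conversation with the help of special tokens that identify each participant, for example a user_token (Human:), an assistant_token (Assistant:) and an end_token (\n) that marks the end of each turn.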
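As an illustration of this framing, here is a minimal sketch that serializes (instruction, answer) pairs into one conversation string. The function name format_conversation is hypothetical, and the default token values, including where the newlines go, are assumptions modeled on the Human:/Assistant: example; real models use their own delimiters.

```python
from typing import Iterable, Tuple


def format_conversation(
    pairs: Iterable[Tuple[str, str]],
    user_token: str = "Human:\n",          # assumed delimiter
    assistant_token: str = "Assistant:\n",  # assumed delimiter
    end_token: str = "\n",                  # assumed end-of-turn marker
) -> str:
    """Concatenate (instruction, answer) pairs into a multi-turn conversation."""
    turns = []
    for instruction, answer in pairs:
        # One user turn followed by one assistant turn, each closed by end_token.
        turns.append(user_token + instruction + end_token)
        turns.append(assistant_token + answer + end_token)
    return "".join(turns)


print(format_conversation([("Write a function that adds two numbers.",
                            "def add(a, b):\n    return a + b")]))
```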

Code completion

In this setting, the LLM is given the following prompt:

<user_token> + <instruction> + <end_token> + <assistant_token> + <context>

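The model is then expected to complete the function so that it solves the problem described by the instruction. This is very similar to the original evaluation, with the advantage that it puts the model in the best position to understand the task it is asked to solve. The evaluation is performed on the part generated after <assistant_token>.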
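A minimal sketch of this setting, under stated assumptions: build_completion_prompt and code_after_assistant are hypothetical helpers, the token values are the same assumed Human:/Assistant: delimiters as in the summary example, and the generation is assumed to echo the prompt followed by the model's continuation.

```python
def build_completion_prompt(
    instruction: str,
    context: str,
    user_token: str = "Human:\n",          # assumed delimiter
    assistant_token: str = "Assistant:\n",  # assumed delimiter
    end_token: str = "\n",                  # assumed end-of-turn marker
) -> str:
    """Assemble the code completion prompt in the order given in the text."""
    return user_token + instruction + end_token + assistant_token + context


def code_after_assistant(generation: str, assistant_token: str = "Assistant:\n") -> str:
    """Return everything that follows the first assistant token."""
    _, found, tail = generation.partition(assistant_token)
    if not found:
        raise ValueError("assistant token not found in the generation")
    return tail


prompt = build_completion_prompt(
    "Write a function add(a, b) to solve the following problem:\nAdd two numbers.",
    "def add(a, b):\n",
)
# Stand-in for a model output: the prompt followed by a generated body.
generation = prompt + "    return a + b\n"
print(code_after_assistant(generation))
```

Because the context sits after the assistant token, the extracted text is the context followed by the generated body, which is the code to be tested.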

Docstring to code

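This setting is harder, because the model has to take into account the information contained in the instruction, such as the function signature. The LLM is given the following prompt: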

<user_token> + <instruction> + <end_token> + <assistant_token>

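The model must generate a function with the correct signature that adequately solves the problem. The evaluation is done by locating the content of the function in the generation (by searching for the right entry_point/function_name) and concatenating it with the provided <context>.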
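The extraction step could look roughly like the sketch below. It is not the harness implementation: extract_function_body is a hypothetical helper that assumes the generated function is defined at top level, that its header ends on the first line ending with a colon, and that the provided context ends with the def line so that only the body needs to be appended.

```python
import re


def extract_function_body(generation: str, entry_point: str) -> str:
    """Return the indented body of the top-level function named entry_point."""
    lines = generation.splitlines()
    header = re.compile(rf"^def\s+{re.escape(entry_point)}\s*\(")
    start = next((i for i, line in enumerate(lines) if header.match(line)), None)
    if start is None:
        raise ValueError(f"function {entry_point!r} not found in the generation")
    # Skip the (possibly multi-line) header: it ends on a line ending with ":".
    end_of_header = start
    while end_of_header < len(lines) and not lines[end_of_header].rstrip().endswith(":"):
        end_of_header += 1
    body = []
    for line in lines[end_of_header + 1:]:
        # A non-empty line with no indentation means the function is over.
        if line.strip() and not line.startswith((" ", "\t")):
            break
        body.append(line)
    return "\n".join(body).rstrip() + "\n"


generation = (
    "Here is a possible solution:\n"
    "def add(a, b):\n"
    "    return a + b\n"
    "\n"
    "print(add(1, 2))\n"
)
context = "def add(a, b):\n"  # assumed to end with the def line
program = context + extract_function_body(generation, "add")
print(program)
```

Searching for the entry point by name means any explanatory text or extra code the model writes around the function is ignored.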

How to use the dataset

from datasets import load_dataset

ds = load_dataset("codeparrot/instructhumaneval")

ds
DatasetDict({
    test: Dataset({
        features: ['task_id', 'prompt', 'canonical_solution', 'test', 'entry_point', 'signature', 'docstring', 'context', 'instruction'],
        num_rows: 164
    })
})
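To show how these columns might fit together, here is a hedged sketch that turns one row and a generated function body into a test program. build_check_program is a hypothetical helper, the toy row stands in for a real dataset row, and the sketch relies on the HumanEval convention that the test column defines a check(candidate) function. Generated code is untrusted, so a real evaluation should run such programs in a sandbox rather than with a bare exec.

```python
def build_check_program(row: dict, completion: str) -> str:
    """Concatenate context, generated body, tests and the final check call."""
    return (
        row["context"]
        + completion
        + "\n"
        + row["test"]
        + "\n"
        + f"check({row['entry_point']})\n"
    )


# Toy row with the same column names as the dataset (not real dataset content).
toy_row = {
    "context": "def add(a, b):\n",
    "test": "def check(candidate):\n    assert candidate(1, 2) == 3\n",
    "entry_point": "add",
}
program = build_check_program(toy_row, "    return a + b\n")
# Raises AssertionError if the completion fails the tests.
exec(program, {})
print("toy completion passed")
```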

Authors:

codeparrot

Dataset size:

164.66 KB