Dataset:
codeparrot/apps
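APPS is a code-generation benchmark with 10,000 problems. It can be used to evaluate the ability of language models to generate code from natural-language specifications. You can also find the APPS metric on the Hub at codeparrot/apps_metric.
The dataset contains problem statements in English and code solutions in Python.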
Loading the full dataset returns a `DatasetDict` with a train and a test split:

```python
from datasets import load_dataset

ds = load_dataset("codeparrot/apps")
print(ds)
# OUTPUT:
# DatasetDict({
#     train: Dataset({
#         features: ['problem_id', 'question', 'solutions', 'input_output', 'difficulty', 'url', 'starter_code'],
#         num_rows: 5000
#     })
#     test: Dataset({
#         features: ['problem_id', 'question', 'solutions', 'input_output', 'difficulty', 'url', 'starter_code'],
#         num_rows: 5000
#     })
# })
```
You can load the train split and parse its first sample as follows:
```python
import json

from datasets import load_dataset

ds = load_dataset("codeparrot/apps", split="train")
sample = next(iter(ds))
# Non-empty `solutions` and `input_output` fields are JSON strings and can be parsed like this:
sample["solutions"] = json.loads(sample["solutions"])
sample["input_output"] = json.loads(sample["input_output"])
print(sample)
# OUTPUT:
# {
#  'problem_id': 0,
#  'question': 'Polycarp has $n$ different binary words. A word called binary if it contains only characters \'0\' and \'1\'. For example...',
#  'solutions': ["for _ in range(int(input())):\n n = int(input())\n mass = []\n zo = 0\n oz = 0\n zz = 0\n oo = 0\n...", ...],
#  'input_output': {'inputs': ['4\n4\n0001\n1000\n0011\n0111\n3\n010\n101\n0\n2\n00000\n00001\n4\n01\n001\n0001\n00001\n'],
#                   'outputs': ['1\n3 \n-1\n0\n\n2\n1 2 \n']},
#  'difficulty': 'interview',
#  'url': 'https://codeforces.com/problemset/problem/1259/D',
#  'starter_code': ''
# }
```
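Each sample consists of a programming problem statement in English, some ground-truth Python solutions, test cases defined by their inputs and outputs (plus a function name, if provided), and some metadata about the problem's difficulty level and source.
If a sample has a non-empty `input_output` feature, you can parse it into a dictionary with the keys `inputs` and `outputs` (and `fn_name`, if present). Similarly, you can parse `solutions` into a list of solutions, as shown in the code above.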
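The `json.loads` calls above only work on non-empty fields (`json.loads("")` raises `json.JSONDecodeError`), so when iterating over a whole split it can help to guard against empty strings. A minimal sketch, assuming each sample is a plain dict as yielded by iterating the dataset; `parse_apps_sample` is a hypothetical helper, not part of the dataset or of the `datasets` library:

```python
import json
from typing import Any


def parse_apps_sample(sample: dict[str, Any]) -> dict[str, Any]:
    """Return a shallow copy of an APPS sample with its JSON fields decoded.

    Empty `solutions` / `input_output` strings become an empty list / dict
    instead of raising, since json.loads("") fails.
    """
    parsed = dict(sample)  # leave the caller's sample untouched
    raw_solutions = sample.get("solutions") or ""
    raw_io = sample.get("input_output") or ""
    parsed["solutions"] = json.loads(raw_solutions) if raw_solutions else []
    parsed["input_output"] = json.loads(raw_io) if raw_io else {}
    return parsed


# Hypothetical sample with the same string-encoded layout as the real data.
example = {
    "problem_id": 0,
    "solutions": '["print(1)"]',
    "input_output": '{"inputs": ["\\n"], "outputs": ["1\\n"]}',
}
parsed = parse_apps_sample(example)
print(parsed["solutions"])  # ['print(1)']
print(parsed["input_output"].get("fn_name"))  # None: no function name, so stdin/stdout tests
```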
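You can also filter the dataset by difficulty level: introductory, interview, and competition. Just pass the desired levels as a list to the `difficulties` argument. For example, to get the most challenging problems, select the competition level: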
```python
from datasets import load_dataset

ds = load_dataset("codeparrot/apps", split="train", difficulties=["competition"])
print(next(iter(ds))["question"])
# OUTPUT:
# Codefortia is a small island country located somewhere in the West Pacific. It consists of $n$ settlements connected by
# ...
#
# For each settlement $p = 1, 2, \dots, n$, can you tell what is the minimum time required to travel between the king's residence and the parliament house (located in settlement $p$) after some roads are abandoned?
#
# -----Input-----
#
# The first line of the input contains four integers $n$, $m$, $a$ and $b$
# ...
#
# -----Output-----
#
# Output a single line containing $n$ integers
# ...
#
# -----Examples-----
# Input
# 5 5 20 25
# 1 2 25
# ...
#
# Output
# 0 25 60 40 20
# ...
```
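As an alternative that relies only on the `difficulty` column of an already-loaded split, the same selection can be written as a predicate. A minimal sketch, assuming the lowercase labels `introductory`, `interview` and `competition` (the sample above shows `'interview'`); `difficulty_filter` is a hypothetical helper name:

```python
from typing import Any, Callable, Iterable

# Difficulty labels assumed to match the `difficulty` column values.
VALID_DIFFICULTIES = {"introductory", "interview", "competition"}


def difficulty_filter(levels: Iterable[str]) -> Callable[[dict[str, Any]], bool]:
    """Build a predicate that keeps samples whose `difficulty` is in `levels`."""
    wanted = set(levels)
    unknown = wanted - VALID_DIFFICULTIES
    if unknown:
        # Fail early on typos instead of silently returning an empty split.
        raise ValueError(f"unknown difficulty levels: {sorted(unknown)}")

    def keep(sample: dict[str, Any]) -> bool:
        return sample["difficulty"] in wanted

    return keep


# Plain dicts standing in for dataset rows.
samples = [
    {"problem_id": 0, "difficulty": "interview"},
    {"problem_id": 1, "difficulty": "competition"},
    {"problem_id": 2, "difficulty": "introductory"},
]
keep = difficulty_filter(["competition"])
print([s["problem_id"] for s in samples if keep(s)])  # [1]
```

With a loaded split, the predicate can be passed to the `datasets` method `Dataset.filter`, e.g. `ds.filter(difficulty_filter(["competition"]))`, which calls it once per example.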
| Field | Type | Description |
|---|---|---|
| `problem_id` | int | Problem ID |
| `question` | string | Problem description |
| `solutions` | string | Some Python solutions, as a JSON-encoded list |
| `input_output` | string | JSON string with the `inputs` and `outputs` of the test cases; may also include `fn_name`, the name of the function |
| `difficulty` | string | Difficulty level of the problem |
| `url` | string | URL of the problem's source |
| `starter_code` | string | Starter code to include in prompts |
Note that only a few samples specify `fn_name` and `starter_code`.
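Because most samples lack `fn_name` and `starter_code`, a prompt builder has to handle both cases: problems with a function name are tested by calling that function, while the rest read standard input and write standard output. A minimal sketch of one possible prompt template; `build_prompt` is a hypothetical helper, and the `QUESTION:` / `ANSWER:` markers and format hints are illustrative assumptions, not something this card specifies:

```python
from typing import Optional


def build_prompt(question: str, starter_code: str = "", fn_name: Optional[str] = None) -> str:
    """Assemble a plain-text prompt for one APPS problem (illustrative template)."""
    parts = ["QUESTION:", question.strip()]
    if starter_code:
        # Starter code (e.g. a function signature) goes right after the statement.
        parts.append(starter_code.rstrip())
    # A function name or starter code implies a function-call format;
    # otherwise the solution is expected to use stdin/stdout.
    if fn_name or starter_code:
        parts.append("Use Call-Based format")
    else:
        parts.append("Use Standard Input format")
    parts.append("ANSWER:")
    return "\n".join(parts) + "\n"


print(build_prompt("Return the sum of a and b.", starter_code="def add(a, b):", fn_name="add"))
```

In practice `question` and `starter_code` would come from a sample, and `fn_name` from its parsed `input_output` dictionary via `.get("fn_name")`.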
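The dataset contains a train split and a test split with 5000 samples each.
To create the APPS dataset, the authors manually curated problems from open-access sites where programmers share problems with each other, including Codewars, AtCoder, Kattis, and Codeforces. For more details, see the original paper.
In AlphaCode, the authors found that this dataset can produce many false positives during evaluation: incorrect submissions are marked as correct because of insufficient test coverage.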
```bibtex
@article{hendrycksapps2021,
  title={Measuring Coding Challenge Competence With APPS},
  author={Dan Hendrycks and Steven Basart and Saurav Kadavath and Mantas Mazeika and Akul Arora and Ethan Guo and Collin Burns and Samir Puranik and Horace He and Dawn Song and Jacob Steinhardt},
  journal={NeurIPS},
  year={2021}
}
```