Dataset: openai_humaneval
Tasks: text2text-generation
Languages: en
Multilinguality: monolingual
Size: n<1K
Language Creators: expert-generated
Annotation Creators: expert-generated
Source Datasets: original
ArXiv: arxiv:2107.03374
Other: code-generation
License: mit

The HumanEval dataset released by OpenAI includes 164 programming problems, each with a function signature, docstring, body, and several unit tests. The problems were handwritten to ensure they were not included in the training sets of code generation models.
The programming problems are written in Python and contain natural English text in comments and docstrings.
```python
from datasets import load_dataset

load_dataset("openai_humaneval")

DatasetDict({
    test: Dataset({
        features: ['task_id', 'prompt', 'canonical_solution', 'test', 'entry_point'],
        num_rows: 164
    })
})
```
An example of a dataset instance:
{ "task_id": "test/0", "prompt": "def return1():\n", "canonical_solution": " return 1", "test": "def check(candidate):\n assert candidate() == 1", "entry_point": "return1" }
The dataset consists only of a test split with 164 samples.
Since code generation models are often trained on dumps of GitHub, a dataset not included in any dump was necessary to evaluate such models properly. However, since this dataset was published on GitHub, it is likely to be included in future dumps.
The dataset was handcrafted by engineers and researchers at OpenAI.
Initial Data Collection and Normalization
[More Information Needed]
Who are the source language producers?
[More Information Needed]
[More Information Needed]
Annotation process
[More Information Needed]
Who are the annotators?
[More Information Needed]
None.
Make sure you execute generated Python code in a safe environment when evaluating against this dataset, as generated code could be harmful.
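A minimal sketch of one way to reduce that risk, assuming a subprocess with a timeout is acceptable isolation for your setting (the helper name and the timeout value below are illustrative; for truly untrusted model output, stronger sandboxing such as containers is advisable):

```python
import subprocess
import sys
import tempfile

def run_candidate(prompt, completion, test, entry_point, timeout=10.0):
    """Run an untrusted completion against a problem's unit tests in a
    separate Python process; return True only if all assertions pass."""
    program = prompt + completion + "\n" + test + "\n" + f"check({entry_point})\n"
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        # The timeout guards against infinite loops in generated code; it is
        # not a substitute for real sandboxing (containers, seccomp, etc.).
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
```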
With this dataset, code-generating models can be evaluated more reliably, which leads to fewer issues being introduced when such models are used.
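Concretely, the paper cited below evaluates models with the pass@k metric: generate n completions per problem, count the c that pass the unit tests, and estimate the probability that at least one of k samples passes. A sketch of the paper's unbiased estimator:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples generated, c of them correct,
    k the evaluation budget; computes 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset contains a correct sample
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))
```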
[More Information Needed]
[More Information Needed]
OpenAI
MIT License
```
@misc{chen2021evaluating,
  title={Evaluating Large Language Models Trained on Code},
  author={Mark Chen and Jerry Tworek and Heewoo Jun and Qiming Yuan and Henrique Ponde de Oliveira Pinto and Jared Kaplan and Harri Edwards and Yuri Burda and Nicholas Joseph and Greg Brockman and Alex Ray and Raul Puri and Gretchen Krueger and Michael Petrov and Heidy Khlaaf and Girish Sastry and Pamela Mishkin and Brooke Chan and Scott Gray and Nick Ryder and Mikhail Pavlov and Alethea Power and Lukasz Kaiser and Mohammad Bavarian and Clemens Winter and Philippe Tillet and Felipe Petroski Such and Dave Cummings and Matthias Plappert and Fotios Chantzis and Elizabeth Barnes and Ariel Herbert-Voss and William Hebgen Guss and Alex Nichol and Alex Paino and Nikolas Tezak and Jie Tang and Igor Babuschkin and Suchir Balaji and Shantanu Jain and William Saunders and Christopher Hesse and Andrew N. Carr and Jan Leike and Josh Achiam and Vedant Misra and Evan Morikawa and Alec Radford and Matthew Knight and Miles Brundage and Mira Murati and Katie Mayer and Peter Welinder and Bob McGrew and Dario Amodei and Sam McCandlish and Ilya Sutskever and Wojciech Zaremba},
  year={2021},
  eprint={2107.03374},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```
Thanks to @lvwerra for adding this dataset.