Dataset:

nuprl/MultiPL-E-synthetic-solutions

Language:

en

License:

openrail

Dataset Card

This is a dataset of partial solutions to the HumanEval and MBPP code generation benchmarks translated into 18+ programming languages. The original benchmark problems were in Python, and we build the dataset as follows:

  • We translate the prompts into a new language using MultiPL-E;
  • We use code-davinci-002 to generate 200 completions for each problem at temperature 0.8;
  • We select a working solution (if one exists) for each problem-language pair.

This notebook carried out the steps described above.

Note that the dataset does not have solutions for every problem-language pair, since code-davinci-002 cannot produce a correct solution to every problem.
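The selection step, and the reason some problem-language pairs are missing, can be illustrated with a minimal sketch. This is not the notebook's actual code: the function names, the `(problem, language)` key, and the toy `toy_passes` checker are all hypothetical, and picking the first passing completion is an assumption (the card only says that a working solution is selected).

```python
from typing import Callable, Dict, List, Optional, Tuple

Pair = Tuple[str, str]  # hypothetical key: (problem id, language)


def first_working(completions: List[str],
                  passes: Callable[[str], bool]) -> Optional[str]:
    """Return the first completion that passes, or None if none does."""
    for completion in completions:
        if passes(completion):
            return completion
    return None


def build_dataset(candidates: Dict[Pair, List[str]],
                  passes: Callable[[Pair, str], bool]) -> Dict[Pair, str]:
    """Keep one working solution per pair; omit pairs with no working solution."""
    solutions: Dict[Pair, str] = {}
    for pair, completions in candidates.items():
        chosen = first_working(completions, lambda c, p=pair: passes(p, c))
        if chosen is not None:
            solutions[pair] = chosen
    return solutions


# Toy stand-in for running a problem's unit tests (an assumption for illustration).
def toy_passes(pair: Pair, completion: str) -> bool:
    return "ok" in completion


toy_candidates = {
    ("HumanEval_0", "lua"): ["bad", "ok v1", "ok v2"],
    ("HumanEval_1", "lua"): ["bad", "worse"],  # no completion passes
}
toy_solutions = build_dataset(toy_candidates, toy_passes)
```

In this toy run, the second pair has no passing completion and is simply absent from the result, which mirrors why the dataset does not cover every problem-language pair.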