Dataset:

theblackcat102/codex-math-qa

Language: English

Codex Math QA

Solutions to math_qa problems, written as Python programs generated by code-davinci-002.

Since OpenAI decided to shut off code-davinci-002 behind Azure, this dataset aims to share the generation results of code-davinci-002, OpenAI's 176B code generation model.

Data Splits

name      train  validation  test
main      25065  4133        2985
rational  -      4151        2985
  • Basic prompt: contains only the question

    from datasets import load_dataset

    dataset = load_dataset("theblackcat102/codex-math-qa", "main")
    

    Example:

    the average weight of 8 person ' s increases by 1.5 kg when a new person comes in place of one of them weighing 75 kg. what might be the weight of the new person ?
    Write a short snippet of python function which solve this problem. No need to explain the answer, let the code output the answer.
    

    Output:

    def average_weight(weights):
        total_weight = sum(weights)
        return total_weight / len(weights)
    
    def new_person_weight(weights):
        average_weight = average_weight(weights)
        new_person_weight = average_weight + 1.5
        return new_person_weight
    
    weights = [75, 80, 85, 90, 95]
    print(new_person_weight(weights))
    
  • Prompt with rationale: includes the Rationale text, which should help the model plan the program

    dataset = load_dataset("theblackcat102/codex-math-qa", "rational")
    

    Example:

    one copy machine can make 20 copies a minute, and a second copy machine makes 10 copies a minute. if the two copiers work together, how long would it take them to make 900 copies ?
    Rationale: "total work done by both machines in a minute = 20 + 10 = 30 copies total number of copies required = 900 time = 900 / 30 = 30 mins answer d"
    Write a short snippet of python function which solve this problem within 30 lines of code. Add comments in the function block for explaination.
    

    Output:

    def copy_machine(copies, copy_speed):
        """
        :param copies: number of copies required
        :param copy_speed: speed of the copy machine
        :return: time in minutes
        """
        total_work = sum(copy_speed)
        time = copies / total_work
        return time
    
    
    if __name__ == "__main__":
        print(copy_machine(900, [20, 10]))
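
As a reference point for the two sample questions above, the expected answers can be computed directly. This is a minimal sketch, not part of the dataset; the function and parameter names are hypothetical. Note that the first sample output does not run as written, because `new_person_weight` rebinds `average_weight` as a local name before calling it, while the second sample output prints the correct value.

```python
from typing import Sequence


def new_person_weight(group_size: int, average_increase: float,
                      replaced_weight: float) -> float:
    """Weight of the newcomer, given how much the group average rises."""
    # The group total rises by group_size * average_increase, and that whole
    # difference comes from swapping the replaced person for the newcomer.
    return replaced_weight + group_size * average_increase


def minutes_to_copy(copies: int, rates_per_minute: Sequence[float]) -> float:
    """Time for machines working together to produce the requested copies."""
    return copies / sum(rates_per_minute)


print(new_person_weight(8, 1.5, 75))   # 87.0
print(minutes_to_copy(900, [20, 10]))  # 30.0
```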
    

Notes:

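The generated results are unvalidated and are given exactly as code-davinci-002 produced them. As a result, a majority of the answers are incorrect, and some of the code contains syntax errors. Validating the outputs is left for future work; the aim of this dataset is to provide a source or reference for code-based math answering by code-davinci-002.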
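
Because the outputs are unvalidated, a cheap first filter is to check whether a generated snippet parses at all. The following is a minimal sketch using the standard-library `ast` module; the helper name `is_valid_python` is hypothetical, and the snippet strings are made-up stand-ins for dataset rows. Parsing only rules out syntax errors. It says nothing about whether the program runs or prints the right answer.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python, without executing it."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        # SyntaxError: malformed code; ValueError: e.g. null bytes in source.
        return False
    return True


# Hypothetical generated snippets, standing in for rows of the dataset.
good_snippet = "def f(x):\n    return x + 1\n\nprint(f(2))\n"
bad_snippet = "def f(x:\n    return x + 1\n"

print(is_valid_python(good_snippet))
print(is_valid_python(bad_snippet))
```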

Dataset Creation

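The dataset was sourced from math_qa, with a prompt appended to the end of each question asking for a Python solution to the problem. The aim is to provide a dataset for the kind of work offloading seen in Galactica.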
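
The prompt construction described above can be sketched as follows. The two instruction strings are copied verbatim from the examples in this card (including their original spelling); the function name `build_prompt` and the newline separators are assumptions, since the card does not show the exact code used.

```python
from typing import Optional

# Instruction text copied verbatim from the examples in this card.
BASIC_INSTRUCTION = (
    "Write a short snippet of python function which solve this problem. "
    "No need to explain the answer, let the code output the answer."
)
RATIONALE_INSTRUCTION = (
    "Write a short snippet of python function which solve this problem "
    "within 30 lines of code. Add comments in the function block for explaination."
)


def build_prompt(question: str, rationale: Optional[str] = None) -> str:
    """Append the instruction (and the rationale, if given) to a question."""
    if rationale is None:
        return f"{question}\n{BASIC_INSTRUCTION}"
    # Assumed layout: question, then the rationale line, then the instruction.
    return f"{question}\nRationale: {rationale}\n{RATIONALE_INSTRUCTION}"


print(build_prompt("what is 2 + 3 ?"))
print(build_prompt("what is 2 + 3 ?", '"2 + 3 = 5 answer a"'))
```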

The generation config for code-davinci-002 is as follows:

    name         value
    max_tokens   2048
    temperature  0.5
    top_p        0.7
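
For illustration, these settings map onto the request fields of a completions-style API as shown below. This is only a sketch of a request payload: the helper `build_request` is hypothetical, nothing is sent over the network, and the card does not say how the author actually issued the requests.

```python
# Sampling settings taken from the table above.
GENERATION_CONFIG = {
    "max_tokens": 2048,
    "temperature": 0.5,
    "top_p": 0.7,
}


def build_request(prompt: str, model: str = "code-davinci-002") -> dict:
    """Assemble a completions-style request body (hypothetical helper)."""
    return {"model": model, "prompt": prompt, **GENERATION_CONFIG}


request = build_request("what is 2 + 3 ?\nWrite a short snippet of python ...")
print(sorted(request))
```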

Citation Information

    @inproceedings{amini-etal-2019-mathqa,
        title = "{M}ath{QA}: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms",
        author = "Amini, Aida  and
          Gabriel, Saadia  and
          Lin, Shanchuan  and
          Koncel-Kedziorski, Rik  and
          Choi, Yejin  and
          Hajishirzi, Hannaneh",
        booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
        month = jun,
        year = "2019",
        address = "Minneapolis, Minnesota",
        publisher = "Association for Computational Linguistics",
        url = "https://aclanthology.org/N19-1245",
        doi = "10.18653/v1/N19-1245",
        pages = "2357--2367",
    }