Dataset: race
Sub-tasks: multiple-choice-qa
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: found
Annotation creators: expert-generated
Source datasets: original
Preprint: arxiv:1704.04683
RACE is a large-scale reading comprehension dataset with more than 28,000 passages and nearly 100,000 questions. It was collected from English examinations in China that are designed for middle school and high school students. The dataset can be used as a training and test set for machine reading comprehension.

all
An example of "train" looks as follows.
This example was too long and was cropped: { "answer": "A", "article": "\"Schoolgirls have been wearing such short skirts at Paget High School in Branston that they've been ordered to wear trousers ins...", "example_id": "high132.txt", "options": ["short skirts give people the impression of sexualisation", "short skirts are too expensive for parents to afford", "the headmaster doesn't like girls wearing short skirts", "the girls wearing short skirts will be at the risk of being laughed at"], "question": "The girls at Paget High School are not allowed to wear skirts in that _ ." }

high
An example of "train" looks as follows.
This example was too long and was cropped: { "answer": "A", "article": "\"Schoolgirls have been wearing such short skirts at Paget High School in Branston that they've been ordered to wear trousers ins...", "example_id": "high132.txt", "options": ["short skirts give people the impression of sexualisation", "short skirts are too expensive for parents to afford", "the headmaster doesn't like girls wearing short skirts", "the girls wearing short skirts will be at the risk of being laughed at"], "question": "The girls at Paget High School are not allowed to wear skirts in that _ ." }

middle
An example of "train" looks as follows.
This example was too long and was cropped: { "answer": "B", "article": "\"There is not enough oil in the world now. As time goes by, it becomes less and less, so what are we going to do when it runs ou...", "example_id": "middle3.txt", "options": ["There is more petroleum than we can use now.", "Trees are needed for some other things besides making gas.", "We got electricity from ocean tides in the old days.", "Gas wasn't used to run cars in the Second World War."], "question": "According to the passage, which of the following statements is TRUE?" }
The data fields are the same among all splits.
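As a rough illustration of that shared record shape, the sketch below models one record using only the five keys visible in the cropped examples (`answer`, `article`, `example_id`, `options`, `question`). The `RaceExample` class and its helper methods are hypothetical names introduced here, not part of the dataset or any library, and the letter-to-position mapping ("A" is the first option, "B" the second, and so on) is an assumption inferred from the examples.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RaceExample:
    """Hypothetical container for one RACE record (keys taken from the examples)."""
    example_id: str
    article: str
    question: str
    options: List[str]
    answer: str  # a letter such as "A" or "B"

    def answer_index(self) -> int:
        """Map the answer letter to a 0-based position in `options` (assumed mapping)."""
        index = ord(self.answer) - ord("A")
        if not 0 <= index < len(self.options):
            raise ValueError(f"answer {self.answer!r} does not match any option")
        return index

    def answer_text(self) -> str:
        """Return the option string selected by the answer letter."""
        return self.options[self.answer_index()]


# The "middle" example from the card; the article stays cropped as in the source.
record = {
    "answer": "B",
    "article": '"There is not enough oil in the world now. As time goes by, '
               "it becomes less and less, so what are we going to do when it runs ou...",
    "example_id": "middle3.txt",
    "options": [
        "There is more petroleum than we can use now.",
        "Trees are needed for some other things besides making gas.",
        "We got electricity from ocean tides in the old days.",
        "Gas wasn't used to run cars in the Second World War.",
    ],
    "question": "According to the passage, which of the following statements is TRUE?",
}

example = RaceExample(**record)
print(example.answer_index(), example.answer_text())
```

Keeping the letter in `answer` and deriving the index on demand leaves the record identical to what the card shows, while giving a classifier the integer label it would usually need.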
name | train | validation | test |
---|---|---|---|
all | 87866 | 4887 | 4934 |
high | 62445 | 3451 | 3498 |
middle | 25421 | 1436 | 1436 |
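
The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below copies the numbers from the table and assumes, as the sums suggest but the card does not state outright, that `all` is the combination of `high` and `middle`. `SPLIT_SIZES` and `combined_size` are illustrative names.

```python
# Split sizes copied from the table above.
SPLIT_SIZES = {
    "all": {"train": 87866, "validation": 4887, "test": 4934},
    "high": {"train": 62445, "validation": 3451, "test": 3498},
    "middle": {"train": 25421, "validation": 1436, "test": 1436},
}


def combined_size(split: str) -> int:
    """Sum the `high` and `middle` sizes for one split."""
    return SPLIT_SIZES["high"][split] + SPLIT_SIZES["middle"][split]


# Each `all` split should equal high + middle if the assumption holds.
for split, total in SPLIT_SIZES["all"].items():
    assert combined_size(split) == total, split

# Total number of questions per configuration, across all three splits.
totals = {name: sum(sizes.values()) for name, sizes in SPLIT_SIZES.items()}
print(totals)
```

The `all` total of 97,687 questions matches the "nearly 100,000 questions" in the description and falls inside the 10K<n<100K size category.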
http://www.cs.cmu.edu/~glai1/data/race/
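The RACE dataset is available for non-commercial research purposes only.
All passages were obtained from the Internet and are not the property of Carnegie Mellon University. We are not responsible for the content or the meaning of these passages.
You agree not to reproduce, duplicate, copy, sell, trade, resell, or exploit for any commercial purpose any portion of the contexts or any portion of the derived data.
We reserve the right to terminate your access to the RACE dataset at any time.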
@inproceedings{lai-etal-2017-race,
    title = "{RACE}: Large-scale {R}e{A}ding Comprehension Dataset From Examinations",
    author = "Lai, Guokun and Xie, Qizhe and Liu, Hanxiao and Yang, Yiming and Hovy, Eduard",
    booktitle = "Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing",
    month = sep,
    year = "2017",
    address = "Copenhagen, Denmark",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D17-1082",
    doi = "10.18653/v1/D17-1082",
    pages = "785--794",
}
Thanks to @abarbosa94, @patrickvonplaten, @lewtun, @thomwolf, and @mariamabarham for adding this dataset.