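Dataset: cmrc2018
Tasks: question-answering
Sub-tasks: extractive-qa
Languages: zh
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: crowdsourced
Annotations creators: crowdsourced
Source datasets: original
License: cc-by-sa-4.0

This is a span-extraction dataset for Chinese machine reading comprehension, created to add language diversity to the field. It consists of nearly 20,000 real questions annotated by human experts on Wikipedia paragraphs. The authors also annotated a challenge set containing questions that require comprehensive understanding and multi-sentence inference.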
An example from the 'validation' split looks as follows.
This example was too long and was cropped: { "answers": { "answer_start": [11, 11], "text": ["光荣和ω-force", "光荣和ω-force"] }, "context": "\"《战国无双3》()是由光荣和ω-force开发的战国无双系列的正统第三续作。本作以三大故事为主轴,分别是以武田信玄等人为主的《关东三国志》,织田信长等人为主的《战国三杰》,石田三成等人为主的《关原的年轻武者》,丰富游戏内的剧情。此部份专门介绍角色,欲知武...", "id": "DEV_0_QUERY_0", "question": "《战国无双3》是由哪两个公司合作开发的?" }
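The `answer_start` values in the example are character offsets into `context`, so each answer text should be recoverable by slicing. The following is a minimal sketch of that check, not part of the dataset's tooling. The helper name `answer_spans_match` is hypothetical, and the record is a shortened copy of the example above: only the first sentence of the context is kept, and the escaped quote at the start of the cropped display is dropped on the assumption that it is a display artifact.

```python
def answer_spans_match(record: dict) -> bool:
    """Return True if every answer text sits at its answer_start offset in the context."""
    context = record["context"]
    answers = record["answers"]
    for start, text in zip(answers["answer_start"], answers["text"]):
        # Slice the context at the character offset and compare with the answer text.
        if context[start:start + len(text)] != text:
            return False
    return True


# Shortened copy of the validation example (assumption: first sentence only).
record = {
    "id": "DEV_0_QUERY_0",
    "context": "《战国无双3》()是由光荣和ω-force开发的战国无双系列的正统第三续作。",
    "question": "《战国无双3》是由哪两个公司合作开发的?",
    "answers": {
        "answer_start": [11, 11],
        "text": ["光荣和ω-force", "光荣和ω-force"],
    },
}

print(answer_spans_match(record))  # → True
```

Offsets are counted in Unicode characters, not bytes, which is why plain Python string slicing is sufficient here.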
The data fields are the same across all splits of the default configuration.

name | train | validation | test |
---|---|---|---|
default | 10142 | 3219 | 1002 |
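
As a quick illustration of how the three splits relate in size, the sketch below sums the counts from the table and prints each split's share. The variable names are illustrative only, and the numbers are copied from the table rather than read from the dataset.

```python
# Split sizes copied from the table above (default configuration).
split_sizes = {"train": 10142, "validation": 3219, "test": 1002}

total = sum(split_sizes.values())

for name, n in split_sizes.items():
    # Share of each split relative to the three splits listed in the table.
    print(f"{name}: {n} examples ({n / total:.1%})")

print(f"total: {total}")  # → total: 14363
```

The three splits sum to 14,363 examples, fewer than the nearly 20,000 questions mentioned in the description. The card does not say how the remainder is distributed.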
@inproceedings{cui-emnlp2019-cmrc2018,
    title = "A Span-Extraction Dataset for {C}hinese Machine Reading Comprehension",
    author = "Cui, Yiming and Liu, Ting and Che, Wanxiang and Xiao, Li and Chen, Zhipeng and Ma, Wentao and Wang, Shijin and Hu, Guoping",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-1600",
    doi = "10.18653/v1/D19-1600",
    pages = "5886--5891",
}
Thanks to @patrickvonplaten, @mariamabarham, @lewtun, @thomwolf for adding this dataset.