Dataset: cosmos_qa
Sub-tasks: multiple-choice-qa
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: found
Annotation creators: crowdsourced
Source datasets: original
Preprint: arxiv:1909.00277
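Cosmos QA is a large-scale dataset of 35.6K problems that require commonsense-based reading comprehension, formulated as multiple-choice questions. It focuses on reading between the lines over a diverse collection of people's everyday narratives, asking questions about the likely causes or effects of events that require reasoning beyond the exact text spans in the context.
An example from the "validation" set is shown below.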
This example was too long and was cropped:
{
"answer0": "If he gets married in the church he wo nt have to get a divorce .",
"answer1": "He wants to get married to a different person .",
"answer2": "He wants to know if he does nt like this girl can he divorce her ?",
"answer3": "None of the above choices .",
"context": "\"Do i need to go for a legal divorce ? I wanted to marry a woman but she is not in the same religion , so i am not concern of th...",
"id": "3BFF0DJK8XA7YNK4QYIGCOG1A95STE##3180JW2OT5AF02OISBX66RFOCTG5J7##A2LTOS0AZ3B28A##Blog_56156##q1_a1##378G7J1SJNCDAAIN46FM2P7T6KZEW2",
"label": 1,
"question": "Why is this person asking about divorce ?"
}
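The record above stores the four candidate answers in the fields "answer0" to "answer3" and the index of the correct one in "label". A minimal sketch of how that index could be resolved to the answer text follows. The helper name correct_answer is hypothetical, and the range check is a defensive assumption, since this card does not say whether every split carries gold labels. The context and id fields are left out because the helper does not use them.

```python
# Hypothetical helper: map the integer "label" to the matching "answerN" field.
def correct_answer(example: dict) -> str:
    label = example["label"]
    # Assumption: a label outside 0..3 means no gold answer is available.
    if label not in (0, 1, 2, 3):
        raise ValueError(f"no gold answer for label {label!r}")
    return example[f"answer{label}"]


# The validation example shown above, reduced to the fields the helper reads.
example = {
    "answer0": "If he gets married in the church he wo nt have to get a divorce .",
    "answer1": "He wants to get married to a different person .",
    "answer2": "He wants to know if he does nt like this girl can he divorce her ?",
    "answer3": "None of the above choices .",
    "label": 1,
    "question": "Why is this person asking about divorce ?",
}

gold = correct_answer(example)
print(example["question"])
print(gold)
```

Building the field name from the label keeps the lookup in one place, so the same helper works for any record that follows this layout.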
The data fields are the same among all splits.
name | train | validation | test |
---|---|---|---|
default | 25262 | 2985 | 6963 |
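As a quick sanity check on the table, the split sizes can be summed and turned into proportions. This is only an illustrative sketch: the variable names are made up here, and the numbers are copied from the table. The three splits add up to 35,210 examples, slightly below the 35.6K quoted in the description; the card does not explain the difference.

```python
# Split sizes copied from the table above (configuration "default").
split_sizes = {"train": 25262, "validation": 2985, "test": 6963}

# Total number of examples across the three splits.
total = sum(split_sizes.values())

# Share of each split relative to the total.
fractions = {name: size / total for name, size in split_sizes.items()}

print(f"total: {total}")
for name, share in fractions.items():
    print(f"{name}: {share:.1%}")
```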
As reported by Yejin Choi via email, the dataset is licensed under CC BY 4.0.
@inproceedings{huang-etal-2019-cosmos,
title = "Cosmos {QA}: Machine Reading Comprehension with Contextual Commonsense Reasoning",
author = "Huang, Lifu and
Le Bras, Ronan and
Bhagavatula, Chandra and
Choi, Yejin",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
month = nov,
year = "2019",
address = "Hong Kong, China",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/D19-1243",
doi = "10.18653/v1/D19-1243",
pages = "2391--2401",
}
Thanks to @patrickvonplaten, @lewtun, @albertvillanova, @thomwolf for adding this dataset.