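Dataset: hotpot_qa
Tasks: question-answering
Languages: en
Multilinguality: monolingual
Size: 100K<n<1M
Language creators: found
Annotations creators: crowdsourced
Source datasets: original
ArXiv: arxiv:1809.09600
Other: multi-hop
License: cc-by-sa-4.0

HotpotQA is a dataset of 113,000 Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge base or knowledge schema; (3) we provide the sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and to explain their predictions; (4) we offer a new type of factoid comparison question to test QA systems' ability to extract relevant facts and perform the necessary comparison.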
distractor
An example of 'validation' looks as follows.
{ "answer": "This is the answer", "context": { "sentences": [["Sent 1"], ["Sent 21", "Sent 22"]], "title": ["Title1", "Title 2"] }, "id": "000001", "level": "medium", "question": "What is the answer?", "supporting_facts": { "sent_id": [0, 1, 3], "title": ["Title of para 1", "Title of para 2", "Title of para 3"] }, "type": "comparison" }
fullwiki
An example of 'train' looks as follows.
{ "answer": "This is the answer", "context": { "sentences": [["Sent 1"], ["Sent 2"]], "title": ["Title1", "Title 2"] }, "id": "000001", "level": "hard", "question": "What is the answer?", "supporting_facts": { "sent_id": [0, 1, 3], "title": ["Title of para 1", "Title of para 2", "Title of para 3"] }, "type": "bridge" }
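As a rough illustration of how the nested fields relate, the sketch below pairs each supporting fact with its sentence text. It assumes that `supporting_facts.title[i]` names a paragraph listed in `context.title` and that `supporting_facts.sent_id[i]` indexes into that paragraph's sentence list. The record and the helper name `resolve_supporting_facts` are hypothetical, made up for this example; they are not part of the dataset or of any library.

```python
from typing import Optional

# Hypothetical record that follows the schema shown above (not real data).
record = {
    "id": "000001",
    "question": "What is the answer?",
    "answer": "This is the answer",
    "type": "bridge",
    "level": "hard",
    "context": {
        "title": ["Title 1", "Title 2"],
        "sentences": [["Sent 11"], ["Sent 21", "Sent 22"]],
    },
    "supporting_facts": {
        "title": ["Title 1", "Title 2", "Title 3"],
        "sent_id": [0, 1, 0],
    },
}


def resolve_supporting_facts(rec: dict) -> list[tuple[str, int, Optional[str]]]:
    """Return (title, sent_id, sentence text or None) for each supporting fact."""
    # Map each paragraph title to its list of sentences.
    paragraphs = dict(zip(rec["context"]["title"], rec["context"]["sentences"]))
    facts = rec["supporting_facts"]
    resolved = []
    for title, sent_id in zip(facts["title"], facts["sent_id"]):
        sentences = paragraphs.get(title)
        # Leave the text as None when the title or index is not in the context.
        if sentences is not None and 0 <= sent_id < len(sentences):
            text = sentences[sent_id]
        else:
            text = None
        resolved.append((title, sent_id, text))
    return resolved


for title, sent_id, text in resolve_supporting_facts(record):
    print(f"{title} [{sent_id}]: {text}")
```

Returning `None` for an unresolved pair, rather than raising, is a choice made here so that a supporting fact pointing outside the provided context does not abort processing of the rest of the record.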
The data fields are the same among all splits.
| configuration | train | validation |
|---|---|---|
| distractor | 90447 | 7405 |

| configuration | train | validation | test |
|---|---|---|---|
| fullwiki | 90447 | 7405 | 7405 |
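The split sizes can be checked after loading. The following is a minimal sketch, assuming the dataset is available through the Hugging Face `datasets` library under the identifier `hotpot_qa` with the configuration names `distractor` and `fullwiki`. The helper names `size_mismatches` and `load_and_check` are hypothetical, and some library versions may need extra arguments to `load_dataset` that are not shown here.

```python
# Split sizes copied from the tables above.
EXPECTED_SIZES = {
    "distractor": {"train": 90447, "validation": 7405},
    "fullwiki": {"train": 90447, "validation": 7405, "test": 7405},
}


def size_mismatches(config: str, splits) -> dict:
    """Return {split: (expected, actual)} for splits whose size differs."""
    mismatches = {}
    for split, expected in EXPECTED_SIZES[config].items():
        # A split that is absent is reported with an actual size of None.
        actual = len(splits[split]) if split in splits else None
        if actual != expected:
            mismatches[split] = (expected, actual)
    return mismatches


def load_and_check(config: str = "distractor") -> dict:
    """Download one configuration and compare it with the table (needs network)."""
    # Imported lazily so the helper above works without the library installed.
    from datasets import load_dataset

    splits = load_dataset("hotpot_qa", config)
    return size_mismatches(config, splits)
```

Calling `load_and_check("fullwiki")` downloads the data, so it is left uncalled here; an empty dictionary would mean the loaded splits match the table.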
HotpotQA is distributed under a CC BY-SA 4.0 License.
@inproceedings{yang2018hotpotqa, title={{HotpotQA}: A Dataset for Diverse, Explainable Multi-hop Question Answering}, author={Yang, Zhilin and Qi, Peng and Zhang, Saizheng and Bengio, Yoshua and Cohen, William W. and Salakhutdinov, Ruslan and Manning, Christopher D.}, booktitle={Conference on Empirical Methods in Natural Language Processing ({EMNLP})}, year={2018} }
Thanks to @albertvillanova, @ghomasHudson for adding this dataset.