Dataset: qasc
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: found
Annotation creators: crowdsourced
Source datasets: original
Preprint: arxiv:1910.11473
License: CC BY 4.0
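QASC is a question-answering dataset with a focus on sentence composition. It consists of 9,980 eight-way multiple-choice questions about grade-school science (8,134 train, 926 dev, 920 test) and comes with a corpus of 17M sentences.
An example from the "validation" split looks as follows.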
{
  "answerKey": "F",
  "choices": {
    "label": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "text": ["sand", "occurs over a wide range", "forests", "Global warming", "rapid changes occur", "local weather conditions", "measure of motion", "city life"]
  },
  "combinedfact": "Climate is generally described in terms of local weather conditions",
  "fact1": "Climate is generally described in terms of temperature and moisture.",
  "fact2": "Fire behavior is driven by local weather conditions such as winds, temperature and moisture.",
  "formatted_question": "Climate is generally described in terms of what? (A) sand (B) occurs over a wide range (C) forests (D) Global warming (E) rapid changes occur (F) local weather conditions (G) measure of motion (H) city life",
  "id": "3NGI5ARFTT4HNGVWXAMLNBMFA0U1PG",
  "question": "Climate is generally described in terms of what?"
}
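The fields in this record relate to each other in a simple way: `answerKey` names one entry in `choices.label`, and `formatted_question` appears to be the question followed by the labelled choices. The sketch below illustrates both relationships on the sample record, abridged to the fields it uses. The helper names `answer_text` and `format_question` are hypothetical and are not part of the dataset or of any library; the formatting rule is inferred from this single example and may not hold for every record.

```python
# Sample record from the card, abridged to the fields used below.
record = {
    "answerKey": "F",
    "choices": {
        "label": ["A", "B", "C", "D", "E", "F", "G", "H"],
        "text": [
            "sand",
            "occurs over a wide range",
            "forests",
            "Global warming",
            "rapid changes occur",
            "local weather conditions",
            "measure of motion",
            "city life",
        ],
    },
    "formatted_question": (
        "Climate is generally described in terms of what? "
        "(A) sand (B) occurs over a wide range (C) forests (D) Global warming "
        "(E) rapid changes occur (F) local weather conditions "
        "(G) measure of motion (H) city life"
    ),
    "question": "Climate is generally described in terms of what?",
}


def answer_text(rec):
    """Hypothetical helper: return the choice text whose label equals answerKey."""
    labels = rec["choices"]["label"]
    texts = rec["choices"]["text"]
    return texts[labels.index(rec["answerKey"])]


def format_question(rec):
    """Hypothetical helper: rebuild the question string with '(label) text' options.

    The format is an assumption inferred from the sample record.
    """
    options = " ".join(
        f"({label}) {text}"
        for label, text in zip(rec["choices"]["label"], rec["choices"]["text"])
    )
    return f"{rec['question']} {options}"


print(answer_text(record))  # local weather conditions
print(format_question(record) == record["formatted_question"])  # True
```

In this example the correct answer, "local weather conditions", is also the phrase that `fact2` contributes to `combinedfact`.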
The data fields are the same among all splits.
name | train | validation | test |
---|---|---|---|
default | 8134 | 926 | 920 |
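As a quick consistency check, the split sizes in the table can be summed and compared with the 9,980 questions quoted in the description. This is a minimal sketch using only the numbers stated on this card; the variable names are illustrative.

```python
# Split sizes copied from the table above.
split_sizes = {"train": 8134, "validation": 926, "test": 920}

# The three splits should add up to the total number of questions.
total = sum(split_sizes.values())
print(total)  # 9980

# Share of each split relative to the whole dataset.
for name, size in split_sizes.items():
    print(f"{name}: {size} examples ({size / total:.1%} of the dataset)")
```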
The dataset is released under the CC BY 4.0 license.
@article{allenai:qasc,
  author = {Tushar Khot and Peter Clark and Michal Guerquin and Peter Jansen and Ashish Sabharwal},
  title = {QASC: A Dataset for Question Answering via Sentence Composition},
  journal = {arXiv:1910.11473v2},
  year = {2020},
}
Thanks to @thomwolf, @patrickvonplaten, and @lewtun for adding this dataset.