Dataset: quartz
License: CC BY 4.0
Source datasets: original
Annotation creators: crowdsourced
Language creators: crowdsourced
Size: 1K<n<10K
Multilinguality: monolingual
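QuaRTz is a crowdsourced dataset of 3864 multiple-choice questions about open-domain qualitative relationships. Each question is paired with one of a set of distinct background sentences (sometimes short paragraphs).
The QuaRTz V1 dataset is split into a training set (2696 questions), a development set (384), and a test set (784). Each background sentence appears in only one split.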
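The guarantee that a background sentence belongs to exactly one split can be checked mechanically. The following is a minimal sketch, assuming each split is available as a list of records carrying the `para_id` field shown in the example record; the helper name `shared_paragraph_ids` and the toy records are hypothetical and are not real dataset rows.

```python
from itertools import combinations


def shared_paragraph_ids(splits):
    """Return the para_id values that occur in more than one split.

    `splits` maps a split name to a list of records, each a dict
    with at least a "para_id" key.
    """
    ids_per_split = {
        name: {record["para_id"] for record in records}
        for name, records in splits.items()
    }
    shared = set()
    # Compare every pair of splits and collect overlapping ids.
    for left, right in combinations(ids_per_split, 2):
        shared |= ids_per_split[left] & ids_per_split[right]
    return shared


# Toy records for illustration only (hypothetical ids).
toy_splits = {
    "train": [{"para_id": "QRSent-1"}, {"para_id": "QRSent-2"}],
    "validation": [{"para_id": "QRSent-3"}],
    "test": [{"para_id": "QRSent-4"}],
}
leaky_splits = {
    "train": [{"para_id": "QRSent-1"}],
    "test": [{"para_id": "QRSent-1"}],
}

print(shared_paragraph_ids(toy_splits))    # empty set: no leakage
print(shared_paragraph_ids(leaky_splits))  # the shared id is reported
```

An empty result is what the split description above implies for the real data; any non-empty result would indicate that a background sentence leaked across splits.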
An example from 'train' looks as follows.
{ "answerKey": "A", "choices": { "label": ["A", "B"], "text": ["higher", "lower"] }, "id": "QRQA-10116-3", "para": "Electrons at lower energy levels, which are closer to the nucleus, have less energy.", "para_anno": { "cause_dir_sign": "LESS", "cause_dir_str": "closer", "cause_prop": "distance from a nucleus", "effect_dir_sign": "LESS", "effect_dir_str": "less", "effect_prop": "energy" }, "para_id": "QRSent-10116", "question": "Electrons further away from a nucleus have _____ energy levels than close ones.", "question_anno": { "less_cause_dir": "electron energy levels", "less_cause_prop": "nucleus", "less_effect_dir": "lower", "less_effect_prop": "electron energy levels", "more_effect_dir": "higher", "more_effect_prop": "electron energy levels" } }
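To illustrate how the fields of a record fit together, here is a small sketch that looks up the answer text from `answerKey` and fills it into the blank of the question. The record is abbreviated from the example above (annotation fields omitted); the helper names `answer_text` and `fill_blank` are hypothetical, and the sketch assumes the blank is always written as `_____`, as it is in this example.

```python
# Abbreviated copy of the example record (annotation fields omitted).
record = {
    "id": "QRQA-10116-3",
    "para_id": "QRSent-10116",
    "para": (
        "Electrons at lower energy levels, which are closer to the "
        "nucleus, have less energy."
    ),
    "question": (
        "Electrons further away from a nucleus have _____ energy "
        "levels than close ones."
    ),
    "choices": {"label": ["A", "B"], "text": ["higher", "lower"]},
    "answerKey": "A",
}


def answer_text(rec):
    """Map the answerKey label to the text of the matching choice."""
    labels = rec["choices"]["label"]
    texts = rec["choices"]["text"]
    return texts[labels.index(rec["answerKey"])]


def fill_blank(rec, blank="_____"):
    """Insert the correct choice into the question's blank (assumed marker)."""
    return rec["question"].replace(blank, answer_text(rec))


print(answer_text(record))
print(fill_blank(record))
```

Note that `choices` stores labels and texts as two parallel lists, so the answer is found by position rather than by a direct key lookup.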
The data fields are the same across all splits.
name | train | validation | test |
---|---|---|---|
default | 2696 | 384 | 784 |
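As a quick consistency check on the table, the split sizes can be summed and expressed as shares of the whole dataset. This is only an arithmetic sketch using the numbers from the table; the variable names are illustrative.

```python
# Split sizes as listed in the table.
split_sizes = {"train": 2696, "validation": 384, "test": 784}

total = sum(split_sizes.values())

# Share of each split, as a percentage rounded to one decimal place.
split_share = {
    name: round(100 * size / total, 1)
    for name, size in split_sizes.items()
}

print(total)
print(split_share)
```

The total matches the 3864 questions stated in the dataset description, with roughly 70% of the questions in the training split.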
The dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
@InProceedings{quartz, author = {Oyvind Tafjord and Matt Gardner and Kevin Lin and Peter Clark}, title = {QUARTZ: An Open-Domain Dataset of Qualitative Relationship Questions}, year = {2019} }
Thanks to @patrickvonplaten, @lewtun, and @thomwolf for adding this dataset.