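Datasets: gap
Tasks: Token Classification
Languages: en
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: found
Annotations creators: crowdsourced
Source datasets: original
ArXiv: arxiv:1810.05201
License: unknown

GAP is a gender-balanced dataset containing 8,908 coreference-labeled pairs of (ambiguous pronoun, antecedent name), sampled from Wikipedia and released by Google AI Language for the evaluation of coreference resolution in practical applications.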
An example from the 'validation' split looks as follows.
{ "A": "aliquam ultrices sagittis", "A-coref": false, "A-offset": 208, "B": "elementum curabitur vitae", "B-coref": false, "B-offset": 435, "ID": "validation-1", "Pronoun": "condimentum mattis pellentesque", "Pronoun-offset": 948, "Text": "Lorem ipsum dolor", "URL": "sem fringilla ut" }
The data fields are the same across all splits.
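To make the record layout concrete, here is a minimal sketch of how the fields might be read. It assumes that each `*-offset` field is a character offset into `Text` and that `A-coref` / `B-coref` mark whether the pronoun refers to that candidate name; the card itself does not spell this out. The record and both helper functions are hypothetical, written only for illustration, and the sentence is not taken from the dataset.

```python
from typing import Optional

# Hypothetical record, invented for illustration; it is not a row from the dataset.
TEXT = "Alice met Barbara after she sent the report."
record = {
    "ID": "example-1",
    "Text": TEXT,
    "Pronoun": "she",
    "Pronoun-offset": TEXT.index("she"),
    "A": "Alice",
    "A-offset": TEXT.index("Alice"),
    "A-coref": True,
    "B": "Barbara",
    "B-offset": TEXT.index("Barbara"),
    "B-coref": False,
    "URL": "",
}


def mention_matches(rec: dict, key: str) -> bool:
    """Check that the mention under `key` occurs in Text at its stored offset.

    Assumption: offsets count characters from the start of Text.
    """
    mention = rec[key]
    start = rec[f"{key}-offset"]
    return rec["Text"][start:start + len(mention)] == mention


def gold_antecedent(rec: dict) -> Optional[str]:
    """Return the candidate labeled coreferent with the pronoun, or None if neither is."""
    for key in ("A", "B"):
        if rec[f"{key}-coref"]:
            return rec[key]
    return None


offsets_ok = all(mention_matches(record, k) for k in ("Pronoun", "A", "B"))
antecedent = gold_antecedent(record)
print(offsets_ok, antecedent)
```

Returning `None` when both flags are false mirrors the example above, where `A-coref` and `B-coref` are both `false`: a pronoun may refer to neither candidate.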
name | train | validation | test |
---|---|---|---|
default | 2000 | 454 | 2000 |
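As a quick sanity check, the split sizes can be reconciled with the 8,908 pairs quoted in the description. This sketch assumes that every example contributes two (pronoun, name) pairs, one for candidate `A` and one for candidate `B`; the variable names are illustrative.

```python
# Split sizes copied from the table above.
split_sizes = {"train": 2000, "validation": 454, "test": 2000}

# Assumption: each example yields two (pronoun, name) pairs, one per candidate (A and B).
CANDIDATES_PER_EXAMPLE = 2

total_examples = sum(split_sizes.values())
total_pairs = total_examples * CANDIDATES_PER_EXAMPLE
print(total_examples, total_pairs)  # prints "4454 8908"
```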
@article{webster-etal-2018-mind,
    title = "Mind the {GAP}: A Balanced Corpus of Gendered Ambiguous Pronouns",
    author = "Webster, Kellie and Recasens, Marta and Axelrod, Vera and Baldridge, Jason",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "6",
    year = "2018",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/Q18-1042",
    doi = "10.1162/tacl_a_00240",
    pages = "605--617",
}
Thanks to @thomwolf, @patrickvonplaten, @otakumesi, and @lewtun for adding this dataset.