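Dataset: anli
Tasks: Text Classification
Languages: en
Multilinguality: monolingual
Size: 100K<n<1M
Language creators: found
ArXiv: arxiv:1910.14599
License: cc-by-nc-4.0

Adversarial Natural Language Inference (ANLI) is a large-scale NLI benchmark dataset. It was collected through an iterative, adversarial human-and-model-in-the-loop procedure. ANLI is much more difficult than its predecessors, including SNLI and MNLI. It contains three rounds, and each round has train, dev, and test splits.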
The text in the dataset is in English.

An example from 'train_r2' looks as follows.
This example was too long and was cropped:

{
    "hypothesis": "Idris Sultan was born in the first month of the year preceding 1994.",
    "label": 0,
    "premise": "\"Idris Sultan (born January 1993) is a Tanzanian Actor and comedian, actor and radio host who won the Big Brother Africa-Hotshot...",
    "reason": "",
    "uid": "ed5c37ab-77c5-4dbc-ba75-8fd617b19712"
}
The data fields are the same across all splits.
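A minimal sketch of reading one record with the fields shown in the example (`uid`, `premise`, `hypothesis`, `label`, `reason`). The label-id mapping used here (0 = entailment, 1 = neutral, 2 = contradiction) is an assumption not stated in this text, and the helper name `describe` is hypothetical. The premise is kept truncated exactly as in the cropped example.

```python
# Assumed label mapping; the text above only shows the integer id.
LABEL_NAMES = {0: "entailment", 1: "neutral", 2: "contradiction"}

# The cropped 'train_r2' example, copied as-is (premise stays truncated).
record = {
    "hypothesis": "Idris Sultan was born in the first month of the year preceding 1994.",
    "label": 0,
    "premise": '"Idris Sultan (born January 1993) is a Tanzanian Actor and comedian, '
               "actor and radio host who won the Big Brother Africa-Hotshot...",
    "reason": "",
    "uid": "ed5c37ab-77c5-4dbc-ba75-8fd617b19712",
}


def describe(rec: dict) -> str:
    """Return a one-line summary of an NLI record with a readable label."""
    label_name = LABEL_NAMES.get(rec["label"], "unknown")
    return f'{rec["uid"]}: {label_name} | hypothesis: {rec["hypothesis"]}'


summary = describe(record)
print(summary)
```

Under the assumed mapping, the example's label `0` reads as entailment, which agrees with the premise (born January 1993) supporting the hypothesis.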
plain_text

name | train_r1 | dev_r1 | train_r2 | dev_r2 | train_r3 | dev_r3 | test_r1 | test_r2 | test_r3 |
---|---|---|---|---|---|---|---|---|---|
plain_text | 16946 | 1000 | 45460 | 1000 | 100459 | 1200 | 1000 | 1000 | 1200 |
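As a quick sanity check on the split sizes in the table, the sketch below groups them by round and sums them. The counts are copied from the table; the names `SPLIT_SIZES` and `sizes_by_round` are introduced only for illustration.

```python
# Split sizes copied from the table above.
SPLIT_SIZES = {
    "train_r1": 16946, "dev_r1": 1000, "test_r1": 1000,
    "train_r2": 45460, "dev_r2": 1000, "test_r2": 1000,
    "train_r3": 100459, "dev_r3": 1200, "test_r3": 1200,
}


def sizes_by_round(sizes: dict) -> dict:
    """Group split sizes as {round: {split_kind: count}}."""
    grouped = {}
    for split, count in sizes.items():
        kind, rnd = split.split("_")  # e.g. "train_r2" -> ("train", "r2")
        grouped.setdefault(rnd, {})[kind] = count
    return grouped


by_round = sizes_by_round(SPLIT_SIZES)
train_total = sum(r["train"] for r in by_round.values())
overall_total = sum(SPLIT_SIZES.values())

print(train_total)    # 162865
print(overall_total)  # 169265
```

The overall total of 169,265 examples falls inside the 100K<n<1M size category listed in the metadata.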
License: Creative Commons Attribution-NonCommercial 4.0 (cc-by-nc-4.0)
@InProceedings{nie2019adversarial,
    title = {Adversarial NLI: A New Benchmark for Natural Language Understanding},
    author = {Nie, Yixin and Williams, Adina and Dinan, Emily and Bansal, Mohit and Weston, Jason and Kiela, Douwe},
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    year = "2020",
    publisher = "Association for Computational Linguistics",
}
Thanks to @thomwolf, @easonnie, @lhoestq, and @patrickvonplaten for adding this dataset.