Dataset:

anli

Tasks:

Text classification

Subtasks:

natural-language-inference multi-input-text-classification

Languages:

Multilinguality:

monolingual

Size:

100K<n<1M

Language creators:

found

Annotation creators:

crowdsourced machine-generated

Source datasets:

original extended|hotpot_qa

Preprint:

arxiv:1910.14599

License:

cc-by-nc-4.0

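Dataset card for "anli"

Dataset summary

Adversarial Natural Language Inference (ANLI) is a large-scale NLI benchmark dataset, collected through an iterative, adversarial human-and-model-in-the-loop procedure. ANLI is much more difficult than its predecessors, including SNLI and MNLI. It contains three rounds, each with a training, development, and test set.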

Supported tasks and leaderboards

More Information Needed

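Languages

English

Dataset structure

Data instances

plain_text

Size of downloaded dataset files: 18.62 MB
Size of the generated dataset: 77.12 MB
Total amount of disk used: 95.75 MB

An example from 'train_r2' looks as follows.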

This example was too long and was cropped:

{
    "hypothesis": "Idris Sultan was born in the first month of the year preceding 1994.",
    "label": 0,
    "premise": "\"Idris Sultan (born January 1993) is a Tanzanian Actor and comedian, actor and radio host who won the Big Brother Africa-Hotshot...",
    "reason": "",
    "uid": "ed5c37ab-77c5-4dbc-ba75-8fd617b19712"
}
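
A record like the one above could be retrieved with the Hugging Face `datasets` library. The sketch below is not part of the original card. It assumes that the dataset is reachable on the Hub under the identifier `anli` (the name used in this card) and that the `datasets` package is installed. `split_name` and `load_anli_split` are hypothetical helper names introduced only for illustration.

```python
VALID_KINDS = ("train", "dev", "test")
VALID_ROUNDS = (1, 2, 3)


def split_name(kind: str, round_id: int) -> str:
    """Build a split name such as 'train_r2' from a split kind and a round number."""
    if kind not in VALID_KINDS:
        raise ValueError(f"kind must be one of {VALID_KINDS}, got {kind!r}")
    if round_id not in VALID_ROUNDS:
        raise ValueError(f"round_id must be one of {VALID_ROUNDS}, got {round_id!r}")
    return f"{kind}_r{round_id}"


def load_anli_split(kind: str, round_id: int):
    """Load one ANLI split. Needs network access and the `datasets` package."""
    # Imported lazily so that split_name() stays usable without the dependency.
    from datasets import load_dataset

    # Assumption: the Hub identifier is "anli", as given in this card.
    return load_dataset("anli", split=split_name(kind, round_id))
```

Under these assumptions, `load_anli_split("train", 2)[0]` should return a dictionary with the keys `uid`, `premise`, `hypothesis`, `label`, and `reason`.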

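Data fields

The data fields are the same across all splits.

plain_text

uid : a string feature.
premise : a string feature.
hypothesis : a string feature.
label : a classification label, with possible values entailment (0), neutral (1), contradiction (2).
reason : a string feature.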
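
The field list above can be mirrored in a small typed record, which makes the integer label easier to read downstream. This is a minimal sketch: `AnliExample`, `LABEL_NAMES`, and `parse_example` are hypothetical names, and the sample record is a made-up toy instance, not taken from the dataset.

```python
from dataclasses import dataclass

# Label ids as listed in the data fields above.
LABEL_NAMES = {0: "entailment", 1: "neutral", 2: "contradiction"}


@dataclass(frozen=True)
class AnliExample:
    uid: str
    premise: str
    hypothesis: str
    label: int
    reason: str

    @property
    def label_name(self) -> str:
        """Human-readable name of the integer label."""
        return LABEL_NAMES[self.label]


def parse_example(record: dict) -> AnliExample:
    """Check the label id and convert a raw record into an AnliExample."""
    label = record["label"]
    if label not in LABEL_NAMES:
        raise ValueError(f"unexpected label id: {label!r}")
    return AnliExample(
        uid=record["uid"],
        premise=record["premise"],
        hypothesis=record["hypothesis"],
        label=label,
        reason=record["reason"],
    )


# Toy record for illustration only (not an actual ANLI instance).
example = parse_example({
    "uid": "toy-0",
    "premise": "A man is playing a guitar on stage.",
    "hypothesis": "A person is making music.",
    "label": 0,
    "reason": "",
})
print(example.label_name)  # entailment
```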

Data splits

name	train_r1	dev_r1	train_r2	dev_r2	train_r3	dev_r3	test_r1	test_r2	test_r3
plain_text	16946	1000	45460	1000	100459	1200	1000	1000	1200
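
As a quick sanity check, the split sizes in the table above can be aggregated per split kind and per round. The sketch below uses only the numbers from the table. `SPLIT_SIZES` and `totals_by` are hypothetical names.

```python
# Split sizes copied from the table above.
SPLIT_SIZES = {
    "train_r1": 16946, "dev_r1": 1000, "test_r1": 1000,
    "train_r2": 45460, "dev_r2": 1000, "test_r2": 1000,
    "train_r3": 100459, "dev_r3": 1200, "test_r3": 1200,
}


def totals_by(sizes: dict, part: int) -> dict:
    """Sum sizes grouped by split kind (part=0) or by round (part=1)."""
    totals = {}
    for name, count in sizes.items():
        key = name.split("_")[part]
        totals[key] = totals.get(key, 0) + count
    return totals


by_kind = totals_by(SPLIT_SIZES, 0)
by_round = totals_by(SPLIT_SIZES, 1)
total = sum(SPLIT_SIZES.values())

print(by_kind)   # {'train': 162865, 'dev': 3200, 'test': 3200}
print(by_round)  # {'r1': 18946, 'r2': 47460, 'r3': 102859}
print(total)     # 169265
```

The overall total of 169,265 examples is consistent with the 100K<n<1M size category listed in the card metadata.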

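Dataset creation

Curation rationale

More Information Needed

Source data

Initial data collection and normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and sensitive information

More Information Needed

Considerations for using the data

Additional information

Dataset curators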

More Information Needed

Licensing information

CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0)

Citation information

@InProceedings{nie2019adversarial,
    title={Adversarial NLI: A New Benchmark for Natural Language Understanding},
    author={Nie, Yixin
                and Williams, Adina
                and Dinan, Emily
                and Bansal, Mohit
                and Weston, Jason
                and Kiela, Douwe},
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    year = "2020",
    publisher = "Association for Computational Linguistics",
}

Contributions

Thanks to @thomwolf, @easonnie, @lhoestq, and @patrickvonplaten for adding this dataset.

Author:

Anonymous

Dataset size:

16.55 KB