Dataset:
multi_nli_mismatch
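The Multi-Genre Natural Language Inference (MultiNLI) corpus is a crowd-sourced collection of 433,000 sentence pairs annotated with textual entailment information. It is modeled on the SNLI corpus, but differs in that it covers a range of genres of spoken and written text and supports a distinctive cross-genre generalization evaluation. The corpus served as the basis for the shared task of the RepEval 2017 Workshop at EMNLP in Copenhagen.
An example of 'train' looks as follows: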
{
"hypothesis": "independence",
"label": "contradiction",
"premise": "correlation"
}
The data fields are the same across all splits.
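Because every split shares the same fields, a single check can be applied to any record. The following is a minimal sketch using only the Python standard library, not part of the dataset's own tooling. The names `validate_record`, `REQUIRED_FIELDS`, and `NLI_LABELS` are hypothetical, and the label set is an assumption based on the three standard NLI classes; this card itself only shows "contradiction".

```python
import json

# Assumption: the three standard NLI classes. Only "contradiction"
# appears in the example on this card.
NLI_LABELS = ("entailment", "neutral", "contradiction")

# Field names taken from the 'train' example above.
REQUIRED_FIELDS = ("premise", "hypothesis", "label")


def validate_record(record: dict) -> dict:
    """Return the record unchanged if it looks well formed, else raise ValueError."""
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for name in REQUIRED_FIELDS:
        if not isinstance(record[name], str):
            raise ValueError(f"field {name!r} must be a string")
    if record["label"] not in NLI_LABELS:
        raise ValueError(f"unexpected label: {record['label']!r}")
    return record


# The 'train' example from this card, parsed from JSON text.
raw = '{"hypothesis": "independence", "label": "contradiction", "premise": "correlation"}'
example = validate_record(json.loads(raw))
print(example["premise"], "/", example["hypothesis"], "->", example["label"])
```

The same function can be applied to each record of a split as it is read, so malformed rows are caught before training or evaluation.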
name | train | validation |
---|---|---|
plain_text | 392702 | 10000 |
@InProceedings{N18-1101,
author = "Williams, Adina
and Nangia, Nikita
and Bowman, Samuel",
title = "A Broad-Coverage Challenge Corpus for
Sentence Understanding through Inference",
booktitle = "Proceedings of the 2018 Conference of
the North American Chapter of the
Association for Computational Linguistics:
Human Language Technologies, Volume 1 (Long
Papers)",
year = "2018",
publisher = "Association for Computational Linguistics",
pages = "1112--1122",
location = "New Orleans, Louisiana",
url = "http://aclweb.org/anthology/N18-1101"
}
Thanks to @thomwolf, @patrickvonplaten, and @mariamabarham for adding this dataset.