Dataset:
multi_nli_mismatch
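The Multi-Genre Natural Language Inference (MultiNLI) corpus is a crowd-sourced collection of 433,000 sentence pairs annotated with textual entailment information. It is modeled on the SNLI corpus, but differs in that it covers a range of spoken and written text genres and supports a distinctive cross-genre generalization evaluation. The corpus served as the basis for the shared task of the RepEval 2017 Workshop at EMNLP in Copenhagen.
An example of 'train' looks as follows: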
{ "hypothesis": "independence", "label": "contradiction", "premise": "correlation" }
The data fields are the same across all splits.
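Since every split shares the same three fields, a record can be checked and its string label turned into an integer before training. The following is a minimal sketch, not part of the dataset's tooling: `encode_example`, `REQUIRED_FIELDS`, and `LABEL_TO_ID` are hypothetical names, and the label set (`entailment`, `neutral`, `contradiction`) and its integer ordering are assumptions based on the usual three-way NLI scheme.

```python
from typing import Dict

# Fields seen in the example record above.
REQUIRED_FIELDS = ("premise", "hypothesis", "label")

# Assumption: the usual three-way NLI label set; the integer ids are an
# illustrative choice, not something defined by this dataset card.
LABEL_TO_ID = {"entailment": 0, "neutral": 1, "contradiction": 2}


def encode_example(example: Dict[str, str]) -> Dict[str, object]:
    """Validate one record and replace its string label with an integer id."""
    missing = [field for field in REQUIRED_FIELDS if field not in example]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    label = example["label"]
    if label not in LABEL_TO_ID:
        raise ValueError(f"unexpected label: {label!r}")
    return {
        "premise": example["premise"],
        "hypothesis": example["hypothesis"],
        "label_id": LABEL_TO_ID[label],
    }


# The 'train' example shown above.
example = {"hypothesis": "independence", "label": "contradiction", "premise": "correlation"}
encoded = encode_example(example)
print(encoded)
```

Raising on a missing field or an unknown label, rather than silently skipping the record, makes schema drift between splits visible early.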
name | train | validation |
---|---|---|
plain_text | 392702 | 10000 |
@InProceedings{N18-1101,
  author = "Williams, Adina and Nangia, Nikita and Bowman, Samuel",
  title = "A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference",
  booktitle = "Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)",
  year = "2018",
  publisher = "Association for Computational Linguistics",
  pages = "1112--1122",
  location = "New Orleans, Louisiana",
  url = "http://aclweb.org/anthology/N18-1101"
}
Thanks to @thomwolf, @patrickvonplaten, @mariamabarham for adding this dataset.