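AmericasNLI is an extension of XNLI (Conneau et al., 2018), a natural language inference (NLI) dataset covering 15 high-resource languages, to 10 low-resource Indigenous languages spoken in the Americas: Ashaninka, Aymara, Bribri, Guarani, Nahuatl, Otomi, Quechua, Raramuri, Shipibo-Konibo, and Wixarika. As with MNLI, the goal is to predict textual entailment (does sentence A entail, contradict, or neither entail nor contradict sentence B?). This is a classification task: given two sentences, predict one of three labels.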
An example of the all_languages test set looks as follows:
{'language': 'aym', 'premise': "Ukhamaxa, janiw ukatuqits lup'kayätti, ukhamarus wali phiñasitayätwa, ukatx jupampiw mayamp aruskipañ qallanttha.", 'hypothesis': 'Janiw mayamp jupampix parlxapxti.', 'label': 2}
An example of the aym test set looks as follows:
{'premise': "Ukhamaxa, janiw ukatuqits lup'kayätti, ukhamarus wali phiñasitayätwa, ukatx jupampiw mayamp aruskipañ qallanttha.", 'hypothesis': 'Janiw mayamp jupampix parlxapxti.', 'label': 2}
An example of the bzd test set looks as follows:
{'premise': "Bua', kèq ye' kũ e' bikeitsök erë ye' chkénãwã tã ye' ujtémĩne ie' tã páxlĩnẽ.", 'hypothesis': "Kèq ye' ùtẽnẽ ie' tã páxlĩ.", 'label': 2}
An example of the cni test set looks as follows:
{'premise': 'Kameetsa, tee nokenkeshireajeroji, iro kantaincha tee nomateroji aisati nintajaro noñanatajiri iroakera.', 'hypothesis': 'Tee noñatajeriji.', 'label': 2}
An example of the gn test set looks as follows:
{'premise': "Néi, ni napensaikurihína upéva rehe, ajepichaiterei ha añepyrûjey añe'ê hendive.", 'hypothesis': "Nañe'êvéi hendive.", 'label': 2}
An example of the hch test set looks as follows:
{'premise': 'mu hekwa.', 'hypothesis': 'neuka tita xatawe m+k+ mat+a.', 'label': 2}
An example of the nah test set looks as follows:
{'premise': 'Cualtitoc, na axnimoihliaya ino, nicualaniztoya queh naha nicamohuihqui', 'hypothesis': 'Ayoc nicamohuihtoc', 'label': 2}
An example of the oto test set looks as follows:
{'premise': 'mi-ga, nin mibⴘy mbô̮nitho ane guenu, guedi mibⴘy nho ⴘnmⴘy xi di mⴘdi o ñana nen nⴘua manaigui', 'hypothesis': 'hin din bi pengui nen nⴘa', 'label': 2}
An example of the quy test set looks as follows:
.', 'label': 2}
An example of the shp test set looks as follows:
{'premise': 'Jakon riki, ja shinanamara ea ike, ikaxbi kikin frustradara ea ike jakopira ea jabe yoyo iribake.', 'hypothesis': 'Eara jabe yoyo iribiama iki.', 'label': 2}
An example of the tar test set looks as follows:
{'premise': 'Ga’lá ju, ke tási newalayé nejé echi kítira, we ne majáli, a’lí ko uchécho ne yua ku ra’íchaki.', 'hypothesis': 'Tási ne uchecho yua ra’ícha échi rejói.', 'label': 2}
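To make the integer labels in these records easier to read, here is a minimal sketch that maps a label id to its name. The helper `decode_label` and the constant `LABEL_NAMES` are hypothetical names introduced for illustration; the id-to-name mapping is the one listed in the data fields of this card.

```python
# Label ids and names as listed in the data fields of this card.
LABEL_NAMES = ("entailment", "neutral", "contradiction")


def decode_label(example: dict) -> str:
    """Return the label name for one premise/hypothesis record (hypothetical helper)."""
    label_id = example["label"]
    if not 0 <= label_id < len(LABEL_NAMES):
        raise ValueError(f"unexpected label id: {label_id}")
    return LABEL_NAMES[label_id]


# The Aymara test example from this card, copied verbatim.
aym_example = {
    "premise": "Ukhamaxa, janiw ukatuqits lup'kayätti, ukhamarus wali phiñasitayätwa, ukatx jupampiw mayamp aruskipañ qallanttha.",
    "hypothesis": "Janiw mayamp jupampix parlxapxti.",
    "label": 2,
}

print(decode_label(aym_example))  # contradiction
```

Every example shown in this section carries label 2, so each decodes to contradiction.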
all_languages
- language: a string feature holding the language code of the example (for instance, aym).
- premise: a string feature.
- hypothesis: a string feature.
- label: a classification label, with possible values entailment (0), neutral (1), contradiction (2).

aym
- premise: a string feature.
- hypothesis: a string feature.
- label: a classification label, with possible values entailment (0), neutral (1), contradiction (2).

bzd
- premise: a string feature.
- hypothesis: a string feature.
- label: a classification label, with possible values entailment (0), neutral (1), contradiction (2).

cni
- premise: a string feature.
- hypothesis: a string feature.
- label: a classification label, with possible values entailment (0), neutral (1), contradiction (2).

hch
- premise: a string feature.
- hypothesis: a string feature.
- label: a classification label, with possible values entailment (0), neutral (1), contradiction (2).

nah
- premise: a string feature.
- hypothesis: a string feature.
- label: a classification label, with possible values entailment (0), neutral (1), contradiction (2).

oto
- premise: a string feature.
- hypothesis: a string feature.
- label: a classification label, with possible values entailment (0), neutral (1), contradiction (2).

quy
- premise: a string feature.
- hypothesis: a string feature.
- label: a classification label, with possible values entailment (0), neutral (1), contradiction (2).

shp
- premise: a string feature.
- hypothesis: a string feature.
- label: a classification label, with possible values entailment (0), neutral (1), contradiction (2).

tar
- premise: a string feature.
- hypothesis: a string feature.
- label: a classification label, with possible values entailment (0), neutral (1), contradiction (2).
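The field lists above can be read as a small schema: the all_languages configuration adds a language field to the three fields that every single-language configuration shares. The sketch below checks a plain Python dict against that schema. The function `schema_errors` and the constants are hypothetical names for illustration, and treating the label as a plain `int` is an assumption about how records are represented once loaded.

```python
from typing import Dict, List

# Label ids as listed in the data fields of this card.
LABEL_IDS = {0: "entailment", 1: "neutral", 2: "contradiction"}

# Assumption: records are plain dicts with str/int values.
SINGLE_LANGUAGE_FIELDS: Dict[str, type] = {"premise": str, "hypothesis": str, "label": int}
ALL_LANGUAGES_FIELDS: Dict[str, type] = {"language": str, **SINGLE_LANGUAGE_FIELDS}


def schema_errors(record: dict, config: str) -> List[str]:
    """Return a list of schema problems for one record (empty if none)."""
    expected = ALL_LANGUAGES_FIELDS if config == "all_languages" else SINGLE_LANGUAGE_FIELDS
    errors = []
    # Every expected field must be present with the expected type.
    for name, kind in expected.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], kind):
            errors.append(f"wrong type for {name}: {type(record[name]).__name__}")
    # Fields outside the schema are reported too (this catches misspelled keys).
    for name in record:
        if name not in expected:
            errors.append(f"unexpected field: {name}")
    # The label must be one of the three known ids.
    label = record.get("label")
    if isinstance(label, int) and label not in LABEL_IDS:
        errors.append(f"unknown label id: {label}")
    return errors


record = {"premise": "a", "hypothesis": "b", "label": 2}
print(schema_errors(record, "aym"))            # []
print(schema_errors(record, "all_languages"))  # ['missing field: language']
```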
Language | ISO | Family | Dev | Test |
all_languages | -- | -- | 6457 | 7486 |
Aymara | aym | Aymaran | 743 | 750 |
Ashaninka | cni | Arawak | 658 | 750 |
Bribri | bzd | Chibchan | 743 | 750 |
Guarani | gn | Tupi-Guarani | 743 | 750 |
Nahuatl | nah | Uto-Aztecan | 376 | 738 |
Otomi | oto | Oto-Manguean | 222 | 748 |
Quechua | quy | Quechuan | 743 | 750 |
Raramuri | tar | Uto-Aztecan | 743 | 750 |
Shipibo-Konibo | shp | Panoan | 743 | 750 |
Wixarika | hch | Uto-Aztecan | 743 | 750 |
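As a quick consistency check on the table, the all_languages row should equal the sum of the ten per-language rows. The following sketch copies the Dev and Test counts from the table and adds them up; `SPLIT_SIZES` is a hypothetical name used only here.

```python
# (dev, test) sizes per ISO code, copied from the table above.
SPLIT_SIZES = {
    "aym": (743, 750),
    "cni": (658, 750),
    "bzd": (743, 750),
    "gn": (743, 750),
    "nah": (376, 738),
    "oto": (222, 748),
    "quy": (743, 750),
    "tar": (743, 750),
    "shp": (743, 750),
    "hch": (743, 750),
}

dev_total = sum(dev for dev, _ in SPLIT_SIZES.values())
test_total = sum(test for _, test in SPLIT_SIZES.values())

print(dev_total, test_total)  # 6457 7486
```

Both totals match the all_languages row. Note that the development sets for Nahuatl (376), Otomi (222), and Ashaninka (658) are smaller than the 743 examples available for the other languages.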
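The authors translated the data from the Spanish subset of XNLI.
AmericasNLI is the translation of a subset of XNLI (Conneau et al., 2018). Because translators between Spanish and the target languages are easier to find, we chose to translate from the Spanish version.
As described in paragraph 3.1 of the original paper.
Thanks to @fdschmidt93 for adding this dataset.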