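Dataset: juletxara/xstory_cloze
Multilinguality: multilingual
Size: 1K<n<10K
Annotations creators: found
Source datasets: extended|story_cloze
ArXiv: arxiv:2112.10668
License: cc-by-sa-4.0

XStoryCloze consists of professional translations of the English StoryCloze dataset (Spring 2016 version) into 10 non-English languages. The dataset is released by Meta AI.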
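Task: commonsense reasoning
Languages: English (en), Russian (ru), Simplified Chinese (zh), Latin American Spanish (es), Arabic (ar), Hindi (hi), Indonesian (id), Telugu (te), Swahili (sw), Basque (eu), Burmese (my)
An example from the "train" split looks as follows.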
{'answer_right_ending': 1, 'input_sentence_1': 'Rick grew up in a troubled household.', 'input_sentence_2': 'He never found good support in family, and turned to gangs.', 'input_sentence_3': "It wasn't long before Rick got shot in a robbery.", 'input_sentence_4': 'The incident caused him to turn a new leaf.', 'sentence_quiz1': 'He is happy now.', 'sentence_quiz2': 'He joined a gang.', 'story_id': '138d5bfb-05cc-41e3-bf2c-fa85ebad14e2'}
The data fields are the same across all splits.
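As a rough illustration of how these fields fit together, the sketch below turns one record into a four-sentence context, two candidate endings, and the index of the right ending. The helper name `to_cloze_instance` is hypothetical, and the code assumes that `answer_right_ending` is 1-based (1 selects `sentence_quiz1`), which is inferred from the example record rather than stated in the card.

```python
from typing import Dict, List, Tuple


def to_cloze_instance(record: Dict) -> Tuple[str, List[str], int]:
    """Return (context, candidate endings, 0-based index of the right ending)."""
    # The story context is the four input sentences joined in order.
    context = " ".join(record[f"input_sentence_{i}"] for i in range(1, 5))
    endings = [record["sentence_quiz1"], record["sentence_quiz2"]]
    # Assumption: answer_right_ending is 1-based (1 -> sentence_quiz1).
    label = record["answer_right_ending"] - 1
    return context, endings, label


# The "train" example shown in this card.
example = {
    "answer_right_ending": 1,
    "input_sentence_1": "Rick grew up in a troubled household.",
    "input_sentence_2": "He never found good support in family, and turned to gangs.",
    "input_sentence_3": "It wasn't long before Rick got shot in a robbery.",
    "input_sentence_4": "The incident caused him to turn a new leaf.",
    "sentence_quiz1": "He is happy now.",
    "sentence_quiz2": "He joined a gang.",
    "story_id": "138d5bfb-05cc-41e3-bf2c-fa85ebad14e2",
}

context, endings, label = to_cloze_instance(example)
print(context)
print("Right ending:", endings[label])
```

Keeping the label 0-based after conversion makes it easy to index into the list of endings when scoring candidates.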
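This dataset is intended for evaluating the zero-shot and few-shot learning capabilities of multilingual language models. We split the data for each language into a train set and a test set (360 and 1510 examples, respectively). The released data files for different languages are kept aligned line by line.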
name | train | test |
---|---|---|
en | 360 | 1510 |
ru | 360 | 1510 |
zh | 360 | 1510 |
es | 360 | 1510 |
ar | 360 | 1510 |
hi | 360 | 1510 |
id | 360 | 1510 |
te | 360 | 1510 |
sw | 360 | 1510 |
eu | 360 | 1510 |
my | 360 | 1510 |
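A minimal sketch of how the splits above might be used for evaluation follows. It assumes the Hugging Face `datasets` library is installed, that each language code is exposed as a configuration name, and that the repository can be loaded by the installed `datasets` version; none of this is confirmed by the card, and the exact split names should be checked on the Hub. The helpers `load_split`, `accuracy`, and `toy_score` are hypothetical, and `toy_score` is only a stand-in for a real scorer such as a language model's log-likelihood of the ending given the context.

```python
from typing import Callable, Dict, Iterable


def load_split(lang: str = "en", split: str = "train"):
    """Load one language and split (assumed config and split names)."""
    # Imported lazily so the rest of the sketch runs without the library.
    from datasets import load_dataset

    return load_dataset("juletxara/xstory_cloze", lang, split=split)


def accuracy(records: Iterable[Dict], score_fn: Callable[[str, str], float]) -> float:
    """Fraction of stories where the higher-scoring ending is the right one."""
    correct = 0
    total = 0
    for record in records:
        context = " ".join(record[f"input_sentence_{i}"] for i in range(1, 5))
        score_1 = score_fn(context, record["sentence_quiz1"])
        score_2 = score_fn(context, record["sentence_quiz2"])
        # Assumption: answer_right_ending is 1-based (1 -> sentence_quiz1).
        predicted = 1 if score_1 >= score_2 else 2
        correct += int(predicted == record["answer_right_ending"])
        total += 1
    return correct / total if total else 0.0


def toy_score(context: str, ending: str) -> float:
    """Placeholder scorer that prefers shorter endings; not a real model."""
    return -float(len(ending))


# The "train" example shown in this card, used as a one-record smoke test.
sample = {
    "answer_right_ending": 1,
    "input_sentence_1": "Rick grew up in a troubled household.",
    "input_sentence_2": "He never found good support in family, and turned to gangs.",
    "input_sentence_3": "It wasn't long before Rick got shot in a robbery.",
    "input_sentence_4": "The incident caused him to turn a new leaf.",
    "sentence_quiz1": "He is happy now.",
    "sentence_quiz2": "He joined a gang.",
    "story_id": "138d5bfb-05cc-41e3-bf2c-fa85ebad14e2",
}

print(accuracy([sample], toy_score))
```

In a few-shot setting, records from the 360-example train split would typically serve as in-context demonstrations, with accuracy reported on the 1510-example held-out split; passing the result of `load_split` to `accuracy` is one way to wire this up.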
XStoryCloze is open-sourced under CC BY-SA 4.0, the same license as the original English StoryCloze.
@article{DBLP:journals/corr/abs-2112-10668,
  author     = {Xi Victoria Lin and
                Todor Mihaylov and
                Mikel Artetxe and
                Tianlu Wang and
                Shuohui Chen and
                Daniel Simig and
                Myle Ott and
                Naman Goyal and
                Shruti Bhosale and
                Jingfei Du and
                Ramakanth Pasunuru and
                Sam Shleifer and
                Punit Singh Koura and
                Vishrav Chaudhary and
                Brian O'Horo and
                Jeff Wang and
                Luke Zettlemoyer and
                Zornitsa Kozareva and
                Mona T. Diab and
                Veselin Stoyanov and
                Xian Li},
  title      = {Few-shot Learning with Multilingual Language Models},
  journal    = {CoRR},
  volume     = {abs/2112.10668},
  year       = {2021},
  url        = {https://arxiv.org/abs/2112.10668},
  eprinttype = {arXiv},
  eprint     = {2112.10668},
  timestamp  = {Tue, 04 Jan 2022 15:59:27 +0100},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2112-10668.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
Thanks to @juletx.