Datasets:

juletxara/xstory_cloze_mt

Languages:

en

Multilinguality:

monolingual

Size:

1K<n<10K

Annotations creators:

found

ArXiv:

arxiv:2112.10668

Dataset Card for XStoryCloze MT

Dataset Summary

XStoryCloze consists of professional translations of the English StoryCloze dataset (Spring 2016 version) into 10 non-English languages, and was released by Meta AI. This dataset is the machine-translated version of XStoryCloze, translated into English (en) from ru, zh, es, ar, hi, id, te, sw, eu, and my.

Supported Tasks and Leaderboards

commonsense reasoning

Languages

This dataset is the machine-translated version of XStoryCloze, translated into English (en) from ru, zh (Simplified), es (Latin America), ar, hi, id, te, sw, eu, and my.

Dataset Structure

Data Instances

  • Size of downloaded dataset files: 2.03 MB
  • Size of the generated dataset: 2.03 MB
  • Total amount of disk used: 2.05 MB

An example of 'train' looks as follows.

{'answer_right_ending': 1,
 'input_sentence_1': 'Rick grew up in a troubled household.',
 'input_sentence_2': 'He never found good support in family, and turned to gangs.',
 'input_sentence_3': "It wasn't long before Rick got shot in a robbery.",
 'input_sentence_4': 'The incident caused him to turn a new leaf.',
 'sentence_quiz1': 'He is happy now.',
 'sentence_quiz2': 'He joined a gang.',
 'story_id': '138d5bfb-05cc-41e3-bf2c-fa85ebad14e2'}

Data Fields

The data fields are the same among all splits.

  • input_sentence_1 : The first statement in the story.
  • input_sentence_2 : The second statement in the story.
  • input_sentence_3 : The third statement in the story.
  • input_sentence_4 : The fourth statement in the story.
  • sentence_quiz1 : The first possible continuation of the story.
  • sentence_quiz2 : The second possible continuation of the story.
  • answer_right_ending : The correct ending; either 1 or 2.
  • story_id : The story ID.
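To illustrate how these fields fit together, the following is a minimal sketch that joins the four context sentences with each candidate ending and converts answer_right_ending to a 0-based index. The helper name build_candidates is hypothetical and not part of the dataset or any library; the instance is the example shown under Data Instances.

```python
from typing import Dict, List, Tuple


def build_candidates(instance: Dict) -> Tuple[List[str], int]:
    """Return both full candidate stories and the 0-based index of the correct one.

    Hypothetical helper for illustration; it only assumes the field names
    listed in this card.
    """
    # Join the four context sentences in order.
    context = " ".join(instance[f"input_sentence_{i}"] for i in range(1, 5))
    candidates = [
        f"{context} {instance['sentence_quiz1']}",
        f"{context} {instance['sentence_quiz2']}",
    ]
    # answer_right_ending is 1 or 2, so shift it to a 0-based index.
    return candidates, instance["answer_right_ending"] - 1


example = {
    "answer_right_ending": 1,
    "input_sentence_1": "Rick grew up in a troubled household.",
    "input_sentence_2": "He never found good support in family, and turned to gangs.",
    "input_sentence_3": "It wasn't long before Rick got shot in a robbery.",
    "input_sentence_4": "The incident caused him to turn a new leaf.",
    "sentence_quiz1": "He is happy now.",
    "sentence_quiz2": "He joined a gang.",
    "story_id": "138d5bfb-05cc-41e3-bf2c-fa85ebad14e2",
}

candidates, gold = build_candidates(example)
print(candidates[gold])
```

Keeping the gold label as an index into the candidate list makes it straightforward to compare against a model's predicted ending.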

Data Splits

This dataset is intended to be used for evaluating the zero- and few-shot learning capabilities of multilingual language models. We split the data for each language into train and test (360 vs. 1510 examples, respectively). The released data files for different languages maintain a line-by-line alignment.

language  test
ru        1510
zh        1510
es        1510
ar        1510
hi        1510
id        1510
te        1510
sw        1510
eu        1510
my        1510
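As a rough illustration of the evaluation this dataset is intended for, the sketch below computes accuracy over a list of instances by picking whichever ending a scoring function rates higher. Everything here is an assumption for illustration: cloze_accuracy and length_score are hypothetical names, and length_score is only a placeholder. In practice the scorer would typically be something like a language model's likelihood of the ending given the context, which this card does not specify.

```python
from typing import Callable, Dict, Iterable


def cloze_accuracy(
    instances: Iterable[Dict],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of instances where the higher-scored ending is the correct one.

    `score(context, ending)` is a hypothetical scoring function supplied by
    the caller; higher means the ending is judged more plausible.
    """
    correct = 0
    total = 0
    for inst in instances:
        context = " ".join(inst[f"input_sentence_{i}"] for i in range(1, 5))
        s1 = score(context, inst["sentence_quiz1"])
        s2 = score(context, inst["sentence_quiz2"])
        # Ties go to ending 1; answer_right_ending is 1 or 2.
        predicted = 1 if s1 >= s2 else 2
        correct += int(predicted == inst["answer_right_ending"])
        total += 1
    return correct / total if total else 0.0


def length_score(context: str, ending: str) -> float:
    # Placeholder scorer (prefers shorter endings); NOT a real model.
    return -float(len(ending))


examples = [
    {
        "answer_right_ending": 1,
        "input_sentence_1": "Rick grew up in a troubled household.",
        "input_sentence_2": "He never found good support in family, and turned to gangs.",
        "input_sentence_3": "It wasn't long before Rick got shot in a robbery.",
        "input_sentence_4": "The incident caused him to turn a new leaf.",
        "sentence_quiz1": "He is happy now.",
        "sentence_quiz2": "He joined a gang.",
        "story_id": "138d5bfb-05cc-41e3-bf2c-fa85ebad14e2",
    }
]

accuracy = cloze_accuracy(examples, length_score)
print(f"accuracy: {accuracy:.3f}")
```

Passing the scorer as a parameter keeps the accuracy loop independent of any particular model, so the same loop can be reused for each source language.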

Dataset Creation

Curation Rationale

More Information Needed

Source Data

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information

More Information Needed

Considerations for Using the Data

Social Impact of Dataset

More Information Needed

Discussion of Biases

More Information Needed

Other Known Limitations

More Information Needed

Additional Information

Dataset Curators

More Information Needed

Licensing Information

XStoryCloze is open-sourced under CC BY-SA 4.0, the same license as the original English StoryCloze.

Citation Information

@article{DBLP:journals/corr/abs-2112-10668,
  author    = {Xi Victoria Lin and
               Todor Mihaylov and
               Mikel Artetxe and
               Tianlu Wang and
               Shuohui Chen and
               Daniel Simig and
               Myle Ott and
               Naman Goyal and
               Shruti Bhosale and
               Jingfei Du and
               Ramakanth Pasunuru and
               Sam Shleifer and
               Punit Singh Koura and
               Vishrav Chaudhary and
               Brian O'Horo and
               Jeff Wang and
               Luke Zettlemoyer and
               Zornitsa Kozareva and
               Mona T. Diab and
               Veselin Stoyanov and
               Xian Li},
  title     = {Few-shot Learning with Multilingual Language Models},
  journal   = {CoRR},
  volume    = {abs/2112.10668},
  year      = {2021},
  url       = {https://arxiv.org/abs/2112.10668},
  eprinttype = {arXiv},
  eprint    = {2112.10668},
  timestamp = {Tue, 04 Jan 2022 15:59:27 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2112-10668.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Contributions

Thanks to @juletx.