Dataset:
albertvillanova/sat
The SAT (Style Augmented Translation) dataset contains roughly 3.37 million English-Vietnamese text pairs.
The languages in the dataset are English (`en`) and Vietnamese (`vi`). Each instance has the following form:
{ 'translation': { 'en': 'Rachel Pike : The science behind a climate headline', 'vi': 'Khoa học đằng sau một tiêu đề về khí hậu' } }
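As a minimal sketch of how records shaped like the instance above could be consumed, the snippet below pulls (English, Vietnamese) pairs out of the nested `translation` field. The helper name `iter_pairs` and its `src`/`tgt` parameters are hypothetical and introduced only for illustration; the sample row is the instance shown above, standing in for a real dataset row.

```python
from typing import Iterable, Iterator, Tuple


def iter_pairs(
    records: Iterable[dict], src: str = "en", tgt: str = "vi"
) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) text pairs from records with a 'translation' dict.

    Hypothetical helper: it assumes every record follows the structure of the
    example instance, i.e. {'translation': {'en': ..., 'vi': ...}}.
    """
    for record in records:
        translation = record["translation"]
        yield translation[src], translation[tgt]


# The instance from the card, used here in place of a downloaded row.
sample = {
    "translation": {
        "en": "Rachel Pike : The science behind a climate headline",
        "vi": "Khoa học đằng sau một tiêu đề về khí hậu",
    }
}

pairs = list(iter_pairs([sample]))
for en_text, vi_text in pairs:
    print(f"en: {en_text}")
    print(f"vi: {vi_text}")
```

With the Hugging Face `datasets` library installed, rows obtained from `load_dataset("albertvillanova/sat", split="test")` could presumably be passed to the same helper, assuming the repository loads under that identifier and yields records in this shape.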
The dataset is split into "train" and "test".
| | train | test |
|---|---|---|
| Number of examples | 3359574 | 7221 |
[More Information Needed]
Who are the source language producers? [More Information Needed]
Who are the annotators? [More Information Needed]
Unknown.
Unknown.
Thanks to @albertvillanova for adding this dataset.