Dataset: flores
Task: translation
Multilinguality: translation
Size: 1K<n<10K
Language creators: found
Annotation creators: found
Preprint: arxiv:1902.01382
License: cc-by-4.0

Evaluation datasets for low-resource machine translation: Nepali-English and Sinhala-English.

neen
An example from the validation set is shown below.
This example was too long and was cropped: { "translation": "{\"en\": \"This is the wrong translation!\", \"ne\": \"यस वाहेक आगम पूजा, तारा पूजा, व्रत आदि पनि घरभित्र र वाहिर दुवै स्थानमा गरेको पा..." }

sien
An example from the validation set is shown below.
This example was too long and was cropped: { "translation": "{\"en\": \"This is the wrong translation!\", \"si\": \"එවැනි ආවරණයක් ලබාදීමට රක්ෂණ සපයන්නෙකු කැමති වුවත් ඒ සාමාන් යයෙන් බොහෝ රටවල පොදු ..." }
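In the cropped previews above, the `translation` field appears as a JSON-encoded string keyed by language code. The following is a minimal sketch of reading such a record, assuming the field may arrive either as a mapping or as a JSON string. The helper name `get_pair` and the placeholder record are hypothetical and are not part of the dataset.

```python
import json
from typing import Mapping, Tuple, Union


def get_pair(
    record: Mapping[str, Union[str, Mapping[str, str]]],
    src: str,
    tgt: str,
) -> Tuple[str, str]:
    """Return the (source, target) sentences from a record's `translation` field."""
    translation = record["translation"]
    # Assumption: a stringified field holds a JSON object, as in the previews.
    if isinstance(translation, str):
        translation = json.loads(translation)
    return translation[src], translation[tgt]


# Hypothetical record shaped like the previews; the text is a placeholder,
# not content taken from the dataset.
sample = {"translation": json.dumps({"en": "<English sentence>", "ne": "<Nepali sentence>"})}
print(get_pair(sample, "ne", "en"))
```

For the sien configuration the same call would use the language codes `si` and `en`.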
The data fields are the same across all splits.
name | validation | test |
---|---|---|
neen | 2560 | 2836 |
sien | 2899 | 2767 |
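The split sizes in the table can be kept as a small lookup for sanity-checking a local copy. This is only a sketch: the names `SPLIT_SIZES` and `total_examples` are hypothetical, and the numbers are copied from the table above.

```python
# Split sizes per configuration, copied from the table above.
SPLIT_SIZES = {
    "neen": {"validation": 2560, "test": 2836},
    "sien": {"validation": 2899, "test": 2767},
}


def total_examples(config: str) -> int:
    """Sum the validation and test sizes for one configuration."""
    return sum(SPLIT_SIZES[config].values())


for config in SPLIT_SIZES:
    print(config, total_examples(config))
```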
@misc{guzmn2019new,
  title={Two New Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English},
  author={Francisco Guzman and Peng-Jen Chen and Myle Ott and Juan Pino and Guillaume Lample and Philipp Koehn and Vishrav Chaudhary and Marc'Aurelio Ranzato},
  year={2019},
  eprint={1902.01382},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
Thanks to @thomwolf, @patrickvonplaten, and @lewtun for adding this dataset.