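Dataset: un_pc
Task: translation
Multilinguality: multilingual
Size: 10M<n<100M
Language creators: found
Annotations creators: found
Source datasets: original
License: unknown

This parallel corpus consists of manually translated United Nations documents from the last 25 years (1990 to 2014) in the six official UN languages: Arabic, Chinese, English, French, Russian, and Spanish. It covers 6 languages and 15 bitexts, one for each language pair.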
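The figure of 15 bitexts follows from pairing each of the six languages with every other one while ignoring direction: 6 × 5 / 2 = 15. The following is a minimal Python sketch of that enumeration. It assumes ISO 639-1 codes for the languages and a hypothetical "xx-yy" label for each pair; neither the codes nor the label format are stated in this card.

```python
from itertools import combinations

# ISO 639-1 codes for the six official UN languages.
# Assumption: the card names the languages but does not give codes.
UN_LANGUAGES = ["ar", "zh", "en", "fr", "ru", "es"]


def bitext_pairs(languages):
    """Return every unordered language pair as a hypothetical "xx-yy" label.

    Sorting first means English-Arabic and Arabic-English collapse into
    the single label "ar-en", so each pair is counted once.
    """
    return [f"{a}-{b}" for a, b in combinations(sorted(languages), 2)]


pairs = bitext_pairs(UN_LANGUAGES)
print(len(pairs))  # 15, i.e. 6 choose 2
print(pairs)
```

Counting ordered translation directions instead, as the abstract in the citation below does for the six-way subcorpus, gives 6 × 5 = 30.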
The underlying task is machine translation.
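[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
Who are the source language producers? [More Information Needed]
[More Information Needed]
Who are the annotators? [More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]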
@inproceedings{ziemski-etal-2016-united,
    title = "The {U}nited {N}ations Parallel Corpus v1.0",
    author = "Ziemski, Micha{\l} and Junczys-Dowmunt, Marcin and Pouliquen, Bruno",
    booktitle = "Proceedings of the Tenth International Conference on Language Resources and Evaluation ({LREC}'16)",
    month = may,
    year = "2016",
    address = "Portoro{\v{z}}, Slovenia",
    publisher = "European Language Resources Association (ELRA)",
    url = "https://www.aclweb.org/anthology/L16-1561",
    pages = "3530--3534",
    abstract = "This paper describes the creation process and statistics of the official United Nations Parallel Corpus, the first parallel corpus composed from United Nations documents published by the original data creator. The parallel corpus presented consists of manually translated UN documents from the last 25 years (1990 to 2014) for the six official UN languages, Arabic, Chinese, English, French, Russian, and Spanish. The corpus is freely available for download under a liberal license. Apart from the pairwise aligned documents, a fully aligned subcorpus for the six official UN languages is distributed. We provide baseline BLEU scores of our Moses-based SMT systems trained with the full data of language pairs involving English and for all possible translation directions of the six-way subcorpus.",
}
Thanks to @patil-suraj for adding this dataset.