Dataset: capes
Tasks:
Multilinguality: multilingual
Size: 1M<n<10M
Language creators: found
Annotation creators: found
Source datasets: original
License:
A parallel corpus of thesis and dissertation abstracts in English and Portuguese was collected from the website of CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior), Brazil. The corpus is sentence-aligned for all language pairs. Approximately 240,000 documents were collected and aligned using the Hunalign algorithm.
The underlying task is machine translation.
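To illustrate what sentence alignment provides for machine translation, here is a minimal sketch that pairs aligned English and Portuguese sentences into records. The record layout (a `translation` dictionary keyed by `en` and `pt`) and the helper name `to_translation_records` are assumptions made for illustration and are not confirmed by this card; the example sentences are invented, not taken from the corpus.

```python
from typing import Dict, List


def to_translation_records(
    en_sentences: List[str], pt_sentences: List[str]
) -> List[Dict[str, Dict[str, str]]]:
    """Pair sentence-aligned English and Portuguese lines into records.

    Hypothetical helper: the {"translation": {"en": ..., "pt": ...}} layout
    is an assumed convention, not something this card specifies.
    """
    if len(en_sentences) != len(pt_sentences):
        raise ValueError("aligned corpora must have the same number of sentences")
    return [
        {"translation": {"en": en.strip(), "pt": pt.strip()}}
        for en, pt in zip(en_sentences, pt_sentences)
    ]


# Invented example sentences, used only to show the record shape.
records = to_translation_records(
    ["This thesis studies soil erosion. "],
    ["Esta tese estuda a erosão do solo."],
)
source = records[0]["translation"]["en"]
target = records[0]["translation"]["pt"]
```

Each record keeps a source sentence next to its aligned target, which is the form most translation training pipelines expect as input.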
Who are the source language producers? [More Information Needed]
Who are the annotators? [More Information Needed]
@inproceedings{soares2018parallel,
title={A Parallel Corpus of Theses and Dissertations Abstracts},
author={Soares, Felipe and Yamashita, Gabrielli Harumi and Anzanello, Michel Jose},
booktitle={International Conference on Computational Processing of the Portuguese Language},
pages={345--352},
year={2018},
organization={Springer}
}
Thanks to @patil-suraj for adding this dataset.