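Dataset: opus_dogc
Tasks: translation
Multilinguality: translation
Size: 1M<n<10M
Language creators: expert-generated
Annotation creators: no-annotation
Source datasets: original
License: cc0-1.0

OPUS DOGC is a dataset of documents from the Official Journal of the Government of Catalonia, in Catalan and Spanish, provided by Antoni Oliver Gonzalez of the Autonomous University of Catalonia.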
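[More Information Needed]
The dataset is multilingual and contains the following parallel texts:
[More Information Needed]
A data instance contains the following fields:
[More Information Needed]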
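Since the card does not list the fields of a data instance, one way to discover them is to inspect a single record directly. The following is a minimal sketch, not an official loading recipe. It assumes the `datasets` library is installed, that the dataset is still published on the Hugging Face Hub under the identifier `opus_dogc`, and that a split named `train` exists. The helper names `field_paths` and `first_record` are hypothetical and introduced here only for illustration.

```python
from typing import Any, Dict, Iterator, Tuple


def field_paths(record: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted field path, Python type name) for every leaf of a nested record."""
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Nested dictionaries are flattened into dotted paths.
            yield from field_paths(value, path)
        else:
            yield path, type(value).__name__


def first_record(name: str = "opus_dogc", split: str = "train") -> Dict[str, Any]:
    """Stream the first record so the full corpus is not downloaded.

    Assumption: `name` and `split` are valid on the Hub; adjust them if not.
    """
    # Imported lazily so field_paths can be used without the library installed.
    from datasets import load_dataset

    stream = load_dataset(name, split=split, streaming=True)
    return next(iter(stream))


# Usage (needs network access):
#     for path, type_name in field_paths(first_record()):
#         print(f"{path}: {type_name}")
```

Streaming is used here so that only the first record is fetched; the field names printed are whatever the published dataset actually contains, so nothing about the record layout is assumed in advance.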
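Who are the source language producers? [More Information Needed]
[More Information Needed]
Who are the annotators? [More Information Needed]
[More Information Needed]
This dataset is released into the public domain under CC0 1.0.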
@inproceedings{tiedemann-2012-parallel,
    title = "Parallel Data, Tools and Interfaces in {OPUS}",
    author = {Tiedemann, J{\"o}rg},
    booktitle = "Proceedings of the Eighth International Conference on Language Resources and Evaluation ({LREC}'12)",
    month = may,
    year = "2012",
    address = "Istanbul, Turkey",
    publisher = "European Language Resources Association (ELRA)",
    url = "http://www.lrec-conf.org/proceedings/lrec2012/pdf/463_Paper.pdf",
    pages = "2214--2218",
    abstract = "This paper presents the current status of OPUS, a growing language resource of parallel corpora and related tools. The focus in OPUS is to provide freely available data sets in various formats together with basic annotation to be useful for applications in computational linguistics, translation studies and cross-linguistic corpus studies. In this paper, we report about new data sets and their features, additional annotation tools and models provided from the website and essential interfaces and on-line services included in the project.",
}
Thanks to @albertvillanova for adding this dataset.