数据集:

tep_en_fa_para

任务:

翻译

语言:

en fa

计算机处理:

translation

大小:

100K<n<1M

语言创建人:

found

批注创建人:

found

源数据集:

original
中文

Dataset Card for [tep_en_fa_para]

Dataset Summary

TEP: Tehran English-Persian parallel corpus. The first free Eng-Per corpus, provided by the Natural Language and Text Processing Laboratory, University of Tehran.

Supported Tasks and Leaderboards

The underlying task is machine translation for language pair English-Persian

Languages

English, Persian

Dataset Structure

Data Instances

[More Information Needed]

Data Fields

[More Information Needed]

Data Splits

[More Information Needed]

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

[More Information Needed]

Citation Information

M. T. Pilevar, H. Faili, and A. H. Pilevar, “TEP: Tehran English-Persian Parallel Corpus”, in proceedings of 12th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2011).

Contributions

Thanks to @spatil6 for adding this dataset.