Datasets:

udhr

Tasks:

Translation

Multilinguality:

multilingual

Size:

n<1K

Language creators:

found

Annotations creators:

no-annotation

Source datasets:

original

Dataset Card for The Universal Declaration of Human Rights (UDHR)

Dataset Summary

The Universal Declaration of Human Rights (UDHR) is a milestone document in the history of human rights. Drafted by representatives with different legal and cultural backgrounds from all regions of the world, it set out, for the first time, fundamental human rights to be universally protected. The Declaration was adopted by the UN General Assembly in Paris on 10 December 1948 during its 183rd plenary meeting.

© 1996 – 2009 The Office of the High Commissioner for Human Rights

This plain text version was prepared by the "UDHR in Unicode" project, https://www.unicode.org/udhr.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

The dataset includes translations of the document in over 400 languages and dialects. The list of languages can be found here.

Dataset Structure

Data Instances

Each instance corresponds to a different language and includes information about the language and the full document text.

Data Fields

  • text: The full document text, with lines delimited by a newline (\n).
  • lang_key: The unique identifier of a given translation.
  • lang_name: The textual description of the language or dialect.
  • iso639-3: The ISO 639-3 language identifier.
  • iso15924: The ISO 15924 script identifier.
  • bcp47: The BCP 47 language identifier.
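As a minimal sketch of how an instance with these fields might be handled, the snippet below builds a hypothetical record as a plain Python dict and splits its text field into lines. The record values and the helper name text_lines are illustrative assumptions, not copied from the dataset or from any library.

```python
# Hypothetical record mirroring the fields listed above.
# The values are illustrative placeholders, not taken from the dataset.
record = {
    "text": "Universal Declaration of Human Rights\nPreamble\n\nArticle 1",
    "lang_key": "eng",      # assumed key; real keys may differ
    "lang_name": "English",
    "iso639-3": "eng",      # ISO 639-3 language code
    "iso15924": "Latn",     # ISO 15924 script code
    "bcp47": "en",          # BCP 47 language tag
}


def text_lines(instance):
    """Return the non-empty lines of the document text, stripped of whitespace."""
    return [line.strip() for line in instance["text"].split("\n") if line.strip()]


lines = text_lines(record)
print(len(lines), "lines; first line:", lines[0])
```

Dropping empty lines is a design choice for this sketch; keep them if blank lines carry structure in your use case.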

Data Splits

Only a train split is included; it contains the full document in all languages.

Split    Number of examples
train    488

Dataset Creation

Curation Rationale

In addition to its social significance, the document set a world record in 1999 for being the most translated document in the world and as such can be useful for settings requiring paired text between many languages.
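One possible way to derive paired text is sketched below. It assumes that two translations have the same number of non-empty lines in the same order, which may not hold for every pair of files in the dataset. The helper pair_lines and the short sample strings are hypothetical and only illustrate the idea.

```python
def pair_lines(text_a, text_b):
    """Pair the lines of two translations by position.

    Assumes both texts have the same number of non-empty lines in the same
    order. Raises ValueError otherwise, so that misaligned texts are not
    paired silently.
    """
    lines_a = [line.strip() for line in text_a.split("\n") if line.strip()]
    lines_b = [line.strip() for line in text_b.split("\n") if line.strip()]
    if len(lines_a) != len(lines_b):
        raise ValueError(
            f"line counts differ: {len(lines_a)} vs {len(lines_b)}"
        )
    return list(zip(lines_a, lines_b))


# Short illustrative excerpts standing in for two full translations.
english = "Article 1\nAll human beings are born free and equal in dignity and rights."
french = "Article premier\nTous les êtres humains naissent libres et égaux en dignité et en droits."

pairs = pair_lines(english, french)
for source, target in pairs:
    print(source, "|", target)
```

Positional pairing is the simplest option. If line counts differ between translations, a proper sentence alignment step would be needed instead.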

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

In addition to the social and political significance of the United Nations' Universal Declaration of Human Rights, the document set a world record in 1999 for being the most translated document in the world. As such, it can be useful for settings requiring paired text between many languages, including low-resource languages that are significantly underrepresented in NLP research.

Discussion of Biases

[More Information Needed]

Other Known Limitations

Although the document is translated into a very large number of languages, the text is very short and therefore may have limited usefulness for most types of modeling and evaluation.

Additional Information

Dataset Curators

The txt/xml data files used here were compiled by The Unicode Consortium, which can be found here. The original texts can be found on the United Nations website.

Licensing Information

Source text © 1996 – 2022 The Office of the High Commissioner for Human Rights

The Unicode license applies to these translations.

Citation Information

United Nations. (1998). The Universal Declaration of Human Rights, 1948-1998. New York: United Nations Dept. of Public Information.

Contributions

Thanks to @joeddav for adding this dataset. Updated May 2022 by @leondz.