Datasets:

yoruba_bbc_topics

Languages:

yo

Multilinguality:

monolingual

Size Categories:

1K<n<10K

Language Creators:

found

Annotations Creators:

expert-generated

Source Datasets:

original

Dataset Card for Yoruba BBC News Topic Classification dataset (yoruba_bbc_topics)

Dataset Summary

A news headline topic classification dataset for Yorùbá, similar to AG News. The news headlines were collected from BBC Yoruba.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

Yorùbá (ISO 639-1: yo)

Dataset Structure

Data Instances

Each instance consists of a news title, the corresponding topic label, and publishing information (the publication date and the article's identifier in the BBC URL).

Data Fields

  • news_title: A news title.
  • label: The label describing the topic of the news title. Can be one of the following classes: africa, entertainment, health, nigeria, politics, sport or world.
  • date: The publication date (in Yorùbá).
  • bbc_url_id: The identifier of the article in the BBC URL.
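
The field list above can be sketched as a small record check in Python. This is a minimal illustration, not the dataset's loading code: the names `TOPIC_LABELS`, `LABEL_TO_ID`, `REQUIRED_FIELDS` and `validate_instance` are hypothetical, the integer ids simply follow the order in which the classes are listed (the actual encoding may differ), and the example record holds placeholder values rather than real data.

```python
from typing import Dict, List

# Topic classes, in the order they are listed in the field description.
TOPIC_LABELS: List[str] = [
    "africa", "entertainment", "health", "nigeria", "politics", "sport", "world",
]

# Hypothetical integer encoding (assumption: ids follow the listed order).
LABEL_TO_ID: Dict[str, int] = {name: i for i, name in enumerate(TOPIC_LABELS)}

# The four fields described in the card.
REQUIRED_FIELDS = ("news_title", "label", "date", "bbc_url_id")


def validate_instance(instance: Dict[str, str]) -> Dict[str, str]:
    """Return the instance unchanged if it has all four fields and a known label."""
    missing = [field for field in REQUIRED_FIELDS if field not in instance]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if instance["label"] not in LABEL_TO_ID:
        raise ValueError(f"unknown label: {instance['label']!r}")
    return instance


# Placeholder record: the values are illustrative stand-ins, not real data.
example = {
    "news_title": "<Yorùbá news headline>",
    "label": "sport",
    "date": "<publication date in Yorùbá>",
    "bbc_url_id": "<article id>",
}
validate_instance(example)
```

A check like this may be useful before training a classifier, since it catches records with a missing field or a label outside the seven listed classes.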

Data Splits

[More Information Needed]

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

[More Information Needed]

Citation Information

[More Information Needed]

Contributions

Thanks to @michael-aloys for adding this dataset.