Datasets:

dengue_filipino

Languages:

tl

Multilinguality:

monolingual

Size categories:

1K<n<10K

Language creators:

crowdsourced

Source datasets:

original

Dataset Card for Dengue Dataset in Filipino

Dataset Summary

Benchmark dataset for low-resource multi-label text classification, with 4,015 training, 500 test, and 500 validation examples. Each example is a tweet annotated with five binary labels (absent, dengue, health, mosquito, sick) and can belong to more than one class.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

The dataset is primarily in Filipino, with the addition of some English words commonly used in Filipino vernacular.

Dataset Structure

Data Instances

Sample data:

{
  "text": "Tapos ang dami pang lamok.",
  "absent": "0",
  "dengue": "0",
  "health": "0",
  "mosquito": "1",
  "sick": "0"
}
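Because the label flags in the sample record are strings, a classifier will usually need them as integers. The sketch below shows one possible way to turn a record into a multi-hot vector. The helper names `to_multi_hot` and `active_labels` are hypothetical and not part of the dataset or any library, and the label order is simply the order shown in the sample.

```python
from typing import Dict, List

# Label columns as they appear in the sample record (order is an assumption).
LABELS: List[str] = ["absent", "dengue", "health", "mosquito", "sick"]


def to_multi_hot(record: Dict[str, str]) -> List[int]:
    """Convert the "0"/"1" string flags of one record into a list of integers."""
    return [int(record[name]) for name in LABELS]


def active_labels(record: Dict[str, str]) -> List[str]:
    """Return the names of the labels that are set to 1 for one record."""
    return [name for name, flag in zip(LABELS, to_multi_hot(record)) if flag == 1]


sample = {
    "text": "Tapos ang dami pang lamok.",
    "absent": "0",
    "dengue": "0",
    "health": "0",
    "mosquito": "1",
    "sick": "0",
}

print(to_multi_hot(sample))   # [0, 0, 0, 1, 0]
print(active_labels(sample))  # ['mosquito']
```

Keeping the labels as a fixed-order vector rather than a single class index reflects that one tweet can carry several labels at once.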

Data Fields

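text: the text of the tweet (string).
absent, dengue, health, mosquito, sick: one binary flag per class, shown as "0" or "1" in the sample above.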

Data Splits

The dataset has 4,015 training, 500 validation, and 500 test examples.

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

Jan Christian Cruz

Licensing Information

[More Information Needed]

Citation Information

@INPROCEEDINGS{8459963,
  author={E. D. {Livelo} and C. {Cheng}},
  booktitle={2018 IEEE International Conference on Agents (ICA)},
  title={Intelligent Dengue Infoveillance Using Gated Recurrent Neural Learning and Cross-Label Frequencies},
  year={2018},
  pages={2-7},
  doi={10.1109/AGENTS.2018.8459963}
}

Contributions

Thanks to @anaerobeth for adding this dataset.