Datasets:

cdt

Languages:

pl

Multilinguality:

monolingual

Size categories:

10K<n<100K

Language creators:

other

Annotations creators:

expert-generated

Source datasets:

original

Dataset Card for cdt

Dataset Summary

The Cyberbullying Detection task was part of the 2019 edition of the PolEval competition. The goal is to predict whether a given Twitter message contains cyberbullying (harmful) content.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

Polish

Dataset Structure

Data Instances

[More Information Needed]

Data Fields

  • sentence: an anonymized tweet in Polish
  • target: 1 if the tweet is labeled as bullying, 0 otherwise. The test set has no labels, so -1 is used instead.
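
Because the test set carries -1 instead of a real label, rows with that value should be set aside before computing label statistics or training. The following is a minimal sketch in plain Python, assuming rows are dictionaries with the `sentence` and `target` fields described above; the helper `split_labeled` and the sample rows are hypothetical and are not taken from the dataset.

```python
from typing import Dict, List, Tuple

UNLABELED = -1  # value the card says is used for unlabeled test-set rows

Row = Dict[str, object]


def split_labeled(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Separate rows with a real label (0 or 1) from unlabeled rows (-1)."""
    labeled = [r for r in rows if r["target"] in (0, 1)]
    unlabeled = [r for r in rows if r["target"] == UNLABELED]
    return labeled, unlabeled


# Hypothetical rows for illustration only; not real dataset content.
rows = [
    {"sentence": "przykładowy tweet 1", "target": 0},
    {"sentence": "przykładowy tweet 2", "target": 1},
    {"sentence": "przykładowy tweet 3", "target": -1},
]

labeled, unlabeled = split_labeled(rows)

# Share of labeled rows marked as bullying; unlabeled rows are excluded
# so the -1 placeholder does not distort the statistic.
positive_rate = sum(r["target"] for r in labeled) / len(labeled)
```

Excluding the -1 rows up front keeps the placeholder from being mistaken for a third class by a classifier or a metric.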

Data Splits

[More Information Needed]

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

BSD 3-Clause

Citation Information

[More Information Needed]

Contributions

Thanks to @abecadel for adding this dataset.