数据集:

polemo2

语言:

pl

计算机处理:

monolingual

大小:

1K<n<10K

语言创建人:

other

批注创建人:

expert-generated

源数据集:

original
中文

Dataset Card for [Dataset Name]

Dataset Summary

The PolEmo2.0 is a set of online reviews from medicine and hotels domains. The task is to predict the sentiment of a review. There are two separate test sets, to allow for in-domain (medicine and hotels) as well as out-of-domain (products and university) validation.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

Polish

Dataset Structure

Data Instances

[More Information Needed]

Data Fields

  • sentence: string, the review
  • target: sentiment of the sentence class

The same tag system is used in plWordNet Emo for lexical units: [+m] (strong positive), [+s] (weak positive), [-m] (strong negative), [-s] (weak negative), [amb] (ambiguous) and [0] (neutral).

Note that the test set doesn't have targets so -1 is used instead

Data Splits

[More Information Needed]

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

Dataset provided for research purposes only. Please check dataset license for additional information.

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

CC BY-NC-SA 4.0

Citation Information

[More Information Needed]

Contributions

Thanks to @abecadel for adding this dataset.