数据集:

nsmc

语言:

ko

计算机处理:

monolingual

大小:

100K<n<1M

语言创建人:

found

批注创建人:

crowdsourced

源数据集:

original

许可:

cc-by-2.0
中文

Dataset Card for Naver sentiment movie corpus

Dataset Summary

[More Information Needed]

Supported Tasks and Leaderboards

[More Information Needed]

Languages

[More Information Needed]

Dataset Structure

Data Instances

[More Information Needed]

Data Fields

Each instance is a movie review written by Korean internet users on Naver, the most commonly used search engine in Korea. Each row can be broken down into the following fields:

  • id : A unique review ID, provided by Naver
  • document : The actual movie review
  • label : Binary labels for sentiment analysis, where 0 denotes negative, and 1 , positive

Data Splits

[More Information Needed]

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

[More Information Needed]

Citation Information

@InProceedings{Park:2016,
  title        = "Naver Sentiment Movie Corpus",
  author       = "Lucy Park",
  year         = "2016",
  howpublished = {\\url{https://github.com/e9t/nsmc}}
}

Contributions

Thanks to @jaketae for adding this dataset.