Dataset:

senti_lex

Tasks:

Text classification

Sub-tasks:

sentiment-classification

Languages:

Multilinguality:

multilingual

Size:

1K<n<10K n<1K

Language creators:

expert-generated

Annotation creators:

expert-generated

Source datasets:

original

License:

gpl-3.0

Dataset Card for senti_lex

Dataset Summary

This dataset adds sentiment lexicons for 81 languages, generated via graph propagation over a knowledge graph, that is, a graphical representation of real-world entities and the links between them.
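To make the idea of graph propagation concrete, here is a minimal Python sketch. It is an illustrative simplification, not the authors' actual algorithm: the toy word graph, the seed words, and the function name `propagate_sentiment` are all assumptions made for this example. Seed words keep a fixed polarity (+1 or -1), and every other word repeatedly takes the mean polarity of its neighbours.

```python
from collections import defaultdict


def propagate_sentiment(edges, seeds, iterations=20):
    """Spread seed polarities (+1.0 / -1.0) over an undirected word graph.

    edges: iterable of (word_a, word_b) links (hypothetical toy data).
    seeds: dict mapping seed words to a fixed polarity.
    Returns a dict mapping every word to a score in [-1.0, 1.0].
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Unlabelled words start neutral; seeds start at their known polarity.
    scores = {word: seeds.get(word, 0.0) for word in neighbours}
    for word, polarity in seeds.items():
        scores.setdefault(word, polarity)

    for _ in range(iterations):
        updated = {}
        for word in scores:
            if word in seeds:
                updated[word] = seeds[word]  # seeds stay clamped
            elif neighbours[word]:
                linked = neighbours[word]
                updated[word] = sum(scores[n] for n in linked) / len(linked)
            else:
                updated[word] = scores[word]
        scores = updated
    return scores


# Toy graph: English seed words linked to German words (assumed links).
edges = [("good", "gut"), ("gut", "prima"), ("bad", "schlecht"), ("schlecht", "mies")]
seeds = {"good": 1.0, "bad": -1.0}
scores = propagate_sentiment(edges, seeds)

# Convert scores to the dataset's encoding: 0 = negative, 1 = positive.
labels = {word: int(score > 0) for word, score in scores.items()}
```

In this sketch a word's final label depends only on the sign of its propagated score; a real pipeline would likely also need a confidence threshold to drop words whose score stays close to zero.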

Supported Tasks and Leaderboards

Sentiment-Classification
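As a rough illustration of how a word-level lexicon can support sentiment classification, the following Python sketch counts positive and negative lexicon hits in a text. The `LEXICON` contents (apart from "die", which appears in this card), the `classify` function, and the extra "neutral" outcome for ties are assumptions made for this example, not part of the dataset.

```python
import re

# Hypothetical lexicon in the dataset's encoding: 0 = negative, 1 = positive.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": 0, "awful": 0, "die": 0}


def classify(text, lexicon=LEXICON):
    """Return 'positive', 'negative', or 'neutral' by counting lexicon hits."""
    tokens = re.findall(r"\w+", text.lower())
    positive = sum(1 for t in tokens if lexicon.get(t) == 1)
    negative = sum(1 for t in tokens if lexicon.get(t) == 0)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"  # tie or no lexicon words found
```

Counting hits ignores negation and context, so this kind of classifier is best treated as a baseline.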

Languages

Afrikaans Aragonese Arabic Azerbaijani Belarusian Bulgarian Bengali Breton Bosnian Catalan; Valencian Czech Welsh Danish German Greek, Modern Esperanto Spanish; Castilian Estonian Basque Persian Finnish Faroese French Western Frisian Irish Scottish Gaelic; Gaelic Galician Gujarati Hebrew (modern) Hindi Croatian Haitian; Haitian Creole Hungarian Armenian Interlingua Indonesian Ido Icelandic Italian Japanese Georgian Khmer Kannada Korean Kurdish Kirghiz, Kyrgyz Latin Luxembourgish, Letzeburgesch Lithuanian Latvian Macedonian Marathi (Marāṭhī) Malay Maltese Dutch Norwegian Nynorsk Norwegian Polish Portuguese Romansh Romanian, Moldavian, Moldovan Russian Slovak Slovene Albanian Serbian Swedish Swahili Tamil Telugu Thai Turkmen Tagalog Turkish Ukrainian Urdu Uzbek Vietnamese Volapük Walloon Yiddish Chinese Zhoa

Dataset Structure

Data Instances

{
  "word": "die",
  "sentiment": 0  # negative
}

Data Fields

word: a single word, as a string
sentiment: the sentiment class of the word, encoded as an integer label: negative (0) or positive (1)
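A small Python sketch of how one record could be typed and decoded, following the sample instance shown above. The names `LexiconEntry`, `LABEL_NAMES`, and `label_name` are hypothetical helpers introduced for illustration; only the `word` and `sentiment` keys and the 0/1 encoding come from this card.

```python
from typing import TypedDict

LABEL_NAMES = ("negative", "positive")  # index matches the integer label


class LexiconEntry(TypedDict):
    word: str
    sentiment: int


def label_name(entry: LexiconEntry) -> str:
    """Map the integer sentiment of an entry to its readable name."""
    label = entry["sentiment"]
    if label not in (0, 1):
        raise ValueError(f"unexpected sentiment label: {label!r}")
    return LABEL_NAMES[label]


entry: LexiconEntry = {"word": "die", "sentiment": 0}
print(entry["word"], label_name(entry))  # prints: die negative
```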

Data Splits

[Needs More Information]

Dataset Creation

Curation Rationale

[Needs More Information]

Source Data

Initial Data Collection and Normalization

[Needs More Information]

Who are the source language producers?

[Needs More Information]

Annotations

Annotation process

[Needs More Information]

Who are the annotators?

[Needs More Information]

Personal and Sensitive Information

[Needs More Information]

Considerations for Using the Data

Social Impact of Dataset

[Needs More Information]

Discussion of Biases

[Needs More Information]

Other Known Limitations

[Needs More Information]

Additional Information

Dataset Curators

[Needs More Information]

Licensing Information

GNU General Public License v3.

The dataset is distributed here under the GNU General Public License. Note that this is the full GPL, which allows many free uses but does not allow incorporation of the dataset into any type of distributed proprietary software, even in part or in translation. For commercial applications, please contact the dataset creators (see "Citation Information").

Citation Information

This dataset was collected by Yanqing Chen and Steven Skiena. If you use it in your work, please cite the following paper:

@inproceedings{chen-skiena-2014-building,
    title = "Building Sentiment Lexicons for All Major Languages",
    author = "Chen, Yanqing  and
      Skiena, Steven",
    booktitle = "Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
    month = jun,
    year = "2014",
    address = "Baltimore, Maryland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P14-2063",
    doi = "10.3115/v1/P14-2063",
    pages = "383--389",
}

Contributions

Thanks to @KMFODA for adding this dataset.

Author:

Anonymous

Dataset size:

1.58 MB