Datasets:

philschmid/emotion

Languages:

en

Multilinguality:

monolingual

Size:

10K<n<100K

Language creators:

machine-generated

Annotations creators:

machine-generated

Source datasets:

original

License:

other

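Dataset Card for "emotion"

Dataset Summary

Emotion is a dataset of English Twitter messages labeled with six basic emotions: anger, fear, joy, love, sadness, and surprise. For details, see the paper cited under Citation Information (Saravia et al., 2018).

Supported Tasks and Leaderboards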

More Information Needed

Languages

More Information Needed

Dataset Structure

Data Instances

An example looks as follows.

{
  "text": "im feeling quite sad and sorry for myself but ill snap out of it soon",
  "label": 0
}

Data Fields

The data fields are:

  • text: a string feature.
  • label: a classification label, with possible values sadness (0), joy (1), love (2), anger (3), fear (4), and surprise (5).
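As a minimal sketch of how the integer labels above map back to emotion names, the snippet below decodes the example instance from this card. The names `LABEL_NAMES` and `label_name` are hypothetical helpers introduced for illustration; they are not part of the dataset or of any library.

```python
# Emotion names indexed by label id, in the order given under the data fields.
# LABEL_NAMES and label_name are illustrative helpers, not a library API.
LABEL_NAMES = ["sadness", "joy", "love", "anger", "fear", "surprise"]


def label_name(label_id: int) -> str:
    """Return the emotion name for an integer class label."""
    if not 0 <= label_id < len(LABEL_NAMES):
        raise ValueError(f"unknown label id: {label_id}")
    return LABEL_NAMES[label_id]


# The example instance shown under the data instances.
example = {
    "text": "im feeling quite sad and sorry for myself but ill snap out of it soon",
    "label": 0,
}

print(label_name(example["label"]))  # sadness
```

Raising on an out-of-range id, rather than returning a default, makes a mislabeled or corrupted row fail loudly during preprocessing.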

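Data Splits

The dataset has two configurations:

  • split: 20,000 examples in total, divided into train, validation, and test sets.
  • unsplit: a single train set with 416,809 examples.

name      train    validation  test
split     16000    2000        2000
unsplit   416809   n/a         n/a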
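The split sizes in the table can be checked with a few lines of arithmetic. This is only a sketch over the numbers reported in this card; the variable names are chosen for illustration, and nothing here loads the actual data.

```python
# Example counts copied from the table in this card.
SPLIT_SIZES = {"train": 16_000, "validation": 2_000, "test": 2_000}
UNSPLIT_SIZE = 416_809

# Total size of the "split" configuration and each split's share of it.
total = sum(SPLIT_SIZES.values())
fractions = {name: count / total for name, count in SPLIT_SIZES.items()}

print(total)      # 20000
print(fractions)  # {'train': 0.8, 'validation': 0.1, 'test': 0.1}
```

The "split" configuration is therefore an 80/10/10 partition of 20,000 examples, while the "unsplit" configuration holds 416,809 examples in a single train set.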

Dataset Creation

Curation Rationale

More Information Needed

Source Data

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information

More Information Needed

Considerations for Using the Data

Social Impact of Dataset

More Information Needed

Discussion of Biases

More Information Needed

Other Known Limitations

More Information Needed

Additional Information

Dataset Curators

More Information Needed

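Licensing Information

The dataset should be used for educational and research purposes only.

Citation Information

If you use this dataset, please cite: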

@inproceedings{saravia-etal-2018-carer,
    title = "{CARER}: Contextualized Affect Representations for Emotion Recognition",
    author = "Saravia, Elvis  and
      Liu, Hsien-Chi Toby  and
      Huang, Yen-Hao  and
      Wu, Junlin  and
      Chen, Yi-Shin",
    booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
    month = oct # "-" # nov,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D18-1404",
    doi = "10.18653/v1/D18-1404",
    pages = "3687--3697",
    abstract = "Emotions are expressed in nuanced ways, which varies by collective or individual experiences, knowledge, and beliefs. Therefore, to understand emotion, as conveyed through text, a robust mechanism capable of capturing and modeling different linguistic nuances and phenomena is needed. We propose a semi-supervised, graph-based algorithm to produce rich structural descriptors which serve as the building blocks for constructing contextualized affect representations from text. The pattern-based representations are further enriched with word embeddings and evaluated through several emotion recognition tasks. Our experimental results demonstrate that the proposed method outperforms state-of-the-art techniques on emotion recognition tasks.",
}

Contributions

Thanks to @lhoestq, @thomwolf, and @lewtun for adding this dataset.