Datasets:

emo

Languages:

en

Multilinguality:

monolingual

Size Categories:

10K<n<100K

Language Creators:

crowdsourced

Annotations Creators:

expert-generated

Source Datasets:

original

Dataset Card for "emo"

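Dataset Summary

In this dataset, given a textual dialogue, i.e. an utterance along with two previous turns of context, the goal is to infer the underlying emotion of the utterance by choosing from four emotion classes: Happy, Sad, Angry and Others.

Supported Tasks and Leaderboards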

More Information Needed

Languages

English (en), monolingual, as listed in the card metadata.

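Dataset Structure

Data Instances

emo2019
  • Size of downloaded dataset files: 3.37 MB
  • Size of the generated dataset: 2.85 MB
  • Total amount of disk used: 6.22 MB

An example of "train" looks as follows.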

{
    "label": 0,
    "text": "don't worry  i'm girl hmm how do i know if you are what's ur name"
}

Data Fields

The data fields are the same among all splits.

emo2019
  • text : a string feature.
  • label : a classification label, with possible values including others (0), happy (1), sad (2), angry (3).
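
The integer-to-name mapping for the label field can be written out as a small lookup. The following is a minimal sketch in plain Python; the names `ID2LABEL`, `LABEL2ID`, and `decode_label` are illustrative and are not part of the dataset or of any library.

```python
# Label ids and names as listed in this card's description of the "label" field.
ID2LABEL = {0: "others", 1: "happy", 2: "sad", 3: "angry"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode_label(example: dict) -> dict:
    """Return a copy of the example with a human-readable label name added."""
    decoded = dict(example)
    decoded["label_name"] = ID2LABEL[example["label"]]
    return decoded


# The "train" example shown in this card.
example = {
    "label": 0,
    "text": "don't worry  i'm girl hmm how do i know if you are what's ur name",
}
print(decode_label(example)["label_name"])  # prints "others"
```

Keeping both directions of the mapping makes it easy to convert model outputs back to names and to encode string labels from other sources.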

Data Splits

name train test
emo2019 30160 5509
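
As a quick sanity check, the split sizes in the table can be compared against a loaded copy. The sketch below assumes the dataset is available through the Hugging Face `datasets` library under the identifier `emo` with configuration `emo2019`; both are taken from this card and may differ on the Hub today. `EXPECTED_SPLIT_SIZES`, `find_size_mismatches`, and `load_and_check` are hypothetical helper names.

```python
# Split sizes as reported in this card's split table.
EXPECTED_SPLIT_SIZES = {"train": 30160, "test": 5509}


def find_size_mismatches(actual_sizes: dict) -> dict:
    """Map each split whose size differs from the card to (expected, actual)."""
    return {
        split: (expected, actual_sizes.get(split))
        for split, expected in EXPECTED_SPLIT_SIZES.items()
        if actual_sizes.get(split) != expected
    }


def load_and_check(path: str = "emo", name: str = "emo2019") -> dict:
    """Load the dataset (needs network access) and report size mismatches.

    Assumption: `path` and `name` match the identifiers used on the Hub.
    """
    from datasets import load_dataset  # imported lazily; optional dependency

    dataset = load_dataset(path, name)
    actual = {split: dataset[split].num_rows for split in dataset}
    return find_size_mismatches(actual)


# Offline usage with the numbers from the table: no mismatches expected.
print(find_size_mismatches({"train": 30160, "test": 5509}))  # prints {}
```

Only `load_and_check` touches the network; the comparison itself is a pure function, so it can be reused with sizes obtained in any other way.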

Dataset Creation

Curation Rationale

More Information Needed

Source Data

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information

More Information Needed

Considerations for Using the Data

Social Impact of Dataset

More Information Needed

Discussion of Biases

More Information Needed

Other Known Limitations

More Information Needed

Additional Information

Dataset Curators

More Information Needed

Licensing Information

More Information Needed

Citation Information

@inproceedings{chatterjee-etal-2019-semeval,
    title={SemEval-2019 Task 3: EmoContext Contextual Emotion Detection in Text},
    author={Ankush Chatterjee and Kedhar Nath Narahari and Meghana Joshi and Puneet Agrawal},
    booktitle={Proceedings of the 13th International Workshop on Semantic Evaluation},
    year={2019},
    address={Minneapolis, Minnesota, USA},
    publisher={Association for Computational Linguistics},
    url={https://www.aclweb.org/anthology/S19-2005},
    doi={10.18653/v1/S19-2005},
    pages={39--48},
    abstract={In this paper, we present the SemEval-2019 Task 3 - EmoContext: Contextual Emotion Detection in Text. Lack of facial expressions and voice modulations make detecting emotions in text a challenging problem. For instance, as humans, on reading ''Why don't you ever text me!'' we can either interpret it as a sad or angry emotion and the same ambiguity exists for machines. However, the context of dialogue can prove helpful in detection of the emotion. In this task, given a textual dialogue i.e. an utterance along with two previous turns of context, the goal was to infer the underlying emotion of the utterance by choosing from four emotion classes - Happy, Sad, Angry and Others. To facilitate the participation in this task, textual dialogues from user interaction with a conversational agent were taken and annotated for emotion classes after several data processing steps. A training data set of 30160 dialogues, and two evaluation data sets, Test1 and Test2, containing 2755 and 5509 dialogues respectively were released to the participants. A total of 311 teams made submissions to this task. The final leader-board was evaluated on Test2 data set, and the highest ranked submission achieved 79.59 micro-averaged F1 score. Our analysis of systems submitted to the task indicate that Bi-directional LSTM was the most common choice of neural architecture used, and most of the systems had the best performance for the Sad emotion class, and the worst for the Happy emotion class}
}

Contributions

Thanks to @thomwolf, @lordtt13, @lhoestq for adding this dataset.