Dataset:

daily_dialog

Task:

文本分类

Subtask:

multi-label-classification

Language:

Multilinguality:

monolingual

Size category:

10K<n<100K

Language creators:

found

Annotation creators:

expert-generated

Source datasets:

original

Other:

emotion-classification dialog-act-classification

License:

cc-by-nc-sa-4.0

Dataset Card for "daily_dialog"

Dataset Summary

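We develop a high-quality multi-turn dialog dataset, DailyDialog, which is appealing in several respects. The language is human-written and less noisy. The dialogues reflect the way we communicate in everyday life and cover a wide range of daily-life topics. We also manually label the dataset with communication intention and emotion information. We then evaluate existing approaches on DailyDialog and hope the dataset benefits research on dialog systems.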

Supported Tasks and Leaderboards

More Information Needed

Languages

More Information Needed

Dataset Structure

Data Instances

default

Size of downloaded dataset files: 4.48 MB
Size of the generated dataset: 8.63 MB
Total amount of disk used: 13.11 MB

An example of 'validation' looks as follows.

This example was too long and was cropped:

{
    "act": [2, 1, 1, 1, 1, 2, 3, 2, 3, 4],
    "dialog": "[\"Good afternoon . This is Michelle Li speaking , calling on behalf of IBA . Is Mr Meng available at all ? \", \" This is Mr Meng ...",
    "emotion": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
}

Data Fields

The data fields are the same among all splits.

default

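dialog: a list of string features.
act: a list of classification labels, with possible values including __dummy__ (0), inform (1), question (2), directive (3) and commissive (4).
emotion: a list of classification labels, with possible values including no emotion (0), anger (1), disgust (2), fear (3), happiness (4), sadness (5) and surprise (6).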
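
A minimal sketch of how the integer act and emotion ids map back to the label names listed above. The name lists follow the id order given in this card; decode_turn_labels is a hypothetical helper, not part of any library, and it assumes each dialog turn carries exactly one act label and one emotion label (as in the validation example, which has 10 of each). It is applied to the act and emotion lists from that example.

```python
from typing import List, Tuple

# Label names in id order, as listed in the Data Fields section.
ACT_NAMES = ["__dummy__", "inform", "question", "directive", "commissive"]
EMOTION_NAMES = [
    "no emotion", "anger", "disgust", "fear", "happiness", "sadness", "surprise",
]


def decode_turn_labels(acts: List[int], emotions: List[int]) -> List[Tuple[str, str]]:
    """Hypothetical helper: turn per-turn act/emotion ids into (act, emotion) names.

    Assumption: one act label and one emotion label per dialog turn, so the
    two lists must have the same length.
    """
    if len(acts) != len(emotions):
        raise ValueError("act and emotion lists must have one entry per turn")
    return [(ACT_NAMES[a], EMOTION_NAMES[e]) for a, e in zip(acts, emotions)]


# The act and emotion lists from the 'validation' example.
example_act = [2, 1, 1, 1, 1, 2, 3, 2, 3, 4]
example_emotion = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

for turn, (act, emotion) in enumerate(decode_turn_labels(example_act, example_emotion)):
    print(turn, act, emotion)
```

For instance, the first turn (which ends with "Is Mr Meng available at all ?") decodes to ("question", "no emotion").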

Data Splits

name	train	validation	test
default	11118	1000	1000
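
The split sizes above can be checked with a little arithmetic. This sketch sums the counts from the table and reports each split's share, assuming every example in a split is one dialog (as in the data instance shown earlier); the counts themselves are copied from the table.

```python
# Split sizes copied from the table above (one example = one dialog, assumed).
SPLITS = {"train": 11118, "validation": 1000, "test": 1000}

total = sum(SPLITS.values())
for name, n in SPLITS.items():
    # Share of all dialogs that falls into this split, as a percentage.
    print(f"{name}: {n} dialogs ({n / total:.1%})")
print(f"total: {total} dialogs")
```

With these counts the dataset holds 13,118 dialogs, roughly 84.8% / 7.6% / 7.6% across train, validation and test.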

Dataset Creation

Curation Rationale

More Information Needed

Source Data

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information

More Information Needed

Considerations for Using the Data

Social Impact of Dataset

More Information Needed

Discussion of Biases

More Information Needed

Other Known Limitations

The dataset is provided for research purposes only. Please check the dataset license for additional information.

Additional Information

Dataset Curators

More Information Needed

Licensing Information

The DailyDialog dataset is licensed under CC BY-NC-SA 4.0.

Citation Information

@InProceedings{li2017dailydialog,
    author = {Li, Yanran and Su, Hui and Shen, Xiaoyu and Li, Wenjie and Cao, Ziqiang and Niu, Shuzi},
    title = {DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset},
    booktitle = {Proceedings of The 8th International Joint Conference on Natural Language Processing (IJCNLP 2017)},
    year = {2017}
}

Contributions

Thanks to @thomwolf and @julien-c for adding this dataset.

Author:

Anonymous

Dataset size:

15.41 KB