Dataset:

daily_dialog

Task:

text-classification

Subtask:

multi-label-classification

Language:

Multilinguality:

monolingual

Size:

10K<n<100K

Language creators:

found

Annotation creators:

expert-generated

Source datasets:

original

Other:

emotion-classification dialog-act-classification

License:

cc-by-nc-sa-4.0


Dataset Card for "daily_dialog"

Dataset Summary

We develop a high-quality multi-turn dialog dataset, DailyDialog, which is appealing in several respects. The language is human-written and less noisy. The dialogues reflect the way we communicate day to day and cover a range of topics from daily life. We also manually label the dataset with communication intention and emotion information. We then evaluate existing approaches on DailyDialog and hope it benefits research on dialog systems.

Supported Tasks and Leaderboards

More Information Needed

Languages

More Information Needed

Dataset Structure

Data Instances

default

Size of downloaded dataset files: 4.48 MB
Size of the generated dataset: 8.63 MB
Total amount of disk used: 13.11 MB

An example of 'validation' looks as follows.

This example was too long and was cropped:

{
    "act": [2, 1, 1, 1, 1, 2, 3, 2, 3, 4],
    "dialog": "[\"Good afternoon . This is Michelle Li speaking , calling on behalf of IBA . Is Mr Meng available at all ? \", \" This is Mr Meng ...",
    "emotion": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
}
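
The annotation lists in a record are expected to line up turn by turn: the instance above carries ten act labels and ten emotion labels. A minimal sketch of that sanity check, assuming a record is a plain Python dict with the field names shown above. The helper name `labels_aligned` is hypothetical, and the cropped `dialog` field is left out on purpose.

```python
def labels_aligned(record):
    """Return True when 'act' and 'emotion' hold the same number of labels."""
    return len(record["act"]) == len(record["emotion"])


# Label lists copied from the cropped validation instance shown above.
record = {
    "act": [2, 1, 1, 1, 1, 2, 3, 2, 3, 4],
    "emotion": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}

print(labels_aligned(record))  # prints True
```

With an uncropped record, the same comparison can be extended to `len(record["dialog"])`.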

Data Fields

The data fields are the same among all splits.

default

dialog: a list of string features.
act: a list of classification labels, with possible values including __dummy__ (0), inform (1), question (2), directive (3) and commissive (4).
emotion: a list of classification labels, with possible values including no emotion (0), anger (1), disgust (2), fear (3), happiness (4), sadness (5) and surprise (6).
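
The integer labels can be turned back into readable names with two lookup lists that follow the id order given above. This is only a sketch: `ACT_NAMES`, `EMOTION_NAMES` and `decode_turns` are hypothetical names, and the toy record is invented for illustration rather than taken from the dataset.

```python
# Index in each list corresponds to the label id listed in the field descriptions.
ACT_NAMES = ["__dummy__", "inform", "question", "directive", "commissive"]
EMOTION_NAMES = [
    "no emotion", "anger", "disgust", "fear", "happiness", "sadness", "surprise",
]


def decode_turns(record):
    """Pair each utterance with its act and emotion label names."""
    turns = zip(record["dialog"], record["act"], record["emotion"])
    return [
        (utterance.strip(), ACT_NAMES[act], EMOTION_NAMES[emotion])
        for utterance, act, emotion in turns
    ]


# Toy record invented for illustration; it is not a real dataset entry.
toy = {
    "dialog": ["Is the report ready ? ", " Yes , I sent it this morning . "],
    "act": [2, 1],
    "emotion": [0, 4],
}

for turn in decode_turns(toy):
    print(turn)
```

Utterances are stripped because the instance shown earlier has leading and trailing spaces around each turn.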

Data Splits

name	train	validation	test
default	11118	1000	1000
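
The split counts above add up to 13,118 examples, with roughly 85% of them in the training split. A small sketch of that arithmetic, assuming the counts are numbers of dialogues; `SPLIT_SIZES` and `split_shares` are hypothetical names.

```python
# Counts copied from the split table above.
SPLIT_SIZES = {"train": 11118, "validation": 1000, "test": 1000}


def split_shares(sizes):
    """Return each split's fraction of the total example count."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


shares = split_shares(SPLIT_SIZES)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```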

Dataset Creation

Curation Rationale

More Information Needed

Source Data

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information

More Information Needed

Considerations for Using the Data

Social Impact of Dataset

More Information Needed

Discussion of Biases

More Information Needed

Other Known Limitations

The dataset is provided for research purposes only. Please check the dataset license for additional information.

Additional Information

Dataset Curators

More Information Needed

Licensing Information

The DailyDialog dataset is licensed under CC BY-NC-SA 4.0.

Citation Information

@InProceedings{li2017dailydialog,
    author = {Li, Yanran and Su, Hui and Shen, Xiaoyu and Li, Wenjie and Cao, Ziqiang and Niu, Shuzi},
    title = {DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset},
    booktitle = {Proceedings of The 8th International Joint Conference on Natural Language Processing (IJCNLP 2017)},
    year = {2017}
}

Contributions

Thanks to @thomwolf, @julien-c for adding this dataset.

Author:

Anonymous

Dataset size:

15.41 KB