Dataset:
allenai/prosocial-dialog
ProsocialDialog is the first large-scale multi-turn English dialogue dataset for teaching conversational agents to respond to problematic content following social norms. It covers diverse unethical, problematic, biased, and toxic situations, paired with responses that encourage prosocial behavior, grounded in commonsense social rules (i.e., rules-of-thumb, RoTs). Created via a human-AI collaborative framework, ProsocialDialog contains 58K dialogues, 331K utterances, 160K unique RoTs, and 497K dialogue safety labels accompanied by free-form rationales.
Language: English
| attribute | type | description |
| --- | --- | --- |
| `context` | str | the potentially unsafe utterance |
| `response` | str | the guiding utterance grounded on rules-of-thumb (RoTs) |
| `rots` | list of str \| null | the relevant rules-of-thumb for texts not labeled as `__casual__` |
| `safety_label` | str | the final verdict on the context according to `safety_annotations`: {`__casual__`, `__possibly_needs_caution__`, `__probably_needs_caution__`, `__needs_caution__`, `__needs_intervention__`} |
| `safety_annotations` | list of str | raw annotations from three workers: {casual, needs caution, needs intervention} |
| `safety_annotation_reasons` | list of str | the reasons behind the safety annotations, in free-form text from each worker |
| `source` | str | the source of the seed text used to craft the first utterance of the dialogue: {socialchemistry, sbic, ethics_amt, ethics_reddit} |
| `etc` | str \| null | other information |
| `dialogue_id` | int | the dialogue index |
| `response_id` | int | the response index |
| `episode_done` | bool | an indicator of whether it is the end of the dialogue |
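To make the schema concrete, here is a minimal sketch of loading and inspecting the dataset with the Hugging Face `datasets` library; the `train` split name is an assumption about the hub configuration, and the field accesses follow the table above.

```python
# A minimal sketch of loading and inspecting ProsocialDialog with the
# Hugging Face `datasets` library. The split name "train" is assumed;
# field names follow the schema table above.
from datasets import load_dataset

dataset = load_dataset("allenai/prosocial-dialog")

# Each row is one (context, response) turn of a dialogue.
example = dataset["train"][0]
print(example["context"])       # the potentially unsafe utterance
print(example["response"])      # the prosocial guiding response
print(example["safety_label"])  # e.g. "__needs_caution__"
print(example["rots"])          # relevant rules-of-thumb, or None if casual

# Reconstruct a full multi-turn dialogue by grouping rows on `dialogue_id`
# and ordering them by `response_id`; `episode_done` marks the final turn.
turns = sorted(
    (ex for ex in dataset["train"] if ex["dialogue_id"] == 0),
    key=lambda ex: ex["response_id"],
)
for ex in turns:
    print(f"User: {ex['context']}")
    print(f"Bot:  {ex['response']}")
```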
To create ProsocialDialog, we set up a human-AI collaborative data creation framework in which GPT-3 generates potentially unsafe utterances and crowdworkers provide prosocial responses to them. This approach allowed us to circumvent two substantial challenges: (1) no large-scale corpus of multi-turn prosocial conversations between humans is currently available, and (2) asking humans to write unethical, toxic, or problematic utterances could result in psychological harm (Roberts, 2017; Steiger et al., 2021).
For details, please refer to our paper.
If you find the resources in this repository useful, please cite our work:
```bibtex
@inproceedings{kim2022prosocialdialog,
    title={ProsocialDialog: A Prosocial Backbone for Conversational Agents},
    author={Hyunwoo Kim and Youngjae Yu and Liwei Jiang and Ximing Lu and Daniel Khashabi and Gunhee Kim and Yejin Choi and Maarten Sap},
    booktitle={EMNLP},
    year=2022
}
```