Dataset:
allenai/prosocial-dialog
ProsocialDialog is the first large-scale multi-turn English dialogue dataset for teaching conversational agents to respond to problematic content following social norms. It covers diverse unethical, problematic, biased, and toxic situations, paired with responses that encourage prosocial behavior, grounded in commonsense social rules (i.e., rules-of-thumb, RoTs). Created via a human-AI collaborative framework, ProsocialDialog contains 58K dialogues, 331K utterances, 160K unique RoTs, and 497K dialogue safety labels accompanied by free-form rationales.
Language: English
| attribute | type | description |
| --- | --- | --- |
| `context` | str | the potentially unsafe utterance |
| `response` | str | the guiding utterance grounded on rules-of-thumb (RoTs) |
| `rots` | list of str \| null | the relevant rules-of-thumb for texts not labeled as `__casual__` |
| `safety_label` | str | the final verdict on the context according to `safety_annotations`: {`__casual__`, `__possibly_needs_caution__`, `__probably_needs_caution__`, `__needs_caution__`, `__needs_intervention__`} |
| `safety_annotations` | list of str | raw annotations from three workers: {casual, needs caution, needs intervention} |
| `safety_annotation_reasons` | list of str | the reasons behind the safety annotations, in free-form text from each worker |
| `source` | str | the source of the seed text used to craft the first utterance of the dialogue: {socialchemistry, sbic, ethics_amt, ethics_reddit} |
| `etc` | str \| null | other information |
| `dialogue_id` | int | the dialogue index |
| `response_id` | int | the response index |
| `episode_done` | bool | an indicator of whether it is the end of the dialogue |
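To make the schema concrete, here is a minimal sketch of loading and inspecting the dataset with the Hugging Face `datasets` library; the `train` split name is an assumption about the hub configuration, and the field accesses follow the table above.

```python
# A minimal sketch of loading and inspecting ProsocialDialog with the
# Hugging Face `datasets` library. The split name "train" is assumed;
# field names follow the schema table above.
from datasets import load_dataset

dataset = load_dataset("allenai/prosocial-dialog")

# Each row is one (context, response) turn of a dialogue.
example = dataset["train"][0]
print(example["context"])       # the potentially unsafe utterance
print(example["response"])      # the prosocial guiding response
print(example["safety_label"])  # e.g. "__needs_caution__"
print(example["rots"])          # relevant rules-of-thumb, or None if casual

# Reconstruct a full multi-turn dialogue by grouping rows on `dialogue_id`
# and ordering them by `response_id`; `episode_done` marks the final turn.
turns = sorted(
    (ex for ex in dataset["train"] if ex["dialogue_id"] == 0),
    key=lambda ex: ex["response_id"],
)
for ex in turns:
    print(f"User: {ex['context']}")
    print(f"Bot:  {ex['response']}")
```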
To create ProsocialDialog, we set up a human-AI collaborative data creation framework in which GPT-3 generates potentially unsafe utterances and crowdworkers provide prosocial responses to them. This approach allowed us to circumvent two substantial challenges: (1) no large-scale corpus of multi-turn prosocial conversations between humans is currently available, and (2) asking humans to write unethical, toxic, or problematic utterances could result in psychological harm (Roberts, 2017; Steiger et al., 2021).
For details, please refer to our paper.
If you find the resources in this repository useful, please cite our work:
```bibtex
@inproceedings{kim2022prosocialdialog,
    title={ProsocialDialog: A Prosocial Backbone for Conversational Agents},
    author={Hyunwoo Kim and Youngjae Yu and Liwei Jiang and Ximing Lu and Daniel Khashabi and Gunhee Kim and Yejin Choi and Maarten Sap},
    booktitle={EMNLP},
    year=2022
}
```