Datasets:
empathetic_dialogues
Languages:
en
Multilinguality:
monolingual
Size Categories:
10K<n<100K
Language Creators:
crowdsourced
Annotations Creators:
crowdsourced
Source Datasets:
original
ArXiv:
arxiv:1811.00207
License:
cc-by-nc-4.0

This is the original PyTorch implementation of Towards Empathetic Open-domain Conversation Models: a New Benchmark and Dataset.
An example from 'train' looks as follows.
{ "context": "sentimental", "conv_id": "hit:0_conv:1", "prompt": "I remember going to the fireworks with my best friend. There was a lot of people_comma_ but it only felt like us in the world.", "selfeval": "5|5|5_2|2|5", "speaker_idx": 1, "tags": "", "utterance": "I remember going to see the fireworks with my best friend. It was the first time we ever spent time alone together. Although there was a lot of people_comma_ we felt like the only people in the world.", "utterance_idx": 1 }
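The text fields in the example contain the literal token `_comma_`. The following is a minimal sketch of how a record might be cleaned before use, assuming that `_comma_` stands in for an ordinary comma; that reading is an assumption, not something this card states. The helper names `restore_commas` and `clean_record` are hypothetical, and the `selfeval` field is left uninterpreted because its format is not described here.

```python
# The record is copied from the 'train' example above.
record = {
    "context": "sentimental",
    "conv_id": "hit:0_conv:1",
    "prompt": (
        "I remember going to the fireworks with my best friend. "
        "There was a lot of people_comma_ but it only felt like us in the world."
    ),
    "selfeval": "5|5|5_2|2|5",
    "speaker_idx": 1,
    "tags": "",
    "utterance": (
        "I remember going to see the fireworks with my best friend. "
        "It was the first time we ever spent time alone together. "
        "Although there was a lot of people_comma_ we felt like the only "
        "people in the world."
    ),
    "utterance_idx": 1,
}

# Assumption: "_comma_" is a placeholder for "," in the text fields.
COMMA_TOKEN = "_comma_"


def restore_commas(text: str) -> str:
    """Replace every placeholder token with a literal comma."""
    return text.replace(COMMA_TOKEN, ",")


def clean_record(rec: dict) -> dict:
    """Return a copy of the record with commas restored in the text fields."""
    cleaned = dict(rec)  # shallow copy, so the input record is not modified
    for field in ("prompt", "utterance"):
        cleaned[field] = restore_commas(rec[field])
    return cleaned


cleaned = clean_record(record)
print(cleaned["prompt"])
print(cleaned["utterance"])
```

Working on a copy keeps the original record available, which is useful if the raw placeholder form is needed later.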
The data fields are the same among all splits.
name | train | validation | test |
---|---|---|---|
default | 76673 | 12030 | 10943 |
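As a quick sanity check on the table, the split sizes can be summed and compared against the `10K<n<100K` size category listed in the metadata. This is only an arithmetic sketch over the numbers shown above; the variable names are illustrative.

```python
# Split sizes copied from the table above (configuration "default").
split_sizes = {"train": 76673, "validation": 12030, "test": 10943}

# Total number of rows across all splits.
total = sum(split_sizes.values())

# Share of the total held by each split.
fractions = {name: count / total for name, count in split_sizes.items()}

print(f"total rows: {total}")
for name, share in fractions.items():
    print(f"{name}: {share:.1%}")

# The total should fall inside the size category from the metadata.
assert 10_000 < total < 100_000
```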
Creative Commons Attribution-NonCommercial 4.0 International.
@inproceedings{rashkin-etal-2019-towards,
    title = "Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset",
    author = "Rashkin, Hannah and Smith, Eric Michael and Li, Margaret and Boureau, Y-Lan",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P19-1534",
    doi = "10.18653/v1/P19-1534",
    pages = "5370--5381",
}
Thanks to @thomwolf, @patrickvonplaten, and @lewtun for adding this dataset.