Dataset: silver/personal_dialog
Tasks: conversational
Sub-tasks: dialogue-generation
Languages: zh
Multilinguality: monolingual
Size: 10M<n<100M
Language creators: found
Annotation creators: no-annotation
Source datasets: original
Preprint: arxiv:1901.09672
License: other

The PersonalDialog dataset is a large-scale, multi-turn Chinese dialogue dataset containing various traits from a large number of speakers. We are releasing about 5M sessions of carefully filtered dialogues. Each utterance in PersonalDialog is associated with a speaker, and each speaker is marked with traits such as Gender, Location, and Interest Tags.
The dialogues in PersonalDialog are in Chinese.
train split:
{
  "dialog": ["那么 晚", "加班 了 刚 到 家 呀 !", "吃饭 了 么", "吃 过 了 !"],
  "profile": [
    {
      "tag": ["间歇性神经病", "爱笑的疯子", "他们说我犀利", "爱做梦", "自由", "旅游", "学生", "双子座", "好性格"],
      "loc": "福建 厦门",
      "gender": "male"
    },
    {
      "tag": ["设计师", "健康养生", "热爱生活", "善良", "宅", "音樂", "时尚"],
      "loc": "山东 济南",
      "gender": "male"
    }
  ],
  "uid": [0, 1, 0, 1]
}
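The train record above suggests that each entry in `uid` is an index into `profile`, so that the i-th utterance in `dialog` was spoken by `profile[uid[i]]`. That reading is inferred from the example, not stated by the authors. Under that assumption, a minimal sketch for pairing each utterance with its speaker's traits could look like this (`attach_profiles` is a hypothetical helper, and the tag lists are trimmed for brevity):

```python
from typing import Dict, List, Tuple

# Trimmed copy of the train example; tag lists are shortened for readability.
sample: Dict = {
    "dialog": ["那么 晚", "加班 了 刚 到 家 呀 !", "吃饭 了 么", "吃 过 了 !"],
    "profile": [
        {"tag": ["学生", "双子座"], "loc": "福建 厦门", "gender": "male"},
        {"tag": ["设计师", "宅"], "loc": "山东 济南", "gender": "male"},
    ],
    "uid": [0, 1, 0, 1],
}


def attach_profiles(record: Dict) -> List[Tuple[str, Dict]]:
    """Pair each utterance with its speaker's profile.

    Assumption: uid[i] indexes into record["profile"].
    """
    if len(record["dialog"]) != len(record["uid"]):
        raise ValueError("dialog and uid must have the same length")
    return [
        (utterance, record["profile"][speaker])
        for utterance, speaker in zip(record["dialog"], record["uid"])
    ]


turns = attach_profiles(sample)
for utterance, profile in turns:
    # Print the speaker's location and gender next to what they said.
    print(f'{profile["loc"]} ({profile["gender"]}): {utterance}')
```

The utterances are kept exactly as stored, with spaces between tokens, since the card does not say whether downstream models expect the segmented or the joined form.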
dev and test split:
{
  "dialog": ["没 人性 啊 !", "可以 来 组织 啊", "来 上海 陪姐 打 ?"],
  "profile": [
    {"tag": [""], "loc": "上海 浦东新区", "gender": "female"},
    {"tag": ["嘉庚", "keele", "leicester", "UK", "泉州五中"], "loc": "福建 泉州", "gender": "male"}
  ],
  "uid": [0, 1, 0],
  "responder_profile": {"tag": ["嘉庚", "keele", "leicester", "UK", "泉州五中"], "loc": "福建 泉州", "gender": "male"},
  "golden_response": "吴经理 派车来 小 泉州 接 么 ?",
  "is_biased": true
}
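In the dev and test record above, `dialog` appears to hold the context, `golden_response` the reference reply, and `responder_profile` the traits of whoever gives that reply. The sketch below turns such a record into a context/response pair and recovers the responder's index by matching `responder_profile` against `profile`. Both the matching rule and the helper name `to_eval_pair` are assumptions made for illustration; the card does not document them.

```python
from typing import Dict

record: Dict = {
    "dialog": ["没 人性 啊 !", "可以 来 组织 啊", "来 上海 陪姐 打 ?"],
    "profile": [
        {"tag": [""], "loc": "上海 浦东新区", "gender": "female"},
        {"tag": ["嘉庚", "keele", "leicester", "UK", "泉州五中"],
         "loc": "福建 泉州", "gender": "male"},
    ],
    "uid": [0, 1, 0],
    "responder_profile": {"tag": ["嘉庚", "keele", "leicester", "UK", "泉州五中"],
                          "loc": "福建 泉州", "gender": "male"},
    "golden_response": "吴经理 派车来 小 泉州 接 么 ?",
    "is_biased": True,
}


def to_eval_pair(rec: Dict) -> Dict:
    """Build a context/response pair from a dev or test record.

    Assumption: the responder is the first participant whose profile
    equals responder_profile. If two participants share identical
    traits, this picks the earlier one.
    """
    responder_uid = rec["profile"].index(rec["responder_profile"])
    return {
        "context": list(rec["dialog"]),
        "responder_uid": responder_uid,
        "response": rec["golden_response"],
        "is_biased": rec["is_biased"],
    }


pair = to_eval_pair(record)
print(pair["responder_uid"], pair["response"])
```

The `is_biased` flag is passed through unchanged, since its exact definition is not given in this card.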
train | valid | test |
---|---|---|
5,438,165 | 10,521 | 10,523 |
other-weibo
This dataset was collected from Weibo. Refer to the detailed policy for the terms required to use this dataset, and restrict its usage to non-commercial purposes.
@article{zheng2019personalized,
  title = {Personalized dialogue generation with diversified traits},
  author = {Zheng, Yinhe and Chen, Guanyi and Huang, Minlie and Liu, Song and Zhu, Xuan},
  journal = {arXiv preprint arXiv:1901.09672},
  year = {2019}
}

@inproceedings{zheng2020pre,
  title = {A pre-training based personalized dialogue generation model with persona-sparse data},
  author = {Zheng, Yinhe and Zhang, Rongsheng and Huang, Minlie and Mao, Xiaoxi},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume = {34},
  number = {05},
  pages = {9693--9700},
  year = {2020}
}
Thanks to Yinhe Zheng for adding this dataset.