Datasets:

blended_skill_talk

Tasks:

conversational

Languages:

en

Multilinguality:

monolingual

Size Categories:

1K<n<10K

Language Creators:

crowdsourced

Annotations Creators:

crowdsourced

Source Datasets:

original

ArXiv:

arxiv:2004.08449

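Dataset Card for "blended_skill_talk"

Dataset Summary

A dataset of 7k conversations explicitly designed to exhibit multiple conversation modes: displaying personality, showing empathy, and demonstrating knowledge.

Supported Tasks and Leaderboards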

More Information Needed

Languages

More Information Needed

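Dataset Structure

Data Instances

default
  • Size of downloaded dataset files: 38.11 MB
  • Size of the generated dataset: 15.08 MB
  • Total amount of disk used: 53.17 MB

An example of 'train' looks as follows.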

{
  'personas': ['my parents don t really speak english , but i speak italian and english.', 'i have three children.'],
  'additional_context': 'Backstreet Boys',
  'previous_utterance': ['Oh, I am a BIG fan of the Backstreet Boys!  Have you ever seen them performing live?', "No,I listen to their music a lot,  mainly the unbreakable which  is the Backstreet Boys' sixth studio album. "],
  'context': 'wizard_of_wikipedia',
  'free_messages': ['you are very knowledgeable, do you prefer nsync or bsb?', "haha kids of this days don't know them, i'm 46 and i still enjoying them, my kids only listen k-pop", "italian?haha that's strange, i only talk english and a little spanish "],
  'guided_messages': ["i don't have a preference, they are both great. All 3 of my kids get annoyed when I listen to them though.", 'Sometimes I sing their songs in Italian, that really annoys them lol.', 'My parents barely speak English, so I was taught both.  By the way, what is k-pop?'],
  'suggestions': {'convai2': ["i don't have a preference , both are pretty . do you have any hobbies ?", "do they the backstreet boys ? that's my favorite group .", 'are your kids interested in music ?'], 'empathetic_dialogues': ['I actually just discovered Imagine Dragons. I love them!', "Hahaha that just goes to show ya, age is just a umber!'", 'That would be hard! Do you now Spanish well?'], 'wizard_of_wikipedia': ['NSYNC Also had Lance Bass and Joey Fatone, sometimes called the Fat One.', 'Yes, there are a few K-Pop songs that I have heard good big in the USA. It is the most popular in South Korea and has Western elements of pop.', 'English, beleive it or not.']},
  'guided_chosen_suggestions': ['convai2', '', ''],
  'label_candidates': []}
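
The instance above can be read as a single conversation. The following is a minimal sketch, assuming that `previous_utterance` holds the opening turns and that `free_messages[i]` is answered by `guided_messages[i]`, which is how the turns line up in this example. The helper name `build_dialogue` is hypothetical and is not part of the dataset or of any library.

```python
from typing import Any, Dict, List, Tuple


def build_dialogue(instance: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (speaker, text) turns for one instance.

    Assumption: free_messages[i] is followed by guided_messages[i].
    """
    # The opening turns come first, labelled "context" here for illustration.
    turns = [("context", text) for text in instance["previous_utterance"]]
    # Then the free (unguided) and guided speakers alternate.
    for free, guided in zip(instance["free_messages"], instance["guided_messages"]):
        turns.append(("free", free))
        turns.append(("guided", guided))
    return turns


# Copy of the 'train' example above, restricted to the fields used here.
example = {
    "previous_utterance": [
        "Oh, I am a BIG fan of the Backstreet Boys!  Have you ever seen them performing live?",
        "No,I listen to their music a lot,  mainly the unbreakable which  is the Backstreet Boys' sixth studio album. ",
    ],
    "free_messages": [
        "you are very knowledgeable, do you prefer nsync or bsb?",
        "haha kids of this days don't know them, i'm 46 and i still enjoying them, my kids only listen k-pop",
        "italian?haha that's strange, i only talk english and a little spanish ",
    ],
    "guided_messages": [
        "i don't have a preference, they are both great. All 3 of my kids get annoyed when I listen to them though.",
        "Sometimes I sing their songs in Italian, that really annoys them lol.",
        "My parents barely speak English, so I was taught both.  By the way, what is k-pop?",
    ],
}

dialogue = build_dialogue(example)
for speaker, text in dialogue:
    print(f"{speaker:>7}: {text}")
```

The same function should work on instances loaded with the `datasets` library, provided they are plain dictionaries with these three keys.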

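Data Fields

The data fields are the same among all splits.

default
  • personas: a list of string features.
  • additional_context: a string feature.
  • previous_utterance: a list of string features.
  • context: a string feature.
  • free_messages: a list of string features.
  • guided_messages: a list of string features.
  • suggestions: a dictionary feature containing:
    • convai2: a string feature.
    • empathetic_dialogues: a string feature.
    • wizard_of_wikipedia: a string feature.
  • guided_chosen_suggestions: a list of string features.
  • label_candidates: a list of lists of string features.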
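
The field list above can be turned into a lightweight structural check. This is a sketch only: `check_instance` and the constants are hypothetical names, the toy instance contains made-up values, and the entries under `suggestions` are checked for presence only because the 'train' example shows them as lists rather than single strings.

```python
from typing import Any, Dict, List

# Field names taken from the list above, grouped by their described type.
STR_FIELDS = ("additional_context", "context")
LIST_OF_STR_FIELDS = (
    "personas",
    "previous_utterance",
    "free_messages",
    "guided_messages",
    "guided_chosen_suggestions",
)
SUGGESTION_SOURCES = ("convai2", "empathetic_dialogues", "wizard_of_wikipedia")


def _is_str_list(value: Any) -> bool:
    """True if value is a list whose items are all strings."""
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def check_instance(instance: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the instance passes."""
    problems = []
    for name in STR_FIELDS:
        if not isinstance(instance.get(name), str):
            problems.append(f"{name}: expected a string")
    for name in LIST_OF_STR_FIELDS:
        if not _is_str_list(instance.get(name)):
            problems.append(f"{name}: expected a list of strings")
    suggestions = instance.get("suggestions")
    if not isinstance(suggestions, dict):
        problems.append("suggestions: expected a dictionary")
    else:
        for source in SUGGESTION_SOURCES:
            if source not in suggestions:
                problems.append(f"suggestions.{source}: missing")
    candidates = instance.get("label_candidates")
    if not (isinstance(candidates, list) and all(_is_str_list(c) for c in candidates)):
        problems.append("label_candidates: expected a list of lists of strings")
    return problems


# Toy instance with made-up values, used only to exercise the check.
toy = {
    "personas": ["i like tea."],
    "additional_context": "",
    "previous_utterance": ["hello", "hi there"],
    "context": "convai2",
    "free_messages": ["how are you?"],
    "guided_messages": ["fine, thanks."],
    "suggestions": {source: ["..."] for source in SUGGESTION_SOURCES},
    "guided_chosen_suggestions": [""],
    "label_candidates": [],
}
print(check_instance(toy))
```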

Data Splits

name train validation test
default 4819 1009 980
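
As a quick arithmetic check, the split sizes above add up to 6,808 conversations, which matches the "7k" figure in the summary. The snippet below computes the total and each split's share; the variable names are illustrative.

```python
# Split sizes copied from the table above.
split_sizes = {"train": 4819, "validation": 1009, "test": 980}

total = sum(split_sizes.values())
shares = {name: count / total for name, count in split_sizes.items()}

print(f"total: {total}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```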

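Dataset Creation

Curation Rationale

More Information Needed

Source Data

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information

More Information Needed

Considerations for Using the Data

Social Impact of Dataset

More Information Needed

Discussion of Biases

More Information Needed

Other Known Limitations

More Information Needed

Additional Information

Dataset Curators

More Information Needed

Licensing Information

More Information Needed

Citation Information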

@misc{smith2020evaluating,
    title={Can You Put it All Together: Evaluating Conversational Agents' Ability to Blend Skills},
    author={Eric Michael Smith and Mary Williamson and Kurt Shuster and Jason Weston and Y-Lan Boureau},
    year={2020},
    eprint={2004.08449},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Contributions

Thanks to @lewtun, @thomwolf, @lhoestq, @patrickvonplaten, and @mariamabarham for adding this dataset.