Dataset:

blended_skill_talk

Task:

Conversational

Sub-task:

dialogue-generation

Language:

Multilinguality:

monolingual

Size:

1K<n<10K

Language creators:

crowdsourced

Annotation creators:

crowdsourced

Source datasets:

original

Preprint:

arxiv:2004.08449

License:

license:unknown

Dataset Card for "blended_skill_talk"

Dataset Summary

A dataset of 7k conversations explicitly designed to exhibit multiple conversation modes: displaying personality, having empathy, and demonstrating knowledge.

Supported Tasks and Leaderboards

More Information Needed

Languages

More Information Needed

Dataset Structure

Data Instances

default

Size of downloaded dataset files: 38.11 MB
Size of the generated dataset: 15.08 MB
Total amount of disk used: 53.17 MB

An example of 'train' looks as follows.

{
'personas': ['my parents don t really speak english , but i speak italian and english.', 'i have three children.'],
'additional_context': 'Backstreet Boys',
'previous_utterance': ['Oh, I am a BIG fan of the Backstreet Boys! Have you ever seen them performing live?', "No,I listen to their music a lot, mainly the unbreakable which is the Backstreet Boys' sixth studio album. "],
'context': 'wizard_of_wikipedia',
'free_messages': ['you are very knowledgeable, do you prefer nsync or bsb?', "haha kids of this days don't know them, i'm 46 and i still enjoying them, my kids only listen k-pop", "italian?haha that's strange, i only talk english and a little spanish "],
'guided_messages': ["i don't have a preference, they are both great. All 3 of my kids get annoyed when I listen to them though.", 'Sometimes I sing their songs in Italian, that really annoys them lol.', 'My parents barely speak English, so I was taught both. By the way, what is k-pop?'],
'suggestions': {'convai2': ["i don't have a preference , both are pretty . do you have any hobbies ?", "do they the backstreet boys ? that's my favorite group .", 'are your kids interested in music ?'], 'empathetic_dialogues': ['I actually just discovered Imagine Dragons. I love them!', "Hahaha that just goes to show ya, age is just a umber!'", 'That would be hard! Do you now Spanish well?'], 'wizard_of_wikipedia': ['NSYNC Also had Lance Bass and Joey Fatone, sometimes called the Fat One.', 'Yes, there are a few K-Pop songs that I have heard good big in the USA. It is the most popular in South Korea and has Western elements of pop.', 'English, beleive it or not.']},
'guided_chosen_suggestions': ['convai2', '', ''],
'label_candidates': []}
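
As a rough illustration of how the message fields in this record relate, the sketch below interleaves `previous_utterance`, `free_messages`, and `guided_messages` into one ordered list of turns. It assumes that the free speaker goes first after the seed utterances and that the two speakers then alternate, which matches the example above but is not stated in this card. The helper name `flatten_dialogue` and the speaker labels are hypothetical.

```python
from typing import Dict, List, Tuple


def flatten_dialogue(record: Dict) -> List[Tuple[str, str]]:
    """Return (speaker, text) pairs in conversational order.

    Assumption: seed utterances come first, then free and guided
    messages alternate, starting with the free speaker.
    """
    turns = [("seed", text) for text in record["previous_utterance"]]
    for free, guided in zip(record["free_messages"], record["guided_messages"]):
        turns.append(("free", free))
        turns.append(("guided", guided))
    return turns


# The three message fields of the 'train' example shown above.
example = {
    "previous_utterance": [
        "Oh, I am a BIG fan of the Backstreet Boys! Have you ever seen them performing live?",
        "No,I listen to their music a lot, mainly the unbreakable which is the Backstreet Boys' sixth studio album. ",
    ],
    "free_messages": [
        "you are very knowledgeable, do you prefer nsync or bsb?",
        "haha kids of this days don't know them, i'm 46 and i still enjoying them, my kids only listen k-pop",
        "italian?haha that's strange, i only talk english and a little spanish ",
    ],
    "guided_messages": [
        "i don't have a preference, they are both great. All 3 of my kids get annoyed when I listen to them though.",
        "Sometimes I sing their songs in Italian, that really annoys them lol.",
        "My parents barely speak English, so I was taught both. By the way, what is k-pop?",
    ],
}

for speaker, text in flatten_dialogue(example):
    print(f"{speaker:>6}: {text}")
```

Read this way, the guided speaker's replies draw on the listed personas (three children, Italian-speaking parents), which is the blending of skills the dataset is meant to exhibit.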

Data Fields

The data fields are the same among all splits.

default

personas : a list of string features.
additional_context : a string feature.
previous_utterance : a list of string features.
context : a string feature.
free_messages : a list of string features.
guided_messages : a list of string features.
suggestions : a dictionary feature containing:
- convai2 : a list of string features.
- empathetic_dialogues : a list of string features.
- wizard_of_wikipedia : a list of string features.
guided_chosen_suggestions : a list of string features.
label_candidates : a list of lists of string features.
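
The field list can be turned into a small structural check. The following is a minimal sketch, not part of the dataset's tooling: the names `FIELD_TYPES`, `SUGGESTION_SOURCES`, and `check_record` are hypothetical, the field names follow the example record shown earlier, and only top-level container types are checked.

```python
from typing import Any, Dict, List

# Top-level fields and the Python container each is expected to load as.
FIELD_TYPES = {
    "personas": list,
    "additional_context": str,
    "previous_utterance": list,
    "context": str,
    "free_messages": list,
    "guided_messages": list,
    "suggestions": dict,
    "guided_chosen_suggestions": list,
    "label_candidates": list,
}

# Keys expected inside the `suggestions` dictionary.
SUGGESTION_SOURCES = ("convai2", "empathetic_dialogues", "wizard_of_wikipedia")


def check_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    for name, expected in FIELD_TYPES.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"{name}: expected {expected.__name__}")
    suggestions = record.get("suggestions")
    if isinstance(suggestions, dict):
        for source in SUGGESTION_SOURCES:
            if source not in suggestions:
                problems.append(f"missing suggestions key: {source}")
    return problems


# A skeletal record with the right shape and no content.
empty_record = {
    "personas": [],
    "additional_context": "",
    "previous_utterance": [],
    "context": "",
    "free_messages": [],
    "guided_messages": [],
    "suggestions": {source: [] for source in SUGGESTION_SOURCES},
    "guided_chosen_suggestions": [],
    "label_candidates": [],
}
print(check_record(empty_record))
```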

Data Splits

name	train	validation	test
default	4819	1009	980
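
For reference, the split counts above can be summed and expressed as shares of the whole. This is simple arithmetic on the numbers in the table; the variable names are illustrative only.

```python
# Split sizes as listed in the table above.
splits = {"train": 4819, "validation": 1009, "test": 980}

total = sum(splits.values())
shares = {name: count / total for name, count in splits.items()}

for name, count in splits.items():
    print(f"{name}: {count} ({shares[name]:.1%})")
print(f"total: {total}")  # total: 6808
```

The total of 6808 conversations is consistent with the "7k conversations" figure in the summary.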

Dataset Creation

Curation Rationale

More Information Needed

Source Data

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information

More Information Needed

Considerations for Using the Data

Additional Information

Dataset Curators

More Information Needed

Licensing Information

More Information Needed

Citation Information

@misc{smith2020evaluating,
    title={Can You Put it All Together: Evaluating Conversational Agents' Ability to Blend Skills},
    author={Eric Michael Smith and Mary Williamson and Kurt Shuster and Jason Weston and Y-Lan Boureau},
    year={2020},
    eprint={2004.08449},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Contributions

Thanks to @lewtun, @thomwolf, @lhoestq, @patrickvonplaten, @mariamabarham for adding this dataset.

Author:

Anonymous

Dataset size:

19.11 KB
