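Datasets:
meta_woz
Sub-tasks:
dialogue-modeling
Languages:
en
Multilinguality:
monolingual
Size Categories:
10K<n<100K
Language Creators:
crowdsourced
Annotations Creators:
crowdsourced
Source Datasets:
original
ArXiv:
arxiv:2003.01680
License:
other

MetaLWOz: A Dataset of Multi-Domain Dialogues for the Fast Adaptation of Conversation Models.

We introduce the Meta-Learning Wizard of Oz (MetaLWOz) dialogue dataset for developing fast adaptation methods for conversation models. The data can be used to train task-oriented dialogue models, and in particular to develop methods that quickly simulate user responses from a small amount of data. Such fast-adaptation models fall into the research areas of transfer learning and meta-learning. The dataset consists of 37,884 crowdsourced dialogues recorded between two human users in a Wizard of Oz setup, in which one was instructed to behave like a bot and the other was a true human user. The users are assigned a task belonging to a particular domain, for example booking a reservation at a particular restaurant, and work together to complete the task. Our dataset spans 47 domains with 227 tasks in total. Dialogues are a minimum of 10 turns long.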
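This dataset supports a range of tasks.
The text in the dataset is in English (en).
A data instance is a full multi-turn dialogue between two crowd-workers, one playing the role of the bot and the other that of the user. Both are given a domain and a task. Each turn consists of a single utterance, e.g.: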
Domain: Ski
User Task: You want to know if there are good ski hills an hour’s drive from your current location.
Bot Task: Tell the user that there are no ski hills in their immediate location.
Bot: Hello how may I help you?
User: Is there any good ski hills an hour’s drive from my current location?
Bot: I’m sorry to inform you that there are no ski hills in your immediate location
User: Can you help me find the nearest?
Bot: Absolutely! It looks like you’re about 3 hours away from Bear Mountain. That seems to be the closest.
User: Hmm.. sounds good
Bot: Alright! I can help you get your lift tickets now!When will you be going?
User: Awesome! please get me a ticket for 10pax
Bot: You’ve got it. Anything else I can help you with?
User: None. Thanks again!
Bot: No problem!
An example input/output pair for this dialogue:
Input: dialog history = Hello how may I help you?; Is there any good ski hills an hour’s drive from my current location?; I’m sorry to inform you that there are no ski hills in your immediate location
Output: user response = Can you help me find the nearest?
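The input/output pairing above can be sketched in a few lines of Python. This is a minimal illustration, not the dataset authors' preprocessing code: the function name `make_response_examples` is hypothetical, and it assumes a dialogue is stored as a flat list of utterances that strictly alternate between bot and user, with the bot speaking first (as in the example dialogue).

```python
from typing import List, Tuple


def make_response_examples(turns: List[str]) -> List[Tuple[List[str], str]]:
    """Build (dialog history, user response) pairs from one dialogue.

    Assumption: turns alternate bot, user, bot, user, ... so the user's
    utterances sit at the odd indices of the list.
    """
    examples = []
    for i, utterance in enumerate(turns):
        if i % 2 == 1:  # user turn under the alternation assumption
            # The history is every utterance before the user's response.
            examples.append((turns[:i], utterance))
    return examples


# First four turns of the example dialogue shown above.
turns = [
    "Hello how may I help you?",
    "Is there any good ski hills an hour’s drive from my current location?",
    "I’m sorry to inform you that there are no ski hills in your immediate location",
    "Can you help me find the nearest?",
]

examples = make_response_examples(turns)
for history, response in examples:
    print("Input: dialog history = " + "; ".join(history))
    print("Output: user response = " + response)
```

The last pair produced corresponds to the input/output example given above; earlier pairs use shorter histories from the same dialogue.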
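Each dialogue instance has the following fields:
Each task instance has the following fields:
The dataset is split into a training set and a test set, with the following sizes: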
Count | Training MetaLWOz | Evaluation MetaLWOz | Combined |
---|---|---|---|
Total Domains | 47 | 4 | 51 |
Total Tasks | 226 | 14 | 240 |
Total Dialogs | 37884 | 2319 | 40203 |
Below are various statistics of the dataset:
Statistic | Mean | Minimum | Maximum |
---|---|---|---|
Number of tasks per domain | 4.8 | 3 | 11 |
Number of dialogs per domain | 806.0 | 288 | 1990 |
Number of dialogs per task | 167.6 | 32 | 285 |
Number of turns per dialog | 11.4 | 10 | 46 |
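Statistics of this kind can be recomputed from the dialogues themselves. The sketch below is an illustration only: the function names and the record keys `domain`, `task_id`, and `turns` are assumptions about how a dialogue might be represented, and the toy records are invented for the example rather than taken from MetaLWOz.

```python
from collections import Counter
from statistics import mean


def summarize(counts):
    """Return mean (rounded to one decimal), minimum and maximum of a list of counts."""
    return {"mean": round(mean(counts), 1), "min": min(counts), "max": max(counts)}


def dataset_statistics(dialogues):
    """Compute the four per-domain / per-task / per-dialogue statistics.

    Each dialogue is assumed to be a dict with hypothetical keys
    'domain', 'task_id' and 'turns' (a list of utterances).
    """
    dialogs_per_domain = Counter(d["domain"] for d in dialogues)
    dialogs_per_task = Counter(d["task_id"] for d in dialogues)
    # Count each distinct (domain, task) pair once per domain.
    distinct_pairs = {(d["domain"], d["task_id"]) for d in dialogues}
    tasks_per_domain = Counter(domain for domain, _ in distinct_pairs)
    turns_per_dialog = [len(d["turns"]) for d in dialogues]
    return {
        "tasks_per_domain": summarize(list(tasks_per_domain.values())),
        "dialogs_per_domain": summarize(list(dialogs_per_domain.values())),
        "dialogs_per_task": summarize(list(dialogs_per_task.values())),
        "turns_per_dialog": summarize(turns_per_dialog),
    }


# Toy records, invented purely to exercise the function.
toy_dialogues = [
    {"domain": "SKI", "task_id": "t1", "turns": ["..."] * 10},
    {"domain": "SKI", "task_id": "t1", "turns": ["..."] * 12},
    {"domain": "SKI", "task_id": "t2", "turns": ["..."] * 10},
    {"domain": "HOTEL", "task_id": "t3", "turns": ["..."] * 14},
]

stats = dataset_statistics(toy_dialogues)
for name, summary in stats.items():
    print(name, summary)
```

As a sanity check on the reported means, they match the training-split totals: 226 tasks over 47 domains gives about 4.8 tasks per domain, and 37,884 dialogues over 226 tasks gives about 167.6 dialogues per task.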
Who are the source language producers? [More Information Needed]
Who are the annotators? [More Information Needed]
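Version 1 of the dataset was created by a team of researchers from Microsoft Research (Montreal, Canada).
The dataset is released under the Microsoft Research Data License Agreement.
The following articles can be cited for the various versions of MetaLWOz: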
Version 1.0
@InProceedings{shalyminov2020fast,
  author = {Shalyminov, Igor and Sordoni, Alessandro and Atkinson, Adam and Schulz, Hannes},
  title = {Fast Domain Adaptation For Goal-Oriented Dialogue Using A Hybrid Generative-Retrieval Transformer},
  booktitle = {2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year = {2020},
  month = {April},
  url = {https://www.microsoft.com/en-us/research/publication/fast-domain-adaptation-for-goal-oriented-dialogue-using-a-hybrid-generative-retrieval-transformer/},
}
Thanks to @pacman100 for adding this dataset.