Datasets:

GEM/dstc10_track2_task2

Tasks:

conversational

Languages:

en

Multilinguality:

unknown

Language Creators:

unknown

Annotations Creators:

none

Source Datasets:

original

License:

apache-2.0

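Dataset Card for GEM/dstc10_track2_task2

Link to Main Data Card

You can find the main data card on the GEM Website.

Dataset Summary

DSTC10 Track2 Task 2 follows the DSTC9 Track1 task, in which participants implement knowledge-grounded dialogue systems. The training set is inherited from the DSTC9 challenge and is in the written domain, while the test set is newly collected and consists of noisy automatic speech recognition (ASR) transcripts. The dataset therefore supports building models for knowledge-grounded dialogue response generation.

You can load the dataset via: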

import datasets
data = datasets.load_dataset('GEM/dstc10_track2_task2')

The data loader can be found here.

Website: https://github.com/alexa/alexa-with-dstc10-track2-dataset

Paper: https://assets.amazon.science/54/a1/5282d47044179737b4289622c824/how-robust-are-you-evaluating-task-oriented-dialogue-systems-on-spoken-conversations.pdf

Authors: Seokhwan Kim, Yang Liu, Di Jin, Alexandros Papangelis, Karthik Gopalakrishnan, Behnam Hedayatnia, Dilek Hakkani-Tur (Amazon Alexa AI)

Dataset Overview

Where to find the Data and its Documentation

Webpage: https://github.com/alexa/alexa-with-dstc10-track2-dataset

Download: https://github.com/alexa/alexa-with-dstc10-track2-dataset

Paper: https://assets.amazon.science/54/a1/5282d47044179737b4289622c824/how-robust-are-you-evaluating-task-oriented-dialogue-systems-on-spoken-conversations.pdf

BibTeX:

@inproceedings{kim2021robust,
  title={"How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations},
  author={Kim, Seokhwan and Liu, Yang and Jin, Di and Papangelis, Alexandros and Gopalakrishnan, Karthik and Hedayatnia, Behnam and Hakkani-Tur, Dilek},
  booktitle={IEEE Automatic Speech Recognition and Understanding Workshop},
  year={2021}
}

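Contact Name: Seokhwan Kim

Contact Email: seokhwk@amazon.com

Has a Leaderboard?:

Leaderboard Link: https://eval.ai/challenge/1663/overview

Leaderboard Details: Models are evaluated with the automatic metrics defined in the task paper for the three subtasks of detection, selection, and generation.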

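Languages and Intended Use

Multilingual?:

Covered Languages: En

License: apache-2.0: Apache License 2.0

Intended Use: Research on dialogue state tracking and knowledge-grounded response generation

Primary Task: Dialog Response Generation

Communicative Goal: The dataset is intended for studying the robustness of conversational models trained on spoken data. It covers two aspects: multi-domain dialogue state tracking, and conversation modeling with access to unstructured knowledge.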

Credit

Curation Organization Type(s): industry

Curation Organization(s): Amazon

Dataset Creators: Seokhwan Kim, Yang Liu, Di Jin, Alexandros Papangelis, Karthik Gopalakrishnan, Behnam Hedayatnia, Dilek Hakkani-Tur (Amazon Alexa AI)

Funding: Amazon

Who added the Dataset to GEM?: Alexandros Papangelis (Amazon Alexa AI), Di Jin (Amazon Alexa AI), Nico Daheim (RWTH Aachen University)

Dataset Structure

Data Fields:

import datasets

# Schema of a single example.
features = datasets.Features(
    {
        "id": datasets.Value("string"),
        "gem_id": datasets.Value("string"),
        # Dialogue history, one entry per turn.
        "turns": [
            {
                "speaker": datasets.Value("string"),
                "text": datasets.Value("string"),
                # ASR n-best hypotheses for the turn.
                "nbest": [
                    {
                        "hyp": datasets.Value("string"),
                        "score": datasets.Value("float"),
                    }
                ],
            }
        ],
        # Annotated grounding knowledge snippet and its metadata.
        "knowledge": {
            "domain": datasets.Value("string"),
            "entity_name": datasets.Value("string"),
            "title": datasets.Value("string"),
            "body": datasets.Value("string"),
        },
        "response": datasets.Value("string"),
        "source": datasets.Value("string"),
        "linearized_input": datasets.Value("string"),
        "target": datasets.Value("string"),
        "references": [datasets.Value("string")],
    }
)

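nbest: contains the n-best list of hypotheses produced by an ASR system, together with their scores.

knowledge: defines the annotated grounding knowledge and its metadata.

Reason for Structure: The structure was kept compatible with MultiWOZ 2.X data.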
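As a minimal sketch of how the nbest field might be consumed, the helper below picks the highest-scoring hypothesis for a turn and falls back to the turn's text when the list is empty. The function name best_hypothesis is hypothetical, and treating a higher (less negative) score as better is an assumption based on the example instance, where the text of a user turn matches its top-scored hypothesis.

```python
def best_hypothesis(turn: dict) -> str:
    """Return the top-scoring ASR hypothesis, or the turn text if nbest is empty.

    Assumption: scores behave like log-likelihoods, so higher is better.
    """
    nbest = turn.get("nbest") or []
    if not nbest:
        return turn["text"]
    return max(nbest, key=lambda h: h["score"])["hyp"]


# A user turn with two hypotheses taken from the card's example instance.
user_turn = {
    "speaker": "U",
    "text": "yeah umm am looking for an expensive restaurant",
    "nbest": [
        {"hyp": "yeah umm m looking for an expensive restaurant",
         "score": -21.444047927856445},
        {"hyp": "yeah umm am looking for an expensive restaurant",
         "score": -21.272899627685547},
    ],
}
# A system turn with an empty nbest list, as in the example instance.
system_turn = {"speaker": "S", "text": "ok moderate", "nbest": []}

print(best_hypothesis(user_turn))
print(best_hypothesis(system_turn))
```

Returning the plain text for empty lists keeps the helper usable on every turn of a dialogue without special-casing the speaker.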

Example Instance: {'id': '0', 'gem_id': 'GEM-dstc10_track2_task2-test-0', 'turns': [{'speaker': 'U', 'text': "hi uh i'm looking for restaurant in lower ha", 'nbest': [{'hyp': "hi uh i'm looking for restaurant in lower ha", 'score': -25.625450134277344}, {'hyp': "hi uh i'm looking for restaurant in lower hai", 'score': -25.969446182250977}, {'hyp': "hi uh i'm looking for restaurant in lower haig", 'score': -32.816890716552734}, {'hyp': "hi uh i'm looking for restaurant in lower haigh", 'score': -32.84316635131836}, {'hyp': "hi uh i'm looking for restaurant in lower hag", 'score': -32.8637580871582}, {'hyp': "hi uh i'm looking for restaurant in lower hah", 'score': -33.1048698425293}, {'hyp': "hi uh i'm looking for restaurant in lower hait", 'score': -33.96509552001953}, {'hyp': "hi um i'm looking for restaurant in lower hai", 'score': -33.97885513305664}, {'hyp': "hi um i'm looking for restaurant in lower haig", 'score': -34.56083679199219}, {'hyp': "hi um i'm looking for restaurant in lower haigh", 'score': -34.58711242675781}]}, {'speaker': 'S', 'text': 'yeah definitely i can go ahead and help you with that ummm what kind of option in a restaurant are you looking for', 'nbest': []}, {'speaker': 'U', 'text': 'yeah umm am looking for an expensive restaurant', 'nbest': [{'hyp': 'yeah umm am looking for an expensive restaurant', 'score': -21.272899627685547}, {'hyp': 'yeah umm m looking for an expensive restaurant', 'score': -21.444047927856445}, {'hyp': 'yeah umm a m looking for an expensive restaurant', 'score': -21.565458297729492}, {'hyp': 'yeah ummm am looking for an expensive restaurant', 'score': -21.68832778930664}, {'hyp': 'yeah ummm m looking for an expensive restaurant', 'score': -21.85947608947754}, {'hyp': 'yeah ummm a m looking for an expensive restaurant', 'score': -21.980886459350586}, {'hyp': "yeah umm a'm looking for an expensive restaurant", 'score': -22.613924026489258}, {'hyp': "yeah ummm a'm looking for an expensive restaurant", 'score': -23.02935218811035}, 
{'hyp': 'yeah um am looking for an expensive restaurant', 'score': -23.11180305480957}, {'hyp': 'yeah um m looking for an expensive restaurant', 'score': -23.28295135498047}]}, {'speaker': 'S', 'text': "lemme go ahead and see what i can find for you ok great so i do ummm actually no i'm sorry is there something else i can help you find i don't see anything expensive", 'nbest': []}, {'speaker': 'U', 'text': "sure ummm maybe if you don't have anything expensive how about something in the moderate price range", 'nbest': [{'hyp': "sure ummm maybe if you don't have anything expensive how about something in the moderate price range", 'score': -27.492507934570312}, {'hyp': "sure umm maybe if you don't have anything expensive how about something in the moderate price range", 'score': -27.75853729248047}, {'hyp': "sure ummm maybe if you don't have anything expensive how about something in the moderate price rang", 'score': -29.44410514831543}, {'hyp': "sure umm maybe if you don't have anything expensive how about something in the moderate price rang", 'score': -29.710134506225586}, {'hyp': "sure um maybe if you don't have anything expensive how about something in the moderate price range", 'score': -31.136560440063477}, {'hyp': "sure um maybe if you don't have anything expensive how about something in the moderate price rang", 'score': -33.088157653808594}, {'hyp': "sure ummm maybe i you don't have anything expensive how about something in the moderate price range", 'score': -36.127620697021484}, {'hyp': "sure umm maybe i you don't have anything expensive how about something in the moderate price range", 'score': -36.39365005493164}, {'hyp': "sure ummm maybe if yo don't have anything expensive how about something in the moderate price range", 'score': -36.43605041503906}, {'hyp': "sure umm maybe if yo don't have anything expensive how about something in the moderate price range", 'score': -36.70207977294922}]}, {'speaker': 'S', 'text': 'ok moderate lemme go ahead and check 
to see what i can find for moderate ok great i do have several options coming up how does the view lounge sound', 'nbest': []}, {'speaker': 'U', 'text': 'that sounds good ummm do they have any sort of happy hour special', 'nbest': [{'hyp': 'that sounds good ummm do they have any sort of happy hour special', 'score': -30.316478729248047}, {'hyp': 'that sounds good umm do they have any sort of happy hour special', 'score': -30.958009719848633}, {'hyp': 'that sounds good um do they have any sort of happy hour special', 'score': -34.463165283203125}, {'hyp': 'that sounds good ummm do they have any sirt of happy hour special', 'score': -34.48350143432617}, {'hyp': 'that sounds good umm do they have any sirt of happy hour special', 'score': -35.12503433227539}, {'hyp': 'that sounds good ummm do they have any sord of happy hour special', 'score': -35.61939239501953}, {'hyp': 'that sounds good umm do they have any sord of happy hour special', 'score': -36.26092529296875}, {'hyp': 'that sounds good ummm do they have any sont of happy hour special', 'score': -37.697105407714844}, {'hyp': 'that sounds good umm do they have any sont of happy hour special', 'score': -38.33863830566406}, {'hyp': 'that sounds good um do they have any sirt of happy hour special', 'score': -38.630191802978516}]}], 'knowledge': {'domain': 'restaurant', 'entity_name': 'The View Lounge', 'title': 'Does The View Lounge offer happy hour?', 'body': 'The View Lounge offers happy hour.'}, 'response': 'uhhh great question lemme go ahead and check that out for you ok fantastic so it looks like they do offer happy hour', 'source': 'sf_spoken', 'linearized_input': " hi uh i'm looking for restaurant in lower ha yeah definitely i can go ahead and help you with that ummm what kind of option in a restaurant are you looking for yeah umm am looking for an expensive restaurant lemme go ahead and see what i can find for you ok great so i do ummm actually no i'm sorry is there something else i can help you find i don't 
see anything expensive sure ummm maybe if you don't have anything expensive how about something in the moderate price range ok moderate lemme go ahead and check to see what i can find for moderate ok great i do have several options coming up how does the view lounge sound that sounds good ummm do they have any sort of happy hour special || knowledge domain: restaurant, entity: The View Lounge, title: Does The View Lounge offer happy hour?, information: The View Lounge offers happy hour.", 'target': 'uhhh great question lemme go ahead and check that out for you ok fantastic so it looks like they do offer happy hour', 'references': ['uhhh great question lemme go ahead and check that out for you ok fantastic so it looks like they do offer happy hour']}
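
The linearized_input string in the example instance appears to be the turn texts joined by spaces, followed by " || " and a flattened view of the knowledge snippet. The sketch below reproduces that layout for illustration only. The function name linearize is hypothetical, and the format is inferred from the example instance rather than taken from the data loader's code.

```python
def linearize(turns: list, knowledge: dict) -> str:
    """Flatten a dialogue history and a knowledge snippet into one string.

    Assumption: the layout (leading space, " || " separator, field labels)
    is inferred from the example instance shown in this card.
    """
    history = " ".join(turn["text"] for turn in turns)
    grounding = (
        "knowledge domain: {domain}, entity: {entity_name}, "
        "title: {title}, information: {body}"
    ).format(**knowledge)
    return f" {history} || {grounding}"


# Shortened dialogue; the knowledge entry is the one from the example instance.
turns = [
    {"speaker": "S", "text": "how does the view lounge sound"},
    {"speaker": "U", "text": "do they have any sort of happy hour special"},
]
knowledge = {
    "domain": "restaurant",
    "entity_name": "The View Lounge",
    "title": "Does The View Lounge offer happy hour?",
    "body": "The View Lounge offers happy hour.",
}

print(linearize(turns, knowledge))
```

Note that speaker labels are not part of the flattened string in the example instance, so the sketch drops them as well.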