Datasets:

doc2dial

Tasks:

question-answering

Languages:

en

Multilinguality:

monolingual

Size:

1K<n<10K

Language creators:

found

Annotations creators:

expert-generated

Source datasets:

original

License:

cc-by-3.0

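Dataset Card for doc2dial

Dataset Summary

Doc2dial is a goal-oriented dialogue dataset in which conversations are grounded in associated documents. It contains over 4,500 annotated conversations, averaging 14 turns each, grounded in over 450 documents from four domains. Compared with earlier document-grounded dialogue datasets, it covers a wider variety of dialogue scenes in information-seeking conversations.

Supported Tasks and Leaderboards

Supported task: Shared Task, hosted by DialDoc21 at ACL.

Leaderboard: LINK

Languages

English

Dataset Structure

Data Instances

A sample data instance from dialogue_domain: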

{
    "dial_id": "9f44c1539efe6f7e79b02eb1b413aa43",
    "doc_id": "Top 5 DMV Mistakes and How to Avoid Them#3_0",
    "domain": "dmv",
    "turns": [
        {
            "da": "query_condition",
            "references": [
                {
                    "sp_id": "4",
                    "label": "precondition"
                }
            ],
            "role": "user",
            "turn_id": 1,
            "utterance": "Hello, I forgot o update my address, can you help me with that?"
        },
        {
            "da": "response_solution",
            "references": [
                {
                    "sp_id": "6",
                    "label": "solution"
                },
                {
                    "sp_id": "7",
                    "label": "solution"
                },
                {
                    "sp_id": "4",
                    "label": "references"
                }
            ],
            "role": "agent",
            "turn_id": 2,
            "utterance": "hi, you have to report any change of address to DMV within 10 days after moving. You should do this both for the address associated with your license and all the addresses associated with all your vehicles."
        },
        {
            "da": "query_solution",
            "references": [
                {
                    "sp_id": "56",
                    "label": "solution"
                },
                {
                    "sp_id": "48",
                    "label": "references"
                }
            ],
            "role": "user",
            "turn_id": 3,
            "utterance": "Can I do my DMV transactions online?"
        },
        {
            "da": "respond_solution",
            "references": [
                {
                    "sp_id": "56",
                    "label": "solution"
                },
                {
                    "sp_id": "48",
                    "label": "references"
                }
            ],
            "role": "agent",
            "turn_id": 4,
            "utterance": "Yes, you can sign up for MyDMV for all the online transactions needed."
        },
        {
            "da": "query_condition",
            "references": [
                {
                    "sp_id": "48",
                    "label": "precondition"
                }
            ],
            "role": "user",
            "turn_id": 5,
            "utterance": "Thanks, and in case I forget to bring all of the documentation needed to the DMV office, what can I do?"
        },
        {
            "da": "respond_solution",
            "references": [
                {
                    "sp_id": "49",
                    "label": "solution"
                },
                {
                    "sp_id": "50",
                    "label": "solution"
                },
                {
                    "sp_id": "52",
                    "label": "solution"
                },
                {
                    "sp_id": "48",
                    "label": "references"
                }
            ],
            "role": "agent",
            "turn_id": 6,
            "utterance": "This happens often with our customers so that's why our website and MyDMV are so useful for our customers. Just check if you can make your transaction online so you don't have to go to the DMV Office."
        },
        {
            "da": "query_solution",
            "references": [
                {
                    "sp_id": "6",
                    "label": "solution"
                },
                {
                    "sp_id": "7",
                    "label": "solution"
                },
                {
                    "sp_id": "4",
                    "label": "references"
                }
            ],
            "role": "user",
            "turn_id": 7,
            "utterance": "Ok, and can you tell me again where should I report my new address?"
        },
        {
            "da": "respond_solution",
            "references": [
                {
                    "sp_id": "6",
                    "label": "solution"
                },
                {
                    "sp_id": "7",
                    "label": "solution"
                },
                {
                    "sp_id": "4",
                    "label": "references"
                }
            ],
            "role": "agent",
            "turn_id": 8,
            "utterance": "Sure. Any change of address must be reported to the DMV, that's for the address associated with your license and any of your vehicles."
        },
        {
            "da": "query_condition",
            "references": [
                {
                    "sp_id": "40",
                    "label": "precondition"
                }
            ],
            "role": "user",
            "turn_id": 9,
            "utterance": "Can you tell me more about Traffic points and their cost?"
        },
        {
            "da": "respond_solution",
            "references": [
                {
                    "sp_id": "41",
                    "label": "solution"
                },
                {
                    "sp_id": "43",
                    "label": "solution"
                },
                {
                    "sp_id": "40",
                    "label": "references"
                }
            ],
            "role": "agent",
            "turn_id": 10,
            "utterance": "Traffic points is the system used by DMV to track dangerous drivers. The cost of the traffic points is independent of the DRA, so you get a separate charge based on the total points you accumulate."
        }
    ]
}

A sample data instance from document_domain:

{
    "doc_id": "Benefits Planner: Retirement | Online Calculator (WEP Version)#1_0",
    "domain": "ssa",
    "doc_html_raw": "<main class=\"content\" id=\"content\" role=\"main\">\n\n<section>\n\n<div>\n<h2>\nBenefits Planner: Retirement\n</h2>\n</div>\n</section>\n\n\n<section>\n\n<div>\n\n<div>\n\n\n</div>\n\n<article>\n<section>\n\n<h3>Online Calculator (WEP Version)</h3>\n<p>The calculator shown below allows you to estimate your Social Security benefit.\nHowever, for the most accurate estimates, <a>use the Detailed Calculator</a>.</p>\n<p>You need to enter all your past earnings\n, which are shown on your <a>online </a>.</p>\n\n<p>Please Note:</p>\n<ul class=\"browser-default\">\n<li>The Online Calculator is updated periodically<span>*</span> with new benefit increases and other benefit amounts. Therefore, it is likely that your benefit estimates in the future will differ from those calculated today.</li>\n<li>The Online Calculator works on PCs and Macs with Javascript enabled.</li>\n<li>Some browsers may not allow you to print the table below. </li>\n</ul>\n<p></p>\n\n<div>\nThe Online Calculator temporarily stores information on your local computer while your browser is open. To protect your personal information, you should close your browser after you have finished your estimate.\n</div>\n<p></p>\n\n<div>\n<p>Note: If your birthday is on January 1st, we figure your benefit as if your birthday was in the previous year.</p>\n<p>If you qualify for benefits as a Survivor, your <a>full retirement age for survivors benefits</a> may be different.</p></div>\n\n<div>\n</div></section></article></div></section></main>",
    "doc_html_ts": "<main><section><div><h2 sent_id=\"1\" text_id=\"1\">Benefits Planner: Retirement</h2></div></section><section><div><article><section><h3 sent_id=\"2\" text_id=\"2\">Online Calculator (WEP Version)</h3><div tag_id=\"1\"><u sent_id=\"3\" tag_id=\"1\"><u sent_id=\"3\" tag_id=\"1\" text_id=\"3\">The calculator shown below allows you to estimate your Social Security benefit .</u></u><u sent_id=\"4\" tag_id=\"1\"><u sent_id=\"4\" tag_id=\"1\" text_id=\"4\">However ,</u><u sent_id=\"4\" tag_id=\"1\" text_id=\"5\">for the most accurate estimates ,</u><u sent_id=\"4\" tag_id=\"1\" text_id=\"6\">use the Detailed Calculator .</u></u></div><div tag_id=\"2\"><u sent_id=\"5\" tag_id=\"2\"><u sent_id=\"5\" tag_id=\"2\" text_id=\"7\">You need to enter all your past earnings , which are shown on your online .</u></u></div><div tag_id=\"3\"><u sent_id=\"6\" tag_id=\"3\"><u sent_id=\"6\" tag_id=\"3\" text_id=\"8\">Please Note:</u></u></div><ul class=\"browser-default\" tag_id=\"3\"><li tag_id=\"3\"><div tag_id=\"3\"><u sent_id=\"9\" tag_id=\"3\"><u sent_id=\"9\" tag_id=\"3\" text_id=\"9\">The Online Calculator is updated periodically * with new benefit increases and other benefit amounts .</u></u><u sent_id=\"10\" tag_id=\"3\"><u sent_id=\"10\" tag_id=\"3\" text_id=\"10\">Therefore ,</u><u sent_id=\"10\" tag_id=\"3\" text_id=\"11\">it is likely that your benefit estimates in the future will differ from those calculated today .</u></u></div></li><li tag_id=\"3\"><u sent_id=\"11\" tag_id=\"3\"><u sent_id=\"11\" tag_id=\"3\" text_id=\"12\">The Online Calculator works on PCs and Macs with Javascript enabled .</u></u></li><li tag_id=\"3\"><u sent_id=\"12\" tag_id=\"3\"><u sent_id=\"12\" tag_id=\"3\" text_id=\"13\">Some browsers may not allow you to print the table below .</u></u></li></ul><div>The Online Calculator temporarily stores information on your local computer while your browser is open. 
To protect your personal information, you should close your browser after you have finished your estimate.</div><div><div tag_id=\"4\"><u sent_id=\"13\" tag_id=\"4\"><u sent_id=\"13\" tag_id=\"4\" text_id=\"14\">Note:</u></u><u sent_id=\"14\" tag_id=\"4\"><u sent_id=\"14\" tag_id=\"4\" text_id=\"15\">If your birthday is on January 1st ,</u><u sent_id=\"14\" tag_id=\"4\" text_id=\"16\">we figure your benefit as if your birthday was in the previous year .</u></u></div><div tag_id=\"5\"><u sent_id=\"15\" tag_id=\"5\"><u sent_id=\"15\" tag_id=\"5\" text_id=\"17\">If you qualify for benefits as a Survivor ,</u><u sent_id=\"15\" tag_id=\"5\" text_id=\"18\">your full retirement age for survivors benefits may be different .</u></u></div></div></section></article></div></section></main>",
    "doc_text": "\n\nBenefits Planner: Retirement \n\n\nOnline Calculator (WEP Version) \nThe calculator shown below allows you to estimate your Social Security benefit. However , for the most accurate estimates , use the Detailed Calculator. You need to enter all your past earnings, which are shown on your online. Please Note: The Online Calculator is updated periodically * with new benefit increases and other benefit amounts. Therefore , it is likely that your benefit estimates in the future will differ from those calculated today. The Online Calculator works on PCs and Macs with Javascript enabled. Some browsers may not allow you to print the table below. Note: If your birthday is on January 1st , we figure your benefit as if your birthday was in the previous year. If you qualify for benefits as a Survivor , your full retirement age for survivors benefits may be different. ",
    "title": "Benefits Planner: Retirement | Online Calculator (WEP Version)#1",
    "spans": [
        {
            "end_sec": 32,
            "end_sp": 32,
            "id_sec": "t_0",
            "id_sp": "1",
            "parent_titles": "[]",
            "start_sec": 0,
            "start_sp": 0,
            "tag": "h2",
            "text_sec": "\n\nBenefits Planner: Retirement \n",
            "text_sp": "\n\nBenefits Planner: Retirement \n",
            "title": "Benefits Planner: Retirement"
        },
        {
            "end_sec": 67,
            "end_sp": 67,
            "id_sec": "t_1",
            "id_sp": "2",
            "parent_titles": "[{'id_sp': '1', 'text': 'Benefits Planner: Retirement', 'level': 'h2'}]",
            "start_sec": 32,
            "start_sp": 32,
            "tag": "h3",
            "text_sec": "\n\nOnline Calculator (WEP Version) \n",
            "text_sp": "\n\nOnline Calculator (WEP Version) \n",
            "title": "Online Calculator (WEP Version)"
        },
        {
            "end_sec": 220,
            "end_sp": 147,
            "id_sec": "1",
            "id_sp": "3",
            "parent_titles": "[]",
            "start_sec": 67,
            "start_sp": 67,
            "tag": "u",
            "text_sec": "The calculator shown below allows you to estimate your Social Security benefit. However , for the most accurate estimates , use the Detailed Calculator. ",
            "text_sp": "The calculator shown below allows you to estimate your Social Security benefit. ",
            "title": "Online Calculator (WEP Version)"
        }
    ]
}
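
The text_id attributes in doc_html_ts can be read back with an ordinary HTML parser. The following is a minimal sketch using Python's standard html.parser on a shortened, hand-written fragment modeled on the instance above. The class name SpanTextCollector is hypothetical, and the sketch assumes well-formed markup with no void elements.

```python
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Collect the text of every element that carries a text_id attribute."""

    def __init__(self):
        super().__init__()
        self._stack = []     # text_id (or None) of each currently open element
        self.span_text = {}  # text_id -> concatenated text content

    def handle_starttag(self, tag, attrs):
        # Remember whether this element is an annotated span.
        self._stack.append(dict(attrs).get("text_id"))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Only keep text whose innermost enclosing element has a text_id.
        if self._stack and self._stack[-1] is not None:
            key = self._stack[-1]
            self.span_text[key] = self.span_text.get(key, "") + data


# Shortened fragment in the style of doc_html_ts (not the full document).
fragment = (
    '<main><h2 sent_id="1" text_id="1">Benefits Planner: Retirement</h2>'
    '<div tag_id="1"><u sent_id="3" tag_id="1">'
    '<u sent_id="3" tag_id="1" text_id="3">The calculator shown below .</u>'
    '</u></div></main>'
)

collector = SpanTextCollector()
collector.feed(fragment)
collector.close()
print(collector.span_text)
```

The keys of span_text are text_id values, which the card says correspond to id_sp in the spans field.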

A sample data instance from doc2dial_rc:

{
    "id": "78f72b08b43791a4a70363fe62b8de08_1",
    "is_impossible": false,
    "question": "Hello, I want to know about the retirement plan.",
    "answers": {
        "answer_start": [
            0
        ],
        "text": [
            "\n\nBenefits Planner: Retirement \n\n\nOnline Calculator (WEP Version) \n"
        ]
    },
    "context": "\n\nBenefits Planner: Retirement \n\n\nOnline Calculator (WEP Version) \nThe calculator shown below allows you to estimate your Social Security benefit. However , for the most accurate estimates , use the Detailed Calculator. You need to enter all your past earnings, which are shown on your online. Please Note: The Online Calculator is updated periodically * with new benefit increases and other benefit amounts. Therefore , it is likely that your benefit estimates in the future will differ from those calculated today. The Online Calculator works on PCs and Macs with Javascript enabled. Some browsers may not allow you to print the table below. Note: If your birthday is on January 1st , we figure your benefit as if your birthday was in the previous year. If you qualify for benefits as a Survivor , your full retirement age for survivors benefits may be different. ",
    "title": "Benefits Planner: Retirement | Online Calculator (WEP Version)#1_0",
    "domain": "ssa"
}

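Data Fields

For document_domain:

  • doc_id: the ID of the document;
  • title: the title of the document;
  • domain: the domain of the document;
  • doc_text: the text content of the document (without HTML markup);
  • doc_html_ts: the document content with HTML markup, in which the annotated spans are indicated by the text_id attribute, which corresponds to id_sp;
  • doc_html_raw: the document content with HTML markup but without span annotations;
  • spans: key-value pairs of all spans in the document, with id_sp as the key. Each span includes the following:
    • id_sp: the ID of the span, as given by text_id in doc_html_ts;
    • start_sp / end_sp: the start/end position of the text span in doc_text;
    • text_sp: the text content of the span;
    • id_sec: the ID of the (sub)section or title that contains the span;
    • start_sec / end_sec: the start/end position of the (sub)section in doc_text;
    • text_sec: the text content of the (sub)section;
    • title: the title of the (sub)section;
    • parent_titles: the parent titles of the title.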
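
The offset fields can be sanity-checked by slicing doc_text. The sketch below assumes that start_sp / end_sp and start_sec / end_sec are character offsets with an exclusive end, which is consistent with the sample instance above (span "3" runs from 67 to 147 and its text_sp is 80 characters long). The toy document and the helper name find_misaligned_spans are made up for illustration, and spans is written as a list of records, as in the sample instance.

```python
# Toy document (illustrative only); offsets follow the convention assumed above.
doc = {
    "doc_text": "Title \nFirst sentence. Second sentence. ",
    "spans": [
        {"id_sp": "1", "id_sec": "t_0", "start_sp": 0, "end_sp": 7,
         "start_sec": 0, "end_sec": 7,
         "text_sp": "Title \n", "text_sec": "Title \n"},
        {"id_sp": "2", "id_sec": "1", "start_sp": 7, "end_sp": 23,
         "start_sec": 7, "end_sec": 40,
         "text_sp": "First sentence. ",
         "text_sec": "First sentence. Second sentence. "},
        {"id_sp": "3", "id_sec": "1", "start_sp": 23, "end_sp": 40,
         "start_sec": 7, "end_sec": 40,
         "text_sp": "Second sentence. ",
         "text_sec": "First sentence. Second sentence. "},
    ],
}


def find_misaligned_spans(document):
    """Return the id_sp of every span whose offsets do not reproduce its text."""
    text = document["doc_text"]
    bad = []
    for span in document["spans"]:
        span_ok = text[span["start_sp"]:span["end_sp"]] == span["text_sp"]
        sec_ok = text[span["start_sec"]:span["end_sec"]] == span["text_sec"]
        if not (span_ok and sec_ok):
            bad.append(span["id_sp"])
    return bad


# Group span IDs by the (sub)section that contains them.
spans_by_section = {}
for span in doc["spans"]:
    spans_by_section.setdefault(span["id_sec"], []).append(span["id_sp"])

misaligned = find_misaligned_spans(doc)
print(misaligned, spans_by_section)
```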

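For dialogue_domain:

  • dial_id: the ID of the dialogue;
  • doc_id: the ID of the associated document;
  • domain: the domain of the document;
  • turns: a list of dialogue turns. Each turn includes:
    • turn_id: the chronological position of the turn;
    • role: either "agent" or "user";
    • da: the dialogue act;
    • references: the spans (id_sp) in the associated document that the turn is grounded in. If the turn is an out-of-domain turn, i.e. da ends with "ood", references is empty. Spans labeled "precondition" or "solution" are the actual grounding spans. Spans labeled "reference" are related titles or contextual references, included to better describe the dialogue scene to crowd contributors;
    • utterance: the human-written utterance based on the dialogue scene.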
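
As an illustration of the references field, the sketch below separates the actual grounding spans of a turn from its contextual references. The first turn is copied from the sample instance above (utterance shortened); the second turn, including its "query_ood" dialogue act, is a hypothetical out-of-domain turn invented for this example. The helper names are likewise made up.

```python
# Labels that mark actual grounding spans, per the field description above.
GROUNDING_LABELS = {"precondition", "solution"}


def grounding_span_ids(turn):
    """Return the span IDs of a turn that are labeled as actual grounding."""
    return [ref["sp_id"] for ref in turn["references"]
            if ref["label"] in GROUNDING_LABELS]


def is_out_of_domain(turn):
    """A turn is treated as out-of-domain when its dialogue act ends with 'ood'."""
    return turn["da"].endswith("ood")


turns = [
    {   # Turn 2 of the sample dialogue above, utterance shortened.
        "da": "response_solution",
        "references": [
            {"sp_id": "6", "label": "solution"},
            {"sp_id": "7", "label": "solution"},
            {"sp_id": "4", "label": "references"},
        ],
        "role": "agent",
        "turn_id": 2,
        "utterance": "hi, you have to report any change of address to DMV ...",
    },
    {   # Hypothetical out-of-domain turn (not taken from the dataset).
        "da": "query_ood",
        "references": [],
        "role": "user",
        "turn_id": 3,
        "utterance": "What is the weather like today?",
    },
]

grounded = {t["turn_id"]: grounding_span_ids(t) for t in turns}
ood_turn_ids = [t["turn_id"] for t in turns if is_out_of_domain(t)]
print(grounded, ood_turn_ids)
```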

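For doc2dial_rc, the data follows the SQuAD data format. For how to load Doc2Dial data for the reading comprehension task, please refer here.

  • id: the ID of the question-answering instance;
  • question: the user query;
  • answers: the answers, grounded in the associated document;
    • answer_start: the start position of the span in the associated document (context);
    • text: the text content of the span;
  • title: the title of the associated document;
  • domain: the domain of the associated document;
  • context: the text content of the associated document (without HTML markup).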
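
Because answer_start is an offset into context, a SQuAD-style instance can be checked by slicing. The sketch below uses a trimmed copy of the sample instance above (context cut after its first sentence); the helper name answers_are_aligned is hypothetical.

```python
def answers_are_aligned(example):
    """Check that each answer text appears in context at its answer_start."""
    context = example["context"]
    answers = example["answers"]
    return all(
        context[start:start + len(text)] == text
        for start, text in zip(answers["answer_start"], answers["text"])
    )


# Trimmed from the doc2dial_rc sample instance; context is shortened.
example = {
    "id": "78f72b08b43791a4a70363fe62b8de08_1",
    "question": "Hello, I want to know about the retirement plan.",
    "answers": {
        "answer_start": [0],
        "text": [
            "\n\nBenefits Planner: Retirement \n\n\nOnline Calculator (WEP Version) \n"
        ],
    },
    "context": (
        "\n\nBenefits Planner: Retirement \n\n\nOnline Calculator (WEP Version) \n"
        "The calculator shown below allows you to estimate your Social Security benefit. "
    ),
}

# A deliberately shifted copy, to show what a misaligned instance looks like.
shifted = dict(example, answers={"answer_start": [5],
                                 "text": example["answers"]["text"]})

aligned = answers_are_aligned(example)
shifted_aligned = answers_are_aligned(shifted)
print(aligned, shifted_aligned)
```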

Data Splits

The dialogue domain has training and dev splits. The document domain has a training split only.

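Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

Song Feng, Hui Wan, Chulaka Gunasekara, Siva Sankalp Patel, Sachindra Joshi, Luis A. Lastras

Licensing Information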

Creative Commons Attribution 3.0 Unported

Citation Information

@inproceedings{feng-etal-2020-doc2dial,
    title = "doc2dial: A Goal-Oriented Document-Grounded Dialogue Dataset",
    author = "Feng, Song and Wan, Hui and Gunasekara, Chulaka and Patel, Siva and Joshi, Sachindra and Lastras, Luis",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.652",
}

Contributions

Thanks to @songfeng and @KMFODA for adding this dataset.