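Dataset Card for air_dialogue

Dataset Summary

AirDialogue is a large dataset that contains 402,038 goal-oriented conversations. To collect this dataset, we created a context generator that provides travel and flight restrictions. Human annotators were then asked to play the role of a customer or an agent and to interact with the goal of successfully booking a trip that satisfies the restrictions.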

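Supported Tasks and Leaderboards

We use perplexity and BLEU score to evaluate the quality of the language generated by the model. We also compare the dialogue state generated by the model with the ground-truth state. Two categories of metrics are used: exact match scores and scaled scores.

The inference competition and leaderboard can be found here: https://worksheets.codalab.org/worksheets/0xa79833f4b3c24f4188cee7131b120a59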

Languages

The text in the dataset is in English. The BCP 47 code is en.

Dataset Structure

Data Instances

The data is provided in two sets of files: the dialogues (air_dialogue_data) and the knowledge base (air_dialogue_kb).

BuilderConfig:air_dialogue_data

{"action": {"status": "book", "name": "Emily Edwards", "flight": [1027]}, "intent": {"return_month": "June", "return_day": "14", "max_price": 200, "departure_airport": "DFW", "return_time": "afternoon", "max_connections": 1, "departure_day": "12", "goal": "book", "departure_month": "June", "name": "Emily Edwards", "return_airport": "IAD"}, "timestamps": [1519233239, 1519233244, 1519233249, 1519233252, 1519233333, 1519233374, 1519233392, 1519233416, 1519233443, 1519233448, 1519233464, 1519233513, 1519233525, 1519233540, 1519233626, 1519233628, 1519233638], "dialogue": ["customer: Hello.", "agent: Hello.", "customer: My name is Emily Edwards.", "agent: How may I help you out?", "customer: I need some help in my flight ticket reservation to attend a convocation meeting, can you please help me?", "agent: Sure, I will help you out. May I know your travelling dates please?", "customer: Thank you and my dates are 06/12 and back on 06/14.", "agent: Can I know your airport codes?", "customer: The airport codes are from DFW to IAD.", "agent: Ok, please wait a moment.", "customer: Sure.", "agent: There is a flight with connection 1 and price 200, can I proceed with this flight?", "customer: Yes, do proceed with booking.", "agent: Ok, your ticket has been booked.", "customer: Thank you for your assistance in my flight ticket reservation.", "agent: Thank you for choosing us.", "customer: You are welcome."], "expected_action": {"status": "book", "name": "Emily Edwards", "flight": [1027]}, "correct_sample": true}
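As a minimal sketch of how such a record might be consumed, the snippet below pairs each utterance with its speaker and timestamp. It assumes every dialogue string has the form "<speaker>: <text>", as in the instance above; the helper name split_turns is hypothetical and not part of the dataset.

```python
import json

# Abbreviated copy of the instance above (only the first three turns are kept).
record = json.loads("""
{"timestamps": [1519233239, 1519233244, 1519233249],
 "dialogue": ["customer: Hello.", "agent: Hello.", "customer: My name is Emily Edwards."]}
""")


def split_turns(record):
    """Pair each utterance with its speaker and timestamp.

    Assumption: each dialogue string looks like "<speaker>: <text>".
    """
    turns = []
    for ts, line in zip(record["timestamps"], record["dialogue"]):
        # partition splits on the first ": " only, so colons inside the text survive.
        speaker, _, text = line.partition(": ")
        turns.append({"speaker": speaker, "text": text, "timestamp": ts})
    return turns


turns = split_turns(record)
for turn in turns:
    print(turn["timestamp"], turn["speaker"], turn["text"])
```

In the sample instance the timestamps and dialogue lists have the same length (17 entries each), which is what makes the pairing by position possible.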

BuilderConfig:air_dialogue_kb

{"kb": [{"return_airport": "DTW", "airline": "Spirit", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1000, "departure_month": "June", "departure_time_num": 17, "class": "economy", "return_time_num": 2, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": "DTW", "airline": "Frontier", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1001, "departure_month": "June", "departure_time_num": 0, "class": "business", "return_time_num": 15, "return_month": "June", "return_day": "13", "num_connections": 0, "price": 500}, {"return_airport": "DTW", "airline": "JetBlue", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1002, "departure_month": "June", "departure_time_num": 0, "class": "business", "return_time_num": 13, "return_month": "June", "return_day": "13", "num_connections": 1, "price": 600}, {"return_airport": "IAD", "airline": "Hawaiian", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1003, "departure_month": "June", "departure_time_num": 6, "class": "economy", "return_time_num": 5, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": "DFW", "airline": "AA", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1004, "departure_month": "June", "departure_time_num": 9, "class": "economy", "return_time_num": 11, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "IAD", "airline": "AA", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1005, "departure_month": "June", "departure_time_num": 3, "class": "economy", "return_time_num": 17, "return_month": "June", "return_day": "13", "num_connections": 1, "price": 100}, {"return_airport": "DTW", "airline": "Frontier", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1006, "departure_month": "June", "departure_time_num": 10, "class": "economy", 
"return_time_num": 10, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "IAD", "airline": "UA", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1007, "departure_month": "June", "departure_time_num": 14, "class": "economy", "return_time_num": 20, "return_month": "June", "return_day": "13", "num_connections": 1, "price": 100}, {"return_airport": "DFW", "airline": "AA", "departure_day": "13", "departure_airport": "DTW", "flight_number": 1008, "departure_month": "June", "departure_time_num": 6, "class": "economy", "return_time_num": 8, "return_month": "June", "return_day": "14", "num_connections": 2, "price": 400}, {"return_airport": "DFW", "airline": "Delta", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1009, "departure_month": "June", "departure_time_num": 18, "class": "economy", "return_time_num": 6, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": "DFW", "airline": "Frontier", "departure_day": "13", "departure_airport": "DTW", "flight_number": 1010, "departure_month": "June", "departure_time_num": 4, "class": "economy", "return_time_num": 2, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "DFW", "airline": "Southwest", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1011, "departure_month": "June", "departure_time_num": 17, "class": "economy", "return_time_num": 22, "return_month": "June", "return_day": "13", "num_connections": 0, "price": 100}, {"return_airport": "DTW", "airline": "JetBlue", "departure_day": "11", "departure_airport": "DFW", "flight_number": 1012, "departure_month": "June", "departure_time_num": 13, "class": "economy", "return_time_num": 22, "return_month": "June", "return_day": "13", "num_connections": 1, "price": 100}, {"return_airport": "DTW", "airline": "Southwest", "departure_day": "12", "departure_airport": "IAD", 
"flight_number": 1013, "departure_month": "June", "departure_time_num": 16, "class": "economy", "return_time_num": 13, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": "DTW", "airline": "Delta", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1014, "departure_month": "June", "departure_time_num": 0, "class": "economy", "return_time_num": 8, "return_month": "June", "return_day": "15", "num_connections": 1, "price": 100}, {"return_airport": "DTW", "airline": "Southwest", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1015, "departure_month": "June", "departure_time_num": 17, "class": "economy", "return_time_num": 1, "return_month": "June", "return_day": "15", "num_connections": 1, "price": 300}, {"return_airport": "DTW", "airline": "UA", "departure_day": "11", "departure_airport": "DFW", "flight_number": 1016, "departure_month": "June", "departure_time_num": 10, "class": "economy", "return_time_num": 4, "return_month": "June", "return_day": "14", "num_connections": 0, "price": 200}, {"return_airport": "DFW", "airline": "AA", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1017, "departure_month": "June", "departure_time_num": 14, "class": "economy", "return_time_num": 23, "return_month": "June", "return_day": "14", "num_connections": 2, "price": 400}, {"return_airport": "DTW", "airline": "JetBlue", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1018, "departure_month": "June", "departure_time_num": 3, "class": "economy", "return_time_num": 1, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "DFW", "airline": "Hawaiian", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1019, "departure_month": "June", "departure_time_num": 7, "class": "economy", "return_time_num": 18, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": 
"DFW", "airline": "Delta", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1020, "departure_month": "June", "departure_time_num": 6, "class": "economy", "return_time_num": 18, "return_month": "June", "return_day": "14", "num_connections": 2, "price": 200}, {"return_airport": "IAD", "airline": "Delta", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1021, "departure_month": "June", "departure_time_num": 11, "class": "business", "return_time_num": 8, "return_month": "June", "return_day": "14", "num_connections": 0, "price": 1000}, {"return_airport": "IAD", "airline": "JetBlue", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1022, "departure_month": "June", "departure_time_num": 4, "class": "economy", "return_time_num": 14, "return_month": "June", "return_day": "13", "num_connections": 0, "price": 200}, {"return_airport": "IAD", "airline": "Frontier", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1023, "departure_month": "June", "departure_time_num": 19, "class": "economy", "return_time_num": 23, "return_month": "June", "return_day": "13", "num_connections": 1, "price": 200}, {"return_airport": "DFW", "airline": "UA", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1024, "departure_month": "June", "departure_time_num": 11, "class": "economy", "return_time_num": 19, "return_month": "June", "return_day": "15", "num_connections": 1, "price": 200}, {"return_airport": "DTW", "airline": "Hawaiian", "departure_day": "11", "departure_airport": "IAD", "flight_number": 1025, "departure_month": "June", "departure_time_num": 6, "class": "economy", "return_time_num": 10, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "DTW", "airline": "UA", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1026, "departure_month": "June", "departure_time_num": 0, "class": "economy", "return_time_num": 18, "return_month": 
"June", "return_day": "14", "num_connections": 1, "price": 300}, {"return_airport": "IAD", "airline": "Delta", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1027, "departure_month": "June", "departure_time_num": 17, "class": "economy", "return_time_num": 15, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": "IAD", "airline": "Southwest", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1028, "departure_month": "June", "departure_time_num": 23, "class": "economy", "return_time_num": 13, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "DFW", "airline": "Spirit", "departure_day": "11", "departure_airport": "DTW", "flight_number": 1029, "departure_month": "June", "departure_time_num": 22, "class": "business", "return_time_num": 4, "return_month": "June", "return_day": "14", "num_connections": 0, "price": 800}], "reservation": 0}

Data Fields

BuilderConfig:air_dialogue_data: provides the customer context, dialogue states, and environment

| Key name | Description |
|---|---|
| 'search_action' | Search action performed by the customer |
| 'action' | Action taken by the agent |
| 'intent' | Intents from the conversation |
| 'timestamps' | Timestamp of each dialogue turn |
| 'dialogue' | Dialogue recorded between agent and customer |
| 'expected_action' | Expected action from the agent (human-annotated) |
| 'correct_sample' | Whether the action performed by the agent was the same as 'expected_action' |
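The relationship between 'action', 'expected_action' and 'correct_sample' can be illustrated with a small check. This is only a sketch: the card does not specify how 'correct_sample' was computed, so the exact comparison on the keys "status", "name" and "flight" (the three keys visible in the sample instance) is an assumption, and same_action is a hypothetical helper.

```python
def same_action(action, expected_action):
    """Exact match on the three keys that appear in the sample instance.

    Assumption: two actions are "the same" when status, name and flight all agree.
    """
    keys = ("status", "name", "flight")
    return all(action.get(k) == expected_action.get(k) for k in keys)


# Values copied from the air_dialogue_data instance.
action = {"status": "book", "name": "Emily Edwards", "flight": [1027]}
expected_action = {"status": "book", "name": "Emily Edwards", "flight": [1027]}

# A made-up agent action that books a different flight, for contrast.
wrong_action = {"status": "book", "name": "Emily Edwards", "flight": [1005]}

print(same_action(action, expected_action))        # True
print(same_action(wrong_action, expected_action))  # False
```

For the sample instance this agrees with the stored value "correct_sample": true.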

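BuilderConfig:air_dialogue_kb: provides the agent context ca = (db, r), where db is the flight database ('kb') and r is the reservation flag ('reservation')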

| Key name | Description |
|---|---|
| 'kb' | Available flights in the database |
| 'reservation' | Whether the customer has an existing reservation |
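To show how the two configurations relate, the sketch below filters 'kb' entries against the customer 'intent' from the sample instances. The matching rule is an assumption made for illustration: airports and dates must be equal, while max_price and max_connections are treated as inclusive upper bounds. The time-of-day and class constraints are ignored because the card does not describe how they map onto the kb fields. The helper name satisfies is hypothetical.

```python
# Constraints copied from the "intent" of the air_dialogue_data sample instance.
intent = {
    "departure_airport": "DFW", "return_airport": "IAD",
    "departure_month": "June", "departure_day": "12",
    "return_month": "June", "return_day": "14",
    "max_price": 200, "max_connections": 1,
}

# Four of the thirty flights in the air_dialogue_kb sample (airline, class and times omitted).
kb = [
    {"flight_number": 1005, "departure_airport": "DFW", "return_airport": "IAD",
     "departure_month": "June", "departure_day": "12",
     "return_month": "June", "return_day": "13", "num_connections": 1, "price": 100},
    {"flight_number": 1021, "departure_airport": "DFW", "return_airport": "IAD",
     "departure_month": "June", "departure_day": "12",
     "return_month": "June", "return_day": "14", "num_connections": 0, "price": 1000},
    {"flight_number": 1026, "departure_airport": "DFW", "return_airport": "DTW",
     "departure_month": "June", "departure_day": "12",
     "return_month": "June", "return_day": "14", "num_connections": 1, "price": 300},
    {"flight_number": 1027, "departure_airport": "DFW", "return_airport": "IAD",
     "departure_month": "June", "departure_day": "12",
     "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200},
]

# Fields that must be identical in the flight and the intent (assumed rule).
EXACT_KEYS = ("departure_airport", "return_airport",
              "departure_month", "departure_day", "return_month", "return_day")


def satisfies(flight, intent):
    """Return True if the flight meets every constraint considered here."""
    if any(flight[k] != intent[k] for k in EXACT_KEYS):
        return False
    return (flight["price"] <= intent["max_price"]
            and flight["num_connections"] <= intent["max_connections"])


matching = [f["flight_number"] for f in kb if satisfies(f, intent)]
print(matching)  # [1027]
```

Under these assumptions the only surviving flight is 1027, which is the flight recorded in 'expected_action' of the sample dialogue.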

Data Splits

The data is split into training, development, and test sets in an 80%/10%/10% ratio.
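For readers who want to reproduce a split with the same proportions on their own data, the following is a minimal sketch. The card does not say how the published split was produced, so the shuffling, the seed and the function name split_indices are all assumptions; the published splits should be preferred when comparing against reported results.

```python
import random


def split_indices(n, seed=0):
    """Shuffle range(n) and cut it into 80% train, 10% dev, 10% test.

    Assumption: a seeded random shuffle; the dataset's actual procedure is not documented here.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = n * 8 // 10   # integer arithmetic avoids floating-point rounding
    n_dev = n // 10
    train = indices[:n_train]
    dev = indices[n_train:n_train + n_dev]
    test = indices[n_train + n_dev:]  # the remainder goes to the test set
    return train, dev, test


train, dev, test = split_indices(1000)
print(len(train), len(dev), len(test))  # 800 100 100
```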

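Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]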

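Annotations

Annotation process

To collect this dataset, we created a context generator that provides travel and flight restrictions. We then asked human annotators to play the role of a customer or an agent and to interact with the goal of successfully booking a trip given the restrictions. Key to our environment is the ease of evaluating the success of a dialogue, which is achieved by using ground-truth states (e.g., the flight being booked) generated from the restrictions. Any dialogue agent that does not generate the correct states is considered to have failed.

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

No personal or sensitive information is stored.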

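Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]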

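Additional Information

Dataset Curators

AirDialogue team

For issues regarding the HuggingFace Dataset Hub implementation, contact Aakash Gupta.

Licensing Information

cc-by-nc-4.0

Citation Information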

@inproceedings{wei-etal-2018-airdialogue,
    title = "{A}ir{D}ialogue: An Environment for Goal-Oriented Dialogue Research",
    author = "Wei, Wei and Le, Quoc and Dai, Andrew and Li, Jia",
    booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
    month = oct # "-" # nov,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D18-1419",
    doi = "10.18653/v1/D18-1419",
    pages = "3844--3854",
    abstract = "Recent progress in dialogue generation has inspired a number of studies on dialogue systems that are capable of accomplishing tasks through natural language interactions. A promising direction among these studies is the use of reinforcement learning techniques, such as self-play, for training dialogue agents. However, current datasets are limited in size, and the environment for training agents and evaluating progress is relatively unsophisticated. We present AirDialogue, a large dataset that contains 301,427 goal-oriented conversations. To collect this dataset, we create a context-generator which provides travel and flight restrictions. We then ask human annotators to play the role of a customer or an agent and interact with the goal of successfully booking a trip given the restrictions. Key to our environment is the ease of evaluating the success of the dialogue, which is achieved by using ground-truth states (e.g., the flight being booked) generated by the restrictions. Any dialogue agent that does not generate the correct states is considered to fail. Our experimental results indicate that state-of-the-art dialogue models can only achieve a score of 0.17 while humans can reach a score of 0.91, which suggests significant opportunities for future improvement."
}

Contributions

Thanks to @skyprince999 for adding this dataset.