Datasets:

conv_ai_3

Tasks:

Conversational

Text Classification

Sub-tasks:

text-scoring

Languages:

Multilinguality:

monolingual

Size Categories:

10K<n<100K

Language Creators:

crowdsourced

Annotations Creators:

crowdsourced

Source Datasets:

original

ArXiv:

arxiv:2009.11352

Other:

evaluating-dialogue-systems

License:

license:unknown

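Dataset Card for [More Information Needed]

Dataset Summary

The Conv AI 3 challenge was organized in 2020 as part of the Search-oriented Conversational AI (SCAI) workshop at EMNLP. The main goal of a conversational system is to return an appropriate answer to a user's request. However, some user requests are ambiguous. In information retrieval (IR) settings, this is handled mainly by diversifying the search results page. It is much more challenging in dialogue settings. We therefore aim to study the following situation in dialogue settings:

A user asks an ambiguous question (an ambiguous question is one that admits more than one possible answer).
The system must recognize that the question is ambiguous and, before attempting to answer it directly, ask a good clarifying question.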
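
The situation above can be sketched as a single decision step. This is a minimal illustration, not the challenge's reference system: `respond`, its three callables, and the `threshold` value are all hypothetical, and the 1 to 4 ambiguity score follows the `clarification_need` label described under Data Fields.

```python
from typing import Callable


def respond(
    request: str,
    predict_clarification_need: Callable[[str], int],  # hypothetical scorer, returns 1..4
    ask_clarifying_question: Callable[[str], str],     # hypothetical question selector
    answer_directly: Callable[[str], str],             # hypothetical answerer
    threshold: int = 2,                                # assumed cut-off, not from the source
) -> str:
    """Ask a clarifying question when the request looks ambiguous, else answer it."""
    need = predict_clarification_need(request)
    if need >= threshold:
        return ask_clarifying_question(request)
    return answer_directly(request)


# Toy usage with stub components standing in for real models.
reply = respond(
    "I want to know about appraisals.",
    predict_clarification_need=lambda r: 2,
    ask_clarifying_question=lambda r: "are you looking for a type of appraiser",
    answer_directly=lambda r: "(direct answer)",
)
```

Where to place the threshold is a design choice: a lower value asks more clarifying questions, a higher value answers more requests directly.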

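Supported Tasks and Leaderboards

[More Information Needed]

Languages

[More Information Needed]

Dataset Structure

Data Instances

Here are a few examples from the dataset: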

{'topic_id': 8,
'facet_id': 'F0968',
'initial_request': 'I want to know about appraisals.',
'topic_desc': 'Find information about the appraisals in nearby companies.',
'clarification_need': 2,
'question_id': 'F0001',
'question': 'are you looking for a type of appraiser',
'answer': 'im looking for nearby companies that do home appraisals',
'facet_desc': 'Get the TYPE of Appraisals',
'conversation_context': [],
'context_id': 968}

{'topic_id': 8,
'facet_id': 'F0969',
'initial_request': 'I want to know about appraisals.',
'topic_desc': 'Find information about the type of appraisals.',
'clarification_need': 2,
'question_id': 'F0005',
'question': 'are you looking for a type of appraiser',
'facet_desc': 'Get the TYPE of Appraisals',
'answer': 'yes jewelry',
'conversation_context': [],
'context_id': 969}

{'topic_id': 293,
'facet_id': 'F0729',
'initial_request': 'Tell me about the educational advantages of social networking sites.',
'topic_desc': 'Find information about the educational benefits of the social media sites',
'clarification_need': 2,
'question_id': 'F0009',
'question': 'which social networking sites would you like information on',
'answer': 'i don have a specific one in mind just overall educational benefits to social media sites',
'facet_desc': 'Detailed information about the Networking Sites.',
'conversation_context': [{'question': 'what level of schooling are you interested in gaining the advantages to social networking sites', 'answer': 'all levels'}, {'question': 'what type of educational advantages are you seeking from social networking', 'answer': 'i just want to know if there are any'}],
'context_id': 976573}
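
As a rough sketch of how such a record might be consumed, the snippet below flattens the third example into an ordered list of dialogue turns. It assumes, without confirmation from this card, that `conversation_context` holds earlier question/answer pairs in chronological order and that the row's own `question` and `answer` come last. The helper name `to_turns` and the speaker labels are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Third example from this card, trimmed to the fields used below.
record: Dict[str, Any] = {
    "initial_request": "Tell me about the educational advantages of social networking sites.",
    "question": "which social networking sites would you like information on",
    "answer": "i don have a specific one in mind just overall educational benefits to social media sites",
    "conversation_context": [
        {"question": "what level of schooling are you interested in gaining the advantages to social networking sites",
         "answer": "all levels"},
        {"question": "what type of educational advantages are you seeking from social networking",
         "answer": "i just want to know if there are any"},
    ],
}


def to_turns(rec: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (speaker, text) pairs: request, earlier Q/A pairs, then the current Q/A."""
    turns: List[Tuple[str, str]] = [("user", rec["initial_request"])]
    # Assumption: context pairs are stored oldest first.
    for pair in rec.get("conversation_context", []):
        turns.append(("system", pair["question"]))
        turns.append(("user", pair["answer"]))
    turns.append(("system", rec["question"]))
    turns.append(("user", rec["answer"]))
    return turns


turns = to_turns(record)
for speaker, text in turns:
    print(f"{speaker}: {text}")
```

For rows with an empty `conversation_context`, such as the first two examples, the same helper yields just the initial request followed by one question and one answer.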

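Data Fields

topic_id: the ID of the topic (initial_request).
initial_request: the query (text) that initiates the conversation.
topic_desc: a full description of the topic as it appears in the TREC Web Track data.
clarification_need: a label from 1 to 4 indicating how much the topic needs clarification. If an initial_request is self-contained and needs no clarification, the label is 1. If an initial_request is so ambiguous that a search engine cannot guess the user's intent before clarification, the label is 4.
facet_id: the ID of the facet.
facet_desc: a full description of the facet (information need) as it appears in the TREC Web Track data.
question_id: the ID of the question.
question: a clarifying question that the system can pose to the user for the current topic and facet.
answer: an answer to the clarifying question, assuming the user is in the context of the current row (that is, the user's initial query is initial_request, their information need is facet_desc, and question has been posed to the user).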
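
The field list above can be turned into a light sanity check on a row. The following is only a sketch: `REQUIRED_FIELDS` and `find_problems` are hypothetical names, and the check covers nothing beyond field presence and the documented 1 to 4 range of `clarification_need`.

```python
from typing import Any, Dict, List

# Field names as listed in this card's Data Fields section.
REQUIRED_FIELDS = (
    "topic_id", "initial_request", "topic_desc", "clarification_need",
    "facet_id", "facet_desc", "question_id", "question", "answer",
)


def find_problems(row: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a row (empty list if none)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in row]
    if "clarification_need" in row:
        need = row["clarification_need"]
        # The card documents labels from 1 (self-contained) to 4 (fully ambiguous).
        if not isinstance(need, int) or not 1 <= need <= 4:
            problems.append(f"clarification_need outside 1..4: {need!r}")
    return problems


# First example from this card, restricted to the documented fields.
example = {
    "topic_id": 8,
    "facet_id": "F0968",
    "initial_request": "I want to know about appraisals.",
    "topic_desc": "Find information about the appraisals in nearby companies.",
    "clarification_need": 2,
    "question_id": "F0001",
    "question": "are you looking for a type of appraiser",
    "answer": "im looking for nearby companies that do home appraisals",
    "facet_desc": "Get the TYPE of Appraisals",
}
problems = find_problems(example)
```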

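Data Splits

[More Information Needed]

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

[More Information Needed]

Citation Information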

@misc{aliannejadi2020convai3,
  title={ConvAI3: Generating Clarifying Questions for Open-Domain Dialogue Systems (ClariQ)},
  author={Mohammad Aliannejadi and Julia Kiseleva and Aleksandr Chuklin and Jeff Dalton and Mikhail Burtsev},
  year={2020},
  eprint={2009.11352},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

Contributions

Thanks to @rkc007 for adding this dataset.

Author:

Anonymous

Dataset size:

15.19 KB