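Dataset: coached_conv_pref
Languages: en
Multilinguality: monolingual
Size: n<1K
Language creators: found
Annotation creators: expert-generated
Source datasets: original
License: cc-by-sa-4.0

The dataset consists of 502 English dialogs with 12,000 annotated utterances between a user and an assistant discussing movie preferences in natural language. It was collected with a Wizard-of-Oz methodology in which two paid crowd workers play the roles of "assistant" and "user". The assistant elicits the user's movie preferences following the Coached Conversational Preference Elicitation (CCPE) method: it asks questions designed to minimize bias in the terminology the user employs to convey preferences, and to obtain those preferences in natural language. Each dialog is annotated with entity mentions, preferences expressed about entities, descriptions of entities provided, and other statements about entities.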
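The text in the dataset is in English. The associated BCP-47 code is en.
A typical data point consists of a series of utterances between the "assistant" and the "user". Each utterance is annotated with the categories described under the data fields.
An example from the Coached Conversational Preference Elicitation dataset looks as follows: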
{'conversationId': 'CCPE-6faee', 'utterances': {'index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'segments': [{'annotations': [{'annotationType': [], 'entityType': []}], 'endIndex': [0], 'startIndex': [0], 'text': ['']}, {'annotations': [{'annotationType': [0], 'entityType': [0]}, {'annotationType': [1], 'entityType': [0]}], 'endIndex': [20, 27], 'startIndex': [14, 0], 'text': ['comedy', 'I really like comedy movies']}, {'annotations': [{'annotationType': [0], 'entityType': [0]}], 'endIndex': [24], 'startIndex': [16], 'text': ['comedies']}, {'annotations': [{'annotationType': [1], 'entityType': [0]}], 'endIndex': [15], 'startIndex': [0], 'text': ['I love to laugh']}, {'annotations': [{'annotationType': [], 'entityType': []}], 'endIndex': [0], 'startIndex': [0], 'text': ['']}, {'annotations': [{'annotationType': [0], 'entityType': [1]}, {'annotationType': [1], 'entityType': [1]}], 'endIndex': [21, 21], 'startIndex': [8, 0], 'text': ['Step Brothers', 'I liked Step Brothers']}, {'annotations': [{'annotationType': [], 'entityType': []}], 'endIndex': [0], 'startIndex': [0], 'text': ['']}, {'annotations': [{'annotationType': [1], 'entityType': [1]}], 'endIndex': [32], 'startIndex': [0], 'text': ['Had some amazing one-liners that']}, {'annotations': [{'annotationType': [], 'entityType': []}], 'endIndex': [0], 'startIndex': [0], 'text': ['']}, {'annotations': [{'annotationType': [0], 'entityType': [1]}, {'annotationType': [1], 'entityType': [1]}], 'endIndex': [15, 15], 'startIndex': [13, 0], 'text': ['RV', "I don't like RV"]}, {'annotations': [{'annotationType': [], 'entityType': []}], 'endIndex': [0], 'startIndex': [0], 'text': ['']}, {'annotations': [{'annotationType': [1], 'entityType': [1]}, {'annotationType': [1], 'entityType': [1]}], 'endIndex': [48, 66], 'startIndex': [18, 50], 'text': ['It was just so slow and boring', "I didn't like it"]}, {'annotations': [{'annotationType': [0], 'entityType': [1]}], 'endIndex': [63], 'startIndex': [33], 'text': 
['Jurassic World: Fallen Kingdom']}, {'annotations': [{'annotationType': [0], 'entityType': [1]}, {'annotationType': [3], 'entityType': [1]}], 'endIndex': [52, 52], 'startIndex': [22, 0], 'text': ['Jurassic World: Fallen Kingdom', 'I have seen the movie Jurassic World: Fallen Kingdom']}, {'annotations': [{'annotationType': [], 'entityType': []}], 'endIndex': [0], 'startIndex': [0], 'text': ['']}, {'annotations': [{'annotationType': [1], 'entityType': [1]}, {'annotationType': [1], 'entityType': [1]}, {'annotationType': [1], 'entityType': [1]}], 'endIndex': [24, 125, 161], 'startIndex': [0, 95, 135], 'text': ['I really like the actors', 'I just really like the scenery', 'the dinosaurs were awesome']}], 'speaker': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0], 'text': ['What kinds of movies do you like?', 'I really like comedy movies.', 'Why do you like comedies?', "I love to laugh and comedy movies, that's their whole purpose. Make you laugh.", 'Alright, how about a movie you liked?', 'I liked Step Brothers.', 'Why did you like that movie?', 'Had some amazing one-liners that still get used today even though the movie was made awhile ago.', 'Well, is there a movie you did not like?', "I don't like RV.", 'Why not?', "And I just didn't It was just so slow and boring. I didn't like it.", 'Ok, then have you seen the movie Jurassic World: Fallen Kingdom', 'I have seen the movie Jurassic World: Fallen Kingdom.', 'What is it about these kinds of movies that you like or dislike?', 'I really like the actors. I feel like they were doing their best to make the movie better. And I just really like the scenery, and the the dinosaurs were awesome.']}}
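The example stores utterances column-wise: `index`, `speaker`, and `text` are parallel lists rather than a list of per-utterance objects. The following is a minimal sketch of walking those columns turn by turn, using a trimmed copy of the record above. The helper `iter_turns` and the `SPEAKER_NAMES` mapping are illustrative and not part of the dataset; the mapping is an assumption inferred from the example, where the opening question carries speaker code 1.

```python
# Trimmed excerpt of the example record above (first four utterances only,
# annotation segments omitted).
record = {
    "conversationId": "CCPE-6faee",
    "utterances": {
        "index": [0, 1, 2, 3],
        "speaker": [1, 0, 1, 0],
        "text": [
            "What kinds of movies do you like?",
            "I really like comedy movies.",
            "Why do you like comedies?",
            "I love to laugh and comedy movies, that's their whole purpose. Make you laugh.",
        ],
    },
}

# Assumption: code 1 is the assistant and code 0 is the user, inferred from
# the example (the assistant asks the opening question).
SPEAKER_NAMES = {0: "USER", 1: "ASSISTANT"}


def iter_turns(record):
    """Yield (index, speaker_name, text) for each utterance, in order."""
    utterances = record["utterances"]
    for idx, speaker, text in zip(
        utterances["index"], utterances["speaker"], utterances["text"]
    ):
        yield idx, SPEAKER_NAMES[speaker], text


turns = list(iter_turns(record))
for idx, speaker, text in turns:
    print(f"{idx:2d} {speaker:9s} {text}")
```

Zipping the parallel lists keeps each utterance's fields together without assuming anything beyond the layout visible in the example.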
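Each conversation has the following fields: conversationId and utterances.
Each utterance has the following fields: index, speaker, text, and segments.
Each semantic annotation segment has the following fields: startIndex, endIndex, text, and annotations.
Each annotation has two fields: annotationType and entityType.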
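In the example record, a segment's `startIndex` and `endIndex` appear to be character offsets into the utterance text, with `endIndex` exclusive. The sketch below checks that reading against one utterance copied from the example; `resolve_spans` is a hypothetical helper, and the offset convention is inferred from the example rather than stated by the dataset authors.

```python
# One utterance and its segments, copied from the example record.
utterance_text = "I really like comedy movies."
segments = {
    "startIndex": [14, 0],
    "endIndex": [20, 27],
    "text": ["comedy", "I really like comedy movies"],
}


def resolve_spans(utterance_text, segments):
    """Return (sliced_text, annotated_text) pairs, one per segment.

    Assumes startIndex is inclusive and endIndex is exclusive, which is
    how the offsets behave in the example record.
    """
    pairs = []
    for start, end, annotated in zip(
        segments["startIndex"], segments["endIndex"], segments["text"]
    ):
        pairs.append((utterance_text[start:end], annotated))
    return pairs


spans = resolve_spans(utterance_text, segments)
for sliced, annotated in spans:
    print(repr(sliced), "matches" if sliced == annotated else "differs from", repr(annotated))
```

Utterances without annotations show up in the example as a single placeholder segment with empty text and both offsets set to 0, so a consumer may want to skip segments whose text is empty.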
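Explanation of the ontology
In the corpus, preferences and the entities they refer to are annotated with an annotation type and an entity type.
Annotation types fall into four categories:
Each entity type is marked as belonging to one of four categories: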
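The example record stores both annotation types and entity types as integers from 0 to 3. The sketch below decodes them into readable names for one segment group copied from the example. The label names and their order are an assumption: they are not given in the text above and should be checked against the dataset's feature definitions before being relied on. `decode_segments` is a hypothetical helper.

```python
# Assumed label names and order; verify against the dataset's own
# feature definitions before relying on them.
ANNOTATION_TYPES = [
    "ENTITY_NAME",
    "ENTITY_PREFERENCE",
    "ENTITY_DESCRIPTION",
    "ENTITY_OTHER",
]
ENTITY_TYPES = [
    "MOVIE_GENRE_OR_CATEGORY",
    "MOVIE_OR_SERIES",
    "PERSON",
    "SOMETHING_ELSE",
]

# Segments of one utterance, copied from the example record.
segments = {
    "text": [
        "Jurassic World: Fallen Kingdom",
        "I have seen the movie Jurassic World: Fallen Kingdom",
    ],
    "annotations": [
        {"annotationType": [0], "entityType": [1]},
        {"annotationType": [3], "entityType": [1]},
    ],
}


def decode_segments(segments):
    """Return (text, annotation_name, entity_name) for every annotation."""
    rows = []
    for text, annotation in zip(segments["text"], segments["annotations"]):
        for a_code, e_code in zip(
            annotation["annotationType"], annotation["entityType"]
        ):
            rows.append((text, ANNOTATION_TYPES[a_code], ENTITY_TYPES[e_code]))
    return rows


decoded = decode_segments(segments)
for row in decoded:
    print(row)
```

Under the assumed names, the movie title decodes as an entity name of a movie or series, and the full sentence decodes as an "other" statement about that same entity.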
The dataset has a single split named "train", which contains the entire dataset.
| | Train |
|---|---|
| Input Conversations | 502 |
Who are the source language producers? [More Information Needed]
Who are the annotators? [More Information Needed]
Creative Commons Attribution 4.0 License
@inproceedings{radlinski-etal-2019-ccpe,
  title = {Coached Conversational Preference Elicitation: A Case Study in Understanding Movie Preferences},
  author = {Filip Radlinski and Krisztian Balog and Bill Byrne and Karthik Krishnamoorthi},
  booktitle = {Proceedings of the Annual Meeting of the Special Interest Group on Discourse and Dialogue ({SIGDIAL})},
  year = 2019
}
Thanks to @vineeths96 for adding this dataset.