Datasets:

diplomacy_detection

Languages:

en

Multilinguality:

monolingual

Size categories:

n<1K

Language creators:

found

Annotations creators:

found

Source datasets:

original

Dataset Card for diplomacy_detection

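Dataset Summary

This dataset contains pairwise conversations from the game Diplomacy, annotated by both the sender and the receiver for deception (and, conversely, truthfulness). The 17,289 messages were gathered from 12 games.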

Supported Tasks and Leaderboards

[More Information Needed]

Languages

English

Dataset Structure

Data Instances

{
"messages": 
["Greetings Sultan!\n\nAs your neighbor I would like to propose an alliance! What are your views on the board so far?", "I think an alliance would be great! Perhaps a dmz in the Black Sea would be a good idea to solidify this alliance?\n\nAs for my views on the board, my first moves will be Western into the Balkans and Mediterranean Sea.", "Sounds good lets call a dmz in the black sea", "What's our move this year?", "I've been away from the game for a while", "Not sure yet, what are your thoughts?", "Well I'm pretty worried about Germany attacking me (and Austria to a lesser extent) so im headed west. It looks like Italy's landing a army in Syr this fall unless you can stop it", "That sounds good to me. I'll move to defend against Italy while you move west. If it's not too much too ask, I'd like to request that you withdraw your fleet from bla.", "Oh sorry missed the msg to move out of bl sea ill do that this turn. I did bring my army down into Armenia, To help you expel the Italian. It looks like Austria and Italy are working together. If we have a chance in the region you should probably use smy to protect con. We can't afford to lose con.", "I'll defend con from both ank and smy.", "Hey sorry for stabbing you earlier, it was an especially hard choice since Turkey is usually my country of choice. It's cool we got to do this study huh?"], 
"sender_labels": [false, true, false, true, true, true, true, true, true, true, true], 
"receiver_labels": [true, true, true, true, true, true, true, true, true, true, "NOANNOTATION"], 
"speakers": ["russia", "turkey", "russia", "russia", "russia", "turkey", "russia", "turkey", "russia", "turkey", "russia"], 
"receivers": ["turkey", "russia", "turkey", "turkey", "turkey", "russia", "turkey", "russia", "turkey", "russia", "turkey"], 
"absolute_message_index": [78, 107, 145, 370, 371, 374, 415, 420, 495, 497, 717], 
"relative_message_index": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 
"seasons": ["Spring", "Spring", "Spring", "Spring", "Spring", "Spring", "Fall", "Fall", "Spring", "Spring", "Fall"], 
"years": ["1901", "1901", "1901", "1902", "1902", "1902", "1902", "1902", "1903", "1903", "1905"], 
"game_score": ["4", "3", "4", "5", "5", "4", "5", "4", "5", "3", "7"],
"game_score_delta": ["1", "-1", "1", "1", "1", "-1", "1", "-1", "2", "-2", "7"], 
"players": ["russia", "turkey"], 
"game_id": 10
}
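Each per-message field in an instance is a list with one entry per message, so the lists line up by position. The following is a minimal sketch of flattening one conversation-level record into per-message rows. The helper name `iter_messages` and the abbreviated `record` are illustrative assumptions, not part of the dataset or of any library.

```python
from typing import Any, Dict, Iterator

# Fields that hold one entry per message (names taken from the instance above).
PER_MESSAGE_FIELDS = [
    "messages", "sender_labels", "receiver_labels", "speakers", "receivers",
    "absolute_message_index", "relative_message_index", "seasons", "years",
    "game_score", "game_score_delta",
]


def iter_messages(record: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat dict per message from a conversation-level record."""
    present = [f for f in PER_MESSAGE_FIELDS if f in record]
    lengths = {len(record[f]) for f in present}
    if len(lengths) > 1:
        raise ValueError(f"per-message fields differ in length: {sorted(lengths)}")
    n = lengths.pop() if lengths else 0
    for i in range(n):
        row = {f: record[f][i] for f in present}
        # game_id is a single value per conversation, so copy it onto each row.
        row["game_id"] = record.get("game_id")
        yield row


# Abbreviated, hypothetical record with the same shape as the instance above.
record = {
    "messages": ["Greetings Sultan!", "I think an alliance would be great!"],
    "sender_labels": [False, True],
    "receiver_labels": [True, True],
    "speakers": ["russia", "turkey"],
    "receivers": ["turkey", "russia"],
    "game_id": 10,
}
rows = list(iter_messages(record))
```

Checking the list lengths first makes a misaligned record fail loudly instead of silently pairing a message with the wrong label.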

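Data Fields

  • speakers: the sender of the message (string; one of seven values: russia, turkey, england, austria, germany, france, italy)
  • receivers: the receiver of the message (string; one of seven values: russia, turkey, england, austria, germany, france, italy)
  • messages: the raw message string (string; ranges in length from a single word to a paragraph)
  • sender_labels: whether the sender marked the message as truthful (true) or deceptive (false). Used for the ACTUAL_LIE calculation (true/false, stored as either a boolean or a string)
  • receiver_labels: whether the receiver perceived the message as truthful (true) or deceptive (false). In fewer than 10% of cases no annotation was received. Used for the SUSPECTED_LIE calculation (string; true/false/"NOANNOTATION")
  • game_score: the sender's current game score, i.e. the number of supply centers held (string, ranging from 0 to 18)
  • game_score_delta: the sender's current game score minus the receiver's game score (string, ranging from -18 to 18)
  • absolute_message_index: the index of the message within the entire game, across all conversations (integer)
  • relative_message_index: the index of the message within the current conversation (integer)
  • seasons: the season in Diplomacy, associated with the year (string; Spring, Fall, Winter)
  • years: the year in Diplomacy, associated with the season (string; 1901 through 1918)
  • game_id: which of the 12 games the conversation comes from (integer, ranging from 1 to 12)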
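Because labels may arrive as booleans or strings, and receiver labels may be "NOANNOTATION", it can help to normalise them before computing anything. The sketch below assumes that a lie corresponds to a label of false, which is how the field descriptions above read. The helper names `normalize_label` and `is_lie` and the sample lists are hypothetical.

```python
from typing import List, Optional, Union


def normalize_label(label: Union[bool, str]) -> Optional[bool]:
    """Map a raw label to True (truthful), False (deceptive), or None (no annotation)."""
    if isinstance(label, bool):
        return label
    text = str(label).strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    return None  # covers "NOANNOTATION" and any unrecognised value


def is_lie(label: Union[bool, str]) -> Optional[bool]:
    """Return True for a deceptive label, False for a truthful one, None if unannotated."""
    truthful = normalize_label(label)
    return None if truthful is None else not truthful


# Hypothetical values in the formats described by the field list above.
sender_labels: List[Union[bool, str]] = [False, True, "true"]
receiver_labels: List[Union[bool, str]] = [True, "false", "NOANNOTATION"]
game_score = ["4", "3", "7"]

actual_lie = [is_lie(x) for x in sender_labels]       # sender-side view
suspected_lie = [is_lie(x) for x in receiver_labels]  # receiver-side view
scores = [int(s) for s in game_score]                 # scores are stored as strings
```

Keeping "no annotation" as `None` rather than coercing it to false avoids counting unannotated messages as suspected lies.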

Data Splits

Train, test, and validation splits

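Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

Unknown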

Citation Information

@inproceedings{Peskov:Cheng:Elgohary:Barrow:Danescu-Niculescu-Mizil:Boyd-Graber-2020,
  Title = {It Takes Two to Lie: One to Lie and One to Listen},
  Author = {Denis Peskov and Benny Cheng and Ahmed Elgohary and Joe Barrow and Cristian Danescu-Niculescu-Mizil and Jordan Boyd-Graber},
  Booktitle = {Association for Computational Linguistics},
  Year = {2020},
  Location = {Seattle},
}

Contributions

Thanks to @MisbahKhan789 for adding this dataset.