Dataset:

compguesswhat

Tasks:

Visual Question Answering

Sub-tasks:

visual-question-answering

Languages:

Multilinguality:

monolingual

Size:

100K<n<1M

Language creators:

found

Annotation creators:

machine-generated

Source datasets:

extended|other-guesswhat

License:

license:unknown

Dataset Card for "compguesswhat"

Dataset Summary

    CompGuessWhat?! is an instance of a multi-task framework for evaluating the quality of learned neural representations,
    in particular concerning attribute grounding. Use this dataset if you want to use the set of games whose reference
    scene is an image in VisualGenome. Visit the website for more details: https://compguesswhat.github.io

Supported Tasks and Leaderboards

More Information Needed

Languages

More Information Needed

Dataset Structure

Data Instances

compguesswhat-original

Size of downloaded dataset files: 107.21 MB
Size of the generated dataset: 174.37 MB
Total amount of disk used: 281.57 MB

An example of 'validation' looks as follows.

This example was too long and was cropped:

{
    "id": 2424,
    "image": "{\"coco_url\": \"http://mscoco.org/images/270512\", \"file_name\": \"COCO_train2014_000000270512.jpg\", \"flickr_url\": \"http://farm6.stat...",
    "objects": "{\"area\": [1723.5133056640625, 4838.5361328125, 287.44476318359375, 44918.7109375, 3688.09375, 522.1935424804688], \"bbox\": [[5.61...",
    "qas": {
        "answer": ["Yes", "No", "No", "Yes"],
        "id": [4983, 4996, 5006, 5017],
        "question": ["Is it in the foreground?", "Does it have wings?", "Is it a person?", "Is it a vehicle?"]
    },
    "status": "success",
    "target_id": 1197044,
    "timestamp": "2016-07-08 15:07:38"
}
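
The `qas` field in the example above stores the dialogue as parallel lists, with one entry per turn in `question`, `answer`, and `id`. The following is a minimal sketch of turning those lists into per-turn records. It assumes the three lists are always aligned by position, which the card does not state explicitly, and the helper name `qas_to_turns` is hypothetical.

```python
from typing import Dict, List


def qas_to_turns(qas: Dict[str, list]) -> List[dict]:
    """Convert the columnar `qas` dict into a list of per-turn dicts."""
    ids, questions, answers = qas["id"], qas["question"], qas["answer"]
    # Assumption: the three lists are aligned by position.
    if not (len(ids) == len(questions) == len(answers)):
        raise ValueError("qas lists have different lengths")
    return [
        {"id": i, "question": q, "answer": a}
        for i, q, a in zip(ids, questions, answers)
    ]


# Values copied from the 'validation' example above.
example_qas = {
    "answer": ["Yes", "No", "No", "Yes"],
    "id": [4983, 4996, 5006, 5017],
    "question": [
        "Is it in the foreground?",
        "Does it have wings?",
        "Is it a person?",
        "Is it a vehicle?",
    ],
}

turns = qas_to_turns(example_qas)
# Keep only the questions that were answered "Yes".
yes_questions = [t["question"] for t in turns if t["answer"] == "Yes"]
```

Working with one record per turn makes it easier to filter or replay a game than indexing three lists in parallel.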

compguesswhat-zero_shot

Size of downloaded dataset files: 4.84 MB
Size of the generated dataset: 96.74 MB
Total amount of disk used: 101.59 MB

An example of 'nd_valid' looks as follows.

This example was too long and was cropped:

{
    "id": 0,
    "image": {
        "coco_url": "https://s3.amazonaws.com/nocaps/val/004e21eb2e686f40.jpg",
        "date_captured": "2018-11-06 11:04:33",
        "file_name": "004e21eb2e686f40.jpg",
        "height": 1024,
        "id": 6,
        "license": 0,
        "open_images_id": "004e21eb2e686f40",
        "width": 768
    },
    "objects": "{\"IsOccluded\": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], \"IsTruncated\": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], \"area\": [3...",
    "status": "incomplete",
    "target_id": "004e21eb2e686f40_30"
}

Data Fields

The data fields are the same among all splits.

compguesswhat-original

id : an int32 feature.
target_id : an int32 feature.
timestamp : a string feature.
status : a string feature.
id : an int32 feature.
file_name : a string feature.
flickr_url : a string feature.
coco_url : a string feature.
height : an int32 feature.
width : an int32 feature.
width : an int32 feature.
height : an int32 feature.
url : a string feature.
coco_id : an int32 feature.
flickr_id : a string feature.
image_id : a string feature.
qas : a dictionary feature containing:
- question : a string feature.
- answer : a string feature.
- id : an int32 feature.
objects : a dictionary feature containing:
- id : an int32 feature.
- bbox : a list of float32 features.
- category : a string feature.
- area : a float32 feature.
- category_id : an int32 feature.
- segment : a dictionary feature containing:
  - feature : a float32 feature.

compguesswhat-zero_shot

id : an int32 feature.
target_id : a string feature.
status : a string feature.
id : an int32 feature.
file_name : a string feature.
coco_url : a string feature.
height : an int32 feature.
width : an int32 feature.
license : an int32 feature.
open_images_id : a string feature.
date_captured : a string feature.
objects : a dictionary feature containing:
- id : a string feature.
- bbox : a list of float32 features.
- category : a string feature.
- area : a float32 feature.
- category_id : an int32 feature.
- IsOccluded : an int32 feature.
- IsTruncated : an int32 feature.
- segment : a dictionary feature containing:
  - MaskPath : a string feature.
  - LabelName : a string feature.
  - BoxID : a string feature.
  - BoxXMin : a string feature.
  - BoxXMax : a string feature.
  - BoxYMin : a string feature.
  - BoxYMax : a string feature.
  - PredictedIoU : a string feature.
  - Clicks : a string feature.
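
Both configurations list `objects` as a dictionary of per-field lists. The sketch below regroups that columnar layout into one record per object and looks up the target. It assumes that each list holds one entry per object and that `target_id` matches one of the values in `objects["id"]`; neither is stated in the field list. `objects_to_rows` and `find_target` are hypothetical helpers, and the toy values are made up for illustration rather than taken from the dataset.

```python
from typing import Any, Dict, List, Optional


def objects_to_rows(objects: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn a dict of equally long lists into a list of per-object dicts."""
    keys = list(objects)
    lengths = {len(objects[k]) for k in keys}
    if len(lengths) > 1:
        raise ValueError("object fields have different lengths")
    n = lengths.pop() if lengths else 0
    return [{k: objects[k][i] for k in keys} for i in range(n)]


def find_target(objects: Dict[str, List[Any]], target_id: Any) -> Optional[Dict[str, Any]]:
    """Return the object whose `id` equals `target_id`, or None if absent."""
    for row in objects_to_rows(objects):
        if row["id"] == target_id:
            return row
    return None


# Toy data (not from the dataset), using field names from the list above.
toy_objects = {
    "id": [11, 12],
    "bbox": [[0.0, 0.0, 10.0, 20.0], [5.0, 5.0, 8.0, 8.0]],
    "category": ["person", "car"],
    "area": [200.0, 64.0],
    "category_id": [1, 3],
}

target = find_target(toy_objects, 12)
```

The same helpers apply to the zero_shot configuration, where `id` and `target_id` are strings instead of integers.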

Data Splits

compguesswhat-original

name	train	validation	test
compguesswhat-original	46341	9738	9621

compguesswhat-zero_shot

name	nd_valid	od_valid	nd_test	od_test
compguesswhat-zero_shot	5343	5372	13836	13300
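
The split counts above can be kept as a small lookup table, which is handy for sanity checks and for validating split names before loading. This is a sketch under stated assumptions: `total_examples` and `load_split` are hypothetical helpers, the Hub identifier `"compguesswhat"` is taken from the card title, and whether a given version of the third-party `datasets` library can still load it has not been verified here.

```python
from typing import Dict

# Split sizes copied from the tables above.
SPLIT_SIZES: Dict[str, Dict[str, int]] = {
    "compguesswhat-original": {"train": 46341, "validation": 9738, "test": 9621},
    "compguesswhat-zero_shot": {
        "nd_valid": 5343,
        "od_valid": 5372,
        "nd_test": 13836,
        "od_test": 13300,
    },
}


def total_examples(config: str) -> int:
    """Sum the examples over all splits of one configuration."""
    return sum(SPLIT_SIZES[config].values())


def load_split(config: str, split: str):
    """Validate the names against the table, then load the split.

    Assumption: the dataset is reachable as "compguesswhat" through
    `datasets.load_dataset`; this call needs network access.
    """
    if config not in SPLIT_SIZES:
        raise ValueError(f"unknown configuration: {config!r}")
    if split not in SPLIT_SIZES[config]:
        raise ValueError(f"{config!r} has no split named {split!r}")
    from datasets import load_dataset  # third-party, imported lazily

    return load_dataset("compguesswhat", config, split=split)
```

Note that the zero_shot configuration has only validation and test splits, so `load_split("compguesswhat-zero_shot", "train")` raises `ValueError` before any download is attempted.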

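Dataset Creation

Curation Rationale

More Information Needed

Source Data

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information

More Information Needed

Considerations for Using the Data

Additional Information

Dataset Curators

More Information Needed

Licensing Information

More Information Needed

Citation Information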

        @inproceedings{suglia2020compguesswhat,
          title={CompGuessWhat?!: a Multi-task Evaluation Framework for Grounded Language Learning},
          author={Suglia, Alessandro and Konstas, Ioannis and Vanzo, Andrea and Bastianelli, Emanuele and Elliott, Desmond and Frank, Stella and Lemon, Oliver},
          booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
          year={2020}
        }

Contributions

Thanks to @thomwolf, @aleSuglia, @lhoestq for adding this dataset.

Author:

Anonymous

Dataset size:

36.23 KB
