Dataset:

squad_v1_pt

Tasks:

question-answering

Sub-tasks:

extractive-qa open-domain-qa

Languages:

Multilinguality:

monolingual

Size:

10K<n<100K

Language creators:

crowdsourced

Annotation creators:

crowdsourced

Source datasets:

original

Preprint:

arxiv:1606.05250

License:

mit

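Dataset Card for "squad_v1_pt"

Dataset Summary

Portuguese translation of the SQuAD dataset. The translation was performed automatically using the Google Cloud API.

Supported Tasks and Leaderboards

More Information Needed

Languages

More Information Needed

Dataset Structure

Data Instances

default

Size of downloaded dataset files: 39.53 MB
Size of the generated dataset: 96.72 MB
Total amount of disk used: 136.25 MB

An example of 'train' looks as follows.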

This example was too long and was cropped:

{
    "answers": {
        "answer_start": [0],
        "text": ["Saint Bernadette Soubirous"]
    },
    "context": "\"Arquitetonicamente, a escola tem um caráter católico. No topo da cúpula de ouro do edifício principal é uma estátua de ouro da ...",
    "id": "5733be284776f41900661182",
    "question": "A quem a Virgem Maria supostamente apareceu em 1858 em Lourdes, na França?",
    "title": "University_of_Notre_Dame"
}
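
The `answer_start` values are character offsets in the SQuAD format. Because the contexts here were machine-translated, an offset may no longer point at the answer text: in the cropped example above, the context begins with `"Arquitetonicamente` while the answer listed at offset 0 is `Saint Bernadette Soubirous`. The following is a minimal sketch of a consistency check for this. The helper name `misaligned_answers` and the second, aligned record are hypothetical and not part of the dataset.

```python
from typing import Any, Dict, List


def misaligned_answers(record: Dict[str, Any]) -> List[int]:
    """Return the indices of answers whose text does not occur at answer_start."""
    context = record["context"]
    answers = record["answers"]
    bad = []
    for i, (text, start) in enumerate(zip(answers["text"], answers["answer_start"])):
        # Compare the slice of the context at the stated offset with the answer text.
        if context[start:start + len(text)] != text:
            bad.append(i)
    return bad


# Cropped 'train' example as shown in this card (the context is truncated).
card_example = {
    "answers": {"answer_start": [0], "text": ["Saint Bernadette Soubirous"]},
    "context": "\"Arquitetonicamente, a escola tem um caráter católico. No topo da cúpula de ouro do edifício principal é uma estátua de ouro da ...",
    "id": "5733be284776f41900661182",
    "question": "A quem a Virgem Maria supostamente apareceu em 1858 em Lourdes, na França?",
    "title": "University_of_Notre_Dame",
}

# Hypothetical record (not from the dataset) whose offset does match.
aligned_example = {
    "answers": {"answer_start": [0], "text": ["Lisboa"]},
    "context": "Lisboa é a capital de Portugal.",
    "id": "hypothetical-0",
    "question": "Qual é a capital de Portugal?",
    "title": "Hypothetical",
}

print(misaligned_answers(card_example))     # [0]
print(misaligned_answers(aligned_example))  # []
```

A check like this only flags exact-offset mismatches; whether the answer text appears elsewhere in the translated context is a separate question that this sketch does not address.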

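Data Fields

The data fields are the same among all splits.

default

id: a string feature.
title: a string feature.
context: a string feature.
question: a string feature.
answers: a dictionary feature containing:
- text: a string feature.
- answer_start: an int32 feature.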
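
As a rough illustration of the schema above, the sketch below checks that a record carries the listed fields with the listed types. It assumes, based on the example instance, that `text` and `answer_start` are parallel lists. The name `validate_record` and the sample record are hypothetical and not part of the dataset or of any library.

```python
from typing import Any, Dict, List

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
STRING_FIELDS = ("id", "title", "context", "question")


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of schema problems; an empty list means the record conforms."""
    problems = []
    for name in STRING_FIELDS:
        if not isinstance(record.get(name), str):
            problems.append(f"{name}: expected a string")

    answers = record.get("answers")
    if not isinstance(answers, dict):
        problems.append("answers: expected a dictionary")
        return problems

    texts = answers.get("text")
    starts = answers.get("answer_start")
    if not (isinstance(texts, list) and all(isinstance(t, str) for t in texts)):
        problems.append("answers.text: expected a list of strings")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if not (isinstance(starts, list) and all(
        isinstance(s, int) and not isinstance(s, bool) and INT32_MIN <= s <= INT32_MAX
        for s in starts
    )):
        problems.append("answers.answer_start: expected a list of int32 values")
    if isinstance(texts, list) and isinstance(starts, list) and len(texts) != len(starts):
        problems.append("answers: text and answer_start differ in length")
    return problems


# Hypothetical record used only to exercise the check.
sample = {
    "id": "hypothetical-0",
    "title": "Hypothetical",
    "context": "Lisboa é a capital de Portugal.",
    "question": "Qual é a capital de Portugal?",
    "answers": {"text": ["Lisboa"], "answer_start": [0]},
}
broken = dict(sample, answers={"text": ["Lisboa"], "answer_start": [2**31]})

print(validate_record(sample))  # []
print(validate_record(broken))  # ['answers.answer_start: expected a list of int32 values']
```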

Data Splits

name	train	validation
default	87599	10570
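
For a quick sense of proportions, the short sketch below derives the total number of examples and the share of each split from the counts in the table above, and checks the total against the 10K<n<100K size category stated in the metadata. The variable names are illustrative only.

```python
# Split counts copied from the table above.
SPLIT_SIZES = {"train": 87599, "validation": 10570}

total = sum(SPLIT_SIZES.values())
fractions = {name: count / total for name, count in SPLIT_SIZES.items()}

print(f"total examples: {total}")  # total examples: 98169
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")  # train: 89.2%, validation: 10.8%

# The total falls inside the 10K<n<100K size category.
print(10_000 < total < 100_000)  # True
```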

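Dataset Creation

Curation Rationale

More Information Needed

Source Data

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information

More Information Needed

Considerations for Using the Data

Additional Information

Dataset Curators

More Information Needed

Licensing Information

More Information Needed

Citation Information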

@article{2016arXiv160605250R,
       author = {{Rajpurkar}, Pranav and {Zhang}, Jian and {Lopyrev},
                 Konstantin and {Liang}, Percy},
        title = "{SQuAD: 100,000+ Questions for Machine Comprehension of Text}",
      journal = {arXiv e-prints},
         year = 2016,
          eid = {arXiv:1606.05250},
        pages = {arXiv:1606.05250},
archivePrefix = {arXiv},
       eprint = {1606.05250},
}

Contributions

Thanks to @thomwolf, @albertvillanova, @lewtun, @patrickvonplaten for adding this dataset.

Author:

Anonymous

Dataset size:

14.41 KB
