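Datasets:

wikisql

Tasks:

Text2Text Generation

Languages:

Multilinguality:

monolingual

Size Categories:

10K<n<100K

Language Creators:

found, machine-generated

Annotations Creators:

crowdsourced

Source Datasets:

original

ArXiv:

arxiv:1709.00103

Tags:

text-to-sql

License:

license:unknown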

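Dataset Card for "wikisql"

Dataset Summary

A large crowd-sourced dataset for developing natural language interfaces for relational databases.

WikiSQL is a dataset of 80654 hand-annotated examples of questions and SQL queries distributed across 24241 tables from Wikipedia.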

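Supported Tasks and Leaderboards

More Information Needed

Languages

More Information Needed

Dataset Structure

Data Instances

default

Size of downloaded dataset files: 26.16 MB
Size of the generated dataset: 154.74 MB
Total amount of disk used: 180.90 MB

An example of 'validation' looks as follows.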

This example was too long and was cropped:

{
    "phase": 1,
    "question": "How would you answer a second test question?",
    "sql": {
        "agg": 0,
        "conds": {
            "column_index": [2],
            "condition": ["Some Entity"],
            "operator_index": [0]
        },
        "human_readable": "SELECT Header1 FROM table WHERE Another Header = Some Entity",
        "sel": 0
    },
    "table": "{\"caption\": \"L\", \"header\": [\"Header1\", \"Header 2\", \"Another Header\"], \"id\": \"1-10015132-9\", \"name\": \"table_10015132_11\", \"page_i..."
}
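The index-based fields in the example can be turned back into the "human_readable" string. The following is a minimal sketch, not the dataset's own tooling: the function name render_query is hypothetical, and the AGG_OPS and COND_OPS lookup tables are an assumption based on the WikiSQL reference code rather than anything stated in this card.

```python
# Lookup tables assumed from the WikiSQL reference code (not stated in this card).
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<", "OP"]


def render_query(sql, header):
    """Rebuild a readable query string from the index-based `sql` dict."""
    select = header[sql["sel"]]
    agg = AGG_OPS[sql["agg"]]
    if agg:
        # Index 0 means "no aggregation"; other indices prefix the column.
        select = f"{agg} {select}"
    query = f"SELECT {select} FROM table"

    conds = sql["conds"]
    clauses = [
        f"{header[col]} {COND_OPS[op]} {value}"
        for col, op, value in zip(
            conds["column_index"], conds["operator_index"], conds["condition"]
        )
    ]
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    return query


# Values copied from the cropped 'validation' example above.
example_sql = {
    "agg": 0,
    "conds": {"column_index": [2], "condition": ["Some Entity"], "operator_index": [0]},
    "sel": 0,
}
example_header = ["Header1", "Header 2", "Another Header"]

rendered = render_query(example_sql, example_header)
print(rendered)
# SELECT Header1 FROM table WHERE Another Header = Some Entity
```

Here "sel" and "column_index" index into the table header, while "agg" and "operator_index" index into the two lookup tables.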

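Data Fields

The data fields are the same among all splits.

default

phase: an int32 feature.
question: a string feature.
header: a list of string features.
page_title: a string feature.
page_id: a string feature.
types: a list of string features.
id: a string feature.
section_title: a string feature.
caption: a string feature.
rows: a dictionary feature containing:
- feature: a string feature.
name: a string feature.
human_readable: a string feature.
sel: an int32 feature.
agg: an int32 feature.
conds: a dictionary feature containing:
- column_index: an int32 feature.
- operator_index: an int32 feature.
- condition: a string feature.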
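The field list above is flat, while the example record nests most of these fields under "table" and "sql". A rough sketch of that nesting as Python TypedDict classes may help when writing code against the records. The grouping is an assumption inferred from the cropped example, the type of rows is a guess, and the class names and the missing_keys helper are hypothetical.

```python
from typing import List, TypedDict


class Conds(TypedDict):
    column_index: List[int]
    operator_index: List[int]
    condition: List[str]


class Sql(TypedDict):
    human_readable: str
    sel: int
    agg: int
    conds: Conds


class Table(TypedDict):
    header: List[str]
    page_title: str
    page_id: str
    types: List[str]
    id: str
    section_title: str
    caption: str
    rows: List[List[str]]  # assumption: one list of cell strings per row
    name: str


class Record(TypedDict):
    phase: int
    question: str
    table: Table
    sql: Sql


def missing_keys(record: dict) -> List[str]:
    """Return dotted names of expected keys that are absent from `record`."""
    missing = [key for key in Record.__annotations__ if key not in record]
    for group, schema in (("table", Table), ("sql", Sql)):
        nested = record.get(group)
        if isinstance(nested, dict):
            # Only inspect nested keys when the group itself is present.
            missing += [
                f"{group}.{key}" for key in schema.__annotations__ if key not in nested
            ]
    return missing


partial = {
    "phase": 1,
    "question": "How would you answer a second test question?",
    "sql": {
        "agg": 0,
        "sel": 0,
        "conds": {"column_index": [2], "operator_index": [0], "condition": ["Some Entity"]},
    },
}
print(missing_keys(partial))
# ['table', 'sql.human_readable']
```

A check like this is only a convenience for spotting incomplete records; it does not validate value types.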

Data Splits

name	train	validation	test
default	56355	8421	15878
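As a quick consistency check, the three split sizes add up to the 80654 examples quoted in the summary. The short sketch below recomputes the total and each split's share; the variable names are illustrative only.

```python
# Split sizes copied from the table above.
splits = {"train": 56355, "validation": 8421, "test": 15878}

total = sum(splits.values())
shares = {name: count / total for name, count in splits.items()}

for name, count in splits.items():
    print(f"{name}: {count} examples ({shares[name]:.1%} of the data)")
print("total:", total)
```

The total printed on the last line is 80654, matching the summary.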

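Dataset Creation

Curation Rationale

More Information Needed

Source Data

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information

More Information Needed

Considerations for Using the Data

Additional Information

Dataset Curators

More Information Needed

Licensing Information

More Information Needed

Citation Information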

@article{zhongSeq2SQL2017,
  author    = {Victor Zhong and
               Caiming Xiong and
               Richard Socher},
  title     = {Seq2SQL: Generating Structured Queries from Natural Language using
               Reinforcement Learning},
  journal   = {CoRR},
  volume    = {abs/1709.00103},
  year      = {2017}
}

Contributions

Thanks to @lewtun, @ghomasHudson, @thomwolf for adding this dataset.

Author:

Anonymous

Dataset size:

17.88 KB