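Dataset: wikisql
Tasks: text2text-generation
Languages: en
Multilinguality: monolingual
Size: 10K<n<100K
Annotations creators: crowdsourced
Source datasets: original
ArXiv: arxiv:1709.00103
Other: text-to-sql
License: unknown

A large crowd-sourced dataset for developing natural language interfaces for relational databases.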
WikiSQL is a dataset of 80654 hand-annotated examples of questions and SQL queries distributed across 24241 tables from Wikipedia.
An example of 'validation' looks as follows.
This example was too long and was cropped:
{
    "phase": 1,
    "question": "How would you answer a second test question?",
    "sql": {
        "agg": 0,
        "conds": {
            "column_index": [2],
            "condition": ["Some Entity"],
            "operator_index": [0]
        },
        "human_readable": "SELECT Header1 FROM table WHERE Another Header = Some Entity",
        "sel": 0
    },
    "table": "{\"caption\": \"L\", \"header\": [\"Header1\", \"Header 2\", \"Another Header\"], \"id\": \"1-10015132-9\", \"name\": \"table_10015132_11\", \"page_i..."
}
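To make the structured `sql` field easier to read, here is a minimal Python sketch that rebuilds the `human_readable` string from `sel`, `agg`, and `conds` for the example above. The `AGG_OPS` and `COND_OPS` lookup tables and the `render_sql` helper are assumptions for illustration: this card does not list the operator vocabularies, so the values below follow the convention commonly used with WikiSQL and should be checked against the dataset's loading script.

```python
# Assumed operator vocabularies (not stated in this card).
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<", "OP"]


def render_sql(sql, header):
    """Rebuild a readable query string from the structured `sql` field.

    `render_sql` is a hypothetical helper, not part of the dataset.
    """
    column = header[sql["sel"]]
    agg = AGG_OPS[sql["agg"]]
    select = f"{agg} {column}" if agg else column

    conds = sql["conds"]
    # The three lists in `conds` are parallel: one entry per condition.
    clauses = [
        f"{header[col]} {COND_OPS[op]} {value}"
        for col, op, value in zip(
            conds["column_index"], conds["operator_index"], conds["condition"]
        )
    ]

    query = f"SELECT {select} FROM table"
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    return query


# Values copied from the cropped 'validation' example.
example_sql = {
    "agg": 0,
    "conds": {
        "column_index": [2],
        "condition": ["Some Entity"],
        "operator_index": [0],
    },
    "sel": 0,
}
example_header = ["Header1", "Header 2", "Another Header"]

rendered = render_sql(example_sql, example_header)
print(rendered)
# SELECT Header1 FROM table WHERE Another Header = Some Entity
```

Under these assumptions, `sel` and each `column_index` entry index into the table's `header` list, while `agg` and `operator_index` index into the operator vocabularies.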
The data fields are the same among all splits.
name | train | validation | test |
---|---|---|---|
default | 56355 | 8421 | 15878 |
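As a quick consistency check, the split sizes in the table can be summed and compared with the 80654 examples quoted in the description. This is a small Python sketch; the `splits` dictionary simply restates the table, and the variable names are illustrative.

```python
# Split sizes copied from the table above.
splits = {"train": 56355, "validation": 8421, "test": 15878}

total = sum(splits.values())
# Percentage of the whole dataset held by each split, to one decimal place.
shares = {name: round(100 * size / total, 1) for name, size in splits.items()}

print(total)  # 80654
print(shares)
```

The total matches the example count given in the description, so the three splits together cover the whole dataset.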
@article{zhongSeq2SQL2017,
  author  = {Victor Zhong and Caiming Xiong and Richard Socher},
  title   = {Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning},
  journal = {CoRR},
  volume  = {abs/1709.00103},
  year    = {2017}
}
Thanks to @lewtun, @ghomasHudson, @thomwolf for adding this dataset.