数据集:
wiki_bio
任务:
表格到文本语言:
en计算机处理:
monolingual大小:
100K<n<1M语言创建人:
found批注创建人:
found源数据集:
original预印本库:
arxiv:1603.07771许可:
cc-by-sa-3.0此数据集包含从维基百科提取的728,321个传记,其中包含传记的第一段和表格信息框。
此数据集的主要目的是开发文本生成模型。
英文。
需要更多信息
单个样本的结构如下所示:
{ "input_text":{ "context":"pope michael iii of alexandria\n", "table":{ "column_header":[ "type", "ended", "death_date", "title", "enthroned", "name", "buried", "religion", "predecessor", "nationality", "article_title", "feast_day", "birth_place", "residence", "successor" ], "content":[ "pope", "16 march 907", "16 march 907", "56th of st. mark pope of alexandria & patriarch of the see", "25 april 880", "michael iii of alexandria", "monastery of saint macarius the great", "coptic orthodox christian", "shenouda i", "egyptian", "pope michael iii of alexandria\n", "16 -rrb- march -lrb- 20 baramhat in the coptic calendar", "egypt", "saint mark 's church", "gabriel i" ], "row_number":[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] } }, "target_text":"pope michael iii of alexandria -lrb- also known as khail iii -rrb- was the coptic pope of alexandria and patriarch of the see of st. mark -lrb- 880 -- 907 -rrb- .\nin 882 , the governor of egypt , ahmad ibn tulun , forced khail to pay heavy contributions , forcing him to sell a church and some attached properties to the local jewish community .\nthis building was at one time believed to have later become the site of the cairo geniza .\n" }
其中,"table"字段中存储了所有维基百科信息框的信息(信息框的标题存储在"column_header"中,内容存储在"content"字段中)。
[需要更多信息]
此数据集在论文《用于传记领域的结构化数据的神经文本生成》 (arxiv link) 中宣布,并存储在 this 存储库(由DavidGrangier拥有)中。
初始数据收集和标准化[需要更多信息]
谁是源语言生产者?[需要更多信息]
[需要更多信息]
谁是注释者?[需要更多信息]
[需要更多信息]
[需要更多信息]
[需要更多信息]
[需要更多信息]
[需要更多信息]
此数据集使用 Creative Commons CC BY-SA 3.0 许可证发布。
若要以BibTex格式引用原始论文,请参考:
@article{DBLP:journals/corr/LebretGA16, author = {R{\'{e}}mi Lebret and David Grangier and Michael Auli}, title = {Generating Text from Structured Data with Application to the Biography Domain}, journal = {CoRR}, volume = {abs/1603.07771}, year = {2016}, url = {http://arxiv.org/abs/1603.07771}, archivePrefix = {arXiv}, eprint = {1603.07771}, timestamp = {Mon, 13 Aug 2018 16:48:30 +0200}, biburl = {https://dblp.org/rec/journals/corr/LebretGA16.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
感谢 @alejandrocros 添加此数据集。