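Dataset: lc_quad
Tasks: question answering
Languages: en
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: found
Annotations creators: crowdsourced
Source datasets: original
License: cc-by-3.0

LC-QuAD 2.0 is a large question answering dataset containing 30,000 pairs of questions and their corresponding SPARQL queries. The target knowledge bases are Wikidata and DBpedia, specifically the 2018 versions. For details on the dataset creation process and framework, please see our paper.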
An example of 'train' looks as follows.
This example was too long and was cropped:

{
    "NNQT_question": "What is the {periodical literature} for {mouthpiece} of {Delta Air Lines}",
    "paraphrased_question": "What is Delta Air Line's periodical literature mouthpiece?",
    "question": "What periodical literature does Delta Air Lines use as a moutpiece?",
    "sparql_dbpedia18": "\"select distinct ?obj where { ?statement <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> <http://wikidata.dbpedia.org/resou...",
    "sparql_wikidata": " select distinct ?obj where { wd:Q188920 wdt:P2813 ?obj . ?obj wdt:P31 wd:Q1002697 } ",
    "subgraph": "simple question right",
    "template": " <S P ?O ; ?O instanceOf Type>",
    "template_index": 65,
    "uid": 19719
}
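To make the `sparql_wikidata` field of this example more concrete, here is a minimal sketch that pulls the Wikidata entity and property IDs out of the query string. The helper name `wikidata_ids` is hypothetical, and the regular expressions assume the `wd:` and `wdt:` prefixes seen in this one record; other queries in the dataset may use additional prefixes that this sketch does not cover.

```python
import re

# Wikidata query copied verbatim from the 'train' example.
sparql_wikidata = " select distinct ?obj where { wd:Q188920 wdt:P2813 ?obj . ?obj wdt:P31 wd:Q1002697 } "


def wikidata_ids(query: str) -> dict:
    """Return the entity (wd:Q...) and property (wdt:P...) IDs used in a query.

    Hypothetical helper for illustration; it only recognises the two
    prefixes that appear in the example record.
    """
    entities = re.findall(r"\bwd:(Q\d+)", query)
    properties = re.findall(r"\bwdt:(P\d+)", query)
    return {"entities": entities, "properties": properties}


ids = wikidata_ids(sparql_wikidata)
print(ids)
# {'entities': ['Q188920', 'Q1002697'], 'properties': ['P2813', 'P31']}
```

The IDs are returned in the order they occur in the query, so the first entity is the subject of the question and the last one is the type constraint from the `template` field.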
The data fields are the same among all splits.
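Because every split shares the same fields, a single check can be applied to any record. The sketch below is one way to do that in plain Python. The field names come from the 'train' example; the Python types are an assumption inferred from that single record, and `EXPECTED_FIELDS` and `field_problems` are hypothetical names introduced here for illustration.

```python
# Field names as they appear in the 'train' example. The types are an
# assumption inferred from that one record, not a published schema.
EXPECTED_FIELDS = {
    "NNQT_question": str,
    "paraphrased_question": str,
    "question": str,
    "sparql_dbpedia18": str,
    "sparql_wikidata": str,
    "subgraph": str,
    "template": str,
    "template_index": int,
    "uid": int,
}


def field_problems(record: dict) -> list:
    """List missing, mistyped, or unexpected fields in a record."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"wrong type: {name}")
    for name in record:
        if name not in EXPECTED_FIELDS:
            problems.append(f"unexpected: {name}")
    return problems


# Placeholder record with an empty value of the assumed type per field.
complete = {name: field_type() for name, field_type in EXPECTED_FIELDS.items()}

# Deliberately broken copy: 'uid' is a string and 'template' is absent.
broken = dict(complete, uid="19719")
del broken["template"]

print(field_problems(complete))
print(field_problems(broken))
```

Returning a list of problems rather than raising on the first one makes it easier to audit a whole split in one pass.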
name | train | test |
---|---|---|
default | 19293 | 4781 |
LC-QuAD 2.0 is licensed under the Creative Commons Attribution 3.0 Unported License.
@inproceedings{dubey2017lc2,
  title={LC-QuAD 2.0: A Large Dataset for Complex Question Answering over Wikidata and DBpedia},
  author={Dubey, Mohnish and Banerjee, Debayan and Abdelkawi, Abdelrahman and Lehmann, Jens},
  booktitle={Proceedings of the 18th International Semantic Web Conference (ISWC)},
  year={2019},
  organization={Springer}
}
Thanks to @lewtun, @thomwolf, and @patrickvonplaten for adding this dataset.