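Dataset: cfq
Language: en
Multilinguality: monolingual
Size: 100K<n<1M
Language creators: expert-generated
Annotation creators: no-annotation
Source datasets: original
Preprint: arxiv:1912.09713
Other: compositionality
License: cc-by-4.0

Compositional Freebase Questions (CFQ) is a dataset designed specifically to measure compositional generalization. CFQ is a simple yet realistic, large dataset of natural language questions and answers that also provides, for each question, a corresponding SPARQL query against the Freebase knowledge base. This means that CFQ can also be used for semantic parsing.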
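As a minimal sketch of how the dataset might be loaded, the snippet below wraps `load_dataset` from the Hugging Face `datasets` library. It assumes that the Hub identifier is `cfq`, as listed in the metadata, and that the library is installed; newer library versions may expect a different identifier. `CFQ_CONFIGS` and `load_cfq` are hypothetical helper names introduced here for illustration.

```python
# Configuration names as they appear in this card's split table.
CFQ_CONFIGS = [
    "mcd1",
    "mcd2",
    "mcd3",
    "query_complexity_split",
    "query_pattern_split",
    "question_complexity_split",
    "question_pattern_split",
    "random_split",
]


def load_cfq(config="mcd1"):
    """Load one CFQ configuration (hypothetical helper, not part of the card).

    Assumes the Hub identifier is "cfq" and that the `datasets` package
    is installed; the call downloads data, so it needs network access.
    """
    if config not in CFQ_CONFIGS:
        raise ValueError(f"unknown CFQ configuration: {config!r}")
    # Imported lazily so the module can be imported without `datasets`.
    from datasets import load_dataset

    return load_dataset("cfq", config)
```

Under those assumptions, `load_cfq("mcd1")["train"][0]` would be expected to return a record with a question and its SPARQL query.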
The text in the dataset is in English (en).
An example of 'train' looks as follows.
{ 'query': 'SELECT count(*) WHERE {\n?x0 a ns:people.person .\n?x0 ns:influence.influence_node.influenced M1 .\n?x0 ns:influence.influence_node.influenced M2 .\n?x0 ns:people.person.spouse_s/ns:people.marriage.spouse|ns:fictional_universe.fictional_character.married_to/ns:fictional_universe.marriage_of_fictional_characters.spouses ?x1 .\n?x1 a ns:film.cinematographer .\nFILTER ( ?x0 != ?x1 )\n}', 'question': 'Did a person marry a cinematographer , influence M1 , and influence M2' }
mcd2
An example of 'train' looks as follows.
{ 'query': 'SELECT count(*) WHERE {\n?x0 ns:people.person.parents|ns:fictional_universe.fictional_character.parents|ns:organization.organization.parent/ns:organization.organization_relationship.parent ?x1 .\n?x1 a ns:people.person .\nM1 ns:business.employer.employees/ns:business.employment_tenure.person ?x0 .\nM1 ns:business.employer.employees/ns:business.employment_tenure.person M2 .\nM1 ns:business.employer.employees/ns:business.employment_tenure.person M3 .\nM1 ns:business.employer.employees/ns:business.employment_tenure.person M4 .\nM5 ns:business.employer.employees/ns:business.employment_tenure.person ?x0 .\nM5 ns:business.employer.employees/ns:business.employment_tenure.person M2 .\nM5 ns:business.employer.employees/ns:business.employment_tenure.person M3 .\nM5 ns:business.employer.employees/ns:business.employment_tenure.person M4\n}', 'question': "Did M1 and M5 employ M2 , M3 , and M4 and employ a person 's child" }
mcd3
An example of 'train' looks as follows.
{ "query": "SELECT /producer M0 . /director M0 . ", "question": "Who produced and directed M0?" }
query_complexity_split
An example of 'train' looks as follows.
{ "query": "SELECT /producer M0 . /director M0 . ", "question": "Who produced and directed M0?" }
query_pattern_split
An example of 'train' looks as follows.
{ "query": "SELECT /producer M0 . /director M0 . ", "question": "Who produced and directed M0?" }
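To make the record structure concrete, here is a small sketch that splits the 'query' field of the first example above into its WHERE clauses and lists the entity placeholders (M1, M2, ...) shared by the question and the query. The helper names `where_clauses` and `placeholders` are hypothetical, and the parsing assumes the brace-delimited, newline-separated query form shown in the first two examples; it is not a general SPARQL parser.

```python
import re

# The first 'train' example shown above, copied verbatim.
example = {
    "query": (
        "SELECT count(*) WHERE {\n"
        "?x0 a ns:people.person .\n"
        "?x0 ns:influence.influence_node.influenced M1 .\n"
        "?x0 ns:influence.influence_node.influenced M2 .\n"
        "?x0 ns:people.person.spouse_s/ns:people.marriage.spouse"
        "|ns:fictional_universe.fictional_character.married_to"
        "/ns:fictional_universe.marriage_of_fictional_characters.spouses ?x1 .\n"
        "?x1 a ns:film.cinematographer .\n"
        "FILTER ( ?x0 != ?x1 )\n"
        "}"
    ),
    "question": "Did a person marry a cinematographer , influence M1 , and influence M2",
}


def where_clauses(query):
    """Return the clauses between the outer braces, one per line.

    Assumes one clause per line, optionally terminated by ' .'.
    """
    body = query[query.index("{") + 1 : query.rindex("}")]
    clauses = []
    for line in body.split("\n"):
        line = line.strip()
        if line.endswith(" ."):
            line = line[:-2].rstrip()
        if line:
            clauses.append(line)
    return clauses


def placeholders(text):
    """Return the sorted, de-duplicated entity placeholders such as M1."""
    return sorted(set(re.findall(r"\bM\d+\b", text)))


for clause in where_clauses(example["query"]):
    print(clause)
print(placeholders(example["question"]), placeholders(example["query"]))
```

Checking that both fields mention the same placeholders is a cheap sanity check when pairing questions with queries for semantic parsing.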
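The data fields are the same across all splits and configurations: each example has a 'question' string and a 'query' string, as in the examples above. The number of examples in each configuration and split is: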
name | train | test |
---|---|---|
mcd1 | 95743 | 11968 |
mcd2 | 95743 | 11968 |
mcd3 | 95743 | 11968 |
query_complexity_split | 100654 | 9512 |
query_pattern_split | 94600 | 12589 |
question_complexity_split | 98999 | 10340 |
question_pattern_split | 95654 | 11909 |
random_split | 95744 | 11967 |
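The relative size of the held-out test set can be read off the table with a little arithmetic. The sketch below copies the counts from the table and computes, for each configuration, the total number of examples and the fraction assigned to test; `SPLIT_SIZES` and `held_out_fraction` are hypothetical names used only for this illustration.

```python
# (train, test) sizes copied from the table above.
SPLIT_SIZES = {
    "mcd1": (95743, 11968),
    "mcd2": (95743, 11968),
    "mcd3": (95743, 11968),
    "query_complexity_split": (100654, 9512),
    "query_pattern_split": (94600, 12589),
    "question_complexity_split": (98999, 10340),
    "question_pattern_split": (95654, 11909),
    "random_split": (95744, 11967),
}


def held_out_fraction(config):
    """Fraction of a configuration's listed examples that fall in test."""
    train, test = SPLIT_SIZES[config]
    return test / (train + test)


for name, (train, test) in SPLIT_SIZES.items():
    print(f"{name}: {train + test} examples, {held_out_fraction(name):.1%} in test")
```

For the three MCD configurations and the random split, train and test sum to the same total of 107711 examples, with about one ninth of them in test.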
@inproceedings{Keysers2020,
  title={Measuring Compositional Generalization: A Comprehensive Method on Realistic Data},
  author={Daniel Keysers and Nathanael Sch\"{a}rli and Nathan Scales and Hylke Buisman and Daniel Furrer and Sergii Kashubin and Nikola Momchev and Danila Sinopalnikov and Lukasz Stafiniak and Tibor Tihon and Dmitry Tsarkov and Xiao Wang and Marc van Zee and Olivier Bousquet},
  booktitle={ICLR},
  year={2020},
  url={https://arxiv.org/abs/1912.09713.pdf},
}
Thanks to @thomwolf, @patrickvonplaten, @lewtun, @brainshawn for adding this dataset.