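Dataset: break_data
Task: text-to-text generation
Language: en
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: crowdsourced
Annotation creators: crowdsourced
Source datasets: original
License: unknown

Break is a human-annotated dataset of natural language questions and their Question Decomposition Meaning Representations (QDMRs). It contains 83,978 examples sampled from 10 question answering datasets over text, images, and databases. This repository contains the Break dataset along with information on the exact data format.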

QDMR
An example of 'validation' looks as follows.
{ "decomposition": "return flights ;return #1 from denver ;return #2 to philadelphia ;return #3 if available", "operators": "['select', 'filter', 'filter', 'filter']", "question_id": "ATIS_dev_0", "question_text": "what flights are available tomorrow from denver to philadelphia ", "split": "dev" }

QDMR-high-level
An example of 'train' looks as follows.
{ "decomposition": "return ground transportation ;return #1 which is available ;return #2 from the pittsburgh airport ;return #3 to downtown ;return the cost of #4", "operators": "['select', 'filter', 'filter', 'filter', 'project']", "question_id": "ATIS_dev_102", "question_text": "what ground transportation is available from the pittsburgh airport to downtown and how much does it cost ", "split": "dev" }

QDMR-high-level-lexicon
An example of 'train' looks as follows.
This example was too long and was cropped: { "allowed_tokens": "\"['higher than', 'same as', 'what ', 'and ', 'than ', 'at most', 'he', 'distinct', 'House', 'two', 'at least', 'or ', 'date', 'o...", "source": "What office, also held by a member of the Maine House of Representatives, did James K. Polk hold before he was president?" }

QDMR-lexicon
An example of 'validation' looks as follows.
This example was too long and was cropped: { "allowed_tokens": "\"['higher than', 'same as', 'what ', 'and ', 'than ', 'at most', 'distinct', 'two', 'at least', 'or ', 'date', 'on ', '@@14@@', ...", "source": "what flights are available tomorrow from denver to philadelphia " }

logical-forms
An example of 'train' looks as follows.
{ "decomposition": "return ground transportation ;return #1 which is available ;return #2 from the pittsburgh airport ;return #3 to downtown ;return the cost of #4", "operators": "['select', 'filter', 'filter', 'filter', 'project']", "program": "some program", "question_id": "ATIS_dev_102", "question_text": "what ground transportation is available from the pittsburgh airport to downtown and how much does it cost ", "split": "dev" }
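The `decomposition` field holds all QDMR steps in one string, and `operators` holds their labels as a string rather than a list. The sketch below turns such a record into a list of steps. It rests on assumptions inferred from the examples above rather than from a format specification: steps are separated by `;`, each step begins with `return `, earlier steps are referenced as `#k`, and `operators` is a Python list literal with one entry per step. `parse_record` is a hypothetical helper, not part of the dataset or any library.

```python
import ast
import re

# Matches references to earlier steps such as "#1" or "#12".
STEP_REF = re.compile(r"#(\d+)")


def parse_record(record: dict) -> list[dict]:
    """Split a QDMR decomposition into steps, attach each step's operator,
    and list the earlier steps it refers to.

    Assumptions (inferred from the examples, not from a spec): steps are
    separated by ";", each starts with "return ", and `operators` is a
    Python list literal with one entry per step.
    """
    # The operators are stored with single quotes, so json.loads would fail;
    # ast.literal_eval parses the literal without executing any code.
    operators = ast.literal_eval(record["operators"])
    raw_steps = [s.strip() for s in record["decomposition"].split(";")]
    if len(operators) != len(raw_steps):
        raise ValueError(f"{len(operators)} operators for {len(raw_steps)} steps")

    steps = []
    for index, (op, text) in enumerate(zip(operators, raw_steps), start=1):
        if text.startswith("return "):
            text = text[len("return "):]  # drop the shared "return " prefix
        refs = [int(n) for n in STEP_REF.findall(text)]
        steps.append({"index": index, "operator": op, "text": text, "refs": refs})
    return steps


record = {
    "decomposition": "return ground transportation ;return #1 which is available ;return #2 from the pittsburgh airport ;return #3 to downtown ;return the cost of #4",
    "operators": "['select', 'filter', 'filter', 'filter', 'project']",
}
for step in parse_record(record):
    print(step["index"], step["operator"], step["text"], step["refs"])
```

In this record every `#k` points to a strictly earlier step, so the steps form a chain that can be evaluated in order.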
The data fields are the same among all splits.
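In the lexicon configurations, `allowed_tokens` is likewise a string rather than a list. The cropped examples above show it beginning with an escaped double quote followed by a Python list literal. Because the end of the value is cropped, the closing form is an assumption. The sketch below assumes the value is a list literal, optionally wrapped in one extra pair of double quotes. `parse_allowed_tokens` is a hypothetical helper, and the input is a shortened, made-up value built from tokens visible above.

```python
import ast


def parse_allowed_tokens(raw: str) -> list[str]:
    """Decode an `allowed_tokens` value into a list of strings.

    Assumption: the value is a Python list literal, possibly wrapped in one
    extra pair of double quotes (the cropped examples begin with '"[').
    """
    text = raw.strip()
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        text = text[1:-1]  # drop the assumed outer quotes
    # Tokens such as 'what ' keep their trailing space; do not strip them.
    return ast.literal_eval(text)


# Hypothetical, shortened value; real entries are much longer.
raw = "\"['higher than', 'same as', 'what ', '@@14@@']\""
print(parse_allowed_tokens(raw))
```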
Number of examples per configuration and split:

| name | train | validation | test |
|---|---:|---:|---:|
| QDMR | 44321 | 7760 | 8069 |
| QDMR-high-level | 17503 | 3130 | 3195 |
| QDMR-high-level-lexicon | 17503 | 3130 | 3195 |
| QDMR-lexicon | 44321 | 7760 | 8069 |
| logical-forms | 44098 | 7719 | 8006 |
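The split sizes can be cross-checked against the 83,978 figure in the description. Summing the table shows that the QDMR and QDMR-high-level configurations together add up to exactly that number. The lexicon configurations have identical counts, which suggests (though the card does not say so) that they cover the same questions. The numbers below are copied from the table; nothing else is assumed.

```python
# Split sizes copied from the table: (train, validation, test).
SPLITS = {
    "QDMR": (44321, 7760, 8069),
    "QDMR-high-level": (17503, 3130, 3195),
    "QDMR-high-level-lexicon": (17503, 3130, 3195),
    "QDMR-lexicon": (44321, 7760, 8069),
    "logical-forms": (44098, 7719, 8006),
}

# Total number of examples per configuration across all three splits.
totals = {name: sum(counts) for name, counts in SPLITS.items()}
for name, total in totals.items():
    print(f"{name:24} {total}")

# Full QDMRs plus high-level QDMRs give the dataset-wide example count.
combined = totals["QDMR"] + totals["QDMR-high-level"]
print(combined)  # 83978
```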
@article{Wolfson2020Break,
  title={Break It Down: A Question Understanding Benchmark},
  author={Wolfson, Tomer and Geva, Mor and Gupta, Ankit and Gardner, Matt and Goldberg, Yoav and Deutch, Daniel and Berant, Jonathan},
  journal={Transactions of the Association for Computational Linguistics},
  year={2020},
}
Thanks to @patrickvonplaten, @lewtun, and @thomwolf for adding this dataset.