Dataset:
lmqg/qg_squad
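This dataset is a subset of QG-Bench, a unified question generation benchmark proposed in "Generative Language Models for Paragraph-Level Question Generation: A Unified Benchmark and Evaluation" (EMNLP 2022 main conference). It is the SQuAD dataset adapted for the question generation (QG) task. The train/development/test split follows the "Neural Question Generation" work and is compatible with the leaderboard.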
English (en)
An example from 'train' looks as follows.
{ "question": "What is heresy mainly at odds with?", "paragraph": "Heresy is any provocative belief or theory that is strongly at variance with established beliefs or customs. A heretic is a proponent of such claims or beliefs. Heresy is distinct from both apostasy, which is the explicit renunciation of one's religion, principles or cause, and blasphemy, which is an impious utterance or action concerning God or sacred things.", "answer": "established beliefs or customs", "sentence": "Heresy is any provocative belief or theory that is strongly at variance with established beliefs or customs .", "paragraph_sentence": "<hl> Heresy is any provocative belief or theory that is strongly at variance with established beliefs or customs . <hl> A heretic is a proponent of such claims or beliefs. Heresy is distinct from both apostasy, which is the explicit renunciation of one's religion, principles or cause, and blasphemy, which is an impious utterance or action concerning God or sacred things.", "paragraph_answer": "Heresy is any provocative belief or theory that is strongly at variance with <hl> established beliefs or customs <hl>. A heretic is a proponent of such claims or beliefs. Heresy is distinct from both apostasy, which is the explicit renunciation of one's religion, principles or cause, and blasphemy, which is an impious utterance or action concerning God or sacred things.", "sentence_answer": "Heresy is any provocative belief or theory that is strongly at variance with <hl> established beliefs or customs <hl> ." }
The data fields are the same across all splits.
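As a rough illustration of the shared schema, the sketch below lists the seven fields that appear in the 'train' example and checks a record against them. The names `QGRecord`, `EXPECTED_FIELDS`, and `missing_fields` are hypothetical helpers introduced here, not part of the dataset or of any library, and the assumption that every field is a string is taken from the single example shown.

```python
from typing import TypedDict


class QGRecord(TypedDict):
    """Hypothetical type for one record; field names come from the 'train' example."""
    question: str
    paragraph: str
    answer: str
    sentence: str
    paragraph_sentence: str
    paragraph_answer: str
    sentence_answer: str


# Field names derived from the type above, so the two cannot drift apart.
EXPECTED_FIELDS = frozenset(QGRecord.__annotations__)


def missing_fields(record: dict) -> set:
    """Return the expected field names that are absent from `record`."""
    return set(EXPECTED_FIELDS) - set(record)
```

Calling `missing_fields(record)` on a record from any split should return an empty set if the record follows the schema of the example.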
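The paragraph_answer, paragraph_sentence, and sentence_answer features are each intended for training a question generation model, but each carries different information. The paragraph_answer and sentence_answer features are for answer-aware question generation, while the paragraph_sentence feature is for sentence-aware question generation.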
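To make the highlighted features concrete, here is a minimal sketch of how an answer span could be wrapped in `<hl>` tokens to produce a string shaped like paragraph_answer. The dataset already ships these fields precomputed, so this is only an illustration and not the authors' preprocessing; `highlight_answer` is a hypothetical helper, and it assumes the answer occurs verbatim in the text and that highlighting the first occurrence is acceptable.

```python
HL = "<hl>"  # highlight token as it appears in the dataset example


def highlight_answer(text: str, answer: str, hl: str = HL) -> str:
    """Wrap the first occurrence of `answer` in `text` with highlight tokens.

    Assumption: the answer appears verbatim in the text. Repeated answers
    would need character offsets to disambiguate, which this sketch ignores.
    """
    start = text.find(answer)
    if start == -1:
        raise ValueError("answer not found in text")
    end = start + len(answer)
    return f"{text[:start]}{hl} {answer} {hl}{text[end:]}"


paragraph = (
    "Heresy is any provocative belief or theory that is strongly at variance "
    "with established beliefs or customs. A heretic is a proponent of such "
    "claims or beliefs."
)
highlighted = highlight_answer(paragraph, "established beliefs or customs")
```

The same function applied to the `sentence` field would give a string in the style of sentence_answer. The paragraph_sentence field cannot be reproduced this way, because the `sentence` field in the example is tokenized differently from the paragraph (note the space before the final period).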
| train | validation | test |
|---|---|---|
| 75722 | 10570 | 11877 |
@inproceedings{ushio-etal-2022-generative,
    title = "{G}enerative {L}anguage {M}odels for {P}aragraph-{L}evel {Q}uestion {G}eneration",
    author = "Ushio, Asahi and Alva-Manchego, Fernando and Camacho-Collados, Jose",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, U.A.E.",
    publisher = "Association for Computational Linguistics",
}