Dataset:
bigbio/meqsum
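MeQSum is a medical question summarization dataset introduced in the ACL 2019 paper "On the Summarization of Consumer Health Questions". Question understanding is one of the main challenges in question answering. In real-world applications, users often submit natural language questions that are longer than needed and include peripheral information, which increases the complexity of the question and leads to substantially more false positives in answer retrieval. The paper studies neural abstractive models for medical question summarization and introduces the MeQSum corpus of 1,000 summarized consumer health questions.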
@inproceedings{ben-abacha-demner-fushman-2019-summarization,
    title = "On the Summarization of Consumer Health Questions",
    author = "Ben Abacha, Asma and Demner-Fushman, Dina",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P19-1215",
    doi = "10.18653/v1/P19-1215",
    pages = "2228--2234",
    abstract = "Question understanding is one of the main challenges in question answering. In real world applications, users often submit natural language questions that are longer than needed and include peripheral information that increases the complexity of the question, leading to substantially more false positives in answer retrieval. In this paper, we study neural abstractive models for medical question summarization. We introduce the MeQSum corpus of 1,000 summarized consumer health questions. We explore data augmentation methods and evaluate state-of-the-art neural abstractive models on this new task. In particular, we show that semantic augmentation from question datasets improves the overall performance, and that pointer-generator networks outperform sequence-to-sequence attentional models on this task, with a ROUGE-1 score of 44.16{\%}. We also present a detailed error analysis and discuss directions for improvement that are specific to question summarization.",
}
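
The abstract reports results as a ROUGE-1 score. As a rough illustration of what that metric measures, the sketch below computes a unigram-overlap F1 between a reference summary and a generated one. It is a simplified stand-in, not the official ROUGE toolkit: it assumes lowercased whitespace tokenization with no stemming, and it may not match the exact ROUGE configuration used in the paper. The function name `rouge1_f1` and the two example questions are made up for illustration and are not taken from MeQSum.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary.

    Simplified sketch: lowercased whitespace tokens, no stemming or
    stopword handling (assumptions, not the paper's exact setup).
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped overlap: each token counts at most as often as it
    # appears in both the reference and the candidate.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference summary and model output (not from the dataset).
reference = "what are the treatments for migraine"
candidate = "what treatments exist for migraine"
score = rouge1_f1(reference, candidate)
print(f"ROUGE-1 F1: {score:.4f}")
```

Here four unigrams are shared, the candidate has five tokens and the reference has six, so precision is 4/5, recall is 4/6, and the F1 is 8/11.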