Dataset:

medalpaca/medical_meadow_cord19

Language:

en

Size:

100K<n<1M

CORD 19

Dataset Description

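In response to the COVID-19 pandemic, the White House and a coalition of leading research groups prepared the COVID-19 Open Research Dataset (CORD-19). CORD-19 is a resource of over 1,000,000 scholarly articles, including over 400,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. The dataset is freely available to the global research community so that recent advances in natural language processing and other AI techniques can be applied to generate new insights in support of the fight against this infectious disease. This is a processed version of the dataset: we removed some empty entries and formatted it to be compatible with alpaca training. For more details on the data, please refer to the original publication.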
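The processing described above (dropping empty entries, then formatting for alpaca training) can be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline actually used to build this dataset: the field names `instruction`, `input`, and `output` follow the usual Alpaca convention and are assumed here, the prompt text is modeled on the Stanford Alpaca template, and the rule for what counts as an "empty" entry is a guess. The helper names `drop_empty` and `build_prompt` are hypothetical.

```python
from typing import Dict, Iterable, List

# Prompt templates modeled on the Stanford Alpaca format (an assumption;
# the card only says the data is "compatible with alpaca training").
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:"
)


def drop_empty(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep rows whose instruction and output are non-blank.

    Assumed filtering rule: the card does not say which fields were
    checked when empty entries were removed.
    """
    kept = []
    for row in rows:
        instruction = (row.get("instruction") or "").strip()
        output = (row.get("output") or "").strip()
        if instruction and output:
            kept.append(
                {
                    "instruction": instruction,
                    "input": (row.get("input") or "").strip(),
                    "output": output,
                }
            )
    return kept


def build_prompt(record: Dict[str, str]) -> str:
    """Render one Alpaca-style record into a training prompt."""
    if record["input"]:
        return PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    return PROMPT_NO_INPUT.format(instruction=record["instruction"])
```

If the Hugging Face `datasets` library is installed, the rows would presumably come from `load_dataset("medalpaca/medical_meadow_cord19")`; the column names should be checked against the loaded data before relying on the sketch above.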

Citation Information

@inproceedings{wang-etal-2020-cord,
    title = "{CORD-19}: The {COVID-19} Open Research Dataset",
    author = "Wang, Lucy Lu  and Lo, Kyle  and Chandrasekhar, Yoganand  and Reas, Russell  and Yang, Jiangjiang  and Burdick, Doug  and Eide, Darrin  and Funk, Kathryn  and Katsis, Yannis  and Kinney, Rodney Michael  and Li, Yunyao  and Liu, Ziyang  and Merrill, William  and Mooney, Paul  and Murdick, Dewey A.  and Rishi, Devvret  and Sheehan, Jerry  and Shen, Zhihong  and Stilson, Brandon  and Wade, Alex D.  and Wang, Kuansan  and Wang, Nancy Xin Ru  and Wilhelm, Christopher  and Xie, Boya  and Raymond, Douglas M.  and Weld, Daniel S.  and Etzioni, Oren  and Kohlmeier, Sebastian",
    booktitle = "Proceedings of the 1st Workshop on {NLP} for {COVID-19} at {ACL} 2020",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.nlpcovid19-acl.1"
}