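In response to the COVID-19 pandemic, the White House and a coalition of leading research groups prepared the COVID-19 Open Research Dataset (CORD-19). CORD-19 is a resource of over 1,000,000 scholarly articles, including over 400,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. This freely available dataset is provided to the global research community so that recent advances in natural language processing and other AI techniques can be applied to generate new insights in support of the fight against this infectious disease. This is a processed version of the dataset: we removed some empty entries and formatted it to be compatible with Alpaca training. For more details on the data, please refer to the original publication.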
@inproceedings{wang-etal-2020-cord,
    title = "{CORD-19}: The {COVID-19} Open Research Dataset",
    author = "Wang, Lucy Lu and Lo, Kyle and Chandrasekhar, Yoganand and Reas, Russell and Yang, Jiangjiang and Burdick, Doug and Eide, Darrin and Funk, Kathryn and Katsis, Yannis and Kinney, Rodney Michael and Li, Yunyao and Liu, Ziyang and Merrill, William and Mooney, Paul and Murdick, Dewey A. and Rishi, Devvret and Sheehan, Jerry and Shen, Zhihong and Stilson, Brandon and Wade, Alex D. and Wang, Kuansan and Wang, Nancy Xin Ru and Wilhelm, Christopher and Xie, Boya and Raymond, Douglas M. and Weld, Daniel S. and Etzioni, Oren and Kohlmeier, Sebastian",
    booktitle = "Proceedings of the 1st Workshop on {NLP} for {COVID-19} at {ACL} 2020",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.nlpcovid19-acl.1"
}
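
The processing mentioned in the description (dropping empty entries and reshaping records for Alpaca training) could look roughly like the sketch below. It is only an illustration, not the script used to build this dataset. The `instruction`/`input`/`output` keys follow the commonly used Alpaca record layout, but the raw field names `title` and `abstract`, the instruction text, and the helper names `is_empty` and `to_alpaca` are all assumptions made for this example.

```python
import json
from typing import Dict, Iterable, List

# Hypothetical prompt text; the instruction actually used for this dataset
# is not documented here.
DEFAULT_INSTRUCTION = "Write an abstract for the paper with the given title."


def is_empty(value) -> bool:
    """Return True for None, non-string values, and blank strings."""
    return not isinstance(value, str) or not value.strip()


def to_alpaca(records: Iterable[Dict],
              instruction: str = DEFAULT_INSTRUCTION) -> List[Dict[str, str]]:
    """Drop records with an empty title or abstract and emit Alpaca-style dicts.

    The "title" and "abstract" keys are assumed field names for the raw records.
    """
    examples = []
    for rec in records:
        title = rec.get("title")
        abstract = rec.get("abstract")
        if is_empty(title) or is_empty(abstract):
            continue  # skip empty entries
        examples.append({
            "instruction": instruction,
            "input": title.strip(),
            "output": abstract.strip(),
        })
    return examples


if __name__ == "__main__":
    raw = [
        {"title": "A study of coronaviruses", "abstract": "We survey ..."},
        {"title": "", "abstract": "Orphaned abstract"},  # dropped: empty title
        {"title": "No abstract here", "abstract": None},  # dropped: no abstract
    ]
    print(json.dumps(to_alpaca(raw), indent=2))
```

Under these assumptions, only the first of the three sample records survives the filter; the other two are discarded because one of the two fields is missing or blank.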