Dataset:
bigbio/cellfinder
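The CellFinder project aims to build a stem cell database by linking information from existing public databases and by text mining the research literature. The first release of the corpus consists of 10 full-text documents with more than 2,100 sentences, 65,000 tokens, and 5,200 entity annotations. The corpus is annotated with six entity types (anatomical parts, cell components, cell lines, cell types, genes/proteins, and species), with an overall inter-annotator agreement of about 80%.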
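The card reports an inter-annotator agreement of about 80% but does not say which measure produced that figure. The sketch below assumes one common choice for entity annotation, exact-match F1 between two annotators' span sets (equivalent to 2|A∩B| / (|A| + |B|)). It is an illustration only. The span tuples and the label strings in `ENTITY_TYPES` are hypothetical stand-ins for the six CellFinder entity types, not the corpus's actual annotation format.

```python
from typing import Set, Tuple

# Hypothetical label strings for the six CellFinder entity types.
ENTITY_TYPES = {"Anatomy", "CellComponent", "CellLine", "CellType", "GeneProtein", "Species"}

# A span is (start_char, end_char, entity_type); this layout is an assumption.
Span = Tuple[int, int, str]


def pairwise_f1(a: Set[Span], b: Set[Span]) -> float:
    """Exact-match F1 agreement between two annotators' span sets.

    A span counts as agreed only if start, end, and type all match.
    The formula 2|A∩B| / (|A| + |B|) is symmetric in a and b.
    """
    for _, _, label in a | b:
        if label not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {label}")
    if not a and not b:
        return 1.0  # two empty annotations agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


# Usage: annotators disagree on one label (GeneProtein vs CellLine).
annotator_a = {(0, 5, "CellType"), (10, 18, "GeneProtein"), (25, 30, "Species"), (40, 45, "Anatomy")}
annotator_b = {(0, 5, "CellType"), (10, 18, "CellLine"), (25, 30, "Species"), (40, 45, "Anatomy")}
print(pairwise_f1(annotator_a, annotator_b))  # 0.75
```

Requiring an exact match on both offsets and type is the strictest variant. Relaxed variants that credit overlapping spans would yield higher scores on the same data.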
Reference: https://www.informatik.hu-berlin.de/de/forschung/gebiete/wbi/resources/cellfinder/
@inproceedings{neves2012annotating, title = {Annotating and evaluating text for stem cell research}, author = {Neves, Mariana and Damaschun, Alexander and Kurtz, Andreas and Leser, Ulf}, year = 2012, booktitle = { Proceedings of the Third Workshop on Building and Evaluation Resources for Biomedical Text Mining\ (BioTxtM 2012) at Language Resources and Evaluation (LREC). Istanbul, Turkey }, pages = {16--23}, organization = {Citeseer} }