Datasets:

bigbio/genetag

Languages:

en

Multilinguality:

monolingual

License:

other

Dataset Card for GENETAG

Named entity recognition (NER) is an important first step for text mining the biomedical literature. Evaluating the performance of biomedical NER systems is impossible without a standardized test corpus. The annotation of such a corpus for gene/protein name NER is a difficult process due to the complexity of gene/protein names. We describe the construction and annotation of GENETAG, a corpus of 20K MEDLINE® sentences for gene/protein NER. 15K GENETAG sentences were used for the BioCreAtIvE Task 1A Competition.
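A common preprocessing step when training a tagger on a gene/protein NER corpus is converting span-level mention annotations into token-level BIO tags. The sketch below illustrates that step only; it is not GENETAG's own file format. It assumes whitespace tokenization, mentions given as (start, end) character offsets with an exclusive end, and a hypothetical single label `GENE`. The example sentence is made up for illustration.

```python
# Minimal sketch: convert character-offset mention spans into token-level BIO tags.
# Assumptions (hypothetical, not GENETAG's actual format): whitespace tokenization,
# spans as (start, end) character offsets with end exclusive, one label "GENE".
from typing import List, Tuple


def tokenize_with_offsets(text: str) -> List[Tuple[str, int, int]]:
    """Split on whitespace and record each token's (start, end) character offsets."""
    tokens = []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)  # search from pos so repeated tokens get the right offset
        end = start + len(tok)
        tokens.append((tok, start, end))
        pos = end
    return tokens


def spans_to_bio(text: str, spans: List[Tuple[int, int]], label: str = "GENE") -> List[Tuple[str, str]]:
    """Tag each token B-<label>, I-<label>, or O.

    A token is tagged only if it lies entirely inside a span; tokens that
    cross a span boundary are tagged O in this simplified sketch.
    """
    tagged = []
    for tok, start, end in tokenize_with_offsets(text):
        tag = "O"
        for s, e in spans:
            if start >= s and end <= e:
                tag = ("B-" if start == s else "I-") + label
                break
        tagged.append((tok, tag))
    return tagged


if __name__ == "__main__":
    # Hypothetical sentence with one multi-token protein mention.
    text = "Expression of heat shock protein 70 was increased ."
    mention = "heat shock protein 70"
    start = text.find(mention)
    print(spans_to_bio(text, [(start, start + len(mention))]))
    # [('Expression', 'O'), ('of', 'O'), ('heat', 'B-GENE'), ('shock', 'I-GENE'),
    #  ('protein', 'I-GENE'), ('70', 'I-GENE'), ('was', 'O'), ('increased', 'O'), ('.', 'O')]
```

The B-/I- distinction matters for gene/protein names because adjacent mentions (for example two gene names separated only by a comma) would otherwise merge into one entity when tags are decoded back into spans.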

Citation Information

@article{Tanabe2005,
  author    = {Lorraine Tanabe and Natalie Xie and Lynne H Thom and Wayne Matten and W John Wilbur},
  title     = {{GENETAG}: a tagged corpus for gene/protein named entity recognition},
  journal   = {{BMC} Bioinformatics},
  volume    = {6},
  year      = {2005},
  url       = {https://doi.org/10.1186/1471-2105-6-S1-S3},
  doi       = {10.1186/1471-2105-6-s1-s3},
  biburl    = {},
  bibsource = {}
}