Dataset:
tner/bionlp2004
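The BioNLP2004 NER dataset, formatted as part of the TNER project. The BioNLP2004 dataset contains only a training set and a test set, so the validation set was created by randomly sampling instances from the training set, with a size half that of the test set.
An example from the train split looks as follows.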
{ 'tags': [0, 0, 0, 0, 3, 0, 9, 10, 0, 0, 0, 0, 0, 7, 8, 0, 3, 0, 0, 9, 10, 10, 0, 0], 'tokens': ['In', 'the', 'presence', 'of', 'Epo', ',', 'c-myb', 'mRNA', 'declined', 'and', '20', '%', 'of', 'K562', 'cells', 'synthesized', 'Hb', 'regardless', 'of', 'antisense', 'myb', 'RNA', 'expression', '.'] }
The label2id dictionary is as follows.
{ "O": 0, "B-DNA": 1, "I-DNA": 2, "B-protein": 3, "I-protein": 4, "B-cell_type": 5, "I-cell_type": 6, "B-cell_line": 7, "I-cell_line": 8, "B-RNA": 9, "I-RNA": 10 }
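To illustrate how the integer `tags` relate to the label2id dictionary, here is a minimal sketch that inverts the mapping and groups the tags of the train example into entity spans. The helper name `extract_entities` is hypothetical and not part of TNER; the sketch assumes the tags follow the usual B-/I- (IOB2) convention and simply ignores an I- tag that does not continue an entity of the same type.

```python
# Label mapping and example copied from the dataset card.
label2id = {
    "O": 0, "B-DNA": 1, "I-DNA": 2, "B-protein": 3, "I-protein": 4,
    "B-cell_type": 5, "I-cell_type": 6, "B-cell_line": 7, "I-cell_line": 8,
    "B-RNA": 9, "I-RNA": 10,
}
id2label = {idx: label for label, idx in label2id.items()}

example = {
    "tags": [0, 0, 0, 0, 3, 0, 9, 10, 0, 0, 0, 0, 0, 7, 8, 0, 3, 0, 0,
             9, 10, 10, 0, 0],
    "tokens": ["In", "the", "presence", "of", "Epo", ",", "c-myb", "mRNA",
               "declined", "and", "20", "%", "of", "K562", "cells",
               "synthesized", "Hb", "regardless", "of", "antisense", "myb",
               "RNA", "expression", "."],
}


def extract_entities(tokens, tags):
    """Hypothetical helper: group B-/I- tags into (entity_type, text) pairs."""
    entities = []
    current_type, current_tokens = None, []
    for token, tag_id in zip(tokens, tags):
        label = id2label[tag_id]
        if label.startswith("B-"):
            # A B- tag always opens a new entity; close any open one first.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            # An I- tag of the same type extends the open entity.
            current_tokens.append(token)
        else:
            # "O" or a stray I- tag closes any open entity.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


entities = extract_entities(example["tokens"], example["tags"])
for entity_type, text in entities:
    print(f"{entity_type}: {text}")
```

For the example above, this groups `Epo` and `Hb` as protein, `c-myb mRNA` and `antisense myb RNA` as RNA, and `K562 cells` as cell_line.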
name | train | validation | test |
---|---|---|---|
bionlp2004 | 16619 | 1927 | 3856 |
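The split sizes in the table fit the description of how the validation set was made. The following sketch shows one way such a split could be drawn. It is not the TNER authors' actual procedure: the seed is hypothetical, and it assumes the original training set is the union of the train and validation splits listed here.

```python
import random

# Split sizes taken from the table above.
N_TRAIN, N_VALIDATION, N_TEST = 16619, 1927, 3856

# Assumption: the original training set is train + validation from the table.
n_original_train = N_TRAIN + N_VALIDATION

# Hypothetical seed; the dataset card does not state one.
rng = random.Random(0)

# Draw validation indices without replacement; the rest stay in train.
validation_ids = set(rng.sample(range(n_original_train), N_VALIDATION))
train_ids = [i for i in range(n_original_train) if i not in validation_ids]

print(len(train_ids), len(validation_ids))
```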
@inproceedings{collier-kim-2004-introduction,
    title = "Introduction to the Bio-entity Recognition Task at {JNLPBA}",
    author = "Collier, Nigel and Kim, Jin-Dong",
    booktitle = "Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications ({NLPBA}/{B}io{NLP})",
    month = aug # " 28th and 29th",
    year = "2004",
    address = "Geneva, Switzerland",
    publisher = "COLING",
    url = "https://aclanthology.org/W04-1213",
    pages = "73--78",
}