数据集:

bigbio/nlmchem

语言:

en

计算机处理:

monolingual

许可:

cc0-1.0
中文

Dataset Card for NLM-Chem

NLM-Chem corpus consists of 150 full-text articles from the PubMed Central Open Access dataset, comprising 67 different chemical journals, aiming to cover a general distribution of usage of chemical names in the biomedical literature. Articles were selected so that human annotation was most valuable (meaning that they were rich in bio-entities, and current state-of-the-art named entity recognition systems disagreed on bio-entity recognition.

Citation Information

@Article{islamaj2021nlm,
title={NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature},
author={Islamaj, Rezarta and Leaman, Robert and Kim, Sun and Kwon, Dongseop and Wei, Chih-Hsuan and Comeau, Donald C and Peng, Yifan and Cissel, David and Coss, Cathleen and Fisher, Carol and others},
journal={Scientific Data},
volume={8},
number={1},
pages={1--12},
year={2021},
publisher={Nature Publishing Group}
}