Dataset:
bigbio/euadr
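Corpora annotated with specific entities and relationships are essential for training and evaluating text-mining systems developed to extract specific structured information from a large corpus. This paper describes an approach in which a named-entity recognition system produces a first annotation, which annotators then revise using a web-based interface. The agreement figures achieved show that inter-annotator agreement is much better than agreement with the system-provided annotations. The corpus has been annotated for drugs, disorders, genes, and their inter-relationships. For each of the drug-disorder, drug-target, and target-disorder relations, three experts annotated a set of 100 abstracts. These annotated relationships will be used to train and evaluate text-mining software that captures these relationships in texts.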
@article{VANMULLIGEN2012879,
  title = {The EU-ADR corpus: Annotated drugs, diseases, targets, and their relationships},
  journal = {Journal of Biomedical Informatics},
  volume = {45},
  number = {5},
  pages = {879--884},
  year = {2012},
  note = {Text Mining and Natural Language Processing in Pharmacogenomics},
  issn = {1532-0464},
  doi = {10.1016/j.jbi.2012.04.004},
  url = {https://www.sciencedirect.com/science/article/pii/S1532046412000573},
  author = {Erik M. {van Mulligen} and Annie Fourrier-Reglat and David Gurwitz and Mariam Molokhia and Ainhoa Nieto and Gianluca Trifiro and Jan A. Kors and Laura I. Furlong},
  keywords = {Text mining, Corpus development, Machine learning, Adverse drug reactions},
  abstract = {Corpora with specific entities and relationships annotated are essential to train and evaluate text-mining systems that are developed to extract specific structured information from a large corpus. In this paper we describe an approach where a named-entity recognition system produces a first annotation and annotators revise this annotation using a web-based interface. The agreement figures achieved show that the inter-annotator agreement is much better than the agreement with the system provided annotations. The corpus has been annotated for drugs, disorders, genes and their inter-relationships. For each of the drug–disorder, drug–target, and target–disorder relations three experts have annotated a set of 100 abstracts. These annotated relationships will be used to train and evaluate text-mining software to capture these relationships in texts.}
}