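Model Description

This model is a version of BioBERT fine-tuned on the NCBI disease dataset for disease named entity recognition (NER). It can be used to extract disease mentions from unstructured text in the medical and biological domains.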

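Intended Use

The model is intended for extracting disease mentions from unstructured medical and biological text. It can be used to improve information retrieval and knowledge extraction in these domains.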

Training Data

The model was trained on the NCBI disease dataset, which contains 793 PubMed abstracts with 6,892 disease mentions.

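How to Use

You can use this model with the Hugging Face Transformers library. The following example shows how to load the model and use it to extract disease mentions from text: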

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("ugaray96/biobert_ncbi_disease_ner")
model = AutoModelForTokenClassification.from_pretrained(
    "ugaray96/biobert_ncbi_disease_ner"
)

# Build a token-classification pipeline that labels each token.
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)

text = "The patient was diagnosed with lung cancer and started chemotherapy. They also have a history of diabetes and heart disease."
result = ner_pipeline(text)

# "Disease" starts a mention; "Disease Continuation" extends the previous one.
diseases = []
for entity in result:
    word = entity["word"]
    if entity["entity"] == "Disease":
        diseases.append(word)
    elif entity["entity"] == "Disease Continuation" and diseases:
        # WordPiece sub-tokens start with "##" and attach without a space.
        diseases[-1] += word[2:] if word.startswith("##") else f" {word}"

print(f"Diseases: {', '.join(diseases)}")

The output should be: Diseases: lung cancer, diabetes, heart disease
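
If you need the exact surface text of each mention rather than joined token strings, one option is to merge predictions by character offset. The sketch below assumes each prediction carries the `index`, `start`, and `end` keys that the token-classification pipeline returns when a fast tokenizer is used, and that the labels are the `Disease` and `Disease Continuation` strings from the example above. The helper name `merge_disease_spans` and the sample predictions are hypothetical and hand-written for illustration; they are not real model output.

```python
def merge_disease_spans(text, entities):
    """Merge token-level predictions into disease mentions by character offset.

    Assumes each entity dict has "entity", "index", "start", and "end" keys.
    """
    spans = []         # [start, end] character offsets, one pair per mention
    prev_index = None  # token index of the last token added to a mention
    for ent in entities:
        label = ent["entity"]
        if label == "Disease":
            # A new mention begins at this token.
            spans.append([ent["start"], ent["end"]])
        elif (
            label == "Disease Continuation"
            and spans
            and prev_index is not None
            and ent["index"] == prev_index + 1
        ):
            # Extend the current mention only if this token directly follows it.
            spans[-1][1] = ent["end"]
        else:
            # Any other label (or a stray continuation) closes the mention.
            prev_index = None
            continue
        prev_index = ent["index"]
    # Slice the original text so spacing and sub-word pieces come out intact.
    return [text[start:end] for start, end in spans]


# Hand-written sample predictions (hypothetical, for illustration only).
sample_text = "History of lung cancer and diabetes."
sample_entities = [
    {"entity": "Disease", "word": "lung", "index": 3, "start": 11, "end": 15},
    {"entity": "Disease Continuation", "word": "cancer", "index": 4, "start": 16, "end": 22},
    {"entity": "No Disease", "word": "and", "index": 5, "start": 23, "end": 26},
    {"entity": "Disease", "word": "dia", "index": 6, "start": 27, "end": 30},
    {"entity": "Disease Continuation", "word": "##betes", "index": 7, "start": 30, "end": 35},
]

print(merge_disease_spans(sample_text, sample_entities))
# ['lung cancer', 'diabetes']
```

Slicing the input text avoids having to strip `##` prefixes or guess where spaces belong. With real model output, you would pass the original text and the pipeline result to the same function.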