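This model is a version of BioBERT fine-tuned on the NCBI disease dataset for disease named entity recognition (NER). It extracts disease mentions from unstructured text in the medical and biological domains.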
Typical uses include improving information retrieval and knowledge extraction over medical and biological text.
The model was trained on the NCBI disease dataset, which contains 793 PubMed abstracts with 6,892 disease mentions.
You can use this model with the Hugging Face Transformers library. The following example shows how to load the model and use it to extract disease mentions from text:
    from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

    # Load the fine-tuned tokenizer and token-classification model
    tokenizer = AutoTokenizer.from_pretrained("ugaray96/biobert_ncbi_disease_ner")
    model = AutoModelForTokenClassification.from_pretrained(
        "ugaray96/biobert_ncbi_disease_ner"
    )

    ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)

    text = "The patient was diagnosed with lung cancer and started chemotherapy. They also have a history of diabetes and heart disease."
    result = ner_pipeline(text)

    # Join token-level predictions into disease mentions
    diseases = []
    for entity in result:
        if entity["entity"] == "Disease":
            # First token of a new mention
            diseases.append(entity["word"])
        elif entity["entity"] == "Disease Continuation" and diseases:
            # Later token of the most recent mention
            diseases[-1] += f" {entity['word']}"

    print(f"Diseases: {', '.join(diseases)}")
The output should be: Diseases: lung cancer, diabetes, heart disease
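The joining loop in the example appends every continuation token with a leading space. BERT-style tokenizers can split a rare word into WordPiece fragments prefixed with `##`, so a long disease name may come back in pieces. The sketch below is one possible way to handle that, not part of the model's own API. It assumes the `Disease` and `Disease Continuation` labels used in the example above and the `entity` and `word` keys of the token-level pipeline output. The function name `merge_disease_tokens` and the hand-written `sample_tokens` list are hypothetical and stand in for a real pipeline result.

```python
from typing import Dict, List


def merge_disease_tokens(tokens: List[Dict[str, str]]) -> List[str]:
    """Merge token-level predictions into whole disease mentions.

    Assumes each token dict has an "entity" label and a "word" string,
    and that WordPiece fragments start with "##".
    """
    diseases: List[str] = []
    in_mention = False  # True while the previous token belonged to a mention
    for token in tokens:
        label, word = token["entity"], token["word"]
        is_subword = word.startswith("##")
        piece = word[2:] if is_subword else word
        if is_subword and in_mention:
            # Fragment of the previous word: glue it on without a space,
            # whatever its label, so the word stays whole.
            diseases[-1] += piece
        elif label == "Disease" and not is_subword:
            # First word of a new mention.
            diseases.append(piece)
            in_mention = True
        elif label == "Disease Continuation" and in_mention and not is_subword:
            # Next word of the current mention.
            diseases[-1] += " " + piece
        else:
            # Non-disease token, or a stray continuation with no open mention.
            in_mention = False
    return diseases


# Hand-written tokens that mimic the pipeline output format (hypothetical data).
sample_tokens = [
    {"entity": "No Disease", "word": "with"},
    {"entity": "Disease", "word": "lung"},
    {"entity": "Disease Continuation", "word": "cancer"},
    {"entity": "No Disease", "word": "and"},
    {"entity": "Disease", "word": "diabetes"},
    {"entity": "Disease Continuation", "word": "mel"},
    {"entity": "Disease Continuation", "word": "##lit"},
    {"entity": "Disease Continuation", "word": "##us"},
]

print(f"Diseases: {', '.join(merge_disease_tokens(sample_tokens))}")
# Prints: Diseases: lung cancer, diabetes mellitus
```

Unlike the loop in the example, this version closes a mention when a non-disease token appears, so a continuation label that follows a gap is dropped instead of being attached to an earlier mention. To try it on real predictions, pass the pipeline result in place of `sample_tokens`.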