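The CORe (Clinical Outcome Representations) model is introduced in the paper Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration. It is based on BioBERT and further pre-trained on clinical notes, disease descriptions and medical articles with a specialised Clinical Outcome Pre-Training objective.
This model checkpoint is fine-tuned on the task of diagnosis prediction. The model expects a patient's admission note as input and outputs multi-label ICD9 code predictions.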
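Model Predictions

The model makes predictions on a total of 9237 labels. These comprise 3-digit and 4-digit ICD9 codes as well as the textual descriptions of these codes. The 4-digit codes and textual descriptions help to incorporate further topical and hierarchical information into the model during training (see Section 4.2, ICD+: Incorporation of ICD Hierarchy, in our paper). We recommend using only the 3-digit code predictions at inference time, because only those have been evaluated in our work.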
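As a minimal sketch of that recommendation, predicted labels can be filtered down to 3-digit codes after inference. The regular expression below is an assumption about how the labels are spelled (plain strings such as "428" or "V45"); it does not cover E-codes, and the helper name `keep_three_digit_codes` is hypothetical. Inspect `model.config.id2label` to confirm the actual label format before relying on it.

```python
import re

# Assumption: 3-digit ICD9 categories appear as labels like "428" or "V45".
# E-codes are deliberately not handled here; adapt the pattern after
# checking the real label strings in model.config.id2label.
THREE_DIGIT_ICD9 = re.compile(r"\d{3}|V\d{2}")


def keep_three_digit_codes(labels):
    """Return only the labels that look like 3-digit ICD9 codes."""
    return [label for label in labels if THREE_DIGIT_ICD9.fullmatch(label)]
```

Applied to a list of predicted labels, this drops the 4-digit codes and the description tokens and leaves the 3-digit predictions untouched.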
How to use CORe Diagnosis Prediction

You can load the model via the transformers library:

    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")
    model = AutoModelForSequenceClassification.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")
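The following code shows an inference example:

    import torch

    # Renamed from `input` to avoid shadowing the Python builtin
    admission_note = "CHIEF COMPLAINT: Headaches\n\nPRESENT ILLNESS: 58yo man w/ hx of hypertension, AFib on coumadin presented to ED with the worst headache of his life."

    tokenized_input = tokenizer(admission_note, return_tensors="pt")
    output = model(**tokenized_input)

    # Multi-label task: apply a sigmoid to each logit independently
    predictions = torch.sigmoid(output.logits)
    # Keep every label whose probability exceeds the threshold (0.3 here)
    predicted_labels = [model.config.id2label[_id] for _id in (predictions > 0.3).nonzero()[:, 1].tolist()]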
Note: For the best performance, we recommend determining the threshold (0.3 in this example) individually for each label.
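One way to determine per-label thresholds is a simple grid search on a held-out validation set. The sketch below is an illustration under stated assumptions, not the procedure used in the paper: it picks, for each label, the candidate threshold that maximises F1, and falls back to a default when a label has no positive validation examples. The function names, the candidate grid and the choice of F1 as the criterion are all hypothetical.

```python
def f1_score(y_true, y_pred):
    """Binary F1 for two equally long sequences of 0/1 (or bool) values."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_thresholds(val_probs, val_targets, candidates=None, default=0.3):
    """Pick one threshold per label by maximising F1 on validation data.

    val_probs and val_targets hold one row per note and one column per label.
    """
    if candidates is None:
        candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    thresholds = []
    for j in range(len(val_probs[0])):
        probs = [row[j] for row in val_probs]
        targets = [row[j] for row in val_targets]
        if not any(targets):
            # No positive examples for this label: keep the default.
            thresholds.append(default)
            continue
        best = max(candidates,
                   key=lambda t: f1_score(targets, [p > t for p in probs]))
        thresholds.append(best)
    return thresholds


def apply_thresholds(probs, thresholds):
    """Return the indices of labels whose probability exceeds its threshold."""
    return [j for j, (p, t) in enumerate(zip(probs, thresholds)) if p > t]
```

The indices returned by `apply_thresholds` can then be mapped to label names through `model.config.id2label`, in the same way as in the inference example.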
For all the details about CORe and contact information, please visit CORe.app.datexis.com.
@inproceedings{vanaken21,
  author    = {Betty van Aken and
               Jens-Michalis Papaioannou and
               Manuel Mayrdorfer and
               Klemens Budde and
               Felix A. Gers and
               Alexander Löser},
  title     = {Clinical Outcome Prediction from Admission Notes using Self-Supervised
               Knowledge Integration},
  booktitle = {Proceedings of the 16th Conference of the European Chapter of the
               Association for Computational Linguistics: Main Volume, {EACL} 2021,
               Online, April 19 - 23, 2021},
  publisher = {Association for Computational Linguistics},
  year      = {2021},
}