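CORe Model - Clinical Diagnosis Prediction

Model description

The CORe (Clinical Outcome Representations) model was introduced in the paper Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration. It is based on BioBERT and was further pre-trained on clinical notes, disease descriptions and medical articles with a specialised Clinical Outcome Pre-Training objective.

This model checkpoint is fine-tuned on the task of diagnosis prediction. The model expects a patient's admission note as input and outputs multi-label ICD9 code predictions.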

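Model Predictions

The model makes predictions on a total of 9237 labels. These comprise 3-digit and 4-digit ICD9 codes as well as textual descriptions of these codes. The 4-digit codes and textual descriptions help to incorporate further topical and hierarchical information into the model during training (see Section 4.2, ICD+: Incorporation of ICD Hierarchy, in our paper). We recommend using only the 3-digit code predictions at inference time, because only those have been evaluated in our work.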
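The recommendation to keep only 3-digit codes can be sketched as a simple post-processing filter. This is a minimal illustration, not part of the released model: the helper name `keep_three_digit_codes` and the label format it assumes (codes stored as plain strings without dots) are assumptions, so check them against `model.config.id2label` before relying on the pattern.

```python
import re

# Assumption: labels are plain strings, and a 3-digit ICD9 code is either three
# digits ("401"), "V" plus two digits ("V45"), or "E" plus three digits ("E849").
# Verify this against model.config.id2label before relying on it.
THREE_DIGIT_ICD9 = re.compile(r"(\d{3}|V\d{2}|E\d{3})")


def keep_three_digit_codes(labels):
    """Return only the labels that look like 3-digit ICD9 codes."""
    return [label for label in labels if THREE_DIGIT_ICD9.fullmatch(label)]


# Hypothetical mix of predicted labels: 3-digit codes, a 4-digit code
# and a word taken from a textual description.
example_labels = ["401", "4019", "V45", "hypertension", "E849", "430"]
three_digit_only = keep_three_digit_codes(example_labels)
print(three_digit_only)
```

Applied to the list of predicted label strings produced at inference time, the filter drops the 4-digit codes and description labels that were only used to enrich training.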

How to use CORe Diagnosis Prediction

You can load the model via the transformers library:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")
model = AutoModelForSequenceClassification.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")

The following code shows an inference example:

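import torch

# Example admission note
text = "CHIEF COMPLAINT: Headaches\n\nPRESENT ILLNESS: 58yo man w/ hx of hypertension, AFib on coumadin presented to ED with the worst headache of his life."

# Tokenize; truncation is a precaution added here, since BERT-based models accept at most 512 tokens
tokenized_input = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Forward pass without gradient tracking, which inference does not need
with torch.no_grad():
    output = model(**tokenized_input)

# Sigmoid gives an independent probability per label (multi-label setting)
predictions = torch.sigmoid(output.logits)
# Keep every label whose probability exceeds the 0.3 threshold
predicted_labels = [model.config.id2label[_id] for _id in (predictions > 0.3).nonzero()[:, 1].tolist()]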
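When inspecting results it can help to see the probability next to each label instead of only the labels above the threshold. The sketch below ranks labels by probability in plain Python; the helper `top_k_labels` and the example values are hypothetical and only illustrate the idea.

```python
def top_k_labels(probabilities, id2label, k=5):
    """Return the k most probable (label, probability) pairs, highest first."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return [(id2label[index], prob) for index, prob in ranked[:k]]


# Hypothetical probabilities and label mapping, for illustration only.
example_probs = [0.05, 0.62, 0.31, 0.48]
example_id2label = {0: "038", 1: "430", 2: "401", 3: "427"}
top_two = top_k_labels(example_probs, example_id2label, k=2)
print(top_two)
```

With the inference example, the call would look like `top_k_labels(predictions[0].tolist(), model.config.id2label)`, where `predictions[0]` is the probability vector for the single input note.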

Note: For the best performance, we recommend determining the threshold (0.3 in this example) individually per label.
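One possible way to determine per-label thresholds is a small grid search on held-out validation data. The sketch below is an assumption about how this could be done, not the procedure used in the paper: the F1 criterion, the candidate grid, and the helper names `tune_threshold` and `apply_thresholds` are all illustrative choices.

```python
def f1_score(predicted, actual):
    """F1 for one label from parallel lists of booleans."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def tune_threshold(scores, targets, candidates=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Pick the candidate threshold with the highest F1 for one label.

    Ties are resolved in favour of the earliest candidate.
    """
    best, best_f1 = candidates[0], -1.0
    for threshold in candidates:
        f1 = f1_score([s > threshold for s in scores], targets)
        if f1 > best_f1:
            best, best_f1 = threshold, f1
    return best


def apply_thresholds(probabilities, id2label, thresholds, default=0.3):
    """Keep labels whose probability exceeds their own threshold.

    Labels without a tuned threshold fall back to `default`.
    """
    return [
        id2label[i]
        for i, p in enumerate(probabilities)
        if p > thresholds.get(id2label[i], default)
    ]


# Hypothetical validation scores and gold labels for the code "430".
thresholds = {"430": tune_threshold([0.15, 0.35, 0.45, 0.05], [False, True, True, False])}
selected = apply_thresholds([0.25, 0.25], {0: "430", 1: "401"}, thresholds)
print(thresholds, selected)
```

In practice the validation scores would come from running the model on labelled admission notes, and `apply_thresholds` would receive `predictions[0].tolist()` and `model.config.id2label` from the inference example.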

Cite

@inproceedings{vanaken21,
  author    = {Betty van Aken and
               Jens-Michalis Papaioannou and
               Manuel Mayrdorfer and
               Klemens Budde and
               Felix A. Gers and
               Alexander Löser},
  title     = {Clinical Outcome Prediction from Admission Notes using Self-Supervised
               Knowledge Integration},
  booktitle = {Proceedings of the 16th Conference of the European Chapter of the
               Association for Computational Linguistics: Main Volume, {EACL} 2021,
               Online, April 19 - 23, 2021},
  publisher = {Association for Computational Linguistics},
  year      = {2021},
}

Author:

Data Science and Text-based Information Systems

Dataset size:

440.93 MB
