CORe Model - BioBERT + Clinical Outcome Pre-Training

Model description

The CORe (Clinical Outcome Representations) model was introduced in the paper Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration. It is based on BioBERT and further pre-trained on clinical notes, disease descriptions and medical articles with a specialised Clinical Outcome Pre-Training objective.

How to use CORe

You can load the model via the transformers library:

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("bvanaken/CORe-clinical-outcome-biobert-v1")
model = AutoModel.from_pretrained("bvanaken/CORe-clinical-outcome-biobert-v1")

From there, you can fine-tune it on clinical tasks that benefit from patient outcome knowledge.
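As a rough illustration of a fine-tuning setup, the sketch below loads CORe with a freshly initialised classification head and builds multi-hot targets for a multi-label outcome task such as diagnosis prediction. The helper names `encode_labels` and `build_classifier`, the example codes, and the choice of a multi-label head are assumptions made for illustration, not part of the released model or the authors' training code.

```python
from typing import List, Sequence

MODEL_NAME = "bvanaken/CORe-clinical-outcome-biobert-v1"


def encode_labels(codes: Sequence[str], label_space: Sequence[str]) -> List[float]:
    """Return a multi-hot target vector; codes outside label_space are ignored."""
    present = set(codes)
    return [1.0 if label in present else 0.0 for label in label_space]


def build_classifier(num_labels: int, multi_label: bool = True):
    """Load CORe with a new, untrained classification head.

    Hypothetical helper: it downloads the weights from the Hugging Face Hub,
    so transformers is imported lazily inside the function.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    problem_type = (
        "multi_label_classification" if multi_label else "single_label_classification"
    )
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=num_labels, problem_type=problem_type
    )
    return tokenizer, model


if __name__ == "__main__":
    # Illustrative label space and admission note (made up for this sketch).
    label_space = ["E11", "I10", "N17"]
    tokenizer, model = build_classifier(num_labels=len(label_space))
    note = "CHIEF COMPLAINT: Shortness of breath. HISTORY: Hypertension."
    # BERT-style models accept at most 512 tokens, so longer notes are truncated.
    inputs = tokenizer(note, truncation=True, max_length=512, return_tensors="pt")
    logits = model(**inputs).logits  # shape: (1, num_labels), head is untrained
    print(logits.shape, encode_labels(["I10"], label_space))
```

The classification head starts from random weights, so the logits are only meaningful after training on labelled admission notes.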

Pre-Training Data

The model is based on BioBERT, which is pre-trained on PubMed data. The Clinical Outcome Pre-Training used discharge summaries from the MIMIC-III training set (specified here), medical transcriptions from MTSamples and clinical notes from the i2b2 challenges 2006-2012. It also included ~10k case reports from PubMed Central (PMC), disease articles from Wikipedia and article sections from the MedQuAD dataset extracted from NIH websites.

More Information

For full details about CORe and contact information, please visit CORe.app.datexis.com.

Cite

@inproceedings{vanaken21,
  author    = {Betty van Aken and
               Jens-Michalis Papaioannou and
               Manuel Mayrdorfer and
               Klemens Budde and
               Felix A. Gers and
               Alexander Löser},
  title     = {Clinical Outcome Prediction from Admission Notes using Self-Supervised
               Knowledge Integration},
  booktitle = {Proceedings of the 16th Conference of the European Chapter of the
               Association for Computational Linguistics: Main Volume, {EACL} 2021,
               Online, April 19 - 23, 2021},
  publisher = {Association for Computational Linguistics},
  year      = {2021},
}

Author:

Betty van Aken

Dataset size:

826.6 MB