BiomedNLP-PubMedBERT fine-tuned on textual entailment (NLI)

This is microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext fine-tuned on the MNLI dataset. It should be useful for textual entailment tasks involving biomedical corpora.

Usage

Given two sentences (a premise and a hypothesis), the model outputs logits for entailment, neutral, and contradiction.

You can test the model with the HuggingFace model widget in the sidebar:

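Enter the two sentences (premise and hypothesis) separately.
The model returns probabilities for the 3 labels: entailment (label: 0), neutral (label: 1), and contradiction (label: 2).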
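
The step from logits to labeled probabilities can be sketched in plain Python. This is a minimal illustration, not the model's own code: the label order follows the mapping stated above, while the helper names (`softmax`, `label_probabilities`, `predicted_label`) and the example logits are hypothetical.

```python
import math

# Label order as stated in this card: 0 = entailment, 1 = neutral, 2 = contradiction.
LABELS = ["entailment", "neutral", "contradiction"]

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_probabilities(logits):
    """Pair each label name with its probability."""
    return dict(zip(LABELS, softmax(logits)))

def predicted_label(logits):
    """Return the label with the highest probability."""
    probs = label_probabilities(logits)
    return max(probs, key=probs.get)

# Hypothetical logits, made up for illustration (not real model output).
example_logits = [-3.0, -3.0, 4.0]
print(label_probabilities(example_logits))
print(predicted_label(example_logits))
```

Because softmax preserves the ordering of the logits, the predicted label can also be read directly from the largest logit when the probabilities themselves are not needed.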

To use the model on your local machine:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "lighteternal/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

premise = 'EpCAM is overexpressed in breast cancer'
hypothesis = 'EpCAM is downregulated in breast cancer.'

# Encode the sentence pair; if it is too long, truncate only the premise
inputs = tokenizer(premise, hypothesis, return_tensors='pt',
                   truncation='only_first')

# Run the pair through the model fine-tuned on MNLI, without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Softmax over the label dimension turns logits into probabilities
probs = logits.softmax(dim=1)
print('Probabilities for entailment, neutral, contradiction \n', probs.numpy().round(3))
# Probabilities for entailment, neutral, contradiction 
# 0.001 0.001 0.998

Metrics

Classification accuracy on the MNLI test set (entailment, contradiction, neutral):

Metric	Value
Accuracy	0.8338

For details, see the "Training Metrics" tab.
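
For reference, classification accuracy is the fraction of examples whose predicted label matches the gold label. The sketch below shows that calculation in plain Python. The `accuracy` helper and the toy label lists are hypothetical and are not the evaluation script used to produce the figure above.

```python
def accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

# Hypothetical toy labels (0 = entailment, 1 = neutral, 2 = contradiction).
preds = [0, 2, 1, 2, 0]
gold = [0, 2, 2, 2, 1]
print(f"Accuracy: {accuracy(preds, gold):.4f}")
```

With three classes, accuracy alone does not show which labels are confused with each other; a per-class breakdown would be needed for that.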

Author:

Dimitris Papadopoulos

Dataset size:

418.42 MB