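microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext fine-tuned on the MNLI dataset. It should be useful for textual entailment tasks involving biomedical corpora.
Given two sentences (a premise and a hypothesis), the model outputs logits for entailment, neutral, or contradiction.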
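Logits are unnormalized scores, so they are usually passed through a softmax before being read as probabilities. The following is a minimal sketch of that step in plain Python. The label order (entailment, neutral, contradiction) is an assumption taken from the print statement in the usage example, and the logit values are hypothetical, not real model output; `model.config.id2label` is the place to confirm the actual mapping.

```python
import math

# Assumed label order (from the usage example's print statement);
# verify against model.config.id2label before relying on it.
LABELS = ("entailment", "neutral", "contradiction")


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Hypothetical logits for illustration only
label, prob = predict_label([-2.5, -2.5, 4.4])
print(label, round(prob, 3))
```
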
You can test the model using the HuggingFace model widget in the sidebar.
To use the model locally on your machine:

    import numpy as np
    # import torch
    # device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("lighteternal/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-mnli")
    model = AutoModelForSequenceClassification.from_pretrained("lighteternal/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-mnli")

    premise = 'EpCAM is overexpressed in breast cancer'
    hypothesis = 'EpCAM is downregulated in breast cancer.'

    # Run the premise/hypothesis pair through the model fine-tuned on MNLI
    x = tokenizer.encode(premise, hypothesis, return_tensors='pt', truncation='only_first')
    logits = model(x)[0]
    probs = logits.softmax(dim=1)
    print('Probabilities for entailment, neutral, contradiction \n', np.around(probs.cpu().detach().numpy(), 3))
    # Probabilities for entailment, neutral, contradiction
    # 0.001 0.001 0.998

Classification accuracy (entailment, contradiction, neutral) on the MNLI test set:
Metric | Value |
---|---|
Accuracy | 0.8338 |
See the "Training Metrics" tab for details.
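For readers who want to check the figure on their own data, accuracy is the fraction of premise/hypothesis pairs whose predicted label matches the gold label. The sketch below shows only that calculation; the `accuracy` helper and the example labels are hypothetical, and it is not the evaluation script used to produce the reported number.

```python
def accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Hypothetical labels for illustration only
preds = ["entailment", "neutral", "contradiction", "neutral"]
gold = ["entailment", "contradiction", "contradiction", "neutral"]
print(accuracy(preds, gold))
```
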