DeBERTa-v3-base-mnli-fever-anli

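Model description

This model was trained on the MultiNLI, Fever-NLI and Adversarial-NLI (ANLI) datasets, which together comprise 763,913 NLI hypothesis-premise pairs. This base model outperforms almost all large models on the ANLI benchmark. The base model is DeBERTa-v3-base from Microsoft. The v3 variant of DeBERTa substantially outperforms previous versions of the model by using a different pre-training objective; see Appendix 11 of the original DeBERTa paper.

For the highest performance (at the cost of speed), the recommended model is https://huggingface.co/MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli

How to use the model

Simple zero-shot classification pipeline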
from transformers import pipeline
classifier = pipeline("zero-shot-classification", model="MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli")
sequence_to_classify = "Angela Merkel is a politician in Germany and leader of the CDU"
candidate_labels = ["politics", "economy", "entertainment", "environment"]
output = classifier(sequence_to_classify, candidate_labels, multi_label=False)
print(output)
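The pipeline returns a dictionary with the keys `sequence`, `labels` and `scores`, where the labels are sorted by descending score. The following is a minimal sketch of how such a result might be post-processed. The helper `labels_above` and the scores in `example_output` are hypothetical illustrations, not real model output.

```python
def labels_above(output, threshold=0.5):
    """Return (label, score) pairs whose score is at least `threshold`."""
    return [
        (label, score)
        for label, score in zip(output["labels"], output["scores"])
        if score >= threshold
    ]


# Hypothetical result with made-up scores, shaped like a pipeline output.
example_output = {
    "sequence": "Angela Merkel is a politician in Germany and leader of the CDU",
    "labels": ["politics", "economy", "environment", "entertainment"],
    "scores": [0.9, 0.05, 0.03, 0.02],
}

# Labels are sorted by descending score, so the first one is the top prediction.
top_label = example_output["labels"][0]
print(top_label)
print(labels_above(example_output, threshold=0.5))
```

With `multi_label=True` the scores are computed independently per label and no longer sum to 1, so a threshold filter of this kind is usually more appropriate than taking only the first label.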
NLI use-case
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

model_name = "MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
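# Move the model to the same device as the inputs to avoid a device mismatch on GPU
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)

premise = "I first thought that I liked the movie, but upon second thought it was actually disappointing."
hypothesis = "The movie was good."

# Tokenize the premise-hypothesis pair and pass all tensors (including the attention mask) to the model
inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt").to(device)
with torch.no_grad():  # inference only, no gradients needed
    output = model(**inputs)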
prediction = torch.softmax(output["logits"][0], -1).tolist()
label_names = ["entailment", "neutral", "contradiction"]
prediction = {name: round(float(pred) * 100, 1) for pred, name in zip(prediction, label_names)}
print(prediction)
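To make the last steps of the snippet above explicit, here is a small pure-Python sketch of the softmax and label mapping, without loading the model. The logit values are made up for illustration, and the label order is the one used in the snippet above.

```python
import math

LABEL_NAMES = ["entailment", "neutral", "contradiction"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_prediction(logits):
    """Map three NLI logits to percentages keyed by label name."""
    probs = softmax(logits)
    return {name: round(p * 100, 1) for name, p in zip(LABEL_NAMES, probs)}


example_logits = [-2.0, 0.5, 3.0]  # made-up values, not real model output
example_prediction = logits_to_prediction(example_logits)
predicted_label = max(example_prediction, key=example_prediction.get)
print(example_prediction)
print(predicted_label)
```

If in doubt about the label order, it can be checked against `model.config.id2label` after loading the model.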

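Training data

DeBERTa-v3-base-mnli-fever-anli was trained on the MultiNLI, Fever-NLI and Adversarial-NLI (ANLI) datasets, which together comprise 763,913 NLI hypothesis-premise pairs.

Training procedure

DeBERTa-v3-base-mnli-fever-anli was trained using the Hugging Face trainer with the following hyperparameters.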

training_args = TrainingArguments(
    num_train_epochs=3,              # total number of training epochs
    learning_rate=2e-05,
    per_device_train_batch_size=32,   # batch size per device during training
    per_device_eval_batch_size=32,    # batch size for evaluation
    warmup_ratio=0.1,                # fraction of total training steps used for learning rate warmup
    weight_decay=0.06,               # strength of weight decay
    fp16=True                        # mixed precision training
)
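As a rough illustration of what these hyperparameters imply, the sketch below estimates the number of optimizer steps and warmup steps. It assumes a single device, no gradient accumulation, and that all 763,913 pairs are used for training; none of these is stated in this model card, so the numbers are indicative only.

```python
import math

num_examples = 763_913          # NLI pairs stated in the model card
per_device_batch_size = 32
num_devices = 1                 # assumption: single GPU
gradient_accumulation = 1       # assumption: no gradient accumulation
num_train_epochs = 3
warmup_ratio = 0.1

effective_batch_size = per_device_batch_size * num_devices * gradient_accumulation
# The last, partially filled batch still counts as a step (no dropped batches assumed).
steps_per_epoch = math.ceil(num_examples / effective_batch_size)
total_steps = steps_per_epoch * num_train_epochs
# Warmup steps as a fraction of total steps, assumed to be rounded up.
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions the learning rate would ramp up over roughly the first tenth of training before decaying.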

Eval results

The model was evaluated using the test sets for MultiNLI and ANLI and the dev set for Fever-NLI. The metric used is accuracy.

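| Dataset  | mnli-m | mnli-mm | fever-nli | anli-all | anli-r3 |
|----------|--------|---------|-----------|----------|---------|
| Accuracy | 0.903  | 0.903   | 0.777     | 0.579    | 0.495   |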
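For reference, accuracy here is the share of examples whose highest-scoring class matches the gold label. The following is a minimal sketch with made-up logits and labels. It is not the evaluation script used for the numbers above.

```python
def accuracy(logits_per_example, gold_labels):
    """Fraction of examples where the argmax of the logits equals the gold label."""
    if not gold_labels or len(logits_per_example) != len(gold_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = 0
    for logits, gold in zip(logits_per_example, gold_labels):
        predicted = max(range(len(logits)), key=logits.__getitem__)
        correct += int(predicted == gold)
    return correct / len(gold_labels)


# Toy data: class indices 0 = entailment, 1 = neutral, 2 = contradiction.
toy_logits = [
    [2.0, 0.1, -1.0],
    [0.2, 1.5, 0.3],
    [-0.5, 0.0, 2.2],
    [1.0, 0.9, 0.1],
]
toy_gold = [0, 1, 2, 1]
toy_accuracy = accuracy(toy_logits, toy_gold)
print(toy_accuracy)
```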

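Limitations and bias

Please consult the original DeBERTa paper and the literature on the different NLI datasets for potential biases.

Citation

If you use this model, please cite: Laurer, Moritz, Wouter van Atteveldt, Andreu Salleras Casas, and Kasper Welbers. 2022. ‘Less Annotating, More Classifying – Addressing the Data Scarcity Issue of Supervised Machine Learning with Deep Transfer Learning and BERT - NLI’. Preprint, June. Open Science Framework. https://osf.io/74b8k

Ideas for cooperation or questions?

If you have questions or ideas for cooperation, contact me at m{dot}laurer{at}vu{dot}nl or on LinkedIn.

Debugging and issues

Note that DeBERTa-v3 was released on 06.12.21 and older versions of HF Transformers seem to have issues running the model (e.g. issues with the tokenizer). Using Transformers>=4.13 might solve some issues.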
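A quick way to check the installed version against this minimum is sketched below. The helpers `release_tuple` and `meets_minimum` are hypothetical names introduced here, and the parsing is deliberately simplified (pre-release suffixes such as `rc1` or `.dev0` are ignored).

```python
import re
from importlib import metadata


def release_tuple(version_string):
    """Leading numeric components of a version string, e.g. '4.13.0.dev0' -> (4, 13, 0)."""
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum="4.13"):
    """True if the installed release is at least the minimum release."""
    return release_tuple(installed) >= release_tuple(minimum)


try:
    installed_version = metadata.version("transformers")
    print(installed_version, meets_minimum(installed_version))
except metadata.PackageNotFoundError:
    print("transformers is not installed")
```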

Model Recycling

In an evaluation on 36 datasets, using MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli as the base model yields an average score of 79.69, compared with 79.04 for microsoft/deberta-v3-base.

As of 2023-01-09, the model ranked 2nd among all tested models for the microsoft/deberta-v3-base architecture.

Results:

| Dataset | Score |
|---|---|
| 20_newsgroup | 85.8072 |
| ag_news | 90.4333 |
| amazon_reviews_multi | 67.32 |
| anli | 59.625 |
| boolq | 85.107 |
| cb | 91.0714 |
| cola | 85.8102 |
| copa | 67 |
| dbpedia | 79.0333 |
| esnli | 91.6327 |
| financial_phrasebank | 82.5 |
| imdb | 94.02 |
| isear | 71.6428 |
| mnli | 89.5749 |
| mrpc | 89.7059 |
| multirc | 64.1708 |
| poem_sentiment | 88.4615 |
| qnli | 93.575 |
| qqp | 91.4148 |
| rotten_tomatoes | 89.6811 |
| rte | 86.2816 |
| sst2 | 94.6101 |
| sst_5bins | 57.0588 |
| stsb | 91.5508 |
| trec_coarse | 97.6 |
| trec_fine | 91.2 |
| tweet_ev_emoji | 45.264 |
| tweet_ev_emotion | 82.6179 |
| tweet_ev_hate | 54.5455 |
| tweet_ev_irony | 74.3622 |
| tweet_ev_offensive | 84.8837 |
| tweet_ev_sentiment | 71.6949 |
| wic | 71.0031 |
| wnli | 69.0141 |
| wsc | 68.2692 |
| yahoo_answers | 71.3333 |
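The reported average can be re-derived from the per-dataset scores. The sketch below assumes the average is a plain unweighted mean over the 36 datasets, which is consistent with the reported figure.

```python
scores = {
    "20_newsgroup": 85.8072, "ag_news": 90.4333, "amazon_reviews_multi": 67.32,
    "anli": 59.625, "boolq": 85.107, "cb": 91.0714, "cola": 85.8102, "copa": 67,
    "dbpedia": 79.0333, "esnli": 91.6327, "financial_phrasebank": 82.5,
    "imdb": 94.02, "isear": 71.6428, "mnli": 89.5749, "mrpc": 89.7059,
    "multirc": 64.1708, "poem_sentiment": 88.4615, "qnli": 93.575,
    "qqp": 91.4148, "rotten_tomatoes": 89.6811, "rte": 86.2816, "sst2": 94.6101,
    "sst_5bins": 57.0588, "stsb": 91.5508, "trec_coarse": 97.6, "trec_fine": 91.2,
    "tweet_ev_emoji": 45.264, "tweet_ev_emotion": 82.6179, "tweet_ev_hate": 54.5455,
    "tweet_ev_irony": 74.3622, "tweet_ev_offensive": 84.8837,
    "tweet_ev_sentiment": 71.6949, "wic": 71.0031, "wnli": 69.0141,
    "wsc": 68.2692, "yahoo_answers": 71.3333,
}

# Unweighted mean over all datasets (assumption about how the average was computed).
average_score = sum(scores.values()) / len(scores)
baseline_average = 79.04  # reported average for microsoft/deberta-v3-base
gain = average_score - baseline_average

print(len(scores), round(average_score, 2), round(gain, 2))
```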

For more information, see: Model Recycling