DeBERTa-v3-base-mnli

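Model description

This model was trained on the MultiNLI dataset, which consists of 392,702 NLI hypothesis-premise pairs. The base model is DeBERTa-v3-base from Microsoft. The v3 variant of DeBERTa substantially outperforms previous versions of the model by using a different pre-training objective; see Appendix 11 of the original DeBERTa paper. For a more capable model, see DeBERTa-v3-base-mnli-fever-anli, which was trained on more data.

Intended uses & limitations

How to use the model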

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Use the first GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model_name = "MoritzLaurer/DeBERTa-v3-base-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)
premise = "I first thought that I liked the movie, but upon second thought it was actually disappointing."
hypothesis = "The movie was good."
# Encode the premise-hypothesis pair and move all input tensors to the model's device
inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt").to(device)
with torch.no_grad():  # inference only, no gradients needed
    output = model(**inputs)
# Softmax over the three NLI classes, reported as percentages
prediction = torch.softmax(output.logits[0], -1).tolist()
label_names = ["entailment", "neutral", "contradiction"]
prediction = {name: round(float(pred) * 100, 1) for pred, name in zip(prediction, label_names)}
print(prediction)
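The last three lines of the usage example turn the model's three raw logits into rounded percentages. The dependency-free sketch below reproduces just that post-processing step so it can be checked without downloading the model. It assumes the logits arrive in the order entailment, neutral, contradiction (the order used in label_names); the function name logits_to_percentages and the sample logits are hypothetical, not real model output.

```python
import math

LABEL_NAMES = ["entailment", "neutral", "contradiction"]


def logits_to_percentages(logits, label_names=LABEL_NAMES):
    """Apply a numerically stable softmax, then round each probability to a percentage."""
    if len(logits) != len(label_names):
        raise ValueError("expected exactly one logit per label")
    # Subtracting the largest logit prevents overflow in exp() and leaves the softmax unchanged
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {name: round(e / total * 100, 1) for name, e in zip(label_names, exps)}


# Hypothetical logits, for illustration only
print(logits_to_percentages([-2.0, 0.5, 3.0]))
# → {'entailment': 0.6, 'neutral': 7.5, 'contradiction': 91.8}
```

The max-subtraction trick matters little for three well-scaled logits, but it keeps the helper safe if it is reused on larger values.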

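Training data

The model was trained on the MultiNLI dataset, which consists of 392,702 NLI hypothesis-premise pairs.

Training procedure

DeBERTa-v3-base-mnli was trained with the Hugging Face Trainer using the following hyperparameters.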

from transformers import TrainingArguments
training_args = TrainingArguments(
    output_dir="./results",          # hypothetical output path; required by older Transformers versions
    num_train_epochs=5,              # total number of training epochs
    learning_rate=2e-05,             # peak learning rate
    per_device_train_batch_size=32,  # batch size per device during training
    per_device_eval_batch_size=32,   # batch size for evaluation
    warmup_ratio=0.1,                # fraction of total training steps used for learning-rate warmup
    weight_decay=0.06,               # strength of weight decay
    fp16=True,                       # mixed precision training (needs a GPU)
)
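Because warmup_ratio is a fraction rather than a step count, it helps to see what these settings imply in optimizer steps. The sketch below works this out for the 392,702 training pairs, assuming a single device, no gradient accumulation, the last partial batch kept, and the Trainer's default "linear" learning-rate schedule (warmup to the peak, then linear decay to zero). The helper names training_schedule and linear_lr are hypothetical; linear_lr mirrors the formula of the linear warmup schedule but is not the library function itself.

```python
import math


def training_schedule(num_examples, batch_size, num_epochs, warmup_ratio):
    """Estimate steps per epoch, total optimizer steps and warmup steps (single device assumed)."""
    # One optimizer step per batch; the final partial batch still counts as a step
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * num_epochs
    # Warmup length is a fraction of all optimizer steps, rounded up
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return steps_per_epoch, total_steps, warmup_steps


def linear_lr(step, peak_lr, warmup_steps, total_steps):
    """Learning rate at a given step: linear warmup to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


steps_per_epoch, total_steps, warmup_steps = training_schedule(392_702, 32, 5, 0.1)
print(steps_per_epoch, total_steps, warmup_steps)            # 12272 61360 6136
print(linear_lr(warmup_steps, 2e-05, warmup_steps, total_steps))  # 2e-05 (the peak)
```

With more than one GPU or with gradient accumulation the step counts shrink accordingly, so these numbers only hold under the single-device assumption.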

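Eval results

The model was evaluated on the matched test set and achieves an accuracy of 0.90.

Limitations and bias

Please consult the original DeBERTa paper and the literature on the different NLI datasets for potential biases.

BibTeX entry and citation info

If you want to cite this model, please cite the original DeBERTa paper and the respective NLI datasets, and include a link to this model on the Hugging Face Hub.

Ideas for cooperation or questions?

If you have questions or ideas for cooperation, contact me at m{dot}laurer{at}vu{dot}nl or on LinkedIn.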

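Debugging and issues

Note that DeBERTa-v3 was released only recently, and older versions of HF Transformers seem to have trouble running the model (e.g. issues with the tokenizer). Using Transformers==4.13 might solve some of these issues.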
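To find out which Transformers version is installed before loading the model, a minimal standard-library sketch could look like this. It assumes, as an interpretation rather than something the card states, that 4.13 is treated as a minimum version rather than an exact pin. The helpers parse_version and transformers_at_least are hypothetical, and the parser is deliberately simplified (it keeps only the leading digits of the first three components and ignores pre-release tags).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(v):
    """Turn strings like '4.13.0', '4.13' or '4.13.0.dev0' into a comparable (major, minor, patch) tuple."""
    parts = []
    for piece in v.split(".")[:3]:
        # Keep only the leading digits, e.g. '0rc1' -> 0
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def transformers_at_least(minimum="4.13.0"):
    """Return True if the installed transformers package is at least `minimum`."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return False  # transformers is not installed at all
    return parse_version(installed) >= parse_version(minimum)


print(transformers_at_least())
```

For full PEP 440 handling, packaging.version.Version is the more robust choice; the hand-rolled parser only avoids an extra dependency.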

Model Recycling

Evaluation on 36 datasets using MoritzLaurer/DeBERTa-v3-base-mnli as a base model yields an average score of 80.01, compared with 79.04 for microsoft/deberta-v3-base.

As of September 1, 2023, this model ranks 1st among all tested models with the microsoft/deberta-v3-base architecture.

Results:

Dataset	Score
20_newsgroup	86.0196
ag_news	90.6333
amazon_reviews_multi	66.96
anli	60.0938
boolq	83.792
cb	83.9286
cola	86.5772
copa	72
dbpedia	79.2
esnli	91.419
financial_phrasebank	85.1
imdb	94.232
isear	71.5124
mnli	89.4426
mrpc	90.4412
multirc	63.7583
poem_sentiment	86.5385
qnli	93.8129
qqp	91.9144
rotten_tomatoes	89.8687
rte	85.9206
sst2	95.4128
sst_5bins	57.3756
stsb	91.377
trec_coarse	97.4
trec_fine	91
tweet_ev_emoji	47.302
tweet_ev_emotion	83.6031
tweet_ev_hate	57.6431
tweet_ev_irony	77.1684
tweet_ev_offensive	83.3721
tweet_ev_sentiment	70.2947
wic	71.7868
wnli	67.6056
wsc	74.0385
yahoo_answers	71.7
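The reported average of 80.01 is consistent with an unweighted mean of the 36 per-dataset scores. The sketch below recomputes it and the gap to the microsoft/deberta-v3-base average of 79.04, assuming (this is not stated explicitly) that Model Recycling averages the datasets with equal weight. The scores are transcribed from the results table; the variable names are illustrative.

```python
import statistics

# Per-dataset scores transcribed from the Model Recycling results table
# (MoritzLaurer/DeBERTa-v3-base-mnli used as the base model)
scores = {
    "20_newsgroup": 86.0196, "ag_news": 90.6333, "amazon_reviews_multi": 66.96,
    "anli": 60.0938, "boolq": 83.792, "cb": 83.9286, "cola": 86.5772,
    "copa": 72, "dbpedia": 79.2, "esnli": 91.419, "financial_phrasebank": 85.1,
    "imdb": 94.232, "isear": 71.5124, "mnli": 89.4426, "mrpc": 90.4412,
    "multirc": 63.7583, "poem_sentiment": 86.5385, "qnli": 93.8129, "qqp": 91.9144,
    "rotten_tomatoes": 89.8687, "rte": 85.9206, "sst2": 95.4128, "sst_5bins": 57.3756,
    "stsb": 91.377, "trec_coarse": 97.4, "trec_fine": 91, "tweet_ev_emoji": 47.302,
    "tweet_ev_emotion": 83.6031, "tweet_ev_hate": 57.6431, "tweet_ev_irony": 77.1684,
    "tweet_ev_offensive": 83.3721, "tweet_ev_sentiment": 70.2947, "wic": 71.7868,
    "wnli": 67.6056, "wsc": 74.0385, "yahoo_answers": 71.7,
}
BASELINE_AVG = 79.04  # reported average for microsoft/deberta-v3-base

# Equal-weight mean over all datasets
avg = statistics.mean(scores.values())
print(len(scores), round(avg, 2))    # 36 80.01
print(round(avg - BASELINE_AVG, 2))  # 0.97
```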

For more information, see Model Recycling.

Author:

Moritz Laurer

Dataset size:

1.38 GB