Model:
MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli
This model was trained on the MultiNLI, Fever-NLI and Adversarial-NLI (ANLI) datasets, which comprise 763,913 NLI hypothesis-premise pairs. This base model outperforms almost all large models on the ANLI benchmark. The underlying base model is DeBERTa-v3-base from Microsoft. The v3 variant of DeBERTa substantially outperforms previous versions of the model by including a different pre-training objective; see annex 11 of the original DeBERTa paper.
For the highest performance (but slower inference), it is recommended to use https://huggingface.co/MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli .
Simple zero-shot classification pipeline:

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli")
sequence_to_classify = "Angela Merkel is a politician in Germany and leader of the CDU"
candidate_labels = ["politics", "economy", "entertainment", "environment"]
output = classifier(sequence_to_classify, candidate_labels, multi_label=False)
print(output)
```

NLI use case:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")  # device = "cuda:0" or "cpu"

model_name = "MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)  # move the model to the same device as the inputs

premise = "I first thought that I liked the movie, but upon second thought it was actually disappointing."
hypothesis = "The movie was good."

input = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt")
output = model(input["input_ids"].to(device))
prediction = torch.softmax(output["logits"][0], -1).tolist()
label_names = ["entailment", "neutral", "contradiction"]
prediction = {name: round(float(pred) * 100, 1) for pred, name in zip(prediction, label_names)}
print(prediction)
```
DeBERTa-v3-base-mnli-fever-anli was trained on the MultiNLI, Fever-NLI and Adversarial-NLI (ANLI) datasets, which comprise 763,913 NLI hypothesis-premise pairs.
DeBERTa-v3-base-mnli-fever-anli was trained with the Hugging Face Trainer using the following hyperparameters.
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    num_train_epochs=3,              # total number of training epochs
    learning_rate=2e-05,
    per_device_train_batch_size=32,  # batch size per device during training
    per_device_eval_batch_size=32,   # batch size for evaluation
    warmup_ratio=0.1,                # fraction of training steps used for learning rate warmup
    weight_decay=0.06,               # strength of weight decay
    fp16=True                        # mixed precision training
)
```
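The exact training script is not part of this card. As a rough illustration only, the hyperparameters above could be wired into the Hugging Face `Trainer` as in the sketch below; `encoded_train` and `encoded_eval` are assumed placeholders for tokenized hypothesis-premise pairs with integer NLI labels and are not defined here.

```python
# Illustrative sketch only, not the original training script.
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer

base_checkpoint = "microsoft/deberta-v3-base"  # fine-tuning starts from the base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(base_checkpoint, num_labels=3)

trainer = Trainer(
    model=model,
    args=training_args,           # the TrainingArguments defined above
    train_dataset=encoded_train,  # assumed: tokenized MultiNLI + Fever-NLI + ANLI pairs
    eval_dataset=encoded_eval,    # assumed: tokenized held-out NLI pairs
    tokenizer=tokenizer,
)
trainer.train()
```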
The model was evaluated on the MultiNLI test set, the ANLI test set and the Fever-NLI dev set; the metric used is accuracy.
| mnli-m | mnli-mm | fever-nli | anli-all | anli-r3 |
|---|---|---|---|---|
| 0.903 | 0.903 | 0.777 | 0.579 | 0.495 |
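The evaluation code is not reproduced in this card. The sketch below shows one way an accuracy figure of this kind could be computed; the use of the `multi_nli` dataset from the Hub, the matched validation split, and the 100-example slice are assumptions made here for illustration, not the exact setup behind the table.

```python
# Minimal accuracy-evaluation sketch (illustrative assumptions, not the original script).
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

# Small slice of the MultiNLI matched validation set, purely for illustration.
dataset = load_dataset("multi_nli", split="validation_matched[:100]")

correct = 0
for example in dataset:
    inputs = tokenizer(example["premise"], example["hypothesis"], truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # MultiNLI labels: 0 = entailment, 1 = neutral, 2 = contradiction (same order as the model outputs)
    correct += int(logits.argmax(-1).item() == example["label"])

print(f"accuracy: {correct / len(dataset):.3f}")
```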
Please consult the original DeBERTa paper and the literature on the different NLI datasets for potential biases.
If you use this model, please cite: Laurer, Moritz, Wouter van Atteveldt, Andreu Salleras Casas, and Kasper Welbers. 2022. 'Less Annotating, More Classifying – Addressing the Data Scarcity Issue of Supervised Machine Learning with Deep Transfer Learning and BERT-NLI'. Preprint, June. Open Science Framework. https://osf.io/74b8k .
If you have questions or ideas for cooperation, contact me at m{dot}laurer{at}vu{dot}nl or via LinkedIn.
Note that DeBERTa-v3 was released on 06.12.21 and older versions of HF Transformers seem to have issues running the model (e.g., problems with the tokenizer). Using Transformers >= 4.13 might solve some issues.
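A quick sanity check along these lines can confirm that the installed version meets this requirement (assuming the `packaging` package is available, which is a standard dependency of recent Transformers releases):

```python
import transformers
from packaging import version

# Fail early if the installed Transformers version predates 4.13.
assert version.parse(transformers.__version__) >= version.parse("4.13.0"), \
    "Please upgrade Transformers, e.g. `pip install --upgrade transformers`"
```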
Evaluation on 36 datasets using MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli as a base model yields an average score of 79.69, compared with 79.04 for microsoft/deberta-v3-base.
As of January 9, 2023, this model is ranked 2nd among all models tested for the microsoft/deberta-v3-base architecture.
Results:
20_newsgroup | ag_news | amazon_reviews_multi | anli | boolq | cb | cola | copa | dbpedia | esnli | financial_phrasebank | imdb | isear | mnli | mrpc | multirc | poem_sentiment | qnli | qqp | rotten_tomatoes | rte | sst2 | sst_5bins | stsb | trec_coarse | trec_fine | tweet_ev_emoji | tweet_ev_emotion | tweet_ev_hate | tweet_ev_irony | tweet_ev_offensive | tweet_ev_sentiment | wic | wnli | wsc | yahoo_answers |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
85.8072 | 90.4333 | 67.32 | 59.625 | 85.107 | 91.0714 | 85.8102 | 67 | 79.0333 | 91.6327 | 82.5 | 94.02 | 71.6428 | 89.5749 | 89.7059 | 64.1708 | 88.4615 | 93.575 | 91.4148 | 89.6811 | 86.2816 | 94.6101 | 57.0588 | 91.5508 | 97.6 | 91.2 | 45.264 | 82.6179 | 54.5455 | 74.3622 | 84.8837 | 71.6949 | 71.0031 | 69.0141 | 68.2692 | 71.3333 |
For more information, see: Model Recycling