Model: MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli
Task: zero-shot-classification
Datasets: multi_nli, anli, fever, lingnli, alisawuffles/WANLI
This model was fine-tuned on the MultiNLI, Fever-NLI, Adversarial-NLI (ANLI), LingNLI and WANLI datasets, which comprise 885,242 NLI hypothesis-premise pairs. As of 06.06.22, it is the best-performing NLI model on the Hugging Face Hub and can be used for zero-shot classification. It significantly outperforms all other large models on the ANLI benchmark.
The base model is DeBERTa-v3-large from Microsoft. Compared to classical masked language models such as BERT and RoBERTa, DeBERTa-v3 combines several recent innovations; see the paper for details.
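The simplest way to use the model is the Transformers zero-shot classification pipeline, as in the snippet below: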
```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli")
sequence_to_classify = "Angela Merkel is a politician in Germany and leader of the CDU"
candidate_labels = ["politics", "economy", "entertainment", "environment"]
output = classifier(sequence_to_classify, candidate_labels, multi_label=False)
print(output)
```

NLI use case
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model_name = "MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)  # move the model to the same device as the inputs

premise = "I first thought that I liked the movie, but upon second thought it was actually disappointing."
hypothesis = "The movie was not good."

input = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt")
output = model(input["input_ids"].to(device))  # device = "cuda:0" or "cpu"
prediction = torch.softmax(output["logits"][0], -1).tolist()
label_names = ["entailment", "neutral", "contradiction"]
prediction = {name: round(float(pred) * 100, 1) for pred, name in zip(prediction, label_names)}
print(prediction)
```
DeBERTa-v3-large-mnli-fever-anli-ling-wanli was trained on the MultiNLI, Fever-NLI, Adversarial-NLI (ANLI), LingNLI and WANLI datasets, which comprise 885,242 NLI hypothesis-premise pairs. Note that SNLI was explicitly excluded due to quality issues with that dataset. More data does not necessarily make for a better NLI model.
DeBERTa-v3-large-mnli-fever-anli-ling-wanli was trained with the Hugging Face Trainer using the following hyperparameters. Note that in my tests, longer training and more epochs hurt performance (overfitting).
```python
training_args = TrainingArguments(
    num_train_epochs=4,              # total number of training epochs
    learning_rate=5e-06,
    per_device_train_batch_size=16,  # batch size per device during training
    gradient_accumulation_steps=2,   # doubles the effective batch_size to 32, while decreasing memory requirements
    per_device_eval_batch_size=64,   # batch size for evaluation
    warmup_ratio=0.06,               # ratio of warmup steps for the learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    fp16=True                        # mixed precision training
)
```
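For context, below is a minimal sketch of how these arguments could be wired into the Trainer. It is an assumption, not the author's published training script: it loads only MultiNLI for illustration (the released model was trained on the concatenation of five NLI datasets), and an `output_dir` may need to be added to the `TrainingArguments` above depending on your Transformers version.

```python
# Minimal sketch (assumption, not the author's exact training code) of NLI
# fine-tuning with the Hugging Face Trainer and the hyperparameters above.
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer

base_model = "microsoft/deberta-v3-large"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=3)

# Illustration only: a single source dataset; the released model also used
# Fever-NLI, ANLI, LingNLI and WANLI.
raw = load_dataset("multi_nli")

def tokenize(batch):
    # Encode premise-hypothesis pairs; padding is applied dynamically by the Trainer's collator.
    return tokenizer(batch["premise"], batch["hypothesis"], truncation=True)

tokenized = raw.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=training_args,                            # the TrainingArguments defined above
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation_matched"],
    tokenizer=tokenizer,
)
trainer.train()
```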
The model was evaluated on the test sets of MultiNLI, ANLI, LingNLI and WANLI and on the dev set of Fever-NLI. The metric is accuracy. The model achieves state-of-the-art performance on each dataset. Surprisingly, it outperforms the previous state-of-the-art on ANLI (ALBERT-XXL) by 8.3%. I assume this is because ANLI was created to fool masked language models like RoBERTa (or ALBERT), while DeBERTa-v3 uses a better pre-training objective (RTD) and disentangled attention, and it was fine-tuned on higher-quality NLI data.
| Datasets | mnli_test_m | mnli_test_mm | anli_test | anli_test_r3 | ling_test | wanli_test |
|---|---|---|---|---|---|---|
| Accuracy | 0.912 | 0.908 | 0.702 | 0.64 | 0.87 | 0.77 |
| Speed (text/sec, A100 GPU) | 696.0 | 697.0 | 488.0 | 425.0 | 828.0 | 980.0 |
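For illustration, here is a minimal sketch of how such an accuracy number could be reproduced on one test set. It is an assumption, not the author's evaluation script: it uses the matched MultiNLI validation split as a stand-in for `mnli_test_m` and assumes the dataset's label order (entailment, neutral, contradiction) matches the model's.

```python
# Minimal accuracy-evaluation sketch (assumption, not the author's exact code).
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

# Illustrative stand-in for the mnli_test_m column above.
data = load_dataset("multi_nli", split="validation_matched")

correct = 0
for example in data:
    inputs = tokenizer(example["premise"], example["hypothesis"],
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        pred = model(**inputs).logits.argmax(-1).item()  # 0=entailment, 1=neutral, 2=contradiction
    correct += int(pred == example["label"])

print(f"accuracy: {correct / len(data):.3f}")
```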
Please consult the original DeBERTa-v3 paper and the literature on the different NLI datasets for more information on the training data and potential biases. The model will reproduce the statistical patterns of its training data.
If you use this model, please cite: Laurer, Moritz, Wouter van Atteveldt, Andreu Salleras Casas, and Kasper Welbers. 2022. "Less Annotating, More Classifying – Addressing the Data Scarcity Issue of Supervised Machine Learning with Deep Transfer Learning and BERT-NLI". Preprint, June. Open Science Framework. https://osf.io/74b8k
If you have questions or ideas for cooperation, contact me at m{dot}laurer{at}vu{dot}nl or on LinkedIn.
Note that DeBERTa-v3 was released on 06.12.21 and older versions of HF Transformers seem to have problems running the model (e.g., issues with the tokenizer). Using Transformers >= 4.13 might solve some issues.
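A minimal runtime check for this requirement (a sketch based on the note above; `packaging` is normally installed alongside Transformers):

```python
# Fail early if the installed Transformers version predates DeBERTa-v3 support.
import transformers
from packaging import version

assert version.parse(transformers.__version__) >= version.parse("4.13"), (
    "Transformers >= 4.13 is recommended for DeBERTa-v3 models; "
    "upgrade with `pip install -U transformers`."
)
```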