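Model Description: The DistilBERT model was proposed in the blog post Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT, and in the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than bert-base-uncased and runs 60% faster, while preserving over 95% of BERT's performance as measured on the GLUE language understanding benchmark.
This model is a fine-tuned checkpoint of DistilBERT-base-cased, fine-tuned using (a second step of) knowledge distillation on SQuAD v1.1.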
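To make the term "knowledge distillation" more concrete, here is a minimal sketch of a soft-target loss: the student is penalized for diverging from the teacher's temperature-softened output distribution. This is an illustration only, not the training code used for this model. The function names, the temperature of 2.0 and the sample logits are all assumptions, and the actual DistilBERT objective described in the associated paper combines more than one loss term.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions.

    The temperature value is an illustrative assumption.
    """
    p = softmax(teacher_logits, temperature)  # teacher (soft targets)
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits over three classes.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]

print(distillation_loss(close_student, teacher))  # small: student agrees with teacher
print(distillation_loss(far_student, teacher))    # larger: student disagrees
```

A higher temperature flattens both distributions, so the student also learns from the relative probabilities the teacher assigns to the non-top classes.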
    >>> from transformers import pipeline
    >>> question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')

    >>> context = r"""
    ... Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
    ... question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
    ... a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
    ... """

    >>> result = question_answerer(question="What is a good example of a question answering dataset?", context=context)
    >>> print(
    ... f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}"
    ... )

    Answer: 'SQuAD dataset', score: 0.5152, start: 147, end: 160
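    from transformers import DistilBertTokenizer, DistilBertModel
    import torch

    tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-cased-distilled-squad')
    # DistilBertModel is the bare encoder: its output holds hidden states,
    # not start/end logits for answer extraction.
    model = DistilBertModel.from_pretrained('distilbert-base-cased-distilled-squad')

    question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

    # Encode the question and context together as a single input pair.
    inputs = tokenizer(question, text, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)

    print(outputs)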
    from transformers import DistilBertTokenizer, TFDistilBertForQuestionAnswering
    import tensorflow as tf

    tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
    model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")

    question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

    inputs = tokenizer(question, text, return_tensors="tf")
    outputs = model(**inputs)

    # Take the most likely start and end token positions.
    answer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0])
    answer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])

    # Slice the answer tokens out of the input and decode them back to text.
    predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
    print(tokenizer.decode(predict_answer_tokens))
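The TensorFlow example above takes the argmax of the start and end logits independently, which can yield an end position before the start position and therefore an empty answer. One common remedy is to search for the highest-scoring pair with start <= end. The sketch below shows that idea in plain Python on made-up logits. The function name `best_span` and the `max_answer_len` limit are assumptions for illustration, not part of the transformers API.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end, score) maximizing start_logits[s] + end_logits[e]
    subject to s <= e < s + max_answer_len."""
    best = None
    for s, s_score in enumerate(start_logits):
        last = min(len(end_logits), s + max_answer_len)
        for e in range(s, last):
            score = s_score + end_logits[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best


# Hypothetical logits for a five-token input.
start_logits = [0.1, 0.2, 3.0, 0.3, 0.1]
end_logits = [0.0, 2.5, 0.2, 2.0, 0.1]

start, end, score = best_span(start_logits, end_logits)
print(start, end)
```

With these sample logits, independent argmax would pick start 2 and end 1, an empty span, while the constrained search returns start 2 and end 3.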
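Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model can include disturbing and harmful stereotypes across protected classes, identity characteristics, and sensitive social and occupational groups. For example: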
    >>> from transformers import pipeline
    >>> question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')

    >>> context = r"""
    ... Alice is sitting on the bench. Bob is sitting next to her.
    ... """

    >>> result = question_answerer(question="Who is the CEO?", context=context)
    >>> print(
    ... f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}"
    ... )

    Answer: 'Bob', score: 0.7527, start: 32, end: 35
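The distilbert-base-cased model was trained using the same data as the distilbert-base-uncased model. The distilbert-base-uncased model card describes its training data as follows:
DistilBERT was pretrained on the same data as BERT: BookCorpus, a dataset consisting of 11,038 unpublished books, and English Wikipedia (excluding lists, tables and headers).
To learn more about the SQuAD v1.1 dataset, see the SQuAD v1.1 data card.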
Training Procedure, Preprocessing: See the distilbert-base-cased model card for further details.
Pretraining: See the distilbert-base-cased model card for further details.
As discussed in the model repository, this model reaches an F1 score of 87.1 on the SQuAD v1.1 dev set (for comparison, the bert-base-cased version of BERT reaches an F1 score of 88.7).
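The F1 figure above is a token-overlap score between predicted and reference answers. Below is a minimal sketch of how such a score is commonly computed for a single prediction against a single reference, assuming SQuAD-style normalization (lowercasing, dropping punctuation and the articles a/an/the). The function names `normalize_answer` and `token_f1` are illustrative. The official evaluation script remains the authority on details, such as taking the best score over several reference answers and averaging over all questions.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction, ground_truth):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("SQuAD dataset", "the SQuAD dataset"))  # full overlap after normalization
print(token_f1("SQuAD", "the SQuAD dataset"))          # partial overlap
```

Because the score is computed on tokens rather than whole strings, a prediction that captures only part of the reference answer still receives partial credit.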
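Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). We present the hardware type and hours used based on the associated paper. Note that these details apply only to the training of DistilBERT and do not include the fine-tuning on SQuAD.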
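As a rough illustration of what an estimate of this kind involves, the sketch below multiplies the energy drawn by the hardware by the carbon intensity of the electricity grid. It is a back-of-the-envelope formula under stated assumptions, not the calculator's actual implementation. The function name, its parameters and every sample value are hypothetical and are not taken from this model card or the associated paper.

```python
def estimate_emissions_kg(gpu_count, gpu_power_watts, hours,
                          carbon_intensity_kg_per_kwh, pue=1.0):
    """Estimate CO2-equivalent emissions in kilograms.

    energy (kWh) = GPUs x power per GPU (kW) x hours x PUE
    emissions    = energy x grid carbon intensity (kg CO2eq per kWh)

    `pue` (power usage effectiveness) accounts for data-center overhead;
    1.0 means no overhead and is an assumption.
    """
    energy_kwh = gpu_count * (gpu_power_watts / 1000.0) * hours * pue
    return energy_kwh * carbon_intensity_kg_per_kwh


# All inputs below are hypothetical placeholders.
estimate = estimate_emissions_kg(
    gpu_count=8,
    gpu_power_watts=300,
    hours=10,
    carbon_intensity_kg_per_kwh=0.4,
)
print(f"{estimate:.1f} kg CO2eq")
```

The result scales linearly with each input, so the choice of compute region (through its carbon intensity) can matter as much as the training time.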
See the associated paper for details on the modeling architecture, objective, compute infrastructure, and training.
    @inproceedings{sanh2019distilbert,
      title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
      author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
      booktitle={NeurIPS EMC^2 Workshop},
      year={2019}
    }
This model card was written by the Hugging Face team.