Model:
distilroberta-base
This model is the base version of DistilRoBERTa, distilled from the RoBERTa-base model. It follows the same training procedure as DistilBERT, and the code for the distillation process can be found here. The model is case-sensitive: English and english are treated differently.
The model has 6 layers, a hidden size of 768, and 12 attention heads, for a total of 82M parameters (compared to 125M parameters for RoBERTa-base). On average, DistilRoBERTa is twice as fast as RoBERTa-base.
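As a quick sanity check on these sizes, the sketch below loads both encoders with the Transformers AutoModel class and counts their parameters (the exact totals depend on whether the language-modeling head is included; this counts the encoder only):

from transformers import AutoModel

for name in ["distilroberta-base", "roberta-base"]:
    model = AutoModel.from_pretrained(name)  # encoder only, no LM head
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.0f}M parameters")
# Expected output is roughly 82M for distilroberta-base and 125M for roberta-base.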
We encourage users of this model card to check out the RoBERTa-base model card to learn more about usage, limitations, and potential biases.
You can use the raw model for masked language modeling, but it is mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions of the task that interests you.
Note that this model is primarily intended to be fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification, or question answering. For tasks such as text generation, you should look at models like GPT-2.
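To illustrate the fine-tuning use case, here is a minimal sketch of sequence classification with the Transformers Trainer API. The sst2 subset of the glue dataset from the Datasets library is an assumed example task, and the hyperparameters are placeholders rather than tuned values:

from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForSequenceClassification.from_pretrained("distilroberta-base", num_labels=2)

# Tokenize the SST-2 sentences; truncation/padding keep batches rectangular.
dataset = load_dataset("glue", "sst2")
def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, padding="max_length", max_length=128)
dataset = dataset.map(tokenize, batched=True)

# Placeholder hyperparameters for a short demonstration run.
args = TrainingArguments(output_dir="distilroberta-sst2", num_train_epochs=1,
                         per_device_train_batch_size=16, learning_rate=2e-5)
trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"], eval_dataset=dataset["validation"])
trainer.train()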
The model should not be used to intentionally create hostile or alienating environments for people. The model was not trained to be a factual or true representation of people or events, so using the model to generate such content is out of scope for its abilities.
Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes, identity characteristics, and sensitive social and occupational groups. For example:
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='distilroberta-base')
>>> unmasker("The man worked as a <mask>.")
[{'score': 0.1237526461482048,
'sequence': 'The man worked as a waiter.',
'token': 38233,
'token_str': ' waiter'},
{'score': 0.08968018740415573,
'sequence': 'The man worked as a waitress.',
'token': 35698,
'token_str': ' waitress'},
{'score': 0.08387645334005356,
'sequence': 'The man worked as a bartender.',
'token': 33080,
'token_str': ' bartender'},
{'score': 0.061059024184942245,
'sequence': 'The man worked as a mechanic.',
'token': 25682,
'token_str': ' mechanic'},
{'score': 0.03804653510451317,
'sequence': 'The man worked as a courier.',
'token': 37171,
'token_str': ' courier'}]
>>> unmasker("The woman worked as a <mask>.")
[{'score': 0.23149248957633972,
'sequence': 'The woman worked as a waitress.',
'token': 35698,
'token_str': ' waitress'},
{'score': 0.07563332468271255,
'sequence': 'The woman worked as a waiter.',
'token': 38233,
'token_str': ' waiter'},
{'score': 0.06983394920825958,
'sequence': 'The woman worked as a bartender.',
'token': 33080,
'token_str': ' bartender'},
{'score': 0.05411609262228012,
'sequence': 'The woman worked as a nurse.',
'token': 9008,
'token_str': ' nurse'},
{'score': 0.04995106905698776,
'sequence': 'The woman worked as a maid.',
'token': 29754,
'token_str': ' maid'}]
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.
DistilRoBERTa was pre-trained on OpenWebTextCorpus, a reproduction of OpenAI's WebText dataset (roughly 4x less training data than the teacher model, RoBERTa). See the roberta-base model card for details on training.
When fine-tuned on downstream tasks, this model achieves the following results (see the GitHub Repo):
GLUE test results:
| Task | MNLI | QQP  | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE  |
|:----:|:----:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|
|      | 84.0 | 89.4 | 90.8 | 92.5  | 59.3 | 88.3  | 86.6 | 67.9 |
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
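The calculator itself is a web tool; as a back-of-the-envelope illustration of the estimate it performs (energy consumed multiplied by the carbon intensity of the local grid), here is a small sketch. All numbers are placeholders, not the actual training footprint of DistilRoBERTa:

# Rough CO2 estimate in the spirit of Lacoste et al. (2019).
# All inputs below are illustrative placeholders, not measured values.
def estimate_co2_kg(gpu_power_watts, num_gpus, hours,
                    carbon_intensity_kg_per_kwh, pue=1.0):
    """Energy used (kWh) times grid carbon intensity (kg CO2eq per kWh)."""
    energy_kwh = (gpu_power_watts * num_gpus / 1000.0) * hours * pue
    return energy_kwh * carbon_intensity_kg_per_kwh

# Hypothetical run: 8 GPUs at 250 W for 90 hours on a 0.4 kg CO2eq/kWh grid
print(f"{estimate_co2_kg(250, 8, 90, 0.4):.1f} kg CO2eq")  # ~72 kg CO2eq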
BibTeX:
@article{Sanh2019DistilBERTAD,
title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
author={Victor Sanh and Lysandre Debut and Julien Chaumond and Thomas Wolf},
journal={ArXiv},
year={2019},
volume={abs/1910.01108}
}
APA:
Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. ArXiv, abs/1910.01108.
You can use the model directly with a pipeline for masked language modeling:
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='distilroberta-base')
>>> unmasker("Hello I'm a <mask> model.")
[{'score': 0.04673689603805542,
'sequence': "Hello I'm a business model.",
'token': 265,
'token_str': ' business'},
{'score': 0.03846118599176407,
'sequence': "Hello I'm a freelance model.",
'token': 18150,
'token_str': ' freelance'},
{'score': 0.03308931365609169,
'sequence': "Hello I'm a fashion model.",
'token': 2734,
'token_str': ' fashion'},
{'score': 0.03018997237086296,
'sequence': "Hello I'm a role model.",
'token': 774,
'token_str': ' role'},
{'score': 0.02111748233437538,
'sequence': "Hello I'm a Playboy model.",
'token': 24526,
'token_str': ' Playboy'}]
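Beyond the fill-mask pipeline, the model can also be loaded directly to obtain contextual embeddings for a piece of text. A minimal PyTorch sketch using the standard AutoTokenizer and AutoModel classes:

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModel.from_pretrained("distilroberta-base")

text = "Replace me by any text you'd like."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional vector per input token
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 768])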