Model:
distilroberta-base
This model is the base version of DistilRoBERTa, distilled from the RoBERTa-base model. It follows the same training procedure as DistilBERT, and the code for the distillation process can be found here. The model is case-sensitive: English and english are treated differently.
The model has 6 layers, a hidden size of 768, and 12 attention heads, for a total of 82M parameters (compared to 125M parameters for RoBERTa-base). On average, DistilRoBERTa is twice as fast as RoBERTa-base.
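As a quick sanity check on these sizes, the sketch below loads both encoders with the Transformers AutoModel class and counts their parameters (the exact totals depend on whether the language-modeling head is included; this counts the encoder only):

from transformers import AutoModel

for name in ["distilroberta-base", "roberta-base"]:
    model = AutoModel.from_pretrained(name)  # encoder only, no LM head
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.0f}M parameters")
# Expected output is roughly 82M for distilroberta-base and 125M for roberta-base.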
We encourage users of this model card to check out the RoBERTa-base model card to learn more about usage, limitations, and potential biases.
You can use the raw model for masked language modeling, but it is mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions of the task that interests you.
Note that this model is primarily intended to be fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification, or question answering. For tasks such as text generation, you should look at models like GPT-2.
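To illustrate the fine-tuning use case, here is a minimal sketch of sequence classification with the Transformers Trainer API. The sst2 subset of the glue dataset from the Datasets library is an assumed example task, and the hyperparameters are placeholders rather than tuned values:

from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForSequenceClassification.from_pretrained("distilroberta-base", num_labels=2)

# Tokenize the SST-2 sentences; truncation/padding keep batches rectangular.
dataset = load_dataset("glue", "sst2")
def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, padding="max_length", max_length=128)
dataset = dataset.map(tokenize, batched=True)

# Placeholder hyperparameters for a short demonstration run.
args = TrainingArguments(output_dir="distilroberta-sst2", num_train_epochs=1,
                         per_device_train_batch_size=16, learning_rate=2e-5)
trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"], eval_dataset=dataset["validation"])
trainer.train()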
The model should not be used to intentionally create hostile or alienating environments for people. The model was not trained to be a factual or true representation of people or events, so using the model to generate such content is out of scope for its abilities.
Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes, identity characteristics, and sensitive social and occupational groups. For example:
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='distilroberta-base')
>>> unmasker("The man worked as a <mask>.")
[{'score': 0.1237526461482048,
'sequence': 'The man worked as a waiter.',
'token': 38233,
'token_str': ' waiter'},
{'score': 0.08968018740415573,
'sequence': 'The man worked as a waitress.',
'token': 35698,
'token_str': ' waitress'},
{'score': 0.08387645334005356,
'sequence': 'The man worked as a bartender.',
'token': 33080,
'token_str': ' bartender'},
{'score': 0.061059024184942245,
'sequence': 'The man worked as a mechanic.',
'token': 25682,
'token_str': ' mechanic'},
{'score': 0.03804653510451317,
'sequence': 'The man worked as a courier.',
'token': 37171,
'token_str': ' courier'}]
>>> unmasker("The woman worked as a <mask>.")
[{'score': 0.23149248957633972,
'sequence': 'The woman worked as a waitress.',
'token': 35698,
'token_str': ' waitress'},
{'score': 0.07563332468271255,
'sequence': 'The woman worked as a waiter.',
'token': 38233,
'token_str': ' waiter'},
{'score': 0.06983394920825958,
'sequence': 'The woman worked as a bartender.',
'token': 33080,
'token_str': ' bartender'},
{'score': 0.05411609262228012,
'sequence': 'The woman worked as a nurse.',
'token': 9008,
'token_str': ' nurse'},
{'score': 0.04995106905698776,
'sequence': 'The woman worked as a maid.',
'token': 29754,
'token_str': ' maid'}]
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.
DistilRoBERTa was pre-trained on OpenWebTextCorpus, a reproduction of OpenAI's WebText dataset (roughly 4x less training data than the teacher model, RoBERTa). See the roberta-base model card for details on training.
When fine-tuned on downstream tasks, this model achieves the following results (see the GitHub Repo):
GLUE test results:
| Task | MNLI | QQP  | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE  |
|:----:|:----:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|
|      | 84.0 | 89.4 | 90.8 | 92.5  | 59.3 | 88.3  | 86.6 | 67.9 |
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
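The calculator itself is a web tool; as a back-of-the-envelope illustration of the estimate it performs (energy consumed multiplied by the carbon intensity of the local grid), here is a small sketch. All numbers are placeholders, not the actual training footprint of DistilRoBERTa:

# Rough CO2 estimate in the spirit of Lacoste et al. (2019).
# All inputs below are illustrative placeholders, not measured values.
def estimate_co2_kg(gpu_power_watts, num_gpus, hours,
                    carbon_intensity_kg_per_kwh, pue=1.0):
    """Energy used (kWh) times grid carbon intensity (kg CO2eq per kWh)."""
    energy_kwh = (gpu_power_watts * num_gpus / 1000.0) * hours * pue
    return energy_kwh * carbon_intensity_kg_per_kwh

# Hypothetical run: 8 GPUs at 250 W for 90 hours on a 0.4 kg CO2eq/kWh grid
print(f"{estimate_co2_kg(250, 8, 90, 0.4):.1f} kg CO2eq")  # ~72 kg CO2eq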
BibTeX:
@article{Sanh2019DistilBERTAD,
title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
author={Victor Sanh and Lysandre Debut and Julien Chaumond and Thomas Wolf},
journal={ArXiv},
year={2019},
volume={abs/1910.01108}
}
APA:
Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. ArXiv, abs/1910.01108.
You can use the model directly with a pipeline for masked language modeling:
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='distilroberta-base')
>>> unmasker("Hello I'm a <mask> model.")
[{'score': 0.04673689603805542,
'sequence': "Hello I'm a business model.",
'token': 265,
'token_str': ' business'},
{'score': 0.03846118599176407,
'sequence': "Hello I'm a freelance model.",
'token': 18150,
'token_str': ' freelance'},
{'score': 0.03308931365609169,
'sequence': "Hello I'm a fashion model.",
'token': 2734,
'token_str': ' fashion'},
{'score': 0.03018997237086296,
'sequence': "Hello I'm a role model.",
'token': 774,
'token_str': ' role'},
{'score': 0.02111748233437538,
'sequence': "Hello I'm a Playboy model.",
'token': 24526,
'token_str': ' Playboy'}]
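Beyond the fill-mask pipeline, the model can also be loaded directly to obtain contextual embeddings for a piece of text. A minimal PyTorch sketch using the standard AutoTokenizer and AutoModel classes:

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModel.from_pretrained("distilroberta-base")

text = "Replace me by any text you'd like."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional vector per input token
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 768])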