Model:
Intel/bert-base-uncased-sparse-85-unstructured-pruneofa
This model is a sparse pre-trained model that can be fine-tuned for a wide range of language tasks. Weight pruning is the process of forcing some of the weights of a neural network to zero. Setting some of the weights to zero results in sparser matrices. Updating neural network weights involves matrix multiplication, and if we can keep the matrices sparse while retaining enough of the important information, we can reduce the overall computational overhead. The term "sparse" in the model name refers to the sparsity ratio of the weights; for more details, see Zafrir et al. (2021).
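As a rough illustration of unstructured weight pruning (magnitude-based pruning of a single matrix, shown here only to illustrate the idea of sparsity, not the Prune Once for All training procedure itself), a minimal PyTorch sketch:

```python
import torch

def magnitude_prune_(weight: torch.Tensor, sparsity: float = 0.85) -> torch.Tensor:
    """Force the smallest-magnitude entries to zero so that roughly
    `sparsity` of the weights become exactly zero (unstructured pruning)."""
    k = int(weight.numel() * sparsity)                      # number of entries to zero out
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values   # k-th smallest magnitude
    mask = weight.abs() > threshold                         # keep only larger-magnitude weights
    return weight.mul_(mask)                                # in-place: pruned entries become exactly 0

w = torch.randn(768, 768)
magnitude_prune_(w)
print(f"sparsity: {(w == 0).float().mean().item():.2%}")    # roughly 85% of the entries are zero
```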
Visualization of the Prune Once for All method from Zafrir et al. (2021):
Model Detail | Description |
---|---|
Model Authors - Company | Intel |
Date | September 30, 2021 |
Version | 1 |
Type | NLP - General sparse language model |
Architecture | "The method consists of two steps, teacher preparation and student pruning. The sparse pre-trained model we trained is the model we use for transfer learning while maintaining its sparsity pattern. We call the method Prune Once for All since we show how to fine-tune the sparse pre-trained models for several language tasks while we prune the pre-trained model only once." 1235321 |
Paper or Other Resources | Zafrir et al. (2021), "Prune Once for All: Sparse Pre-Trained Language Models" (arXiv:2111.05754); GitHub Repo |
License | Apache 2.0 |
Questions or Comments | 1238321 and 1239321 |
Intended Use | Description |
---|---|
Primary intended uses | This is a general sparse language model; in its current form, it is not ready for downstream prediction tasks, but it can be fine-tuned for several language tasks including (but not limited to) question-answering, genre natural language inference, and sentiment classification. |
Primary intended users | Anyone who needs an efficient general language model for other downstream tasks. |
Out-of-scope uses | The model should not be used to intentionally create hostile or alienating environments for people. |
Here is how to import this model in Python:
```python
import transformers

model = transformers.AutoModelForQuestionAnswering.from_pretrained('Intel/bert-base-uncased-sparse-85-unstructured-pruneofa')
```
For more code examples, please refer to the GitHub Repo.
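Since the checkpoint name advertises 85% unstructured sparsity, a quick sanity check after loading is to count the zero-valued entries in the weight matrices. A minimal sketch, assuming a PyTorch backend; the restriction to 2-D, non-embedding weights is an assumption about which parameters were pruned, so the exact percentage may differ slightly:

```python
import torch
import transformers

model = transformers.AutoModelForQuestionAnswering.from_pretrained(
    'Intel/bert-base-uncased-sparse-85-unstructured-pruneofa'
)

total = zeros = 0
for name, param in model.named_parameters():
    # Count only 2-D weight matrices outside the embeddings; biases,
    # LayerNorm parameters and embeddings are assumed to be left dense.
    if param.dim() == 2 and 'embeddings' not in name:
        total += param.numel()
        zeros += (param == 0).sum().item()

print(f"zero-valued weights: {zeros / total:.2%}")
```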
Model | Model Size | SQuADv1.1 (EM/F1) | MNLI-m (Acc) | MNLI-mm (Acc) | QQP (Acc/F1) | QNLI (Acc) | SST-2 (Acc) |
---|---|---|---|---|---|---|---|
12311321 | - | 81.29/88.47 | - | - | - | - | - |
12312321 | Medium | 81.10/88.42 | 82.71 | 83.67 | 91.15/88.00 | 90.34 | 91.46 |
12313321 | Medium | 79.83/87.25 | 81.45 | 82.43 | 90.93/87.72 | 89.07 | 90.88 |
12314321 | Large | 83.35/90.20 | 83.74 | 84.20 | 91.48/88.43 | 91.39 | 92.95 |
12315321 | Small | 78.10/85.82 | 81.35 | 82.03 | 90.29/86.97 | 88.31 | 90.60 |
12316321 | Small | 76.91/84.82 | 80.68 | 81.47 | 90.05/86.67 | 87.66 | 90.02 |
All results are the average of two separate experiments with the same hyper-parameters and different seeds.
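The scores above come from fine-tuning the sparse base model on each task. As a minimal sketch of what such a run could look like (SST-2 is used as the example; the `datasets` library, `num_labels`, and every hyper-parameter below are illustrative assumptions rather than the authors' recipe):

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = 'Intel/bert-base-uncased-sparse-85-unstructured-pruneofa'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# GLUE SST-2: binary sentiment classification on single sentences.
dataset = load_dataset('glue', 'sst2')
encoded = dataset.map(
    lambda batch: tokenizer(batch['sentence'], truncation=True, max_length=128),
    batched=True,
)

args = TrainingArguments(
    output_dir='sst2-pruneofa',          # hypothetical output directory
    learning_rate=2e-5,                  # illustrative hyper-parameters
    per_device_train_batch_size=32,
    num_train_epochs=3,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded['train'],
    eval_dataset=encoded['validation'],
    tokenizer=tokenizer,
)
trainer.train()
```

Note that a plain training loop like this does not by itself keep the pruned weights at zero during fine-tuning; see the GitHub Repo referenced above for the authors' own fine-tuning examples.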
Training and Evaluation Data | Description |
---|---|
Datasets | English Wikipedia Dataset (2500M words). |
Motivation | To build an efficient and accurate base model for several downstream language tasks. |
Preprocessing | "We use the English Wikipedia dataset (2500M words) for training the models on the pre-training task. We split the data into train (95%) and validation (5%) sets. Both sets are preprocessed as described in the models’ original papers ( 12318321 , 12319321 ). We process the data to use the maximum sequence length allowed by the models, however, we allow shorter sequences at a probability of 0:1." |
Ethical Considerations | Description |
---|---|
Data | The training data come from Wikipedia articles. |
Human life | The model is not intended to inform decisions central to human life or flourishing. It is an aggregated set of labelled Wikipedia articles. |
Mitigations | No additional risk mitigation strategies were considered during model development. |
Risks and harms | Significant research has explored bias and fairness issues with language models (see, e.g., 12320321 and 12321321 ). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. Beyond this, the extent of the risks involved in using the model remains unknown. |
Use cases | - |
Caveats and Recommendations |
---|
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. There are no additional caveats or recommendations for this model. |
```bibtex
@article{zafrir2021prune,
  title={Prune Once for All: Sparse Pre-Trained Language Models},
  author={Zafrir, Ofir and Larey, Ariel and Boudoukh, Guy and Shen, Haihao and Wasserblat, Moshe},
  journal={arXiv preprint arXiv:2111.05754},
  year={2021}
}
```