Intel/bert-large-uncased-sparse-90-unstructured-pruneofa

Model:

Intel/bert-large-uncased-sparse-90-unstructured-pruneofa

Task:

Fill-Mask

Libraries:

PyTorch TensorFlow Transformers

Datasets:

wikipedia bookcorpus

Language:

Other:

bert pretraining

Preprints:

arxiv:2111.05754 arxiv:1810.04805 arxiv:1910.01108

License:

apache-2.0


Model Details: 90% Sparse BERT-Large (uncased) Prune Once for All

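This model is a sparse pre-trained model that can be fine-tuned for a wide range of language tasks. Weight pruning is the process of forcing some of the weights of a neural network to zero. Setting some of the weights to zero turns the weight matrices into sparse matrices. Neural network computation relies heavily on matrix multiplication. If the matrices can be kept sparse while retaining enough of the important information, the overall computational cost can be reduced. The term "sparse" in the model's title refers to the sparsity ratio of the weights (90% for this model). For more details, see Zafrir et al. (2021).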
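The following minimal NumPy sketch makes unstructured weight pruning concrete. It zeroes the 90% smallest-magnitude entries of a random 1024x1024 matrix (the hidden size of BERT-Large). It then counts how many multiply-adds a sparse kernel would still need. The one-shot magnitude criterion is an illustrative assumption, and so are the helper names `magnitude_prune` and `sparsity_ratio`. This is not the pruning procedure used to produce this model.

```python
import numpy as np


def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the `sparsity` fraction of entries with the smallest absolute value.

    Unstructured pruning: any individual entry may be removed, regardless of its
    row or column, so the surviving non-zeros form an irregular pattern.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    magnitudes = np.abs(weights).ravel()
    k = int(round(sparsity * magnitudes.size))  # number of entries to set to zero
    mask = np.ones(magnitudes.size, dtype=bool)
    if k > 0:
        # argpartition moves the k smallest magnitudes into the first k slots (unordered)
        mask[np.argpartition(magnitudes, k - 1)[:k]] = False
    return weights * mask.reshape(weights.shape)


def sparsity_ratio(weights: np.ndarray) -> float:
    """Fraction of entries that are exactly zero."""
    return float(np.mean(weights == 0))


rng = np.random.default_rng(0)
dense = rng.standard_normal((1024, 1024))  # stand-in for one 1024x1024 weight matrix
pruned = magnitude_prune(dense, 0.9)

print(f"sparsity: {sparsity_ratio(pruned):.2f}")
# A sparse kernel needs only one multiply-add per non-zero weight per input column.
print(f"multiply-adds per input column, dense:  {dense.size}")
print(f"multiply-adds per input column, sparse: {np.count_nonzero(pruned)}")
```

The pruning is unstructured, so the surviving weights are scattered irregularly. Realizing a speed-up in practice therefore depends on kernels or runtimes that can exploit this kind of sparsity.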

Visualization of the Prune Once for All method from Zafrir et al. (2021).

Model Detail	Description
Model Authors - Company	Intel
Date	September 30, 2021
Version	1
Type	NLP - General sparse language model
Architecture	"The method consists of two steps, teacher preparation and student pruning. The sparse pre-trained model we trained is the model we use for transfer learning while maintaining its sparsity pattern. We call the method Prune Once for All since we show how to fine-tune the sparse pre-trained models for several language tasks while we prune the pre-trained model only once." (Zafrir et al., 2021)
Paper or Other Resources	Paper (arXiv:2111.05754); GitHub Repo
License	Apache 2.0
Questions or Comments	1238321 and 1239321
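The Architecture row describes a teacher-student pipeline in which the pre-trained model is pruned once. The sketch below shows two ingredients that such pipelines commonly combine:

- a gradual pruning schedule that raises the target sparsity toward 90% over training;
- a temperature-scaled knowledge-distillation loss that trains the pruned student to match the teacher's output distribution.

Both are assumptions for illustration. The cubic schedule follows the form proposed by Zhu and Gupta (2017), the loss follows Hinton et al. (2015), and the step counts and temperature are hypothetical. This card does not specify the schedule, loss weighting, or hyperparameters used for Prune OFA.

```python
import numpy as np


def cubic_sparsity(step, start_step, end_step, initial=0.0, final=0.9):
    """Gradual-pruning target sparsity of the Zhu & Gupta (2017) form.

    Sparsity climbs quickly right after `start_step` and flattens out as it
    approaches `final` at `end_step`.
    """
    if step <= start_step:
        return initial
    if step >= end_step:
        return final
    progress = (step - start_step) / (end_step - start_step)
    return final + (initial - final) * (1.0 - progress) ** 3


def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) over temperature-softened outputs, scaled by T**2."""
    p_teacher = softmax(teacher_logits, temperature)
    log_p_teacher = np.log(p_teacher)
    log_p_student = np.log(softmax(student_logits, temperature))
    kl = np.sum(p_teacher * (log_p_teacher - log_p_student), axis=-1)
    return float(np.mean(kl) * temperature ** 2)


# Target sparsity every 250 steps of a hypothetical 1000-step pruning window.
for step in range(0, 1001, 250):
    print(step, round(cubic_sparsity(step, start_step=0, end_step=1000), 4))

teacher = np.array([[2.0, 0.5, -1.0]])
print(distillation_loss(teacher, teacher))  # identical outputs give zero loss
print(distillation_loss([[0.0, 0.0, 0.0]], teacher) > 0)
```

In a real training loop, the schedule's value at each step would drive a magnitude-pruning routine. The distillation loss would typically be combined with the model's own pre-training loss.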

Intended Use	Description
Primary intended uses	This is a general sparse language model; in its current form it is not ready for downstream prediction tasks, but it can be fine-tuned for several language tasks, including (but not limited to) question answering, multi-genre natural language inference, and sentiment classification.
Primary intended users	Anyone who needs an efficient general language model for other downstream tasks.
Out-of-scope uses	The model should not be used to intentionally create hostile or alienating environments for people.
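Fine-tuning for these tasks should keep the sparsity pattern intact. The Architecture row refers to transfer learning "while maintaining its sparsity pattern". One common way to do this is to record a binary mask of the non-zero weights when the model is loaded. The mask is then re-applied after every optimizer step. This approach is assumed here rather than taken from the authors' code. The toy example below uses plain gradient descent on a linear least-squares problem; `MaskedWeight` and the data are hypothetical.

```python
import numpy as np


class MaskedWeight:
    """Hypothetical helper: a weight matrix whose zero pattern is frozen at load time."""

    def __init__(self, weights):
        self.value = np.array(weights, dtype=float)
        self.mask = self.value != 0  # sparsity pattern inherited from the pre-trained model

    def sgd_step(self, grad, lr):
        self.value -= lr * grad
        self.value *= self.mask  # re-zero pruned entries so the pattern survives the update


def mse(X, W, Y):
    """Mean over samples of the squared error summed over outputs."""
    return float(np.mean(np.sum((X @ W - Y) ** 2, axis=1)))


rng = np.random.default_rng(0)
w0 = rng.standard_normal((64, 16))
w0[rng.random(w0.shape) < 0.9] = 0.0  # toy stand-in for a ~90%-sparse pre-trained matrix
param = MaskedWeight(w0)

X = rng.standard_normal((256, 64))  # toy "downstream task" inputs
Y = rng.standard_normal((256, 16))  # toy targets

initial_loss = mse(X, param.value, Y)
for _ in range(200):
    residual = X @ param.value - Y
    grad = 2.0 * X.T @ residual / len(X)  # gradient of mse() with respect to W
    param.sgd_step(grad, lr=0.05)
final_loss = mse(X, param.value, Y)

print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
print("pruned entries still zero:", bool(np.all(param.value[~param.mask] == 0)))
```

In PyTorch, the same idea amounts to multiplying each pruned parameter by its saved mask inside a `torch.no_grad()` block right after `optimizer.step()`.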

How to use

Here is an example of how to import this model in Python:

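import transformers

# This checkpoint has no question-answering head, so the QA head is freshly initialized and must be fine-tuned (e.g. on SQuAD).
model = transformers.AutoModelForQuestionAnswering.from_pretrained('Intel/bert-large-uncased-sparse-90-unstructured-pruneofa')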

For more code examples, see the GitHub Repo.
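After loading or fine-tuning the model, it can be useful to check how sparse its weights actually are. The helper below is a hypothetical sketch, not part of the Transformers API. It reports the fraction of exact zeros in every parameter with at least two dimensions, which skips biases and LayerNorm vectors. The toy parameter names are made up. This card does not state which matrices were pruned. The overall figure for the real model may therefore differ from 90% if some matrices (for example, the embeddings) were left dense.

```python
import numpy as np


def report_sparsity(named_weights, min_ndim=2):
    """Print and return the fraction of zeros per weight matrix and overall.

    named_weights: iterable of (name, array-like) pairs. Tensors with fewer than
    `min_ndim` dimensions (biases, LayerNorm vectors) are skipped.
    """
    zeros = total = 0
    per_layer = {}
    for name, w in named_weights:
        w = np.asarray(w)
        if w.ndim < min_ndim:
            continue
        n_zero = int(np.count_nonzero(w == 0))
        per_layer[name] = n_zero / w.size
        zeros += n_zero
        total += w.size
    overall = zeros / total if total else 0.0
    for name, ratio in per_layer.items():
        print(f"{name:40s} {ratio:7.2%}")
    print(f"{'overall':40s} {overall:7.2%}")
    return overall, per_layer


# Toy input (hypothetical names): a 90%-zero matrix, a dense matrix, and a bias vector.
toy = {
    "encoder.layer.0.dense.weight": np.where(np.arange(100).reshape(10, 10) < 90, 0.0, 1.0),
    "embeddings.weight": np.ones((10, 10)),
    "encoder.layer.0.dense.bias": np.zeros(10),
}
overall, per_layer = report_sparsity(toy.items())
```

For the model loaded in the example above, one could pass `((name, p.detach().cpu().numpy()) for name, p in model.named_parameters())` to `report_sparsity`.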

Metrics (Model Performance):

Model	Model Size	SQuADv1.1 (EM/F1)	MNLI-m (Acc)	MNLI-mm (Acc)	QQP (Acc/F1)	QNLI (Acc)	SST-2 (Acc)
12311321	-	81.29/88.47	-	-	-	-	-
12312321	Medium	81.10/88.42	82.71	83.67	91.15/88.00	90.34	91.46
12313321	Medium	79.83/87.25	81.45	82.43	90.93/87.72	89.07	90.88
12314321	Large	83.35/90.20	83.74	84.20	91.48/88.43	91.39	92.95
12315321	Small	78.10/85.82	81.35	82.03	90.29/86.97	88.31	90.60
12316321	Small	76.91/84.82	80.68	81.47	90.05/86.67	87.66	90.02

All results are the mean of two separate experiments run with the same hyperparameters and different seeds.
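The SQuADv1.1 column reports exact match (EM) and token-level F1. As a rough guide to what these numbers measure, here is a reimplementation sketch of the scoring used by the official SQuAD v1.1 evaluation script, as we understand it. Answers are lowercased and stripped of punctuation, of the articles a/an/the, and of extra whitespace. Each prediction is scored against its best-matching reference answer. Scores are then averaged over questions and reported as percentages. The example questions and answers are made up.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and the articles a/an/the, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    return float(normalize_answer(prediction) == normalize_answer(reference))


def token_f1(prediction, reference):
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def squad_scores(predictions, references):
    """predictions: list of strings; references: list of lists of acceptable answers."""
    em = [max(exact_match(p, r) for r in refs) for p, refs in zip(predictions, references)]
    f1 = [max(token_f1(p, r) for r in refs) for p, refs in zip(predictions, references)]
    return 100 * sum(em) / len(em), 100 * sum(f1) / len(f1)


preds = ["The Eiffel Tower", "in Paris, France"]
refs = [["Eiffel Tower"], ["Paris"]]
print(squad_scores(preds, refs))
```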

Training and Evaluation Data	Description
Datasets	English Wikipedia (2500M words).
Motivation	To build an efficient and accurate base model for several downstream language tasks.
Preprocessing	"We use the English Wikipedia dataset (2500M words) for training the models on the pre-training task. We split the data into train (95%) and validation (5%) sets. Both sets are preprocessed as described in the models’ original papers (arXiv:1810.04805, arXiv:1910.01108). We process the data to use the maximum sequence length allowed by the models, however, we allow shorter sequences at a probability of 0.1."
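The preprocessing row mentions a 95%/5% train/validation split. It also says sequences are usually filled to the maximum length but are occasionally shorter, with probability 0.1. The sketch below illustrates both steps. The length sampling mirrors the logic of BERT's original pre-training data script as we understand it. That logic reserves three positions for special tokens and, with probability 0.1, draws a random target length of at least 2. Several details are assumptions because the card does not give them: splitting at the document level, the seed, and the function names.

```python
import random


def split_train_validation(documents, validation_fraction=0.05, seed=0):
    """Shuffle documents and hold out `validation_fraction` of them (95%/5% by default)."""
    docs = list(documents)
    random.Random(seed).shuffle(docs)
    n_val = int(round(len(docs) * validation_fraction))
    return docs[n_val:], docs[:n_val]


def sample_target_length(max_seq_length, rng, short_seq_prob=0.1):
    """Usually fill the whole sequence; with probability `short_seq_prob`, pick a shorter one.

    Three positions are reserved for the [CLS] and two [SEP] tokens of a
    sentence-pair input (an assumption borrowed from BERT-style pre-training).
    """
    max_num_tokens = max_seq_length - 3
    if rng.random() < short_seq_prob:
        return rng.randint(2, max_num_tokens)
    return max_num_tokens


train, valid = split_train_validation(range(1000))
print(len(train), len(valid))

rng = random.Random(42)
lengths = [sample_target_length(512, rng) for _ in range(10_000)]
short_fraction = sum(n < 509 for n in lengths) / len(lengths)
print(f"fraction of shortened sequences: {short_fraction:.3f}")
```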

Ethical Considerations	Description
Data	The training data come from Wikipedia articles
Human life	The model is not intended to inform decisions central to human life or flourishing. It is an aggregated set of labelled Wikipedia articles.
Mitigations	No additional risk mitigation strategies were considered during model development.
Risks and harms	Significant research has explored bias and fairness issues with language models (see, e.g., 12320321 , and 12321321 ). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. Beyond this, the extent of the risks involved in using the model remains unknown.
Use cases	-

Caveats and Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. There are no additional caveats or recommendations for this model.

BibTeX entry and citation info

@article{zafrir2021prune,
  title={Prune Once for All: Sparse Pre-Trained Language Models},
  author={Zafrir, Ofir and Larey, Ariel and Boudoukh, Guy and Shen, Haihao and Wasserblat, Moshe},
  journal={arXiv preprint arXiv:2111.05754},
  year={2021}
}

Author:

Intel

Dataset size:

2.62 GB