Model Details: Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length
Dynamic-TinyBERT is fine-tuned for the NLP task of question answering and was trained on the SQuAD 1.1 dataset (Guskin et al., 2021).
Note:
Dynamic-TinyBERT is a TinyBERT model that uses sequence-length reduction and hyperparameter optimization to improve inference efficiency under any computational budget. Dynamic-TinyBERT is trained only once, performs on par with BERT, and achieves a better accuracy-speedup trade-off than any other efficient approach (up to 3.3x speedup with less than 1% loss-drop).
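To make the sequence-length reduction idea concrete, here is a minimal sketch that simply truncates the tokenized input to a fixed length budget at inference time; the budget value below is a placeholder for illustration, not one of the budgets actually searched for Dynamic-TinyBERT.

from transformers import AutoTokenizer

# Hypothetical length budget, used only for illustration; Dynamic-TinyBERT
# tunes such budgets to trade a small accuracy drop for higher speed.
LENGTH_BUDGET = 256

tokenizer = AutoTokenizer.from_pretrained("Intel/dynamic_tinybert")
inputs = tokenizer(
    "What does Dynamic-TinyBERT reduce at inference time?",
    "Dynamic-TinyBERT shortens the effective sequence length to speed up inference.",
    truncation="only_second",  # shorten the context, keep the question intact
    max_length=LENGTH_BUDGET,
    return_tensors="pt",
)
print(inputs["input_ids"].shape)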
| Model Detail | Description |
|---|---|
| Model Authors - Company | Intel |
| Model Card Authors | Intel in collaboration with Hugging Face |
| Date | November 22, 2021 |
| Version | 1 |
| Type | NLP - Question Answering |
| Architecture | "For our Dynamic-TinyBERT model we use the architecture of TinyBERT6L: a small BERT model with 6 layers, a hidden size of 768, a feed forward size of 3072 and 12 heads." (Guskin et al., 2021) |
| Paper or Other Resources | https://arxiv.org/abs/2111.09645; 1235321; 1236321 |
| License | Apache 2.0 |
| Questions or Comments | 1237321 and 1238321 |
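As a concrete reading of the architecture description above, the sketch below builds a BERT configuration with the quoted TinyBERT6L dimensions; it is only an illustration of those hyperparameters, not necessarily the exact configuration shipped with the checkpoint.

from transformers import BertConfig

# TinyBERT6L dimensions quoted in the model description:
# 6 layers, hidden size 768, feed-forward size 3072, 12 attention heads.
tinybert6l_config = BertConfig(
    num_hidden_layers=6,
    hidden_size=768,
    intermediate_size=3072,
    num_attention_heads=12,
)
print(tinybert6l_config)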
| Intended Use | Description |
|---|---|
| Primary intended uses | You can use the model for the NLP task of question answering: given a corpus of text, you can ask it a question about that text, and it will find the answer in the text. |
| Primary intended users | Anyone doing question answering |
| Out-of-scope uses | The model should not be used to intentionally create hostile or alienating environments for people. |
How to use
Here is how to import this model in Python:
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
tokenizer = AutoTokenizer.from_pretrained("Intel/dynamic_tinybert")
model = AutoModelForQuestionAnswering.from_pretrained("Intel/dynamic_tinybert")
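Once the tokenizer and model are loaded, a minimal extractive-QA call looks like the sketch below; the question and context strings are made up for illustration.

import torch

question = "What does Dynamic-TinyBERT optimize?"
context = "Dynamic-TinyBERT reduces sequence lengths at inference time to make question answering faster."

# Encode the (question, context) pair and run the model
inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The answer span is given by the most likely start and end token positions
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
answer = tokenizer.decode(inputs["input_ids"][0][start : end + 1])
print(answer)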
| Factors | Description |
|---|---|
| Groups | Many Wikipedia articles with question and answer labels are contained in the training data |
| Instrumentation | - |
| Environment | Training was completed on a Titan GPU. |
| Card Prompts | Model deployment on alternate hardware and software will change model performance |
| Metrics | Description |
|---|---|
| Model performance measures | F1 |
| Decision thresholds | - |
| Approaches to uncertainty and variability | - |
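For reference, SQuAD-style F1 is the token-overlap F1 between the predicted and gold answer strings; the helper below is a simplified sketch that skips the answer normalization (beyond lowercasing) performed by the official SQuAD evaluation script.

from collections import Counter

def squad_f1(prediction: str, ground_truth: str) -> float:
    # Token-overlap F1 between a predicted and a gold answer string.
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(squad_f1("the dynamic sequence length", "dynamic sequence length"))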
| Training and Evaluation Data | Description |
|---|---|
| Datasets | SQuAD1.1: "Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable." (https://rajpurkar.github.io/SQuAD-explorer/) |
| Motivation | To build an efficient and accurate model for the question answering task. |
| Preprocessing | "We start with a pre-trained general-TinyBERT student, which was trained to learn the general knowledge of BERT using the general-distillation method presented by TinyBERT. We perform transformer distillation from a fine-tuned BERT teacher to the student, following the same training steps used in the original TinyBERT: (1) intermediate-layer distillation (ID) — learning the knowledge residing in the hidden states and attentions matrices, and (2) prediction-layer distillation (PD) — fitting the predictions of the teacher." (Guskin et al., 2021) |
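To make the two quoted distillation stages concrete, here is a simplified sketch of the corresponding losses in the spirit of the TinyBERT recipe; the projection layer, layer mapping, and temperature are assumptions for illustration, not the exact training code behind this model.

import torch.nn.functional as F

def intermediate_layer_loss(student_hidden, teacher_hidden,
                            student_attn, teacher_attn, proj):
    # Intermediate-layer distillation (ID): match hidden states and
    # attention matrices. `proj` is an assumed linear layer that maps the
    # student hidden size to the teacher hidden size when they differ.
    hidden_loss = F.mse_loss(proj(student_hidden), teacher_hidden)
    attention_loss = F.mse_loss(student_attn, teacher_attn)
    return hidden_loss + attention_loss

def prediction_layer_loss(student_logits, teacher_logits, temperature=1.0):
    # Prediction-layer distillation (PD): fit the teacher's predictions.
    # KL divergence on softened logits is used here as a stand-in for the
    # soft cross-entropy loss described in the TinyBERT paper.
    return F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2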
Model Performance Analysis:

| Model | Max F1 (full model) | Best Speedup within BERT-1% |
|---|---|---|
| Dynamic-TinyBERT | 88.71 | 3.3x |
| Ethical Considerations | Description |
|---|---|
| Data | The training data come from Wikipedia articles |
| Human life | The model is not intended to inform decisions central to human life or flourishing. It is an aggregated set of labelled Wikipedia articles. |
| Mitigations | No additional risk mitigation strategies were considered during model development. |
| Risks and harms | Significant research has explored bias and fairness issues with language models (see, e.g., 12311321 and 12312321). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. Beyond this, the extent of the risks involved in using the model remains unknown. |
| Use cases | - |
Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. There are no additional caveats or recommendations for this model.
BibTeX entry and citation info
@misc{https://doi.org/10.48550/arxiv.2111.09645,
doi = {10.48550/ARXIV.2111.09645},
url = {https://arxiv.org/abs/2111.09645},
author = {Guskin, Shira and Wasserblat, Moshe and Ding, Ke and Kim, Gyuwan},
keywords = {Computation and Language (cs.CL), Machine Learning (cs.LG), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length},
publisher = {arXiv},
year = {2021},
}