Model:

google/switch-large-128

Task:

Text2Text Generation

Libraries:

PyTorch Transformers

Datasets:

c4

Languages:

English
Other:

switch_transformers AutoTrain Compatible

Preprints:

arxiv:2101.03961 arxiv:1910.09700

License:

apache-2.0

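Model Card for Switch Transformers Large - 128 experts

TL;DR

Switch Transformers is a Mixture of Experts (MoE) model trained on a masked language modeling (MLM) task. Its architecture is similar to the classic T5, but the feed-forward layers are replaced by sparse MLP layers containing "expert" MLPs. According to the original paper, the model enables faster training (better scaling properties) while outperforming T5 on fine-tuned tasks. As stated in the first lines of the abstract:

We advance the current scale of language models by pre-training up to trillion parameter models on the "Colossal Clean Crawled Corpus", and achieve a 4x speedup over the T5-XXL model.

Disclaimer: The content of this model card was written by the Hugging Face team, and parts of it were taken from the original paper.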
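
To make the "sparse MLP layer with experts" idea from the TL;DR concrete, here is a minimal PyTorch sketch of top-1 routing: a small router scores every token, each token goes to its single highest-scoring expert, and the expert output is scaled by the router probability. The class name `Top1SparseMLP` and all sizes are hypothetical and chosen for illustration. This is not the implementation in transformers or t5x, and it leaves out details such as expert capacity limits and the auxiliary load-balancing loss.

```python
import torch
import torch.nn as nn


class Top1SparseMLP(nn.Module):
    """Illustrative sparse MLP: a router sends each token to exactly one expert."""

    def __init__(self, d_model, d_ff, num_experts):
        super().__init__()
        # The router produces one logit per expert for every token.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is an ordinary two-layer feed-forward block.
        self.experts = nn.ModuleList(
            [
                nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
                for _ in range(num_experts)
            ]
        )

    def forward(self, hidden):
        # hidden has shape (num_tokens, d_model).
        probs = torch.softmax(self.router(hidden), dim=-1)
        gate, expert_index = probs.max(dim=-1)  # top-1 probability and expert id per token
        output = torch.zeros_like(hidden)
        for i, expert in enumerate(self.experts):
            mask = expert_index == i
            if mask.any():
                # Only the tokens routed to expert i pass through it, scaled by their gate value.
                output[mask] = gate[mask].unsqueeze(-1) * expert(hidden[mask])
        return output, expert_index


torch.manual_seed(0)
layer = Top1SparseMLP(d_model=8, d_ff=16, num_experts=4)
tokens = torch.randn(5, 8)
out, chosen = layer(tokens)
print(out.shape, chosen.tolist())
```

Because every token visits only one expert, the compute per token stays close to that of a single dense feed-forward block, while the total parameter count grows with the number of experts.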

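Model Details

Model Description

Model type: Language model
Language(s) (NLP): English
License: Apache 2.0
Related models: All Switch Transformers Checkpoints
Original checkpoints: All Original Switch Transformers Checkpoints
Resources for more information: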

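Usage

Note that these checkpoints were trained on a masked language modeling (MLM) task, so they are not ready to use for downstream tasks. For fine-tuned weights, see FLAN-T5; alternatively, fine-tune your own MoE model by following this notebook.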
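
Since the checkpoints expect T5-style sentinel tokens, it may help to see how a masked input and its target are laid out. The sketch below is a simplified illustration in plain Python: the helper `corrupt_spans` is hypothetical, operates on whitespace-split words, and takes hand-picked spans, whereas the real T5 preprocessing samples spans at random over tokenizer output.

```python
def corrupt_spans(words, spans):
    """Build an (input, target) pair in sentinel format.

    words: list of whitespace-split words.
    spans: sorted, non-overlapping (start, end) word-index pairs to mask, end exclusive.
    """
    input_parts, target_parts = [], []
    cursor = 0
    for sentinel_id, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{sentinel_id}>"
        # The input keeps the unmasked words and replaces the span with one sentinel.
        input_parts.extend(words[cursor:start])
        input_parts.append(sentinel)
        # The target lists the sentinel followed by the words it hides.
        target_parts.append(sentinel)
        target_parts.extend(words[start:end])
        cursor = end
    input_parts.extend(words[cursor:])
    # A final sentinel closes the target sequence.
    target_parts.append(f"<extra_id_{len(spans)}>")
    return " ".join(input_parts), " ".join(target_parts)


words = "A man walks into a bar and orders a beer".split()
source, target = corrupt_spans(words, spans=[(1, 2), (9, 10)])
print(source)  # A <extra_id_0> walks into a bar and orders a <extra_id_1>
print(target)  # <extra_id_0> man <extra_id_1> beer <extra_id_2>
```

The generated sequences in the usage examples follow the same target layout, which is why their output starts with `<extra_id_0>` and ends with one extra sentinel.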

Below are some example scripts showing how to use the model in transformers:

Using the PyTorch model

Running the model on a CPU

from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-large-128")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-large-128")

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# Output: <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>

Running the model on a GPU

# pip install accelerate
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-large-128")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-large-128", device_map="auto")

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# Output: <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>

Running the model on a GPU using different precisions

FP16

# pip install accelerate
import torch  # needed for torch.float16 below
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-large-128")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-large-128", device_map="auto", torch_dtype=torch.float16)

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# Output: <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>

INT8

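# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, BitsAndBytesConfig, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-large-128")
# Request 8-bit weights explicitly; without a quantization config the model loads in its default precision.
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-large-128", device_map="auto", quantization_config=BitsAndBytesConfig(load_in_8bit=True))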

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# Output: <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
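
The decoded output in the examples above is a sequence of sentinels and predicted spans, not a completed sentence. As a rough sketch of one way to merge it back into the prompt, the hypothetical helper `fill_sentinels` below uses only the standard library; it assumes the sentinel layout shown in the examples and is not part of transformers.

```python
import re

SENTINEL = re.compile(r"<extra_id_(\d+)>")


def fill_sentinels(input_text, generated_text):
    """Substitute each sentinel in input_text with the span predicted for it."""
    # Drop special tokens that are not sentinels.
    cleaned = generated_text.replace("<pad>", "").replace("</s>", "")
    # re.split with one capture group yields [prefix, id, span, id, span, ...].
    pieces = SENTINEL.split(cleaned)
    predictions = {int(pieces[i]): pieces[i + 1].strip() for i in range(1, len(pieces), 2)}
    # Sentinels without a prediction are left in place.
    return SENTINEL.sub(lambda m: predictions.get(int(m.group(1)), m.group(0)), input_text)


prompt = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
decoded = "<pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>"
print(fill_sentinels(prompt, decoded))
# A man walks into a bar a orders a beer with a pinch of salt.
```

The text after the last sentinel in the generated sequence (here the final period) is not tied to any sentinel in the prompt, so the helper ignores it.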

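Uses

Direct Use and Downstream Use

See the research paper for further details.

Out-of-Scope Use

More information needed.

Bias, Risks, and Limitations

More information needed.

Ethical considerations and risks

More information needed.

Known Limitations

More information needed.

Sensitive Use:

More information needed.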

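Training Details

Training Data

The model was trained on a masked language modeling task on the Colossal Clean Crawled Corpus (C4) dataset, following the same procedure as T5.

Training Procedure

According to the model card from the original paper, the model was trained on TPU v3 or TPU v4 pods, using the t5x codebase together with jax.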

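Evaluation

Testing Data, Factors & Metrics

The authors evaluated the model on various tasks and compared the results against T5. See the table below for some quantitative evaluation results. For full details, check the research paper.

Results

For full results for Switch Transformers, see the research paper, Table 5.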

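Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware type: Google Cloud TPU Pods - TPU v3 or TPU v4 | Number of chips ≥ 4.
Hours used: More information needed
Cloud provider: GCP
Compute region: More information needed
Carbon emitted: More information needed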
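
Once the missing fields above are known, an estimate along the lines of the calculator could be sketched as energy drawn multiplied by the carbon intensity of the grid. The function `estimate_co2_kg` and its parameters are hypothetical, the PUE term is an added assumption, and the example values are placeholders that do not describe the actual training run.

```python
def estimate_co2_kg(chip_count, hours, watts_per_chip, pue, kg_co2_per_kwh):
    """Estimate emissions as energy drawn (kWh) times grid carbon intensity."""
    # Chip power in watts times hours gives watt-hours; divide by 1000 for kWh.
    # PUE scales chip energy up to account for data center overhead.
    energy_kwh = chip_count * watts_per_chip * hours / 1000.0 * pue
    return energy_kwh * kg_co2_per_kwh


# Placeholder inputs only: none of these values describe the actual training run.
example = estimate_co2_kg(
    chip_count=4, hours=10.0, watts_per_chip=200.0, pue=1.1, kg_co2_per_kwh=0.5
)
print(f"{example:.1f} kg CO2eq")
```

Real values for hours used and compute region would have to come from the authors, since the card lists both as unknown.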

Citation

BibTeX:

@misc{https://doi.org/10.48550/arxiv.2101.03961,
  doi = {10.48550/ARXIV.2101.03961},
  url = {https://arxiv.org/abs/2101.03961},
  author = {Fedus, William and Zoph, Barret and Shazeer, Noam},
  keywords = {Machine Learning (cs.LG), Artificial Intelligence (cs.AI), FOS: Computer and information sciences},
  title = {Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity},
  publisher = {arXiv},
  year = {2021},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

Authors:

Google AI

Dataset size:

49.13 GB
