Model:

google/switch-base-16

Task:

Text2Text Generation

Library:

PyTorch Transformers

Dataset:

c4

Language:

English

Other:

switch_transformers AutoTrain Compatible

Preprints:

arxiv:2101.03961 arxiv:1910.09700

License:

apache-2.0

Model Card for Switch Transformers Base - 16 experts

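Table of Contents

TL;DR

Model Details

Usage

Uses

Bias, Risks, and Limitations

Training Details

Evaluation

Environmental Impact

Citation

Model Card Authors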

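TL;DR

Switch Transformers is a Mixture of Experts (MoE) model trained on a masked language modeling (MLM) task. Its architecture is similar to the classic T5, but the feed-forward layers are replaced by sparse MLP layers that contain "expert" MLPs. According to the original paper, the model trains faster (it has better scaling properties) and outperforms T5 on fine-tuned tasks. As mentioned in the first few lines of the abstract:

we advance the current scale of language models by pre-training up to trillion parameter models on the "Colossal Clean Crawled Corpus" and achieve a 4x speedup over the T5-XXL model.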
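To make the "sparse MLP layer with expert MLPs" concrete, here is a minimal pure-Python sketch of top-1 ("switch") routing for a single token: a router computes a softmax over the experts, the token is sent only to the highest-probability expert, and that expert's output is scaled by its router probability. The names (`switch_layer`, `ToyExpert`), the toy dimensions and the random weights are illustrative assumptions; the sketch leaves out the expert capacity limit, the load-balancing loss and batching, and it is not the Transformers or t5x implementation.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    # matrix is a list of rows; returns matrix @ vec
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class ToyExpert:
    """Hypothetical tiny feed-forward expert: d_model -> d_ff (ReLU) -> d_model."""

    def __init__(self, w_in, w_out):
        self.w_in = w_in    # d_ff rows, each of length d_model
        self.w_out = w_out  # d_model rows, each of length d_ff

    def __call__(self, x):
        hidden = [max(0.0, h) for h in matvec(self.w_in, x)]
        return matvec(self.w_out, hidden)


def switch_layer(token, router_weights, experts):
    """Top-1 routing for one token vector; returns (expert index, output)."""
    probs = softmax(matvec(router_weights, token))  # one probability per expert
    idx = max(range(len(experts)), key=lambda i: probs[i])
    # Only the selected expert runs; its output is scaled by the router probability
    return idx, [probs[idx] * v for v in experts[idx](token)]


# Toy setup: 16 experts as in switch-base-16, but made-up sizes and random weights
rng = random.Random(0)
d_model, d_ff, num_experts = 4, 8, 16


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


router = rand_matrix(num_experts, d_model)
experts = [ToyExpert(rand_matrix(d_ff, d_model), rand_matrix(d_model, d_ff)) for _ in range(num_experts)]

for token in rand_matrix(3, d_model):
    idx, out = switch_layer(token, router, experts)
    print(f"routed to expert {idx}, output length {len(out)}")
```

Because each token activates a single expert, adding experts grows the parameter count while the compute per token stays roughly that of one feed-forward block.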

Disclaimer: The content of this model card was written by the Hugging Face team, and parts of it were copied and pasted from the original paper.

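Model Details

Model Description

Model type: Language model
Language(s) (NLP): English
License: Apache 2.0
Related Models: All Switch Transformers Checkpoints
Original Checkpoints: All Original Switch Transformers Checkpoints
Resources for more information: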

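Usage

Note that these checkpoints were trained on a masked language modeling (MLM) task. Therefore, they are not "ready to use" for downstream tasks. You may want to check FLAN-T5 for running fine-tuned weights, or fine-tune your own MoE following this notebook.

Below are some example scripts showing how to use the model with the transformers library: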

Using the PyTorch model

Running the model on a CPU

from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-16")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-16")

# Sentinel tokens <extra_id_N> mark the masked spans the model should fill in
input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
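The decoded output follows the same span-corruption format as training: each sentinel is followed by the text predicted for the matching `<extra_id_N>` in the input. The hypothetical helper below (`fill_sentinels`, not part of Transformers) merges those predictions back into the prompt. It works on the decoded string shown above and assumes the `<pad>` and `</s>` markers can simply be dropped.

```python
import re

# Matches sentinel tokens such as <extra_id_0> and captures the number
SENTINEL = re.compile(r"<extra_id_(\d+)>")


def fill_sentinels(masked_text, generated_text):
    """Hypothetical helper: substitute predicted spans back into the masked input."""
    # Drop special tokens that do not belong to any span
    cleaned = generated_text.replace("<pad>", "").replace("</s>", "")
    # split() with a capture group yields [prefix, id0, span0, id1, span1, ...]
    parts = SENTINEL.split(cleaned)
    spans = {int(parts[i]): parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}
    # Sentinels without a prediction are left untouched
    return SENTINEL.sub(lambda m: spans.get(int(m.group(1)), m.group(0)), masked_text)


input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
generated = "<pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>"
print(fill_sentinels(input_text, generated))
# A man walks into a bar a orders a beer with a pinch of salt.
```

The trailing `<extra_id_4>` prediction has no counterpart in the input, so it is ignored.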

Running the model on a GPU

# pip install accelerate
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-16")
# device_map="auto" (requires accelerate) places the weights on the available GPU(s)
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-16", device_map="auto")

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
# Move the input ids to GPU 0
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>

Running the model on a GPU using different precisions

FP16

# pip install accelerate
import torch
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-16")
# Load the weights in half precision (float16)
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-16", device_map="auto", torch_dtype=torch.float16)

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>

INT8

# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, BitsAndBytesConfig, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-16")
# Quantize the weights to 8-bit with bitsandbytes while loading
model = SwitchTransformersForConditionalGeneration.from_pretrained(
    "google/switch-base-16",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
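As a rough guide to why lower precisions help, the weights alone take about `num_params × bytes_per_parameter` bytes. The sketch below assumes 4, 2 and 1 bytes per parameter for float32, float16 and int8, and ignores activations, generation caches and framework overhead; with bitsandbytes some modules may also stay in higher precision, so treat the int8 figure as approximate. The parameter count in the example is a placeholder, not the size of switch-base-16; for a loaded model, `sum(p.numel() for p in model.parameters())` gives the real count.

```python
# Assumed storage cost per parameter for each precision
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gib(num_params, dtype):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


num_params = 1_000_000_000  # placeholder value, not the real size of switch-base-16
for dtype in ("float32", "float16", "int8"):
    print(f"{dtype}: {weight_memory_gib(num_params, dtype):.2f} GiB")
# float32: 3.73 GiB
# float16: 1.86 GiB
# int8: 0.93 GiB
```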

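Uses

Direct Use and Downstream Use

See the research paper for further details.

Out-of-Scope Use

More information needed.

Bias, Risks, and Limitations

More information needed.

Ethical Considerations and Risks

More information needed.

Known Limitations

More information needed.

Sensitive Use:

More information needed.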

Training Details

Training Data

The model was trained on a masked language modeling task on the Colossal Clean Crawled Corpus (C4) dataset, following the same procedure as T5.
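T5's masked language modeling objective is span corruption: contiguous spans of the input are replaced by unique sentinel tokens, and the target lists each sentinel followed by the tokens it replaced, ending with one extra sentinel. The sketch below assumes word-level tokens and hand-picked spans for readability, whereas the actual preprocessing samples spans randomly over SentencePiece tokens; the function name `span_corrupt` is hypothetical.

```python
def span_corrupt(tokens, spans):
    """Build an (input, target) pair in T5 span-corruption style.

    tokens: list of strings; spans: non-overlapping (start, end) index pairs, end exclusive.
    """
    inputs, targets = [], []
    prev_end = 0
    for i, (start, end) in enumerate(sorted(spans)):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[prev_end:start])  # keep the unmasked text
        inputs.append(sentinel)                # one sentinel replaces the whole span
        targets.append(sentinel)
        targets.extend(tokens[start:end])      # the target spells out the masked span
        prev_end = end
    inputs.extend(tokens[prev_end:])
    targets.append(f"<extra_id_{len(spans)}>")  # final sentinel closes the target
    return " ".join(inputs), " ".join(targets)


tokens = "Thank you for inviting me to your party last week .".split()
model_input, model_target = span_corrupt(tokens, [(2, 4), (8, 9)])
print(model_input)   # Thank you <extra_id_0> me to your party <extra_id_1> week .
print(model_target)  # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```

This is the same sentinel format the inference examples above use: the prompt carries the sentinels, and the model generates the target side.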

Training Procedure

According to the model card from the original paper, the model was trained on TPU v3 or TPU v4 pods, using the t5x codebase together with jax.

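Evaluation

Testing Data, Factors & Metrics

The authors evaluated the model on a variety of tasks and compared the results against T5. See the table below for some quantitative evaluation results:

For full details, please check the research paper.

Results

For the full results of Switch Transformers, see Table 5 of the research paper.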

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
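Roughly, this kind of estimate multiplies the energy consumed by the carbon intensity of the electricity used. A simplified sketch, assuming a constant average power draw per device and a single regional carbon-intensity figure; all numbers in the example are placeholders, since the hours used, compute region and emissions for this model are not reported.

```python
def estimate_co2_kg(power_watts_per_device, num_devices, hours, kg_co2_per_kwh):
    """Energy (kWh) times grid carbon intensity (kg CO2eq per kWh)."""
    energy_kwh = power_watts_per_device * num_devices * hours / 1000
    return energy_kwh * kg_co2_per_kwh


# Placeholder inputs for illustration only; the real values for this model are unknown
print(estimate_co2_kg(power_watts_per_device=200, num_devices=4, hours=10, kg_co2_per_kwh=0.5))
# 4.0
```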

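Hardware Type: Google Cloud TPU Pods - TPU v3 or TPU v4 | Number of chips ≥ 4
Hours used: More information needed
Cloud Provider: GCP
Compute Region: More information needed
Carbon Emitted: More information needed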

Citation

BibTeX:

@misc{https://doi.org/10.48550/arxiv.2101.03961,
  doi = {10.48550/ARXIV.2101.03961},
  url = {https://arxiv.org/abs/2101.03961},
  author = {Fedus, William and Zoph, Barret and Shazeer, Noam},
  keywords = {Machine Learning (cs.LG), Artificial Intelligence (cs.AI), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity},
  publisher = {arXiv},
  year = {2021},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

Author:

Google AI

Dataset size:

4 GB