Model:

google/switch-base-32

Task:

Text2Text Generation

Library:

PyTorch Transformers

Dataset:

c4

Language:

English

Other:

switch_transformers AutoTrain Compatible

Preprints:

arxiv:2101.03961 arxiv:1910.09700

License:

apache-2.0

Model Card for Switch Transformers Base - 32 experts

TL;DR

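Switch Transformers is a Mixture of Experts (MoE) model trained on a Masked Language Modeling (MLM) task. The model architecture is similar to the classic T5, but the feed-forward layers are replaced by sparse MLP layers that contain "expert" MLPs. As stated in the first lines of the abstract: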
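To make the "sparse MLP layer containing expert MLPs" concrete, here is a minimal pure-Python sketch of top-1 ("switch") routing for a single token, following the formulation in the Switch Transformers paper: a router computes one probability per expert, only the highest-probability expert is evaluated, and its output is scaled by that probability. The names `switch_route` and `router_weights` and the two toy experts are hypothetical; this is not the `transformers` implementation, and it leaves out batching, expert capacity and the auxiliary load-balancing loss.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def switch_route(
    x: Sequence[float],
    router_weights: Sequence[Sequence[float]],  # hypothetical: one weight row per expert
    experts: Sequence[Callable[[Sequence[float]], Vector]],
) -> Tuple[int, Vector]:
    """Top-1 routing for one token representation x.

    Returns the index i of the chosen expert and p_i(x) * E_i(x).
    """
    # Router logits: a dot product of x with each expert's weight row.
    logits = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    probs = softmax(logits)
    # Only the single most probable expert runs (the source of the sparsity).
    i = max(range(len(probs)), key=probs.__getitem__)
    expert_out = experts[i](x)
    # Scale the expert output by its router probability.
    return i, [probs[i] * v for v in expert_out]


experts = [
    lambda v: [2.0 * a for a in v],  # toy expert 0: scale by 2
    lambda v: [-a for a in v],       # toy expert 1: negate
]
idx, y = switch_route([1.0, 0.0], [[0.0, 0.0], [2.0, 0.0]], experts)
print(idx, y)  # expert 1 is chosen: its logit (2.0) beats expert 0's (0.0)
```

Because only one expert runs per token, adding experts grows the parameter count while the compute per token stays roughly that of a single MLP, which is the trade-off the model exploits.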

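We advance the current scale of language models by pre-training up to trillion parameter models on the "Colossal Clean Crawled Corpus", and achieve a 4x speedup over the T5-XXL model.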

Disclaimer: The content of this model card was written by the Hugging Face team, and parts of it were copied and pasted from the original paper.

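Model Details

Model Description

Model type: Language model
Language(s) (NLP): English
License: Apache 2.0
Related Models: All Switch Transformers Checkpoints
Original Checkpoints: All Original Switch Transformers Checkpoints
Resources for more information: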

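Usage

Note that these checkpoints were trained on a Masked Language Modeling (MLM) task. Therefore they are not ready to use out of the box for downstream tasks. You may want to look at FLAN-T5 for fine-tuned weights, or fine-tune your own MoE by following this notebook.

Below are some example scripts showing how to use the model with the transformers library:

Using the PyTorch model

Running the model on a CPU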

from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-32")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-32")

# Sentinel tokens <extra_id_N> mark the masked spans the model should fill in
input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Generate the missing spans and decode them back to text
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
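The model returns only the predicted spans, each introduced by the sentinel it replaces, and the output ends with one extra sentinel (here `<extra_id_4>`) that has no counterpart in the input. A small helper can splice the spans back into the prompt. This is a hypothetical sketch, not a transformers API; it assumes the decoded string keeps the `<pad>`, `</s>` and `<extra_id_N>` markers exactly as in the output shown above.

```python
import re
from typing import Dict

SENTINEL = re.compile(r"<extra_id_(\d+)>")


def parse_sentinel_output(decoded: str) -> Dict[int, str]:
    """Map each sentinel index in the decoded output to the span that follows it."""
    # Drop the special tokens that frame the generated sequence.
    text = decoded.replace("<pad>", "").replace("</s>", "").strip()
    # Splitting on a capturing pattern keeps the indices:
    # ['', '0', ' man', '1', ' beer', ...]
    parts = SENTINEL.split(text)
    return {int(parts[k]): parts[k + 1].strip() for k in range(1, len(parts), 2)}


def fill_sentinels(masked_input: str, decoded: str) -> str:
    """Replace every sentinel in the prompt with its predicted span (hypothetical helper)."""
    spans = parse_sentinel_output(decoded)
    # Sentinels without a prediction are left untouched.
    return SENTINEL.sub(lambda m: spans.get(int(m.group(1)), m.group(0)), masked_input)


input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
decoded = "<pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>"
print(fill_sentinels(input_text, decoded))
```

The trailing span after `<extra_id_4>` is parsed but never used, since the prompt contains no `<extra_id_4>`.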

Running the model on a GPU

# pip install accelerate
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-32")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-32", device_map="auto")

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
# Move the inputs to the first GPU (device 0)
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>

Running the model on a GPU using different precisions

FP16

# pip install accelerate
import torch
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-32")
# Load the weights in half precision to roughly halve their memory footprint
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-32", device_map="auto", torch_dtype=torch.float16)

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>

INT8

# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, BitsAndBytesConfig, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-32")
# Quantize the weights to 8-bit with bitsandbytes while loading
model = SwitchTransformersForConditionalGeneration.from_pretrained(
    "google/switch-base-32",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
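Why the lower precisions help: the weights alone take roughly (number of parameters) x (bytes per parameter), so FP16 halves the FP32 footprint and INT8 halves it again. The sketch below does that arithmetic. The parameter count in the example is a made-up round number, not the actual size of switch-base-32, and the estimate ignores activations, generation caches and any layers a quantization backend keeps in higher precision.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params: int, precision: str) -> float:
    """Approximate memory taken by the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


# Hypothetical parameter count, used only to illustrate the ratios.
num_params = 1_000_000_000
for precision in ("fp32", "fp16", "int8"):
    print(precision, round(weight_memory_gib(num_params, precision), 2))
```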

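Uses

Direct Use and Downstream Use

See the research paper for further details.

Out-of-Scope Use

More information needed.

Bias, Risks, and Limitations

More information needed.

Ethical considerations and risks

More information needed.

Known Limitations

More information needed.

Sensitive Use:

More information needed.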

Training Details

Training Data

The model was trained on a Masked Language Modeling task on the Colossal Clean Crawled Corpus (C4) dataset, following the same procedure as T5.
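The T5 procedure referred to here is span corruption: contiguous spans of the input are replaced by sentinel tokens (`<extra_id_0>`, `<extra_id_1>`, ...) and the target lists each sentinel followed by the tokens it hid, ending with one more sentinel. This is the same format as the `<extra_id_N>` prompts in the usage examples above. The sketch is illustrative only: it takes the spans as explicit arguments so the result is deterministic, whereas T5 samples them at random (by default corrupting about 15% of tokens with a mean span length of 3), and it works on whitespace-split words instead of SentencePiece tokens and omits the end-of-sequence token. The function name `span_corrupt` is hypothetical.

```python
from typing import List, Sequence, Tuple


def span_corrupt(
    tokens: Sequence[str], spans: Sequence[Tuple[int, int]]
) -> Tuple[List[str], List[str]]:
    """Build a T5-style (inputs, targets) pair from explicit noise spans.

    `spans` are half-open [start, end) token ranges, sorted and non-overlapping.
    """
    inputs: List[str] = []
    targets: List[str] = []
    prev_end = 0
    for k, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{k}>"
        # Keep the uncorrupted tokens, then replace the whole span by one sentinel.
        inputs.extend(tokens[prev_end:start])
        inputs.append(sentinel)
        # The target repeats the sentinel followed by the tokens it hid.
        targets.append(sentinel)
        targets.extend(tokens[start:end])
        prev_end = end
    inputs.extend(tokens[prev_end:])
    # A final sentinel closes the target sequence.
    targets.append(f"<extra_id_{len(spans)}>")
    return inputs, targets


tokens = "Thank you for inviting me to your party last week .".split()
inputs, targets = span_corrupt(tokens, [(2, 4), (8, 9)])
print(" ".join(inputs))   # Thank you <extra_id_0> me to your party <extra_id_1> week .
print(" ".join(targets))  # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```

The closing sentinel in the targets is why the generated outputs in the usage examples end with one sentinel more than the prompt contains.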

Training Procedure

According to the model card from the original paper, the model was trained on TPU v3 or TPU v4 pods, using the t5x codebase together with jax.

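Evaluation

Testing Data, Factors & Metrics

The authors evaluated the model on various tasks and compared the results against T5. For full details, please check the research paper.

Results

For the full results for Switch Transformers, see Table 5 of the research paper.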

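Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: Google Cloud TPU Pods - TPU v3 or TPU v4 | Number of chips ≥ 4.
Hours used: More information needed
Cloud Provider: GCP
Compute Region: More information needed
Carbon Emitted: More information needed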
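Once the missing values are known, the estimate itself is simple arithmetic: energy consumed (average power x hours, optionally scaled by the data center's power usage effectiveness, PUE) multiplied by the carbon intensity of the local grid. The sketch below shows that calculation. It is a simplified approximation in the spirit of the calculator, not its exact method; the function name is hypothetical, and every input in the example is a placeholder because the hours used and the compute region are not reported.

```python
def estimate_co2_kg(
    avg_power_watts: float,
    hours: float,
    grid_kg_co2_per_kwh: float,
    pue: float = 1.0,
) -> float:
    """Rough operational CO2 estimate: energy drawn times grid carbon intensity."""
    # Convert watt-hours to kWh and account for data center overhead (PUE).
    energy_kwh = avg_power_watts * hours / 1000.0 * pue
    return energy_kwh * grid_kg_co2_per_kwh


# Placeholder inputs, not measured values for this model.
print(estimate_co2_kg(avg_power_watts=1000.0, hours=10.0, grid_kg_co2_per_kwh=0.5))  # 5.0
```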

Citation

BibTeX:

@misc{https://doi.org/10.48550/arxiv.2101.03961,
  doi = {10.48550/ARXIV.2101.03961},
  url = {https://arxiv.org/abs/2101.03961},
  author = {Fedus, William and Zoph, Barret and Shazeer, Noam},
  keywords = {Machine Learning (cs.LG), Artificial Intelligence (cs.AI), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity},
  publisher = {arXiv},
  year = {2021},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

Author:

Google AI

Dataset size:

7.37 GB
