Model:

google/switch-base-64

Task:

Text2Text Generation

Libraries:

PyTorch, Transformers

Datasets:

c4

Language:

English

Other:

switch_transformers, AutoTrain Compatible

Preprint:

arxiv:2101.03961, arxiv:1910.09700

License:

apache-2.0

Model Card for Switch Transformers Base - 64 experts

Table of Contents

TL;DR

Model Details

Usage

Uses

Bias, Risks, and Limitations

Training Details

Evaluation

Environmental Impact

Citation

Model Card Authors

TL;DR

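Switch Transformers is a Mixture of Experts (MoE) model trained on a masked language modeling (MLM) task. The architecture is similar to the classic T5, but the feed-forward layers are replaced by sparse MLP layers containing "expert" MLPs. According to the original paper, the model trains faster than T5 (a consequence of its scaling properties) and performs better on fine-tuned tasks. As stated in the abstract:

we advance the current scale of language models by pre-training up to trillion parameter models on the "Colossal Clean Crawled Corpus" and achieve a 4x speedup over the T5-XXL model.

Disclaimer: The content of this model card was written by the Hugging Face team, and parts of it were copied from the original paper.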
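To make the sparse MLP layer described in the TL;DR more concrete, here is a minimal PyTorch sketch of top-1 ("switch") routing. It is an illustration under stated assumptions, not the implementation in transformers or t5x: the class name Top1SparseMLP and the sizes used are hypothetical, and the sketch omits the expert capacity limit and the auxiliary load-balancing loss described in the paper.

```python
import torch
import torch.nn as nn


class Top1SparseMLP(nn.Module):
    """Illustrative top-1 routed feed-forward layer (hypothetical name)."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        # The router scores every token against every expert.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is an ordinary two-layer MLP.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden has shape (batch, seq_len, d_model).
        probs = torch.softmax(self.router(hidden), dim=-1)
        gate, expert_index = probs.max(dim=-1)  # keep only the best expert per token
        output = torch.zeros_like(hidden)
        for i, expert in enumerate(self.experts):
            mask = expert_index == i  # tokens routed to expert i
            if mask.any():
                output[mask] = expert(hidden[mask])
        # Scale by the router probability so the router also receives gradients.
        return output * gate.unsqueeze(-1)


layer = Top1SparseMLP(d_model=8, d_ff=16, num_experts=4)
tokens = torch.randn(2, 5, 8)
print(layer(tokens).shape)  # torch.Size([2, 5, 8])
```

Each token passes through only one expert, so the compute per token stays roughly that of a single MLP while the parameter count grows with the number of experts (64 in this checkpoint).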

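Model Details

Model Description

Model type: Language model
Language: English
License: Apache 2.0
Related Models: All Switch Transformers Checkpoints
Original Checkpoints: All Original Switch Transformers Checkpoints
Resources for more information: Research Paper (arXiv:2101.03961)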

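Usage

Note that these checkpoints were trained on a masked language modeling (MLM) task, so they are not "ready-to-use" for downstream tasks. You may want to check FLAN-T5 for running fine-tuned weights, or fine-tune your own MoE model following this notebook.

Below are some example scripts showing how to use the model in transformers:

Using the PyTorch model

Running the model on a CPU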

from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-64")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-64")

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>

Running the model on a GPU

# pip install accelerate
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-64")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-64", device_map="auto")

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>

Running the model on a GPU using different precisions

FP16

# pip install accelerate
import torch  # needed for torch.float16 below
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-64")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-64", device_map="auto", torch_dtype=torch.float16)

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>

INT8

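# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, BitsAndBytesConfig, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-64")
# Request 8-bit weights explicitly; without a quantization config the model loads in full precision.
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-64", device_map="auto", quantization_config=BitsAndBytesConfig(load_in_8bit=True))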

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
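The decoded output in the examples above interleaves sentinel tokens with the predicted spans, which is hard to read at a glance. The sketch below shows one way to merge such an output back into the input text. The helper name fill_sentinels is hypothetical and not part of transformers; it assumes the output follows the "<extra_id_N> span" pattern shown above.

```python
import re

SENTINEL = re.compile(r"<extra_id_(\d+)>")


def fill_sentinels(input_text: str, decoded_output: str) -> str:
    """Replace each sentinel in input_text with the span predicted for it."""
    # Drop the special tokens that are not sentinels.
    cleaned = decoded_output.replace("<pad>", "").replace("</s>", "")
    # Splitting on a pattern with one capture group yields
    # [prefix, id, span, id, span, ...].
    parts = SENTINEL.split(cleaned)
    spans = {int(parts[i]): parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}
    # Sentinels without a predicted span are left untouched.
    return SENTINEL.sub(lambda m: spans.get(int(m.group(1)), m.group(0)), input_text)


input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
decoded = "<pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>"
print(fill_sentinels(input_text, decoded))
# A man walks into a bar a orders a beer with a pinch of salt.
```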

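Uses

Direct Use and Downstream Use

See the research paper for further details.

Out-of-Scope Use

More information needed.

Bias, Risks, and Limitations

More information needed.

Ethical considerations and risks

More information needed.

Known Limitations

More information needed.

Sensitive Use:

More information needed.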

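Training Details

Training Data

The model was trained on a masked language modeling task on the Colossal Clean Crawled Corpus (C4) dataset, following the same procedure as T5.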
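As a rough illustration of what a T5-style masked language modeling example looks like, the sketch below turns a token list into a corrupted input and its target using sentinel tokens. It is a simplified sketch, not the actual preprocessing pipeline: the function name corrupt_spans is hypothetical, and the spans are passed in explicitly rather than sampled at random as they would be during pre-training.

```python
def corrupt_spans(tokens, spans):
    """Build a (corrupted input, target) pair.

    tokens: list of string tokens.
    spans: non-overlapping (start, length) pairs to mask out.
    """
    inputs, targets = [], []
    cursor = 0
    for sentinel_id, (start, length) in enumerate(sorted(spans)):
        sentinel = f"<extra_id_{sentinel_id}>"
        # Copy the untouched tokens, then put a sentinel in place of the span.
        inputs.extend(tokens[cursor:start])
        inputs.append(sentinel)
        # The target repeats the sentinel followed by the tokens it hides.
        targets.append(sentinel)
        targets.extend(tokens[start:start + length])
        cursor = start + length
    inputs.extend(tokens[cursor:])
    # A final sentinel closes the target sequence.
    targets.append(f"<extra_id_{len(spans)}>")
    return " ".join(inputs), " ".join(targets)


tokens = "A man walks into a bar and orders a beer".split()
corrupted, target = corrupt_spans(tokens, [(1, 1), (9, 1)])
print(corrupted)  # A <extra_id_0> walks into a bar and orders a <extra_id_1>
print(target)     # <extra_id_0> man <extra_id_1> beer <extra_id_2>
```

The target format matches the decoded outputs in the usage examples, where each sentinel is followed by the span the model predicts for it.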

Training Procedure

According to the model card from the original paper, the model was trained on TPU v3 or TPU v4 pods, using the t5x codebase together with jax.

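Evaluation

Testing Data, Factors & Metrics

The authors evaluated the model on a range of tasks and compared the results against T5. For quantitative results, see the research paper.

Results

For full results for Switch Transformers, see the research paper, Table 5.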

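Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: Google Cloud TPU Pods - TPU v3 or TPU v4 | Number of chips ≥ 4
Hours used: More information needed
Cloud Provider: GCP
Compute Region: More information needed
Carbon Emitted: More information needed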
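Since the hours used and compute region are not reported, no emission figure can be given here. The sketch below only shows the general shape of such an estimate (energy consumed multiplied by the carbon intensity of the grid). It is a back-of-the-envelope assumption, not the calculator's exact method, and every number in the example is a hypothetical placeholder rather than a value for this model.

```python
def estimate_co2_kg(num_chips, power_per_chip_watts, hours, grid_kg_co2_per_kwh):
    """Rough estimate: energy used (kWh) times grid carbon intensity (kg CO2 per kWh)."""
    energy_kwh = num_chips * power_per_chip_watts * hours / 1000.0
    return energy_kwh * grid_kg_co2_per_kwh


# Placeholder inputs for illustration only: 4 chips at 200 W for 10 hours
# on a grid emitting 0.5 kg CO2 per kWh.
print(estimate_co2_kg(num_chips=4, power_per_chip_watts=200, hours=10, grid_kg_co2_per_kwh=0.5))  # 4.0
```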

Citation

BibTeX:

@misc{https://doi.org/10.48550/arxiv.2101.03961,
  doi = {10.48550/ARXIV.2101.03961},
  url = {https://arxiv.org/abs/2101.03961},
  author = {Fedus, William and Zoph, Barret and Shazeer, Noam},
  keywords = {Machine Learning (cs.LG), Artificial Intelligence (cs.AI), FOS: Computer and information sciences},
  title = {Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity},
  publisher = {arXiv},
  year = {2021},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

Authors:

Google AI

Dataset size:

14.22 GB