Model:

google/switch-base-8

Task:

Text2Text Generation

Libraries:

PyTorch Transformers

Datasets:

c4

Language:

English

Other:

switch_transformers AutoTrain Compatible

Preprints:

arxiv:2101.03961 arxiv:1910.09700

License:

apache-2.0

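Model Card for Switch Transformers Base - 8 experts

Introduction

Switch Transformers is a Mixture of Experts (MoE) model trained on a Masked Language Modeling (MLM) task. The architecture is similar to the classic T5 model, but the feed-forward layers are replaced by sparse MLP layers containing "expert" MLPs. According to the original paper, the model trains faster (it has better scaling properties) while outperforming T5 on fine-tuned tasks. As stated in the abstract:

"We advance the current scale of language models by pre-training up to trillion parameter models on the 'Colossal Clean Crawled Corpus' and achieve a 4x speedup over the T5-XXL model."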
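To make the "sparse MLP layer with experts" idea concrete, here is a minimal pure-Python sketch of top-1 ("switch") routing: a router scores each token, the single highest-scoring expert processes it, and the expert output is scaled by the router probability. The class names, sizes, and random weights are illustrative assumptions, not the actual implementation or configuration of google/switch-base-8, and the sketch leaves out expert capacity limits, the load-balancing loss, and the residual connection.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ToyExpert:
    """A two-layer ReLU MLP standing in for one expert (hypothetical sizes)."""

    def __init__(self, d_model, d_ff, rng):
        self.w_in = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_ff)]
        self.w_out = [[rng.gauss(0, 0.1) for _ in range(d_ff)] for _ in range(d_model)]

    def __call__(self, token):
        hidden = [max(0.0, h) for h in matvec(self.w_in, token)]
        return matvec(self.w_out, hidden)


class ToySwitchLayer:
    """Routes each token to exactly one expert (top-1 routing)."""

    def __init__(self, d_model=4, d_ff=8, num_experts=8, seed=0):
        rng = random.Random(seed)
        self.router = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(num_experts)]
        self.experts = [ToyExpert(d_model, d_ff, rng) for _ in range(num_experts)]

    def __call__(self, token):
        probs = softmax(matvec(self.router, token))
        # Pick the single most probable expert for this token.
        expert_index = max(range(len(probs)), key=probs.__getitem__)
        gate = probs[expert_index]
        # Scale the chosen expert's output by its router probability.
        output = [gate * y for y in self.experts[expert_index](token)]
        return output, expert_index


layer = ToySwitchLayer()
output, expert_index = layer([0.5, -1.0, 0.25, 2.0])
print(f"token routed to expert {expert_index}, output dimension {len(output)}")
```

Only one expert runs per token, so the compute per token stays close to that of a single dense MLP even though the layer holds several experts' worth of parameters.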

Disclaimer: The content of this model card was written by the Hugging Face team, and parts of it were copied and pasted from the original paper.

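Model Details

Model Description

Model type: Language model
Language(s) (NLP): English
License: Apache 2.0
Related Models: All Switch Transformers Checkpoints
Original Checkpoints: All Original Switch Transformers Checkpoints
Resources for more information: research paper (arXiv:2101.03961)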

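Usage

Note that these checkpoints were trained on a Masked Language Modeling (MLM) task, so they cannot be used directly for downstream tasks. You may want to look at FLAN-T5 for running fine-tuned weights, or fine-tune your own MoE model following this notebook.

Below are example scripts showing how to use the model in transformers:

Using the PyTorch model

Running the model on a CPU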

from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

# Load the tokenizer and the pretrained MoE checkpoint.
tokenizer = AutoTokenizer.from_pretrained("google/switch-base-8")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-8")

# Each <extra_id_N> sentinel marks a masked span the model should fill in.
input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>

Running the model on a GPU

# pip install accelerate
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-8")
# device_map="auto" lets accelerate place the weights on the available device(s).
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-8", device_map="auto")

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
# Move the inputs to GPU 0 so they sit on the same device as the model.
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>

Running the model on a GPU using different precisions

FP16

# pip install accelerate
import torch  # needed for torch.float16 below
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-8")
# Load the weights in half precision to reduce GPU memory use.
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-8", device_map="auto", torch_dtype=torch.float16)

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>

INT8

# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, BitsAndBytesConfig, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-8")
# Request 8-bit weights explicitly; without a quantization config the model loads in its default precision.
model = SwitchTransformersForConditionalGeneration.from_pretrained(
    "google/switch-base-8",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
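
The decoded output in the examples above is a sequence of sentinel tokens, each followed by the text predicted for the corresponding masked span. As a rough sketch of how that output could be merged back into the input sentence, the helpers below parse the decoded string and substitute each span. The names `parse_sentinel_output` and `fill_spans` are hypothetical and not part of transformers; the sketch assumes the decoded string has the same shape as the example output shown above.

```python
import re

# Matches T5-style sentinel tokens such as <extra_id_0>.
SENTINEL = re.compile(r"<extra_id_(\d+)>")


def parse_sentinel_output(decoded):
    """Map each sentinel index to the text generated right after it."""
    # Remove the special tokens that are not sentinels.
    cleaned = decoded.replace("<pad>", "").replace("</s>", "")
    # With one capturing group, split() yields [prefix, idx, text, idx, text, ...].
    parts = SENTINEL.split(cleaned)
    spans = {}
    for i in range(1, len(parts) - 1, 2):
        spans[int(parts[i])] = parts[i + 1].strip()
    return spans


def fill_spans(masked_input, spans):
    """Replace each sentinel in the input with its predicted span, if any."""
    return SENTINEL.sub(
        lambda match: spans.get(int(match.group(1)), match.group(0)),
        masked_input,
    )


masked = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
decoded = "<pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>"
print(fill_spans(masked, parse_sentinel_output(decoded)))
# A man walks into a bar a orders a beer with a pinch of salt.
```

Sentinels that appear in the output but not in the input (here `<extra_id_4>`) are ignored by `fill_spans`.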

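Uses

Direct Use and Downstream Use

See the research paper for further details.

Out-of-Scope Use

More information needed.

Bias, Risks, and Limitations

More information needed.

Ethical Considerations and Risks

More information needed.

Known Limitations

More information needed.

Sensitive Use:

More information needed.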

Training Details

Training Data

The model was trained on a Masked Language Modeling task using the Colossal Clean Crawled Corpus (C4) dataset, following the same procedure as T5.
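
To illustrate the T5-style masked language modeling format, the sketch below builds one input/target pair by replacing chosen spans with sentinel tokens. It assumes whitespace tokenization and takes the spans as given; the function name `span_corrupt` is hypothetical, and the random span sampling used in actual pretraining is not reproduced here.

```python
def span_corrupt(tokens, spans):
    """Build a T5-style (input, target) pair.

    tokens: list of string tokens.
    spans: sorted, non-overlapping (start, end) index pairs, end exclusive.
    """
    inputs, targets = [], []
    cursor = 0
    for sentinel_id, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{sentinel_id}>"
        # Keep the unmasked tokens and put a sentinel in place of the span.
        inputs.extend(tokens[cursor:start])
        inputs.append(sentinel)
        # The target lists each sentinel followed by the tokens it hides.
        targets.append(sentinel)
        targets.extend(tokens[start:end])
        cursor = end
    inputs.extend(tokens[cursor:])
    # A final sentinel closes the target sequence.
    targets.append(f"<extra_id_{len(spans)}>")
    return " ".join(inputs), " ".join(targets)


tokens = "A man walks into a bar and orders a beer".split()
model_input, model_target = span_corrupt(tokens, [(1, 2), (9, 10)])
print(model_input)   # A <extra_id_0> walks into a bar and orders a <extra_id_1>
print(model_target)  # <extra_id_0> man <extra_id_1> beer <extra_id_2>
```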

Training Procedure

According to the model card from the original paper, the model was trained on TPU v3 or TPU v4 pods, using the t5x codebase together with jax.

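Evaluation

Testing Data, Factors and Metrics

The authors evaluated the model on a variety of tasks and compared the results against T5. For the quantitative results and full details, see the research paper.

Results

For the full results for Switch Transformers, see Table 5 of the research paper.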

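Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: Google Cloud TPU Pods - TPU v3 or TPU v4 | Number of chips ≥ 4.
Hours used: More information needed
Cloud Provider: GCP
Compute Region: More information needed
Carbon Emitted: More information needed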
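
As a rough illustration of the kind of estimate such a calculator produces, the sketch below multiplies energy use by the carbon intensity of the grid. This is a simplified formula in the spirit of that approach, not the calculator's actual code; the function name, the PUE overhead factor, and every number in the usage line are hypothetical placeholders, since the hours used and compute region are not reported for this model.

```python
def estimate_co2_kg(chip_count, watts_per_chip, hours, pue, kg_co2_per_kwh):
    """Estimate emissions as energy (kWh) x datacenter overhead x grid carbon intensity."""
    energy_kwh = chip_count * watts_per_chip * hours / 1000.0
    return energy_kwh * pue * kg_co2_per_kwh


# All inputs below are placeholders for illustration, not measurements for this model.
example = estimate_co2_kg(
    chip_count=4,          # hypothetical chip count
    watts_per_chip=250.0,  # hypothetical average power draw per chip
    hours=10.0,            # hypothetical training time
    pue=1.1,               # hypothetical power usage effectiveness
    kg_co2_per_kwh=0.5,    # hypothetical grid carbon intensity
)
print(f"estimated emissions: {example:.2f} kg CO2eq")
```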

Citation

BibTeX:

@misc{https://doi.org/10.48550/arxiv.2101.03961,
  doi = {10.48550/ARXIV.2101.03961},
  url = {https://arxiv.org/abs/2101.03961},
  author = {Fedus, William and Zoph, Barret and Shazeer, Noam},
  keywords = {Machine Learning (cs.LG), Artificial Intelligence (cs.AI), FOS: Computer and information sciences},
  title = {Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity},
  publisher = {arXiv},
  year = {2021},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

Author:

Google AI

Dataset size:

1.16 GB
