Model:

google/switch-base-128

Task:

Text2Text Generation

Libraries:

PyTorch Transformers

Datasets:

c4

Language:

English

Other:

switch_transformers AutoTrain Compatible

Preprints:

arxiv:2101.03961 arxiv:1910.09700

License:

apache-2.0


Model Card for Switch Transformers Base - 128 experts

TL;DR

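Switch Transformers is a Mixture of Experts (MoE) model trained on a Masked Language Modeling (MLM) task. The architecture is similar to the classic T5, but the feed-forward layers are replaced by sparse Switch MLP layers, each containing multiple "expert" MLPs. According to the original paper, the model trains faster (better scaling properties) while outperforming T5 on fine-tuned tasks. As stated in the first few lines of the abstract:

We advance the current scale of language models by pre-training up to trillion parameter models on the "Colossal Clean Crawled Corpus" and achieve a 4x speedup over the T5-XXL model.

Disclaimer: The content of this model card was written by the Hugging Face team, and parts of it are taken from the original paper.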
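
To make the Switch MLP idea more concrete, here is a minimal pure-Python sketch of top-1 routing as the paper describes it: a router scores each token against every expert, the token goes to the single highest-scoring expert, and that expert's output is scaled by the router probability. The toy dimensions, random weights, and function names are illustrative assumptions, not the transformers implementation, and the sketch leaves out expert capacity limits and the load-balancing loss.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def make_expert(rng, d_model, d_ff):
    """Hypothetical expert: a tiny two-layer ReLU MLP with random weights."""
    w_in = [[rng.uniform(-0.1, 0.1) for _ in range(d_model)] for _ in range(d_ff)]
    w_out = [[rng.uniform(-0.1, 0.1) for _ in range(d_ff)] for _ in range(d_model)]

    def expert(token):
        hidden = [max(0.0, h) for h in matvec(w_in, token)]
        return matvec(w_out, hidden)

    return expert

def switch_layer(tokens, router_weights, experts):
    """Route each token to its single best expert (top-1 routing)."""
    outputs, chosen = [], []
    for token in tokens:
        probs = softmax(matvec(router_weights, token))
        best = max(range(len(experts)), key=lambda i: probs[i])
        # Scale the expert output by the router probability of the chosen expert.
        outputs.append([probs[best] * y for y in experts[best](token)])
        chosen.append(best)
    return outputs, chosen

rng = random.Random(0)
d_model, d_ff, num_experts = 4, 8, 3  # toy sizes, not the real model's
experts = [make_expert(rng, d_model, d_ff) for _ in range(num_experts)]
router_weights = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(num_experts)]
tokens = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(5)]

outputs, chosen = switch_layer(tokens, router_weights, experts)
print(chosen)  # one expert index per token
```

Only one expert runs per token, so the compute per token stays roughly constant as more experts (and therefore more parameters) are added.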

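Model Details

Model Description

Model type: Language model
Language(s) (NLP): English
License: Apache 2.0
Related models: All Switch Transformers Checkpoints
Original checkpoints: All Original Switch Transformers Checkpoints
Resources for more information: the research paper (arXiv:2101.03961)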

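Usage

Note that these checkpoints were trained on a Masked Language Modeling (MLM) task, so they are not "ready-to-use" for downstream tasks. You may want to look at FLAN-T5 for running fine-tuned weights, or fine-tune your own MoE by following this notebook.

Below are some example scripts showing how to use the model in transformers:

Using the PyTorch model

Running the model on a CPU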

from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

# Load the tokenizer and the model weights (on CPU by default)
tokenizer = AutoTokenizer.from_pretrained("google/switch-base-128")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-128")

# Each <extra_id_N> sentinel marks a masked span for the model to fill in
input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>

Running the model on a GPU

# pip install accelerate
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-128")
# device_map="auto" lets accelerate place the weights on the available devices
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-128", device_map="auto")

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
# Move the inputs to the first GPU (device index 0)
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>

Running the model on a GPU using different precisions

FP16

# pip install accelerate
import torch
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-128")
# Load the weights in half precision (float16)
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-128", device_map="auto", torch_dtype=torch.float16)

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
# Move the inputs to the first GPU (device index 0)
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>

INT8

# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, BitsAndBytesConfig, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-128")
# Request 8-bit weights through bitsandbytes; without a quantization setting the
# model loads in full precision. Older transformers releases take load_in_8bit=True directly.
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-128", device_map="auto", quantization_config=BitsAndBytesConfig(load_in_8bit=True))

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
# Move the inputs to the first GPU (device index 0)
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
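
The decoded output in the examples above is a sequence of sentinel tokens, each followed by the text predicted for that masked span. The sketch below splices those predictions back into the input sentence. The helper names `parse_spans` and `fill_sentinels` are hypothetical (they are not part of transformers), and the code assumes the `<extra_id_N>` format shown in the example output.

```python
import re

SENTINEL = re.compile(r"<extra_id_(\d+)>")

def parse_spans(decoded):
    """Map each sentinel index to the text predicted after it."""
    # Drop special tokens that are not part of any predicted span.
    cleaned = decoded.replace("<pad>", "").replace("</s>", "")
    # With one capture group, split() yields [prefix, idx0, text0, idx1, text1, ...].
    parts = SENTINEL.split(cleaned)
    return {int(parts[i]): parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}

def fill_sentinels(input_text, decoded):
    """Replace each sentinel in input_text with its predicted span."""
    spans = parse_spans(decoded)
    # Leave a sentinel untouched if the model produced no span for it.
    return SENTINEL.sub(lambda m: spans.get(int(m.group(1)), m.group(0)), input_text)

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
decoded = "<pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>"
print(fill_sentinels(input_text, decoded))
# A man walks into a bar a orders a beer with a pinch of salt.
```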

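Uses

Direct Use and Downstream Use

See the research paper for more details.

Out-of-Scope Use

More information needed.

Bias, Risks, and Limitations

More information needed.

Ethical Considerations and Risks

More information needed.

Known Limitations

More information needed.

Sensitive Use:

More information needed.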

Training Details

Training Data

The model was trained on a Masked Language Modeling task on the Colossal Clean Crawled Corpus (C4) dataset, following the same procedure as T5.
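
As a rough illustration of the T5-style objective referred to here, the sketch below replaces chosen word spans with sentinel tokens to build an (input, target) pair. The function name `corrupt_spans` and the hand-picked spans are assumptions for illustration only; the actual preprocessing samples spans randomly and works on token IDs rather than whole words.

```python
def corrupt_spans(words, spans):
    """Build a T5-style (input, target) pair.

    words: list of word strings.
    spans: sorted, non-overlapping (start, end) index pairs to mask (end exclusive).
    """
    inputs, targets = [], []
    cursor = 0
    for sentinel_id, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{sentinel_id}>"
        inputs.extend(words[cursor:start])  # keep the unmasked text
        inputs.append(sentinel)             # one sentinel stands in for the whole span
        targets.append(sentinel)
        targets.extend(words[start:end])    # the target holds the masked words
        cursor = end
    inputs.extend(words[cursor:])
    # A final sentinel closes the target sequence.
    targets.append(f"<extra_id_{len(spans)}>")
    return " ".join(inputs), " ".join(targets)

words = "Thank you for inviting me to your party last week".split()
source, target = corrupt_spans(words, [(2, 4), (8, 9)])
print(source)  # Thank you <extra_id_0> me to your party <extra_id_1> week
print(target)  # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```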

Training Procedure

According to the model card from the original paper, the model was trained on TPU v3 or TPU v4 pods, using the t5x codebase together with jax.

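Evaluation

Testing Data, Factors and Metrics

The authors evaluated the model on various tasks and compared the results against T5. The table of quantitative results is not reproduced here; for details, see the research paper.

Results

For full results for Switch Transformers, see the research paper, Table 5.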

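Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware type: Google Cloud TPU Pods - TPU v3 or TPU v4 | Number of chips ≥ 4.
Hours used: More information needed
Cloud provider: GCP
Compute region: More information needed
Carbon emitted: More information needed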
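
Because the hours used and the compute region are not reported, no real figure can be computed for this model. The sketch below only shows the general shape of such an estimate (energy drawn multiplied by the carbon intensity of the grid), as a back-of-the-envelope simplification rather than the calculator's exact method. The function name and every input value are placeholder assumptions, not measurements of this training run.

```python
def estimate_emissions_kg(chips, watts_per_chip, hours, pue, kg_co2_per_kwh):
    """Estimate CO2-equivalent emissions in kilograms.

    energy (kWh) = chips * watts_per_chip / 1000 * hours * PUE
    emissions    = energy * grid carbon intensity (kg CO2eq per kWh)
    """
    energy_kwh = chips * (watts_per_chip / 1000.0) * hours * pue
    return energy_kwh * kg_co2_per_kwh

# Placeholder inputs only: hours, region, and power draw are not reported.
example = estimate_emissions_kg(
    chips=4,               # lower bound stated for the number of chips
    watts_per_chip=200.0,  # hypothetical per-chip power draw
    hours=100.0,           # hypothetical training time
    pue=1.1,               # hypothetical data-center overhead factor
    kg_co2_per_kwh=0.4,    # hypothetical grid carbon intensity
)
print(f"{example:.1f} kg CO2eq")
```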

Citation

BibTeX:

@misc{https://doi.org/10.48550/arxiv.2101.03961,
  doi = {10.48550/ARXIV.2101.03961},
  url = {https://arxiv.org/abs/2101.03961},
  author = {Fedus, William and Zoph, Barret and Shazeer, Noam},
  keywords = {Machine Learning (cs.LG), Artificial Intelligence (cs.AI), FOS: Computer and information sciences},
  title = {Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity},
  publisher = {arXiv},
  year = {2021},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

Author:

Google AI

Dataset size:

27.81 GB
