Model:

google/switch-base-256

Task:

Text2Text Generation

Libraries:

PyTorch Transformers

Datasets:

c4

Language:

English

Other:

switch_transformers AutoTrain Compatible

Preprints:

arxiv:2101.03961 arxiv:1910.09700

License:

apache-2.0


Model Card for Switch Transformers Base - 256 experts

TL;DR

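Switch Transformers is a Mixture of Experts (MoE) model trained on a Masked Language Modeling (MLM) task. The architecture is similar to the classic T5, but the feed-forward layers are replaced by sparse MLP layers that contain "expert" MLPs. According to the original paper, the model trains faster (better scaling properties) while also outperforming T5 on fine-tuned tasks. As mentioned in the first few lines of the abstract:

We advance the current scale of language models by pre-training up to trillion parameter models on the "Colossal Clean Crawled Corpus" and achieve a 4x speedup over the T5-XXL model.

Disclaimer: The content of this model card has been written by the Hugging Face team, and parts of it were copy-pasted from the original paper.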
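To make the "sparse MLP with experts" idea in the TL;DR more concrete, here is a minimal PyTorch sketch of top-1 routing: a small router picks one expert MLP per token, and the expert's output is scaled by the router probability. The class name `ToySwitchMLP` and the sizes used are hypothetical and chosen for illustration. This is not the `transformers` implementation, and it leaves out details such as expert capacity limits and the auxiliary load-balancing loss discussed in the paper.

```python
import torch
import torch.nn as nn


class ToySwitchMLP(nn.Module):
    """Toy top-1 routed feed-forward layer (hypothetical, for illustration only)."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        # The router produces one logit per expert for every token.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is an ordinary two-layer MLP.
        self.experts = nn.ModuleList(
            [
                nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
                for _ in range(num_experts)
            ]
        )

    def forward(self, hidden: torch.Tensor):
        # hidden has shape (batch, seq_len, d_model).
        probs = torch.softmax(self.router(hidden), dim=-1)
        gate, expert_index = probs.max(dim=-1)  # top-1: one expert per token
        out = torch.zeros_like(hidden)
        for i, expert in enumerate(self.experts):
            mask = expert_index == i  # tokens routed to expert i
            if mask.any():
                out[mask] = expert(hidden[mask])
        # Scale each token's output by the probability of its chosen expert.
        return out * gate.unsqueeze(-1), expert_index


torch.manual_seed(0)
layer = ToySwitchMLP(d_model=8, d_ff=16, num_experts=4)
tokens = torch.randn(2, 5, 8)
output, chosen = layer(tokens)
print(output.shape, chosen.shape)
```

Because each token passes through only one expert, the compute per token stays close to that of a single dense MLP even as the number of experts (and therefore parameters) grows.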

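Model Details

Model Description

Model type: Language model
Language(s) (NLP): English
License: Apache 2.0
Related Models: All Switch Transformers Checkpoints
Original Checkpoints: All Original Switch Transformers Checkpoints
Resources for more information: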

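Usage

Note that these checkpoints were trained on a Masked Language Modeling (MLM) task, so they are not "ready-to-use" for downstream tasks. You may want to check FLAN-T5 for running fine-tuned weights, or fine-tune your own MoE following this notebook.

Below are some example scripts showing how to use the model in transformers:

Using the PyTorch model

Running the model on a CPU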


from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-256")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-256")

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
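The decoded output uses the same sentinel tokens as the input: each `<extra_id_N>` in the output is followed by the text predicted for the matching gap. As a rough sketch, the two strings can be merged back into one sentence with plain string handling. The helper name `fill_sentinels` is hypothetical and not part of `transformers`.

```python
import re

SENTINEL = re.compile(r"<extra_id_(\d+)>")


def fill_sentinels(input_text: str, decoded_output: str) -> str:
    """Insert the predicted spans from decoded_output into the gaps of input_text."""
    # Drop the special tokens that are not sentinels.
    cleaned = decoded_output.replace("<pad>", "").replace("</s>", "")
    # With one capture group, split() returns [prefix, id0, span0, id1, span1, ...].
    parts = SENTINEL.split(cleaned)
    spans = {int(parts[i]): parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}
    # Replace each sentinel in the input; keep it unchanged if nothing was predicted.
    return SENTINEL.sub(lambda m: spans.get(int(m.group(1)), m.group(0)), input_text)


input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
decoded = "<pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>"
print(fill_sentinels(input_text, decoded))
# A man walks into a bar a orders a beer with a pinch of salt.
```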

Running the model on a GPU


# pip install accelerate
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-256")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-256", device_map="auto")

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>

Running the model on a GPU using different precisions

FP16

# pip install accelerate
import torch
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-256")
# Load the weights in half precision; torch must be imported for torch.float16
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-256", device_map="auto", torch_dtype=torch.float16)

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>

INT8

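# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, BitsAndBytesConfig, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-256")
# Request 8-bit weights explicitly; without a quantization config the model loads unquantized
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-256", device_map="auto", quantization_config=BitsAndBytesConfig(load_in_8bit=True))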

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
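The main reason to pick a lower precision is memory: each parameter takes 4 bytes in float32, 2 bytes in float16, and 1 byte in int8. The sketch below works through that arithmetic. The function name `weight_memory_gb` and the parameter count are hypothetical, the count is not the real size of this checkpoint, and the estimate ignores activations and any quantization overhead.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: int, precision: str) -> float:
    """Approximate weight storage in GB (10**9 bytes) for a given precision."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Hypothetical parameter count, chosen only to keep the arithmetic easy to follow.
example_params = 1_000_000_000
for precision in BYTES_PER_PARAM:
    print(precision, weight_memory_gb(example_params, precision))
```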

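Uses

Direct Use and Downstream Use

See the research paper for further details.

Out-of-Scope Use

More information needed.

Bias, Risks, and Limitations

More information needed.

Ethical considerations and risks

More information needed.

Known Limitations

More information needed.

Sensitive Use:

More information needed.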

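Training Details

Training Data

The model was trained on a Masked Language Modeling task on the Colossal Clean Crawled Corpus (C4) dataset, following the same procedure as T5.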
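For readers unfamiliar with the T5-style masking objective, the following is a simplified sketch of how one training pair can be built: selected spans are replaced by sentinel tokens in the input, and the target lists each sentinel followed by the text it hides. The function name `span_corrupt` is hypothetical, and the spans are passed in explicitly here. The real pipeline samples spans randomly and operates on token IDs, so treat this only as an illustration of the data format.

```python
def span_corrupt(words, spans):
    """Build an (input, target) pair from words and masked spans.

    spans: sorted, non-overlapping (start, end) word-index pairs, end exclusive.
    """
    inputs, targets, cursor = [], [], 0
    for k, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{k}>"
        inputs.extend(words[cursor:start])  # keep the text before the span
        inputs.append(sentinel)             # hide the span behind a sentinel
        targets.append(sentinel)
        targets.extend(words[start:end])    # the target reveals the hidden words
        cursor = end
    inputs.extend(words[cursor:])
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel ends the target
    return " ".join(inputs), " ".join(targets)


words = "Thank you for inviting me to your party last week".split()
corrupted, target = span_corrupt(words, [(2, 4), (8, 9)])
print(corrupted)  # Thank you <extra_id_0> me to your party <extra_id_1> week
print(target)     # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```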

Training Procedure

According to the model card from the original paper, the model was trained on TPU v3 or TPU v4 pods, using the t5x codebase together with jax.

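Evaluation

Testing Data, Factors & Metrics

The authors evaluated the model on various tasks and compared the results against T5. For the quantitative results and full details, please check the research paper.

Results

For full results for Switch Transformers, see the research paper, Table 5.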

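Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: Google Cloud TPU Pods - TPU v3 or TPU v4 | Number of chips ≥ 4.
Hours used: More information needed
Cloud Provider: GCP
Compute Region: More information needed
Carbon Emitted: More information needed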
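Once the missing fields above are known, an estimate of this kind generally comes down to energy used multiplied by the carbon intensity of the local grid. The sketch below shows that arithmetic under stated assumptions: the function name `estimate_co2_kg` and every number in the example call are hypothetical placeholders, not measurements for this model, and the calculator's exact formula may differ.

```python
def estimate_co2_kg(chip_power_watts, num_chips, hours, kg_co2_per_kwh, pue=1.0):
    """Rough CO2 estimate: energy in kWh times grid carbon intensity.

    pue is the data-center power usage effectiveness multiplier (1.0 = no overhead).
    """
    energy_kwh = chip_power_watts * num_chips * hours / 1000.0 * pue
    return energy_kwh * kg_co2_per_kwh


# Placeholder inputs only: 4 chips is the lower bound listed above; the power draw,
# duration, and carbon intensity are made-up values to illustrate the calculation.
print(estimate_co2_kg(chip_power_watts=250, num_chips=4, hours=100, kg_co2_per_kwh=0.4))
```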

Citation

BibTeX:

@misc{https://doi.org/10.48550/arxiv.2101.03961,
  doi = {10.48550/ARXIV.2101.03961},
  url = {https://arxiv.org/abs/2101.03961},
  author = {Fedus, William and Zoph, Barret and Shazeer, Noam},
  keywords = {Machine Learning (cs.LG), Artificial Intelligence (cs.AI), FOS: Computer and information sciences},
  title = {Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity},
  publisher = {arXiv},
  year = {2021},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

Author:

Google AI

Dataset size:

54.82 GB
