Model:

google/flan-t5-xxl

Task:

Libraries:

PyTorch TensorFlow JAX Safetensors Transformers

Datasets:

svakulenk0/qrecc taskmaster2 djaym7/wiki_dialog deepmind/code_contests lambada gsm8k aqua_rat esnli quasc qed

Languages:

Other:

t5 AutoTrain Compatible text-generation-inference

Preprints:

arxiv:2210.11416 arxiv:1910.09700

License:

apache-2.0

Model Card for FLAN-T5 XXL

TL;DR

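If you already know T5, FLAN-T5 is just better at everything. With the same number of parameters, these models have been fine-tuned on more than 1,000 additional tasks that also cover more languages. As mentioned in the first few lines of the abstract: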

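Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints, which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.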

Disclaimer: The content of this model card was written by the Hugging Face team, and parts of it were copied from the T5 model card.

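Model Details

Model Description

Model type: Language model
Language(s) (NLP): English, German, French
License: Apache 2.0
Related Models: All FLAN-T5 Checkpoints
Original Checkpoints: All Original FLAN-T5 Checkpoints
Resources for more information: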

Usage

Below are some example scripts showing how to use the model with the transformers library:

Using the PyTorch model

Running the model on a CPU

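from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the tokenizer and the model weights from the Hugging Face Hub (CPU by default)
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xxl")

# Flan-T5 is text-to-text: the task is stated in the input prompt itself
input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Generate output token IDs, then decode them without the <pad>/</s> special tokens
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))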

Running the model on a GPU

# pip install accelerate
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")
# device_map="auto" lets accelerate place the weights on the available GPU(s)
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xxl", device_map="auto")

input_text = "translate English to German: How old are you?"
# Move the inputs to the GPU so they are on the same device as the model
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Running the model on a GPU using different precisions

FP16

# pip install accelerate
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")
# torch_dtype=torch.float16 loads the weights in half precision (about half the memory of fp32)
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xxl", device_map="auto", torch_dtype=torch.float16)

input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

INT8

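# pip install bitsandbytes accelerate
from transformers import BitsAndBytesConfig, T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")
# Quantize the weights to 8-bit with bitsandbytes while loading; recent transformers
# versions expect this via quantization_config instead of passing load_in_8bit=True directly
model = T5ForConditionalGeneration.from_pretrained(
    "google/flan-t5-xxl",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))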
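The loading options above differ mainly in how much memory the weights occupy. As a back-of-the-envelope sketch, assuming roughly 11 billion parameters (the XXL size) and counting only the weights (activations, generation caches, and library overhead are ignored), the footprint per precision can be estimated as follows. The helper name `weight_memory_gb` is hypothetical.

```python
# Rough weight-memory estimate for different precisions.
# Assumptions: ~11 billion parameters (XXL scale); weights only, ignoring
# activations, generation caches, and framework overhead.
BYTES_PER_PARAM = {
    "fp32": 4,
    "fp16": 2,
    "int8": 1,
}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    n_params = 11e9  # assumed parameter count
    for p in ("fp32", "fp16", "int8"):
        print(f"{p}: ~{weight_memory_gb(n_params, p):.0f} GB")
```

Under these assumptions the script prints about 44 GB for fp32, 22 GB for fp16, and 11 GB for int8, which suggests why the lower-precision options are useful on GPUs with less memory.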

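Uses

Direct Use and Downstream Use

The authors write in the original paper's model card that:

The primary use is research on language models, including: research on zero-shot NLP tasks and in-context few-shot learning NLP tasks, such as reasoning and question answering; advancing fairness and safety research; and understanding limitations of current large language models.

See the research paper for further details.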
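Zero-shot and in-context few-shot use differ only in what goes into the input text: a zero-shot prompt contains just the instruction and the query, while a few-shot prompt prepends a handful of solved examples. The helper below is a hypothetical sketch; the function name `build_prompt` and the `Q:`/`A:` layout are illustrative assumptions, not the templates used in the paper.

```python
from typing import Sequence, Tuple


def build_prompt(
    instruction: str,
    query: str,
    exemplars: Sequence[Tuple[str, str]] = (),
) -> str:
    """Assemble a zero-shot (no exemplars) or few-shot prompt.

    Hypothetical layout: the instruction, then solved Q/A pairs, then the
    unanswered query. Flan-T5 does not require this exact format.
    """
    parts = [instruction]
    for question, answer in exemplars:
        parts.append(f"Q: {question}\nA: {answer}")
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)


zero_shot = build_prompt("Answer the arithmetic question.", "What is 3 + 4?")
few_shot = build_prompt(
    "Answer the arithmetic question.",
    "What is 3 + 4?",
    exemplars=[("What is 1 + 1?", "2"), ("What is 2 + 5?", "7")],
)
print(few_shot)
```

Either string would then be tokenized and passed to `model.generate` in the same way as `input_text` in the usage scripts.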

Out-of-Scope Use

More information needed.

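Bias, Risks, and Limitations

The information in this section is copied from the model's official model card:

Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application without a prior assessment of the safety and fairness concerns specific to that application.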

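Ethical Considerations and Risks

Flan-T5 is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result, the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data.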

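Known Limitations

Flan-T5 has not been tested in real-world applications.

Sensitive Use

Flan-T5 should not be applied to any unacceptable use case, e.g., the generation of abusive speech.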

Training Details

Training Data

The model was trained on a mixture of tasks, including the tasks described in the original paper (Figure 2).
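The card does not give the mixture weights used for Flan-T5. To illustrate how a multi-task mixture can be sampled, the sketch below implements examples-proportional mixing with an artificial size cap K, the scheme described for T5 by Raffel et al. (2020), where each task's rate is min(e_m, K) / sum_n min(e_n, K). The task names, sizes, and cap are made-up values, and this is not necessarily how the Flan mixture was built.

```python
import random
from typing import Dict, List


def mixing_rates(task_sizes: Dict[str, int], cap: int) -> Dict[str, float]:
    """Examples-proportional mixing with a cap K (T5-style).

    rate_m = min(e_m, K) / sum_n min(e_n, K)
    """
    capped = {task: min(size, cap) for task, size in task_sizes.items()}
    total = sum(capped.values())
    return {task: c / total for task, c in capped.items()}


def sample_tasks(rates: Dict[str, float], k: int, seed: int = 0) -> List[str]:
    """Draw k task names at random according to the mixing rates."""
    rng = random.Random(seed)
    tasks = list(rates)
    weights = [rates[t] for t in tasks]
    return rng.choices(tasks, weights=weights, k=k)


# Hypothetical task sizes (number of training examples per task).
sizes = {"task_a": 1_000_000, "task_b": 50_000, "task_c": 10_000}
rates = mixing_rates(sizes, cap=100_000)
print(rates)  # {'task_a': 0.625, 'task_b': 0.3125, 'task_c': 0.0625}
print(sample_tasks(rates, k=5))
```

The cap keeps a very large task (here `task_a`) from dominating the mixture while still letting it be sampled more often than the smaller ones.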

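Training Procedure

According to the model card from the original paper:

These models are based on pretrained T5 (Raffel et al., 2020) and fine-tuned with instructions for better zero-shot and few-shot performance. There is one fine-tuned Flan model per T5 model size.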
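Instruction finetuning casts each task as text-to-text: a raw example is rendered into a natural-language instruction, and the model is trained to generate the target text. A minimal sketch, assuming one hypothetical natural-language-inference example and two made-up templates (the real Flan collection uses many task-specific templates that this card does not list); the names `TEMPLATES` and `render` are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction templates for a natural-language-inference task.
# Placeholders in braces are filled from the example's fields.
TEMPLATES: List[Tuple[str, str]] = [
    ("Premise: {premise}\nHypothesis: {hypothesis}\n"
     "Does the premise entail the hypothesis? Options: yes, no, maybe", "{label}"),
    ("Read the premise and decide if the hypothesis follows.\n"
     "Premise: {premise}\nHypothesis: {hypothesis}", "{label}"),
]


def render(example: Dict[str, str]) -> List[Tuple[str, str]]:
    """Turn one raw example into (input_text, target_text) pairs, one per template."""
    return [
        (inp.format(**example), tgt.format(**example))
        for inp, tgt in TEMPLATES
    ]


example = {
    "premise": "A man is playing a guitar on stage.",
    "hypothesis": "A person is performing music.",
    "label": "yes",
}
for source, target in render(example):
    print(source, "->", target)
```

Rendering the same example through several templates is one way to expose the model to varied phrasings of an instruction.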

The model was trained on TPU v3 or TPU v4 pods, using the t5x codebase together with jax.

Evaluation

Testing Data, Factors & Metrics

The authors evaluated the model on a variety of tasks covering several languages (1,836 in total). For full details, see the research paper.
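Benchmark scores of this kind are often accuracies over generated answers. The sketch below shows one way to score free-form generations with exact match after light normalization; the normalization rules (lower-casing, removing ASCII punctuation, collapsing whitespace) are an assumption for illustration, not the authors' evaluation code.

```python
import string
from typing import Sequence


def normalize(text: str) -> str:
    """Lower-case, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that equal their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(predictions)


preds = ["Wie alt sind Sie?", "Paris.", "42"]
refs = ["wie alt sind sie", "paris", "41"]
print(exact_match_accuracy(preds, refs))
```

Here two of the three predictions match after normalization, so the accuracy is 2/3.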

Results

For the full results of FLAN-T5-XXL, see Table 3 of the research paper.

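Environmental Impact

Carbon emissions can be estimated using the method presented in the Machine Learning Impact calculator.

Hardware Type: Google Cloud TPU Pods (TPU v3 or TPU v4), number of chips ≥ 4
Hours used: More information needed
Cloud Provider: GCP
Compute Region: More information needed
Carbon Emitted: More information needed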
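As a rough illustration of the method, a basic estimate multiplies the energy used (average power draw times hours) by the carbon intensity of the local grid. The sketch below assumes that simplified formula only; the actual calculator may account for further factors such as provider offsets. Every input value is a placeholder, since the card does not report hours, region, or power, and the function name `estimate_co2_kg` is hypothetical.

```python
def estimate_co2_kg(power_kw: float, hours: float, kg_co2_per_kwh: float) -> float:
    """Emissions (kg CO2eq) = average power draw (kW) x hours x grid carbon intensity."""
    energy_kwh = power_kw * hours
    return energy_kwh * kg_co2_per_kwh


# Placeholder inputs only: none of these values are reported for this model.
hypothetical_power_kw = 2.0    # assumed average draw of the accelerators
hypothetical_hours = 100.0     # assumed training time
hypothetical_intensity = 0.4   # assumed kg CO2eq per kWh for the region
emissions = estimate_co2_kg(hypothetical_power_kw, hypothetical_hours, hypothetical_intensity)
print(f"{emissions:.1f} kg CO2eq")
```

Filling the "More information needed" fields above would amount to supplying the real hours, region (and hence carbon intensity), and hardware power.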

Citation

BibTeX:

@misc{https://doi.org/10.48550/arxiv.2210.11416,
  doi = {10.48550/ARXIV.2210.11416},
  
  url = {https://arxiv.org/abs/2210.11416},
  
  author = {Chung, Hyung Won and Hou, Le and Longpre, Shayne and Zoph, Barret and Tay, Yi and Fedus, William and Li, Eric and Wang, Xuezhi and Dehghani, Mostafa and Brahma, Siddhartha and Webson, Albert and Gu, Shixiang Shane and Dai, Zhuyun and Suzgun, Mirac and Chen, Xinyun and Chowdhery, Aakanksha and Narang, Sharan and Mishra, Gaurav and Yu, Adams and Zhao, Vincent and Huang, Yanping and Dai, Andrew and Yu, Hongkun and Petrov, Slav and Chi, Ed H. and Dean, Jeff and Devlin, Jacob and Roberts, Adam and Zhou, Denny and Le, Quoc V. and Wei, Jason},
  
  keywords = {Machine Learning (cs.LG), Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
  
  title = {Scaling Instruction-Finetuned Language Models},
  
  publisher = {arXiv},
  
  year = {2022},
  
  copyright = {Creative Commons Attribution 4.0 International}
}

Author:

Google AI

Size:

166.91 GB