Model:

bigscience/bloomz-1b1

Task:

Text Generation

Libraries:

PyTorch TensorBoard Safetensors Transformers

Datasets:

bigscience/xP3

Languages:

Other:

bloom Eval Results text-generation-inference

Preprint:

arxiv:2211.01786

License:

bigscience-bloom-rail-1.0

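Model Summary

We present BLOOMZ & mT0, a family of models capable of following human instructions zero-shot in many languages. We finetune the pretrained multilingual language models BLOOM and mT5 on our crosslingual task mixture (xP3) and find that the resulting models generalize crosslingually to unseen tasks and languages.

Repository: bigscience-workshop/xmtf
Paper: Crosslingual Generalization through Multitask Finetuning
Point of Contact: Niklas Muennighoff
Languages: Refer to bloom for the pretraining and finetuning language proportions. The model understands both the pretraining and the finetuning languages.
BLOOMZ & mT0 Model Family: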

Multitask finetuned on xP3. Recommended for prompting in English.
Parameters	300M	580M	1.2B	3.7B	13B	560M	1.1B	1.7B	3B	7.1B	176B
Finetuned Model	mt0-small	mt0-base	mt0-large	mt0-xl	mt0-xxl	bloomz-560m	bloomz-1b1	bloomz-1b7	bloomz-3b	bloomz-7b1	bloomz
Multitask finetuned on xP3mt. Recommended for prompting in non-English.
Finetuned Model	mt0-xxl-mt	bloomz-7b1-mt	bloomz-mt
Multitask finetuned on P3. Released for research purposes only. Strictly inferior to above models!
Finetuned Model	mt0-xxl-p3	bloomz-7b1-p3	bloomz-p3
Original pretrained checkpoints. Not recommended.
Pretrained Model	mt5-small	mt5-base	mt5-large	mt5-xl	mt5-xxl	bloom-560m	bloom-1b1	bloom-1b7	bloom-3b	bloom-7b1	bloom

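Use

Intended use

We recommend using the model to perform tasks expressed in natural language. For example, given the prompt "Translate to English: Je t’aime.", the model will most likely answer "I love you.". Some prompt ideas from our paper: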

一个传奇的开端，一个不灭的神话，这不仅仅是一部电影，而是作为一个走进新时代的标签，永远彪炳史册。你认为这句话的立场是赞扬、中立还是批评?
Suggest at least five related search terms to "Mạng neural nhân tạo".
Write a fairy tale about a troll saving a princess from a dangerous dragon. The fairy tale is a masterpiece that has achieved praise worldwide and its moral is "Heroes Come in All Shapes and Sizes". Story (in Spanish):
Explain in a sentence in Telugu what is backpropagation in neural networks.

Feel free to share your generations in the Community tab!

How to use

CPU

# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloomz-1b1"

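# Download (or load from the local cache) the tokenizer and the model weights.
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Encode the prompt as a PyTorch tensor of token ids, generate, and decode.
inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))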

GPU

# pip install -q transformers accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloomz-1b1"

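tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# torch_dtype="auto" keeps the dtype stored in the checkpoint;
# device_map="auto" lets accelerate place the weights on the available device(s).
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype="auto", device_map="auto")

# Move the input ids to the GPU before generating.
inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))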

GPU in 8bit

# pip install -q transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloomz-1b1"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# load_in_8bit=True quantizes the weights to 8-bit through bitsandbytes.
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", load_in_8bit=True)

# Move the input ids to the GPU before generating.
inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
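
The three snippets above print the prompt together with the completion, because `generate` on a causal language model returns the input token ids followed by the newly generated ones. The following is a minimal sketch for keeping only the answer. The helper names `new_token_ids` and `answer` are hypothetical, and the `max_new_tokens` value of 20 is an illustrative assumption rather than a setting recommended by the authors.

```python
def new_token_ids(output_ids, prompt_length):
    """Return only the ids generated after the prompt (hypothetical helper)."""
    return output_ids[prompt_length:]


def answer(prompt, checkpoint="bigscience/bloomz-1b1", max_new_tokens=20):
    """Generate a completion and decode it without the prompt or special tokens."""
    # Imported lazily so the pure helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)

    inputs = tokenizer.encode(prompt, return_tensors="pt")
    # max_new_tokens=20 is an illustrative budget, not a value from the model card.
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)

    # outputs[0] holds the prompt ids followed by the generated ids.
    generated = new_token_ids(outputs[0], inputs.shape[1])
    return tokenizer.decode(generated, skip_special_tokens=True)
```

Calling `answer("Translate to English: Je t’aime.")` downloads the checkpoint on first use and returns the decoded completion only.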

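Limitations

Prompt Engineering: The performance may vary depending on the prompt. For BLOOMZ models, we recommend making it very clear where the input stops, so that the model does not try to continue it. For example, the prompt "Translate to English: Je t'aime" without the full stop (.) at the end may result in the model trying to continue the French sentence. Better prompts are, for example, "Translate to English: Je t'aime.", "Translate to English: Je t'aime. Translation:", or "What is "Je t'aime." in English?", where it is clear to the model when it should answer. Further, we recommend providing the model with as much context as possible. For example, if you want it to answer in Telugu, then tell the model so, e.g. "Explain in a sentence in Telugu what is backpropagation in neural networks.".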
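
The advice above can be sketched as a small prompt-building helper. This is only an illustration under stated assumptions: the function name `close_prompt`, its `cue` parameter, and the set of punctuation marks treated as a clear ending are all hypothetical choices, not part of the model card or of any library.

```python
def close_prompt(prompt, cue=None):
    """Make the end of the input explicit (hypothetical helper).

    Appends a full stop when the prompt does not already end in sentence
    punctuation, then optionally appends a cue such as "Translation:".
    """
    text = prompt.rstrip()
    # Assumed set of endings that already signal "the input stops here".
    if not text.endswith((".", "?", "!", ":", "。", "？", "！", "：")):
        text += "."
    if cue:
        text += " " + cue
    return text


print(close_prompt("Translate to English: Je t'aime"))
print(close_prompt("Translate to English: Je t'aime", cue="Translation:"))
```

The check is a simple heuristic. Prompts that end in a closing quotation mark or other symbols would need their own handling.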

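Training

Model

Architecture: Same as bloom-1b1, also refer to the config.json file
Finetuning steps: 250
Finetuning tokens: 502 million
Finetuning layout: 1x pipeline parallel, 1x tensor parallel, 1x data parallel
Precision: float16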
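
As a rough back-of-the-envelope reading of the figures above, the step and token counts give the average number of tokens processed per finetuning step. The weight-size estimate additionally assumes about 1.1 billion parameters (the "1.1B" size listed for this model) stored at 2 bytes each in float16; it ignores activations, optimizer state, and any other overhead.

```python
finetuning_steps = 250            # from the model card
finetuning_tokens = 502_000_000   # 502 million, from the model card

# Average tokens consumed per optimizer step.
tokens_per_step = finetuning_tokens / finetuning_steps
print(f"{tokens_per_step:,.0f} tokens per step")

# Assumption: roughly 1.1 billion parameters, each stored in 2 bytes at float16.
approx_parameters = 1.1e9
bytes_per_fp16_value = 2
weight_gigabytes = approx_parameters * bytes_per_fp16_value / 1e9
print(f"about {weight_gigabytes:.1f} GB of weights at float16")
```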

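Hardware

CPUs: AMD CPUs with 512GB memory per node
GPUs: 64 A100 80GB GPUs with 8 GPUs per node (8 nodes), using NVLink 4 inter-GPU connects and 4 OmniPath links
Communication: NCCL-communications network with a fully dedicated subnet

Software

Orchestration: Megatron-DeepSpeed
Optimizer & parallelism: DeepSpeed
Neural networks: PyTorch (pytorch-1.11 w/ CUDA-11.5)
FP16 if applicable: apex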

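Evaluation

We refer to Table 7 from our paper and to bigscience/evaluation-results for zero-shot results on unseen tasks. The sidebar reports the zero-shot performance of the best prompt per dataset config.

Citation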

@article{muennighoff2022crosslingual,
  title={Crosslingual generalization through multitask finetuning},
  author={Muennighoff, Niklas and Wang, Thomas and Sutawika, Lintang and Roberts, Adam and Biderman, Stella and Scao, Teven Le and Bari, M Saiful and Shen, Sheng and Yong, Zheng-Xin and Schoelkopf, Hailey and others},
  journal={arXiv preprint arXiv:2211.01786},
  year={2022}
}

Authors:

BigScience Workshop

Dataset size:

4.01 GB
