Model:
bigscience/bloomz-7b1-p3
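We present BLOOMZ & mT0, a family of models that can follow human instructions zero-shot in dozens of languages. We finetune the pretrained multilingual language models BLOOM and mT5 on our crosslingual task mixture (xP3) and find that the resulting models generalize crosslingually to unseen tasks and languages.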
| Multitask finetuned on 1239321. Recommended for prompting in English. | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Parameters | 300M | 580M | 1.2B | 3.7B | 13B | 560M | 1.1B | 1.7B | 3B | 7.1B | 176B |
| Finetuned Model | 12310321 | 12311321 | 12312321 | 12313321 | 12314321 | 12315321 | 12316321 | 12317321 | 12318321 | 12319321 | 12320321 |
| Multitask finetuned on 12321321. Recommended for prompting in non-English. | | | | | | | | | | | |
| Finetuned Model | 12322321 | 12323321 | 12324321 | | | | | | | | |
| Multitask finetuned on 12325321. Released for research purposes only. Strictly inferior to above models! | | | | | | | | | | | |
| Finetuned Model | 12326321 | 12327321 | 12328321 | | | | | | | | |
| Original pretrained checkpoints. Not recommended. | | | | | | | | | | | |
| Pretrained Model | 12329321 | 12330321 | 12331321 | 12332321 | 12333321 | 12334321 | 12335321 | 12336321 | 12337321 | 12338321 | 12339321 |
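We recommend using the model to perform tasks expressed in natural language. For example, given the prompt "Translate to English: Je t’aime.", the model will most likely answer "I love you.". The paper provides further example prompts.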
Feel free to share your generations in the Community tab!
```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloomz-7b1-p3"

# Without a device_map, the model is loaded on the CPU.
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
```python
# pip install -q transformers accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloomz-7b1-p3"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# torch_dtype="auto" keeps the dtype stored in the checkpoint;
# device_map="auto" lets accelerate place the weights on the available devices.
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype="auto", device_map="auto")

inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
```python
# pip install -q transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloomz-7b1-p3"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# load_in_8bit=True loads the weights in 8-bit precision via bitsandbytes.
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", load_in_8bit=True)

inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
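Prompt Engineering: Performance may vary depending on the prompt. For BLOOMZ models, we recommend making it very clear where the input ends, so that the model does not try to continue it. For example, the prompt "Translate to English: Je t'aime" without the full stop (.) at the end may lead the model to continue the French sentence. Better prompts are "Translate to English: Je t'aime.", "Translate to English: Je t'aime. Translation:" or "What is "Je t'aime." in English?", where it is clear to the model when it should answer. We also recommend giving the model as much context as possible. For example, if you want it to answer in Telugu, tell the model "Explain in a sentence in Telugu what is backpropagation in neural networks.".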
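The prompt advice above can be sketched as a small pre-processing step. This is only an illustration: `finalize_prompt`, `TERMINATORS` and the "Answer in ..." suffix are hypothetical names and wording chosen for this example, not part of `transformers` or of the BLOOMZ release, and the set of characters treated as a clear end of input is an assumption.

```python
from typing import Optional

# Assumption: these characters count as a clear end of the input.
TERMINATORS = (".", "?", "!", ":")


def finalize_prompt(prompt: str, answer_language: Optional[str] = None) -> str:
    """Make the end of the input explicit and optionally name the answer language.

    Hypothetical helper for illustration only.
    """
    text = prompt.strip()
    # Append a full stop when the prompt does not already end clearly,
    # so the model is less likely to continue the input sentence.
    if not text.endswith(TERMINATORS):
        text += "."
    # Extra context: state the desired answer language explicitly.
    if answer_language:
        text += f" Answer in {answer_language}."
    return text


print(finalize_prompt("Translate to English: Je t'aime"))
```

The returned string can be passed to `tokenizer.encode` in place of the literal prompt. A prompt that already ends with a terminator, such as "Translate to English: Je t'aime. Translation:", is returned unchanged.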
We refer to Table 7 of our paper and to bigscience/evaluation-results for zero-shot results on unseen tasks. The sidebar reports the zero-shot performance of the best prompt per dataset configuration.
```bibtex
@misc{muennighoff2022crosslingual,
  title={Crosslingual Generalization through Multitask Finetuning},
  author={Niklas Muennighoff and Thomas Wang and Lintang Sutawika and Adam Roberts and Stella Biderman and Teven Le Scao and M Saiful Bari and Sheng Shen and Zheng-Xin Yong and Hailey Schoelkopf and Xiangru Tang and Dragomir Radev and Alham Fikri Aji and Khalid Almubarak and Samuel Albanie and Zaid Alyafeai and Albert Webson and Edward Raff and Colin Raffel},
  year={2022},
  eprint={2211.01786},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```