Model: bigscience/mt0-small
We present the BLOOMZ & mT0 model family, capable of following human instructions in dozens of languages zero-shot. We finetune the BLOOM & mT5 pretrained multilingual language models on our crosslingual task mixture (xP3) and find the resulting models capable of crosslingual generalization to unseen tasks and languages.
| Multitask finetuned on [xP3](https://huggingface.co/datasets/bigscience/xP3). Recommended for prompting in English. ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Parameters | 300M | 580M | 1.2B | 3.7B | 13B | 560M | 1.1B | 1.7B | 3B | 7.1B | 176B |
| Finetuned Model | [mt0-small](https://huggingface.co/bigscience/mt0-small) | [mt0-base](https://huggingface.co/bigscience/mt0-base) | [mt0-large](https://huggingface.co/bigscience/mt0-large) | [mt0-xl](https://huggingface.co/bigscience/mt0-xl) | [mt0-xxl](https://huggingface.co/bigscience/mt0-xxl) | [bloomz-560m](https://huggingface.co/bigscience/bloomz-560m) | [bloomz-1b1](https://huggingface.co/bigscience/bloomz-1b1) | [bloomz-1b7](https://huggingface.co/bigscience/bloomz-1b7) | [bloomz-3b](https://huggingface.co/bigscience/bloomz-3b) | [bloomz-7b1](https://huggingface.co/bigscience/bloomz-7b1) | [bloomz](https://huggingface.co/bigscience/bloomz) |
| Multitask finetuned on [xP3mt](https://huggingface.co/datasets/bigscience/xP3mt). Recommended for prompting in non-English. ||||||||||||
| Finetuned Model | | | | | [mt0-xxl-mt](https://huggingface.co/bigscience/mt0-xxl-mt) | | | | | [bloomz-7b1-mt](https://huggingface.co/bigscience/bloomz-7b1-mt) | [bloomz-mt](https://huggingface.co/bigscience/bloomz-mt) |
| Multitask finetuned on [P3](https://huggingface.co/datasets/Muennighoff/P3). Released for research purposes only. Strictly inferior to above models! ||||||||||||
| Finetuned Model | | | | | [mt0-xxl-p3](https://huggingface.co/bigscience/mt0-xxl-p3) | | | | | [bloomz-7b1-p3](https://huggingface.co/bigscience/bloomz-7b1-p3) | [bloomz-p3](https://huggingface.co/bigscience/bloomz-p3) |
| Original pretrained checkpoints. Not recommended. ||||||||||||
| Pretrained Model | [mt5-small](https://huggingface.co/google/mt5-small) | [mt5-base](https://huggingface.co/google/mt5-base) | [mt5-large](https://huggingface.co/google/mt5-large) | [mt5-xl](https://huggingface.co/google/mt5-xl) | [mt5-xxl](https://huggingface.co/google/mt5-xxl) | [bloom-560m](https://huggingface.co/bigscience/bloom-560m) | [bloom-1b1](https://huggingface.co/bigscience/bloom-1b1) | [bloom-1b7](https://huggingface.co/bigscience/bloom-1b7) | [bloom-3b](https://huggingface.co/bigscience/bloom-3b) | [bloom-7b1](https://huggingface.co/bigscience/bloom-7b1) | [bloom](https://huggingface.co/bigscience/bloom) |
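The usage snippets below load the mT0 checkpoint named at the top of this card. If you instead want to try one of the BLOOMZ checkpoints from the table, note that they are decoder-only models; a minimal sketch (assuming the bloomz-560m checkpoint, not an official snippet from this card) would use AutoModelForCausalLM rather than AutoModelForSeq2SeqLM:

```python
# pip install -q transformers
# Hedged sketch: loading a decoder-only BLOOMZ checkpoint from the table above.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloomz-560m"  # assumed example; other BLOOMZ checkpoints load the same way

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt")
# Causal models echo the prompt, so cap the number of newly generated tokens
outputs = model.generate(inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```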
We recommend using the model to perform tasks expressed in natural language. For example, given the prompt "Translate to English: Je t’aime.", the model will most likely answer "I love you.". Below are a few prompting ideas from our paper.

Feel free to share your generations in the Community tab!
```python
# pip install -q transformers
# Run the model on CPU
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "bigscience/mt0-small"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
```python
# pip install -q transformers accelerate
# Run the model on GPU
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "bigscience/mt0-small"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# Load in automatic precision and place weights across available devices
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, torch_dtype="auto", device_map="auto")

inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
```python
# pip install -q transformers accelerate bitsandbytes
# Run the model on GPU in 8-bit to reduce memory usage
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "bigscience/mt0-small"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, device_map="auto", load_in_8bit=True)

inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
Prompt Engineering: Performance may vary depending on the prompt. For the BLOOMZ models, we recommend making it very clear when the input stops, to avoid the model trying to continue it. For example, the prompt "Translate to English: Je t'aime" without the full stop (.) at the end may result in the model trying to continue the French sentence. Better prompts are, for example, "Translate to English: Je t'aime.", "Translate to English: Je t'aime. Translation:" or "What is "Je t'aime." in English?", where it is clear to the model when it should answer. Furthermore, we recommend providing the model with as much context as possible. For example, if you want it to answer in Telugu, tell the model so, e.g. "Explain in a sentence in Telugu what is backpropagation in neural networks.".
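As a rough illustration of the advice above, the short sketch below (not part of the original card; it reuses the mt0-small checkpoint with an assumed max_new_tokens cap) runs an unterminated prompt next to two better-delimited variants so the outputs can be compared directly:

```python
# pip install -q transformers
# Hedged sketch comparing prompt variants; outputs vary by checkpoint and decoding settings.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "bigscience/mt0-small"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

prompts = [
    "Translate to English: Je t’aime",                # no full stop: the model may continue the French
    "Translate to English: Je t’aime.",               # input clearly terminated
    "Translate to English: Je t’aime. Translation:",  # explicit cue for where the answer starts
]
for prompt in prompts:
    inputs = tokenizer.encode(prompt, return_tensors="pt")
    outputs = model.generate(inputs, max_new_tokens=20)  # assumed generation cap
    print(repr(prompt), "->", tokenizer.decode(outputs[0], skip_special_tokens=True))
```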
For zero-shot results on unseen tasks, we refer to Table 7 of our paper & bigscience/evaluation-results. The sidebar reports zero-shot performance of the best prompt per dataset config.
```bibtex
@misc{muennighoff2022crosslingual,
  title={Crosslingual Generalization through Multitask Finetuning},
  author={Niklas Muennighoff and Thomas Wang and Lintang Sutawika and Adam Roberts and Stella Biderman and Teven Le Scao and M Saiful Bari and Sheng Shen and Zheng-Xin Yong and Hailey Schoelkopf and Xiangru Tang and Dragomir Radev and Alham Fikri Aji and Khalid Almubarak and Samuel Albanie and Zaid Alyafeai and Albert Webson and Edward Raff and Colin Raffel},
  year={2022},
  eprint={2211.01786},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```