Model:
bigscience/mt0-xxl-mt
We present the BLOOMZ & mT0 family of models, which can perform tasks zero-shot in dozens of languages. We finetune the BLOOM and mT5 pretrained multilingual language models on our crosslingual task mixture (xP3) and find that the resulting models are capable of crosslingual generalization to unseen tasks and languages.
| Multitask finetuned on xP3. Recommended for prompting in English. ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Parameters | 300M | 580M | 1.2B | 3.7B | 13B | 560M | 1.1B | 1.7B | 3B | 7.1B | 176B |
| Finetuned Model | mt0-small | mt0-base | mt0-large | mt0-xl | mt0-xxl | bloomz-560m | bloomz-1b1 | bloomz-1b7 | bloomz-3b | bloomz-7b1 | bloomz |
| Multitask finetuned on xP3mt. Recommended for prompting in non-English. ||||||||||||
| Finetuned Model | | | | | mt0-xxl-mt | | | | | bloomz-7b1-mt | bloomz-mt |
| Multitask finetuned on P3. Released for research purposes only. Strictly inferior to above models! ||||||||||||
| Finetuned Model | | | | | mt0-xxl-p3 | | | | | bloomz-7b1-p3 | bloomz-p3 |
| Original pretrained checkpoints. Not recommended. ||||||||||||
| Pretrained Model | mt5-small | mt5-base | mt5-large | mt5-xl | mt5-xxl | bloom-560m | bloom-1b1 | bloom-1b7 | bloom-3b | bloom-7b1 | bloom |
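Note that the table mixes two architectures: the mt0-*/mt5-* checkpoints are encoder-decoder models, while the bloomz-*/bloom-* checkpoints are decoder-only models, so they are loaded with different Auto classes. Below is a minimal sketch; the `load_checkpoint` helper is purely illustrative and not part of this model card:

```python
# Minimal sketch (illustrative helper, not part of the model card):
# mt0-*/mt5-* checkpoints are encoder-decoder models, bloomz-*/bloom-* are decoder-only models.
from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM, AutoTokenizer

def load_checkpoint(name: str):
    """Load a tokenizer plus the matching model class for a checkpoint from the table above."""
    tokenizer = AutoTokenizer.from_pretrained(name)
    if name.split("/")[-1].startswith("mt"):      # mt0-* / mt5-* -> seq2seq
        model = AutoModelForSeq2SeqLM.from_pretrained(name)
    else:                                         # bloomz-* / bloom-* -> causal LM
        model = AutoModelForCausalLM.from_pretrained(name)
    return tokenizer, model

tokenizer, model = load_checkpoint("bigscience/mt0-small")  # any checkpoint name from the table works the same way
```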
We recommend using the model to perform tasks expressed in natural language. For example, given the prompt "Translate to English: Je t’aime.", the model will most likely answer "I love you.". A few prompt ideas from our paper:
Feel free to share your generations in the Community tab!
```python
# Run on CPU
# pip install -q transformers
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "bigscience/mt0-xxl-mt"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
```python
# Run on GPU
# pip install -q transformers accelerate
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "bigscience/mt0-xxl-mt"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, torch_dtype="auto", device_map="auto")

inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
```python
# Run on GPU in 8-bit precision (requires bitsandbytes)
# pip install -q transformers accelerate bitsandbytes
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "bigscience/mt0-xxl-mt"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, device_map="auto", load_in_8bit=True)

inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
Prompt engineering: Performance may vary depending on the prompt. For the BLOOMZ models, we recommend making it clear when the input ends, to avoid the model trying to continue it. For example, the prompt "Translate to English: Je t'aime" without the full stop (.) at the end may result in the model trying to continue the French sentence. Better prompts are, e.g., "Translate to English: Je t'aime.", "Translate to English: Je t'aime. Translation:", or "What is "Je t'aime." in English?", so that it is clear to the model when it should answer. Further, we recommend giving the model as much context as possible. For example, if you want it to answer in Telugu, tell the model so, e.g., "Explain in a sentence in Telugu what is backpropagation in neural networks.".
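A quick way to see this effect is to compare such prompt variants side by side. Below is a minimal sketch following the CPU snippet above; the exact generations will vary, and `max_new_tokens` is just an illustrative cap:

```python
# Minimal sketch (not from the model card): compare a prompt without a clear end-of-input
# marker against prompts that make the end of the input explicit.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "bigscience/mt0-xxl-mt"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

prompts = [
    "Translate to English: Je t’aime",                # no final period: the model may continue the French
    "Translate to English: Je t’aime.",               # clear end of input
    "Translate to English: Je t’aime. Translation:",  # explicit cue for where the answer should start
]
for prompt in prompts:
    inputs = tokenizer.encode(prompt, return_tensors="pt")
    outputs = model.generate(inputs, max_new_tokens=20)
    print(repr(prompt), "->", tokenizer.decode(outputs[0], skip_special_tokens=True))
```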
For zero-shot results on unseen tasks, we refer to Table 7 of our paper. The sidebar reports the zero-shot performance of the best prompt per dataset configuration.
```bibtex
@misc{muennighoff2022crosslingual,
  title={Crosslingual Generalization through Multitask Finetuning},
  author={Niklas Muennighoff and Thomas Wang and Lintang Sutawika and Adam Roberts and Stella Biderman and Teven Le Scao and M Saiful Bari and Sheng Shen and Zheng-Xin Yong and Hailey Schoelkopf and Xiangru Tang and Dragomir Radev and Alham Fikri Aji and Khalid Almubarak and Samuel Albanie and Zaid Alyafeai and Albert Webson and Edward Raff and Colin Raffel},
  year={2022},
  eprint={2211.01786},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```