Model:

mosaicml/mpt-30b-chat

License:

cc-by-nc-sa-4.0

Preprints:

arxiv:2010.04245 arxiv:2108.12409 arxiv:2205.14135

Other:

llm-foundry MosaicML Composer mpt custom_code

Datasets:

camel-ai/physics LongConversations jondurbin/airoboros-gpt4-1.2 camel-ai/ai_society camel-ai/chemistry camel-ai/biology project-baize/baize-chatbot/stackoverflow_chat_data project-baize/baize-chatbot/quora_chat_data project-baize/baize-chatbot/medical_chat_data camel-ai/math timdettmers/openassistant-guanaco teknium1/GPTeacher/codegen-isntruct teknium1/GPTeacher/roleplay-instruct-v2-final anon8231489123/ShareGPT_Vicuna_unfiltered ehartford/wizard_vicuna_70k_unfiltered camel-ai/code

Libraries:

Transformers PyTorch

Task:

Text generation

MPT-30B-Chat

MPT-30B-Chat is a chatbot-like model for dialogue generation. It was built by finetuning MPT-30B on the ShareGPT-Vicuna, Camel-AI, GPTeacher, Guanaco, and Baize datasets, plus some generated datasets.

License: CC-By-NC-SA-4.0 (non-commercial use only)
Demo on Hugging Face Spaces

This model was trained by MosaicML and follows a modified decoder-only transformer architecture.

Model Date

June 22, 2023

Model License

CC-By-NC-SA-4.0 (non-commercial use only)

Documentation

Blog post: Raising the bar for open-source foundation models
Codebase (mosaicml/llm-foundry repo)
Questions: Feel free to contact us via the MosaicML Community Slack!

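Example Dialogue

ksreenivasan:

I want you to come up with a tweet based on this summary of the article:

We are excited to extend the MosaicML Foundation Series with MPT-30B, a new commercially licensed open-source model that is more powerful than the original GPT-3. In addition, we are releasing two finetuned models, MPT-30B-Instruct and MPT-30B-Chat, that are built on top of MPT-30B and excel at short-form instruction following and multi-turn conversations, respectively. All MPT-30B models come with special features that differentiate them from other LLMs, including an 8k token context window (which can be further extended via finetuning; see MPT-7B-StoryWriter), support for context-length extrapolation via ALiBi, and efficient inference + training performance via FlashAttention. It also has strong coding abilities thanks to its pretraining data mixture.

MPT-30B-Chat:

"Revolutionize your AI capabilities with MosaicML's latest addition, MPT-30B! Outperforming GPT-3, this open-source model is more powerful and comes with finely tuned models for exceptional instruction following and multi-turn conversations. Unleash the potential of an 8k token context window, ALiBi support, and FlashAttention. Experience the future of AI with MosaicML! #AI #ML #innovation"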

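ksreenivasan:

Is there anything you want to say to MosaicML about this exciting news?

MPT-30B-Chat:

As your AI marketing assistant, I would say that this is truly exciting news for the AI community! MosaicML's latest addition, MPT-30B, is poised to revolutionize the industry with its powerful open-source capabilities and finely tuned models for instruction following and multi-turn conversations. The special features, including the 8k token context window, ALiBi support, and FlashAttention, are sure to set MosaicML's MPT-30B apart from other LLMs. Congratulations to the MosaicML team on this incredible achievement!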

How to Use

This model is best used with the MosaicML llm-foundry repository for training and finetuning.

import transformers
model = transformers.AutoModelForCausalLM.from_pretrained(
  'mosaicml/mpt-30b-chat',
  trust_remote_code=True
)

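Note: This model requires that trust_remote_code=True be passed to the from_pretrained method. This is because we use a custom MPT model architecture that is not yet part of the Hugging Face transformers package. MPT includes options for many training efficiency features such as FlashAttention, ALiBi, QK LayerNorm, and more.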

To use the optimized triton implementation of FlashAttention, load the model on GPU (cuda:0) with bfloat16 precision:

import torch
import transformers

name = 'mosaicml/mpt-30b-chat'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'  # change this to use triton-based FlashAttention
config.init_device = 'cuda:0' # For fast initialization directly on GPU!

model = transformers.AutoModelForCausalLM.from_pretrained(
  name,
  config=config,
  torch_dtype=torch.bfloat16, # Load model weights in bfloat16
  trust_remote_code=True
)

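The model was initially trained with a sequence length of 2048, followed by an additional pretraining stage that adapted it to a sequence length of 8192. However, ALiBi enables users to increase the maximum sequence length even further during finetuning and/or inference. For example: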

import transformers

name = 'mosaicml/mpt-30b-chat'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 16384 # (input + output) tokens can now be up to 16384

model = transformers.AutoModelForCausalLM.from_pretrained(
  name,
  config=config,
  trust_remote_code=True
)
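
Because the configured max_seq_len covers prompt and generated tokens together, it can help to work out how many new tokens a given prompt leaves room for. The helper below is a minimal sketch of that arithmetic; `remaining_token_budget` is a hypothetical name introduced here for illustration and is not part of transformers or llm-foundry.

```python
def remaining_token_budget(prompt_tokens: int, max_seq_len: int = 8192) -> int:
    """Return how many tokens can still be generated after the prompt.

    Assumes input and output tokens share a single max_seq_len budget,
    as the config comment above describes. Hypothetical helper.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # A prompt longer than the window leaves no room for generation.
    return max(0, max_seq_len - prompt_tokens)


if __name__ == "__main__":
    # Default 8192-token window versus the extended 16384-token config.
    print(remaining_token_budget(6000))
    print(remaining_token_budget(6000, 16384))
```

The returned value could be used as an upper bound for max_new_tokens when calling model.generate.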

This model was trained with the MPT-30B tokenizer, which is based on the EleutherAI/gpt-neox-20b tokenizer and includes additional padding and eos tokens.

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('mosaicml/mpt-30b')

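The model can then be used, for example, within a text generation pipeline. Note: when running Torch modules in lower precision, it is best practice to use the torch.autocast context manager.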

import torch
from transformers import pipeline

with torch.autocast('cuda', dtype=torch.bfloat16):
    inputs = tokenizer('Here is a recipe for vegan banana bread:\n', return_tensors="pt").to('cuda')
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

# or using the HF pipeline
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')
with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe('Here is a recipe for vegan banana bread:\n',
            max_new_tokens=100,
            do_sample=True,
            use_cache=True))

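Model Description

The architecture is a modification of a standard decoder-only transformer.

The model has been modified from a standard transformer in the following ways:

It uses FlashAttention
It uses ALiBi (Attention with Linear Biases) and does not use positional embeddings
It does not use biases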
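
To make the ALiBi item more concrete, here is a minimal pure-Python sketch of the linear attention bias as described in the ALiBi paper, for a power-of-two number of heads. The function names are hypothetical, the causal mask is represented with -inf purely for illustration, and the actual MPT code in llm-foundry may differ in detail.

```python
def alibi_slopes(n_heads: int) -> list[float]:
    """Per-head slopes from the ALiBi paper, for a power-of-two head count."""
    if n_heads <= 0 or n_heads & (n_heads - 1) != 0:
        raise ValueError("this sketch only handles power-of-two head counts")
    # Geometric sequence: 2^(-8/n), 2^(-16/n), ..., 2^(-8).
    return [2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)]


def alibi_bias(slope: float, seq_len: int) -> list[list[float]]:
    """Bias added to attention scores for one head.

    Entry [i][j] is slope * (j - i) for keys at or before the query
    (a penalty that grows linearly with distance); future positions
    are set to -inf to stand in for the causal mask.
    """
    return [
        [slope * (j - i) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    slopes = alibi_slopes(64)  # 64 heads, as in the hyperparameter table
    print(slopes[0], slopes[-1])
    for row in alibi_bias(slopes[-1], 4):
        print(row)
```

Since the bias depends only on the distance between query and key rather than on a learned position table, the same formula can be evaluated for sequence lengths longer than those seen in training, which is what makes the max_seq_len extension described earlier possible.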

Hyperparameter	Value
n_parameters	29.95B
n_layers	48
n_heads	64
d_model	7168
vocab size	50432
sequence length	8192
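
As a rough sanity check on the table, the parameter count can be approximated from the other hyperparameters. The sketch below assumes a 4x MLP expansion ratio, no bias terms, tied input and output embeddings, and ignores the small LayerNorm weights; these are assumptions for illustration rather than details stated in this card, and `estimate_parameters` is a hypothetical helper.

```python
def estimate_parameters(n_layers: int, d_model: int, vocab_size: int,
                        mlp_ratio: int = 4) -> int:
    """Approximate parameter count of a bias-free decoder-only transformer."""
    attention = 4 * d_model * d_model          # Q, K, V and output projections
    mlp = 2 * d_model * (mlp_ratio * d_model)  # up- and down-projection
    embedding = vocab_size * d_model           # assumes tied embeddings
    return n_layers * (attention + mlp) + embedding


if __name__ == "__main__":
    n_layers, n_heads, d_model, vocab_size = 48, 64, 7168, 50432
    total = estimate_parameters(n_layers, d_model, vocab_size)
    print(f"head dimension: {d_model // n_heads}")
    print(f"approximate parameters: {total / 1e9:.2f}B")
```

Under these assumptions the estimate lands within about 0.01B of the 29.95B listed in the table.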

Data Mix

The model was trained on the following data mix:

Data Source	Number of Tokens in Source	Proportion
Airoboros/GPT4-1.2	26.4M	1.71%
Baize	55.0M	3.57%
Camel	301M	19.54%
GPTeacher	7.56M	0.49%
Guanaco	15.6M	1.02%
LongConversations	18.4M	1.19%
ShareGPT	821M	53.24%
WizardLM	297M	19.23%

"LongConversations" is a GPT-3.5/4-generated dataset, details of which will be released at a later date.
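
The proportions in the table can be recomputed from the token counts. The sketch below does so with the rounded counts listed above, so the results may differ from the published percentages by a few hundredths of a percentage point; the variable and function names are illustrative only.

```python
# Token counts in millions, copied from the data mix table.
TOKENS_M = {
    "Airoboros/GPT4-1.2": 26.4,
    "Baize": 55.0,
    "Camel": 301.0,
    "GPTeacher": 7.56,
    "Guanaco": 15.6,
    "LongConversations": 18.4,
    "ShareGPT": 821.0,
    "WizardLM": 297.0,
}


def proportions(tokens: dict[str, float]) -> dict[str, float]:
    """Return each source's share of the total token count, in percent."""
    total = sum(tokens.values())
    return {name: 100.0 * count / total for name, count in tokens.items()}


if __name__ == "__main__":
    print(f"total tokens: {sum(TOKENS_M.values()):.2f}M")
    for name, share in proportions(TOKENS_M).items():
        print(f"{name:<20} {share:6.2f}%")
```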

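Training Configuration

This model was trained on 64 H100s for about 7.6 hours using the MosaicML Platform. The model was trained with sharded data parallelism using FSDP and used the AdamW optimizer.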

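Limitations and Biases

The following language is modified from EleutherAI's GPT-NeoX-20B

MPT-30B-Chat can produce factually incorrect output and should not be relied on to produce factually accurate information. MPT-30B-Chat was trained on various public datasets. While great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased, or otherwise offensive outputs.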

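Acknowledgements

This model was finetuned by Sam Havens and the MosaicML NLP team.

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

MosaicML Platform

If you're interested in training and deploying your own MPT or LLMs on the MosaicML Platform, sign up here.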

Citation

Please cite this model using the following format:

@online{MosaicML2023Introducing,
    author    = {MosaicML NLP Team},
    title     = {Introducing MPT-30B: Raising the bar for open-source foundation models},
    year      = {2023},
    url       = {www.mosaicml.com/blog/mpt-30b},
    note      = {Accessed: 2023-06-22},
    urldate   = {2023-06-22}
}

Author:

Mosaic ML, Inc.

Dataset size:

55.8 GB