Model:

TheBloke/mpt-30B-chat-GGML

License:

cc-by-nc-sa-4.0

Preprints:

arxiv:2010.04245 arxiv:2108.12409 arxiv:2205.14135

Other:

llm-foundry MosaicML Composer mpt

Datasets:

camel-ai/physics LongConversations jondurbin/airoboros-gpt4-1.2 camel-ai/ai_society camel-ai/chemistry camel-ai/biology project-baize/baize-chatbot/stackoverflow_chat_data project-baize/baize-chatbot/quora_chat_data project-baize/baize-chatbot/medical_chat_data camel-ai/math timdettmers/openassistant-guanaco teknium1/GPTeacher/codegen-isntruct teknium1/GPTeacher/roleplay-instruct-v2-final anon8231489123/ShareGPT_Vicuna_unfiltered ehartford/wizard_vicuna_70k_unfiltered camel-ai/code

Library:

Transformers

Chat & support: my new Discord server

Want to contribute? TheBloke's Patreon page

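MosaicML's MPT-30B-Chat GGML

These files are GGML format model files for MosaicML's MPT-30B-Chat.

Please note that these GGML files are not compatible with llama.cpp or with the current text-generation-webui. See below for a list of tools known to work with these model files.

KoboldCpp has just added GPU-accelerated (OpenCL) support for MPT models, so that is the client I recommend for these models.

Note: make sure you are using KoboldCpp version 1.32.3 or later, as several MPT-related bugs have been fixed.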

Repositories available

Prompt template

Based on the code for the MPT 30B Chat Space, I believe this is the correct prompt template:

<|im_start|>system
A conversation between a user and an LLM-based AI assistant. The assistant gives helpful and honest answers.<|im_end|>
<|im_start|>user
prompt goes here<|im_end|>
<|im_start|>assistant
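
As a minimal sketch, the template above could be assembled in Python as follows. The function name `build_prompt` and the `history` parameter are hypothetical, and the multi-turn layout (each earlier turn wrapped in the same `<|im_start|>` / `<|im_end|>` markers) is an assumption: the card only shows a single-turn prompt.

```python
# Default system message, copied from the prompt template in this card.
DEFAULT_SYSTEM = (
    "A conversation between a user and an LLM-based AI assistant. "
    "The assistant gives helpful and honest answers."
)


def build_prompt(user_message, history=None, system=DEFAULT_SYSTEM):
    """Return a prompt string in the template's format.

    history is an optional list of (user_text, assistant_text) pairs.
    How earlier turns are laid out is an assumption, not taken from the card.
    """
    parts = [f"<|im_start|>system\n{system}<|im_end|>\n"]
    for user_turn, assistant_turn in history or []:
        parts.append(f"<|im_start|>user\n{user_turn}<|im_end|>\n")
        parts.append(f"<|im_start|>assistant\n{assistant_turn}<|im_end|>\n")
    parts.append(f"<|im_start|>user\n{user_message}<|im_end|>\n")
    # Leave the assistant turn open so the model writes the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(build_prompt("prompt goes here"))
```

With no history, the printed text matches the single-turn template shown above.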

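A note regarding context length: 8K

The base model has an 8K context length. It has not yet been confirmed whether the 8K context works with the quantised files.

If it does, KoboldCpp supports 8K context. You can set it manually to 8K by editing the text box above the slider.

It is currently unknown whether increased context works with other MPT GGML clients.

If you have feedback on this, please let me know.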

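Compatibility

These files are not compatible with text-generation-webui, llama.cpp, or llama-cpp-python.

Currently they can be used with:

KoboldCpp, a powerful inference engine based on llama.cpp, with a good UI and GPU-accelerated support for MPT models
The ctransformers Python library, which includes LangChain support
The LoLLMS Web UI, which uses ctransformers
rustformers' llm
The example mpt binary provided with ggml

As other options become available I will endeavour to update them here (do let me know in the Community tab if I've missed something!)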
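
For the ctransformers route, a hedged sketch of loading one of these files might look like the following. The repository and file names come from this card; the helper names `load_mpt_chat` and `demo` are hypothetical, and the calls reflect the ctransformers API as I understand it (`AutoModelForCausalLM.from_pretrained` with `model_file` and `model_type`), so check them against the version you have installed.

```python
REPO_ID = "TheBloke/mpt-30B-chat-GGML"
MODEL_FILE = "mpt-30b-chat.ggmlv0.q4_0.bin"  # smallest file listed in this card


def load_mpt_chat(repo_id=REPO_ID, model_file=MODEL_FILE):
    """Download (if needed) and load a GGML MPT file through ctransformers."""
    # Imported lazily so this module can be read without ctransformers installed.
    from ctransformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        repo_id,
        model_file=model_file,
        model_type="mpt",  # tells ctransformers which GGML architecture to use
    )


def demo():
    """Generate one reply. Calling this downloads a file of roughly 17 GB."""
    llm = load_mpt_chat()
    prompt = (
        "<|im_start|>system\n"
        "A conversation between a user and an LLM-based AI assistant. "
        "The assistant gives helpful and honest answers.<|im_end|>\n"
        "<|im_start|>user\n"
        "Name three uses of a paperclip.<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    # Stop at the end-of-turn marker so the model does not start a new turn.
    print(llm(prompt, max_new_tokens=128, stop=["<|im_end|>"]))
```

Nothing is downloaded until `demo()` or `load_mpt_chat()` is called.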

Tutorial for using LoLLMS Web UI

Provided files

Name	Quant method	Bits	Size	Max RAM required	Use case
mpt-30b-chat.ggmlv0.q4_0.bin	q4_0	4	16.85 GB	19.35 GB	4-bit.
mpt-30b-chat.ggmlv0.q4_1.bin	q4_1	4	18.73 GB	21.23 GB	4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models.
mpt-30b-chat.ggmlv0.q5_0.bin	q5_0	5	20.60 GB	23.10 GB	5-bit. Higher accuracy, higher resource usage and slower inference.
mpt-30b-chat.ggmlv0.q5_1.bin	q5_1	5	22.47 GB	24.97 GB	5-bit. Even higher accuracy, resource usage and slower inference.
mpt-30b-chat.ggmlv0.q8_0.bin	q8_0	8	31.83 GB	34.33 GB	8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users.

Note: the RAM figures above assume no GPU offloading. If layers are offloaded to the GPU, RAM usage goes down and VRAM is used instead.
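
The table can be turned into a small lookup for choosing a file. This is only a sketch: the figures are copied from the table, the helper name `largest_file_that_fits` is hypothetical, and it assumes no GPU offloading and that a larger quantised file is the preferred one whenever it fits.

```python
# (file name, size in GB, max RAM required in GB), copied from the table above.
FILES = [
    ("mpt-30b-chat.ggmlv0.q4_0.bin", 16.85, 19.35),
    ("mpt-30b-chat.ggmlv0.q4_1.bin", 18.73, 21.23),
    ("mpt-30b-chat.ggmlv0.q5_0.bin", 20.60, 23.10),
    ("mpt-30b-chat.ggmlv0.q5_1.bin", 22.47, 24.97),
    ("mpt-30b-chat.ggmlv0.q8_0.bin", 31.83, 34.33),
]


def largest_file_that_fits(available_ram_gb):
    """Return the name of the largest listed file whose RAM figure fits, or None."""
    fitting = [entry for entry in FILES if entry[2] <= available_ram_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda entry: entry[2])[0]


# In every row the RAM figure is the file size plus the same fixed overhead.
overheads = sorted({round(ram - size, 2) for _, size, ram in FILES})
print(overheads)                     # [2.5]
print(largest_file_that_fits(24.0))  # mpt-30b-chat.ggmlv0.q5_0.bin
```

The constant 2.5 GB gap between the two columns suggests the "Max RAM required" column is simply the file size plus a fixed allowance.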

Discord

For further support, and for discussion of these models and AI in general, join us at:

TheBloke AI's Discord server

Thanks, and how to contribute

Thanks to the chirper.ai team!

A lot of people have asked me whether they can contribute. I enjoy providing models and helping people, and I would love to spend more time on support and to expand into new projects such as fine-tuning and training.

If you are able and willing to contribute, it will be most gratefully received and will help me keep providing more models and start new AI projects.

Donors will get priority support on any AI/LLM/model questions and requests, access to a private Discord room, and other benefits.

Patreon: https://patreon.com/TheBlokeAI
Ko-Fi: https://ko-fi.com/TheBlokeAI

Special thanks to: Luke from CarbonQuill, Aemon Algiz, Dmitriy Samsonov.

Patreon special mentions: Mano Prime, Fen Risland, Derek Yates, Preetika Verma, webtim, Sean Connelly, Alps Aficionado, Karl Bernard, Junyu Yang, Nathan LeClaire, Chris McCloskey, Lone Striker, Asp the Wyvern, Eugene Pentland, Imad Khwaja, trip7s trip, WelcomeToTheClub, John Detwiler, Artur Olbinski, Khalefa Al-Ahmad, Trenton Dambrowitz, Talal Aujan, Kevin Schuppel, Luke Pendergrass, Pyrater, Joseph William Delisle, terasurfer, vamX, Gabriel Puliatti, David Flickinger, Jonathan Leane, Iucharbius, Luke, Deep Realms, Cory Kujawski, ya boyyy, Illia Dulskyi, senxiiz, Johann-Peter Hartmann, John Villwock, K, Ghost, Spiking Neurons AB, Nikolai Manek, Rainer Wilmers, Pierre Kircher, biorpg, Space Cruiser, Ai Maven, subjectnull, Willem Michiel, Ajan Kanaga, Kalila, chris gileta, Oscar Rangel.

Thank you to all my generous patrons and donors!

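Original model card: MosaicML's MPT-30B-chat

MPT-30B-Chat

MPT-30B-Chat is a chatbot-like model for dialogue generation. It was built by finetuning MPT-30B on the ShareGPT-Vicuna, Camel-AI, GPTeacher, Guanaco, and Baize datasets, plus some generated datasets.

License: CC-BY-NC-SA-4.0 (non-commercial use only)
Demo on Hugging Face Spaces

This model was trained by MosaicML and follows a modified decoder-only transformer architecture.

Model Date

June 22, 2023

Model License

CC-BY-NC-SA-4.0 (non-commercial use only)

Documentation

Blog post: Raising the bar for open-source foundation models
Codebase (mosaicml/llm-foundry repo)
Questions? Feel free to contact us on the MosaicML Community Slack!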

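Example Dialogue

ksreenivasan:

I want you to come up with a tweet based on this summary of the article:

We are excited to extend the MosaicML Foundation Series with MPT-30B, a new commercially licensed open-source model that is more powerful than, and outperforms, the original GPT-3. In addition, we are releasing two fine-tuned models, MPT-30B-Instruct and MPT-30B-Chat, which are built on top of MPT-30B and excel at short-form instruction following and multi-turn conversations, respectively. All MPT-30B models come with special features that set them apart from other LLMs, including an 8k token context window (which can be extended further via finetuning; see MPT-7B-StoryWriter), support for context-length extrapolation via ALiBi, and efficient inference and training performance via FlashAttention. It also has strong coding abilities thanks to its pretraining data mixture.

MPT-30B-Chat:

"Revolutionize your AI capabilities with MosaicML's latest addition, MPT-30B! Outperforming GPT-3, this open-source model is more powerful and comes with fine-tuned models for exceptional instruction following and multi-turn conversations. Unleash the potential of an 8k token context window, ALiBi support, and FlashAttention. Experience the future of AI with MosaicML! #AI #ML #innovation"

ksreenivasan:

Is there anything you want to say to MosaicML about this exciting news?

MPT-30B-Chat:

As your AI marketing assistant, I would say that this is truly exciting news for the AI community! MosaicML's latest addition, MPT-30B, is poised to revolutionize the industry with its powerful open-source capabilities and its fine-tuned models for instruction following and multi-turn conversations. The special features, including the 8k token context window, ALiBi support, and FlashAttention, are sure to set MosaicML's MPT-30B apart from other LLMs. Congratulations to the MosaicML team on this incredible achievement!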

How to Use

This model is best used with the MosaicML llm-foundry repository for training and finetuning.

import transformers
model = transformers.AutoModelForCausalLM.from_pretrained(
  'mosaicml/mpt-30b-chat',
  trust_remote_code=True
)

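Note: this model requires that trust_remote_code=True be passed to the from_pretrained method. This is because we use a custom MPT model architecture that is not yet part of the Hugging Face transformers package. MPT includes options for many training efficiency features such as FlashAttention, ALiBi, QK LayerNorm, and more.

To use the optimized triton implementation of FlashAttention, you can load the model on GPU (cuda:0) with attn_impl='triton' and bfloat16 precision: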

import torch
import transformers

name = 'mosaicml/mpt-30b-chat'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'  # change this to use triton-based FlashAttention
config.init_device = 'cuda:0' # For fast initialization directly on GPU!

model = transformers.AutoModelForCausalLM.from_pretrained(
  name,
  config=config,
  torch_dtype=torch.bfloat16, # Load model weights in bfloat16
  trust_remote_code=True
)

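The model was initially trained with a sequence length of 4096, followed by an additional pretraining stage to adapt it to sequence lengths of up to 8192. However, ALiBi lets users increase the maximum sequence length even further during finetuning and/or inference. For example: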

import transformers

name = 'mosaicml/mpt-30b-chat'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 16384 # (input + output) tokens can now be up to 16384

model = transformers.AutoModelForCausalLM.from_pretrained(
  name,
  config=config,
  trust_remote_code=True
)

This model was trained with the MPT-30B tokenizer, which is based on the EleutherAI/gpt-neox-20b tokenizer and includes additional padding and eos tokens.

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('mosaicml/mpt-30b')

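The model can then be used, for example, within a text generation pipeline. Note: when running Torch modules in lower precision, it is best practice to use the torch.autocast context manager.

import torch  # needed for torch.autocast and torch.bfloat16 below
from transformers import pipeline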

with torch.autocast('cuda', dtype=torch.bfloat16):
    inputs = tokenizer('Here is a recipe for vegan banana bread:\n', return_tensors="pt").to('cuda')
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

# or using the HF pipeline
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')
with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe('Here is a recipe for vegan banana bread:\n',
            max_new_tokens=100,
            do_sample=True,
            use_cache=True))

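Model Description

The architecture is a modification of a standard decoder-only transformer.

The model has been modified from a standard transformer in the following ways:

It uses FlashAttention
It uses ALiBi (Attention with Linear Biases) and does not use positional embeddings
It does not use biases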
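
To illustrate what replacing positional embeddings with ALiBi means, here is a minimal pure-Python sketch of the bias described in the ALiBi paper (arXiv:2108.12409, listed in the header): each head adds a penalty to its attention scores that grows linearly with the distance between query and key. The slope formula is the paper's rule for a power-of-two number of heads, and the causal masking is added here for illustration. The function names are hypothetical, and MPT's own implementation may differ in detail.

```python
def alibi_slopes(n_heads):
    """Per-head slopes 2^(-8 * h / n_heads) for h = 1..n_heads (power-of-two head counts)."""
    if n_heads <= 0 or n_heads & (n_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)]


def alibi_bias(slope, seq_len):
    """Bias added to attention scores for one head.

    bias[i][j] = slope * (j - i) for keys at or before the query position
    (zero on the diagonal, more negative further back), and -inf for
    future keys, which applies the causal mask.
    """
    neg_inf = float("-inf")
    return [
        [slope * (j - i) if j <= i else neg_inf for j in range(seq_len)]
        for i in range(seq_len)
    ]


slopes = alibi_slopes(64)  # 64 heads, as in the hyperparameter table
print(slopes[0], slopes[-1])
for row in alibi_bias(slopes[0], 4):
    print(row)
```

Because the bias depends only on the distance between positions, nothing in it is tied to a fixed maximum length, which is why the sequence length can be raised at inference time.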

Hyperparameter	Value
n_parameters	29.95B
n_layers	48
n_heads	64
d_model	7168
vocab size	50432
sequence length	8192
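
As a rough cross-check of the table, the parameter count can be estimated from the other hyperparameters. This sketch rests on assumptions that are not stated in the card: an MLP expansion ratio of 4, input and output embeddings sharing one weight matrix, and LayerNorm layers that carry weights only (consistent with "does not use biases"). The function name is hypothetical.

```python
N_LAYERS = 48
D_MODEL = 7168
VOCAB_SIZE = 50432
MLP_RATIO = 4  # assumption: the feed-forward hidden size is 4 * d_model


def estimate_parameters(n_layers=N_LAYERS, d_model=D_MODEL,
                        vocab_size=VOCAB_SIZE, mlp_ratio=MLP_RATIO):
    """Estimate the parameter count of a bias-free decoder-only transformer."""
    attention = 4 * d_model * d_model        # Q, K, V and output projections
    mlp = 2 * mlp_ratio * d_model * d_model  # up and down projections
    norms = 2 * d_model                      # two LayerNorm weight vectors per block
    embedding = vocab_size * d_model         # assumption: shared with the output layer
    final_norm = d_model
    return n_layers * (attention + mlp + norms) + embedding + final_norm


total = estimate_parameters()
print(f"{total / 1e9:.3f}B parameters (estimate)")
```

Under these assumptions the estimate lands close to the 29.95B listed in the table.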

Data Mix

The model was trained on the following data mix:

Data Source	Number of Tokens in Source	Proportion
Airoboros/GPT4	26.4M	1.71%
Baize	55.0M	3.57%
Camel	301M	19.54%
GPTeacher	7.56M	0.49%
Guanaco	15.6M	1.02%
LongConversations	18.4M	1.19%
ShareGPT	821M	53.24%
WizardLM	297M	19.23%

"LongConversations" is a GPT3.5/4-generated dataset, details of which will be released at a later date.
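
The proportions in the table can be recomputed from the token counts. The sketch below copies the listed figures and compares them; since the token counts are themselves rounded, the recomputed percentages are expected to differ slightly from the listed ones.

```python
# (source, tokens in millions, listed proportion in percent), from the table above.
MIX = [
    ("Airoboros/GPT4", 26.4, 1.71),
    ("Baize", 55.0, 3.57),
    ("Camel", 301.0, 19.54),
    ("GPTeacher", 7.56, 0.49),
    ("Guanaco", 15.6, 1.02),
    ("LongConversations", 18.4, 1.19),
    ("ShareGPT", 821.0, 53.24),
    ("WizardLM", 297.0, 19.23),
]

total_tokens = sum(tokens for _, tokens, _ in MIX)
print(f"total: {total_tokens:.2f}M tokens")

for name, tokens, listed in MIX:
    computed = 100.0 * tokens / total_tokens
    print(f"{name:<18}{computed:6.2f}%  (listed {listed:.2f}%)")
```

The total comes to roughly 1.54B tokens, and each recomputed share agrees with the listed one to within a few hundredths of a percentage point.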

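Training Configuration

This model was trained on 64 H100s for about 7.6 hours using the MosaicML Platform. It was trained with sharded data parallelism using FSDP and used the AdamW optimizer.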

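Limitations and Biases

The following language is modified from EleutherAI's GPT-NeoX-20B

MPT-30B-Chat can produce factually incorrect output and should not be relied on to produce factually accurate information. MPT-30B-Chat was trained on various public datasets. While great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased, or otherwise offensive outputs.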

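Acknowledgements

This model was finetuned by Sam Havens and the MosaicML NLP team

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

MosaicML Platform

If you're interested in training and deploying your own MPT or LLMs on the MosaicML Platform, sign up here.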

Citation

Please cite this model using the following format:

@online{MosaicML2023Introducing,
    author    = {MosaicML NLP Team},
    title     = {Introducing MPT-30B: Raising the bar for open-source foundation models},
    year      = {2023},
    url       = {www.mosaicml.com/blog/mpt-30b},
    note      = {Accessed: 2023-06-22},
    urldate   = {2023-06-22}
}

Author:

Tom Jobbins

Dataset size:

102.89 GB