Model:

Abzu/mpt-30b-instruct-q8

License:

cc-by-sa-3.0

Preprints:

arxiv:2108.12409 arxiv:2205.14135

Other:

8-bit llm-foundry MosaicML Composer custom_code mpt

Datasets:

spider scrolls/summ_screen_fd emozilla/quality tau/scrolls/qasper duorc mosaicml/dolly_hhrlhf knkarthick/dialogsum conceptofmind/cot_submix_original/cot_gsm8k competition_math

Libraries:

Transformers Safetensors

Task:

Text Generation


English

MosaicML's MPT-30B-Instruct 8-bit

These files are .safetensors format model files for MosaicML's MPT-30B-Instruct.

How to convert

import time

import torch
import transformers

# Load the model
name = 'mosaicml/mpt-30b-instruct'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'  # change this to use triton-based FlashAttention
config.init_device = 'cuda:0' # For fast initialization directly on GPU!

start_time = time.time()
model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16, # Load model weights in bfloat16
    trust_remote_code=True,
    load_in_8bit=True
)
print(f'Loaded model in {time.time() - start_time:.1f} s')

# Filter the non-tensor items
def filter_dict(dictionary):
    filtered_dict = {key: value for key, value in dictionary.items() if "weight_format" not in key}
    return filtered_dict

new_state_dict = filter_dict(model.state_dict())

# Save the 8-bit model
model.save_pretrained('mpt-30b-instruct-8bits', state_dict=new_state_dict, safe_serialization=True)
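
To see what the filtering step does in isolation, here is a minimal sketch that applies the same filter_dict to a toy dictionary. The key names and string values are hypothetical stand-ins for a real state dict (which holds tensors); the only point is that every entry whose key contains "weight_format" is dropped before the dictionary is handed to save_pretrained.

```python
def filter_dict(dictionary):
    """Drop every entry whose key mentions 'weight_format'."""
    return {key: value for key, value in dictionary.items() if "weight_format" not in key}


# Hypothetical keys and values, for illustration only.
toy_state_dict = {
    "blocks.0.attn.Wqkv.weight": "int8 tensor placeholder",
    "blocks.0.attn.Wqkv.SCB": "scale tensor placeholder",
    "blocks.0.attn.Wqkv.weight_format": "format marker placeholder",
}

filtered = filter_dict(toy_state_dict)
print(sorted(filtered))
```

The filter matches on key names only, so it never inspects the values themselves.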

How to use

import transformers

# Load the 8-bit model saved by the conversion step
model = transformers.AutoModelForCausalLM.from_pretrained(
    'mpt-30b-instruct-8bits',
    trust_remote_code=True,
)
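
As a back-of-the-envelope illustration of why an 8-bit copy is useful, the sketch below compares approximate weight storage at two precisions. It takes the parameter count from the hyperparameter table in this card and assumes exactly 2 bytes per weight for bfloat16 and 1 byte for 8-bit; it ignores quantization scales, any layers kept in higher precision, activations, and the KV cache, so real usage will differ. The helper name weight_memory_gb is made up for this example.

```python
def weight_memory_gb(n_parameters, bytes_per_parameter):
    """Approximate weight storage in gigabytes (10**9 bytes)."""
    return n_parameters * bytes_per_parameter / 1e9


N_PARAMETERS = 29.95e9  # n_parameters from the hyperparameter table

bf16_gb = weight_memory_gb(N_PARAMETERS, 2)  # assumption: 2 bytes per weight
int8_gb = weight_memory_gb(N_PARAMETERS, 1)  # assumption: 1 byte per weight

print(f"bfloat16 weights: ~{bf16_gb:.1f} GB")
print(f"8-bit weights:    ~{int8_gb:.1f} GB")
```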

Prompt template

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction
{prompt}

### Response

MPT-30B-Instruct

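MPT-30B-Instruct is a model for short-form instruction following. It was built by finetuning MPT-30B on Dolly HHRLHF, a dataset derived from the Databricks Dolly-15k and Anthropic Helpful and Harmless (HH-RLHF) datasets. It was also trained on Competition Math, Duorc, CoT GSM8k, Qasper, Quality, Summ Screen FD, and Spider.

License: CC-By-SA-3.0

This model was trained by MosaicML and follows a modified decoder-only transformer architecture.

Model Date

June 22, 2023

Model License

CC-By-SA-3.0

Documentation

Blog post: Raising the bar for open-source foundation models
Codebase (mosaicml/llm-foundry repo)
Questions: Feel free to contact us via the MosaicML Community Slack!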

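Example Question/Instruction

Bespokenizer46

I need to convince my friend, Phyllis, that she should train a custom LLM for her Fortune 500 company using the MosaicML Platform. Please write an email that explains why MosaicML's emphasis on cutting-edge methodology, data privacy, and efficiency is so important. End the email with a friendly inquiry about Phyllis's family.

MPT-30B-Instruct:

Phyllis - I know you're always looking for ways to improve efficiency at work, and I think I've found a solution that you'll be very interested in. LLMs are only as good as the data they're trained on, and the training process can be very time-consuming and resource-intensive. MosaicML uses cutting-edge methodology to efficiently train high-quality models without compromising data privacy. They also provide tools to easily connect to and use the model in your daily workflow. I think you'd really enjoy speaking with their founder; we can set up a call if you're interested. Also, I know it's been a tough year for your family. How are things?

Best, Your Friend

How to Use

Note: This model requires that trust_remote_code=True be passed to the from_pretrained method. This is because we use a custom model architecture that is not yet part of the transformers package.

It includes options for many training efficiency features such as FlashAttention (Dao et al. 2022), ALiBi, QK LayerNorm, and more.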

import transformers
model = transformers.AutoModelForCausalLM.from_pretrained(
  'mosaicml/mpt-30b-instruct',
  trust_remote_code=True
)

To use the optimized triton implementation of FlashAttention, you can load the model on GPU (cuda:0) with attn_impl='triton' and with bfloat16 precision:

import torch
import transformers

name = 'mosaicml/mpt-30b-instruct'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'  # change this to use triton-based FlashAttention
config.init_device = 'cuda:0' # For fast initialization directly on GPU!

model = transformers.AutoModelForCausalLM.from_pretrained(
  name,
  config=config,
  torch_dtype=torch.bfloat16, # Load model weights in bfloat16
  trust_remote_code=True
)

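The model was initially trained with a sequence length of 2048. An additional pretraining phase was included to adapt it to a sequence length of 8192. However, ALiBi also enables users to increase the maximum sequence length during finetuning and/or inference. For example: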

import transformers

name = 'mosaicml/mpt-30b-instruct'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 16384 # (input + output) tokens can now be up to 16384

model = transformers.AutoModelForCausalLM.from_pretrained(
  name,
  config=config,
  trust_remote_code=True
)
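
As a rough illustration of why ALiBi permits this, the sketch below computes per-head slopes and the resulting linear distance penalty in plain Python, following the geometric schedule described in the ALiBi paper (arXiv:2108.12409) for a power-of-two head count. It is not the MPT implementation, and the function names alibi_slopes and alibi_bias are made up for this example. Because the bias depends only on the distance between the query and key positions, nothing in it is tied to a particular maximum sequence length.

```python
def alibi_slopes(n_heads):
    """Geometric slopes 2^(-8/n), 2^(-16/n), ..., 2^(-8) for n heads.

    Assumes n_heads is a power of two (the simple case in the ALiBi paper).
    """
    if n_heads <= 0 or (n_heads & (n_heads - 1)) != 0:
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * (i + 1) / n_heads) for i in range(n_heads)]


def alibi_bias(slope, query_pos, key_pos):
    """Penalty added to the attention score of a key at or before the query."""
    return -slope * (query_pos - key_pos)


slopes = alibi_slopes(64)  # n_heads = 64 in the hyperparameter table
print(slopes[0], slopes[-1])

# The same rule applies whether the key is 10 or 16,383 tokens back.
print(alibi_bias(slopes[-1], 10, 0))
print(alibi_bias(slopes[-1], 16383, 0))
```

Heads with steep slopes attend mostly to nearby tokens, while heads with shallow slopes can still reach far back in a long context.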

This model uses the MPT-30B tokenizer, which is based on the EleutherAI/gpt-neox-20b tokenizer and includes additional padding and eos tokens.

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('mosaicml/mpt-30b')

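The model can then be used, for example, within a text-generation pipeline. Note: when running Torch modules in lower precision, it is best practice to use the torch.autocast context manager.

import torch
from transformers import pipeline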

with torch.autocast('cuda', dtype=torch.bfloat16):
    inputs = tokenizer('Here is a recipe for vegan banana bread:\n', return_tensors="pt").to('cuda')
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

# or using the HF pipeline
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')
with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe('Here is a recipe for vegan banana bread:\n',
            max_new_tokens=100,
            do_sample=True,
            use_cache=True))

Formatting

This model was trained on data formatted as follows:

def format_prompt(instruction):
    template = "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n###Instruction\n{instruction}\n\n### Response\n"
    return template.format(instruction=instruction)

example = "Tell me a funny joke.\nDon't make it too funny though."
fmt_ex = format_prompt(instruction=example)

In the above example, fmt_ex is ready to be tokenized and sent through the model.
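
A causal language model returns the prompt followed by its continuation, so it is often convenient to strip the prompt back off before showing the answer. The following is a minimal sketch, assuming the decoded output either begins with the formatted prompt or at least contains the response marker. The helper extract_response and the sample generated string are hypothetical and not part of the original card.

```python
RESPONSE_MARKER = "### Response\n"


def format_prompt(instruction):
    # Same template as in the card's formatting example.
    template = "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n###Instruction\n{instruction}\n\n### Response\n"
    return template.format(instruction=instruction)


def extract_response(generated_text, prompt):
    """Return only the text the model produced after the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    # Fallback: keep whatever follows the last response marker
    # (rpartition returns the whole string as the tail if the marker is absent).
    _, _, tail = generated_text.rpartition(RESPONSE_MARKER)
    return tail.strip()


prompt = format_prompt("Tell me a funny joke.")
generated = prompt + "Why did the tokenizer cross the road?"  # made-up model output
print(extract_response(generated, prompt))
```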

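Model Description

The architecture is a modification of a standard decoder-only transformer.

The model has been modified from a standard transformer in the following ways:

It uses FlashAttention
It uses ALiBi (Attention with Linear Biases) and does not use positional embeddings
It does not use biases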

Hyperparameter	Value
n_parameters	29.95B
n_layers	48
n_heads	64
d_model	7168
vocab size	50432
sequence length	8192
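
The parameter count in the table can be roughly cross-checked from the other rows. The sketch below is an estimate under stated assumptions, not the official accounting: it assumes an MLP hidden size of 4 × d_model, a single embedding matrix shared between input and output, and it ignores normalization weights. The no-bias assumption comes from the model description in this card; the function name estimate_parameters is made up for this example.

```python
def estimate_parameters(n_layers, d_model, vocab_size, mlp_ratio=4):
    """Rough weight count for a bias-free decoder-only transformer.

    Assumptions (not stated in the card): MLP hidden size is
    mlp_ratio * d_model, embeddings are shared between input and output,
    and normalization weights are ignored.
    """
    attention = 4 * d_model * d_model        # Q, K, V and output projections
    mlp = 2 * mlp_ratio * d_model * d_model  # up and down projections
    embeddings = vocab_size * d_model
    return n_layers * (attention + mlp) + embeddings


total = estimate_parameters(n_layers=48, d_model=7168, vocab_size=50432)
print(f"{total / 1e9:.2f}B parameters")
```

Under these assumptions the estimate comes to about 29.96B, within roughly 0.01B of the 29.95B listed in the table.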

Data Mix

The model was trained on the following data mix:

Data Source	Number of Tokens in Source	Proportion
competition_math	1.6 M	3.66%
cot_gsm8k	3.36 M	7.67%
dialogsum	0.1 M	0.23%
dolly_hhrlhf	5.89 M	13.43%
duorc	7.8 M	17.80%
qasper	8.72 M	19.90%
quality	11.29 M	25.78%
scrolls/summ_screen_fd	4.97 M	11.33%
spider	0.089 M	0.20%
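
The Proportion column can be recomputed from the token counts. The sketch below copies the counts (in millions of tokens) from the table and divides each by the total. The recomputed values agree with the table to within a few hundredths of a percentage point; the small differences presumably come from rounding in the listed token counts, which is an assumption on our part.

```python
# Token counts in millions, copied from the data mix table.
tokens_in_millions = {
    "competition_math": 1.6,
    "cot_gsm8k": 3.36,
    "dialogsum": 0.1,
    "dolly_hhrlhf": 5.89,
    "duorc": 7.8,
    "qasper": 8.72,
    "quality": 11.29,
    "scrolls/summ_screen_fd": 4.97,
    "spider": 0.089,
}

total = sum(tokens_in_millions.values())
proportions = {name: 100 * count / total for name, count in tokens_in_millions.items()}

print(f"total: {total:.3f} M tokens")
for name, pct in proportions.items():
    print(f"{name:<24}{pct:6.2f}%")
```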

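Pretraining Data

For more details on the pretraining process, see MPT-30B.

The data was tokenized using the EleutherAI/gpt-neox-20b tokenizer.

Training Configuration

This model was trained on 72 A100 40GB GPUs for 8 hours using the MosaicML Platform. Training used sharded data parallelism and the AdamW optimizer.

Limitations and Biases

The following language is modified from EleutherAI's GPT-NeoX-20B

MPT-30B-Instruct can produce factually incorrect output and should not be relied on to produce factually accurate information. MPT-30B-Instruct was trained on various public datasets. While great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased, or otherwise offensive outputs.

Acknowledgements

This model was finetuned by Sam Havens, Alex Trott, and the MosaicML NLP team.

MosaicML Platform

If you're interested in training and deploying your own MPT or LLMs on the MosaicML Platform, sign up here.

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

Citation

Please cite this model using the following format: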

@online{MosaicML2023Introducing,
    author    = {MosaicML NLP Team},
    title     = {Introducing MPT-30B: Raising the bar for open-source foundation models},
    year      = {2023},
    url       = {www.mosaicml.com/blog/mpt-30b},
    note      = {Accessed: 2023-06-22},
    urldate   = {2023-06-22}
}

Author:

Abzu

Dataset size:

28.25 GB