Model:

mosaicml/mpt-7b-8k-instruct

License:

cc-by-sa-3.0

Preprints:

arxiv:2010.04245 arxiv:2108.12409 arxiv:2205.14135

Other:

llm-foundry MosaicML Composer mpt custom_code

Datasets:

spider scrolls/summ_screen_fd emozilla/quality tau/scrolls/qasper duorc mosaicml/dolly_hhrlhf knkarthick/dialogsum conceptofmind/cot_submix_original/cot_gsm8k competition_math

Libraries:

Transformers PyTorch

Task:

Text Generation

MPT-7B-Instruct-8k

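MPT-7B-Instruct-8k is a model for long-form instruction following, especially question answering on and summarization of longer documents. It is built by finetuning MPT-7B-8k on Dolly HHRLHF, a dataset derived from the Databricks Dolly-15k and the Anthropic Helpful and Harmless (HH-RLHF) datasets. It is also trained on Competition Math, Duorc, CoT GSM8k, Qasper, Quality, Summ Screen FD and Spider. This is the same dataset mix that MPT-30B-Instruct was trained on.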

License: CC-By-SA-3.0

This model was trained by MosaicML and follows a modified decoder-only Transformer architecture.

Model Date

July 18, 2023

Model License

CC-By-SA-3.0

Documentation

Blog post: MPT-7B-8k
Codebase (mosaicml/llm-foundry repo)
Questions: Feel free to contact us via the MosaicML Community Slack!

How to Use

This model is best used with the MosaicML llm-foundry repository for training and finetuning.

import transformers
model = transformers.AutoModelForCausalLM.from_pretrained(
  'mosaicml/mpt-7b-8k-instruct',
  trust_remote_code=True
)

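Note: This model requires that trust_remote_code=True be passed to the from_pretrained method. This is because we use a custom MPT model architecture that is not yet part of the Hugging Face transformers package. MPT includes options for many training efficiency features such as FlashAttention, ALiBi, QK LayerNorm, and more.

To use the optimized triton implementation of FlashAttention, you can load the model on GPU (cuda:0) with attn_impl='triton' and with bfloat16 precision: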

import torch
import transformers

name = 'mosaicml/mpt-7b-8k-instruct'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'  # change this to use triton-based FlashAttention
config.init_device = 'cuda:0' # For fast initialization directly on GPU!

model = transformers.AutoModelForCausalLM.from_pretrained(
  name,
  config=config,
  torch_dtype=torch.bfloat16, # Load model weights in bfloat16
  trust_remote_code=True
)

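The model was initially trained with a sequence length of 2048, followed by an additional pretraining stage that adapted it to sequence lengths of up to 8192. However, ALiBi enables users to increase the maximum sequence length even further during finetuning and/or inference. For example: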

import transformers

name = 'mosaicml/mpt-7b-8k-instruct'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 16384 # (input + output) tokens can now be up to 16384

model = transformers.AutoModelForCausalLM.from_pretrained(
  name,
  config=config,
  trust_remote_code=True
)
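
Because max_seq_len covers input and output tokens together, it can help to check a request against that budget before calling generate. The following is a minimal sketch, not part of the MPT code: the helper names fits_in_context and max_prompt_tokens are hypothetical, and the default of 8192 is taken from the adapted sequence length stated above.

```python
def fits_in_context(n_prompt_tokens: int, max_new_tokens: int,
                    max_seq_len: int = 8192) -> bool:
    """Return True if prompt plus generated tokens stay within max_seq_len.

    Hypothetical helper for illustration; max_seq_len should match the
    value set on the model config.
    """
    if n_prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return n_prompt_tokens + max_new_tokens <= max_seq_len


def max_prompt_tokens(max_new_tokens: int, max_seq_len: int = 8192) -> int:
    """Largest prompt length that still leaves room for max_new_tokens."""
    return max(0, max_seq_len - max_new_tokens)


if __name__ == "__main__":
    # With the config raised to 16384, a 12000-token document plus a
    # 500-token summary fits; with the default 8192 it does not.
    print(fits_in_context(12000, 500, max_seq_len=16384))
    print(fits_in_context(12000, 500))
    print(max_prompt_tokens(500))
```

In practice the prompt length would come from the tokenizer, for example len(tokenizer(text).input_ids).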

This model was trained with the MPT-7B-chat tokenizer, which is based on the EleutherAI/gpt-neox-20b tokenizer and includes additional ChatML tokens.

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('mosaicml/mpt-7b-8k')

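The model can then be used, for example, within a text-generation pipeline. Note: when running Torch modules in lower precision, it is best practice to use the torch.autocast context manager.

import torch
from transformers import pipeline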

with torch.autocast('cuda', dtype=torch.bfloat16):
    inputs = tokenizer('Here is a recipe for vegan banana bread:\n', return_tensors="pt").to('cuda')
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

# or using the HF pipeline
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')
with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe('Here is a recipe for vegan banana bread:\n',
            max_new_tokens=100,
            do_sample=True,
            use_cache=True))

Model Description

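The architecture is a modification of a standard decoder-only Transformer.

The model has been modified from a standard Transformer in the following ways:

It uses FlashAttention
It uses ALiBi (Attention with Linear Biases) and does not use positional embeddings
It does not use biases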
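
To make the ALiBi item more concrete, here is a minimal sketch of how the linear attention biases can be computed. It follows the scheme described in the ALiBi paper (arxiv:2108.12409) for head counts that are powers of two. The function names and the alibi_bias_max=8 default are assumptions for illustration, and the actual MPT implementation in llm-foundry may differ in detail.

```python
from typing import List


def alibi_slopes(n_heads: int, alibi_bias_max: int = 8) -> List[float]:
    """Per-head slopes forming a geometric sequence.

    Assumes n_heads is a power of two; head h (0-based) gets
    2 ** (-alibi_bias_max * (h + 1) / n_heads).
    """
    if n_heads <= 0 or n_heads & (n_heads - 1) != 0:
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-alibi_bias_max * (h + 1) / n_heads) for h in range(n_heads)]


def alibi_bias_row(slope: float, query_pos: int) -> List[float]:
    """Bias added to the attention scores of one query over keys 0..query_pos.

    The penalty grows linearly with the distance between query and key.
    Keys after query_pos are left to the causal mask and not listed here.
    """
    return [-slope * (query_pos - key_pos) for key_pos in range(query_pos + 1)]


if __name__ == "__main__":
    slopes = alibi_slopes(32)  # 32 heads, as in the hyperparameter table
    print(slopes[0], slopes[-1])
    print(alibi_bias_row(slopes[-1], query_pos=4))
```

The bias depends only on the distance between positions, so nothing in it is sized by the training sequence length. This is why max_seq_len can be raised at finetuning or inference time without adding parameters.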

Hyperparameter	Value
n_parameters	6.7B
n_layers	32
n_heads	32
d_model	4096
vocab size	50432
sequence length	2048
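
As a rough sanity check, the parameter count can be estimated from the other values in the table. This sketch assumes an MLP expansion ratio of 4, output weights tied to the input embedding, and bias-free LayerNorm weights; none of these are listed in the table above, so treat them as assumptions.

```python
def estimate_params(n_layers: int, d_model: int, vocab_size: int,
                    expansion_ratio: int = 4) -> int:
    """Approximate parameter count for a bias-free decoder-only Transformer."""
    attention = 4 * d_model * d_model              # Wq, Wk, Wv and output projection
    mlp = 2 * expansion_ratio * d_model * d_model  # up and down projections
    norms = 2 * d_model                            # two LayerNorm weight vectors per block
    per_block = attention + mlp + norms
    embedding = vocab_size * d_model               # assumed shared with the output layer
    final_norm = d_model
    return n_layers * per_block + embedding + final_norm


if __name__ == "__main__":
    total = estimate_params(n_layers=32, d_model=4096, vocab_size=50432)
    print(f"{total / 1e9:.2f}B parameters")
```

Under these assumptions the estimate comes to about 6.65B, which is in line with the 6.7B listed.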

Data Mix

The model was trained on the following data mix:

Data Source	Number of Tokens in Source	Proportion
competition_math	1.6 M	3.66%
cot_gsm8k	3.36 M	7.67%
dialogsum	0.1 M	0.23%
dolly_hhrlhf	5.89 M	13.43%
duorc	7.8 M	17.80%
qasper	8.72 M	19.90%
quality	11.29 M	25.78%
scrolls/summ_screen_fd	4.97 M	11.33%
spider	0.089 M	0.20%
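
The proportions can be recomputed from the token counts as each source's share of the total. The sketch below copies the rounded counts (in millions) from the table, so small differences from the listed percentages are expected. The function name proportions is hypothetical.

```python
from typing import Dict

# Token counts in millions, copied from the table above.
TOKENS_M: Dict[str, float] = {
    "competition_math": 1.6,
    "cot_gsm8k": 3.36,
    "dialogsum": 0.1,
    "dolly_hhrlhf": 5.89,
    "duorc": 7.8,
    "qasper": 8.72,
    "quality": 11.29,
    "scrolls/summ_screen_fd": 4.97,
    "spider": 0.089,
}


def proportions(tokens: Dict[str, float]) -> Dict[str, float]:
    """Return each source's share of the total token count, in percent."""
    total = sum(tokens.values())
    return {name: 100.0 * count / total for name, count in tokens.items()}


if __name__ == "__main__":
    print(f"total: {sum(TOKENS_M.values()):.3f} M tokens")
    for name, share in proportions(TOKENS_M).items():
        print(f"{name:<24}{share:6.2f}%")
```

Summing the rounded counts gives roughly 43.8 M tokens in total.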

Training Configuration

This model was trained on 8 80GB A100s for about 6.3 hours using the MosaicML Platform. The model was trained with sharded data parallelism and used the AdamW optimizer.

Limitations and Biases

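The following language is adapted from EleutherAI's GPT-NeoX-20B

MPT-7B-Instruct-8k can produce factually incorrect output, and should not be relied on to produce factually accurate information. MPT-7B-Instruct-8k was trained on various public datasets. While great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased or otherwise offensive outputs.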

Acknowledgements

This model was finetuned by the MosaicML NLP team.

Disclaimer

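The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.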

MosaicML Platform

If you're interested in training and deploying your own MPT or LLMs on the MosaicML Platform, sign up here.

Citation

Please cite this model using the following format:

@online{MosaicML2023Introducing,
    author    = {MosaicML NLP Team},
    title     = {Introducing MPT-30B: Raising the bar for open-source foundation models},
    year      = {2023},
    url       = {www.mosaicml.com/blog/mpt-30b},
    note      = {Accessed: 2023-06-22},
    urldate   = {2023-06-22}
}

Author:

Mosaic ML, Inc.

Dataset size:

12.39 GB