Model:

k0t1k/mosaicml-mpt-7b-instruct-lora

Task:

Text Generation

Library:

PyTorch Transformers

Datasets:

mosaicml/dolly_hhrlhf

Other:

mpt custom_code Composer MosaicML llm-foundry

Preprints:

arxiv:2205.14135 arxiv:2108.12409 arxiv:2010.04245

License:

cc-by-sa-3.0

MPT-7B-Instruct

MPT-7B-Instruct is a model for short-form instruction following. It was built by finetuning MPT-7B on a dataset derived from the Databricks Dolly-15k and the Anthropic Helpful and Harmless (HH-RLHF) datasets.

License: CC-By-SA-3.0
Demo on Hugging Face Spaces

This model was trained by MosaicML and follows a modified decoder-only transformer architecture.

Model Date

May 5, 2023

Model License

CC-By-SA-3.0

Documentation

Blog post: Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs
Codebase (mosaicml/llm-foundry repo)
Questions? Feel free to contact us via the MosaicML Community Slack!

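Example Question/Instruction

Longboi24:

What is a quoll?

MPT-7B-Instruct:

A quoll (pronounced “cool”) is one of Australia's native carnivorous marsupial mammals, which are also known as macropods or wallabies in other areas throughout Asia and South America.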

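How to Use

Note: This model requires that trust_remote_code=True be passed to the from_pretrained method. This is because we use a custom model architecture that is not yet part of the Hugging Face transformers package.

The architecture includes options for many training efficiency features such as FlashAttention (Dao et al. 2022), ALiBi, QK LayerNorm, and more.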

import transformers
model = transformers.AutoModelForCausalLM.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  trust_remote_code=True
)

To use the optimized Triton implementation of FlashAttention, load the model with attn_impl='triton' and convert it to bfloat16:

import torch
import transformers

config = transformers.AutoConfig.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  trust_remote_code=True
)
# Switch attention to the Triton FlashAttention kernel
config.attn_config['attn_impl'] = 'triton'

model = transformers.AutoModelForCausalLM.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  config=config,
  torch_dtype=torch.bfloat16,  # load the weights in bfloat16
  trust_remote_code=True
)
model.to(device='cuda:0')
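Loading in bfloat16 matters mainly because of weight memory. This is a rough back-of-the-envelope sketch: bfloat16 stores each parameter in 2 bytes, while float32 uses 4. The helper name `weight_memory_gib` is hypothetical. It takes the 6.7B parameter count from the hyperparameter table and ignores activations, the KV cache and framework overhead, so actual GPU usage will be higher.

```python
# Rough weight-only memory estimate; ignores activations, KV cache and overhead.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_gib(n_params: float, dtype: str) -> float:
    """Return the memory needed to hold n_params weights of the given dtype, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in ("float32", "bfloat16"):
    # 6.7e9 is the n_parameters value from the hyperparameter table
    print(f"{dtype}: {weight_memory_gib(6.7e9, dtype):.1f} GiB")
```

This prints about 25.0 GiB for float32 and 12.5 GiB for bfloat16. The bfloat16 weights of a 7B-class model therefore fit on a single GPU that the float32 weights would not.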

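Although the model was trained with a sequence length of 2048, ALiBi enables users to increase the maximum sequence length during finetuning and/or inference. For example: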

import transformers

config = transformers.AutoConfig.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  trust_remote_code=True
)
config.update({"max_seq_len": 4096})  # (input + output) tokens can now be up to 4096
model = transformers.AutoModelForCausalLM.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  config=config,
  trust_remote_code=True
)
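The sequence length can be raised without retraining because ALiBi does not learn position embeddings. Instead, it adds a fixed penalty to each attention score that grows with query-key distance, so the bias is defined for any length. Below is a minimal pure-Python sketch of that bias as described in the ALiBi paper (arxiv:2108.12409). It assumes a power-of-two head count and is not MPT's own implementation. The function names `alibi_slopes` and `alibi_bias` are hypothetical.

```python
import math


def alibi_slopes(n_heads: int) -> list[float]:
    """Per-head slopes m_h = 2 ** (-8 * h / n_heads), h = 1..n_heads (power-of-two heads only)."""
    if n_heads <= 0 or n_heads & (n_heads - 1) != 0:
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2 ** (-8 * h / n_heads) for h in range(1, n_heads + 1)]


def alibi_bias(slope: float, seq_len: int) -> list[list[float]]:
    """Causal bias for one head: [i][j] = -slope * (i - j) for j <= i, -inf for future keys j > i."""
    return [
        [-slope * (i - j) if j <= i else -math.inf for j in range(seq_len)]
        for i in range(seq_len)
    ]


# MPT-7B has 32 heads, so the slopes run from 2 ** -0.25 down to 2 ** -8.
slopes = alibi_slopes(32)
# The same formula produces a bias for any length; nothing here was learned for 2048.
bias = alibi_bias(slopes[0], 8)
```

Because the bias is a formula rather than a trained table, raising max_seq_len to 4096 only yields a larger bias matrix. How well quality holds at the longer length is a separate, empirical question.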

This model was trained with the EleutherAI/gpt-neox-20b tokenizer.

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

Formatting

This model was trained on data formatted in the dolly-15k format:

INSTRUCTION_KEY = "### Instruction:"
RESPONSE_KEY = "### Response:"
INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
PROMPT_FOR_GENERATION_FORMAT = """{intro}
{instruction_key}
{instruction}
{response_key}
""".format(
    intro=INTRO_BLURB,
    instruction_key=INSTRUCTION_KEY,
    instruction="{instruction}",
    response_key=RESPONSE_KEY,
)

example = "James decides to run 3 sprints 3 times a week. He runs 60 meters each sprint. How many total meters does he run a week? Explain before answering."
fmt_ex = PROMPT_FOR_GENERATION_FORMAT.format(instruction=example)

In the above example, fmt_ex is ready to be tokenized and sent through the model.
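With a decoder-only model, the decoded output of generation normally repeats the prompt before the model's answer. Below is a minimal sketch for building the prompt and recovering the answer. It assumes the answer is whatever follows the first `### Response:` marker, cut off at any further `### Instruction:` the model may emit. `build_prompt` and `extract_response` are hypothetical helper names, not part of the model's code.

```python
# Constants copied from the formatting snippet above so this block runs on its own.
INSTRUCTION_KEY = "### Instruction:"
RESPONSE_KEY = "### Response:"
INTRO_BLURB = ("Below is an instruction that describes a task. "
               "Write a response that appropriately completes the request.")


def build_prompt(instruction: str) -> str:
    """Produce the same string as PROMPT_FOR_GENERATION_FORMAT.format(instruction=...)."""
    return f"{INTRO_BLURB}\n{INSTRUCTION_KEY}\n{instruction}\n{RESPONSE_KEY}\n"


def extract_response(generated_text: str) -> str:
    """Return the text after the first response marker, truncated at a new instruction marker."""
    _, sep, response = generated_text.partition(RESPONSE_KEY)
    if not sep:
        # No marker found: fall back to the whole text.
        return generated_text.strip()
    # Heuristic: stop if the model starts writing a new instruction block.
    response, _, _ = response.partition(INSTRUCTION_KEY)
    return response.strip()


prompt = build_prompt("What is a quoll?")
```

In practice, `extract_response` would be applied to `tokenizer.decode(...)` of the generated token ids.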

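Model Description

The architecture is a modification of a standard decoder-only transformer.

The model has been modified from a standard transformer in the following ways:

It uses FlashAttention
It uses ALiBi (Attention with Linear Biases) and does not use positional embeddings
It does not use biases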
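FlashAttention computes exactly the same result as standard softmax attention. Its gain comes from processing keys and values in tiles with an online softmax, so the full score matrix is never materialized. The NumPy sketch below illustrates only that tiling idea on the CPU (single head, non-causal, no GPU memory management) and checks it against a naive implementation. It is not the FlashAttention kernel MPT uses, and the function names are hypothetical.

```python
import numpy as np


def naive_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Reference softmax(q k^T / sqrt(d)) v that builds the full (n_q x n_k) score matrix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def tiled_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, block_size: int = 4) -> np.ndarray:
    """Same result, but keys/values are consumed in blocks using a running (online) softmax."""
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n, v.shape[-1]))
    row_max = np.full((n, 1), -np.inf)  # running max of scores seen so far, per query
    row_sum = np.zeros((n, 1))          # running softmax denominator, per query
    for start in range(0, k.shape[0], block_size):
        k_blk = k[start:start + block_size]
        v_blk = v[start:start + block_size]
        s = q @ k_blk.T * scale                      # only an (n x block) score tile
        new_max = np.maximum(row_max, s.max(axis=-1, keepdims=True))
        correction = np.exp(row_max - new_max)       # rescale earlier partial sums
        p = np.exp(s - new_max)
        row_sum = row_sum * correction + p.sum(axis=-1, keepdims=True)
        out = out * correction + p @ v_blk
        row_max = new_max
    return out / row_sum


rng = np.random.default_rng(0)
q, k, v = rng.standard_normal((6, 8)), rng.standard_normal((10, 8)), rng.standard_normal((10, 5))
print(np.allclose(naive_attention(q, k, v), tiled_attention(q, k, v, block_size=3)))
```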

Hyperparameter	Value
n_parameters	6.7B
n_layers	32
n_heads	32
d_model	4096
vocab size	50432
sequence length	2048
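The 6.7B figure can be roughly reproduced from the other rows of the table. The sketch below makes several assumptions that are not stated in this card: a 4x MLP expansion, a fused QKV projection plus an output projection in each layer, and input and output embeddings that share one matrix. It also leaves out bias terms, in line with "It does not use biases" above, and ignores LayerNorm weights. Treat the result as an estimate, not the exact count. The helper name `estimate_params` is hypothetical.

```python
def estimate_params(n_layers: int, d_model: int, vocab_size: int, mlp_ratio: int = 4) -> int:
    """Approximate decoder-only transformer parameter count under the assumptions above."""
    attn = 4 * d_model * d_model              # fused Q, K, V (3 d^2) + output projection (d^2)
    mlp = 2 * mlp_ratio * d_model * d_model   # up-projection + down-projection
    embeddings = vocab_size * d_model         # one embedding matrix, assumed tied with the output head
    return n_layers * (attn + mlp) + embeddings


total = estimate_params(n_layers=32, d_model=4096, vocab_size=50432)
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

This prints 6,649,020,416 parameters (~6.65B), which is close to the 6.7B listed in the table. Nearly all of it comes from the 12 · n_layers · d_model² term; the embedding matrix adds only about 0.2B.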

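Pretraining Data

For more details on the pretraining process, see MPT-7B.

The data was tokenized using the EleutherAI/gpt-neox-20b tokenizer.

Limitations and Biases

The following text is adapted from EleutherAI's GPT-NeoX-20B.

MPT-7B-Instruct can produce factually incorrect output and should not be relied on to produce factually accurate information. MPT-7B-Instruct was trained on various public datasets. While great efforts have been taken to clean the pretraining data, this model could still generate lewd, biased, or otherwise offensive outputs.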

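Acknowledgements

This model was finetuned by Sam Havens and the MosaicML NLP team.

MosaicML Platform

If you're interested in training and deploying your own MPT or LLMs on the MosaicML Platform, sign up here.

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.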

Citation

Please cite this model using the following format:

@online{MosaicML2023Introducing,
    author    = {MosaicML NLP Team},
    title     = {Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs},
    year      = {2023},
    url       = {www.mosaicml.com/blog/mpt-7b},
    note      = {Accessed: 2023-03-28}, % change this date
    urldate   = {2023-03-28} % change this date
}

Author:

Konstantin Kotik

Dataset size:

12.39 GB