
StableLM-Tuned-Alpha

Model Description

StableLM-Tuned-Alpha is a suite of 3B and 7B parameter decoder-only language models built on top of the StableLM-Base-Alpha models and further fine-tuned on various chat and instruction-following datasets.

Usage

You can get started chatting with StableLM-Tuned-Alpha using the following code snippet:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList

tokenizer = AutoTokenizer.from_pretrained("StabilityAI/stablelm-tuned-alpha-7b")
model = AutoModelForCausalLM.from_pretrained("StabilityAI/stablelm-tuned-alpha-7b")
model.half().cuda()  # run in half precision (FP16) on the GPU

class StopOnTokens(StoppingCriteria):
    """Stop generation as soon as one of the chat special tokens or end-of-text is produced."""
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Token ids of the tuned model's chat special tokens plus end-of-text / padding
        stop_ids = [50278, 50279, 50277, 1, 0]
        for stop_id in stop_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False

system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

prompt = f"{system_prompt}<|USER|>What's your mood today?<|ASSISTANT|>"

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
tokens = model.generate(
  **inputs,
  max_new_tokens=64,
  temperature=0.7,
  do_sample=True,
  stopping_criteria=StoppingCriteriaList([StopOnTokens()])
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))

StableLM Tuned should be used with prompts that follow the <|SYSTEM|>...<|USER|>...<|ASSISTANT|> format shown in the snippet above. The system prompt is:

<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
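
For multi-turn conversations, further <|USER|> / <|ASSISTANT|> turns are appended to the same template. The helper below is an illustrative sketch (the function name and the example history are not part of the original card); it reuses the system_prompt string from the snippet above and ends with an open <|ASSISTANT|> tag so the model continues as the assistant.

def build_prompt(system_prompt, history, next_user_message):
    # history: list of (user_message, assistant_reply) pairs from earlier turns
    prompt = system_prompt
    for user_msg, assistant_reply in history:
        prompt += f"<|USER|>{user_msg}<|ASSISTANT|>{assistant_reply}"
    # Leave the final <|ASSISTANT|> tag open so generation continues as the assistant
    prompt += f"<|USER|>{next_user_message}<|ASSISTANT|>"
    return prompt

history = [("What's your mood today?", "I'm feeling great, thank you for asking!")]
prompt = build_prompt(system_prompt, history, "Could you write a short poem about the sea?")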

Model Details

  • Developed by: Stability AI
  • Model type: StableLM-Tuned-Alpha models are auto-regressive language models based on the NeoX transformer architecture.
  • Language(s): English
  • Library: HuggingFace Transformers
  • License: Fine-tuned checkpoints (StableLM-Tuned-Alpha) are licensed under the non-commercial Creative Commons license (CC BY-NC-SA-4.0), in line with the original non-commercial license specified by Stanford Alpaca.
  • Contact: For questions and comments about the model, please email lm@stability.ai

Training

Parameters | Hidden Size | Layers | Heads | Sequence Length
3B         | 4096        | 16     | 32    | 4096
7B         | 6144        | 16     | 48    | 4096
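
These architecture figures can also be read straight from the checkpoint configuration. A minimal sketch using the Transformers AutoConfig API (the attribute names follow the GPT-NeoX configuration class and are an assumption on our part, not taken from the card):

from transformers import AutoConfig

# Load only the configuration (no weights) and print the architecture numbers
config = AutoConfig.from_pretrained("StabilityAI/stablelm-tuned-alpha-7b")
print(config.hidden_size)              # hidden size (6144 for the 7B model)
print(config.num_hidden_layers)        # layers (16)
print(config.num_attention_heads)      # heads (48 for the 7B model)
print(config.max_position_embeddings)  # sequence length (4096)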

Training Datasets

StableLM-Tuned-Alpha models are fine-tuned on a combination of five datasets: Alpaca, a dataset of 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine; GPT4All Prompt Generations, consisting of 400k prompts and responses generated by GPT-4; Anthropic HH, made up of preferences about AI assistant helpfulness and harmlessness; DataBricks Dolly, comprising 15k instruction/response pairs generated by Databricks employees in capability domains from the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization; and ShareGPT Vicuna (English subset), a dataset of conversations retrieved from ShareGPT.

Training Procedure

The models are learned via supervised fine-tuning on the aforementioned datasets, trained in mixed precision (FP16) and optimized with AdamW. We outline the following hyperparameters:

Parameters | Batch Size | Learning Rate | Warm-up | Weight Decay | Betas
3B         | 256        | 2e-5          | 50      | 0.01         | (0.9, 0.99)
7B         | 128        | 2e-5          | 100     | 0.01         | (0.9, 0.99)
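
As a rough illustration of these settings, an AdamW optimizer with the 7B values could be configured as below. The total step count, the linear decay schedule, and the model variable (taken from the usage snippet above) are assumptions made for the sake of a runnable sketch; the card itself only specifies the batch size, learning rate, warm-up, weight decay, and betas.

import torch
from transformers import get_linear_schedule_with_warmup

num_training_steps = 10_000  # placeholder; the actual number of steps is not stated in the card

optimizer = torch.optim.AdamW(
    model.parameters(),      # model as loaded in the usage snippet above
    lr=2e-5,                 # learning rate from the table
    betas=(0.9, 0.99),       # betas from the table
    weight_decay=0.01,       # weight decay from the table
)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,    # 7B warm-up from the table
    num_training_steps=num_training_steps,  # shape of the decay schedule is an assumption
)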

Use and Limitations

Intended Use

These models are intended to be used by the open-source community in chat-like applications, in compliance with the CC BY-NC-SA-4.0 license.

Limitations and Biases

Although the aforementioned datasets help to steer the base language models toward "safer" distributions of text, not all biases and toxicity can be mitigated through fine-tuning. We ask users to be mindful of such potential issues that can arise in generated responses. Do not treat model outputs as substitutes for human judgment or as sources of truth. Please use responsibly.

Acknowledgements

We thank Dakota Mahan (@dmayhem93) for his help.

Citations

@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto},
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}
@misc{vicuna2023,
    title = {Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90\%* ChatGPT Quality},
    url = {https://vicuna.lmsys.org},
    author = {Chiang, Wei-Lin and Li, Zhuohan and Lin, Zi and Sheng, Ying and Wu, Zhanghao and Zhang, Hao and Zheng, Lianmin and Zhuang, Siyuan and Zhuang, Yonghao and Gonzalez, Joseph E. and Stoica, Ion and Xing, Eric P.},
    month = {March},
    year = {2023}
}
@misc{gpt4all,
  author = {Yuvanesh Anand and Zach Nussbaum and Brandon Duderstadt and Benjamin Schmidt and Andriy Mulyar},
  title = {GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/nomic-ai/gpt4all}},
}