# Fast inference with CTranslate2

Speed up inference while reducing memory usage by 2x-4x, using int8 inference in C++ on CPU or GPU.

Quantized version of HuggingFaceH4/starchat-alpha

pip install "hf-hub-ctranslate2>=2.0.8" "ctranslate2>=3.14.0"
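The 2x-4x memory reduction quoted at the top of this card follows from the bytes stored per weight: one for int8, versus two for float16 and four for float32. Below is a rough, weights-only estimate. It assumes the ~16B parameters given in the model description and ignores activations, the KV cache and runtime overhead, so real usage will be higher. `weight_gb` is a hypothetical helper:

```python
# Rough, weights-only memory estimate (hypothetical helper).
# Assumes ~16e9 parameters, the "16B" figure from the model description,
# and ignores activations, the KV cache and runtime overhead.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_gb(n_params: float, dtype: str) -> float:
    """Memory needed to hold the weights alone, in gigabytes (1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    n = 16e9
    int8 = weight_gb(n, "int8")
    for dtype in ("float32", "float16"):
        gb = weight_gb(n, dtype)
        # Ratio of the unquantized footprint to the int8 footprint.
        print(f"{dtype}: {gb:.0f} GB -> int8: {int8:.0f} GB ({gb / int8:.0f}x smaller)")
```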

Converted on 2023-06-02 with:

ct2-transformers-converter --model HuggingFaceH4/starchat-alpha --output_dir /home/michael/tmp-ct2fast-starchat-alpha --force --copy_files merges.txt all_results.json training_args.bin tokenizer.json README.md dialogue_template.json tokenizer_config.json eval_results.json vocab.json TRAINER_README.md train_results.json generation_config.json trainer_state.json special_tokens_map.json added_tokens.json requirements.txt .gitattributes --quantization int8_float16 --trust_remote_code

Checkpoint compatible with ctranslate2>=3.14.0 and hf-hub-ctranslate2>=2.0.8

compute_type=int8_float16 for device="cuda"
compute_type=int8 for device="cpu"
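The two lines above pair each device with a quantized compute type. Here is a minimal sketch of choosing that pairing at runtime. It assumes a CUDA device should be used whenever ctranslate2 reports one. `pick_compute_type` and `detect_device` are hypothetical helper names, while `ctranslate2.get_cuda_device_count()` is part of the ctranslate2 Python package:

```python
# Hypothetical helpers: map a device to the compute_type recommended in this
# card (int8_float16 on CUDA, int8 on CPU) and detect which device to use.
RECOMMENDED_COMPUTE_TYPE = {"cuda": "int8_float16", "cpu": "int8"}


def pick_compute_type(device: str) -> str:
    """Return the compute_type this card recommends for `device`."""
    try:
        return RECOMMENDED_COMPUTE_TYPE[device]
    except KeyError:
        raise ValueError(f"unsupported device: {device!r}") from None


def detect_device() -> str:
    """Use CUDA if ctranslate2 reports a GPU, otherwise fall back to CPU."""
    try:
        import ctranslate2
    except ImportError:
        # Without ctranslate2 installed we cannot query GPUs; assume CPU.
        return "cpu"
    return "cuda" if ctranslate2.get_cuda_device_count() > 0 else "cpu"


if __name__ == "__main__":
    device = detect_device()
    print(device, pick_compute_type(device))
```

The result could then be passed to the loader below as `device=device, compute_type=pick_compute_type(device)`.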

from hf_hub_ctranslate2 import TranslatorCT2fromHfHub, GeneratorCT2fromHfHub
from transformers import AutoTokenizer

model_name = "michaelfeil/ct2fast-starchat-alpha"
# use either TranslatorCT2fromHfHub or GeneratorCT2fromHfHub here, depending on model.
model = GeneratorCT2fromHfHub(
    # load in int8 on CUDA
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
    # tokenizer=AutoTokenizer.from_pretrained("HuggingFaceH4/starchat-alpha")
)
outputs = model.generate(
    text=["def fibonacci(", "User: How are you doing? Bot:"],
    max_length=64,
    include_prompt_in_result=False
)
print(outputs)

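License and other remarks:

This is just a quantized version. License conditions are intended to be identical to the original huggingface repository.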

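Original description

Model Card for StarChat Alpha

StarChat is a series of language models fine-tuned from StarCoder to act as helpful coding assistants. StarChat Alpha is the first of these models and, as an alpha release, is intended only for educational or research purposes. In particular, the model has not been aligned to human preferences with techniques like RLHF, so it may generate problematic content (especially when prompted to do so).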

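Model Details

Model Description

Model type: A 16B-parameter GPT-like model fine-tuned on a blend of the oasst1 and databricks-dolly-15k datasets.
Language(s) (NLP): English
License: BigCode Open RAIL-M v1
Fine-tuned from model: bigcode/starcoderbase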

Model Sources

Repository: https://github.com/bigcode-project/starcoder
Demo: https://huggingface.co/spaces/HuggingFaceH4/starchat-playground

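Uses

StarChat Alpha is intended for educational and/or research purposes, and in that respect can be used to probe the programming capabilities of open-source language models.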

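Bias, Risks, and Limitations

StarChat Alpha has not been aligned to human preferences with techniques like RLHF, nor deployed with in-the-loop filtering of responses as ChatGPT is, so the model can produce problematic outputs (especially when prompted to do so). Models trained primarily on code data will also have a more skewed demographic bias, commensurate with the demographics of the GitHub community; for more on this, see the StarCoder dataset, which is derived from The Stack.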

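Since the base model was pretrained on a large corpus of code, it may produce code snippets that are syntactically valid but semantically incorrect. For example, it may produce code that does not compile or that produces incorrect results. It may also produce code that is vulnerable to security exploits. We have observed that the model also tends to produce false URLs, which should be carefully inspected before clicking.

StarChat Alpha was fine-tuned from the base model StarCoder Base; please refer to the Limitations section of its model card for relevant information. In particular, the model was evaluated on some categories of gender bias, propensity for toxicity, and risk of suggesting code completions with known security flaws; these evaluations are reported in its technical report.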

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceH4/starchat-alpha")
# Inputs use StarChat's chat tokens
inputs = "<|system|>\n<|end|>\n<|user|>How can I sort a list in Python?<|end|>\n<|assistant|>"
outputs = pipe(inputs)
print(outputs[0]["generated_text"])
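The `inputs` string above wraps a single user turn in StarChat's chat tokens and ends with `<|assistant|>`, so the model continues as the assistant. Below is a minimal sketch of building that string and trimming the reply. It assumes the exact layout of the example: an empty system turn and no newline after `<|user|>`. `build_prompt` and `extract_reply` are hypothetical helpers, not part of transformers. Whether `<|end|>` survives in the decoded text depends on the tokenizer's decoding settings, so the cut is simply a no-op when the token is absent:

```python
# Hypothetical helpers reproducing the chat-token layout of the example prompt.
SYSTEM, USER, ASSISTANT, END = "<|system|>", "<|user|>", "<|assistant|>", "<|end|>"


def build_prompt(user_message: str, system_message: str = "") -> str:
    """Wrap one user turn in StarChat chat tokens, ending with the assistant tag."""
    return (
        f"{SYSTEM}\n{system_message}{END}\n"
        f"{USER}{user_message}{END}\n"
        f"{ASSISTANT}"
    )


def extract_reply(generated_text: str, prompt: str) -> str:
    """Drop the echoed prompt (if present) and cut the reply at the first <|end|>."""
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    return generated_text.split(END, 1)[0].strip()
```

With the pipeline above, one might call `prompt = build_prompt("How can I sort a list in Python?")` and then `extract_reply(pipe(prompt)[0]["generated_text"], prompt)`.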

Citation

BibTeX:

@article{Tunstall2023starchat-alpha,
  author = {Tunstall, Lewis and Lambert, Nathan and Rajani, Nazneen and Beeching, Edward and Le Scao, Teven and von Werra, Leandro and Han, Sheon and Schmid, Philipp and Rush, Alexander},
  title = {Creating a Coding Assistant with StarCoder},
  journal = {Hugging Face Blog},
  year = {2023},
  note = {https://huggingface.co/blog/starchat},
}

Author:

Michael

Dataset size:

14.51 GB