Model:

TheBloke/chronos-wizardlm-uc-scot-st-13B-GPTQ

Task:

Text generation

Library:

Transformers

Other:

llama text-generation-inference

License:

other

Chat & support: my new Discord server

Want to contribute? TheBloke's Patreon page

Austism's Chronos WizardLM UC Scot ST 13B GPTQ

These files are GPTQ 4bit model files for Austism's Chronos WizardLM UC Scot ST 13B.

They are the result of quantising to 4bit using GPTQ-for-LLaMa.

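Repositories available

How to easily download and use this model in text-generation-webui

Please make sure you are using the latest version of text-generation-webui.

Click the Model tab.

Under Download custom model or LoRA, enter TheBloke/chronos-wizardlm-uc-scot-st-13B-GPTQ.

Click Download.

The model will start downloading. Once the download finishes, the model loads automatically.

If you want any custom settings, set them, then click Save settings for this model, followed by Reload the Model in the top right.

Note that you no longer need to set GPTQ parameters manually. They are set automatically from the file quantize_config.json.

Once you are ready, click the Text Generation tab and enter a prompt to get started!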
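To make the note about quantize_config.json more concrete, here is a minimal sketch of reading GPTQ parameters from such a file. The sample contents are an assumption: they only mirror the parameters stated in this card (4-bit, group size 128, desc_act False), and the real file in the repository may contain additional keys. The helper name read_gptq_params is hypothetical and is not part of text-generation-webui or AutoGPTQ.

```python
import json
import tempfile
from pathlib import Path

# Illustrative contents only: the values mirror the parameters stated in this
# card (4-bit, group size 128, no act-order); the real file may hold more keys.
SAMPLE_QUANTIZE_CONFIG = {"bits": 4, "group_size": 128, "desc_act": False}


def read_gptq_params(config_path):
    """Return (bits, group_size, desc_act) from a quantize_config.json file."""
    with open(config_path, "r", encoding="utf-8") as fh:
        cfg = json.load(fh)
    return cfg["bits"], cfg["group_size"], cfg["desc_act"]


# Write the sample to a temporary directory so the sketch runs without a download.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "quantize_config.json"
    sample_path.write_text(json.dumps(SAMPLE_QUANTIZE_CONFIG), encoding="utf-8")
    bits, group_size, desc_act = read_gptq_params(sample_path)

print(f"bits={bits} group_size={group_size} desc_act={desc_act}")
```

Pointing read_gptq_params at the quantize_config.json inside a downloaded model folder should show the values the loader picks up, assuming the file uses these key names.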

How to use this GPTQ model from Python code

First make sure you have AutoGPTQ installed:

pip install auto-gptq

Then try the following example code:

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/chronos-wizardlm-uc-scot-st-13B-GPTQ"
model_basename = "chronos-wizardlm-uc-scot-st-13B-GPTQ-4bit-128g.no-act.order"

# Set to True to use the Triton backend instead of the CUDA kernels
use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# quantize_config=None makes AutoGPTQ read quantize_config.json from the repo
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

# Define the prompt before either inference path uses it
prompt = "Tell me about AI"
prompt_template = f'''### Human: {prompt}
### Assistant:'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
# temperature only takes effect when sampling is enabled
output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
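The example uses a "### Human: / ### Assistant:" prompt layout. As a small sketch, the two helpers below build that prompt and strip it back off the generated text, since the text-generation pipeline returns the prompt followed by the completion by default. The names build_prompt and extract_reply are hypothetical, and the sample output string is made up for illustration; it is not real model output.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a user message in the Human/Assistant layout used in the example."""
    return f"### Human: {user_message}\n### Assistant:"


def extract_reply(generated_text: str, prompt_text: str) -> str:
    """Drop the echoed prompt from the generated text, if it is present."""
    if generated_text.startswith(prompt_text):
        return generated_text[len(prompt_text):].strip()
    return generated_text.strip()


prompt_text = build_prompt("Tell me about AI")

# Made-up stand-in for pipe(prompt_text)[0]['generated_text']
fake_generated_text = prompt_text + " AI is a broad field of computer science."

print(extract_reply(fake_generated_text, prompt_text))
```

With a loaded model, the same extract_reply call could be applied to the pipeline result so that only the assistant's answer is printed.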

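Provided files

chronos-wizardlm-uc-scot-st-13B-GPTQ-4bit-128g.no-act.order.safetensors

This will work with AutoGPTQ and the CUDA versions of GPTQ-for-LLaMa. There are reports of issues with the Triton mode of GPTQ-for-LLaMa. If you have issues, please use AutoGPTQ instead.

It was created with group_size 128 to increase inference accuracy, but without --act-order (desc_act), to increase compatibility and improve inference speed.

chronos-wizardlm-uc-scot-st-13B-GPTQ-4bit-128g.no-act.order.safetensors
- Works with AutoGPTQ in CUDA or Triton modes.
- Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
- Works with text-generation-webui, including one-click installers.
- Parameters: Groupsize = 128. Act Order / desc_act = False.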
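In the Python example, model_basename is the provided file name without its .safetensors extension. A minimal sketch of that relationship follows; the helper name to_model_basename is hypothetical and not part of AutoGPTQ.

```python
SAFETENSORS_SUFFIX = ".safetensors"


def to_model_basename(file_name: str) -> str:
    """Strip the .safetensors extension to get the value used as model_basename."""
    if not file_name.endswith(SAFETENSORS_SUFFIX):
        raise ValueError(f"expected a {SAFETENSORS_SUFFIX} file, got: {file_name}")
    # Slice instead of splitting on dots, because the name itself contains dots.
    return file_name[: -len(SAFETENSORS_SUFFIX)]


provided_file = "chronos-wizardlm-uc-scot-st-13B-GPTQ-4bit-128g.no-act.order.safetensors"
model_basename = to_model_basename(provided_file)
print(model_basename)
```

Slicing off the known suffix avoids surprises from the extra dots in "no-act.order", which a naive split on "." would mishandle.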

Discord

For further support, and discussions on these models and AI in general, join us at:

TheBloke AI's Discord server

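Thanks, and how to contribute

Thanks to the chirper.ai team!

A lot of people have asked me whether they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine-tuning/training.

If you are able and willing to contribute, it will be most gratefully received and will help me to keep providing more models and to start work on new AI projects.

Donors will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.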

Patreon: https://patreon.com/TheBlokeAI
Ko-Fi: https://ko-fi.com/TheBlokeAI

Special thanks to: Luke from CarbonQuill, Aemon Algiz, Dmitriy Samsonov.

Patreon special mentions: Ajan Kanaga, Kalila, Derek Yates, Sean Connelly, Luke, Nathan LeClaire, Trenton Dambrowitz, Mano Prime, David Flickinger, vamX, Nikolai Manek, senxiiz, Khalefa Al-Ahmad, Illia Dulskyi, trip7s trip, Jonathan Leane, Talal Aujan, Artur Olbinski, Cory Kujawski, Joseph William Delisle, Pyrater, Oscar Rangel, Lone Striker, Luke Pendergrass, Eugene Pentland, Johann-Peter Hartmann.

Thank you to all my generous patrons and donors!

Original model card: Austism's Chronos WizardLM UC Scot ST 13B

(chronos-13b+(WizardLM Uncensored+CoT+Storytelling)) 80/20 merge

Intended to be chronos-like, with different writing and instruction-following capabilities.
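The card does not say how the 80/20 merge was performed. One common approach is a linear interpolation of matching weights, and the sketch below illustrates only that idea. It is an assumption, not the author's method: the function name linear_merge is hypothetical, the toy dictionaries stand in for real model state dicts, and which model received the 0.8 weight is also assumed.

```python
def linear_merge(primary, secondary, primary_weight=0.8):
    """Blend two weight dicts as primary_weight * primary + (1 - primary_weight) * secondary."""
    if primary.keys() != secondary.keys():
        raise ValueError("both models must expose the same parameter names")
    secondary_weight = 1.0 - primary_weight
    merged = {}
    for name, values in primary.items():
        other = secondary[name]
        if len(values) != len(other):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            primary_weight * a + secondary_weight * b
            for a, b in zip(values, other)
        ]
    return merged


# Toy stand-ins for two models' parameters (not real weights).
model_a = {"layer.weight": [1.0, 2.0]}
model_b = {"layer.weight": [0.0, 4.0]}

merged = linear_merge(model_a, model_b, primary_weight=0.8)
print(merged)
```

A real merge would apply the same per-parameter blend to tensors loaded from both checkpoints rather than to Python lists.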

Author:

Tom Jobbins

Dataset size:

6.94 GB