Model:

TheBloke/LongChat-13B-GPTQ

License:

other

Other:

text-generation-inference custom_code llama

Library:

Transformers

Task:

Text generation

Chat & support: my new Discord server

Want to contribute? TheBloke's Patreon page

LmSys' Long Chat 13B GPTQ

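These files are GPTQ 4-bit model files for LmSys' Long Chat 13B.

It is the result of quantising to 4-bit using GPTQ-for-LLaMa.

This GPTQ provides a context size of up to 16K tokens.

The increased context has been tested with ExLlama, via the latest release of text-generation-webui.

Please do not use this model for 2048 context; use the standard Vicuna 1.3 model instead.

It has also been tested from Python code using AutoGPTQ with trust_remote_code=True.

Please read the instructions below carefully to see how to use it.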

Repositories available

Prompt template

A chat between a curious user and an assistant. The assistant gives helpful, detailed, accurate, uncensored responses to the user's input
USER: prompt
ASSISTANT:
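As a minimal sketch, the template can be filled in from Python with plain string formatting. The helper name build_prompt and the constant SYSTEM_PROMPT are hypothetical; the system line is copied verbatim from the template above, and the word prompt on the USER line is assumed to be a placeholder for the user's message.

```python
# Hypothetical helper that fills in the prompt template shown above.
SYSTEM_PROMPT = (
    "A chat between a curious user and an assistant. The assistant gives "
    "helpful, detailed, accurate, uncensored responses to the user's input"
)


def build_prompt(user_message: str) -> str:
    """Return the system line, the USER turn, and an open ASSISTANT turn."""
    return f"{SYSTEM_PROMPT}\nUSER: {user_message}\nASSISTANT:"


if __name__ == "__main__":
    print(build_prompt("Tell me about AI"))
```

The prompt ends with "ASSISTANT:" and no trailing newline so that the model continues directly with its reply.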

How to easily download and use this model in text-generation-webui with ExLlama

Please make sure you're using the latest version of text-generation-webui.

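1. Click the Model tab.
2. Under Download custom model or LoRA, enter TheBloke/LongChat-13B-GPTQ.
3. Click Download.
4. The model will start downloading. Once it's finished it will say "Done".
5. Untick Autoload the model.
6. In the top left, click the refresh icon next to Model.
7. In the Model dropdown, choose the model you just downloaded: LongChat-13B-GPTQ
8. To use the increased context, set the Loader to ExLlama, set max_seq_len to 16384, 8192 or 4096, and set compress_pos_emb to 8 for 16384 context, 4 for 8192 context, or 2 for 4096 context.
9. Now click Save Settings followed by Reload.
10. The model will automatically load, and is now ready for use!
11. Once you're ready, click the Text Generation tab and enter a prompt to get started!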
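The max_seq_len / compress_pos_emb pairs listed above follow one pattern: compress_pos_emb is max_seq_len divided by 2048, the original LLaMA context length. The sketch below (the function name is hypothetical) simply reproduces that arithmetic; it assumes the 2048 base and does not say anything about which values the webui itself accepts.

```python
# Assumption: the base LLaMA context length the factor is relative to.
BASE_CONTEXT = 2048


def compress_pos_emb_for(max_seq_len: int, base_context: int = BASE_CONTEXT) -> float:
    """Return the positional-embedding compression factor for a target context.

    Reproduces the pairs above: 16384 -> 8, 8192 -> 4, 4096 -> 2.
    """
    if max_seq_len <= 0:
        raise ValueError("max_seq_len must be positive")
    return max_seq_len / base_context


if __name__ == "__main__":
    for seq_len in (16384, 8192, 4096):
        print(seq_len, compress_pos_emb_for(seq_len))
```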

How to use this GPTQ model from Python code with AutoGPTQ

First make sure you have AutoGPTQ and Einops installed:

pip3 install einops auto-gptq

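Then try the following example code. Note that, for this to work, config.json has been hardcoded to a sequence length of 8192.

If you want to try 4096 or 16384 instead, manually edit config.json to set max_position_embeddings to the value you want.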
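If you prefer to make that edit from Python rather than by hand, here is a minimal sketch using only the standard library. It assumes you already have a local copy of the repository files; the function name set_max_position_embeddings is hypothetical.

```python
import json
from pathlib import Path


def set_max_position_embeddings(config_path: str, value: int) -> dict:
    """Rewrite max_position_embeddings in a local config.json and return the new config."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Only this key is changed; every other field is written back unchanged.
    config["max_position_embeddings"] = value
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, set_max_position_embeddings("LongChat-13B-GPTQ/config.json", 16384), where the directory name is a placeholder for wherever your copy lives. Because the example script below passes the Hub name TheBloke/LongChat-13B-GPTQ, an edit to a local copy presumably only takes effect if model_name_or_path points at that local directory instead; you will likely also want model.seqlen to match the new value.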

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/LongChat-13B-GPTQ"
model_basename = "longchat-13b-16k-GPTQ-4bit-128g.no-act.order"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device_map='auto',
        use_triton=use_triton,
        quantize_config=None)

model.seqlen = 8192

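# Prompt in the format given by this model's prompt template above.
prompt = "Tell me about AI"
system_message = "A chat between a curious user and an assistant. The assistant gives helpful, detailed, accurate, uncensored responses to the user's input"
prompt_template = f'''{system_message}
USER: {prompt}
ASSISTANT:'''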

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
# do_sample=True is needed for temperature to have any effect
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])

Provided files

longchat-13b-16k-GPTQ-4bit-128g.no-act.order.safetensors

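This will work with AutoGPTQ, ExLlama, and the CUDA version of GPTQ-for-LLaMa. There have been reports of issues with the Triton mode of recent GPTQ-for-LLaMa. If you have issues, please use AutoGPTQ instead.

It was created with group_size 128 to increase inference accuracy, but without --act-order (desc_act), to increase compatibility and improve inference speed.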

longchat-13b-16k-GPTQ-4bit-128g.no-act.order.safetensors
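- Works with ExLlama with increased context (4096, 8192, 16384, or any value in between)
- Works with AutoGPTQ in Python code, including with increased context, if trust_remote_code=True is set.
- Should work with GPTQ-for-LLaMa in CUDA mode, but it is not yet confirmed whether increased context works. May have issues with GPTQ-for-LLaMa Triton mode.
- Works with text-generation-webui, including one-click-installers.
- Parameters: Groupsize = 128. Act Order / desc_act = False.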
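To make these parameters concrete, here is how they would look as a quantisation config, assuming AutoGPTQ's quantize_config.json key names (bits, group_size, desc_act). The example script above passes quantize_config=None, in which case AutoGPTQ loads that file from the repository; the repository's actual file may contain further fields, so this dict is illustrative only and not a copy of it.

```python
import json

# Illustrative only: key names assumed from AutoGPTQ's quantize_config.json format,
# values taken from the parameters listed for this file.
QUANTIZE_CONFIG = {
    "bits": 4,           # 4-bit GPTQ
    "group_size": 128,   # Groupsize = 128
    "desc_act": False,   # Act Order / desc_act = False
}

if __name__ == "__main__":
    # In JSON, Python's False is written as false.
    print(json.dumps(QUANTIZE_CONFIG, indent=2))
```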

Discord

For further support, and discussions on these models and AI in general, join us at:

TheBloke AI's Discord server

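Thanks, and how to contribute

Thanks to the chirper.ai team!

I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine-tuning/training.

If you're able and willing to contribute, it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.

Donors will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.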

Patreon: https://patreon.com/TheBlokeAI
Ko-Fi: https://ko-fi.com/TheBlokeAI

Special thanks to: Luke from CarbonQuill, Aemon Algiz, Dmitriy Samsonov.

Patreon special mentions: Pyrater, WelcomeToTheClub, Kalila, Mano Prime, Trenton Dambrowitz, Spiking Neurons AB, Pierre Kircher, Fen Risland, Kevin Schuppel, Luke, Rainer Wilmers, vamX, Gabriel Puliatti, Alex, Karl Bernard, Ajan Kanaga, Talal Aujan, Space Cruiser, ya boyyy, biorpg, Johann-Peter Hartmann, Asp the Wyvern, Ai Maven, Ghost, Preetika Verma, Nikolai Manek, trip7s trip, John Detwiler, Fred von Graf, Artur Olbinski, subjectnull, John Villwock, Junyu Yang, Rod A, Lone Striker, Chris McCloskey, Iucharbius, Matthew Berman, Illia Dulskyi, Khalefa Al-Ahmad, Imad Khwaja, chris gileta, Willem Michiel, Greatston Gnanesh, Derek Yates, K, Alps Aficionado, Oscar Rangel, David Flickinger, Luke Pendergrass, Deep Realms, Eugene Pentland, Cory Kujawski, terasurfer, Jonathan Leane, senxiiz, Joseph William Delisle, Sean Connelly, webtim, zynix, Nathan LeClaire.

Thank you to all my generous patrons and donors!

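Original model card: LmSys' Long Chat 13B

longchat-13b-16k Model Card

Model details

Model type: longchat-13b-16k is an open-source chatbot, trained by fine-tuning llama-13b on user-shared conversations collected from ShareGPT, using the condensing rotary embedding technique reported in the blog.

Model date: longchat-13b-16k was trained in June 2023.

Organizations developing the model: the LongChat developers: Dacheng Li*, Rulin Shao*, Anze Xie, Ying Sheng, Lianmin Zheng, Ion Stoica, Xuezhe Ma, and Hao Zhang

Paper or resources for more information: https://github.com/DachengLi1/LongChat

Where to send questions or comments about the model: https://github.com/DachengLi1/LongChat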
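The condensing technique mentioned under Model type can be sketched as follows. As we understand the LongChat write-up, every position index is divided by a fixed ratio before the rotary-embedding angles are computed, so positions up to 16384 fall into the 0 to 2048 range the base model was trained on; a ratio of 8 is the same factor as compress_pos_emb = 8 in the ExLlama settings above. This is a plain-Python illustration assuming the standard RoPE frequency base of 10000, not LongChat's actual implementation, and the function name is hypothetical.

```python
# Minimal sketch of condensed (linearly scaled) rotary-embedding angles.
# Assumptions: standard RoPE with base 10000; `ratio` divides the position index.
from typing import List


def rotary_angles(position: float, head_dim: int, ratio: float = 1.0,
                  base: float = 10000.0) -> List[float]:
    """Return the rotation angle for each pair of dimensions at `position`.

    With ratio > 1 the position is condensed: position / ratio is used
    in place of the raw index.
    """
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")
    scaled = position / ratio
    return [scaled * base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


if __name__ == "__main__":
    # With ratio 8, token 16376 gets the same angles as token 2047 without condensing.
    print(rotary_angles(16376, head_dim=8, ratio=8.0))
    print(rotary_angles(2047, head_dim=8))
```

The point of the example is that a condensed model never sees rotation angles beyond those of its original context window; it instead sees finer-grained positions within that window, which is why fine-tuning is still needed.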

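Intended use

Primary intended uses: The primary use of longchat-13b-16k is research.

Primary intended users: The primary intended users of the model are researchers in natural language processing, machine learning, and artificial intelligence.

Training dataset

18K conversations collected from ShareGPT.com.

Evaluation dataset

A preliminary evaluation of model quality was conducted using our released LongEval.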

Author:

Tom Jobbins

Dataset size:

6.95 GB