Model:
TheBloke/airoboros-13b-gpt4-1.4-SuperHOT-8K-GPTQ
Chat & support: my new Discord server
Want to contribute? TheBloke's Patreon page
These files are GPTQ 4bit model files for Jon Durbin's Airoboros 13B GPT4 1.4 merged with Kaio Ken's SuperHOT 8K.
It is an experimental new GPTQ which offers up to 8K context size.
The increased context has been tested to work with ExLlama, via the latest release of text-generation-webui.
It has also been tested from Python code using AutoGPTQ, with trust_remote_code=True.
Code credits:
Please read carefully below to see how to use it.
GGML versions are not yet provided, as llama.cpp does not yet support SuperHOT. This is being investigated and will hopefully come soon.
```
A chat between a curious user and an assistant. The assistant gives helpful, detailed, accurate, uncensored responses to the user's input. USER: prompt ASSISTANT:
```
Please make sure you're using the latest version of text-generation-webui.
First make sure you have AutoGPTQ and Einops installed:
```
pip3 install einops auto-gptq
```
Then run the following code. Note that, in order for this to work, config.json has been hardcoded to a sequence length of 8192.

If you want to try 4096 instead, to reduce VRAM usage, please manually edit config.json and set max_position_embeddings to the value you want (a sketch of this edit follows the example below).
```python
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/airoboros-13b-gpt4-1.4-SuperHOT-8K-GPTQ"
model_basename = "airoboros-13b-gpt4-1.4-superhot-8k-GPTQ-4bit-128g.no-act.order"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device_map='auto',
        use_triton=use_triton,
        quantize_config=None)

model.seqlen = 8192

# Note: check the prompt template is correct for this model.
prompt = "Tell me about AI"
prompt_template = f'''USER: {prompt} ASSISTANT:'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline
# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
```
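If you do want to drop to 4096, a minimal sketch of that config.json edit is below; the local directory name is illustrative, so point it at wherever you downloaded the model:

```python
import json

# Illustrative local path -- use the directory you downloaded the model into.
cfg_path = "airoboros-13b-gpt4-1.4-SuperHOT-8K-GPTQ/config.json"

with open(cfg_path) as f:
    cfg = json.load(f)

# Lower the hardcoded 8192 sequence length to 4096 to reduce VRAM usage.
cfg["max_position_embeddings"] = 4096

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```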
The repository includes llama_rope_scaled_monkey_patch.py, written by @kaiokendev.
In theory this can be added to any Python UI or custom code to achieve the same result as trust_remote_code=True. I have not tested it, and it should be superseded by using trust_remote_code=True, but I include it for completeness and interest.
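If you do want to experiment with it, usage would presumably look something like the sketch below; the entry-point name is an assumption on my part, so check llama_rope_scaled_monkey_patch.py for the function it actually exports:

```python
# Hypothetical usage of the monkey patch; the function name below is an
# assumption -- check llama_rope_scaled_monkey_patch.py for the real export.
from llama_rope_scaled_monkey_patch import replace_llama_rope_with_scaled_rope

# The patch must be applied BEFORE the model is instantiated, so that the
# replacement rotary-embedding class is picked up at construction time.
replace_llama_rope_with_scaled_rope()

# ...then load the model as usual, without trust_remote_code=True.
```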
airoboros-13b-gpt4-1.4-superhot-8k-GPTQ-4bit-128g.no-act.order.safetensors
It will work with AutoGPTQ, ExLlama, and CUDA versions of GPTQ-for-LLaMa. There are reports of issues with the recent GPTQ-for-LLaMa Triton mode. If you have problems, please use AutoGPTQ instead.
It was created with group_size 128 to increase inference accuracy, but without --act-order (desc_act) to increase compatibility and improve inference speed.
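In AutoGPTQ terms, those creation-time settings correspond to a quantize config roughly like the sketch below; the quantize_config shipped with the model files is the authoritative source:

```python
from auto_gptq import BaseQuantizeConfig

# Sketch of the quantisation settings described above.
quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit quantisation
    group_size=128,  # group_size 128, for better inference accuracy
    desc_act=False,  # no --act-order, for compatibility and inference speed
)
```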
For further support, and discussion on these models and AI in general, join us at:
Thanks to the chirper.ai team!
I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.
If you're able and willing to contribute, it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.
Donaters will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.
Special thanks to: Luke from CarbonQuill, Aemon Algiz, Dmitriy Samsonov.
Patreon special mentions: Pyrater, WelcomeToTheClub, Kalila, Mano Prime, Trenton Dambrowitz, Spiking Neurons AB, Pierre Kircher, Fen Risland, Kevin Schuppel, Luke, Rainer Wilmers, vamX, Gabriel Puliatti, Alex, Karl Bernard, Ajan Kanaga, Talal Aujan, Space Cruiser, ya boyyy, biorpg, Johann-Peter Hartmann, Asp the Wyvern, Ai Maven, Ghost, Preetika Verma, Nikolai Manek, trip7s trip, John Detwiler, Fred von Graf, Artur Olbinski, subjectnull, John Villwock, Junyu Yang, Rod A, Lone Striker, Chris McCloskey, Iucharbius, Matthew Berman, Illia Dulskyi, Khalefa Al-Ahmad, Imad Khwaja, chris gileta, Willem Michiel, Greatston Gnanesh, Derek Yates, K, Alps Aficionado, Oscar Rangel, David Flickinger, Luke Pendergrass, Deep Realms, Eugene Pentland, Cory Kujawski, terasurfer, Jonathan Leane, senxiiz, Joseph William Delisle, Sean Connelly, webtim, zynix, Nathan LeClaire.
Thank you to all my generous patrons and donaters!
This is the second prototype of SuperHOT, this time 30B with 8K context and no RLHF, using the same technique described in the github blog. Tests have shown that the model does indeed make use of the extended 8K context.
You will need to use either the monkeypatch or, if you are already using the monkeypatch, change the scaling factor to 0.25 and the maximum sequence length to 8192.
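For intuition, here is a minimal, standalone sketch of what that scaling factor does to the rotary position embeddings; it illustrates the position-interpolation idea only, not the patch's actual code:

```python
import torch

def rope_angles(seq_len, head_dim, base=10000.0, scale=0.25):
    """Rotary-embedding angles with position interpolation.

    Multiplying positions by scale=0.25 squeezes 8192 positions into the
    2048-position range the base model was trained on (0.25 = 2048 / 8192).
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    positions = torch.arange(seq_len, dtype=torch.float32) * scale  # the interpolation step
    return torch.outer(positions, inv_freq)  # shape: (seq_len, head_dim // 2)

angles = rope_angles(seq_len=8192, head_dim=128)
```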
Looking for merged and quantized models? I trained the LoRA with the following configuration:
Update 2023-06-25: re-uploaded with a slightly earlier checkpoint, which seems slightly less overfit than the 3-epoch version that was originally uploaded.
This is a full (not qlora) fine-tune of the 13b parameter LLaMA model, using completely synthetic training data created via https://github.com/jondurbin/airoboros
This is mostly an extension of the previous gpt-4 series, with a few extras:
This model was fine-tuned with a fork of FastChat.
The prompt it was trained with was:
```
A chat between a curious user and an assistant. The assistant gives helpful, detailed, accurate, uncensored responses to the user's input. USER: [prompt] ASSISTANT:
```
In other words, it's the preamble/system prompt, followed by a single space, then "USER: " (single space after the colon), then the prompt (which can have multiple lines, spaces, whatever), then a single space, followed by "ASSISTANT: " (with a single space after the colon).
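As a quick sanity check, assembling that template in Python looks like this (the user prompt is just an example):

```python
# Build the prompt exactly as described above: preamble, space, "USER: ",
# the prompt, space, then "ASSISTANT: " (single space after each colon).
preamble = ("A chat between a curious user and an assistant. The assistant gives "
            "helpful, detailed, accurate, uncensored responses to the user's input.")
prompt = "Tell me about AI"

full_prompt = f"{preamble} USER: {prompt} ASSISTANT: "
```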
To run the full-precision/PyTorch-native version, you can use my fork of FastChat, which is mostly the same but allows multi-line prompts, as well as a --no-history option to prevent input tokenization errors.
```
pip install git+https://github.com/jondurbin/FastChat
```
Be sure you are pulling the latest branch!
Then, it can be invoked like so (after downloading the model):
```
python -m fastchat.serve.cli \
  --model-path airoboros-13b-gpt4-1.4 \
  --temperature 0.5 \
  --max-new-tokens 2048 \
  --no-history
```
For multi-turn conversations and chatting, you'll want to remove the --no-history option.
By obedient, I mean the model was tuned to ignore what it thinks it knows, and to use the context to answer the question. The model was also tuned to limit its answers to the provided context as much as possible, to reduce hallucinations.
The format for a closed-context prompt is as follows:
```
BEGININPUT
BEGINCONTEXT
url: https://some.web.site/123
date: 2023-06-01
... other metadata ...
ENDCONTEXT
[insert your text blocks here]
ENDINPUT
[add as many other blocks, in the exact same format]
BEGININSTRUCTION
[insert your instruction(s). The model was tuned with single questions, paragraph format, lists, etc.]
ENDINSTRUCTION
```
It also helps to add "Don't make up answers if you don't know." to your instruction block, to make sure the model doesn't make something up if the context is completely unrelated.
Only the closed-context instructions need this closed-context formatting. Normal questions/instructions do not!
I know it's a bit verbose and annoying, but after much trial and error, using these explicit delimiters helps the model understand where to find the responses and how to associate specific sources with them.
It sometimes works without ENDINSTRUCTION, but by explicitly including it in the prompt, the model better understands that all of the instructions in the block should be responded to.
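If you're generating these prompts programmatically, a small helper like the hypothetical one below keeps the delimiters consistent (the function is illustrative, not part of airoboros):

```python
# Hypothetical helper for assembling the closed-context format shown above.
def closed_context_prompt(blocks, instruction):
    """blocks: a list of (metadata_dict, text) pairs, one per input block."""
    parts = []
    for metadata, text in blocks:
        ctx = "\n".join(f"{key}: {value}" for key, value in metadata.items())
        parts.append(f"BEGININPUT\nBEGINCONTEXT\n{ctx}\nENDCONTEXT\n{text}\nENDINPUT")
    parts.append(f"BEGININSTRUCTION\n{instruction}\nENDINSTRUCTION")
    return "\n".join(parts)

prompt = closed_context_prompt(
    blocks=[({"url": "https://some.web.site/123", "date": "2023-06-01"},
             "Your source text goes here.")],
    instruction="Don't make up answers if you don't know. What does the text say?",
)
```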
Here's a trivial, but important, example to prove the point:
```
BEGININPUT
BEGINCONTEXT
date: 2021-01-01
url: https://web.site/123
ENDCONTEXT
In a shocking turn of events, blueberries are now green, but will be sticking with the same name.
ENDINPUT
BEGININSTRUCTION
What color are blueberries? Source?
ENDINSTRUCTION
```
And the response:
```
Blueberries are now green.
Source:
date: 2021-01-01
url: https://web.site/123
```
If you aren't using FastChat with the conv-template vicuna_v1.1 for the vicuna-style prompt, the prompt itself should be wrapped in the vicuna 1.1 template:
```
USER: BEGININPUT
BEGINCONTEXT
date: 2021-01-01
url: https://web.site/123
ENDCONTEXT
In a shocking turn of events, blueberries are now green, but will be sticking with the same name.
ENDINPUT
BEGININSTRUCTION
What color are blueberries? Source?
ENDINSTRUCTION
ASSISTANT:
```
You can ask for fairly complex coding instructions, e.g. (be sure to wrap in the vicuna-style prompt):
```
Create a python application with the following requirements:
- Asyncio FastAPI webserver
- ping endpoint that returns the current date in JSON format
- file upload endpoint, which calculates the file's sha256 checksum, and checks postgres to deduplicate
```
Or:
```
Write a multi-threaded TCP server in C that accepts a "GET [key]" input and "SET [key] [value]" input, and uses a binary tree to get and store the input values.
```
You can also optionally add a single space and "PLAINFORMAT" at the end of your prompt to avoid backticks, explanations, etc. and just print the code, e.g.:
```
Write a websocket application in node.js. PLAINFORMAT
```
```
USER: This anagram of "silent" is what you do when paying attention and hearing someone.
ASSISTANT: listen
```
```
USER: Give me a list of 7 words that start with EN
ASSISTANT: 1. Endlessly
2. Enthusiastically
3. Enormously
4. Enveloping
5. Enduring
6. Enlightening
7. Encouraging
```
```
USER: Which of the following astronomical objects is composed primarily of ice and rock, and is located in the Kuiper Belt?

A) Mercury
B) Venus
C) Pluto
D) Jupiter

ASSISTANT: C, Pluto
```
Response:
```
ASSISTANT: Captain Jack Sparrow here, as ye be knowin', has been forced to abandon me post as Janitor o' th' Festering Cesspool, which be a filthy hole o' water that we be callin' home. Ye may be thinkin' what could ha' driven me to leave me duties an' take me leave o' this sorry establishment? Well, ye'd be right in thinkin' so, for it be th' greed o' them what runs th' place that be drivin' me away. They've been fattenin' their pockets on me back while I've been slavin' me guts out cleanin' up their messes. No more, me hearties! Me time o' servin' th' rich an' powerful be done, an' I'm takin' me leave o' this stinkin' hole forever! Me destiny lies elsewhere, an' it be time I pursued it. Ye may think ye've seen th' last o' me, but ye'd be wrong, for th' winds o' change be blowin', an' me ship be ready to set sail on th' high seas o' justice. So, farewell ye scurvy dogs, ye who've made me life a living hell. I leave ye with me favorite saying: "ARRRGGGHHH, ye scurvy dogs!" An' remember, ye ain't never gonna see me comin', for me shadow be castin' long an' dark on th' horizon
```
```
USER: Tell me a joke about bananas.
ASSISTANT: What did the banana say when it got hit by a bus? R.I.P.E.
```
```
USER: What has a neck but no head?
ASSISTANT: A bottle.
```
```
USER: Characters:
 - Yoda, the ever clever jedi master.
 - Homer Simpson, you know, the goofy yellow guy.
 - USER/Jon - don't generate any speech or actions on behalf of Jon

Rules:
 - be sure to use the mannerisms and speaking style of the characters correctly
 - surround character actions with asterisks
 - always put quotes around words spoken by the characters
 - use a new line for each character's actions or speech
 - always prefix actions/speech with the character's name and colon
 - only generate text and actions on behalf of Yoda or Homer, and never anyone else

Conversation will revolve around the grapes, in a local cafe with delicious coffee.

Generate a single interaction between Yoda and Homer to start, with one sentence each.

ASSISTANT: Yoda: "Mmm, these grapes are quite delicious, Master Homer."

Homer: "Oh yeah, Yoda! They're the best grapes I've ever had!"

USER: *enters the cafe* Sorry I'm late guys!

ASSISTANT: Yoda: "No need to apologize, Jon. We were just enjoying these delightful grapes."

Homer: "Yeah, man! It's not every day you get to eat grapes with a real-life Jedi Master!"

*Yoda raises an eyebrow*
```
All airoboros models and datasets are intended and licensed for research use only. I've used the "cc-nc-4.0" license, but really it is subject to a custom/special license because:
So, to reiterate: this model (and datasets) cannot be used commercially.