Model:

TheBloke/Wizard-Vicuna-13B-Uncensored-SuperHOT-8K-fp16

License:

other

Other:

text-generation-inference custom_code llama

Libraries:

Transformers PyTorch

Task:

Text Generation

Chat & support: my new Discord server

Want to contribute? TheBloke's Patreon page

Eric Hartford's Wizard Vicuna 13B Uncensored fp16

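These are fp16 PyTorch format model files for Eric Hartford's Wizard Vicuna 13B Uncensored merged with Kaio Ken's SuperHOT 8K.

Kaio Ken's SuperHOT 13b LoRA is merged onto the base model, and 8K context can then be achieved during inference by using trust_remote_code=True.

Note that config.json has been set to a sequence length of 8192. This can be changed to 4096 if you want to try a smaller sequence length.

Repositories available

How to use this model from Python code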

First make sure you have Einops installed:

pip3 install einops

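Then run the following code. config.json defaults to a sequence length of 8192, but you can also configure this in your Python code.

The provided modelling code, activated with trust_remote_code=True, will automatically set the scale parameter from the configured max_position_embeddings. For example, for 8192, scale is set to 4.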

from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM, pipeline

model_name_or_path = "TheBloke/Wizard-Vicuna-13B-Uncensored-SuperHOT-8K-fp16"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

config = AutoConfig.from_pretrained(model_name_or_path, trust_remote_code=True)
# Change this to the sequence length you want
config.max_position_embeddings = 8192

model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
        config=config,
        trust_remote_code=True,
        device_map='auto')

# Note: check that this prompt template is correct for this model!
prompt = "Tell me about AI"
prompt_template=f'''USER: {prompt}
ASSISTANT:'''

print("\n\n*** Generate:")

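# Move the inputs to the model's device (works with device_map='auto')
input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.to(model.device)
# do_sample=True is needed for temperature to have any effect
output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7, max_new_tokens=512)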
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
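
The relationship between max_position_embeddings and the scale parameter mentioned before the example above (8192 gives a scale of 4) can be sketched as follows. This is only an illustration: the helper name `rope_scale` is hypothetical, and the 2048-token baseline is an assumption based on the original LLaMA context length. The modelling code shipped in the repository remains the authoritative source.

```python
# Hypothetical helper, not part of the repository's modelling code.
BASE_CONTEXT = 2048  # assumption: context length the base LLaMA model was trained with


def rope_scale(max_position_embeddings: int, base_context: int = BASE_CONTEXT) -> float:
    """Return the assumed position-scaling factor for a target context length."""
    if max_position_embeddings <= 0:
        raise ValueError("max_position_embeddings must be positive")
    # Never scale below 1: contexts at or under the baseline need no stretching.
    return max(1.0, max_position_embeddings / base_context)


for length in (2048, 4096, 8192):
    print(f"max_position_embeddings={length} -> scale={rope_scale(length)}")
```

Under these assumptions, lowering config.max_position_embeddings to 4096 would halve the scale compared with the default of 8192.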

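Using other UIs: monkey patch

Provided in the repo is llama_rope_scaled_monkey_patch.py, written by @kaiokendev.

It can in theory be added to any Python UI or custom code to achieve the same result as trust_remote_code=True. I have not tested this, and it should be superseded by using trust_remote_code=True, but I include it for completeness and for interest.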

Discord

For further support, and discussions on these models and AI in general, join us at:

TheBloke AI's Discord server

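Thanks, and how to contribute

Thanks to the chirper.ai team!

I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.

If you're able and willing to contribute, it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.

Donors will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.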

Patreon: https://patreon.com/TheBlokeAI
Ko-Fi: https://ko-fi.com/TheBlokeAI

Special thanks to: Luke from CarbonQuill, Aemon Algiz, Dmitriy Samsonov.

Patreon special mentions: zynix, ya boyyy, Trenton Dambrowitz, Imad Khwaja, Alps Aficionado, chris gileta, John Detwiler, Willem Michiel, RoA, Mano Prime, Rainer Wilmers, Fred von Graf, Matthew Berman, Ghost, Nathan LeClaire, Iucharbius, Ai Maven, Illia Dulskyi, Joseph William Delisle, Space Cruiser, Lone Striker, Karl Bernard, Eugene Pentland, Greatston Gnanesh, Jonathan Leane, Randy H, Pierre Kircher, Willian Hasse, Stephen Murray, Alex, terasurfer, Edmond Seymore, Oscar Rangel, Luke Pendergrass, Asp the Wyvern, Junyu Yang, David Flickinger, Luke, Spiking Neurons AB, subjectnull, Pyrater, Nikolai Manek, senxiiz, Ajan Kanaga, Johann-Peter Hartmann, Artur Olbinski, Kevin Schuppel, Derek Yates, Kalila, K, Talal Aujan, Khalefa Al-Ahmad, Gabriel Puliatti, John Villwock, WelcomeToTheClub, Daniel P. Andersen, Preetika Verma, Deep Realms, Fen Risland, trip7s trip, webtim, Sean Connelly, Michael Levine, Chris McCloskey, biorpg, vamX, Viktor Bowallius, Cory Kujawski.

Thank you to all my generous patrons and donors!

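Original model card: Kaio Ken's SuperHOT 8K

SuperHOT Prototype 2 w/ 8K Context

This is a second prototype of SuperHOT, this time 30B with 8K context and no RLHF, using the same technique described in the GitHub blog. Tests have shown that the model does indeed make use of the extended 8K context.

You will need to use the monkeypatch or, if you are already using the monkeypatch, change the scaling factor to 0.25 and the maximum sequence length to 8192.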
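
To make the scaling factor concrete, here is a minimal sketch of linear position scaling applied to rotary angles. It is not a copy of the monkeypatch: the function name `rotary_angles`, the head dimension of 128, and the base of 10000 are assumptions chosen for illustration. The idea shown is that multiplying each position index by 0.25 compresses positions 0 to 8191 into the 0 to 2048 range that the base model is assumed to have been trained on.

```python
def rotary_angles(position, head_dim=128, scaling_factor=1.0, base=10000.0):
    """Return the rotary angle for each frequency pair at one (scaled) position.

    head_dim and base are illustrative assumptions, not values read from the repo.
    """
    # Linear scaling: compress the position index before computing the angles.
    scaled_position = position * scaling_factor
    return [scaled_position / (base ** (2 * i / head_dim)) for i in range(head_dim // 2)]


SCALING_FACTOR = 0.25   # value given in the text
MAX_SEQ_LEN = 8192      # value given in the text

# The last position of an 8192-token sequence, after scaling.
last_scaled_position = (MAX_SEQ_LEN - 1) * SCALING_FACTOR
print("last scaled position:", last_scaled_position)

# Position 8 with scaling yields the same angles as position 2 without it.
print(rotary_angles(8, scaling_factor=SCALING_FACTOR) == rotary_angles(2))
```

With a factor of 0.25, four neighbouring token positions share one unit of the original position range, which is why the factor and the maximum sequence length need to be changed together.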

Looking for merged & quantized models?

30B 4-bit CUDA: tmpupload/superhot-30b-8k-4bit-safetensors
30B 4-bit CUDA 128g: tmpupload/superhot-30b-8k-4bit-128g-safetensors

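Training Details

I trained the LoRA with the following configuration:

- 1200 samples (more than 400 of them with a sequence length over 2048)
- Learning rate of 3e-4
- 3 epochs
- The exported modules are:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
- No bias
- Rank of 4
- Alpha of 8
- No dropout
- Weight decay of 0.1
- AdamW beta1 of 0.9 and beta2 of 0.99, epsilon of 1e-5
- Trained on a 4-bit base model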
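
For a sense of how small this adapter is, the sketch below estimates the number of trainable LoRA parameters implied by the configuration above. The rank, alpha, and target modules come from the list. The hidden sizes and layer counts (5120 and 40 for a 13B LLaMA, 6656 and 60 for a 30B LLaMA) are assumptions about the base architectures, and the function name is hypothetical.

```python
RANK = 4    # from the configuration above
ALPHA = 8   # from the configuration above
TARGET_MODULES = ("q_proj", "k_proj", "v_proj", "o_proj")


def lora_trainable_params(hidden_size, num_layers, rank=RANK,
                          num_target_modules=len(TARGET_MODULES)):
    """Estimate LoRA parameters, assuming every target is a square hidden x hidden projection."""
    # Each adapted projection gains two matrices: A (rank x hidden) and B (hidden x rank).
    per_module = rank * (hidden_size + hidden_size)
    return per_module * num_target_modules * num_layers


# In standard LoRA the low-rank update is multiplied by alpha / rank.
print("update scaling:", ALPHA / RANK)

# Assumed LLaMA shapes (hidden size, number of layers); not stated in this card.
print("13B adapter params:", lora_trainable_params(5120, 40))
print("30B adapter params:", lora_trainable_params(6656, 60))
```

Under these assumptions the adapter has a few million parameters, a tiny fraction of the base model's size.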

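Original model card: Eric Hartford's Wizard Vicuna 13B Uncensored

This is wizard-vicuna-13b trained on a subset of the dataset: responses that contained alignment or moralizing were removed. The intent is to train a WizardLM model that doesn't have alignment built in, so that alignment (of any sort) can be added separately, for example with an RLHF LoRA.

Shout out to the open source AI/ML community, and everyone who helped me out.

Note:

An uncensored model has no guardrails.

You are responsible for anything you do with the model, just as you are responsible for anything you do with any dangerous object such as a knife, gun, lighter, or car.

Publishing anything this model generates is the same as publishing it yourself.

You are responsible for the content you publish, and you cannot blame the model for what you do with it, any more than you can blame the knife, gun, lighter, or car.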

Author:

Tom Jobbins

Dataset size:

24.25 GB