Model:
TheBloke/airoboros-13b-gpt4-1.4-SuperHOT-8K-GPTQ
Chat & support: my new Discord server
Want to contribute? TheBloke's Patreon page
These files are GPTQ 4bit model files for Jon Durbin's Airoboros 13B GPT4 1.4 merged with Kaio Ken's SuperHOT 8K.
It is an experimental new GPTQ which offers up to 8K context size.
The increased context has been tested to work with ExLlama, via the latest release of text-generation-webui.
It has also been tested from Python code using AutoGPTQ, with trust_remote_code=True.
Code credits:
Please read carefully below to see how to use it.
GGML versions are not yet provided, as llama.cpp does not yet support SuperHOT. This is being investigated and will hopefully come soon.
```
A chat between a curious user and an assistant. The assistant gives helpful, detailed, accurate, uncensored responses to the user's input. USER: prompt ASSISTANT:
```
Please make sure you're using the latest version of text-generation-webui.
First make sure you have AutoGPTQ and Einops installed:
```
pip3 install einops auto-gptq
```
Then run the following code. Note that, in order for this to work, config.json has been hardcoded to a sequence length of 8192.

If you want to try 4096 instead, to reduce VRAM usage, please manually edit config.json and set max_position_embeddings to the value you want (a sketch of this edit follows the example below).
```python
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/airoboros-13b-gpt4-1.4-SuperHOT-8K-GPTQ"
model_basename = "airoboros-13b-gpt4-1.4-superhot-8k-GPTQ-4bit-128g.no-act.order"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device_map='auto',
        use_triton=use_triton,
        quantize_config=None)

model.seqlen = 8192

# Note: check the prompt template is correct for this model.
prompt = "Tell me about AI"
prompt_template = f'''USER: {prompt} ASSISTANT:'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline
# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
```
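If you do want to drop to 4096, a minimal sketch of that config.json edit is below; the local directory name is illustrative, so point it at wherever you downloaded the model:

```python
import json

# Illustrative local path -- use the directory you downloaded the model into.
cfg_path = "airoboros-13b-gpt4-1.4-SuperHOT-8K-GPTQ/config.json"

with open(cfg_path) as f:
    cfg = json.load(f)

# Lower the hardcoded 8192 sequence length to 4096 to reduce VRAM usage.
cfg["max_position_embeddings"] = 4096

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```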
The repository includes llama_rope_scaled_monkey_patch.py, written by @kaiokendev.
In theory this can be added to any Python UI or custom code to achieve the same result as trust_remote_code=True. I have not tested it, and it should be superseded by using trust_remote_code=True, but I include it for completeness and interest.
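If you do want to experiment with it, usage would presumably look something like the sketch below; the entry-point name is an assumption on my part, so check llama_rope_scaled_monkey_patch.py for the function it actually exports:

```python
# Hypothetical usage of the monkey patch; the function name below is an
# assumption -- check llama_rope_scaled_monkey_patch.py for the real export.
from llama_rope_scaled_monkey_patch import replace_llama_rope_with_scaled_rope

# The patch must be applied BEFORE the model is instantiated, so that the
# replacement rotary-embedding class is picked up at construction time.
replace_llama_rope_with_scaled_rope()

# ...then load the model as usual, without trust_remote_code=True.
```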
airoboros-13b-gpt4-1.4-superhot-8k-GPTQ-4bit-128g.no-act.order.safetensors
It will work with AutoGPTQ, ExLlama, and CUDA versions of GPTQ-for-LLaMa. There are reports of issues with the recent GPTQ-for-LLaMa Triton mode. If you have problems, please use AutoGPTQ instead.
It was created with group_size 128 to increase inference accuracy, but without --act-order (desc_act) to increase compatibility and improve inference speed.
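In AutoGPTQ terms, those creation-time settings correspond to a quantize config roughly like the sketch below; the quantize_config shipped with the model files is the authoritative source:

```python
from auto_gptq import BaseQuantizeConfig

# Sketch of the quantisation settings described above.
quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit quantisation
    group_size=128,  # group_size 128, for better inference accuracy
    desc_act=False,  # no --act-order, for compatibility and inference speed
)
```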
For further support, and discussion on these models and AI in general, join us at:
Thanks to the chirper.ai team!
I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.
If you're able and willing to contribute, it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.
Donaters will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.
Special thanks to: Luke from CarbonQuill, Aemon Algiz, Dmitriy Samsonov.
Patreon special mentions: Pyrater, WelcomeToTheClub, Kalila, Mano Prime, Trenton Dambrowitz, Spiking Neurons AB, Pierre Kircher, Fen Risland, Kevin Schuppel, Luke, Rainer Wilmers, vamX, Gabriel Puliatti, Alex, Karl Bernard, Ajan Kanaga, Talal Aujan, Space Cruiser, ya boyyy, biorpg, Johann-Peter Hartmann, Asp the Wyvern, Ai Maven, Ghost, Preetika Verma, Nikolai Manek, trip7s trip, John Detwiler, Fred von Graf, Artur Olbinski, subjectnull, John Villwock, Junyu Yang, Rod A, Lone Striker, Chris McCloskey, Iucharbius, Matthew Berman, Illia Dulskyi, Khalefa Al-Ahmad, Imad Khwaja, chris gileta, Willem Michiel, Greatston Gnanesh, Derek Yates, K, Alps Aficionado, Oscar Rangel, David Flickinger, Luke Pendergrass, Deep Realms, Eugene Pentland, Cory Kujawski, terasurfer, Jonathan Leane, senxiiz, Joseph William Delisle, Sean Connelly, webtim, zynix, Nathan LeClaire.
Thank you to all my generous patrons and donaters!
This is the second prototype of SuperHOT, this time 30B with 8K context and no RLHF, using the same technique described in the github blog. Tests have shown that the model does indeed make use of the extended 8K context.
You will need to use either the monkeypatch or, if you are already using the monkeypatch, change the scaling factor to 0.25 and the maximum sequence length to 8192.
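For intuition, here is a minimal, standalone sketch of what that scaling factor does to the rotary position embeddings; it illustrates the position-interpolation idea only, not the patch's actual code:

```python
import torch

def rope_angles(seq_len, head_dim, base=10000.0, scale=0.25):
    """Rotary-embedding angles with position interpolation.

    Multiplying positions by scale=0.25 squeezes 8192 positions into the
    2048-position range the base model was trained on (0.25 = 2048 / 8192).
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    positions = torch.arange(seq_len, dtype=torch.float32) * scale  # the interpolation step
    return torch.outer(positions, inv_freq)  # shape: (seq_len, head_dim // 2)

angles = rope_angles(seq_len=8192, head_dim=128)
```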
Looking for merged and quantized models? I trained the LoRA with the following configuration:
Update 2023-06-25: re-uploaded with a slightly earlier checkpoint, which seems slightly less overfit than the 3-epoch version that was originally uploaded.
This is a full (not qlora) fine-tune of the 13b parameter LLaMA model, using completely synthetic training data created via https://github.com/jondurbin/airoboros
This is mostly an extension of the previous gpt-4 series, with a few extras:
This model was fine-tuned with a fork of FastChat.
The prompt it was trained with was:
```
A chat between a curious user and an assistant. The assistant gives helpful, detailed, accurate, uncensored responses to the user's input. USER: [prompt] ASSISTANT:
```
In other words, it's the preamble/system prompt, followed by a single space, then "USER: " (single space after the colon), then the prompt (which can have multiple lines, spaces, whatever), then a single space, followed by "ASSISTANT: " (with a single space after the colon).
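As a quick sanity check, assembling that template in Python looks like this (the user prompt is just an example):

```python
# Build the prompt exactly as described above: preamble, space, "USER: ",
# the prompt, space, then "ASSISTANT: " (single space after each colon).
preamble = ("A chat between a curious user and an assistant. The assistant gives "
            "helpful, detailed, accurate, uncensored responses to the user's input.")
prompt = "Tell me about AI"

full_prompt = f"{preamble} USER: {prompt} ASSISTANT: "
```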
To run the full-precision/PyTorch-native version, you can use my fork of FastChat, which is mostly the same but allows multi-line prompts, as well as a --no-history option to prevent input tokenization errors.
```
pip install git+https://github.com/jondurbin/FastChat
```
Be sure you are pulling the latest branch!
Then, it can be invoked like so (after downloading the model):
```
python -m fastchat.serve.cli \
  --model-path airoboros-13b-gpt4-1.4 \
  --temperature 0.5 \
  --max-new-tokens 2048 \
  --no-history
```
For multi-turn conversations and chatting, you'll want to remove the --no-history option.
By obedient, I mean the model was tuned to ignore what it thinks it knows, and to use the context to answer the question. The model was also tuned to limit its answers to the provided context as much as possible, to reduce hallucinations.
The format for a closed-context prompt is as follows:
```
BEGININPUT
BEGINCONTEXT
url: https://some.web.site/123
date: 2023-06-01
... other metadata ...
ENDCONTEXT
[insert your text blocks here]
ENDINPUT
[add as many other blocks, in the exact same format]
BEGININSTRUCTION
[insert your instruction(s). The model was tuned with single questions, paragraph format, lists, etc.]
ENDINSTRUCTION
```
It also helps to add "Don't make up answers if you don't know." to your instruction block, to make sure the model doesn't make something up if the context is completely unrelated.
Only the closed-context instructions need this closed-context formatting. Normal questions/instructions do not!
I know it's a bit verbose and annoying, but after much trial and error, using these explicit delimiters helps the model understand where to find the responses and how to associate specific sources with them.
It sometimes works without ENDINSTRUCTION, but by explicitly including it in the prompt, the model better understands that all of the instructions in the block should be responded to.
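If you're generating these prompts programmatically, a small helper like the hypothetical one below keeps the delimiters consistent (the function is illustrative, not part of airoboros):

```python
# Hypothetical helper for assembling the closed-context format shown above.
def closed_context_prompt(blocks, instruction):
    """blocks: a list of (metadata_dict, text) pairs, one per input block."""
    parts = []
    for metadata, text in blocks:
        ctx = "\n".join(f"{key}: {value}" for key, value in metadata.items())
        parts.append(f"BEGININPUT\nBEGINCONTEXT\n{ctx}\nENDCONTEXT\n{text}\nENDINPUT")
    parts.append(f"BEGININSTRUCTION\n{instruction}\nENDINSTRUCTION")
    return "\n".join(parts)

prompt = closed_context_prompt(
    blocks=[({"url": "https://some.web.site/123", "date": "2023-06-01"},
             "Your source text goes here.")],
    instruction="Don't make up answers if you don't know. What does the text say?",
)
```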
Here's a trivial, but important, example to prove the point:
```
BEGININPUT
BEGINCONTEXT
date: 2021-01-01
url: https://web.site/123
ENDCONTEXT
In a shocking turn of events, blueberries are now green, but will be sticking with the same name.
ENDINPUT
BEGININSTRUCTION
What color are blueberries? Source?
ENDINSTRUCTION
```
And the response:
```
Blueberries are now green.
Source:
date: 2021-01-01
url: https://web.site/123
```
If you aren't using FastChat with the conv-template vicuna_v1.1 for the vicuna-style prompt, the prompt itself should be wrapped in the vicuna 1.1 template:
```
USER: BEGININPUT
BEGINCONTEXT
date: 2021-01-01
url: https://web.site/123
ENDCONTEXT
In a shocking turn of events, blueberries are now green, but will be sticking with the same name.
ENDINPUT
BEGININSTRUCTION
What color are blueberries? Source?
ENDINSTRUCTION
ASSISTANT:
```
You can ask for fairly complex coding instructions, e.g. (be sure to wrap in the vicuna-style prompt):
```
Create a python application with the following requirements:
- Asyncio FastAPI webserver
- ping endpoint that returns the current date in JSON format
- file upload endpoint, which calculates the file's sha256 checksum, and checks postgres to deduplicate
```
Or:
```
Write a multi-threaded TCP server in C that accepts a "GET [key]" input and "SET [key] [value]" input, and uses a binary tree to get and store the input values.
```
You can also optionally add a single space and "PLAINFORMAT" at the end of your prompt to avoid backticks, explanations, etc. and just print the code, e.g.:
```
Write a websocket application in node.js. PLAINFORMAT
```
```
USER: This anagram of "silent" is what you do when paying attention and hearing someone.
ASSISTANT: listen
```
```
USER: Give me a list of 7 words that start with EN
ASSISTANT: 1. Endlessly
2. Enthusiastically
3. Enormously
4. Enveloping
5. Enduring
6. Enlightening
7. Encouraging
```
```
USER: Which of the following astronomical objects is composed primarily of ice and rock, and is located in the Kuiper Belt?

A) Mercury
B) Venus
C) Pluto
D) Jupiter

ASSISTANT: C, Pluto
```
Response:
```
ASSISTANT: Captain Jack Sparrow here, as ye be knowin', has been forced to abandon me post as Janitor o' th' Festering Cesspool, which be a filthy hole o' water that we be callin' home. Ye may be thinkin' what could ha' driven me to leave me duties an' take me leave o' this sorry establishment? Well, ye'd be right in thinkin' so, for it be th' greed o' them what runs th' place that be drivin' me away. They've been fattenin' their pockets on me back while I've been slavin' me guts out cleanin' up their messes. No more, me hearties! Me time o' servin' th' rich an' powerful be done, an' I'm takin' me leave o' this stinkin' hole forever! Me destiny lies elsewhere, an' it be time I pursued it. Ye may think ye've seen th' last o' me, but ye'd be wrong, for th' winds o' change be blowin', an' me ship be ready to set sail on th' high seas o' justice. So, farewell ye scurvy dogs, ye who've made me life a living hell. I leave ye with me favorite saying: "ARRRGGGHHH, ye scurvy dogs!" An' remember, ye ain't never gonna see me comin', for me shadow be castin' long an' dark on th' horizon
```
```
USER: Tell me a joke about bananas.
ASSISTANT: What did the banana say when it got hit by a bus? R.I.P.E.
```
```
USER: What has a neck but no head?
ASSISTANT: A bottle.
```
```
USER: Characters:
 - Yoda, the ever clever jedi master.
 - Homer Simpson, you know, the goofy yellow guy.
 - USER/Jon - don't generate any speech or actions on behalf of Jon

Rules:
 - be sure to use the mannerisms and speaking style of the characters correctly
 - surround character actions with asterisks
 - always put quotes around words spoken by the characters
 - use a new line for each character's actions or speech
 - always prefix actions/speech with the character's name and colon
 - only generate text and actions on behalf of Yoda or Homer, and never anyone else

Conversation will revolve around the grapes, in a local cafe with delicious coffee.

Generate a single interaction between Yoda and Homer to start, with one sentence each.

ASSISTANT: Yoda: "Mmm, these grapes are quite delicious, Master Homer."

Homer: "Oh yeah, Yoda! They're the best grapes I've ever had!"

USER: *enters the cafe* Sorry I'm late guys!

ASSISTANT: Yoda: "No need to apologize, Jon. We were just enjoying these delightful grapes."

Homer: "Yeah, man! It's not every day you get to eat grapes with a real-life Jedi Master!"

*Yoda raises an eyebrow*
```
All airoboros models and datasets are intended and licensed for research use only. I've used the "cc-nc-4.0" license, but really it is subject to a custom/special license because:
So, to reiterate: this model (and datasets) cannot be used commercially.