Model:
TheBloke/Manticore-13B-Chat-Pyg-SuperHOT-8K-GPTQ
Chat & support: my new Discord server
Want to contribute? TheBloke's Patreon page
These files are GPTQ 4bit model files for OpenAccess AI Collective's Manticore 13B Chat merged with Kaio Ken's SuperHOT 8K.
It is the result of quantising to 4bit using GPTQ-for-LLaMa.
This is an experimental new GPTQ which offers up to 8K context size.
The increased context has been tested to work with ExLlama, via the latest release of text-generation-webui.
It has also been tested from Python code using AutoGPTQ, with trust_remote_code=True.
Code credits are given below.
Please read below carefully to learn how to use this model.
GGML versions are not yet provided, as llama.cpp does not yet support SuperHOT. This is being investigated and will hopefully be resolved soon.
Please make sure you are using the latest version of text-generation-webui.
First make sure you have AutoGPTQ and Einops installed:
pip3 install einops auto-gptq
Then run the following code. Note that for this to work, config.json has been hardcoded to a sequence length of 8192.
If you want to try 4096 instead, to reduce VRAM usage, manually edit config.json and set max_position_embeddings to the value you want.
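For example, a minimal sketch of that edit using Python's json module (the local path to config.json is hypothetical; adjust it to wherever you downloaded the model):

```python
import json

# Hypothetical local path to the downloaded model repo; adjust as needed.
config_path = "models/TheBloke_Manticore-13B-Chat-Pyg-SuperHOT-8K-GPTQ/config.json"

with open(config_path) as f:
    config = json.load(f)

# Halve the context window to reduce VRAM usage.
config["max_position_embeddings"] = 4096

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```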
```python
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/Manticore-13B-Chat-Pyg-SuperHOT-8K-GPTQ"
model_basename = "manticore-13b-chat-pyg-superhot-8k-GPTQ-4bit-128g.no-act.order"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device_map='auto',
        use_triton=use_triton,
        quantize_config=None)

model.seqlen = 8192

# Note: check the prompt template is correct for this model.
prompt = "Tell me about AI"
prompt_template = f'''USER: {prompt}
ASSISTANT:'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline
# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
```
The repo contains llama_rope_scaled_monkey_patch.py, written by @kaiokendev.
In theory it can be added to any Python UI or custom code to achieve the same result as trust_remote_code=True. I have not tested this, and it should be superseded by using trust_remote_code=True, but it is included for completeness and for interest.
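If you do want to experiment with it, usage would presumably look something like the sketch below. This is untested, and the entry-point name replace_llama_rope_with_scaled_rope is an assumption; check llama_rope_scaled_monkey_patch.py for what it actually exposes:

```python
# Untested sketch of using the monkey patch instead of trust_remote_code=True.
# The function name below is an assumption -- verify it against
# llama_rope_scaled_monkey_patch.py in this repo.
from llama_rope_scaled_monkey_patch import replace_llama_rope_with_scaled_rope

# The patch must be applied before the model is instantiated, so that the
# scaled rotary embeddings are in place when the LLaMA modules are built.
replace_llama_rope_with_scaled_rope()

# ...then load the model as in the AutoGPTQ example above, with
# trust_remote_code left at its default of False.
```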
manticore-13b-chat-pyg-superhot-8k-GPTQ-4bit-128g.no-act.order.safetensors
This will work with AutoGPTQ, ExLlama, and CUDA versions of GPTQ-for-LLaMa. There have been reports of problems with the Triton mode of recent GPTQ-for-LLaMa; if you have issues, please use AutoGPTQ instead.
It was created with group_size 128 to increase inference accuracy, but without --act-order (desc_act), to increase compatibility and improve inference speed.
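For reference, those settings map onto an AutoGPTQ quantisation config roughly like the one below. This is illustrative only, not the exact invocation used to produce these files:

```python
from auto_gptq import BaseQuantizeConfig

# Illustrative config: 4-bit weights, group size 128 for accuracy,
# desc_act=False (i.e. no --act-order) for compatibility and speed.
quantize_config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=False,
)
```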
For further support, and discussion on these models and AI in general, please join us on Discord:
Thanks to the chirper.ai team!
I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.
If you're able and willing to contribute, it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.
Donaters will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.
Special thanks to: Luke from CarbonQuill, Aemon Algiz, Dmitriy Samsonov.
Patreon special mentions: zynix, ya boyyy, Trenton Dambrowitz, Imad Khwaja, Alps Aficionado, chris gileta, John Detwiler, Willem Michiel, RoA, Mano Prime, Rainer Wilmers, Fred von Graf, Matthew Berman, Ghost, Nathan LeClaire, Iucharbius, Ai Maven, Illia Dulskyi, Joseph William Delisle, Space Cruiser, Lone Striker, Karl Bernard, Eugene Pentland, Greatston Gnanesh, Jonathan Leane, Randy H, Pierre Kircher, Willian Hasse, Stephen Murray, Alex, terasurfer, Edmond Seymore, Oscar Rangel, Luke Pendergrass, Asp the Wyvern, Junyu Yang, David Flickinger, Luke, Spiking Neurons AB, subjectnull, Pyrater, Nikolai Manek, senxiiz, Ajan Kanaga, Johann-Peter Hartmann, Artur Olbinski, Kevin Schuppel, Derek Yates, Kalila, K, Talal Aujan, Khalefa Al-Ahmad, Gabriel Puliatti, John Villwock, WelcomeToTheClub, Daniel P. Andersen, Preetika Verma, Deep Realms, Fen Risland, trip7s trip, webtim, Sean Connelly, Michael Levine, Chris McCloskey, biorpg, vamX, Viktor Bowallius, Cory Kujawski.
Thank you to all my generous patrons and donaters!
This is a second prototype of SuperHOT, this time 30B with 8K context and no RLHF, using the same technique described in the github blog. Tests have shown that the model does indeed leverage the extended context at 8K.
You will need to use either the monkeypatch or, if you are already using the monkeypatch, change the scaling factor to 0.25 and the maximum sequence length to 8192.
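The scaling factor works by position interpolation: each position index is multiplied by 0.25, so 8192 token positions are squeezed into the 0-2048 range the base model was trained on. A minimal sketch of the idea (not kaiokendev's actual implementation):

```python
import torch

def scaled_rope_positions(seq_len: int, scale: float = 0.25) -> torch.Tensor:
    # Interpolated positions: 8192 tokens * 0.25 stay within the
    # base model's original 2048-position training range.
    return torch.arange(seq_len, dtype=torch.float32) * scale

positions = scaled_rope_positions(8192)
print(positions[-1])  # tensor(2047.7500) -- still inside the original range
```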
Looking for merged and quantized models? I trained the LoRA with the following configuration:
Manticore 13B Chat builds on Manticore with new datasets, including a de-duped subset of the Pygmalion dataset. It also removes all Alpaca-style prompts using ### in favor of chat-only style prompts using USER: and ASSISTANT:, as well as pygmalion/metharme prompting using <|system|>, <|user|> and <|model|> tokens.
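Concretely, the two supported prompt styles look roughly like this (the system text below is illustrative, not taken from the training data):

```python
# Chat style:
chat_prompt = """USER: Tell me about alpacas.
ASSISTANT:"""

# Pygmalion/metharme style; the system message here is a placeholder.
metharme_prompt = "<|system|>Enter RP mode.<|user|>Tell me about alpacas.<|model|>"
```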
Questions, comments, feedback, looking to donate, or want to help? Reach out on our Discord or email wing@openaccessaicollective.org.
Manticore 13B Chat was fine-tuned on the following datasets, along with the datasets from the original Manticore 13B.
Manticore 13B Chat was trained on 25% of the datasets below. The datasets were merged, shuffled, and then sharded into 4 parts.
Manticore 13B
Not added from Manticore 13B:
Special thanks to Nanobit for helping with Axolotl, TheBloke for quantizing these models to make them more accessible to all, ehartford for cleaned datasets, and 0x000011b for the RP dataset.
Try out the model in HF Spaces. The demo uses a quantized GGML model to quickly return predictions on smaller GPUs (and even CPUs). The quantized GGML may have some minimal loss of model quality.
Manticore was built with Axolotl on 8 x A100 80GB.
Manticore has not been aligned to human preferences with techniques like RLHF or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). It was fine-tuned from the base model LLaMA 13B; please see that model card's Limitations section for relevant information.
USER: make up a sonnet in the style of shakespeare about a skier skiing in the glades on a bluebird day.
ASSISTANT: A skier doth glide 'neath sun’s glow,
His edges sharp to cut the snow,
Bluebird sky above,
While he doth revel on the slopes below,
The cold air biting his nose,
But not his spirits,
For he is filled with a sense of joy,
As he carves through the powdery drifts.
His body moves in graceful rhythm,
And each turn he makes with ease,
He feels like he could ski forever,
With no care for the world outside these glades.
USER: tell me a joke about alpacas
ASSISTANT: Did you hear about the famous alpaca ballet dancer? She was a prima llama.
USER: okay, that is actually pretty good
ASSISTANT: What do you call an alpaca in a bowtie? A llama-cholla!
USER: why is that funny?
ASSISTANT: Because it’s a play on words. “Llama-cholla” sounds like “lame chola,” which is Spanish slang for someone who acts cool or pretentious. So, the joke is implying that alpacas are vain and act cool with their bowties.
USER: What is 44 + 77?
ASSISTANT: 121
USER: What is the sum of 33 and 77?
ASSISTANT: 110
USER: tell me 5 words that start with “ex” and the sum of 33 and 77.
ASSISTANT: exotic, expressive, explicit, experimental, and extravagant. The sum of 33 and 77 is 110.
USER: tell me 3 words that start with “egr” and end in “s”
ASSISTANT: egret, eggplant, and egrets.
USER: Sally has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?
ASSISTANT: Sally has a total of 6 siblings, so she has a total of 6 - 2 = 4 sisters.
USER: Sally has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have? Use multi-step reasoning to explain your answer.
ASSISTANT: Sally has a total of three brothers, so she has a total of three * 2 = six siblings. Her family consists of three boys and six girls in total. This means that Sally has three sisters since there are six female relatives and only three male ones.
USER: But when brothers have sisters, aren’t they shared or the same?
ASSISTANT: No, not every sibling shares their mother. For example, I have two brothers who share one mother but my third brother has a different mother from both of them.
USER: Let’s assume they all share the same mother and father. Now explain with multi-step reasoning.
ASSISTANT: If Sally has three brothers then that means she has six siblings altogether. Each brother has two sisters, which means that there are twelve female relatives in her family - one for each of her brothers and herself. Since there are only six females in the entire family, that means that Sally’s mother has no other children than her own three sons - so she can’t be the mother of any of the sisters. So all those sisters have to belong to one of the brothers, and since they are shared by two brothers, then each brother has a single sister. Therefore, there is only one sister for Sally, making the final answer 1.