Model:
TheBloke/open-llama-13b-open-instruct-GPTQ
Chat & support: my new Discord server
Want to contribute? TheBloke's Patreon page
These files are GPTQ 4-bit model files for VMware's OpenLlama 13B Open Instruct.
It is the result of quantising to 4-bit using GPTQ-for-LLaMa.
```
Below is an instruction that describes a task. Write a response that appropriately completes the request

### Instruction: prompt

### Response:
```
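As a minimal illustration (my own sketch, reusing the sample instruction from the example code further down), filling in the template in Python looks like this:

```python
# Hypothetical illustration: filling in the Alpaca-style prompt template above.
prompt = "Tell me about AI"  # example instruction
prompt_template = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request\n\n"
    f"### Instruction: {prompt}\n\n"
    "### Response:"
)
print(prompt_template)
```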
Please make sure you are using the latest version of text-generation-webui.
First, make sure you have AutoGPTQ installed:
pip install auto-gptq
Then try the following example code:
```python
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import argparse

model_name_or_path = "TheBloke/open-llama-13b-open-instruct-GPTQ"
model_basename = "open-llama-13b-open-instruct-GPTQ-4bit-128g.no-act.order"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=False,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

# Note: check the prompt template is correct for this model.
prompt = "Tell me about AI"
prompt_template = f'''### Instruction: {prompt}
### Response:'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline
# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
```
open-llama-13b-open-instruct-GPTQ-4bit-128g.no-act.order.safetensors
This will work with AutoGPTQ, ExLlama, and CUDA versions of GPTQ-for-LLaMa. There have been reports of problems with GPTQ-for-LLaMa's Triton mode; if you encounter issues, please use AutoGPTQ instead.
It was created with group_size 128 to increase inference accuracy, but without --act-order (desc_act) to increase compatibility and improve inference speed.
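If you prefer to spell out these quantisation parameters rather than passing quantize_config=None as in the example above, a minimal sketch using AutoGPTQ's BaseQuantizeConfig (my own illustration, not an official recipe from this card) would be:

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

# Sketch only: mirror the parameters described above
# (4-bit, group_size 128, no --act-order / desc_act).
quantize_config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=False,
)

model = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/open-llama-13b-open-instruct-GPTQ",
    model_basename="open-llama-13b-open-instruct-GPTQ-4bit-128g.no-act.order",
    use_safetensors=True,
    device="cuda:0",
    quantize_config=quantize_config,
)
```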
For further support, and discussion on these models and AI in general, join us at:
Thanks to the chirper.ai team!
I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing so, as well as expanding into new projects like fine-tuning/training.
If you're able and willing to contribute, it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.
Donaters will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.
Special thanks to: Luke from CarbonQuill, Aemon Algiz, Dmitriy Samsonov.
Patreon special mentions: Mano Prime, Fen Risland, Derek Yates, Preetika Verma, webtim, Sean Connelly, Alps Aficionado, Karl Bernard, Junyu Yang, Nathan LeClaire, Chris McCloskey, Lone Striker, Asp the Wyvern, Eugene Pentland, Imad Khwaja, trip7s trip, WelcomeToTheClub, John Detwiler, Artur Olbinski, Khalefa Al-Ahmad, Trenton Dambrowitz, Talal Aujan, Kevin Schuppel, Luke Pendergrass, Pyrater, Joseph William Delisle, terasurfer, vamX, Gabriel Puliatti, David Flickinger, Jonathan Leane, Iucharbius, Luke, Deep Realms, Cory Kujawski, ya boyyy, Illia Dulskyi, senxiiz, Johann-Peter Hartmann, John Villwock, K, Ghost, Spiking Neurons AB, Nikolai Manek, Rainer Wilmers, Pierre Kircher, biorpg, Space Cruiser, Ai Maven, subjectnull, Willem Michiel, Ajan Kanaga, Kalila, chris gileta, Oscar Rangel.
Thank you to all my generous patrons and donaters!
Instruction-tuned version of the fully trained Open LLama 13B model. The model is open for commercial use.
NOTE: The model was trained using the Alpaca prompt template.
NOTE: The fast tokenizer results in incorrect encoding; set the use_fast = False parameter when instantiating the tokenizer.
NOTE: The model might struggle with code, as the tokenizer merges multiple spaces.
```python
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'VMware/open-llama-13b-open-instruct'

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map='sequential')

prompt_template = "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"

prompt = 'Explain in simple terms how the attention mechanism of a transformer model works'

inputt = prompt_template.format(instruction=prompt)
input_ids = tokenizer(inputt, return_tensors="pt").input_ids.to("cuda")

output1 = model.generate(input_ids, max_length=512)
input_length = input_ids.shape[1]
output1 = output1[:, input_length:]
output = tokenizer.decode(output1[0])

print(output)
```
The fine-tuning scripts will be available in our RAIL Github Repository.
TBD