Model:
TheBloke/OpenAssistant-SFT-7-Llama-30B-GPTQ
Chat & support: my new Discord server
Want to contribute? TheBloke's Patreon page
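These files are GPTQ model files for OpenAssistant LLaMA 30B SFT 7.
Multiple GPTQ parameter permutations are provided. See Provided Files below for details of the options provided, their parameters, and the software used to create them.
These models were quantised using hardware provided by Latitude.sh.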
<|prompter|>{prompt}<|endoftext|><|assistant|>
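As a minimal sketch, the prompt template above can be filled in from Python as follows. The constant and function names are illustrative and not part of any library; only the template string itself comes from this model card.

```python
# Template copied from the model card; the names below are illustrative only.
PROMPT_TEMPLATE = "<|prompter|>{prompt}<|endoftext|><|assistant|>"


def build_prompt(user_message: str) -> str:
    """Wrap a single user message in the OpenAssistant prompt format."""
    return PROMPT_TEMPLATE.format(prompt=user_message)


print(build_prompt("Tell me about AI"))
# <|prompter|>Tell me about AI<|endoftext|><|assistant|>
```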
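Multiple quantisation parameters are provided, to allow you to choose the best one for your hardware and requirements.
Each separate quant is in a different branch. See below for instructions on fetching from different branches.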
Branch | Bits | Group Size | Act Order (desc_act) | File Size | ExLlama Compatible? | Made With | Description |
---|---|---|---|---|---|---|---|
main | 4 | None | True | 16.94 GB | True | GPTQ-for-LLaMa | Most compatible option. Good inference speed in AutoGPTQ and GPTQ-for-LLaMa. Lower inference quality than other options. |
gptq-4bit-32g-actorder_True | 4 | 32 | True | 19.44 GB | True | AutoGPTQ | 4-bit, with Act Order and group size. 32g gives highest possible inference quality, with maximum VRAM usage. Poor AutoGPTQ CUDA speed. |
gptq-4bit-64g-actorder_True | 4 | 64 | True | 18.18 GB | True | AutoGPTQ | 4-bit, with Act Order and group size. 64g uses less VRAM, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
gptq-4bit-128g-actorder_True | 4 | 128 | True | 17.55 GB | True | AutoGPTQ | 4-bit, with Act Order and group size. 128g uses even less VRAM, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
gptq-8bit--1g-actorder_True | 8 | None | True | 32.99 GB | False | AutoGPTQ | 8-bit, with Act Order. No group size, to lower VRAM requirements and to improve AutoGPTQ speed. |
gptq-8bit-128g-actorder_False | 8 | 128 | False | 33.73 GB | False | AutoGPTQ | 8-bit, with group size 128g for higher inference quality and without Act Order to improve AutoGPTQ speed. |
gptq-3bit--1g-actorder_True | 3 | None | True | 12.92 GB | False | AutoGPTQ | 3-bit, with Act Order and no group size. Lowest possible VRAM requirements. May be lower quality than 3-bit 128g. |
gptq-3bit-128g-actorder_False | 3 | 128 | False | 13.51 GB | False | AutoGPTQ | 3-bit, with group size 128g but no act-order. Slightly higher VRAM requirements than 3-bit None. |
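One way to narrow down the table is to filter branches by file size. The sketch below is only a rough aid: it assumes file size is a lower bound on the memory needed (inference needs additional VRAM on top of the weights), and the helper name `branches_within` is hypothetical. The sizes are copied from the table above.

```python
# Branch names and file sizes (GB) copied from the Provided Files table.
BRANCH_SIZES_GB = {
    "main": 16.94,
    "gptq-4bit-32g-actorder_True": 19.44,
    "gptq-4bit-64g-actorder_True": 18.18,
    "gptq-4bit-128g-actorder_True": 17.55,
    "gptq-8bit--1g-actorder_True": 32.99,
    "gptq-8bit-128g-actorder_False": 33.73,
    "gptq-3bit--1g-actorder_True": 12.92,
    "gptq-3bit-128g-actorder_False": 13.51,
}


def branches_within(budget_gb: float) -> list[str]:
    """Return branches whose file size fits the budget, largest file first."""
    fitting = [
        (size, name) for name, size in BRANCH_SIZES_GB.items() if size <= budget_gb
    ]
    return [name for size, name in sorted(fitting, reverse=True)]


print(branches_within(18.0))
```

The result still needs to be checked against the ExLlama compatibility and Act Order columns, which this sketch ignores.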
git clone --branch gptq-4bit-32g-actorder_True https://huggingface.co/TheBloke/OpenAssistant-SFT-7-Llama-30B-GPTQ
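Please make sure you are using the latest version of text-generation-webui.
It is strongly recommended to use the text-generation-webui one-click installers unless you know how to do a manual install.
First make sure you have AutoGPTQ installed: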
GITHUB_ACTIONS=true pip install auto-gptq
Then try the following example code:
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/OpenAssistant-SFT-7-Llama-30B-GPTQ"
model_basename = "OpenAssistant-SFT-7-Llama-30B-GPTQ-4bit--1g.act.order"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# Load the quantised weights from the main branch onto the first GPU
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=False,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

"""
To download from a specific branch, use the revision parameter, as in this example:

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        revision="gptq-4bit-32g-actorder_True",
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=False,
        device="cuda:0",
        quantize_config=None)
"""

prompt = "Tell me about AI"
prompt_template = f"<|prompter|>{prompt}<|endoftext|><|assistant|>"

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
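The files provided work with AutoGPTQ (CUDA and Triton modes), GPTQ-for-LLaMa (only CUDA has been tested), and Occ4m's GPTQ-for-LLaMa fork.
ExLlama works with Llama models in 4-bit. See the Provided Files table above for per-file compatibility.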
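For further support, and discussions on these models and AI in general, join the Discord server.
Thanks to the chirper.ai team!
I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.
If you're able and willing to contribute, it will be most gratefully received and will help me to keep providing more models and to start work on new AI projects.
Donors will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.
Special thanks to: Luke from CarbonQuill, Aemon Algiz.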
Patreon special mentions: Space Cruiser, Nikolai Manek, Sam, Chris McCloskey, Rishabh Srivastava, Kalila, Spiking Neurons AB, Khalefa Al-Ahmad, WelcomeToTheClub, Chadd, Lone Striker, Viktor Bowallius, Edmond Seymore, Ai Maven, Chris Smitley, Dave, Alexandros Triantafyllidis, Luke@flexchar, Elle, ya boyyy, Talal Aujan, Alex, Jonathan Leane, Deep Realms, Randy H, subjectnull, Preetika Verma, Joseph William Delisle, Michael Levine, chris gileta, K, Oscar Rangel, LangChain4j, Trenton Dambrowitz, Eugene Pentland, Johann-Peter Hartmann, Femi Adebogun, Illia Dulskyi, senxiiz, Daniel P. Andersen, Sean Connelly, Artur Olbinski, RoA, Mano Prime, Derek Yates, Raven Klaugh, David Flickinger, Willem Michiel, Pieter, Willian Hasse, vamX, Luke Pendergrass, webtim, Ghost, Rainer Wilmers, Nathan LeClaire, Will Dee, Cory Kujawski, John Detwiler, Fred von Graf, biorpg, Iucharbius, Imad Khwaja, Pierre Kircher, terasurfer, Asp the Wyvern, John Villwock, theTransient, zynix, Gabriel Tamborski, Fen Risland, Gabriel Puliatti, Matthew Berman, Pyrater, SuperWojo, Stephen Murray, Karl Bernard, Ajan Kanaga, Greatston Gnanesh, Junyu Yang.
Thank you to all my generous patrons and donors!
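Due to the license attached to LLaMA models by Meta AI, it is not possible to distribute LLaMA-based models directly. Instead we provide XOR weights for the OA models.
Thanks to Mick for writing the xor_codec.py script, which makes this process possible.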
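The idea behind XOR weights can be illustrated with a toy example. This is not the xor_codec.py implementation, only a sketch of the underlying property: XOR-ing the fine-tuned bytes with the base bytes gives a delta that is useless on its own, and XOR-ing that delta with the same base bytes restores the fine-tuned bytes. The four-byte inputs are made up for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings element by element."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


base = bytes([0x10, 0x20, 0x30, 0x40])       # stands in for original LLaMA weights
finetuned = bytes([0x11, 0x22, 0x33, 0x44])  # stands in for the fine-tuned weights

xor_delta = xor_bytes(finetuned, base)   # the part that can be distributed
recovered = xor_bytes(xor_delta, base)   # rebuilt by anyone who holds the base

assert recovered == finetuned
print(xor_delta.hex())  # 01020304
```

Because every byte of the base must match exactly for the round trip to work, the checksum steps in this guide matter: a base checkpoint converted with different dependency versions would decode to garbage.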
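Note: This process applies to the oasst-sft-7-llama-30b model. The same process can be applied to other models in future, but the checksums will be different.
This process has been tested only on Linux (specifically Ubuntu). Some users have reported that it does not work on Windows. If you only have a Windows machine, we recommend using WSL.
To use OpenAssistant LLaMA-based models, you should have a copy of the original LLaMA model weights and add them to a llama subdirectory here. If you cannot obtain the original LLaMA weights, see the note below for a possible alternative.
Ensure your LLaMA 30B checkpoint matches the correct md5sums: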
f856e9d99c30855d6ead4d00cc3a5573  consolidated.00.pth
d9dbfbea61309dc1e087f5081e98331a  consolidated.01.pth
2b2bed47912ceb828c0a37aac4b99073  consolidated.02.pth
ea0405cdb5bc638fee12de614f729ebc  consolidated.03.pth
4babdbd05b8923226a9e9622492054b6  params.json
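If you prefer to check the files from Python rather than with the md5sum tool, a sketch along these lines may help. It assumes the expected checksums are available as md5sum-style text (hash, whitespace, file name); all function names are illustrative.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte checkpoints are never fully loaded."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text: str) -> dict[str, str]:
    """Parse md5sum-style lines ('<hash> <file>') into {file: hash}."""
    expected = {}
    for line in text.strip().splitlines():
        checksum, name = line.split(maxsplit=1)
        expected[name.strip()] = checksum
    return expected


def find_mismatches(directory: Path, expected: dict[str, str]) -> list[str]:
    """Return names of files that are missing or whose md5 differs."""
    bad = []
    for name, checksum in expected.items():
        target = directory / name
        if not target.is_file() or md5_of_file(target) != checksum:
            bad.append(name)
    return bad
```

A call such as `find_mismatches(Path("llama/30B"), parse_manifest(manifest_text))` returns an empty list when everything matches; the directory path here is a placeholder for wherever your checkpoint lives. The same helpers can be reused for the later checksum lists in this guide.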
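If you do not have a copy of the original LLaMA weights and cannot obtain one, you may still be able to complete this process. Some users have reported that this model can be used as a base for the XOR conversion. This will also allow you to skip to Step 7. However, we only support conversion starting from the original LLaMA checkpoint and cannot provide support if you run into issues with this alternative approach.
Important: Follow these exact steps to convert your original LLaMA checkpoint to a HuggingFace Transformers-compatible format. If you use the wrong version of any dependency, you risk ending up with weights that are not compatible with the XOR files.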
python3.10 -m venv xor_venv
source xor_venv/bin/activate
git clone https://github.com/huggingface/transformers.git
cd transformers
git checkout d04ec99bec8a0b432fc03ed60cea9a1a20ebaf3c
pip install .
pip install torch==1.13.1 accelerate==0.18.0 sentencepiece==0.1.98 protobuf==3.20.1
accelerate==0.18.0
certifi==2022.12.7
charset-normalizer==3.1.0
filelock==3.12.0
huggingface-hub==0.13.4
idna==3.4
numpy==1.24.2
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
packaging==23.1
protobuf==3.20.1
psutil==5.9.5
PyYAML==6.0
regex==2023.3.23
requests==2.28.2
sentencepiece==0.1.98
tokenizers==0.13.3
torch==1.13.1
tqdm==4.65.0
transformers @ file:///mnt/data/koepf/transformers
typing_extensions==4.5.0
urllib3==1.26.15
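Since mismatched dependency versions can produce incompatible weights, it may be worth checking the environment before running the conversion. The sketch below compares the four explicitly pinned packages against what is installed, using the standard library's `importlib.metadata`. The helper names are illustrative, and the comparison is a strict string match, so a build with a local version suffix would be reported as a mismatch.

```python
from importlib import metadata

# The pins that this guide installs explicitly with pip.
PINNED = {
    "torch": "1.13.1",
    "accelerate": "0.18.0",
    "sentencepiece": "0.1.98",
    "protobuf": "3.20.1",
}


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(pinned, lookup=installed_version):
    """Map each package whose version differs to (expected, found)."""
    return {
        name: (wanted, lookup(name))
        for name, wanted in pinned.items()
        if lookup(name) != wanted
    }


if __name__ == "__main__":
    for name, (wanted, found) in version_mismatches(PINNED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

Run it inside the activated xor_venv; no output means the four pins match.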
python src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir <input_path_llama_base> --output_dir <output_path_llama30b_hf> --model_size 30B
462a2d07f65776f27c0facfa2affb9f9  ./pytorch_model-00007-of-00007.bin
e1dc8c48a65279fb1fbccff14562e6a3  ./pytorch_model-00003-of-00007.bin
9cffb1aeba11b16da84b56abb773d099  ./pytorch_model-00001-of-00007.bin
aee09e21813368c49baaece120125ae3  ./generation_config.json
92754d6c6f291819ffc3dfcaf470f541  ./pytorch_model-00005-of-00007.bin
3eddc6fc02c0172d38727e5826181adb  ./pytorch_model-00004-of-00007.bin
eeec4125e9c7560836b4873b6f8e3025  ./tokenizer.model
99762d59efa6b96599e863893cf2da02  ./pytorch_model-00006-of-00007.bin
598538f18fed1877b41f77de034c0c8a  ./config.json
fdb311c39b8659a5d5c1991339bafc09  ./tokenizer.json
fecfda4fba7bfd911e187a85db5fa2ef  ./pytorch_model.bin.index.json
edd1a5897748864768b1fab645b31491  ./tokenizer_config.json
6b2e0a735969660e720c27061ef3f3d3  ./special_tokens_map.json
5cfcb78b908ffa02e681cce69dbe4303  ./pytorch_model-00002-of-00007.bin
Important: You should now have the correct LLaMA weights and be ready to apply the XORs. If the checksums above do not match yours, there is a problem.
python xor_codec.py oasst-sft-7-llama-30b/ oasst-sft-7-llama-30b-xor/ llama30b_hf/
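You should expect to see one warning message during execution:
Exception when processing 'added_tokens.json'
This is normal. If similar messages appear for other files, something has gone wrong.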
8ae4537c64a1ef202d1d82eb0d356703  ./pytorch_model-00007-of-00007.bin
d84f99d23369e159e50cb0597b6c9673  ./pytorch_model-00003-of-00007.bin
f7de50a725d678eb65cc3dced727842f  ./pytorch_model-00001-of-00007.bin
27b0dc092f99aa2efaf467b2d8026c3f  ./added_tokens.json
aee09e21813368c49baaece120125ae3  ./generation_config.json
31a2b04b139f4af043ad04478f1497f5  ./pytorch_model-00005-of-00007.bin
a16a2dfacbde77a1659a7c9df7966d0a  ./pytorch_model-00004-of-00007.bin
eeec4125e9c7560836b4873b6f8e3025  ./tokenizer.model
baa778a8679d47b085446faf97b72758  ./pytorch_model-00006-of-00007.bin
b2d64f2198ab7b53e3b8d12fbcadeb3c  ./config.json
deb33dd4ffc3d2baddcce275a00b7c1b  ./tokenizer.json
76d47e4f51a8df1d703c6f594981fcab  ./pytorch_model.bin.index.json
ed59bfee4e87b9193fea5897d610ab24  ./tokenizer_config.json
704373f0c0d62be75e5f7d41d39a7e57  ./special_tokens_map.json
e836168cdbbb74db51d04f25ed6408ce  ./pytorch_model-00002-of-00007.bin
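If your checksums match those above, you have successfully decoded the weights and should be able to use the model with HuggingFace Transformers. If they do not match, there is a problem.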
llama-30b-sft-7:
  dtype: fp16
  log_dir: "llama_log_30b"
  learning_rate: 1e-5
  model_name: /home/ubuntu/Open-Assistant/model/model_training/.saved/llama-30b-super-pretrain/checkpoint-3500
  #model_name: OpenAssistant/llama-30b-super-pretrain
  output_dir: llama_model_30b
  deepspeed_config: configs/zero3_config_sft.json
  weight_decay: 0.0
  residual_dropout: 0.0
  max_length: 2048
  use_flash_attention: true
  warmup_steps: 20
  gradient_checkpointing: true
  gradient_accumulation_steps: 12
  per_device_train_batch_size: 2
  per_device_eval_batch_size: 3
  eval_steps: 101
  save_steps: 485
  num_train_epochs: 4
  save_total_limit: 3
  use_custom_sampler: true
  sort_by_length: false
  #save_strategy: steps
  save_strategy: epoch
  datasets:
    - oasst_export:
        lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk"
        input_file_path: 2023-04-12_oasst_release_ready_synth.jsonl.gz
        val_split: 0.05
    - vicuna:
        val_split: 0.05
        max_val_set: 800
        fraction: 1.0
    - dolly15k:
        val_split: 0.05
        max_val_set: 300
    - grade_school_math_instructions:
        val_split: 0.05
    - code_alpaca:
        val_split: 0.05
        max_val_set: 250
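To read the batch-size settings in this configuration, it can help to work out the effective batch size per optimizer step. The sketch below assumes ordinary data-parallel training, where each GPU processes its own micro-batches. The number of GPUs is not stated in the configuration, so `num_gpus` is a free parameter and the 8-GPU figure is purely hypothetical.

```python
# Values copied from the llama-30b-sft-7 configuration above.
PER_DEVICE_TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 12


def effective_batch_size(num_gpus: int) -> int:
    """Sequences contributing to each optimizer step across all devices."""
    return PER_DEVICE_TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_gpus


print(effective_batch_size(1))  # 24
print(effective_batch_size(8))  # 192, for a hypothetical 8-GPU node
```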