Model:
TheBloke/tulu-13B-GPTQ
Chat & support: my new Discord server
Want to contribute? TheBloke's Patreon page
These files are GPTQ 4-bit model files for Allen AI's Tulu 13B.
It is the result of quantising to 4-bit using GPTQ-for-LLaMa.
The following template should be used:
<|user|>
prompt goes here
<|assistant|>
Note: there should be a newline after <|assistant|>. This is very important for getting correct responses from this model.
In other words, the prompt is:
<|user|>\nprompt goes here\n<|assistant|>\n
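A quick way to sanity-check the exact string in Python (a minimal sketch; the message text is just a placeholder):

```python
# Build the prompt exactly as described, with a trailing newline after <|assistant|>
prompt = "<|user|>\n" + "prompt goes here" + "\n<|assistant|>\n"
print(repr(prompt))  # '<|user|>\nprompt goes here\n<|assistant|>\n'
```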
Please make sure you're using the latest version of text-generation-webui.
First make sure you have AutoGPTQ installed:
pip install auto-gptq
Then try the following example code:
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/tulu-13B-GPTQ"
model_basename = "gptq_model-4bit-128g"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=False,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

prompt = "Tell me about AI"
# Use the prompt format described above, including the newline after <|assistant|>
prompt_template = f'''<|user|>
{prompt}
<|assistant|>
'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline
# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
gptq_model-4bit-128g.safetensors
This will work with AutoGPTQ and the CUDA version of GPTQ-for-LLaMa. There are reports of problems with recent GPTQ-for-LLaMa Triton mode; if you have issues, please use AutoGPTQ instead.
It was created with group_size 128 to increase inference accuracy, but without --act-order (desc_act) to increase compatibility and improve inference speed.
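If AutoGPTQ cannot auto-detect the quantisation settings from the repo, they can be supplied explicitly. This is a minimal sketch matching the parameters described above (4-bit, group_size 128, no act-order), offered as an optional alternative to `quantize_config=None` in the example code rather than a required step:

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

# Explicitly describe the quantisation used for gptq_model-4bit-128g.safetensors
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

model = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/tulu-13B-GPTQ",
    model_basename="gptq_model-4bit-128g",
    use_safetensors=True,
    device="cuda:0",
    quantize_config=quantize_config,
)
```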
For further support, and discussions on these models and AI in general, join us at:
Thanks to the chirper.ai team!
I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.
If you're able and willing to contribute it will be most gratefully received, and will help me to keep providing more models and to start work on new AI projects.
Donaters will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.
Special thanks to: Luke from CarbonQuill, Aemon Algiz, Dmitriy Samsonov.
Patreon special mentions: Oscar Rangel, Eugene Pentland, Talal Aujan, Cory Kujawski, Luke, Asp the Wyvern, Ai Maven, Pyrater, Alps Aficionado, senxiiz, Willem Michiel, Junyu Yang, trip7s trip, Sebastain Graf, Joseph William Delisle, Lone Striker, Jonathan Leane, Johann-Peter Hartmann, David Flickinger, Spiking Neurons AB, Kevin Schuppel, Mano Prime, Dmitriy Samsonov, Sean Connelly, Nathan LeClaire, Alain Rossmann, Fen Risland, Derek Yates, Luke Pendergrass, Nikolai Manek, Khalefa Al-Ahmad, Artur Olbinski, John Detwiler, Ajan Kanaga, Imad Khwaja, Trenton Dambrowitz, Kalila, vamX, webtim, Illia Dulskyi.
Thank you to all my generous patrons and donaters!
This model is a 13B LLaMa model finetuned on a mixture of instruction datasets (FLAN V2, CoT, Dolly, Open Assistant 1, GPT4-Alpaca, Code-Alpaca, and ShareGPT). Please note this is a model diff - see below for usage instructions.
It was trained as part of the paper How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources. The codebase used to train and evaluate this model can be found at https://github.com/allenai/open-instruct.
This model is licensed under the AI model license given in LICENSE.txt, along with the original Llama license (llama_license.txt).
We assume you have access to a LLaMa model in HF format already. You can find details on getting access to and converting the model here: https://huggingface.co/docs/transformers/main/model_doc/llama
Clone https://github.com/allenai/open-instruct and install the required dependencies, or just copy scripts/weight_diff.py and install the minimal requirements listed in weight-diff-requirements.txt. Then download or clone this model diff to the same machine.
Then, run:
python scripts/weight_diff.py recover --path_raw ${hf_llama_path} --path_tuned ${output_path} --path_diff ${diff_location}
You will then have a recovered model! Note this takes up quite a bit of RAM, especially for the larger models.
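As a rough illustration only, the recover step conceptually adds the diff weights back onto the base LLaMa weights (recovered = base + diff). The sketch below assumes both checkpoints load as HF causal LMs, omits the tokenizer handling and integrity checks that the real scripts/weight_diff.py performs, and uses placeholder paths:

```python
import torch
from transformers import AutoModelForCausalLM

hf_llama_path = "/path/to/llama-13b-hf"      # placeholder: base LLaMa in HF format
diff_location = "/path/to/tulu-13b-diff"     # placeholder: this model diff
output_path   = "/path/to/tulu-13b-recovered"  # placeholder: where to save the result

base = AutoModelForCausalLM.from_pretrained(hf_llama_path, torch_dtype=torch.float32)
diff = AutoModelForCausalLM.from_pretrained(diff_location, torch_dtype=torch.float32)

with torch.no_grad():
    for p_base, p_diff in zip(base.parameters(), diff.parameters()):
        p_diff.add_(p_base)  # recovered weight = base + diff

diff.save_pretrained(output_path)
```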
The model was trained to use the following format (note the newlines):
<|user|>
Your message here!
<|assistant|>
For best results, format all inputs in this way.
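For instance, here is a minimal sketch of generating with a recovered full-precision checkpoint via the standard transformers API; the path and sampling settings are placeholders, not part of the original instructions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/tulu-13b-recovered"  # placeholder: the recovered model from the steps above

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, device_map="auto")

# Format the input exactly as described, including the newline after <|assistant|>
prompt = "<|user|>\nYour message here!\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```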
Here is the performance of the model across the benchmarks explored in our paper, How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources:
MMLU 0-shot | MMLU 5-shot | GSM Direct | GSM CoT | BBH Direct | BBH CoT | TydiQA Gold-Passage | TydiQA Closed-book | Codex-Eval Pass@1 | Codex-Eval Pass@10 | AlpacaFarm vs Davinci-003 | Average |
---|---|---|---|---|---|---|---|---|---|---|---|
49.2 | 51.8 | 5.0 | 36.5 | 41.3 | 42.8 | 46.1 | 9.2 | 21.3 | 35.0 | 53.9 | 37.2 |
If you use this model, please cite our work, the Llama paper, and the original datasets:
@misc{wang2023far,
  title={How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources},
  author={Yizhong Wang and Hamish Ivison and Pradeep Dasigi and Jack Hessel and Tushar Khot and Khyathi Raghavi Chandu and David Wadden and Kelsey MacMillan and Noah A. Smith and Iz Beltagy and Hannaneh Hajishirzi},
  year={2023},
  eprint={2306.04751},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@misc{touvron2023llama,
  title={LLaMA: Open and Efficient Foundation Language Models},
  author={Hugo Touvron and Thibaut Lavril and Gautier Izacard and Xavier Martinet and Marie-Anne Lachaux and Timothée Lacroix and Baptiste Rozière and Naman Goyal and Eric Hambro and Faisal Azhar and Aurelien Rodriguez and Armand Joulin and Edouard Grave and Guillaume Lample},
  year={2023},
  eprint={2302.13971},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@misc{dolly,
  author = {Databricks},
  title = {Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {Blog post},
  url = {https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm}
}

@article{longpre2023flan,
  title={The Flan Collection: Designing Data and Methods for Effective Instruction Tuning},
  author={Longpre, Shayne and Hou, Le and Vu, Tu and Webson, Albert and Chung, Hyung Won and Tay, Yi and Zhou, Denny and Le, Quoc V and Zoph, Barret and Wei, Jason and others},
  journal={arXiv preprint arXiv:2301.13688},
  year={2023}
}

@misc{köpf2023openassistant,
  title={OpenAssistant Conversations -- Democratizing Large Language Model Alignment},
  author={Andreas Köpf and Yannic Kilcher and Dimitri von Rütte and Sotiris Anagnostidis and Zhi-Rui Tam and Keith Stevens and Abdullah Barhoum and Nguyen Minh Duc and Oliver Stanley and Richárd Nagyfi and Shahul ES and Sameer Suri and David Glushkov and Arnav Dantuluri and Andrew Maguire and Christoph Schuhmann and Huu Nguyen and Alexander Mattick},
  year={2023},
  eprint={2304.07327},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@article{peng2023instruction,
  title={Instruction Tuning with GPT-4},
  author={Peng, Baolin and Li, Chunyuan and He, Pengcheng and Galley, Michel and Gao, Jianfeng},
  journal={arXiv preprint arXiv:2304.03277},
  year={2023}
}

@misc{codealpaca,
  author = {Sahil Chaudhary},
  title = {Code Alpaca: An Instruction-following LLaMA model for code generation},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/sahil280114/codealpaca}},
}